Title: AgoRapide, eliminating impedance mismatch through the property stream

Subtitle: Creating, storing and transporting data at the key-value level

Author: Bjørn Erling Fløtten (Trondheim, Norway November 2020)

Updated 31 Mar 2021 with documentation links

Abstract

A traditional RDBMS (Relational Database Management System) presents data at an abstraction level well suited for querying (it can serve "any" query). It is, however, less flexible when it comes to propagating data and transforming it to and from object-oriented representations.
We propose creating, storing and transporting data at the key-value level instead (using the term "property stream"), and relegating the RDBMS to a secondary level where it can still ensure data consistency and process queries.
Advantages of our method are: less impedance mismatch between data storage and object model, easier prototyping, inherent scalability, and reduced data egress costs.

An actual working demonstration implementation of AgoRapide in .NET is found at
https://bitbucket.org/BjornErlingFloetten/ARCore
with documentation at
http://ARNorthwind.AgoRapide.com/RQ/doc/toc
This document is found at
http://bef.no/AgoRapide

The author welcomes implementations on other platforms. The demonstration implementation linked to above can be used as a guideline and a foundation for a standardized approach among different implementations.

Text

1. Property stream

"Property stream" in AgoRapide is the concept of how all data is broken down into single 'key and value' pairs which are stored sequentially as plain text format lines and then never change.

This format is, by its very nature, easy to distribute and cache.

It is also easy to convert into an object-oriented format, lessening the 'impedance mismatch' between databases and object-oriented languages.

This basic key-value level of storage (the "property stream") is considered flexible enough not to impede any specific implementation built on top of the core data.

1.1. Property stream, encoding of individual items

Each "line" in the property stream is independent of the other lines.

Example:

dt/Customer/42/FirstName = John
dt/Customer/42/LastName = Smith
dt/Order/43/CustomerId = 42

We propose a human readable format for the property stream with a minimum of encoding.
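
As an illustration of how little decoding such a line needs, the following is a minimal Python sketch that splits a line into its hierarchical key and its value. The names PropertyLine and parse_line are hypothetical and not part of the AgoRapide implementation, and the sketch assumes " = " as the separator, as in the examples above.

```python
from typing import NamedTuple, Tuple


class PropertyLine(NamedTuple):
    """One parsed property stream line (hypothetical helper type)."""
    key: Tuple[str, ...]  # hierarchical key, e.g. ("dt", "Customer", "42", "FirstName")
    value: str            # raw text value, "" when the line carries no value


def parse_line(line: str) -> PropertyLine:
    """Split 'a/b/c = value' into its key path and its value."""
    key_part, sep, value_part = line.partition(" = ")
    key = tuple(key_part.strip().split("/"))
    return PropertyLine(key, value_part.strip() if sep else "")


parsed = parse_line("dt/Customer/42/FirstName = John")
```

Because each line is independent, a consumer can apply this function line by line without keeping any parser state.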

1.2. Timestamps

We propose that timestamps are inserted into the property stream by a suitably responsible node, at whatever time resolution is needed.

Example:

Timestamp = 2020-10-29 11:53
dt/Customer/42/FirstName = John
dt/Customer/42/LastName = Smith
Timestamp = 2020-10-29 11:54
dt/Order/43/CustomerId = 42

A customer was created at time 11:53, an order placed at time 11:54. A resolution of 1 minute is used.
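
A consumer can recover the time of each data line by remembering the most recent Timestamp line. The sketch below assumes the "Timestamp = yyyy-MM-dd HH:mm" format shown in the example; the function name with_timestamps is hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

PREFIX = "Timestamp = "


def with_timestamps(lines: Iterable[str]) -> Iterator[Tuple[Optional[datetime], str]]:
    """Pair every data line with the most recent Timestamp line seen before it."""
    current: Optional[datetime] = None
    for line in lines:
        if line.startswith(PREFIX):
            # Format assumed from the example above (1 minute resolution).
            current = datetime.strptime(line[len(PREFIX):], "%Y-%m-%d %H:%M")
        else:
            yield current, line


stream = [
    "Timestamp = 2020-10-29 11:53",
    "dt/Customer/42/FirstName = John",
    "dt/Customer/42/LastName = Smith",
    "Timestamp = 2020-10-29 11:54",
    "dt/Order/43/CustomerId = 42",
]
stamped = list(with_timestamps(stream))
```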

2. Data flow

The proposed property stream is inherently flexible regarding distribution of data.

This means that data can flow in any manner between different nodes in a given implementation.

One typical implementation could look like this:

[FIGURE]

2.1. Core node

The Core node is responsible for the actual data storage. The actual storage is simple files in text format of a suitable size. Storing new data is done by just appending to a storage file. Throughput is therefore quite high. The read load on the Core node is low because each connecting node has to receive the same data only once (because each node caches the received data).
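
The append-only storage described above can be sketched in a few lines of Python. This is an illustration only, not the storage code of the demonstration implementation; the class name CoreStorage, the file name and the use of a byte offset as read position are all assumptions.

```python
import os
import tempfile
from typing import List


class CoreStorage:
    """Append-only text storage (hypothetical sketch of a Core node)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, line: str) -> int:
        """Append one line and return the file size after the write."""
        with open(self.path, "ab") as f:
            f.write(line.encode("utf-8") + b"\n")
        return os.path.getsize(self.path)

    def read_from(self, offset: int) -> List[str]:
        """Return all lines stored after the given byte offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read().decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    core = CoreStorage(os.path.join(tmp, "File00001.txt"))
    pos = core.append("dt/Customer/42/FirstName = John")
    core.append("dt/Customer/42/LastName = Smith")
    # A node that already has everything up to 'pos' only receives the rest.
    new_lines = core.read_from(pos)
```

Existing content is never rewritten, so a connecting node that remembers its last offset never has to fetch the same data twice.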

2.2. RDBMS node

The PostgreSQL node subscribes to all or part of the property stream and updates the corresponding database storage. Clients then connect to this database in the ordinary fashion in order to query it.

2.3. NoSQL node

MongoDB node: Our proposal also makes it possible to use alternatives to an RDBMS, like "NoSQL", in parallel with a traditional RDBMS. This is exemplified by the MongoDB node, which has access to the same data, under the same conditions, as the PostgreSQL node.

2.4. Processor node

The Processor node exemplifies how impedance mismatch is eliminated.
It implements its own in-memory database which provides ordinary object-oriented access to the data. That is, the property stream is converted into ordinary objects with properties. The in-memory database is "always" up to date; that is, the Processor node has no need to make further queries towards the Core node once some data is received.
This node may have a full or a partial subscription to the property stream.
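
To make the conversion from property stream to objects concrete, here is a minimal Python sketch that folds stream lines into a nested in-memory structure. It uses plain dictionaries instead of typed classes and only handles lines of the form dt/Type/Id/Property = value; the name apply_stream is hypothetical and the demonstration implementation does this differently (and more completely).

```python
from collections import defaultdict
from typing import Dict, Iterable

# entity type -> entity id -> property name -> value
Database = Dict[str, Dict[str, Dict[str, str]]]


def apply_stream(lines: Iterable[str]) -> Database:
    """Fold property stream lines into an in-memory 'database'."""
    db: Database = defaultdict(lambda: defaultdict(dict))
    for line in lines:
        key, sep, value = line.partition(" = ")
        parts = key.split("/")
        # Only 'dt/<Type>/<Id>/<Property> = <value>' is handled in this sketch.
        if sep and len(parts) == 4 and parts[0] == "dt":
            _, entity_type, entity_id, prop = parts
            db[entity_type][entity_id][prop] = value
    return db


db = apply_stream([
    "dt/Customer/42/FirstName = John",
    "dt/Customer/42/LastName = Smith",
    "dt/Order/43/CustomerId = 42",
])
```

After this, db["Customer"]["42"] can be read like an ordinary object, with no query sent to the Core node.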

2.5. API node

The API node also implements its own in-memory database. This means that it is always updated and can serve API queries directly from memory. The API clients receive ordinary objects, in for instance JSON format.

2.5.1. API node, reading

In our working demonstration implementation of AgoRapide, queries can either

1) peek directly into the in-memory database, like

http://yoursite.com/RQ/Customer/42

or

2) be more SQL-like:

http://yoursite.com/RQ/Customer/WHERE FirstName = John/SELECT FirstName, LastName/ORDER BY LastName, FirstName
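
The first, "peek" style of query maps naturally onto a walk through nested in-memory data. The Python sketch below illustrates only that idea, assuming the data is held as nested dictionaries; the function name peek is hypothetical, and the SQL-like form is not covered.

```python
import json
from typing import Any, Dict


def peek(db: Dict[str, Any], path: str) -> str:
    """Follow 'Customer/42' style path segments into nested dicts, return JSON."""
    node: Any = db
    for segment in path.strip("/").split("/"):
        node = node[segment]  # a KeyError signals 'not found' in this sketch
    return json.dumps(node, sort_keys=True)


db = {"Customer": {"42": {"FirstName": "John", "LastName": "Smith"}}}
body = peek(db, "Customer/42")
```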

2.5.2. API node, writing

The API node can also accept new data. In its simplest form this is just a property stream line given directly in the URL like:

http://yoursite.com/Add/Customer/42/FirstName = John

2.5.3. API node, schema and security

Note how the API implementation can be constructed without ANY application specific code.

By using the proposed Property access (IP) mechanism and just mirroring the subscribed property stream, it automatically exposes the property stream to the outside via HTTP and JSON / HTML.

This is ideal for prototyping. As the need for more granular client security arises, sensitive data can be cut off from the API altogether by just changing the node's Subscription parameters.

An RDBMS node can then be introduced, which subscribes to any sensitive data, with correspondingly established security mechanisms used for client access.

2.6. Command node

The Command node communicates with external devices (for instance in an IoT (Internet of things) scenario).

It injects incoming data to the property stream through the Core node. It can also send commands to the external devices.

It only has to subscribe to a minuscule part of the property stream in order to know how to route messages (for instance, to know the basic mappings between internal and external ids). Like the Processor node, it has no need to make queries when it does local processing.

2.7. Cache node

The Cache node is just a common collector used by further downstream nodes, in order to reduce load on the Core storage node.

2.8. Other nodes

Some examples of further downstream nodes are:

The Backup node subscribes to the whole property stream and just stores it locally (as a backup copy).

The Log node and the Management node hint at how administrative tasks can be performed by listening to the property stream, for instance filtering logs and issuing configuration orders.

Note that multiple log nodes can be started ad-hoc whenever needed, that is, plugged into the property stream with the relevant subscription.

3. Routing and caching of data

The property stream is inherently very easy to route ("as easy as routing water in a building").

Since it is also very easy to store locally, and therefore cache between node shutdowns, a typical node never has to ask twice for the same data. This drastically minimizes egress costs, something which is becoming more of a concern with modern cloud architecture.

4. Cardinality

In order to reduce the number of relations, we propose a cardinality concept to represent some common structures:

4.1. Cardinality, individual items

Example, individual items:
Values are supposed to be set and 'deleted' individually.
Typical example could be Customer/PhoneNumber
The PropertyStream / API-calls should look something like this:

Customer/42/PhoneNumber/90534333 // Add number
Customer/42/PhoneNumber/40178178 // Add number
Customer/42/PhoneNumber/90534333.Invalid = 2020-03-20 // 'Delete' individual number
Customer/42/PhoneNumber/Invalid = 2020-03-21 // 'Delete' all numbers
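
The following Python sketch shows one way a consuming node could interpret these lines, keeping the currently valid numbers in a set. It is an illustration under the assumption that the lines arrive without the trailing comments; the function name phone_numbers is hypothetical.

```python
from typing import Iterable, Set


def phone_numbers(lines: Iterable[str],
                  prefix: str = "Customer/42/PhoneNumber/") -> Set[str]:
    """Return the set of currently valid phone numbers for one customer."""
    numbers: Set[str] = set()
    for line in lines:
        if not line.startswith(prefix):
            continue
        item = line[len(prefix):].partition(" = ")[0]  # drop any '= <timestamp>'
        if item == "Invalid":
            numbers.clear()                            # 'delete' all numbers
        elif item.endswith(".Invalid"):
            numbers.discard(item[:-len(".Invalid")])   # 'delete' one number
        else:
            numbers.add(item)                          # add number
    return numbers


lines = [
    "Customer/42/PhoneNumber/90534333",
    "Customer/42/PhoneNumber/40178178",
    "Customer/42/PhoneNumber/90534333.Invalid = 2020-03-20",
    "Customer/42/PhoneNumber/Invalid = 2020-03-21",
]
after_three = phone_numbers(lines[:3])
after_all = phone_numbers(lines)
```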

4.2. Cardinality, whole collection

Example, whole collection:
Values are always set as a whole (there is no need or no meaning in setting or deleting individual items)
Typical example could be PizzaOrder/Extra
The PropertyStream / API-calls should look something like this:

PizzaOrder/42/Extra = Pepperoni;Cheese;Sauce // Set whole collection.
PizzaOrder/42/Extra/Invalid = 2020-03-20 // 'Delete' all items.
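
A consumer of such a collection only ever replaces or clears the whole value. The Python sketch below assumes ";" as the item separator, as in the example, and lines without the trailing comments; the function name extras is hypothetical.

```python
from typing import Iterable, List


def extras(lines: Iterable[str], key: str = "PizzaOrder/42/Extra") -> List[str]:
    """Return the current collection after applying all lines for one key."""
    current: List[str] = []
    for line in lines:
        name, sep, value = line.partition(" = ")
        if name == key and sep:
            current = value.split(";")  # set whole collection
        elif name == key + "/Invalid":
            current = []                # 'delete' all items
    return current


lines = [
    "PizzaOrder/42/Extra = Pepperoni;Cheese;Sauce",
    "PizzaOrder/42/Extra/Invalid = 2020-03-20",
]
before_delete = extras(lines[:1])
after_delete = extras(lines)
```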

5. Property stream line prefix

The proposed property stream format is very simple and low level. It actually looks a lot like an MQTT message.

Since the format can be used to carry not only data but also communication in general, we propose the use of prefixes as follows:

We propose to use the prefix "dt" for data, like

dt/Customer/42/FirstName = John

and "cmd" (Command) for instructions, like

cmd/Device/43/TurnOn

This is already established industry practice, for instance in the MQTT world.

We also propose the prefixes 'app' (Application state / logging) and 'doc' (Documentation), that is, to include exposing of application state, logging and documentation in the basic stream propagation.

This enables mixing of content in the same stream, meaning that communication infrastructure and storage mechanisms can be utilized for more than just storage and dissemination of the core application data.

Our working demonstration implementation of AgoRapide for instance exposes internal application state in the property stream. This simplifies debugging and management of individual nodes in a given system.
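
Since the prefix is simply the first segment of the key, a node can separate the different kinds of content with a single split. The Python sketch below groups lines by the four prefixes proposed above; the names KNOWN_PREFIXES and route_by_prefix are hypothetical.

```python
from typing import Dict, List

# Prefixes proposed in the text: data, commands, application state, documentation.
KNOWN_PREFIXES = ("dt", "cmd", "app", "doc")


def route_by_prefix(lines: List[str]) -> Dict[str, List[str]]:
    """Group stream lines by their leading prefix, with the prefix removed."""
    routed: Dict[str, List[str]] = {prefix: [] for prefix in KNOWN_PREFIXES}
    for line in lines:
        prefix, _, rest = line.partition("/")
        if prefix not in routed:
            raise ValueError(f"Unknown prefix in line: {line!r}")
        routed[prefix].append(rest)
    return routed


routed = route_by_prefix([
    "dt/Customer/42/FirstName = John",
    "cmd/Device/43/TurnOn",
])
```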

See also PSPrefix and ExposingApplicationState

6. Subscription

The inherent flexibility of the property stream actually encourages multiple nodes, each processing a small subset of the total data.

We therefore propose a system for subscription.

A subscription consists of one or more lines as follows:
First character is either + (plus, add this data) or - (minus, filter out this data).
Then the actual hierarchical level is specified, with * (asterisk) used as wildcard.

Example:

+* // Subscribe to everything
+dt/Customer/42/* // Subscribe to all properties related to customer with id 42
+dt/Customer/*/FirstName/* // Subscribe to all occurrences of Customer.FirstName
+log/* // Subscribe to everything beginning with log
-log/API: // Do not include properties beginning with 'log/API/' in subscription

See also Subscription and Subscription syntax

Whenever the client connects to an upstream node, it sends its subscription request:

Example:

SubscriptionAsRequestedByClient/+*
SubscriptionAsRequestedByClient/-log/*

(Subscribe to everything except log data.)
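
A possible way for an upstream node to evaluate such a subscription is sketched below in Python. The precedence rule is an assumption made for this sketch (a key is sent when it matches at least one + line and no - line); the text above does not define how overlapping lines interact. The function names are hypothetical, and * is taken to match any run of characters, including "/".

```python
import re
from typing import Iterable, List, Pattern, Tuple

Rule = Tuple[bool, Pattern[str]]  # (True for '+', False for '-', compiled pattern)


def compile_subscription(lines: Iterable[str]) -> List[Rule]:
    """Turn '+pattern' / '-pattern' lines into compiled rules."""
    rules: List[Rule] = []
    for line in lines:
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"Subscription line must start with + or -: {line!r}")
        # Escape everything literally, then let each '*' match any characters.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        rules.append((sign == "+", re.compile(regex)))
    return rules


def is_subscribed(rules: List[Rule], key: str) -> bool:
    """Assumed rule: included by some '+' line and excluded by no '-' line."""
    included = any(p.fullmatch(key) for add, p in rules if add)
    excluded = any(p.fullmatch(key) for add, p in rules if not add)
    return included and not excluded


everything_but_log = compile_subscription(["+*", "-log/*"])
one_customer = compile_subscription(["+dt/Customer/42/*"])
```

With these rules, is_subscribed(everything_but_log, "log/API/Request") is False, while any key under dt/ is accepted.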

6.1. Client update position

In order for the server to know from where (actually from when) in the property stream to start sending data, the client has to specify a "client update position". This is ordinarily just a key given by the server, and sent with every new property stream line of the connection.

We propose to separate the client update position and the actual property stream line with a :: (double colon).

Example:

File00042,414546::dt/Customer/42/FirstName = John
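
On the client side, handling this amounts to splitting at the first double colon and remembering the position for the next connection. The Python sketch below treats the position as an opaque key, as described above; the function name split_position is hypothetical.

```python
from typing import Tuple


def split_position(raw: str) -> Tuple[str, str]:
    """Split '<client update position>::<property stream line>' at the first '::'."""
    position, sep, line = raw.partition("::")
    if not sep:
        raise ValueError(f"Missing '::' separator in: {raw!r}")
    return position, line


# The client stores 'position' and sends it back when it reconnects.
position, line = split_position("File00042,414546::dt/Customer/42/FirstName = John")
```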

7. Tagging of property keys (schema) and property access

The property stream itself and the corresponding storage mechanism are neither aware of, nor dependent on, any schema.

For in-memory consumption of data however, and exposing of data through APIs, some schema and standardized object structure is desired.

We propose the term "tagging of property keys" in order to specify a schema for the data in the property stream. We propose that all schema data related to a given key shall reside "together" with the definition of the key. This ensures at-a-glance assessment of the key's usage throughout the application.

8. Meta properties

In order to ensure tracking of changes we propose the use of meta properties as follows:
Created: Timestamp when created in database.
Cid: Creator id, that is entity which created this property.
Valid: Timestamp when last known valid.
Vid: Validator id, that is entity which set Valid for this property.
Invalid: Timestamp when invalidated / 'deleted'.
IId: Invalidator id, that is entity which set Invalid for this property.

Note that "Invalid" is proposed to be equivalent to SQL DELETE or SQL UPDATE (set value to NULL). Standard transformations for the other meta properties are not proposed in this document.

See also Meta properties for more details.

9. Conversion from property stream format to SQL

A traditional RDBMS can be positioned as a "subscriber" to the property stream.

Each property stream line will be converted to SQL as follows:

9.1. Conversion, ordinary properties

Ordinary change of property:

Customer/42/FirstName = John
UPDATE Customer SET FirstName = 'John' WHERE CustomerId = 42

(alternatively, "INSERT INTO Customer (CustomerId, FirstName) VALUES (42, 'John')" if this is a new customer)

9.2. Conversion, meta properties

The meta property [Invalid] results in a DELETE or an UPDATE of the corresponding key.

Example:

Customer/42/Invalid = 2020-10-28 15:11:00
DELETE FROM Customer WHERE CustomerId = 42

Customer/42/MiddleName/Invalid = 2020-10-28 15:11:00
UPDATE Customer SET MiddleName = NULL WHERE CustomerId = 42
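
The conversion rules in sections 9.1 and 9.2 can be sketched as a small Python function that returns a parameterized SQL statement. This is an illustration, not the conversion used by any actual RDBMS node: the function name to_sql is hypothetical, the id column is assumed to be named after the table plus "Id" (as in the examples), and SQLite from the Python standard library stands in for PostgreSQL so that the sketch is runnable. The INSERT case for new entities is left out.

```python
import re
import sqlite3
from typing import Tuple

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")  # table / column names only


def to_sql(line: str) -> Tuple[str, tuple]:
    """Convert '<Table>/<Id>/<Property>[/Invalid] = value' into SQL + parameters."""
    key, _, value = line.partition(" = ")
    parts = key.split("/")
    table, entity_id, rest = parts[0], int(parts[1]), parts[2:]
    for name in [table] + rest:
        if not IDENT.match(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    id_col = table + "Id"  # assumed naming convention
    if rest == ["Invalid"]:
        return f"DELETE FROM {table} WHERE {id_col} = ?", (entity_id,)
    if len(rest) == 2 and rest[1] == "Invalid":
        return f"UPDATE {table} SET {rest[0]} = NULL WHERE {id_col} = ?", (entity_id,)
    if len(rest) == 1:
        return f"UPDATE {table} SET {rest[0]} = ? WHERE {id_col} = ?", (value, entity_id)
    raise ValueError(f"Unsupported line: {line!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer "
             "(CustomerId INTEGER PRIMARY KEY, FirstName TEXT, MiddleName TEXT)")
conn.execute("INSERT INTO Customer VALUES (42, 'Jon', 'Q')")
for stream_line in ["Customer/42/FirstName = John",
                    "Customer/42/MiddleName/Invalid = 2020-10-28 15:11:00"]:
    conn.execute(*to_sql(stream_line))
row = conn.execute(
    "SELECT FirstName, MiddleName FROM Customer WHERE CustomerId = 42").fetchone()
conn.execute(*to_sql("Customer/42/Invalid = 2020-10-28 15:11:00"))
remaining = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

Values are passed as parameters rather than pasted into the SQL text; table and column names cannot be parameterized, which is why they are checked against a strict pattern first.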

10. Transactions

Transactions can be supported by inserting SQL-style BEGIN, COMMIT / ABORT into the property stream, together with a corresponding transaction id, to signal start and end of transactions. Every property stream line (each data point) belonging to that transaction would then be tagged with the transaction id.

A node (client) seeing a BEGIN would then know to wait for a COMMIT / ABORT with the same transaction id before considering the data points belonging to that transaction for further processing.

Example, moving money from one account to another:

Transaction/123abc456def/BEGIN
Transaction/123abc456def/Account/42/Subtract = 1000 EUR
Transaction/123abc456def/Account/43/Add = 1000 EUR
Transaction/123abc456def/COMMIT

(note that in the actual property stream, other data may be interspersed with the lines shown above.)
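
The waiting behaviour of a client node can be sketched in Python as a filter that buffers the lines of each open transaction and only releases them on COMMIT. The function name committed_lines is hypothetical, and stripping the "Transaction/<id>/" part before passing lines on is a choice made for this sketch only.

```python
from typing import Dict, Iterable, Iterator, List


def committed_lines(stream: Iterable[str]) -> Iterator[str]:
    """Pass ordinary lines through; release transaction lines only on COMMIT."""
    pending: Dict[str, List[str]] = {}
    for line in stream:
        parts = line.split("/", 2)
        if parts[0] != "Transaction" or len(parts) < 3:
            yield line  # not part of a transaction
            continue
        tx_id, rest = parts[1], parts[2]
        if rest == "BEGIN":
            pending[tx_id] = []
        elif rest == "COMMIT":
            yield from pending.pop(tx_id, [])
        elif rest == "ABORT":
            pending.pop(tx_id, None)  # discard buffered data points
        else:
            pending.setdefault(tx_id, []).append(rest)


result = list(committed_lines([
    "Transaction/123abc456def/BEGIN",
    "Transaction/123abc456def/Account/42/Subtract = 1000 EUR",
    "dt/Customer/42/FirstName = John",  # unrelated, interspersed line
    "Transaction/123abc456def/Account/43/Add = 1000 EUR",
    "Transaction/123abc456def/COMMIT",
]))
```

The interspersed line is delivered immediately, while both account lines become visible together once the COMMIT arrives.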

11. Security

This document does not describe security in any detail.

If secure access to the basic property stream is necessary we envisage that off-the-shelf solutions can be used, like SSL with certificates for instance.

For secure client queries (when more than just a single set of common administrative credentials are needed) we have proposed to use established stacks, starting from an established RDBMS node (which subscribes to the Property stream).

12. Scalability and fault-tolerance (redundancy)

Note: Apache Kafka already offers a property stream based, fault-tolerant and scalable storage mechanism.
For advanced needs, at the time of writing (2020), the author recommends using Kafka as the storage mechanism and connecting AgoRapide nodes to Kafka (the subscription paradigm will be the same).
Key-value databases like DynamoDB and Cosmos DB are also well suited as the storage mechanism for advanced needs.

We envisage the AgoRapide concept as inherently scalable. Data can for instance be sharded easily, by using corresponding Subscription parameters.

Creation of new processing nodes as the load increases, and corresponding termination as the load decreases, can be done through a dedicated management Node. The configuration commands can themselves be part of the property stream.

13. Containerization

Each node is very simple in its construction. Our demonstration implementation for instance uses no external functionality (libraries) except what is found in .NET Standard.

This should make the AgoRapide concept ideal for containerization.

14. Other ideas

The documentation for our working implementation of AgoRapide contains some more ideas.

See especially AgoRapide concepts, or start reading from the documentation root level.

Note that the documentation referenced throughout this document is itself created with the help of AgoRapide. In this case it is published as static HTML files, but it could just as well have been an online API: if you download the AgoRapide working implementation from https://bitbucket.org/BjornErlingFloetten/arcore and start the application ARAAPI, you will see this same documentation exposed through an online API peeking directly into the documentation database.