Under the Hood
Highly advanced, yet easy to use
The Stream Analyze Platform provides easy and interactive development, deployment, and evolution of applications that process and analyze real-time streams of data in distributed and edge environments. The platform allows non-expert programmers such as analysts and engineers to interactively change, query, and define mathematical models on the fly, without the need for deep programming skills. User friendliness and functionality are provided without any performance loss. This is achieved by combining advanced database and computer algebra optimization with state-of-the-art dynamic compilation into native binary code. Altogether, this makes the platform uniquely powerful and efficient for developing, deploying, and evolving intelligent analytics models (including machine learning) on edge devices of any kind and size.
This page provides an extended overview of the Stream Analyze Engine, the kernel system powering the platform. The SA Engine has a highly optimized architecture, making it independent of operating systems and other software, resulting in an extremely small and efficient footprint.
Explore the following sections to understand the architecture of the SA Engine and its role in enabling the robust performance of the Stream Analyze Platform.
1. Introduction
The Stream Analyze Platform provides interactive search and analysis of large data streams in real time, directly on devices without relying on the cloud. Streams that are produced by sensors and other data sources on mobile or edge devices can be analyzed online and interactively. An edge device can be, e.g., an Android unit, a desktop computer, a Raspberry Pi, or an MCU such as an ARM Cortex M0/3/4/7 or a RISC-V core. This is possible since the kernel of the platform, the Stream Analyze Engine, has a very small footprint (from 17kB to 8MB depending on configuration), is hardware and OS agnostic, and is fully independent of any third-party software.
The combination in SA Engine of a main-memory database, a computational engine, a data stream processor, and an inference engine running on edge devices allows analytics directly on the edge devices, rather than the contemporary approach of first uploading all data from the devices and then doing the analytics centrally on a server or in the cloud. Processing data directly when and where it is produced allows for drastic data reduction [1]. Reducing the amount of data streamed to a central location (e.g., the cloud) substantially improves scalability across large fleets of edge devices: only the data required for device management and population analysis needs to be streamed from the edge devices.
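As a rough illustration of this data reduction (in Python rather than OSQL, with a hypothetical magnitude threshold), an edge-side filter forwards only the readings that matter and drops the rest locally:

```python
# Hypothetical illustration (not SA Engine code): reducing data volume by
# filtering a sensor stream at the edge and forwarding only significant events.

def edge_filter(readings, threshold):
    """Yield only the (time, value) readings whose magnitude exceeds the threshold."""
    for t, value in readings:
        if abs(value) > threshold:
            yield (t, value)

# A simulated accelerometer stream: mostly small noise, two strong spikes.
stream = [(0, 0.01), (1, 0.02), (2, 5.3), (3, 0.015), (4, 4.8), (5, 0.01)]
events = list(edge_filter(stream, threshold=1.0))
# Only 2 of the 6 readings would be streamed to the server.
```

In SA Engine such a filter would be a declarative continuous query rather than an explicit loop, but the effect on the transmitted data volume is the same.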
Stream Analyze Engine systems can be configured as stream servers for collecting and combining data from edges and other systems. For example, an analysis model in SA Engine on some edge device can do computations that detect strong vibrations based on real-time readings from an accelerometer. When strong vibrations are detected, a data stream containing the frequency spectrum of the vibrations along with the geographical position of the device is transmitted to a stream server. If the stream server receives many such streams at the same time from devices in geographical positions close to each other, it may indicate an earthquake. The stream server may furthermore forward the received data to other systems, e.g. for permanent central storage or batch analysis.
To analyze streaming data interactively at a very high, user-oriented level, SA Engine allows analysts and engineers to develop computations, filters, and transformations over data using a computational query language called OSQL (Object Stream Query Language) [18], which seamlessly extends queries and updates in the standard query language SQL with functions, numerical objects, and data streams for defining numerical models over streaming data. Using computational queries, computations and filters over real-time streaming data are defined as mathematical formulas and expressions, called stream models. A stream model is a set of definitions of mathematical functions, filters, and other computations over streams of measurements. Different kinds of models, algorithms, and engines can be combined through such computational queries. The models are specified non-procedurally at a very high level without deep programming knowledge: the user specifies what to do rather than writing detailed programs expressing how to execute the models, and need not worry about how to implement the analysis efficiently. Instead, an advanced query optimizer together with a dynamic compiler generates optimized native binary code for the devices on the fly [13]. Non-procedural specifications of queries and models over dense multi-dimensional arrays [14] are shown in [20][21] to be at least as efficient as corresponding manually coded computations in C/C++, while being specified as domain-oriented formulas with around 1/60th of the code volume of C/C++.
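The "what, not how" style can be conveyed with a small analogy in Python (OSQL syntax differs; the vibration-magnitude model here is hypothetical). The model is stated as a formula over a stream instead of an explicit processing loop:

```python
# Conceptual illustration in Python (not OSQL): a "stream model" expressed as
# a formula over a stream rather than an explicit loop.
import math

def magnitude(v):
    # Euclidean norm of a 3-axis accelerometer sample.
    return math.sqrt(sum(x * x for x in v))

def strong_vibrations(stream, limit):
    # Declarative-style definition: "the magnitudes in the stream above limit".
    return (magnitude(v) for v in stream if magnitude(v) > limit)

samples = [(0.1, 0.0, 0.0), (3.0, 4.0, 0.0), (0.2, 0.1, 0.1)]
result = list(strong_vibrations(samples, limit=1.0))  # one qualifying sample
```

In SA Engine the corresponding definition would be an OSQL function over an object stream, and the optimizer, not the user, decides how to execute it efficiently.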
SA Engine includes a library of over 1000 predefined OSQL-functions for math/stat computations, object stream filtering and transformation, signal processing, model and data management, and much more. The function library is continuously extended for new customer needs, and it is easy to define and deploy new user functions on-the-fly.
Common machine learning algorithms such as DBSCAN, DenStream, k-NN, k-means, and random forests are available as predefined OSQL models, and the user can easily extend them with other algorithms defined as queries or as foreign functions accessing external libraries. Learning and inference are supported on both edges and servers. Inference and computations require efficient representation of the multi-dimensional arrays used in models and queries. To support this, the internal array representation in SA Engine is compatible with many other systems working with dense arrays, including NumPy and other BLAS-based systems. This compatibility makes SA Engine fully interoperable with TensorFlow Lite and OpenVINO, including hardware-accelerated delegates, as shown in our vision-pipeline-demo¹.
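To give a flavor of how compact such algorithms are, here is a minimal k-NN classifier in plain Python (illustrative only; in SA Engine, k-NN ships as a predefined OSQL model and is expressed as queries over arrays):

```python
# Minimal k-NN classifier sketch (illustrative, not the SA Engine implementation).
from collections import Counter

def knn_classify(train, point, k):
    """train: list of (vector, label) pairs; returns majority label of k nearest."""
    def dist2(a, b):
        # squared Euclidean distance (monotone in distance, so sorting is unaffected)
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda vl: dist2(vl[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "low"), ((0, 1), "low"), ((5, 5), "high"), ((6, 5), "high")]
label = knn_classify(train, (5, 4), k=3)
```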
Machine learning requires pre-processing of sensor data before applying the learned inference algorithm, followed by post-processing of the inferred knowledge. With SA Engine, both pre- and post-processing are easily expressed using its powerful computational queries.
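As a rough sketch of this pre-process / infer / post-process pattern (in Python rather than OSQL; the normalization, stand-in model, and threshold are all hypothetical):

```python
# Hypothetical pre-/post-processing pipeline around an inference step.

def preprocess(raw):
    # e.g. normalize raw sensor values to the range [0, 1]
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo) for x in raw]

def infer(features):
    # stand-in for a learned model: here simply the mean of the features
    return sum(features) / len(features)

def postprocess(score, threshold=0.5):
    # turn the inferred score into actionable knowledge
    return "anomaly" if score > threshold else "normal"

result = postprocess(infer(preprocess([10, 20, 40])))
```

In SA Engine, each of these three stages would be an OSQL function over streams, so the whole pipeline is a composition of queries that the optimizer can compile as a unit.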
SA Engine allows users to efficiently process dense sensor readings (including audio and video), numerical computations, and machine learning including neural networks. This is enabled by built-in data types and operators for arrays, which provide very efficient processing of vectors, matrices, and tensors. Using SA Engine, a neural network and its inference and training are specified as queries, which are optimized and compiled into binary code [21]. For example, neural network models defined in TensorFlow/TensorBoard are represented as dense arrays and functions over them. When SA Engine executes a neural network, the weights are stored in its main-memory database as dense arrays. For inference, the network is executed as binary code, e.g. over local sensor readings.
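Conceptually, storing the weights as dense arrays and running inference as array arithmetic looks like the following Python sketch (hypothetical 3-input, 2-output layer; SA Engine would express this as optimized queries over its built-in array types):

```python
# Tiny dense-layer inference sketch: weights held as in-memory dense arrays,
# inference as plain matrix arithmetic.

def relu(v):
    return [max(x, 0.0) for x in v]

def matvec(W, x):
    # matrix-vector product over nested lists
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical weights and bias for a 3-input, 2-output layer.
W = [[1.0, -1.0, 0.5],
     [0.0,  2.0, -0.5]]
b = [0.1, -0.2]

def dense_layer(x):
    return relu([wx + bi for wx, bi in zip(matvec(W, x), b)])

out = dense_layer([1.0, 2.0, 2.0])  # two activations
```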
The approach allows training of machine learning models including neural networks on edge devices². Thus, centrally trained models can be further retrained and modified on edge devices to adapt their behavior to their environment³.
SA Engine is independent of other systems, while providing powerful interoperability mechanisms for tight integration with other software on a variety of hardware platforms. This agnosticism has made it possible both to port the system to many different hardware and software environments and to scale it down to run on small edge devices. The architecture provides a mechanism to keep track of all running SA Engine systems. Since the number of edge devices can be massive, the system can scale out to run as many SA Engine copies on large multi-cores, clusters, and clouds. It is always the same SA Engine kernel software running in all these system configurations.
For interoperability of the Stream Analyze Platform with other systems and infrastructures, SA Engine is tightly embedded in several common programming languages, such as Python [17], C [1][12], C++ [15], Lisp [19], and Java [16]. This enables existing algorithms and libraries implemented in those languages to be plugged into the system as foreign functions. A foreign function implements an OSQL function in an external programming language using language-specific application programming interfaces (APIs), and can then be transparently used in queries and expressions. For example, a large library of basic mathematical, statistical, and machine learning algorithms is implemented as foreign functions in C and Lisp. New foreign functions can easily be developed.
A query that continuously computes or filters measurements in a data stream is called a continuous query (CQ). The Stream Analyze Platform allows analysts to interactively specify CQs for continuously analyzing measurements flowing through edge devices and stream servers in real time. The result of an OSQL CQ is a real-time object stream of processed and filtered measurements, for example a stream of the position vectors of a device, measured every second while the device is close to a given geo-position.
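The position-stream example can be mimicked with a Python generator (illustrative only; the reference point and radius are hypothetical, and a real CQ is declared in OSQL, not coded as a loop):

```python
# A continuous-query-like generator: emit a device's position once per
# second only while it is within `radius` of a reference point.
import math

def near(pos, ref, radius):
    return math.dist(pos, ref) <= radius

def position_cq(positions, ref, radius):
    # positions: iterable of (x, y) points sampled once per second
    return ((t, p) for t, p in enumerate(positions) if near(p, ref, radius))

track = [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0)]
hits = list(position_cq(track, ref=(0.0, 0.0), radius=1.0))
```

Unlike a one-shot query, a CQ never terminates on its own: it keeps producing results as long as the device emits positions.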
Each SA Engine instance includes an object-oriented main memory database, which is used to store both stream models and user data. Tables stored in these databases can be queried and updated using standard SQL. The local databases are important for data stream processing, which usually involves matching in real-time fast flowing stream objects against data in a database. For example, to locally determine that the frequency spectrum of a measured vibration may later destroy an edge device, the frequencies measured by vibration sensors on a device are matched against a local database of known resonance frequencies of the device [8].
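The resonance-matching example amounts to joining a fast stream against a small local table. A Python sketch of the idea (component names, frequencies, and tolerance are hypothetical):

```python
# Sketch: matching measured vibration peaks against a local table of known
# resonance frequencies. Names and values are hypothetical illustration data.
RESONANCES = {"mount": 50.0, "blade": 120.0}  # component -> resonance (Hz)

def dangerous_frequencies(measured_peaks, tolerance=2.0):
    """Return component names whose resonance is close to any measured peak."""
    return sorted(
        name
        for name, f in RESONANCES.items()
        if any(abs(f - m) <= tolerance for m in measured_peaks)
    )

alerts = dangerous_frequencies([49.2, 300.0])  # 49.2 Hz is near "mount"
```

In SA Engine the table would live in the local main-memory database and the match would be a CQ joining the vibration stream with that table.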
To combine object streams from several edges, the Stream Analyze Platform supports fusion queries that combine object streams [25][26]. An example of a fusion query is a CQ observing when several edge devices in an area detect strong vibrations at the same time to detect anomalies. The user is alerted when the fusion query produces results, perhaps together with a real-time visualization stream of the magnitude of the observed vibrations. A user can then interactively send new queries on-the-fly to the affected devices to find out details of their causes. Fusion queries require the integrated data streams to be comparable even though the involved object streams from different devices may represent the same or similar data in different ways. For example, one device may represent temperature in Fahrenheit while another one uses Celsius.
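The core of such a fusion query, reduced to its simplest form, is grouping events from many devices by time window and alerting when enough devices agree. A toy Python stand-in (the quorum and window scheme are hypothetical):

```python
# Fusion sketch: flag time windows in which at least `quorum` distinct
# devices report strong vibrations.
from collections import defaultdict

def fuse(events, quorum):
    """events: (window, device_id) pairs; return windows with >= quorum devices."""
    per_window = defaultdict(set)
    for window, device in events:
        per_window[window].add(device)
    return sorted(w for w, devs in per_window.items() if len(devs) >= quorum)

events = [(0, "a"), (0, "b"), (1, "a"), (0, "c"), (2, "b")]
alarm_windows = fuse(events, quorum=3)  # window 0 has three distinct devices
```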
To be able to combine such heterogeneous data streams from different devices, the Stream Analyze Platform allows mediator models [5][10] to be defined as queries and functions that harmonize arriving heterogeneous object streams by transforming them to a universal model (ontology) in stream servers that integrate data streams from different edges. Mediation can be, e.g., mapping local names of sensors to a universally known nomenclature, measurement unit conversions, or calibrations of local measurements.
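A minimal mediation step might look like the following Python sketch (the local sensor names, the name mapping, and the unit handling are all hypothetical; in SA Engine the mediator would be defined as OSQL queries and functions):

```python
# Mediator sketch: harmonizing heterogeneous readings into a common model.

def f_to_c(f):
    # Fahrenheit to Celsius conversion
    return (f - 32.0) * 5.0 / 9.0

def mediate(reading):
    """Map a device-local (name, unit, value) reading to the common model."""
    name, unit, value = reading
    # map local sensor names to a universal nomenclature
    common_name = {"temp1": "temperature", "t_out": "temperature"}.get(name, name)
    # convert all temperatures to Celsius
    celsius = f_to_c(value) if unit == "F" else value
    return (common_name, celsius)

harmonized = [mediate(r) for r in [("temp1", "F", 212.0), ("t_out", "C", 25.0)]]
```

After mediation, streams from both devices are directly comparable in a fusion query.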
To access external data streams, SA Engine provides wrappers: APIs that process incoming data stream objects as they arrive and inject them into the SA Engine kernel so that the accessed streams can be used in CQs. Wrappers are defined as functions that return streams of objects from the wrapped data sources. There is a library of predefined wrappers for interoperating with common data infrastructures such as relational databases through JDBC and data processing systems through Kafka, Azure IoT Hub, CANBUS, or MQTT. New wrappers can easily be developed⁴.
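The shape of a wrapper, reduced to its essence, is a function that turns an external source into a stream of objects. A Python generator sketch (the CSV source and field names are hypothetical; real SA Engine wrappers are written against its language APIs):

```python
# Wrapper sketch: a function that exposes an external data source as a
# stream of objects consumable by queries.
import csv
import io

def csv_wrapper(text):
    """Wrap CSV text as a stream of (sensor, value) objects."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (row["sensor"], float(row["value"]))

data = "sensor,value\nacc_x,0.12\nacc_y,4.50\n"
stream = list(csv_wrapper(data))
```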
SA Engine comes in two different versions:
• The SA Engine Prime system provides all capabilities needed for distributed real-time stream analytics. It requires 8MB of RAM in its smallest configuration. Large fleets of edge devices are handled by scaling out large federations of distributed SA Engine Prime stream servers.
• The SA Engine Nano system is a scaled down version of SA Engine that in its smallest configuration can run on very small devices, as it requires only 17kB of RAM and 350kB of flash. SA Engine Nano needs occasional assistance from an SA Engine Prime system, called an SA Engine Twin. An SA Engine Twin stores meta-data about the devices it represents and compensates for their missing functionality.
¹ https://youtu.be/Nbuj76ZQd6Q
² https://youtu.be/lWGi6ixIrKs
³ https://youtu.be/RBOX2P7-3L4
⁴ https://studio.streamanalyze.com/docs/reference/osql-ref/external-data/
2. Stream Analyze Engine Prime
The SA Engine Prime system provides general data analytics and inference capabilities to the device or computer where it is running. It is designed to be easily integrated with other systems and extended with models and plug-ins. The figure below illustrates the SA Engine Prime system and how it is extensible by interfaces to external systems and data.
3. Stream Analyze Engine Nano
The SA Engine kernel system can be scaled down to run directly on small devices with limited or no OS support. The scaled-down version of SA Engine is called SA Engine Nano. The figure below illustrates the architecture of SA Engine Nano.
4. Architecture of Cloud Components
The cloud components of the Stream Analyze Platform are designed to handle:
• a large number of connected edge devices,
• a large number of users,
• a large number of models managed by the users across the edge devices.
5. Query optimization
The distributed query processing and dynamic native code generation of SA Engine is documented in [13]. The optimization of queries and models involving dense numerical arrays is documented in [14] and benchmarked in [20]. See [21] for a detailed walk-through of the query processing steps when processing an optimized neural network with tensors fully defined in OSQL. This section provides a summarized description of the query optimizer. The steps of the query processor in SA Engine are illustrated by the figure below:
b. Overlapping predicates are identified and bound to variables by unification [1].
c. Expressions that never change over time are evaluated once and for all at query optimization time by the query rewriter and replaced by their values, i.e. partial evaluation. For example, the expression sqrt(2 * 3.1416) is replaced with 2.5066.
d. Special rewrite rules are applied to certain expressions in the query: e.g., array [14] and numerical [6] expressions are translated into predicates that are further reduced and combined with other predicates, and expressions accessing the local database are rewritten [23]. Furthermore, specific computer algebra rules are applied based on knowledge about mathematical functions. For example, for addition and multiplication the rules x+0 -> x and x*1 -> x are applicable.
e. Redundant predicates whose values are not used anywhere are removed, a form of dead code elimination for queries [1].
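A toy rewriter in Python can illustrate two of these steps, partial evaluation of constant subexpressions and the algebraic rules x+0 -> x and x*1 -> x (the expression representation is hypothetical; the SA Engine rewriter works on its internal predicate calculus, not on tuples like these):

```python
# Toy query rewriter: constant folding (partial evaluation) plus the
# algebraic simplifications x+0 -> x and x*1 -> x.
import math

def rewrite(expr):
    """expr: nested tuples like ('+', a, b), ('*', a, b), ('sqrt', a),
    numbers, or variable-name strings."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [rewrite(a) for a in args]          # rewrite bottom-up
    if all(isinstance(a, (int, float)) for a in args):
        # partial evaluation: fold constant subexpressions at "optimization time"
        fns = {'+': lambda a, b: a + b,
               '*': lambda a, b: a * b,
               'sqrt': lambda a: math.sqrt(a)}
        return fns[op](*args)
    if op == '+' and 0 in args:                # x+0 -> x
        return args[0] if args[1] == 0 else args[1]
    if op == '*' and 1 in args:                # x*1 -> x
        return args[0] if args[1] == 1 else args[1]
    return (op, *args)

folded = rewrite(('sqrt', ('*', 2, 3.1416)))   # constant-folds to ~2.5066
reduced = rewrite(('+', ('*', 'x', 1), 0))     # simplifies to the variable 'x'
```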