WSO2 Stream Processor was recently introduced at WSO2Con 2017 in London as the next generation of WSO2’s analytics platform. It will succeed WSO2 Data Analytics Server (DAS) and is designed to be lightweight, lean and cloud native.
WSO2 Stream Processor allows you to create applications that analyze data in real time using the Siddhi complex event processing engine. Events can be:
- Captured via transport protocols such as WSO2Event, HTTP, TCP, JMS, MQTT, Email and Kafka, in formats such as XML, JSON, Binary, Text and Map.
- Analyzed using stream processing, complex event processing, incremental aggregation and machine learning.
- Published via the same range of transport protocols (WSO2Event, HTTP, TCP, JMS, MQTT, Email, Kafka, etc.) and in a wide range of formats, including XML, JSON, Binary, Text and Map.
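The three stages can be illustrated with a plain-Python sketch. It is not Siddhi code (real Siddhi applications are written in Siddhi's own query language) and it uses no WSO2 API; it only mimics what a simple query does conceptually: decode JSON input events, keep a sliding window over the last three prices per symbol, and publish the running average as JSON. The `symbol` and `price` fields and the window size are illustrative assumptions.

```python
import json
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Iterable, Iterator


def process(lines: Iterable[str], window_size: int = 3) -> Iterator[str]:
    """Capture JSON events, analyze them with a per-symbol length window,
    and publish the running average as JSON."""
    # One bounded deque per symbol: deque(maxlen=n) drops the oldest value
    # automatically, so it behaves like a sliding window over the last n events.
    windows: DefaultDict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window_size))
    for line in lines:
        event = json.loads(line)                  # capture: decode the JSON input
        symbol = event["symbol"]
        window = windows[symbol]
        window.append(float(event["price"]))      # analyze: slide the window forward
        average = sum(window) / len(window)
        yield json.dumps({"symbol": symbol, "avgPrice": average})  # publish as JSON


if __name__ == "__main__":
    incoming = [
        '{"symbol": "IBM", "price": 10}',
        '{"symbol": "IBM", "price": 20}',
        '{"symbol": "WSO2", "price": 5}',
        '{"symbol": "IBM", "price": 30}',
        '{"symbol": "IBM", "price": 40}',
    ]
    for out in process(incoming):
        print(out)
```

Each input event produces one output event here; a real Siddhi query could also filter, join or aggregate over time windows instead.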
The following picture gives a high-level overview of how stream processing is executed within WSO2 Stream Processor:
WSO2 Stream Processor is currently available as an alpha version, which allows you to try it out and form first impressions. In this post, we take a closer look at how to install and configure the product.
Installing WSO2 Stream Processor
The installation of the WSO2 Stream Processor is quite simple:
- Download the latest version of the product 
- Extract the archive to a dedicated directory
- Set JAVA_HOME to point to the directory where the Java Development Kit (JDK) is installed, which must be Oracle JDK 1.8 (OpenJDK is not recommended)
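The JAVA_HOME requirement can be checked before the first start with a small Python sketch. It assumes that a JDK ships `bin/javac` and that a JDK 8 installation contains a `release` file with a line such as `JAVA_VERSION="1.8.0_144"`; the function name is hypothetical, and it only checks the version, not whether the JDK is Oracle's or OpenJDK.

```python
import os
from pathlib import Path
from typing import Optional


def check_java_home(java_home: Optional[str] = None) -> str:
    """Return the JDK version found under JAVA_HOME, or raise if it is not a 1.8 JDK."""
    raw = java_home if java_home is not None else os.environ.get("JAVA_HOME")
    if not raw:
        raise RuntimeError("JAVA_HOME is not set")
    home = Path(raw)

    # A JDK (unlike a bare JRE) ships the javac compiler in its bin directory.
    if not any((home / "bin" / name).is_file() for name in ("javac", "javac.exe")):
        raise RuntimeError(f"{home} does not look like a JDK (no bin/javac)")

    # Read JAVA_VERSION from the 'release' file, e.g. JAVA_VERSION="1.8.0_144".
    release = home / "release"
    version = None
    if release.is_file():
        for line in release.read_text().splitlines():
            if line.startswith("JAVA_VERSION="):
                version = line.split("=", 1)[1].strip().strip('"')
                break
    if version is None or not version.startswith("1.8"):
        raise RuntimeError(f"expected JDK 1.8, found {version!r}")
    return version


if __name__ == "__main__":
    try:
        print("JAVA_HOME OK, JDK", check_java_home())
    except RuntimeError as err:
        print("JAVA_HOME problem:", err)
```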
Now everything is set to give WSO2 Stream Processor a first test drive.
Running WSO2 Stream Processor
One of the first things you will notice after installing WSO2 Stream Processor is that the <SP_HOME>/bin directory no longer contains the wso2server.sh/.bat script known from WSO2 Data Analytics Server. Instead, there are four startup scripts, one for each of the following components:
- Stream Processor Studio (editor.sh/.bat)
- Stream Processor Dashboard (dashboard.sh/.bat)
- Stream Processor Worker (worker.sh/.bat)
- Stream Processor Manager (manager.sh/.bat)
The Stream Processor Studio provides a browser-based development environment for WSO2 Stream Processor. WSO2 plans to release plugins for popular IDEs such as IntelliJ and Eclipse, but for now the Stream Processor Studio is the only option.
The Stream Processor Dashboard provides three browser-based user interfaces:
- The Status Dashboard to monitor performance
- The Business Rules Manager to manage business rules
- The Portal to manage custom dashboards
The Stream Processor Worker is a resource node that does the actual work by executing one or more Siddhi applications. Finally, in a distributed setup, the Stream Processor Manager acts as the job manager and is responsible for dispatching jobs.
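The job-manager role can be pictured with a toy Python sketch that assigns Siddhi applications to worker nodes round-robin. It only illustrates the idea of dispatching; the application and worker names are hypothetical, and the actual scheduling performed by the Stream Processor Manager is not described in this post.

```python
from itertools import cycle
from typing import Dict, List


def dispatch(apps: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Toy job manager: assign each Siddhi application to a worker, round-robin."""
    if not workers:
        raise ValueError("at least one worker node is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    # zip stops when the apps run out; cycle() wraps around the worker list.
    for app, worker in zip(apps, cycle(workers)):
        assignment[worker].append(app)
    return assignment


if __name__ == "__main__":
    print(dispatch(["app-a", "app-b", "app-c"], ["worker-1", "worker-2"]))
```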
In a simple setup, you need to run at least one Stream Processor Worker node and, most likely, a Stream Processor Dashboard node. Unlike WSO2 Data Analytics Server, there is no longer an “all-in-one” profile.
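The component-to-script mapping and the minimal worker-plus-dashboard setup described above can be sketched in Python. `/opt/wso2sp` is a hypothetical <SP_HOME>, the helper names are made up for this example, and `start_components` assumes the scripts are executable; it simply launches each script as a separate process.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List

# Component -> startup script base name in <SP_HOME>/bin.
STARTUP_SCRIPTS: Dict[str, str] = {
    "studio": "editor",
    "dashboard": "dashboard",
    "worker": "worker",
    "manager": "manager",
}

# A simple setup: at least one worker and, most likely, a dashboard.
MINIMAL_SETUP: List[str] = ["worker", "dashboard"]


def startup_script(sp_home: str, component: str, windows: bool = False) -> Path:
    """Return the path of the startup script for one Stream Processor component."""
    if component not in STARTUP_SCRIPTS:
        raise ValueError(f"unknown component {component!r}")
    suffix = ".bat" if windows else ".sh"
    return Path(sp_home) / "bin" / (STARTUP_SCRIPTS[component] + suffix)


def start_components(sp_home: str, components: List[str]) -> List[subprocess.Popen]:
    """Start each component as a separate background process."""
    windows = sys.platform.startswith("win")
    processes = []
    for component in components:
        script = startup_script(sp_home, component, windows)
        # Assumes the script has the execute bit set on Unix-like systems.
        processes.append(subprocess.Popen([str(script)]))
    return processes


if __name__ == "__main__":
    # Print the commands instead of launching them, so the sketch is safe to run.
    for name in MINIMAL_SETUP:
        print(name, "->", startup_script("/opt/wso2sp", name))
```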
Configuring WSO2 Stream Processor
The default configuration works well for getting started with WSO2 Stream Processor. For a production environment, however, you need to adjust the configuration.
The configuration of WSO2 Stream Processor is no longer stored in a set of XML files, a change we expect to see in more and more new WSO2 products. XML is being replaced with YAML (YAML Ain't Markup Language), which is less verbose and therefore easier to read.
Each component has its own deployment.yaml file that covers most of the component-specific configuration, such as Carbon configuration parameters, data source configuration and cluster configuration. In addition, a netty-transport.yaml file holds transport-specific configuration, such as the HTTP and HTTPS listener ports of the server.
```yaml
# data source configuration
- name: WSO2_METRICS_DB
  description: The datasource used for metrics
  # ...
  connectionTestQuery: SELECT 1
```
Example of a data source configuration in deployment.yaml
Besides reducing the number of configuration files and switching from XML to YAML, WSO2 has also changed how some features are configured. A good example is the configuration of a minimum HA cluster with two worker nodes.
Notably, WSO2 Stream Processor no longer uses Hazelcast for clustering. Instead, it uses a shared database for coordination and REST APIs for direct communication between nodes.
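To illustrate what coordinating through a shared database can look like, here is a minimal lease-based leader election sketch in Python. It is not WSO2's implementation: the `leader_lease` table, the function name and the lease length are hypothetical, and SQLite stands in for the shared MySQL database so the example runs on its own. Each node periodically tries to take or renew a single-row lease; renewing acts as a heartbeat, and if the active node stops renewing, the other node takes over once the lease expires.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS leader_lease (
    id         INTEGER PRIMARY KEY CHECK (id = 1),  -- single-row table
    node_id    TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""


def try_acquire_leadership(conn: sqlite3.Connection, node_id: str,
                           lease_seconds: float = 3.0,
                           now: Optional[float] = None) -> bool:
    """Take or renew the lease if it is free, expired, or already held by this node.

    Returns True if this node is the active (leader) node afterwards.
    """
    now = time.time() if now is None else now
    expires = now + lease_seconds
    with conn:  # one transaction: commit on success, roll back on error
        # Create the lease row if no node has ever held it.
        conn.execute(
            "INSERT OR IGNORE INTO leader_lease (id, node_id, expires_at) VALUES (1, ?, ?)",
            (node_id, expires),
        )
        # Renew our own lease, or take over a lease that has expired.
        conn.execute(
            "UPDATE leader_lease SET node_id = ?, expires_at = ? "
            "WHERE id = 1 AND (node_id = ? OR expires_at < ?)",
            (node_id, expires, node_id, now),
        )
    (holder,) = conn.execute("SELECT node_id FROM leader_lease WHERE id = 1").fetchone()
    return holder == node_id


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    print("node-1 active:", try_acquire_leadership(db, "node-1"))
    print("node-2 active:", try_acquire_leadership(db, "node-2"))
```

In a real deployment, each node would call such a function on a timer against the shared database and switch between active and passive behavior based on the result.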
To set up a minimum HA cluster, perform the following steps:
- Create a MySQL database to be used as the shared data source for cluster coordination.
- On both nodes, copy the MySQL JDBC driver to the <SP_HOME>/lib directory.
- On both nodes, apply the following changes to <SP_HOME>/conf/worker/deployment.yaml:
  - In the carbon section, set a unique server ID.
  - In the datasources section, configure the data source for cluster coordination.
  - In the config section, enable cluster mode and set the data source name.
  - In the config section, set the type to ha and enable liveSync.
Note: If you want to run two nodes on the same machine, a few additional changes are required.
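The checklist above can be made concrete with a small Python sketch that validates two node configurations against it. The dictionaries are a hypothetical, simplified stand-in for the relevant deployment.yaml sections, not the product's real key names, and the server IDs and JDBC URL are made up for the example.

```python
from typing import Any, Dict, List

NodeConfig = Dict[str, Any]


def check_ha_pair(node_a: NodeConfig, node_b: NodeConfig) -> List[str]:
    """Return a list of problems with a two-node HA setup (empty list = looks OK)."""
    problems: List[str] = []
    if node_a["carbon"]["id"] == node_b["carbon"]["id"]:
        problems.append("both nodes use the same server ID")
    urls = []
    for label, node in (("node A", node_a), ("node B", node_b)):
        cluster, deployment = node["cluster"], node["deployment"]
        if not cluster.get("enabled"):
            problems.append(f"{label}: cluster mode is not enabled")
        if deployment.get("type") != "ha":
            problems.append(f"{label}: deployment type is not 'ha'")
        if not deployment.get("liveSync", {}).get("enabled"):
            problems.append(f"{label}: liveSync is not enabled")
        # Resolve the coordination data source by name in this node's datasources.
        by_name = {ds["name"]: ds for ds in node.get("datasources", [])}
        coordination_ds = by_name.get(cluster.get("datasource"))
        if coordination_ds is None:
            problems.append(f"{label}: coordination data source is not defined")
        urls.append(coordination_ds["url"] if coordination_ds else None)
    # Both nodes must point at the same shared coordination database.
    if None not in urls and urls[0] != urls[1]:
        problems.append("nodes do not share the same coordination database")
    return problems


def example_node(server_id: str) -> NodeConfig:
    """Build a hypothetical, correctly configured node for demonstration."""
    return {
        "carbon": {"id": server_id},
        "datasources": [{"name": "WSO2_CLUSTER_DB",
                         "url": "jdbc:mysql://db-host:3306/sp_cluster"}],
        "cluster": {"enabled": True, "datasource": "WSO2_CLUSTER_DB"},
        "deployment": {"type": "ha", "liveSync": {"enabled": True}},
    }


if __name__ == "__main__":
    print(check_ha_pair(example_node("sp-node-1"), example_node("sp-node-2")))
```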
WSO2 Stream Processor is a radical redesign of WSO2 Data Analytics Server and associated products. Installation is as easy as ever, and with the switch to YAML, configuration has become simpler and clearer.
Because of its lean, lightweight design, WSO2 Stream Processor starts very fast and has a small memory footprint. In our tests, a Stream Processor Worker node started in around 7 seconds and ran with less than 300 MB of memory. In comparison, a Data Analytics Server “all-in-one” node needed almost 1 minute to start and allocated more than 1.1 GB of memory.
The current alpha version looks very promising, and we will keep you posted about upcoming releases. The Siddhi engine is available at https://github.com/wso2/siddhi.