Engineering Blog

This month’s update covers two main O2MC I/O platform capabilities: more connectivity options and improved overall robustness.

Camel

Apache Camel provides a library of connections to many sources, such as FTP, Dropbox, DNS lookup, GitHub, Gmail, Kafka, Splunk, WebSockets and many more. With this one framework installed, all of these connections are now available in native DimML code. This lets a developer quickly connect to any source that is part of the Apache Camel framework and use that connection in DimML applications. Applications that use Camel have already been created for retrieving XML files from an FTP server and for delivering web analytics data to a file in Dropbox. More details on the Camel integration can be found at http://documentation.dimml.io/flows/#camel. The documentation is the definitive place for the definition of all language elements, as well as for use cases and copy-paste examples of DimML applications.

Store flow element

Over the last few weeks we have experienced several challenges in delivering data to files on (s)FTP servers. The cause is that the O2MC I/O platform distributes processing across several servers in parallel, and each of them sends its data to the destination server individually. While the FTP protocol handles a lot of data being transferred sequentially, it does not cope well with a lot of data arriving in parallel.

The way typical cloud-based solutions deal with this is to assign different roles to the different servers in their cloud. For instance, one server collects the data, one processes it and one sends it. The specification of each server can then be optimized for the specific role it has.

We have a different setup for our cloud. I feel it’s a far better choice not to depend on specific roles in the architecture. Instead, whenever a new server is added, the servers communicate to define a common strategy, which is then executed. Each server initially has the same role, and work is distributed based on the workload. This approach is more difficult to implement, but has a lot of scalability advantages. Adding a new server requires neither a lengthy investigation into what the best role is, nor the risk of an incorrect choice. Maintenance is relatively easy using the auto scaling functionality that is already part of the O2MC I/O platform.

The store element, as well as the existing ftp element, now uses a new mechanism for sending files. An intermediate step is introduced in which each data processing server produces files based on the data it received. At the end of the time frame, that is, at the moment the file should be available, one server is chosen to combine all the files and place the result at the destination location. Additionally, exponential back-off is implemented. This means that if the destination location is not available, the file is stored and many attempts are made to resend it. These attempts are more frequent in the beginning than at the end, hence the term exponential.
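To make the mechanism concrete, here is a minimal Python sketch of the two steps: combining the partial files and delivering the result with exponential back-off. It is an illustration only, not the platform's implementation. The names merge_parts, backoff_delays and deliver_with_backoff, the send callback, and the base delay, growth factor and retry count are all assumptions chosen for the example.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator


def merge_parts(parts: Iterable[Path], merged: Path) -> Path:
    """Concatenate the partial files produced by the processing servers."""
    with merged.open("wb") as out:
        for part in sorted(parts):  # sorted for a deterministic order
            out.write(part.read_bytes())
    return merged


def backoff_delays(base: float, factor: float, retries: int) -> Iterator[float]:
    """Yield waits that grow geometrically: base, base*factor, base*factor**2, ..."""
    for attempt in range(retries):
        yield base * factor ** attempt


def deliver_with_backoff(
    merged: Path,
    send: Callable[[Path], None],  # hypothetical uploader, e.g. an FTP put
    base: float = 1.0,             # assumed first wait, in seconds
    factor: float = 2.0,           # assumed growth factor (doubling)
    retries: int = 5,              # assumed number of retries after the first try
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send the merged file; if the destination is unreachable, keep it and retry."""
    for delay in [0.0, *backoff_delays(base, factor, retries)]:
        if delay:
            sleep(delay)  # each wait is longer than the previous one
        try:
            send(merged)
            return True
        except OSError:  # includes ConnectionError and socket errors
            continue
    return False  # the file stays on disk for a later attempt
```

In this sketch the server that was chosen to combine the files would call merge_parts on everything produced during the time frame and then pass the result to deliver_with_backoff. Because the file is only ever read, never removed, a failed delivery loses no data.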

I feel it’s an excellent, robust feature: it basically guarantees that the data will always become available, provided you take less than a week to solve any connection problem.
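As a rough feel for how few attempts a doubling schedule needs to span a week, the Python sketch below counts the retries required before the cumulative waiting time reaches a given window. The one-minute first wait and the doubling factor are assumptions for illustration, not the platform's actual settings, and retries_to_cover is a hypothetical helper.

```python
def retries_to_cover(window: float, base: float, factor: float = 2.0) -> int:
    """Smallest number of retries whose cumulative waits reach `window`."""
    total, delay, retries = 0.0, base, 0
    while total < window:
        total += delay    # wait before this retry
        delay *= factor   # the next wait is `factor` times longer
        retries += 1
    return retries


WEEK_MINUTES = 7 * 24 * 60  # 10080 minutes in a week

# Assumed first wait of 1 minute, doubling each time.
retries_needed = retries_to_cover(WEEK_MINUTES, base=1.0)  # returns 14
```

With these assumed values, the waits after 13 retries add up to 8191 minutes, which is short of a week, and the 14th brings the total to 16383 minutes, so a handful of increasingly spaced attempts is enough to cover the whole period.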
