Extensible Storage Engine (ESE): a storage option for large sensor data

I am currently evaluating options for storing the large volume of data generated by sensors (temperature, pressure, pH, …). These sensors generate more than 1,000,000 data samples each day. The goal is to select an efficient storage option that is simple to deploy, easy to implement, and reliable in the face of application crashes.

Obvious choices include a simple comma-separated values (CSV) file or an embedded database. Each of these options has its own pros and cons.

While searching the internet for alternatives, I found an article on the Extensible Storage Engine (ESE), which seems to be a promising option. In this blog post I will discuss it.

Extensible Storage Engine (ESE) is a storage engine technology that has been built into Windows since Windows 2000. It is distributed as a library (ESENT.DLL). It is used extensively inside Microsoft products such as Active Directory, Microsoft Exchange and Windows Desktop Search. Outside Microsoft, it is used by RavenDB, a NoSQL document database built on top of ESE.

At a high level, ESE has the following advantages over CSV and an RDBMS:

  • Zero deployment – Built into Windows.
  • Extensively tested – Used inside Microsoft products and open source projects.
  • Easy to administer – Use file manipulation tools such as copy/move to back up the files.
  • Easy to use – A managed .NET wrapper with LINQ support is available.
  • Others – Robust crash recovery, intelligent caching, concurrent access to data and many more.

Even though the ESE library exposes a C API, a managed .NET API is available on the CodePlex website.

The CodePlex project also provides an implementation of a persistent dictionary. “PersistentDictionary” is a database-backed dictionary built on top of the ESENT database engine. It is a drop-in compatible replacement for the generic Dictionary<TKey,TValue>.

I built a prototype based on the sample code available in the CodePlex project. This implementation uses a persistent dictionary to store sensor data, with the date and time as the key and the actual data as the value. Initial performance results are good.

Here are the details of the sample application. It generates 1,000,000 random data samples, each with a timestamp. It uses the date and time as the key and the sensor data as the value.

Make sure that you use NuGet to install the managed ESENT .NET library.

Sample code

using Microsoft.Isam.Esent.Collections.Generic;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace SensorStorage
{
    /// <summary>
    /// Value type that stores the sensor id
    /// and the actual sensor value.
    /// </summary>
    [Serializable]
    struct Sensor
    {
        public string Id;
        public float  Value;
    }

    class StoragePerfTest
    {
        public static int MAX_RECORDS = 1000000;
        public static int MAX_SENSORS = 16;
        // sensor id list
        public static List<string> SensorIdList = new List<string>
        {
            "46d77be8-1b6c-40e4-be90-657bba696db4",
            "84ccd877-e8be-4da8-81ae-1d7bda0ff2d7",
            "49be5ff1-bc44-4173-847f-33a9394d8787",
            "c41f8e8e-ca42-4248-8e2a-f047db76efcc",
            "b94bd698-9deb-4857-a685-f2b4920765ad",

            "3ff45aed-6c31-4079-8d83-6b0a6d2516ec",
            "d53d0cc8-debf-43a8-8ecf-37a9ff5a8afc",
            "2cb6ec16-da28-498f-baf3-9bcc661cd18e",
            "5b9843a4-db0d-4887-99f4-1b460eaa3c05",
            "7e3b9124-9816-4ecc-8d1c-95030bb93fc7",

            "01487c6a-cf7f-47ea-b870-58479ec5fd61",
            "ab271b46-b224-45e6-a8b7-1d04e20e0de2",
            "88c9d2e3-a411-446e-bf0a-d34e9278764d",

            "46a8c28f-07ab-4559-a9c7-5986d6e84baa",

            "df723ed8-70ee-4ca7-b8ae-df0902d481a3",
            "98e376bf-82e7-4bf3-96ab-f66cde5537bd"
        };

        /// <summary>
        /// Generates a random value simulating the sensor data.
        /// </summary>
        /// <param name="random">Random number generator to draw from.</param>
        static float NextFloat(Random random)
        {
            var buffer = new byte[4];
            random.NextBytes(buffer);
            return BitConverter.ToSingle(buffer, 0);
        }
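        /// <summary>
        /// Gets the next sensor id from the list, simulating
        /// a random sensor sending a value to the application.
        /// </summary>
        /// <param name="random">Random number generator to draw from.</param>
        /// <returns>One of the ids in SensorIdList.</returns>
        static string GetNextSensorId(Random random)
        {
            // The upper bound of Random.Next is exclusive, so pass MAX_SENSORS
            // (not MAX_SENSORS - 1) to make the last sensor selectable too.
            int index = random.Next(0, MAX_SENSORS);
            return StoragePerfTest.SensorIdList[index];
        }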

        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                // No arguments: (re)generate the sample data.
                GenerateSensorData();
                return;
            }

            if (args.Length == 2)
            {
                // Querying only makes sense once data has been generated.
                if (!PersistentDictionaryFile.Exists("SensorData"))
                {
                    Console.WriteLine("No sensor data available");
                    return;
                }

                QueryRecords(DateTime.Parse(args[0]), DateTime.Parse(args[1]));
                return;
            }

            Console.WriteLine("Usage: StoragePerfTest.exe [<start_time> <end_time>]");
            Console.WriteLine("To generate sample data, execute StoragePerfTest without any arguments. Example: StoragePerfTest.exe");
            Console.WriteLine("To query. Example: StoragePerfTest.exe \"7/27/2013 05:42:00 AM\" \"7/27/2013 05:44:00 AM\"");
        }
        /// <summary>
        /// Generates random records.
        /// </summary>
        private static void GenerateSensorData()
        {
            Console.WriteLine("Deleting old sensor data and generating new data");
            PersistentDictionaryFile.DeleteFiles("SensorData");
            Random random = new Random();
            Stopwatch watch = new Stopwatch();

            using (var dictionary = new PersistentDictionary<DateTime, Sensor>("SensorData"))
            {

                watch.Start();
                DateTime curr = DateTime.Now;

                for (int i = 0; i < MAX_RECORDS; i++)
                {
                    curr = curr.AddSeconds(1);
                    dictionary.Add(curr, new Sensor { Id = GetNextSensorId(random), Value = NextFloat(random) });

                }
                watch.Stop();
            }
            Console.WriteLine("Inserted {0} records in {1} milliseconds", MAX_RECORDS, watch.ElapsedMilliseconds);
        }

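        /// <summary>
        /// Queries the records whose timestamps fall within a time range.
        /// </summary>
        /// <param name="st">Start of the time range (inclusive).</param>
        /// <param name="et">End of the time range (inclusive).</param>
        private static void QueryRecords(DateTime st, DateTime et)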
        {
            Stopwatch watch = new Stopwatch();
            using (var dictionary = new PersistentDictionary<DateTime, Sensor>("SensorData"))
            {
                watch.Start();
                var samples = from x in dictionary
                              where x.Key >= st && x.Key <= et
                              select x;
                int count = samples.Count();
                watch.Stop();
                Console.WriteLine("{0} records found for the query in {1} milliseconds", count, watch.ElapsedMilliseconds);
                foreach (var s in samples)
                {
                    Console.WriteLine("Key {0} = [{1},{2}]", s.Key, s.Value.Id, s.Value.Value);
                }
            }
        }

    }
}

Performance results

On my laptop (5400 rpm hard disk, dual-core CPU, 4 GB RAM), I got the following results.

  • Total records inserted: 1,000,000.
  • Database size: 215 MB.
  • Add speed: ~18,000 records/second into the persistent dictionary.
  • Query speed: 120 records returned in 80 milliseconds when querying a range in the middle of the sample data.

I will investigate further to see whether the library is suitable for the project.
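To put these figures in context, here is a rough back-of-the-envelope calculation as a shell sketch. It uses only the numbers reported above; the function name sensor_estimates is made up for illustration, and 1 MB is assumed to mean 1,000,000 bytes.

```shell
#!/bin/sh
# Back-of-the-envelope figures derived from the measurements in this post.
# sensor_estimates is a hypothetical helper name, not part of any tool.
sensor_estimates() {
    records=1000000      # samples generated per day
    insert_rate=18000    # measured inserts per second (approximate)
    db_size_mb=215       # measured database size; assumes 1 MB = 1,000,000 bytes

    awk -v n="$records" -v r="$insert_rate" -v mb="$db_size_mb" 'BEGIN {
        printf "Required sustained rate : %.1f samples/second\n", n / 86400
        printf "Time to insert one day  : %.1f seconds\n", n / r
        printf "Storage per record      : %.0f bytes\n", mb * 1000000 / n
    }'
}

sensor_estimates
```

In other words, a full day of samples needs roughly 11.6 inserts per second on average, while the measured insert rate would write the same volume in under a minute, at a cost of about 215 bytes per record on disk.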

Resources

  1. Codeplex project.
  2. MSDN Documentation.
  3. ESE Database Viewer.
  4. RavenDB developer blog on ESE


Install/Configure Jenkins continuous integration server on Linux.

In this blog post I will describe how to install and configure the Jenkins continuous integration server. This setup was used on one of my current projects to build Linux application software.

I used an Ubuntu 12.04 virtual machine running inside VirtualBox for this tutorial.

Jenkins is a continuous integration server written in Java. Jenkins monitors configuration management (CM) servers (CVS, SVN, Perforce, ...) for changes such as source code check-ins. Once it detects changes, it updates the local working directory with the source from CM and performs a series of user-defined build steps on the source code. These build steps can be as simple as invoking a shell script or a build tool such as make or ant.

Jenkins has many plug-ins available to extend its feature set and to integrate with other software tools (unit test, code coverage, code analysis).

Jenkins is Java-based software, hence it requires a Java runtime as a prerequisite on the system.

Jenkins is available as a Debian package. To install the latest Jenkins on Ubuntu, execute the following steps from the command line:
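A quick way to confirm the prerequisite before installing is to check that a java executable is on the PATH. The sketch below is only an illustration: require_command is a hypothetical helper name, and it checks that the command exists, not which Java version it is.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check for the Java runtime that Jenkins needs.
require_command java || echo "Install a Java runtime before installing Jenkins"
```

The same helper can be reused for other tools the build depends on, for example require_command ant.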

image

Once the installation completes, make sure that the Jenkins server is up and running by opening a web browser and pointing it to http://localhost:8080 .

If Jenkins is running, you should see the following page in the web browser. This completes the installation.

image

Configuring Jenkins with Perforce configuration management.

Jenkins natively supports CVS and SVN. In my application I use Perforce for configuration management. Since Jenkins does not natively support Perforce, I need to install the Perforce plugin for Jenkins. Installing a plugin is easy: select “Jenkins->Manage Jenkins” from the menu, select the Available tab to list the plugins, then select and install the Perforce plugin. The following picture shows the “Manage Jenkins” page.

Jenkins-inst-4-perforce

You can discover and install additional plug-ins from this page in the same way as the Perforce plugin.

Configure source code build.

From the Jenkins main page, select NewJob to create a simple build job.

On the next page, enter a job name and select the “Build free style software project” option. You can read about the various options by selecting the help icon next to each one.

Jenkins-inst-4-perforce-1

The next page allows the user to enter configuration management specific details. In the case of Perforce, these are the user name, password and server details, as shown below. These options differ depending on the selected configuration management system.

Jenkins-inst-4-perforce-2

Other configuration details, such as the source code depot (Perforce specific), workspace details, poll interval (how often Jenkins should poll CM for changes) and where to copy the build output, are entered on the same page. Once the configuration management specific details are entered, the next step is to tell Jenkins how to build the software.

For this demo I have selected the “Execute Shell” option to enter the build commands. With this option you can enter any Linux shell command. I will use Apache Ant to build my source code. Jenkins will execute these commands once it has checked out the source code from CM. I have entered the following commands to illustrate the build setup:

cd projecta-src/
echo "Build started..."
ant BuildRelease
echo "Build Ended"

Jenkins-inst-4-perforce-4
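Jenkins marks a build as failed when the shell step exits with a non-zero status, so it helps if each build command reports its own failure clearly. The following is a minimal sketch of one way to do that, not the setup used in this post; run_step is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run one build step, log its start and end,
# and pass the exit status of a failing command back to the caller.
run_step() {
    step_name=$1
    shift
    echo "[$step_name] started"
    if "$@"; then
        echo "[$step_name] ended"
    else
        status=$?
        echo "[$step_name] FAILED with exit code $status" >&2
        return "$status"
    fi
}

# In the "Execute Shell" box the build above could then be written as:
#   cd projecta-src/ || exit 1
#   run_step "Build" ant BuildRelease || exit 1
```

With this wrapper, a failing ant target shows up in the console output with its exit code, and the explicit exit makes the job fail instead of continuing to the next command.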

After this step is saved, Jenkins will start monitoring configuration management for changes. If changes are detected, it will pull them into the local workspace and execute the build script.

Resources:

1. Jenkins website.

2. Jenkins Plugin repository.