Extensible Storage Engine (ESE): A Storage Option for Large Volumes of Sensor Data

I am currently evaluating options for storing the large volume of data generated by sensors (temperature, pressure, pH, …). These sensors generate more than 1,000,000 data samples each day. The goal is to select an efficient storage option that is simple to deploy, easy to implement, and reliable in the face of application crashes.

The obvious choices include a simple comma-separated values (CSV) file or an embedded database. Each of these options has its own pros and cons.

While searching the internet for alternatives, I found an article on the Extensible Storage Engine (ESE), which seems to be a promising option. In this blog post I will discuss it.

The Extensible Storage Engine (ESE) is a storage engine technology that has been built into Windows since Windows 2000. It is distributed as a library (ESENT.DLL). It is used extensively inside many Microsoft products, such as Active Directory, Microsoft Exchange, and Windows Desktop Search. Outside Microsoft, it is used in RavenDB, a NoSQL document database built on top of ESE.

At a high level, ESE has the following advantages over CSV and an RDBMS:

  • Zero deployment – Built into Windows.
  • Extensively tested – Used inside Microsoft products and open source projects.
  • Easy to administer – Use file manipulation tools such as copy/move to back up the files.
  • Easy to use – A managed .NET wrapper with LINQ support is available.
  • Others – Robust crash recovery, intelligent caching, concurrent access to data, and many more.

Even though the ESE library is a C API, a managed .NET API is available on the CodePlex website.

The CodePlex project also provides an implementation of a persistent dictionary. PersistentDictionary is a database-backed dictionary built on top of the ESENT database engine. It is a drop-in compatible replacement for the generic Dictionary<TKey,TValue>.
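To illustrate what "drop-in compatible" means in practice, here is a minimal sketch. It assumes only that the persistent dictionary implements the standard IDictionary<TKey,TValue> interface; the ReadingStore class and its method names are hypothetical and are not part of the ESENT library. Code written against the interface does not need to know which backing store it is given.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the ESENT library.
static class ReadingStore
{
    // Stores one reading. The indexer overwrites an existing key,
    // whereas Add would throw if the key were already present.
    public static void Record(IDictionary<DateTime, float> store, DateTime timestamp, float value)
    {
        store[timestamp] = value;
    }

    // Looks up one reading; returns false if no value exists for the timestamp.
    public static bool TryRead(IDictionary<DateTime, float> store, DateTime timestamp, out float value)
    {
        return store.TryGetValue(timestamp, out value);
    }
}
```

Passing a new Dictionary<DateTime, float>() keeps everything in memory. Passing a PersistentDictionary<DateTime, float> opened on a directory (inside a using block, as in the sample code later in this post) should persist the same readings to disk without changing the helper.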

I built a prototype based on the sample code available in the CodePlex project. This implementation uses a persistent dictionary to store sensor data, with the date and time as the key and the actual data as the value. Initial performance results are good.

Here are the details of the sample application. It generates 1,000,000 random samples, each with a timestamp. It uses the timestamp as the key and the sensor data as the value.

Make sure that you use NuGet to install the managed ESENT .NET library.

Sample code

using Microsoft.Isam.Esent.Collections.Generic;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace SensorStorage
{
    /// <summary>
    /// Value type that stores the sensor id
    /// and the actual sensor value.
    /// </summary>
    [Serializable]
    struct Sensor
    {
        public string Id;
        public float  Value;
    }

    class StoragePerfTest
    {
        public static int MAX_RECORDS = 1000000;
        public static int MAX_SENSORS = 16;
        // sensor id list
        public static List<string> SensorIdList = new List<string>
        {
            "46d77be8-1b6c-40e4-be90-657bba696db4",
            "84ccd877-e8be-4da8-81ae-1d7bda0ff2d7",
            "49be5ff1-bc44-4173-847f-33a9394d8787",
            "c41f8e8e-ca42-4248-8e2a-f047db76efcc",
            "b94bd698-9deb-4857-a685-f2b4920765ad",

            "3ff45aed-6c31-4079-8d83-6b0a6d2516ec",
            "d53d0cc8-debf-43a8-8ecf-37a9ff5a8afc",
            "2cb6ec16-da28-498f-baf3-9bcc661cd18e",
            "5b9843a4-db0d-4887-99f4-1b460eaa3c05",
            "7e3b9124-9816-4ecc-8d1c-95030bb93fc7",

            "01487c6a-cf7f-47ea-b870-58479ec5fd61",
            "ab271b46-b224-45e6-a8b7-1d04e20e0de2",
            "88c9d2e3-a411-446e-bf0a-d34e9278764d",

            "46a8c28f-07ab-4559-a9c7-5986d6e84baa",

            "df723ed8-70ee-4ca7-b8ae-df0902d481a3",
            "98e376bf-82e7-4bf3-96ab-f66cde5537bd"
        };

        /// <summary>
        /// Generates a random value simulating the sensor data.
        /// </summary>
        /// <param name="random">Random number generator to draw from.</param>
        static float NextFloat(Random random)
        {
            var buffer = new byte[4];
            random.NextBytes(buffer);
            return BitConverter.ToSingle(buffer, 0);
        }
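        /// <summary>
        /// Gets the next sensor id from the list, simulating
        /// a random sensor sending a value to the application.
        /// </summary>
        /// <param name="random">Random number generator to draw from.</param>
        /// <returns>A sensor id chosen at random from SensorIdList.</returns>
        static string GetNextSensorId(Random random)
        {
            // The upper bound of Random.Next is exclusive, so pass MAX_SENSORS
            // (not MAX_SENSORS - 1) to make the last sensor selectable.
            int index = random.Next(0, MAX_SENSORS);
            return StoragePerfTest.SensorIdList[index];
        }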

        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                // No arguments: (re)generate the sample data.
                GenerateSensorData();
                return;
            }

            if (args.Length == 2)
            {
                // Querying only makes sense once data has been generated.
                if (!PersistentDictionaryFile.Exists("SensorData"))
                {
                    Console.WriteLine("No sensor data available");
                    return;
                }

                QueryRecords(DateTime.Parse(args[0]), DateTime.Parse(args[1]));
                return;
            }

            Console.WriteLine("Usage: StoragePerfTest.exe [<start_time> <end_time>]");
            Console.WriteLine("To generate sample data, run StoragePerfTest without any arguments. Example: StoragePerfTest.exe");
            Console.WriteLine("To query. Example: StoragePerfTest.exe \"7/27/2013 05:42:00 AM\" \"7/27/2013 05:44:00 AM\"");
        }
        /// <summary>
        /// Generates random records.
        /// </summary>
        private static void GenerateSensorData()
        {
            Console.WriteLine("Deleting old sensor data and generating new data");
            PersistentDictionaryFile.DeleteFiles("SensorData");
            Random random = new Random();
            Stopwatch watch = new Stopwatch();

            using (var dictionary = new PersistentDictionary<DateTime, Sensor>("SensorData"))
            {

                watch.Start();
                DateTime curr = DateTime.Now;

                for (int i = 0; i < MAX_RECORDS; i++)
                {
                    curr = curr.AddSeconds(1);
                    dictionary.Add(curr, new Sensor { Id = GetNextSensorId(random), Value = NextFloat(random) });

                }
                watch.Stop();
            }
            Console.WriteLine("Inserted {0} records in {1} milliseconds", MAX_RECORDS, watch.ElapsedMilliseconds);
        }

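        /// <summary>
        /// Queries the records whose timestamps fall within the given range.
        /// </summary>
        /// <param name="st">Start of the time range (inclusive).</param>
        /// <param name="et">End of the time range (inclusive).</param>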
        private static void QueryRecords(DateTime st,DateTime et)
        {
            Stopwatch watch = new Stopwatch();
            using (var dictionary = new PersistentDictionary<DateTime, Sensor>("SensorData"))
            {
                watch.Start();
                var samples = from x in dictionary
                              where x.Key >= st && x.Key <= et
                              select x;
                int count = samples.Count();
                watch.Stop();
                Console.WriteLine("{0} Records Found for the query in {1} milliseconds", count, watch.ElapsedMilliseconds);
                foreach (var s in samples)
                {
                    Console.WriteLine("Key {0} = [{1},{2}]", s.Key, s.Value.Id, s.Value.Value);
                }
            }
        }

    }
}
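One caveat about the sample: NextFloat reinterprets four random bytes as a float, so the values span the entire float range and can occasionally come out as NaN or infinity. If more realistic readings are wanted, a bounded generator along the following lines could be used instead. This is only a sketch; the SensorSimulator class, the NextReading name, and the example range are hypothetical choices of mine.

```csharp
using System;

// Hypothetical alternative to NextFloat; not part of the original sample.
static class SensorSimulator
{
    // Returns a finite value between min and max by scaling a random
    // fraction from Random.NextDouble into the requested range.
    public static float NextReading(Random random, float min, float max)
    {
        return min + (float)random.NextDouble() * (max - min);
    }
}
```

For example, SensorSimulator.NextReading(random, 0f, 14f) would simulate a pH probe, and the result can be assigned to Sensor.Value in place of NextFloat(random).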

Performance results

On my laptop (5400 rpm hard drive, dual-core CPU, 4 GB of RAM), I got the following results:

  • Total records inserted: 1,000,000
  • Database size: 215 MB
  • Add speed: ~18,000 records/second into the persistent dictionary
  • Query speed: 80 milliseconds to find 120 records in the middle of the sample data
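To put these numbers in context, the arithmetic below is a rough sketch using only the figures from this post. The Throughput class and its method names are hypothetical, and the bytes-per-record figure assumes that 1 MB means 1,048,576 bytes.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope calculations.
static class Throughput
{
    // Average rate needed to keep up with a daily sample count (86,400 seconds per day).
    public static double RequiredRatePerSecond(long samplesPerDay)
    {
        return samplesPerDay / 86400.0;
    }

    // Time needed to insert a number of records at a measured insert rate.
    public static double SecondsToInsert(long records, double recordsPerSecond)
    {
        return records / recordsPerSecond;
    }

    // Average on-disk footprint per record, assuming 1 MB = 1,048,576 bytes.
    public static double BytesPerRecord(double databaseMegabytes, long records)
    {
        return databaseMegabytes * 1024 * 1024 / records;
    }
}
```

With the figures above, 1,000,000 samples per day works out to about 11.6 samples per second on average, inserting the full data set at ~18,000 records/second takes about 56 seconds, and 215 MB for 1,000,000 records is roughly 225 bytes per record. The measured insert rate is therefore well above the average rate the sensors require, although bursts and concurrent queries were not measured here.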

I will investigate further to see whether the library is suitable for the project.

Resources

  1. Codeplex project.
  2. MSDN Documentation.
  3. ESE Database Viewer.
  4. RavenDB developer blog on ESE.
