Database, meet realtime data logging.

| 7 Comments | No TrackBacks

Gridspy Dashboard Banner

I have been programming in C++ for ages, so it would come as no surprise that I am comfortable with it. That however, does not make me god’s gift to database programming. I have tinkered with these mysterious beasts several times and always come away impressed at how much I could do with so little effort. With just a little SQL, an amazing amount of data just leaps from the hard disk into a handy table. So, with a little trepidation, I set out to design a database schema for the Gridspy power-monitoring system.

Of course, back then it hardly had a name. I personally called it ActiveCore, because it was the active core of your house. I think that Dad called it something cute, like 5301. He has a tendency to pick names like that.

So… have you heard of the Second System Effect?

If you have been lucky enough to work on a first system and found it enough of a hack - god help you if someone asks you to make a second system from scratch. It always turns into a huge heap of over engineering. This was true from the database schema to the code in C++ and it quickly turned what should have been a simple program into a nightmare. As it turns out, knowing about the fallacy does nothing to prevent you from committing it. There was a table for each idea, and expensive joins left, right and centre. You needed to join about 5 tables to determine which inputs to monitor!

It doesn’t help that I was up against a challenging problem. I wanted to monitor many sensors for a long time, at a high resolution. There are 86,400 seconds in the day - this multiplies quickly if you have 10 sensors and 365 days of history. Of course, this data should not be lost of the user decides to move or upgrade the sensor. The finished data table had indexes on all the fields I used (date, value, id, sensor id, and more!). At one point 5mb of data (for a single building no less) had over 100mb of extra data just to (in theory) accelerate lookups.

So, we have an over-engineered schema and C++ deamon. Now we just need to throw some real time demands into the mix. Naturally the design calls for a way to see how much power you are using this minute. On paper it looked great to put the database between the deamon and the website. In reality this led to the (relatively underpowered) computer maxing out the CPU.

5030 lights banner.png We also have the ability to control outputs and switch lights. This was done by writing the new output state into a row in the database. However, I knew we had a problem when the deamon was taking ten seconds at a time to turn on outputs in response to inputs. At one point Dad turned on a light to go up the stairs and waited a little. He soon set off up the dark stairs and returned from his errand in time to see the light switch on. Trust me, this is totally sorted out now.

In retrospect it is so obvious. Instead of putting the database at the middle of the system it has now been relegated to its proper location - a permanent data store. While preferences and historical data is stored in the DB, live data and communications with our sensors are done directly with a custom server written on the Twisted framework, rather than having our sensors send many individual Django page requests. Each sensor establishes an ongoing connection with Twisted and streams data and other info. Twisted can then send back commands to the sensors, including setting outputs.

Because Twisted has an ongoing connection with the sensor, it handily punches straight through firewalls and NATs. I knew I had made the right decision about using Twisted when I controlled my first output. My http request went to Django, who passed a XML request to Twisted, which sent a custom message to the Nexus. The Nexus had instructed the output to turn on so fast that it seemed totally instant. No Database could do it quite that fast, not even with triggers.

I couldn’t recommend a language and framework more highly than Twisted Python. Now that I am feeling the power of Python, writing C++ is a total bore by comparison. Of course, as soon as something needs to execute super- fast I’ll be back into C++ for that little bit, but no need yet!

I now have a solution that can scale into thousands of sensors. Lets just hope I am onto the third system now.

Further reading:

My replacement for the database : Live power monitoring

No TrackBacks

TrackBack URL: http://blog.gridspy.co.nz/cgi-bin/mt-tb.cgi/4

7 Comments

"GridSpy" is a great name! Desirable upon first hearing it. Who wouldn't want to spy on how much power is coming from the power grid?

If you'd chosen MeterSpy, you'd get lots of people spelling it wrong: "Is it Metre or Meter?" etc :)

You say: "Each sensor establishes an ongoing connection with Twisted and streams data and other info".

Does this mean that in sensor you need to have intelligent "data streamer" or do you have one machine that monitors these sensors and manages these "data streamers" as different processes?

Ah.. cool.
I couldn't figure out that sensor == Nexus aka the 8051-based custom thingy you had developed ;)

I'm a bit concerned about the usage of TCP/IP. Does it mean that one still needs to have a router acting as dhcp server (and whole TCP/IP networking setup..)?

I think you could sell this to people who are running there own power system like solar and wind. I would love a system like this that I could link up to my PC and monitor so I could make adjustments to improve the power systems performance etc.

Leave a comment