Monday, February 4, 2013

Scale-out with Lucene and Azure Table Storage

Initially, Socedo used SQL Azure as its main data store. But after onboarding a couple dozen customers, we realized SQL Azure no longer met our needs:
- our data was growing at 5-8GB a day, and we were fast approaching the 150GB maximum size of a single SQL Azure database.
- we could potentially have used the recently released sharding feature. But IMHO the feature isn't ready for prime time yet, because developers would have to write a lot of plumbing code: auto-partitioning and balancing, fan-out queries, and middle-tier integration to figure out which partition to query.
- SQL Azure still doesn't support full-text indexing, which customers have been requesting for years. What a pity.
- finally, cost is the killer. SQL Azure costs $1.76 per GB (for a 100GB database).
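
To put the growth rate in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted above (5-8GB a day, 150GB limit); the function name `days_until_full` and the starting size of 0GB are illustrative assumptions, not figures from our production system.

```python
# Rough estimate of how long a single SQL Azure database lasts
# at the growth rates quoted above. Illustrative only.

DB_LIMIT_GB = 150            # single-database size limit from the post
GROWTH_GB_PER_DAY = (5, 8)   # observed daily growth range from the post


def days_until_full(current_gb, growth_gb_per_day, limit_gb=DB_LIMIT_GB):
    """Days left before the database reaches its size limit."""
    remaining_gb = max(limit_gb - current_gb, 0)
    return remaining_gb / growth_gb_per_day


slow, fast = GROWTH_GB_PER_DAY
# Assumption: start from an empty database (0GB used).
print("fast growth:", days_until_full(0, fast), "days")
print("slow growth:", days_until_full(0, slow), "days")
```

Even starting from an empty database, the limit would be reached in roughly three to four weeks at that rate.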

After some research, the Socedo team spent two weeks building a new backend on top of Azure Table Storage and Lucene. Lucene powers many sites and applications, including the new Twitter search. The new backend has been running smoothly for a month now, and we're very pleased with the results so far:
- we lowered the storage cost by about 25 times, from $1.76 per GB down to $0.07 per GB (locally redundant)
- the maximum data size per table is 100TB (about 667 times the 150GB SQL Azure database size limit!)
- we can easily scale out to onboard new customers by adding new partitions to the table storage
- Lucene allows us to score the leads dynamically at query time very efficiently
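
The two ratios above are easy to check. The following is a small Python sketch using only the prices and limits quoted in this post; the helper `storage_cost` and the 100GB example size are illustrative assumptions, and 100TB is treated as 100,000GB (decimal units).

```python
# Sanity check of the cost and capacity ratios quoted above.

SQL_AZURE_USD_PER_GB = 1.76    # SQL Azure price per GB from the post
TABLE_USD_PER_GB = 0.07        # Table Storage price per GB (locally redundant)
SQL_AZURE_LIMIT_GB = 150       # single-database size limit
TABLE_LIMIT_GB = 100 * 1000    # 100TB, assuming 1TB = 1000GB

cost_ratio = SQL_AZURE_USD_PER_GB / TABLE_USD_PER_GB
size_ratio = TABLE_LIMIT_GB / SQL_AZURE_LIMIT_GB


def storage_cost(data_gb, usd_per_gb):
    """Storage bill in USD for data_gb of data at a flat per-GB price."""
    return round(data_gb * usd_per_gb, 2)


print(f"cost ratio: {cost_ratio:.1f}x")
print(f"size ratio: {size_ratio:.0f}x")
# Assumption: a 100GB data set, chosen only for illustration.
print("SQL Azure:", storage_cost(100, SQL_AZURE_USD_PER_GB), "USD")
print("Table Storage:", storage_cost(100, TABLE_USD_PER_GB), "USD")
```

This ignores transaction and bandwidth charges, so it compares raw storage pricing only.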