Featuring: Billy Bosworth (DataStax, @datastax), Jeremy Edberg (Netflix, @jedberg), STS Prasad (Walmart, @stsprasad), Ed Anuff (Apigee, @edanuff)

DataStax CEO Billy Bosworth moderated this panel on the business motives for moving to distributed computing and the effects of making the jump. Edberg, Prasad and Anuff all said it was a matter of sink or swim (or scale up): they couldn't keep supporting their customers in a meaningful way if they stuck with a data warehouse. Netflix needed a distributed, resilient system; Walmart needed to rapidly process data into what Prasad called "the social genome." All three companies ended up choosing Cassandra.

Challenges in moving to distributed computing:

  • Losing a single node, because the failure overloaded its neighbor nodes
  • Rethinking ways to find and query data
  • Compaction--it causes a performance hit, so Walmart ended up using SSDs to compensate.

Nice surprises:

  • Counters made it easy to implement a system with real-time reporting.
  • The ability to administer Cassandra and get the best from it hasn't translated into a need to increase hiring.
  • No need to worry about disappearing data.
  • Negative lookups (queries for data that isn't there) are much faster than expected.
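
As a rough illustration of the counters point above, here is a minimal sketch of what a counter-backed real-time report can look like in CQL. It only builds the statement strings, so it runs without a cluster. The table name `page_views` and the columns `page_id` and `views` are hypothetical, not anything the panelists described.

```python
# Hypothetical schema: one counter per page. Table and column names are
# illustrative assumptions, not taken from the panel.

def counter_table_cql(table="page_views"):
    """Return CQL that creates a table with a single counter column."""
    return f"CREATE TABLE {table} (page_id text PRIMARY KEY, views counter)"


def increment_cql(table="page_views", by=1):
    """Return CQL that bumps the counter; '?' is a bind marker for the key."""
    return f"UPDATE {table} SET views = views + {by} WHERE page_id = ?"


if __name__ == "__main__":
    print(counter_table_cql())
    print(increment_cql())
```

Because each event is a single increment, a report reads the current value directly instead of aggregating raw events at query time.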

In conclusion:

  • The more data you put into your system, the better the system gets. (This has been a popular refrain.)
  • Relational databases aren't going away--there's a place for both relational and distributed, and most companies will need both: NoSQL for real-time, SQL for batch processing.
  • Just having the data isn't enough--you need the ability to rapidly extract insight from it.
  • Look to see more innovation on the business application side.

Interesting fact: Netflix uses multi-region rings--one Cassandra cluster spanning multiple geographic regions--both for resilience and so its US customers can travel abroad without loss of service.
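
For readers wondering what a multi-region ring looks like in practice, one common way to express it is a keyspace that uses `NetworkTopologyStrategy` with a replica count per datacenter. The sketch below only builds the CQL string. The keyspace name, region names and replica counts are assumptions for illustration; Netflix's actual configuration was not described on the panel.

```python
# Builds a CREATE KEYSPACE statement that replicates data into several
# datacenters. All names and counts passed in are hypothetical examples.

def multi_region_keyspace_cql(keyspace, replicas):
    """replicas maps a datacenter name to its number of replicas."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{region}': {count}" for region, count in replicas.items()]
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{', '.join(parts)}}}"


if __name__ == "__main__":
    # Assumed datacenter names; real ones depend on how the cluster is set up.
    print(multi_region_keyspace_cql("subscribers", {"us-east": 3, "eu-west": 3}))
```

With replicas in each region, a request can be served by nearby nodes, and the loss of one region still leaves full copies of the data elsewhere.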

 
