Featuring: Billy Bosworth (DataStax, @datastax), Jeremy Edberg (Netflix, @jedberg), STS Prasad (Walmart, @stsprasad), Ed Anuff (Apigee, @edanuff)
DataStax CEO Billy Bosworth moderated this panel on the business motives behind, and the effects of, making the jump to distributed computing. Edberg, Prasad and Anuff all said it was a matter of sink or swim (or scale up)--they couldn't go on supporting their customers in a meaningful way if they stuck with a data warehouse. Netflix needed a distributed, resilient system; Walmart needed to rapidly process data into what Prasad called "the social genome." All three companies ended up choosing Cassandra.
Challenges in moving to distributed computing:
- Single-node loss, because losing one node overloaded its neighboring nodes.
- Rethinking ways to find and query data
- Compaction--it causes a performance hit, so Walmart ended up using SSDs to compensate.
Benefits of the move:
- Counters made it easy to implement a system with real-time reporting.
- Administering Cassandra and getting the best from it hasn't required an increase in hiring.
- No need to worry about disappearing data.
- Negative lookups are much faster than expected.
- The more data you put into your system, the better the system gets. (This has been a popular refrain.)
Other takeaways:
- Relational databases aren't going away--there's a place for both relational and distributed, and most companies will need both: NoSQL for real-time, SQL for batch processing.
- Just having the data isn't enough--you need the ability to rapidly extract insight from it.
- Look to see more innovation on the business application side.
Interesting fact: Netflix uses multi-region rings--one Cassandra cluster spanning multiple geographic regions--both for resilience and so its US customers can travel abroad without loss of service.
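
For readers who want a feel for what a multi-region ring means, here is a minimal Python sketch. It is not Netflix's setup and not Cassandra's code: the node names, the region labels, the MD5-based token function and the two-replicas-per-region quota are all assumptions made for illustration. It is only loosely modeled on the idea of walking a single token ring and giving every region its own copies of each row.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key or node name to a ring position (hash choice is an assumption)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class MultiRegionRing:
    """One logical ring whose nodes live in several regions (illustrative only)."""

    def __init__(self, nodes, replicas_per_region):
        # nodes: list of (node_name, region) pairs, all hypothetical
        self.ring = sorted((token(name), name, region) for name, region in nodes)
        self.tokens = [t for t, _, _ in self.ring]
        self.replicas_per_region = dict(replicas_per_region)

    def replicas(self, key):
        """Walk clockwise from the key's token until every region has its quota."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        needed = dict(self.replicas_per_region)
        chosen = []
        for i in range(len(self.ring)):
            _, name, region = self.ring[(start + i) % len(self.ring)]
            if needed.get(region, 0) > 0:
                chosen.append((name, region))
                needed[region] -= 1
            if not any(needed.values()):
                break
        return chosen


# Hypothetical six-node cluster split across two regions.
nodes = [
    ("us-node-1", "us"), ("us-node-2", "us"), ("us-node-3", "us"),
    ("eu-node-1", "eu"), ("eu-node-2", "eu"), ("eu-node-3", "eu"),
]
ring = MultiRegionRing(nodes, {"us": 2, "eu": 2})
placement = ring.replicas("customer:42")

# If the whole "us" region is unreachable, the "eu" copies are still there.
survivors = [name for name, region in placement if region != "us"]
```

Because each region holds its own replicas of every row, a request can be served from whichever region the customer is closest to, and the loss of one region leaves the data reachable in the other. That matches the two reasons the panel gave.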