
By this point, I’m guessing you’ve heard the term “big data” bandied about a few times. You’ve probably seen more than your fair share of tweets, blog posts, and Wall Street Journal articles with titles like “What’s So Big about Big Data” and “Hadoop: the (New) Elephant in the Room.” In case you haven’t bothered to read any of them, what’s so big about big data is: insight. Insight that goes well beyond “site A had 35 unique visitors on Monday, April 17” and “salesperson B sold 14 chickens last quarter.” Insight that can tell you: site A will have around 49 unique visitors on Monday, April 23 if you write an article about chickens and place it on the right, above the fold. Insight that incites revenue-generating actions.

The question is: what is the best way to attain this insight?

There are a few schools of thought on this. The first prioritizes sheer volume of data. The second wants only high-quality data. The third says: data schmata, all I need is a killer algorithm. Which camp is right? During this meetup, we’ll attempt to find out. To help us make the decision, we’ve rounded up some of Boston’s preeminent data scientists, plus one of San Francisco’s, who will present reasons and real-world scenarios for why more data, better data, and/or better algorithms are the key(s) to ecumenical insight. Their sessions and bios are as follows:

  • Speaker: Paolo Gaudiano, Icosystem
  • Arguing for: Better Algorithms
  • Session: It is often thought that the accuracy of a model depends heavily on data quality and quantity. However, the notion that numerical data are the only type of information needed to build an accurate model is flawed. We present a modeling approach that combines domain expertise and quantitative data to demonstrate that predictive models can be developed without quantitative data, and that in general any model built with both quantitative data and domain expertise will outperform models developed with either type of information alone. We will also mention real-world situations where this approach has been applied successfully.
  • Bio: Paolo Gaudiano is President and CTO of Icosystem, where he enjoys solving challenging business and technology problems for clients, while striving to ensure that Icosystem continues to be a stimulating, productive and fun company. He also serves as interim CEO of Infomous, Inc. and President of Concentric, Inc., two spinoffs created by Icosystem. After starting an academic career at Boston University, Paolo left his tenured position to pursue entrepreneurial opportunities with two start-ups, Artificial Life (as Chief Scientist) and Aliseo (as Founder and CEO). In 2001 he joined Icosystem, where he is able to nourish his multifaceted, interdisciplinary interests. He also continues to satisfy his passion for teaching through a position as Senior Lecturer at The Gordon Institute of Tufts University, and through a variety of speaking engagements. Paolo holds a B.S. in Applied Mathematics, an M.S. in Aerospace Engineering and a Ph.D. in Cognitive and Neural Systems.
  • Speaker: Christopher Bingham, Crimson Hexagon
  • Arguing for: Better Algorithms on More Data
  • Session: Often, analyzing more and more data doesn't improve your results: you just make the same mistakes at a larger scale.  We'll discuss several techniques that leverage the quantity of data, increasing accuracy as you scale.  Big data can thus lead to better analysis--not just bigger analysis.
  • Bio: Chris Bingham is the CTO and first employee of Crimson Hexagon, a leading provider of business intelligence based on social media analysis.
  • Speaker: Jeremy Rishel, Bluefin Labs
  • Arguing for: "D: All of the Above"
  • Session: At Bluefin Labs we analyze social TV at large scale, with 24/7 real-time systems looking at the content on over 100 networks and the conversation and audience dynamics about brands, advertising, shows, and more in public social media. The analytics derived about engagement patterns and audiences provide rich insights for brands, agencies, and TV networks. To do this we pursue "all of the above": more data, better data, and better algorithms. "More data" comes in many forms, including richer content streams and more granular sources. By including the broadest spectrum of data we're able to gain insights not possible in other ways. "Better data" in our world comes from a fundamental approach of human-machine collaboration and data management that permits us to achieve consistently high data quality. Finally, we are always pursuing "better algorithms", for example in understanding the connections between audiences, both as we learn more about social TV patterns and as engagement dynamics evolve. I'll be discussing some examples of each from the Bluefin platform and why all three - more data, better data, and better algorithms - are necessary.
  • Bio: Jeremy heads up Bluefin Labs' engineering, product, and data efforts. Jeremy was formerly the CTO and VP of Engineering at aPriori Technologies, which developed a groundbreaking approach to real-time analysis of complex design and manufacturing data to predict manufacturing methods and costs. Prior to that he led teams at i2 focused on transportation planning and optimization. Rishel earned BS degrees in Computer Science and Philosophy from MIT and served in the US Marine Corps for seven years, leaving active duty as a Captain.
  • Speaker: Josh Wills, Cloudera
  • Arguing for: Better Data
  • Session: When people are first introduced to Hadoop, one of the most common questions is, "when should I use Hadoop instead of a relational database?" In this talk, we'll walk through several use cases where Hadoop can solve problems better and faster than a relational database, even on relatively small data sets, in order to illustrate how Hadoop complements traditional data warehousing solutions.
  • Bio: Josh Wills is Cloudera's Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to joining Cloudera, Josh worked at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. He earned his Bachelor's degree in Mathematics from Duke University and his Master's in Operations Research from The University of Texas at Austin.


The meetup will take place in the Boylston Room at the Copley Marriott, 110 Huntington Avenue, Boston, from 6 to 8 pm. As noted on the meetup page, this event is currently full, but you are welcome to add your name to the wait list!

After the meetup, we will adjourn to a nearby bar (location to be posted ASAP).

See you there!