gephi twitter hack

Twitter is often held up as reason no. 1 why launching at SXSW can work. More precisely, it's reason no. 1 why launching at SXSW can work if you have: a) an app that makes life at Sx easier, b) enough ca$h money to really show it off, and/or c) a founding team of tech wunderkidz.

A) is the best reason, and it's as true today as it was back in '07. If you're trying to build an event-based network on the fly, Twitter is your sugar mama. Exactly how sugary depends partially on the quality and quantity of people you find, and partially on your own powers of communication once you've found them.

This post discusses three ways to find people on Twitter, presented in degrees of difficulty from easy-peasy to pretty easy.

Method 1: Advanced Twitter search

[Screenshot: Twitter's advanced search form]

Twitter's basic search is pretty crappy, but the advanced search is really useful. Its only downsides: a small number of results and no way to export them (hence methods 2 & 3).

1)   Go to https://twitter.com/search-advanced

2)   Enter your parameters, e.g. Any of these words: SXSW, SXSWi; Near: Austin

3)   Click “search”

4)   Voila!

Alternatively, you can just type the query URL yourself, using this formula: https://twitter.com/search?q=SEARCHTERM%2C%20OR%20SEARCHTERM2%20near%3A%22CITY%22%20within%3ANUMBERMILESFROMCITYmi&src=typd
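
If you'd rather not eyeball all those percent signs, here's a minimal Python sketch that builds the same kind of URL for you (the query below is just the example from step 2; tweak to taste):

import urllib

# Build the advanced-search URL; urlencode handles the percent-encoding.
QUERY = 'SXSW OR SXSWi near:"Austin" within:50mi'
print 'https://twitter.com/search?' + urllib.urlencode({'q': QUERY, 'src': 'typd'})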

Method 2: ScraperWiki to Excel

ScraperWiki is a site where you can write or reuse web scraping scripts and export the results. As long as you find a scraper that does what you want, you don’t need to write any code.

Let's say you want to run the query from method 1, because you want more results and you want to save them. For fun, let's also grab @-mentions, since they'll give you more names to work with.

1. Go to scraperwiki.com and sign up for an account. It's free.

2. Go to: https://scraperwiki.com/scrapers/sxswi_2/.  You should see this:

import scraperwiki
import simplejson
import urllib2

QUERY = 'sxswi'
GEOINFO = '30.267153,-97.743061,50km'  # latitude,longitude,radius (Austin)
RESULTS_PER_PAGE = '100'
LANGUAGE = 'en'
NUM_PAGES = 2

for page in range(1, NUM_PAGES + 1):
    # geocode restricts results to tweets near the GEOINFO coordinates
    base_url = ('http://search.twitter.com/search.json?q=%s&geocode=%s&rpp=%s&lang=%s&page=%s'
                % (urllib2.quote(QUERY), urllib2.quote(GEOINFO),
                   RESULTS_PER_PAGE, LANGUAGE, page))
    try:
        results_json = simplejson.loads(scraperwiki.scrape(base_url))
        for result in results_json['results']:
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['from_user'] = result['from_user']

            print data['from_user'], data['text']
            scraperwiki.sqlite.save(["id"], data)
    except:
        print 'Oh dear, failed to scrape %s' % base_url
         
This script performs the same query as in method 1, and because it saves the full tweet text, you also pick up any @-mentions in the tweets.

RLY IMPORTANT CAVEAT: As you may have heard, Twitter is getting kind of parsimonious vis-à-vis datamining, and its new 1.1 API requires authentication to grab user data. Luckily for you, this script will still work for now and through the conference, but I'll have to update it once Twitter shutters 1.0 forever.
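
For the curious, here's a rough sketch of what the post-1.0 version might look like, assuming you've registered an app with Twitter and obtained an application-only bearer token (BEARER_TOKEN below is a placeholder, not a real credential):

import simplejson
import urllib2

BEARER_TOKEN = 'YOUR_BEARER_TOKEN'  # placeholder: get a real one from dev.twitter.com

# v1.1 search lives at a new endpoint and wants an Authorization header.
url = 'https://api.twitter.com/1.1/search/tweets.json?q=%s&count=100' % urllib2.quote('sxswi')
request = urllib2.Request(url)
request.add_header('Authorization', 'Bearer %s' % BEARER_TOKEN)

results = simplejson.load(urllib2.urlopen(request))
for tweet in results['statuses']:  # v1.1 nests tweets under 'statuses'
    print tweet['user']['screen_name'], tweet['text']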

3. Click "copy." This will give you your own fork of the script to edit as you wish. The first thing you might want to check is the coordinates: GEOINFO above is already set to Austin (30.267153,-97.743061), so swap in your own city's if you're scraping a different event. If you want tweets from nearer than 50km, adjust the radius as well. If you want to search for SXSWi AND robots, just edit QUERY. Tis yer erster.

4. Once you’ve made your tweaks, click Run. You should see results start to stream through the console, and the count in the data tab should go up.

5. Once the script is finished, click “Back to scraper overview,” and then click “download.” You have the option to download as a SQLite db, CSV, or JSON file. Go with the CSV in this method.

6. Last step: open the CSV in Excel. Voila: you have your data!
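
Bonus: if you want to distill all those @-mentions into one deduped list of handles instead of squinting at the spreadsheet, a few lines of Python will do it. A sketch, assuming you saved the download as sxswi_2.csv (the filename is up to you; the text column comes from the scraper):

import csv
import re

mentions = set()
with open('sxswi_2.csv') as f:
    for row in csv.DictReader(f):
        # Pull every @handle out of the tweet text.
        mentions.update(re.findall(r'@(\w+)', row['text']))

print '\n'.join(sorted(mentions, key=str.lower))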

Method 3: CSV to Gephi

The big problem with spreadsheets is that they are uber dry to look at. Gephi is a really neat open-source visualization tool that transforms CSV and GEXF files into network graphs.

1)   Go to http://gephi.org/ and download it. Gephi works on Windows, Linux, and OS X. If you're running Leopard+, go to Utilities → Java Preferences. If it prompts you to install Java 6, do it; if not, you're good to go.

2)   Put Gephi in your applications folder and open it.

3)   Now, go back to good ole ScraperWiki and make a fork of this script: https://scraperwiki.com/views/example_twitter_hashtag_user_friendship_network_10. This performs the same query as in method 2, only it gives you the output as a GEXF file, which is totally the way to go, because otherwise you will have to go over your CSV with a fine-tooth comb to get Gephi to take it.

4)   Run the script. When it's done, you'll see an XML-based GEXF file in the console. (I would show mine right here, but I've exceeded my paltry allotment of rate-limited calls; see the skeleton at the end of this post for what one looks like.) Anyways, it's the second GEXF link, the one with "viz" in it. Open it up, copy the output into a plain-text editor, and save it with a .gexf suffix.

5)   Ok, now you get to open this baby in Gephi.

6)   Here is Gephi's awesome quickstart tutorial, which will turn the black-and-white, uniformly sized network graph you see before you into a color-&-size cornucopia with labeled nodes, like the one you see at the top of this post.

7)   That’s it!
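
P.S. Since I couldn't show you my rate-limited output back in step 4, here's roughly what the skeleton of a GEXF file looks like, so you can check you grabbed the right thing (the handles here are invented for illustration; the "viz" flavor adds color and size attributes in an extra viz namespace):

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="some_tweeter" />
      <node id="1" label="some_mentionee" />
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" />
    </edges>
  </graph>
</gexf>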
