[Image credit: Marc Smith]

One of the most interesting segments of email analytics is social graphing, that is, mapping out the relationships of a given inbox. You can do this as a simple one-to-one tie, but it is more interesting and insightful if you weight the ties according to any number of criteria (number of responses, time between responses, tone of content, number of ties to others in your network, etc.). In the course of researching what these criteria might be, I’ve come across a bunch of very cool papers on social network analysis that I thought I’d share with you. For a quick overview of SNA, I recommend reading Valdis Krebs’s introduction.
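To make the idea of weighted ties a bit more concrete, here is a minimal Python sketch that scores each pair of correspondents by two of the criteria mentioned above: number of messages exchanged and time between responses. The record layout, the function name `tie_weights`, and the `w_count`/`w_speed` knobs are all my own assumptions for illustration, not something taken from the papers below.

```python
from collections import defaultdict

def tie_weights(messages, w_count=1.0, w_speed=1.0):
    """Score each unordered pair of people by message volume and reply speed.

    `messages` is a hypothetical list of tuples:
        (sender, recipient, sent_at, replied_to_at)
    where `sent_at` is a datetime and `replied_to_at` is the datetime of the
    message being answered, or None if this message is not a reply.
    """
    counts = defaultdict(int)    # pair -> number of messages exchanged
    delays = defaultdict(list)   # pair -> reply delays in hours

    for sender, recipient, sent_at, replied_to_at in messages:
        pair = frozenset((sender, recipient))
        counts[pair] += 1
        if replied_to_at is not None:
            hours = (sent_at - replied_to_at).total_seconds() / 3600.0
            delays[pair].append(hours)

    weights = {}
    for pair, n in counts.items():
        d = delays.get(pair)
        # Faster average replies give a speed score closer to 1;
        # pairs with no replies at all get 0.
        speed = 1.0 / (1.0 + sum(d) / len(d)) if d else 0.0
        weights[pair] = w_count * n + w_speed * speed
    return weights
```

The linear combination is just one arbitrary choice; tone of content or the number of shared contacts could be folded in as further terms in the same way.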

1)      Reputation Network Analysis for Email Filtering (from KDNuggets)

  • By: Jennifer Golbeck and James Hendler
  • Gist: Golbeck and Hendler take an inverse approach to spam: they highlight good messages and display the numerical reputation of their senders. This reputation score is inferred based on reputation scores users have manually entered for people they know.
  • Money quote: “The goal of this scoring system is not to give low ratings to bad senders, thus showing low numbers next to spam messages in the inbox. The main premise is to provide higher ratings to non-spam senders, so users are able to identify messages of interest that they might not otherwise have recognized. This puts a lower burden on the user, since there is no need to rate all of the spam senders.”
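As a rough illustration of how a score could be inferred from ratings that users have entered for people they know, here is a one-hop weighted average in Python. To be clear, this is a simplification of my own and not the algorithm from the paper: the `ratings` structure and the function name `infer_reputation` are hypothetical, and a real system would likely look further than one hop into the network.

```python
def infer_reputation(ratings, user, sender):
    """Infer a reputation score for `sender` from the point of view of `user`.

    `ratings` is a hypothetical dict of dicts: ratings[rater][ratee] = score,
    on whatever numeric scale the users rate each other with.
    """
    direct = ratings.get(user, {})
    if sender in direct:
        # The user has rated this sender personally; no inference needed.
        return direct[sender]

    weighted_sum = 0.0
    total_trust = 0.0
    for neighbor, trust in direct.items():
        opinion = ratings.get(neighbor, {}).get(sender)
        if opinion is not None:
            # Opinions from highly rated neighbors count for more.
            weighted_sum += trust * opinion
            total_trust += trust

    # None means nobody the user knows has rated this sender.
    return weighted_sum / total_trust if total_trust else None
```

Returning `None` for unknown senders, rather than a low number, matches the spirit of the quote above: the point is to surface good senders, not to score the bad ones.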

2)    Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control

  • By: Marcel Salathé
  • Gist: One of the more compelling uses of SNA is as an epidemic/pandemic forecaster, the theory being that the communication patterns of nodes in a given network can mirror the physical contact patterns. Here, Salathé looks at Twitter data from 101,853 users and assesses their H1N1 vaccination sentiment over time. He found that a) positive and negative sentiments form clusters, and b) there is a positive correlation between a cluster’s negative vaccine sentiments and its likelihood of disease outbreaks. Whether this is due to causation or homophily, I’m not sure.
  • Money quote: “We find that projected vaccination rates based on sentiments expressed on Twitter are in very good agreement with vaccination rates estimated by the CDC with traditional phone surveys.”

3)      Measuring Tie-Strength in Virtual Social Networks

  • By: Andrea Petroczi
  • Gist: This paper gives some good background on computer-mediated social networks and tie-strength, and gives a methodology for determining the latter based on the VTS-scale, which measures acquaintance and friendship among members of a given virtual community.
  • Money quote: “Both offline and on-line social networks can be described by 1) their participants, 2) the content, direction, and strength of their relations and ties, 3) their composition, derived from the social attributes of the participants, and 4), their complexity, which indicates the number of relations in a tie.”

4)      Analyzing Social Media Networks with NodeXL

  • By: Marc Smith
  • Gist: NodeXL is an add-in to Excel that allows users to visualize social networks. (To see it in action, check out Smith’s crowd-sourced Flickr gallery.) This paper demonstrates how to use it on a given social media data set (in this case, an enterprise intranet social network). Those less pressed for time might want to check out the book version, co-authored by Derek Hansen and Ben Shneiderman.

5)      Semantic Social Network Analysis

  • By: Guillaume Erétéo
  • Gist: Users of so-called “enterprise 2.0” platforms often form heterogeneous social networks, and in this thesis, Erétéo proposes a way to analyze these networks (for the purpose of creating project teams, identifying experts, fostering communication, etc.) using the Semantic SNA Framework (SEMSNA) and semantic community detection and controlled labeling (SEMTAGp).
  • Money quote: “The ‘optimal partition’, imposed by mathematics, does not necessarily capture the actual community structure of the network.”