Tuesday, October 4, 2016

Data Mining: Social Network Visualization


  • Data chosen: UCI Network Data Repository
  • Data subset: Les Miserables, GML file (https://networkdata.ics.uci.edu/data.php?id=109)
  • I used Gephi, an open-source data visualization tool, to display a weighted social network of character co-appearances in Victor Hugo's novel “Les Miserables”. Each node represents a character, and an edge indicates that two characters appear together in the same chapter at least once. The node “id” is a number between 1 and 77, and the label is the character's name. Each edge's weight is the number of co-appearances between the two characters it connects. The network has 77 nodes and 254 edges. Because co-appearance is a symmetric relation, it is best read as an undirected, weighted graph that contains cycles. Blue nodes have high closeness centrality; green nodes have low closeness centrality.
  • 17 of the 77 characters (nodes) have a high closeness centrality, meaning they are only a few co-appearance steps away from most other characters, which suggests that they are prominent and influential in the book.
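
To make the edge weights and closeness centrality more concrete, here is a minimal Python sketch of both ideas. It is not what Gephi runs internally and it does not read the GML file. The chapter lists and the labels "A" to "E" are hypothetical placeholders, not data from the book. I am also assuming one common definition of closeness centrality, (n - 1) divided by the sum of shortest-path hop counts, which ignores edge weights and assumes a connected graph.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical chapter casts (placeholder labels, NOT taken from the dataset).
chapters = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C", "D"],
    ["D", "E"],
]


def co_appearance_weights(chapters):
    """Count, for each unordered pair, the chapters in which both appear."""
    weights = Counter()
    for cast in chapters:
        # sorted(set(...)) gives each pair one canonical (u, v) ordering.
        for u, v in combinations(sorted(set(cast)), 2):
            weights[(u, v)] += 1
    return weights


def adjacency(weights):
    """Build an undirected adjacency map from the weighted edge list."""
    adj = {}
    for u, v in weights:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj, source):
    """(reachable nodes - 1) / sum of hop distances from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


weights = co_appearance_weights(chapters)
adj = adjacency(weights)
scores = {node: closeness_centrality(adj, node) for node in adj}

# List nodes from highest to lowest closeness centrality.
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph, "C" comes out on top because it is at most two steps from every other node. That is the same sense in which the blue nodes in the visualization are "close" to the rest of the cast.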


Reference
D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).