
Graph prestige

Drawing e-mail connectivity graph

We can start by counting Enron e-mail connections (number of e-mails, then sender->recipient):

1688 pete.davis@enron.com->pete.davis@enron.com
363 enron.announcements@enron.com->all.houston@enron.com
326 enron.announcements@enron.com->all.worldwide@enron.com
209 robin.rodrigue@enron.com->gabriel.monroy@enron.com
203 michelle.nelson@enron.com->mike.maggi@enron.com
180 enron.announcements@enron.com->houston.report@enron.com
140 mike.maggi@enron.com->michelle.nelson@enron.com
113 hunter.shively@enron.com->airam.arteaga@enron.com
111 enron.announcements@enron.com->all.states@enron.com
96 announcements.enron@enron.com->dl-ga-all_domestic@enron.com
94 jeffrey.shankman@enron.com->jennifer.burns@enron.com
84 kate.symes@enron.com->kerri.thompson@enron.com
82 kate.symes@enron.com->evelyn.metoyer@enron.com
77 drew.fossum@enron.com->martha.benner@enron.com
71 michelle.cash@enron.com->twanda.sweet@enron.com
69 enron.announcements@enron.com->all_ena_egm_eim@enron.com
69 elizabeth.sager@enron.com->brenda.whitehead@enron.com
68 darron.giron@enron.com->carole.frank@enron.com
68 andrea.ring@enron.com->richard.ring@enron.com
64 office.chairman@enron.com->all.worldwide@enron.com
63 announcements.enron@enron.com->dl-ga-all_enron_houston_employees@enron.com
62 andrea.ring@enron.com->michele.winckowski@enron.com
61 tim.belden@enron.com->center.dl-portland@enron.com
60 stacy.dickson@enron.com->gregg.penman@enron.com
60 mike.grigsby@enron.com->anne.bike@enron.com
58 v.weldon@enron.com->mark.schlueter@enron.com
55 robin.rodrigue@enron.com->kori.loibl@enron.com
55 enron.announcements@enron.com->ena.employees@enron.com
53 announcements.enron@enron.com->dl-ga-all_enron_houston@enron.com
51 jim.schwieger@enron.com->jim.schwieger@enron.com
51 eric.bass@enron.com->shanna.husser@enron.com
49 kay.mann@enron.com->carlos.sole@enron.com
...
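A count like the one above could be produced with something along these lines. This is only a sketch: the `messages` format (a sender plus a list of recipients per e-mail) and the `count_connections` name are assumptions for illustration, not how the original list was actually generated.

```python
from collections import Counter

def count_connections(messages):
    """Count how many e-mails went from each sender to each recipient.

    `messages` is an assumed format: an iterable of (sender, recipients) pairs.
    """
    counts = Counter()
    for sender, recipients in messages:
        for rcpt in recipients:
            counts[(sender, rcpt)] += 1
    return counts

# Tiny made-up sample, not taken from the corpus.
sample = [
    ("a@example.com", ["b@example.com", "c@example.com"]),
    ("a@example.com", ["b@example.com"]),
    ("b@example.com", ["a@example.com"]),
]

# Print the pairs from most to least frequent, in the same layout as the list above.
for (sender, rcpt), n in count_connections(sample).most_common():
    print(n, f"{sender}->{rcpt}")
```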

That gives us an idea of who is communicating the most, or at least of who is sending the most e-mail to whom. Still, this is pretty hard to visualize. To draw it, all we have to do is dump those connections above into a graphviz dot file:

digraph comms {
"debra.perlingiere@enron.com" -> "kim.ward@enron.com"
"john.griffith@enron.com" -> "colleen.sullivan@enron.com"
"john.griffith@enron.com" -> "jim.meyn@enron.com"
...

and then we can get a visualization. Unfortunately, the full graph is about 6700 lines and crashes graphviz. If we restrict our graph to pairs with at least 10 e-mail communications, we get something that renders. There are some interesting relationships.
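Here is a rough sketch of writing the dot text with that kind of threshold. It assumes the connection counts are held in a dictionary keyed by (sender, recipient); the `to_dot` name and the `min_count` parameter are made up for this example.

```python
def to_dot(counts, min_count=10, name="comms"):
    """Return graphviz dot text for every pair with at least min_count e-mails."""
    lines = [f"digraph {name} {{"]
    for (sender, rcpt), n in sorted(counts.items()):
        if n >= min_count:
            # Quote the addresses because they contain '.' and '@'.
            lines.append(f'"{sender}" -> "{rcpt}"')
    lines.append("}")
    return "\n".join(lines)

# Made-up counts for illustration only.
sample_counts = {
    ("a@example.com", "b@example.com"): 12,
    ("b@example.com", "c@example.com"): 3,
}
dot = to_dot(sample_counts)
print(dot)
```

Saving that text to a file such as comms.dot and running `dot -Tpng comms.dot -o comms.png` should render it, assuming Graphviz is installed.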

This guy sends a lot of e-mail:


This guy is the king of getting e-mail:


And this picture might denote boss/assistant type relationships:


Prestige and Hubs

A good approximation to the prestige (authority) of a person is how many incoming links they have, as with Richard Shapiro (he was senior vice president of the Enron Corporation). We can derive that by looking at the graph above, or by simply counting the number of inward links. Here is a histogram of incoming e-mails (limited to 500 emails each for all 150 people):

1690 pete.davis@enron.com
453 all.worldwide@enron.com
402 all.houston@enron.com
337 louise.kitchen@enron.com
280 center.dl-portland@enron.com
254 mike.maggi@enron.com
213 gabriel.monroy@enron.com
183 sara.shackleton@enron.com
181 houston.report@enron.com
181 andy.zipper@enron.com
177 rod.hayslett@enron.com
161 richard.shapiro@enron.com
154 dl-ga-all_enron_worldwide1@enron.com
153 richard.ring@enron.com
152 kam.keiser@enron.com
151 michelle.nelson@enron.com
148 barry.tycholiz@enron.com
147 dan.hyvl@enron.com
144 dl-ga-all_enron_worldwide2@enron.com
144 danny.mccarty@enron.com
142 marie.heard@enron.com
...

We can also look at the number of outgoing links as an approximation for hubs. Here is a histogram of the number of outgoing e-mails. Richard Sanders was assistant general counsel.

1690 pete.davis@enron.com
1374 enron.announcements@enron.com
518 kate.symes@enron.com Trader
438 john.lavorato@enron.com CEO enron america
438 hunter.shively@enron.com  VP, trader
429 robin.rodrigue@enron.com analyst
396 drew.fossum@enron.com Vice President, General Counsel
388 daren.farmer@enron.com logistics manager
384 jeffrey.shankman@enron.com  president Enron global markets
365 mike.mcconnell@enron.com
359 john.arnold@enron.com  VP
340 announcements.enron@enron.com
330 sally.beck@enron.com  chief operating officer
326 mike.grigsby@enron.com VP
312 scott.neal@enron.com VP, trader
308 eric.bass@enron.com  trader
302 phillip.love@enron.com
301 andrea.ring@enron.com
297 david.delainey@enron.com
...
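Both histograms boil down to summing the connection counts per recipient and per sender. A minimal sketch, assuming the counts live in a dictionary keyed by (sender, recipient) and that each e-mail counts as one link (the `link_counts` name is hypothetical):

```python
from collections import defaultdict

def link_counts(counts):
    """Return (inward, outward): e-mails received and sent per address."""
    inward = defaultdict(int)
    outward = defaultdict(int)
    for (sender, rcpt), n in counts.items():
        outward[sender] += n   # hub-like measure
        inward[rcpt] += n      # authority-like measure
    return inward, outward

# Made-up counts for illustration only.
counts = {("a", "b"): 2, ("a", "c"): 1, ("b", "a"): 4}
inward, outward = link_counts(counts)

# Print the inward histogram, largest first.
for addr, n in sorted(inward.items(), key=lambda item: -item[1]):
    print(n, addr)
```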

From Communication Networks from the Enron Email Corpus: “pete.davis.@enron.com” is "known to have served as a proxy for automatically generated broadcast emails by Enron processes."

Centrality

If we combine (add) the hub and authority measures, we get a measure of centrality:

3380 pete.davis@enron.com
1374 enron.announcements@enron.com
568 john.lavorato@enron.com CEO enron
543 kate.symes@enron.com
495 jeffrey.shankman@enron.com
476 hunter.shively@enron.com
453 mike.maggi@enron.com
453 all.worldwide@enron.com
433 robin.rodrigue@enron.com
420 drew.fossum@enron.com
407 john.arnold@enron.com
402 all.houston@enron.com
394 mike.mcconnell@enron.com
390 daren.farmer@enron.com
375 mike.grigsby@enron.com
369 sara.shackleton@enron.com
368 louise.kitchen@enron.com
363 david.delainey@enron.com
354 michelle.nelson@enron.com
351 debra.perlingiere@enron.com
341 announcements.enron@enron.com
339 sally.beck@enron.com
335 dan.hyvl@enron.com
333 eric.bass@enron.com
326 scott.neal@enron.com
326 rick.buy@enron.com
326 phillip.love@enron.com
323 andrea.ring@enron.com
320 susan.scott@enron.com
318 tracy.geaccone@enron.com
318 michelle.cash@enron.com
312 stacy.dickson@enron.com
302 errol.mclaughlin@enron.com
300 elizabeth.sager@enron.com
...


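from collections import defaultdict

def centrality(addrs, inward, outward):
  # inward/outward map address -> number of incoming/outgoing e-mails and
  # are expected to give 0 for unknown addresses (e.g. defaultdict(int))
  cent = defaultdict(int) # addresses never seen default to 0
  for addr in addrs:
    cent[addr] = inward[addr] + outward[addr]
  return cent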

More accurate prestige

A node's prestige is a function of the prestige of the people that send them e-mail. For example, if Pres. Obama sent me mail, that would increase my "prestige". Starting with a prestige of 1.0 for each node, we can update the prestige by summing up the prestige values for all incoming links. If we do that one time, we get something that looks like this:

7077087.0 richard.shapiro@enron.com
7077087.0 howard.fromer@enron.com
6497376.0 bonnie.white@enron.com
5058462.0 britt.davis@enron.com
4707735.0 h..george@enron.com
4516048.0 holly.keiser@enron.com
4483189.0 claudia.meraz@enron.com
4483044.0 jan.cooley@enron.com
4471005.0 suzanne.vann@enron.com
4469347.0 b..sanders@enron.com
2384648.0 greg.whalley@enron.com
1846379.0 brenda.whitehead@enron.com
1785422.0 carol.st.@enron.com
1780547.0 janice.moore@enron.com
...

The numbers get big fast because we are adding up prestige scores that are themselves being increased. The problem with doing this only once is that the people updated first use prestige values of just 1.0, because we haven't yet updated the prestige values of the people who come later. We need to iterate. That raises two issues. First, the numbers will quickly outgrow the range of a floating-point number. Second, there's no guarantee that we will converge to a fixed vector of prestige numbers, so how do we know when to stop iterating? From what I can tell, people simply run this a fixed number of times.

p' = E^T p for edge matrix E and prestige vector p

for k steps:
  for each node v:
    p'[v] = sum over u of E[u,v] * p[u] # sum prestige[u] over all incoming links to v
  p = p'
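As a runnable sketch of that update, here is a version that treats the graph as a plain list of (sender, recipient) edges, so E[u,v] is 1 for each listed edge. That representation and the function names are assumptions for illustration. Each step builds a fresh dictionary, so every node is updated from the previous step's values rather than from a mix of old and new ones.

```python
def update_prestige(edges, prestige):
    """One step: new[v] = sum of prestige[u] over all edges u -> v."""
    new = {addr: 0.0 for addr in prestige}
    for u, v in edges:
        new[v] += prestige[u]
    return new

def iterate_prestige(edges, k):
    """Start every node at 1.0 and apply the update k times (no normalization)."""
    nodes = {n for edge in edges for n in edge}
    prestige = {n: 1.0 for n in nodes}
    for _ in range(k):
        prestige = update_prestige(edges, prestige)
    return prestige

# Made-up three-node graph: a and b both mail c, and c mails a.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
print(iterate_prestige(edges, 1))
print(iterate_prestige(edges, 2))
```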

It turns out that if we normalize the new prestige values at each iteration, the floating-point numbers will stay in range and the algorithm is known to approach a fixed vector because of some mathematical relationships (eigenvector stuff: under suitable conditions the vector approaches the dominant eigenvector of E^T). It seems like a better way to terminate iteration is to measure the difference between the previous prestige vector and the current prestige vector. When that gets small enough, call it done.

Power iteration:

p_{k+1} = (E^T p_k) / |E^T p_k|

It looks like it converges after only two steps. I computed the delta between the previous prestige vector and the newly updated prestige vector at each step:

63.1395215761
1.11634476843
0.192837850654
0.0669503608214
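The normalized iteration with a delta-based stopping rule can be sketched like this. It assumes an unweighted list of (sender, recipient) edges and uses the Euclidean length for both the normalization and the delta; the notes don't say which norm was actually used, and `tol` and `max_steps` are made-up parameters.

```python
import math

def power_iteration(edges, tol=1e-6, max_steps=100):
    """Repeat p <- E^T p / |E^T p| until p changes by less than tol."""
    nodes = sorted({n for edge in edges for n in edge})
    p = {n: 1.0 for n in nodes}
    deltas = []
    for _ in range(max_steps):
        new = {n: 0.0 for n in nodes}
        for u, v in edges:
            new[v] += p[u]              # prestige flows along u -> v
        norm = math.sqrt(sum(x * x for x in new.values()))
        if norm == 0.0:
            break                       # no incoming links anywhere
        new = {n: x / norm for n, x in new.items()}
        delta = math.sqrt(sum((new[n] - p[n]) ** 2 for n in nodes))
        deltas.append(delta)
        p = new
        if delta < tol:
            break
    return p, deltas

# Made-up graph containing cycles of length 2 and 3, so the iteration settles.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c")]
p, deltas = power_iteration(edges)
for addr in sorted(p, key=p.get, reverse=True):
    print(f"{p[addr]:.10f} {addr}")
```

The first delta is large simply because the starting vector of all 1.0 values is not normalized.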

The histogram of the prestige values looks like:

0.2390972008 louise.kitchen@enron.com
0.2292314060 tana.jones@enron.com
0.2105822140 mark.taylor@enron.com
0.1848901570 sara.shackleton@enron.com
0.1819532146 stephanie.panus@enron.com
0.1570978590 marie.heard@enron.com
0.1269224455 john.lavorato@enron.com
0.1254854688 carol.clair@enron.com
0.1253060328 kim.ward@enron.com
0.1245588375 susan.bailey@enron.com
0.1229072244 david.forster@enron.com
0.1215238036 mark.haedicke@enron.com
0.1191156055 barry.tycholiz@enron.com
0.1188921121 richard.sanders@enron.com
0.1072586593 stephanie.sever@enron.com
0.1040185526 dan.hyvl@enron.com
0.1037619562 jeffrey.hodge@enron.com
0.1026769070 jonathan.mckay@enron.com
0.0975462508 andy.zipper@enron.com
0.0928542678 greg.whalley@enron.com
0.0927619252 peter.keohane@enron.com
0.0899782929 john.griffith@enron.com
...

Turning off normalization still seems to reach a fixed point, at least in terms of the ranking:

144494.0000000000 louise.kitchen@enron.com
121838.0000000000 tana.jones@enron.com
114456.0000000000 mark.taylor@enron.com
97519.0000000000 sara.shackleton@enron.com
94609.0000000000 stephanie.panus@enron.com
81815.0000000000 marie.heard@enron.com
79439.0000000000 john.lavorato@enron.com
75709.0000000000 kim.ward@enron.com
74508.0000000000 barry.tycholiz@enron.com
68223.0000000000 mark.haedicke@enron.com
68196.0000000000 david.forster@enron.com
68132.0000000000 richard.sanders@enron.com
67267.0000000000 carol.clair@enron.com
66626.0000000000 susan.bailey@enron.com
64486.0000000000 jonathan.mckay@enron.com
63084.0000000000 stephanie.sever@enron.com
60591.0000000000 jeffrey.hodge@enron.com
60577.0000000000 dan.hyvl@enron.com
57054.0000000000 andy.zipper@enron.com
56896.0000000000 john.griffith@enron.com
56091.0000000000 greg.whalley@enron.com
53519.0000000000 tim.belden@enron.com
51979.0000000000 peter.keohane@enron.com
51622.0000000000 m..presto@enron.com
50112.0000000000 john.arnold@enron.com
49372.0000000000 jason.williams@enron.com
47542.0000000000 gerald.nemec@enron.com
46429.0000000000 steven.kean@enron.com
45898.0000000000 jim.schwieger@enron.com
44886.0000000000 elizabeth.sager@enron.com
44625.0000000000 richard.shapiro@enron.com
44573.0000000000 david.delainey@enron.com
43070.0000000000 russell.diamond@enron.com
41514.0000000000 jeffrey.shankman@enron.com
...

I get roughly the same list, I think; a few neighbors swap places.

HITS

Manning chapter 21: "A good hub page is one that points to many good authorities; a good authority page is one that is pointed to by many good hub pages."

HITS algorithm, Manning on hubs/authorities
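The mutual definition in the Manning quote suggests alternating two updates: authority scores from hub scores, then hub scores from authority scores. Below is a minimal sketch of that idea on an unweighted edge list. The fixed step count, the Euclidean normalization, and the `hits` function itself are assumptions for illustration rather than a faithful copy of any reference implementation.

```python
import math

def hits(edges, steps=20):
    """Return (hub, auth) score dictionaries after a fixed number of steps."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(steps):
        # A good authority is pointed to by good hubs.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # A good hub points to good authorities.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        # Normalize both vectors so the numbers stay in range.
        for scores in (auth, hub):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Made-up graph: three senders all mail x, and h1 also mails y.
edges = [("h1", "x"), ("h1", "y"), ("h2", "x"), ("h3", "x")]
hub, auth = hits(edges)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```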

