Blogging: how we watch our navels
Filed in: WWW2004, Web, Fri, May 21 2004 22:15 PT
This is a conference blog post about a presenter talking about blogging. Love my navel. Looooooove my navel. Who’s a pretty navel? You are!
Daniel Gruhl of IBM Research presented on “Information Diffusion Through Blogspace”. He described blogging as “just on the cusp of being absolutely universally well-known” and “the greatest thing for exhibitionism… since windows.” Some blogs die as soon as they’re created, and some are updated with “pathological intensity.”
The researchers gathered up 11,804 RSS feeds last September and October, comprising 401,021 entries. There were 14 news syndication sites included for comparison.
There’s related work in topic tracking and characterization, epidemiology (blogging is a virus!), network models and graphs, and how information is diffused, including threshold models (after how many of my connections mention a topic do I start talking about it?), and other things that are used in viral marketing. They tracked URL references, noting that while 100,000 distinct links were in the corpus, only 700 appeared 10 times or more. (Power laws.) They looked at recurring text sequences, but didn’t get much value out of that. Likewise names of known personalities. Proper nouns were more numerous. So were individual words, but those only became useful when the researchers compared each word’s daily frequency against its historical frequency.
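For the curious, here is a rough sketch of what "compare daily frequency against historical frequency" could look like. This is my own illustration, not the researchers' method: the function name `spiking_days`, the trailing 7-day window, and the z-score cutoff of 3 are all assumptions made up for the example.

```python
from statistics import mean, pstdev

def spiking_days(daily_counts, window=7, z_threshold=3.0):
    """Return indices of days whose count sits far above the trailing average.

    daily_counts: mentions of one word per day, oldest first.
    window and z_threshold are illustrative defaults, not values from the talk.
    """
    spikes = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        baseline = mean(history)
        # Fall back to 1.0 when the history is perfectly flat, so we never divide by zero.
        spread = pstdev(history) or 1.0
        if (daily_counts[day] - baseline) / spread >= z_threshold:
            spikes.append(day)
    return spikes

# Hypothetical counts: a quiet word that suddenly gets talked about on day 7.
counts = [2, 3, 2, 3, 2, 3, 2, 40, 3, 2]
print(spiking_days(counts))
```

A word that is always common never trips the threshold, which is the point of comparing against its own history rather than against a fixed count.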
They came up with a class of 340 “classic” topics. They broke down into spikes (one-shot deals, things that don’t get discussion over a longer period of time), chatter (low-volume but consistent topics, like “Alzheimer’s”), and “spiky chatter” (frequently mentioned, and likewise prone to spikes, such as “Microsoft”). Spiky chatter was most interesting to them. Once they’ve isolated one of these terms, they try to split out subtopics based on other terms, so they can try to normalize the overall graph. Spikes usually last no more than one or two weeks.
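To make the three categories concrete, here is a toy classifier over a topic's daily mention counts. Everything in it is an assumption for illustration: the name `classify_topic`, the "mentioned on at least 80% of days" test for consistency, and the "peak at least 3x the median" test for a spike are my guesses, not the thresholds used in the paper.

```python
from statistics import median

def classify_topic(daily_counts, min_active_fraction=0.8, spike_ratio=3.0):
    """Label a topic as 'spike', 'chatter', 'spiky chatter', or 'unclassified'.

    Both thresholds are hypothetical knobs chosen for this sketch.
    """
    active_days = sum(1 for count in daily_counts if count > 0)
    consistent = active_days / len(daily_counts) >= min_active_fraction
    baseline = median(daily_counts)
    # Use at least 1.0 as the baseline so a silent topic with one mention is not a "spike".
    has_spike = max(daily_counts) >= spike_ratio * max(baseline, 1.0)

    if consistent and has_spike:
        return "spiky chatter"
    if consistent:
        return "chatter"
    if has_spike:
        return "spike"
    return "unclassified"

print(classify_topic([0, 0, 0, 30, 12, 0, 0, 0, 0, 0]))
print(classify_topic([2, 3, 2, 3, 2, 2, 3, 2, 3, 2]))
print(classify_topic([20, 22, 19, 21, 90, 85, 20, 22, 21, 19]))
```

The median is used as the baseline instead of the mean so that the spike itself does not inflate the level it is being compared against.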
So, how do we bloggers infect one another? (Insert MeetUp joke here.) He described the echo chamber, and used a formal transmission graph to show how ideas propagate. IBM built a propagation model: each connection has two parameters, one of which is the probability that an idea-infectable person reads a given blog on a given day. If that doesn’t happen, the idea doesn’t get spread.
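Here is a minimal simulation in the spirit of that model, with loud caveats. The read probability comes from the talk; the second parameter is assumed here to be a "copy" probability (the chance a reader who saw the post writes about it), which is my guess at the piece I missed. The daily independent trial, the function name `simulate_spread`, and the edge format are likewise assumptions for the sketch, not IBM's implementation.

```python
import random

def simulate_spread(edges, seed_blog, days, rng=None):
    """Simulate a topic spreading over a blog graph.

    edges: dict mapping (writer, reader) -> (read_prob, copy_prob).
           copy_prob is an assumed second parameter, not confirmed by the talk.
    Returns a dict mapping each infected blog to the day it first posted the topic.
    """
    if rng is None:
        rng = random.Random()
    infected = {seed_blog: 0}
    for day in range(1, days + 1):
        newly_infected = set()
        for (writer, reader), (read_prob, copy_prob) in edges.items():
            if writer not in infected or reader in infected:
                continue
            # The reader must first read the writer's blog today; if not, nothing spreads.
            if rng.random() < read_prob and rng.random() < copy_prob:
                newly_infected.add(reader)
        # Record new posts after the pass, so a post made today spreads from tomorrow.
        for reader in newly_infected:
            infected[reader] = day
    return infected

# Hypothetical three-blog chain where everyone reads and copies everything.
chain = {("a", "b"): (1.0, 1.0), ("b", "c"): (1.0, 1.0)}
print(simulate_spread(chain, "a", days=3))
```

Drop either probability toward zero on an edge and the topic stalls there, which matches the "if that doesn't happen, it doesn't get spread" behavior described in the talk.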
Is it possible to track the echo chamber? That is, can you determine one’s sphere of influence from the people who spread their memes? They presented an algorithm (and again, I napped. Sorry, I’m not the math guy.) But he did show a picture. Using a transmission graph, he showed how he’s trying to determine which of a number of bloggers infected a given person later on the timeline. The algorithm iterates over the data to arrive at the variables in the graph.
Then they applied it to the blog data. Items were restricted to things not found in the media to reduce noise. The top 100 blogs had a high correlation with blogstreet.com. Then the researchers went back and manually looked at certain bloggers to try to conclude whether indeterminate cases could be resolved (that is, whether two bloggers knew each other), and found that in 90% of cases this was true. The chart that resulted looks (shock!) like the power law curve. There was a measure of “fanout”: the vast majority of bloggers scored lower than 1 (the energy of a story dies before spreading), and very few had a powerful capability to spread memes.
Question about out-of-band communication, like mailing lists and so on. He said he’d like to add an item of “the outside world” to the study (ouch. Sounds like he knows us.) Did he consider using Trackback? No. It wasn’t available broadly enough.
(This conference is going to have a serious blogging tilt in 2005. Everybody is seeing blogging as a usage pattern that they really, really want to study. And in the next year, they’ll be seeing just how huge it’s going to get. Start preparing your papers.)