A census of blogspace

Phil Wolff over at a klog apart offers the thought that we need a census of blogspace. He asks a few questions, as well, about the size of the “blogsphere”:

Do you have an educated guess?
Not even remotely. No idea. I could pick a figure, but it could be out by two orders of magnitude.
Do you know of any prior work in this area?
Not that I’m aware of, I must admit, although it could well have passed me by.
Can you think of a methodology or two to create useful measures of the number of bloggers and the number of weblogs?
Google. Google is the best way to run queries about the whole net, because it indexes so much of it. You could get a rough estimate of the number of webloggers by making one simplifying assumption: every weblogger either has their own domain or uses one of a few weblogging hosts (Blogspot, LiveJournal, etc.; it’s a fairly short list). Then get user counts from each of the major hosts, search Google for the word “permalink”, extract the number of unique domains, and add the two together. That’ll be a low estimate, because there are multiple weblogs on some domains, and because not all weblogs use the word permalink, but it’d be a figure to begin working with.
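As a rough sketch of that sum in Python: the function name, the `SHARED_HOSTS` list, and the idea that you already have the search-result URLs and the per-host user counts in hand are all my assumptions for illustration. It doesn't query Google at all; it only does the counting.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts that carry many weblogs on one domain.
SHARED_HOSTS = {"blogspot.com", "livejournal.com"}


def estimate_weblogs(result_urls, host_user_counts):
    """Low-end estimate: unique self-hosted domains plus reported host user counts."""
    own_domains = set()
    for url in result_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if not host:
            continue  # not a usable URL
        # Weblogs on a shared host are already covered by that host's user count.
        if any(host == h or host.endswith("." + h) for h in SHARED_HOSTS):
            continue
        own_domains.add(host)
    return len(own_domains) + sum(host_user_counts.values())
```

Each self-hosted domain counts once however many result pages it contributes, which is exactly why the figure comes out low when several weblogs share a domain.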
The other alternative is to assume that all weblogs are interconnected (see the next question), start at one weblog, and crawl the links from there, counting as you go. You’d need rules for what constitutes a weblog, which is something not well-defined for a person looking at one, never mind an automated process, but hey.
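The crawl itself is just a breadth-first walk. Here's a minimal sketch, assuming two things you'd have to supply yourself: `fetch_links(site)`, which returns the sites a page links to, and `is_weblog(site)`, which is the hard "what counts as a weblog" rule. Both names are hypothetical placeholders.

```python
from collections import deque


def crawl_count(start, fetch_links, is_weblog):
    """Breadth-first crawl from `start`, counting the weblogs reached."""
    seen = {start}
    queue = deque([start])
    count = 0
    while queue:
        site = queue.popleft()
        if not is_weblog(site):
            continue  # don't count it, and don't follow its links
        count += 1
        for target in fetch_links(site):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return count


# Toy link map standing in for the real web.
links = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": [], "e": ["a"]}
total = crawl_count("a", lambda s: links.get(s, []), lambda s: s != "d")
```

In the toy map, "e" links in but nothing links to it, so the crawl never finds it. That's the weakness of the interconnectedness assumption in miniature.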
What related questions would you want answered?
How many different “islands” are there in the interconnected map of weblogs? Can you navigate from any given blog to any other blog by merely travelling links between weblogs? What does the map look like? What’s the most connected node? Which node is at the centre of the map? Lots of questions about the map of links, really.
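Given a link map from a crawl, the "islands" and "most connected node" questions are fairly mechanical. A sketch under stated assumptions: I treat links as two-way (so an island is a plain connected component, which is weaker than being able to travel from any blog to any other along the links as they actually point), and I take "most connected" to mean "most distinct neighbours". The function names are mine, and it doesn't attempt the "centre of the map" question.

```python
from collections import defaultdict


def build_undirected(links):
    """Turn {blog: [blogs it links to]} into a two-way neighbour map."""
    graph = defaultdict(set)
    for src, targets in links.items():
        graph[src]  # make sure blogs with no links still appear
        for dst in targets:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def islands(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, result = set(), []
    for node in graph:
        if node in seen:
            continue
        seen.add(node)
        stack, island = [node], set()
        while stack:
            current = stack.pop()
            island.add(current)
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(island)
    return result


def most_connected(graph):
    """Return the blog with the most distinct neighbours."""
    return max(graph, key=lambda n: len(graph[n]))
```

If `len(islands(graph))` comes back as 1, the "everything is interconnected" assumption holds, at least in the two-way sense.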
How might you use this information?
Blimes, I dunno. It’d be interesting to look at :) I could do a “six steps to as days pass by” thing, or something.
Pitfalls to avoid?
No idea, guv. At this stage, where there’s no data at all, any data is better than none, so make assumptions, guess figures, and so on. We can refine the data later.
Would you join a BlogCensus.org to provide and share stats?
Suppose so, but I always find that sort of thing fairly silly, because the audience is self-selecting. The Linux Counter works on much the same principle, and it’s pretty useless as a source of information.
