Friday, August 21, 2009

Counting .Gov Pages

I've been fascinated by the government website page count for the last couple of weeks. The numbers just seem so extraordinary.

On August 5th 2009 I wrote that there were 112,000,000.

Today, there are 114,000,000. That would appear to mean 2,000,000 were freshly created in the last 15 days or so. I'm suspicious because the numbers are so round. It's not 112,123,456 and 114,654,321 ... but maybe Google rounds up at numbers that large rather than count exactly.
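A quick way to see what those two figures imply is to work out the daily rate, and to check how much the answer could move if Google is rounding. This is only a sketch: the assumption that each figure is rounded to the nearest million is mine, not something Google documents here.

```python
from datetime import date

first_count = 112_000_000   # figure reported on 5 August 2009
second_count = 114_000_000  # figure reported on 21 August 2009

# Days between the two readings
days = (date(2009, 8, 21) - date(2009, 8, 5)).days
increase = second_count - first_count
per_day = increase / days

# Assumption: each figure is rounded to the nearest million, so each
# true value could sit up to half a million either side of what is shown.
step = 1_000_000
lowest_increase = (second_count - step // 2) - (first_count + step // 2)
highest_increase = (second_count + step // 2) - (first_count - step // 2)

print(f"{days} days, {increase:,} more pages, about {per_day:,.0f} per day")
print(f"true increase could be between {lowest_increase:,} and {highest_increase:,}")
```

If the rounding assumption holds, the "2,000,000 new pages" could really be anything from one to three million.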

In a later post I showed the timeline that, at first glance, showed pages dated from the 1450s - but Google's timeline function seems to look for instances of a date in a document. Dan and I have debated why that might be and whilst he leans towards the view that it has some use, I'm a bit more sceptical.

But what you can do is look at pages modified in the last 24 hours, last week, last month and last year (which I think it does by looking for dates within those ranges in the document, rather than looking at the upload date). For government this is:

Last 24 hours: 35,300

Last Week: 1,540,000

Last month: 2,080,000

Last year: 5,390,000

Again, those numbers don't seem to work ... 7 * 35,000 isn't 1.5m ... 52 * 1.5m isn't 5m. Checking every month through 2009, there's a reasonably consistent number of between 1.9m and 2.1m pages.
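The mismatch is easier to see if the shorter periods are scaled up and set against the longer ones. A minimal sketch using the four counts above; the variable names are just for illustration, and scaling by 7, 52 and 12 assumes a steady rate of updates, which is exactly the assumption under test.

```python
counts = {
    "last 24 hours": 35_300,
    "last week": 1_540_000,
    "last month": 2_080_000,
    "last year": 5_390_000,
}

# Scale each shorter period up to the next one, assuming a steady rate
day_as_week = 7 * counts["last 24 hours"]
week_as_year = 52 * counts["last week"]
month_as_year = 12 * counts["last month"]

print(f"7 x daily   = {day_as_week:,} vs weekly {counts['last week']:,}")
print(f"52 x weekly = {week_as_year:,} vs yearly {counts['last year']:,}")
print(f"12 x monthly = {month_as_year:,} vs yearly {counts['last year']:,}")

# How far out each projection is, as a ratio of projected to reported
print(f"week is {counts['last week'] / day_as_week:.1f}x the scaled-up day")
print(f"scaled-up week is {week_as_year / counts['last year']:.1f}x the year")
print(f"scaled-up month is {month_as_year / counts['last year']:.1f}x the year")
```

None of the projections lands anywhere near the reported figure, in either direction, so the periods can't all be counting the same thing at a steady rate.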

It's likely that edit frequency varies, but it's been a busy week for UK government if 1.5m pages have been updated - if it cost just 1p to update a page, that would be £15,000 of effort ... if it costs £5, that would be £7.5m! But at the annual end, if it's 5.3m pages at 1p then we're at £53k ... that's about one person at fully loaded costs. If it were £5 ... then we have armies of people updating pages.
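The back-of-envelope costs can be reproduced from the weekly and yearly counts. This is a rough sketch only: the 1p and £5 per-page costs are the guesses used above, not measured figures, and `cost_in_pounds` is a made-up helper name. With the unrounded counts, a week at 1p comes to about £15k and a year at 1p to about £54k.

```python
weekly_pages = 1_540_000   # "Last Week" count
yearly_pages = 5_390_000   # "Last year" count


def cost_in_pounds(pages, pence_per_page):
    """Total cost in pounds for a given per-page cost in pence."""
    return pages * pence_per_page / 100


# Guessed per-page costs: 1p and 500p (i.e. £5)
for label, pages in (("week", weekly_pages), ("year", yearly_pages)):
    for pence in (1, 500):
        total = cost_in_pounds(pages, pence)
        print(f"{label}: {pages:,} pages at {pence}p each = £{total:,.0f}")
```

At £5 a page the yearly figure runs to tens of millions of pounds, which is where the armies of people come in.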

In July 2001, Google says only 20,900 pages were updated. In July 2009 it was 2,120,000. That's over 100 times as many pages.

I feel a table of analysis coming on to see if there's any sense in this. In the meantime, any clues what is going on? Is Google doing something that I haven't accounted for?

6 comments:

  1. Steph Gray 9:27 pm

    Hmmm, I'm not sure you're going to get anything very meaningful from this process. Sure, Google page counts are a start, but age of page is going to be very hard to measure in an age of enterprise CMSes which enable editors to schedule whole swathes of the site to be republished (and thus change their last-modified dates) when there's a change to a footer or sidebar.

    Hopefully COI's new audit process will provide some half-decent numbers on costs and scale, in time.

  2. Guest 7:16 am

    are you sure google isn't just finding more rather than new being created?

  3. Dan Harrison 9:06 pm

    By "sceptical", you mean "right", right?

  4. you may be right.  but i'm not sure that there are too many "enterprise CMSes" that do global changes of the type you mention. i'd lay money that i can count them on the fingers of both hands, very possibly one hand!

  5. nope, not sure ... do you have data?

  6. often wrong, never in doubt
