Thursday, January 02, 2003
Flooding the flood warning
First demand-driven problem of the year hits government websites. And it's only January the 2nd and it's not even noon.

The heavy rain over the last few days has meant that the Environment Agency's website, which gives details of the areas likely to be flooded, has been overwhelmed with demand and is presently down. I can't find the outage reported on any of the news channels, but most stories are carrying both the phone number and the web address - so use the phone.

In my "things we'll learn" year-end wrap-up, I thought that we'd see a few more of these kinds of things. For me, it reinforces 3 of my points: e-government is hard; demand is there if the service is good and offers value; and the government lacks intelligent customers who can manage their suppliers to ensure that this kind of thing is covered.

This is a bit of a "leaves on the line" story - every year something happens, every year it's a bit different from the previous year, but there are enough similarities for us to draw conclusions and put the right protective measures in place. The lessons are there to be learnt and, although it's not easy, we need to learn them faster and more conclusively.

This time last year, the PRO's 1901 Census site was the poster child for website failure as it buckled under demand. You could argue whether that was predictable or not - I was told a few months before that it would be huge but didn't really get it. With the Environment Agency's site it's a little harder to argue - there's been rain for a few days now; flood warnings are in place; pretty much every news story online is pointing to their site. Looks like it was predictable.

There are (and were) three ways out of this situation, any of which would have kept the site up and the Agency's reputation intact:

1. Robust Design
If you know this kind of thing is going to happen, you design your site to take that into account.
For years now, busy sites have incorporated resilient design: sites that handle heavy downloads have built in mirror services, edge caching and so on. When the BBC site expects heavy traffic, the team there strip out graphics and extraneous content to keep pages fast; MSN stores its entire site in the front-end web servers so that there is no dynamic content generation (so when the site looks to be filling up, they can rapidly deploy new servers and copy the site to them). So people know how to do this. It is not, however, cheap. And if you only expect one of these events every few months, there's not much ROI there. But for a major site to fail with a 404 error at the moment it is most needed is close to unforgivable, so you'd have good reason to expect that some or all of these kinds of measures were in place.

2. Centralisation
If the economics at a local or departmental level don't justify the kind of spend on resilience that's required, then you move the content and the applications somewhere where they do. Giving up control is hard in government. Giving up control of your IT is even harder, but this is just another kind of outsourcing, and one where you get a bigger say in how things get done and probably better oversight. A central service is more likely to be able to deal with peaks because they will occur more regularly, so the site will be tested more frequently at high loads and the people who run it will know how it responds. Of course, there is always the 50-year storm peak (the end of the tax year, war in Iraq, floods in the UK and a Ministerial scandal all at once, say) that might cause an exceptional load, but it's still cheaper to handle this kind of thing centrally.

3. Syndication
The science of syndication is not well understood for things like this, but it's certainly feasible that the main pieces of content could be offered up to a variety of major sites so that no single site is hit heavily.
If the Agency ran its flood model for all the main areas of the UK and then offered news sites a summary of the content generated, I'm sure most sites could build it into their systems. This method requires more work in advance, and certainly more work for the content recipients, but it's not dissimilar to the plan adopted for the Iraq Dossier, when sites all over government and in the commercial sector were given access to the document a few minutes ahead of schedule. I imagine that the flood warning app is more complicated than this but, nonetheless, it must be achievable.

So any of these 3, or maybe all of them, are worth pursuing. And if e-government's reputation is going to grow rather than remain tarnished, these measures must be taken for the major services.

It seems to me that, given this situation is going to rotate around government fairly regularly for the next couple of years or so, there's a need for a kind of "Mutual Reliance" plan. Last year it was PRO and IR; this year it's the Environment Agency. It could just as easily be the courts' site if there is a major trial, or Customs at the end of a busy VAT quarter, or a local authority that introduces an innovative new service. It might even be the congestion charging website. But it's going to be someone, and probably several someones. I doubt that e-government can take many more hits before it loses what little chance it still has and people revert to phone lines, paper channels and commercial websites.

So we need a combination of those 3 options put in place:
- A team that reviews sites for capacity limits and recommends upgrades as necessary. Each upgrade will need to be rated against its cost/benefit. There will also need to be a failure recovery plan, so that if a site does go down it fails gracefully (not with a 404, but with a message that tells people where else they can check for information, for instance).
- A move to put critical services in a more robust environment, managed by the centre but with close co-operation from all of the departments involved and their IT suppliers.
- A programme to map out the route to proper syndication of key content and applications (or the output of those applications) across commercial sites.

We don't do that today, and I think there's little chance that we will.

I'm going to be doing a presentation on "mobile government" later in the month, and I'd been pondering the flood warning system as a killer app for the mobile alerts service that government so badly needs. I talked about this in March 2002 at another conference and Steve Ranger from Computing picked it up in May: there are a bunch of time-critical or time-driven services that government offers that would benefit from alerts issued to people's mobile phones. I have some worries about how this might work and the potential for cockup that we expose ourselves to, and today's example only reinforces my worries. If people are relying on receiving a text before taking action but the service is down, how much more exposed are they?
Posted by Alan at Thursday, January 02, 2003