An avoidable failure by the numbers

Failure to communicate

A Wall Street Journal article published this week revealed how poor communication and disjointed oversight were largely responsible for the debacle. A review of that article, and others on the same topic, reveals a laundry list of the classic problems faced by big Web projects:

  • Distributed teams, siloed management structure and weak coordination: At the very least, the project involved the Centers for Medicare & Medicaid Services (CMS), the Department of Health and Human Services (HHS), the White House, Quality Software Services Inc. (QSSI, a technology company), and CGI Group Inc. (another technology contractor). Nobody was charged with coordinating the efforts of these distributed teams, which made it nearly impossible for them to design, develop and test a coherent, functional product.
  • Role confusion: As a direct result of distributed teams and a lack of coordination, teams became confused about their roles (i.e., who was responsible for what). This confusion led to CMS acting as a systems integrator trying to cobble things together, when that job probably should have fallen to CGI Federal (or somebody with deep systems integration expertise).
  • Compressed timeline: A "compressed" timeline is industry parlance for "not enough time to build what has been designed." This demon is incredibly common for big Web projects, and almost always results in teams being taken on a death march down a road to inevitable failure.
  • Inadequate testing: Testing is almost always the first thing to go out the window when time runs short, and clearly, a lack of adequate testing was a huge problem for HealthCare.gov. In particular, it’s abundantly clear that the application was not load tested, which is why it crashed under the onslaught of millions of people trying to use the system. It’s not as if this is the first system that has had to meet these challenges; many commercial Web applications have succeeded under significantly greater loads. In addition to inadequate load testing, insurers were not allowed to begin testing critical functionality until September, just one month before launch. Many of the glitches in the application were related to incorrect assessment of costs and eligibility, which falls squarely in the lap of insurers, who simply didn’t have enough time to expose these flaws.
  • Shifting requirements driven by policy and politics: The requirements for HealthCare.gov were driven by legal policy and occasionally by politics. First, the Supreme Court had to weigh in on the constitutionality of the law. Second, Obama was campaigning for re-election, which led the administration to delay issuing requirements for the project. Third, over 30 states refused to set up their own exchanges, which created late-breaking changes in requirements. Anyone who knows anything about big Web projects knows that if you keep changing requirements, but you don’t change the launch date, you’re in for trouble.
  • Mistaken impressions regarding progress: Leaders were under the impression that the project was moving along well, when in fact it was nowhere close. A prototype was demonstrated at the White House over the summer, but it was all smoke and mirrors; the prototype did not reflect a system with mission-critical application functionality (e.g., identity verification and eligibility determination). In essence, it was a mock-up that misled senior officials about the actual rate of development progress. Even the senior lead of the policy arm of the project (Gary Cohen) reported to Congress on Sept. 19 (a little over one week before launch) that everything was on track. Contractors actually testified before a House committee that they warned people the site had not been adequately tested, but apparently, leadership did not get that message. The NY Times also reported that the chief digital architect was deeply troubled about the state of the project.
  • Weak technical execution: A number of technically skilled people looked at the open-sourced code before it was mysteriously removed from GitHub. One analysis I read showed how some basic code optimizations (e.g., JavaScript minification) could have resolved a number of front-end performance issues. Another article on the Huffington Post revealed a codebase using sophisticated technologies, but lacking solid execution. These problems could be attributed either to teams working quickly under pressure, or to teams without the necessary technical expertise. In either case, the result is clear – a buggy, broken Web application.
  • Poor communication: Everyone who has ever managed a large, complex Web project knows that communications overhead goes up significantly when many different teams are involved. It seems clear that the teams developing HealthCare.gov were not communicating well, and leadership was not getting accurate messages about the state of the project.
  • Bad launch decision: The first rule of Web application launches is to never launch a product whose core critical systems are badly broken. The Obama administration had stated a date for launch, and the powers that be pushed to launch, come hell or high water. This mindset was clearly aimed at avoiding the inevitable backlash and criticism from Republicans if they had delayed launch. Unfortunately, the launch of a bad product just led to more backlash and criticism, and not just from Republicans. It was a stupid, uninformed, politically motivated decision, and it could have far-reaching consequences.
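The load-testing failure above is especially avoidable, because even a crude concurrency smoke test would have exposed the problem. As a minimal sketch (the `fetch` function below is an illustrative stand-in that simulates request latency, not the actual HealthCare.gov stack, and the user counts are arbitrary), a bare-bones load test can be just a thread pool firing simulated users at the application and summarizing latencies:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(user_id):
    """Stand-in for one simulated user hitting the application.

    In a real test this would issue an HTTP request against a staging
    environment; here it just simulates a small amount of latency so
    the sketch is self-contained and runnable.
    """
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for request round-trip time
    return time.perf_counter() - start

def load_test(n_users, n_workers=50):
    """Fire n_users simulated requests using n_workers concurrent threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        latencies = list(pool.map(fetch, range(n_users)))
    return {
        "requests": len(latencies),
        "mean_latency": sum(latencies) / len(latencies),
        "max_latency": max(latencies),
    }

if __name__ == "__main__":
    # Ramp the simulated load and watch how latency degrades.
    for users in (100, 500, 1000):
        print(users, "users ->", load_test(users))
```

Real-world tools (JMeter, Locust, and similar) do this with far more fidelity, but the point stands: ramping simulated users and watching latency degrade is cheap insurance, and it appears nobody bought it.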

How to get big Web projects right

The tragic thing about the list above is that these problems are all well-known and avoidable. It’s not like the people charged with tackling HealthCare.gov were heading into some kind of Web technology terra incognita. On the contrary – businesses big and small have been doing this stuff, successfully, for years (although roughly 30% of IT projects over $1 MM fail). And yet people make the same mistakes over and over and over, when the solutions are clear:

  • Identify and support an empowered leader: Big projects need good leaders with a strong sense of direction, experience, and the ability to communicate effectively with people across teams. They should be open, and good listeners, with the ability to make smart decisions and course-correct as necessary (e.g., when a project starts going off the rails, which they often do). Finally, this leader needs to be empowered to do their job; if they have oversight over multiple teams, then they need decision-making power that crosses those teams.
  • Develop an actionable strategy: Too many projects take the "Ready, Fire, Aim" approach (i.e., no plan exists for how things are going to get done). Devote some time at the beginning of a project to develop an actionable strategy that’s aligned with and achieves project goals.
  • Build the right team: From project management to user experience design to technology, big projects need to have great teams to deliver a great Web site or application. The team needs to have experience running, designing and developing for the type of product being delivered, and they need to execute well. If multiple teams exist, then it’s critical that at least one person acts as a bridge between those teams.
  • Foster cross-functional collaboration and open communication: Core values can help keep teams aligned, and two critical values for big Web projects have to do with collaboration and communication. People across teams need to work effectively not just within their team, but between teams. Solid collaboration requires robust and open communication. Additionally, people need to be able to communicate freely about their fears and doubts, or problems within the project, without fear of retribution.
  • Design and build the MVP: Big projects can often give rise to excessive complexity. Numerous studies have shown that smaller projects have higher success rates, so keeping a project as small as possible is preferred. Focus on designing and building the minimum viable product (MVP), and no more.
  • Avoid fixed launch dates if possible: Scope and effort are notoriously hard to estimate for large software projects. The cone of uncertainty can lead to large errors in the beginning of a project, which means fixing a launch date is usually a fool’s errand. If you have to pick a date, come up with a high estimate for your MVP and use that as your conservative baseline (i.e., pick a launch date after that).
  • Refine the project plan: Every project needs to continually refine its own plan, and assess the degree to which it is on track (i.e., can the current set of features in scope be delivered by the target launch date?). If the current scope can’t be delivered, then either scope / features need to be reduced, or the launch date needs to shift. It’s that simple. Throwing more people at a late software project will actually make it later (Brooks’s Law).
  • Test, test, test: Complex systems often have emergent behaviors, and nowhere is this more true than with large Web projects. Bugs, glitches and problems can crop up all over the place once everything gets wired together. For this reason, it’s critical to test these types of applications relentlessly (e.g., unit testing, integration testing, load testing, usability testing, accessibility testing).
  • Don’t launch a broken product: While it may seem self-evident, broken products (i.e., those that don’t cross an agreed-upon viability threshold during testing) shouldn’t be launched. People may be angry if a site or application launches late; they’ll be angrier if it launches on time, but is so broken that it can’t be used.
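The points above about communication overhead and Brooks’s Law have a simple arithmetic basis: with n people involved, the number of pairwise communication channels is n(n−1)/2, so coordination cost grows roughly with the square of head count. That is why adding people to a late project adds overhead faster than it adds capacity. A quick illustration (team sizes are arbitrary):

```python
def channels(n):
    """Pairwise communication channels among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Doubling a team from 10 to 20 people roughly quadruples the number
# of channels that coordination and communication have to cover.
for n in (5, 10, 20, 50):
    print(f"{n:>3} people -> {channels(n):>5} channels")
```

At 5 people there are 10 channels; at 50 there are 1,225. That quadratic growth is the hidden tax behind distributed teams, siloed management, and late-stage staffing surges.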

Don’t get me wrong – big Web projects aren’t easy. It takes a lot of skill to execute one well. In fact, it’s entirely possible that one could do all of the things I’ve listed above, and still have a project fail (e.g., funding gets pulled, conflicts arise with other projects or company initiatives, natural disaster strikes). At the very least, people running big Web projects should learn the lessons of history, and follow generally accepted wisdom as much as possible. Big Web projects that don’t follow these guidelines are likely to walk in the footsteps of HealthCare.gov, straight down the road to failure.

Update (11/05/2013): Since writing this post, I’ve come across a number of articles on the HealthCare.gov debacle. I found this Fast Company post on point, particularly in how the author makes the case for requiring all public-facing government software to be open source. This author and others have also pointed out a few problems that I neglected to mention.

Regarding fixes, HBR posted an insightful article outlining how simplicity principles could potentially help fix Obamacare and HealthCare.gov. While this post does offer some useful guiding principles, it doesn’t address the full spectrum of failures HealthCare.gov is experiencing.