5 Jul 2012
The Accumulo Challenge, Part II
In Part I, we discussed the attempt by the Senate Armed Services Committee (SASC) to hobble the open source Accumulo project in the DOD. They directed the Department’s CIO to jump through a number of reporting hoops before Accumulo would be allowed inside the DOD, and directed the Accumulo team to upstream their work into related open source projects. It appears to be an attempt to dismantle the project on the assumption that it was competing with products and projects from the private sector.
Accumulo isn’t the first project to encounter this kind of resistance, and it will not be the last. As the government gets more comfortable with open source, it will inevitably create more of its own projects, and collaborate with existing projects. So rather than think about Accumulo narrowly, this is a good time to think more generally about how the government creates open source projects, how it chooses supported or unsupported software, and what should happen when government projects begin to compete with the private sector.
Here’s my take, animated largely by OMB Circular A-130 §8b1(b) which I mentioned in Part I. Though A-130 is mostly toothless, it does embody a number of common-sense IT practices and it’s as good a framework as any to answer some of these questions. Now’s a good time to re-read that section if you’re not already familiar with it.
Do government-sponsored open source projects automatically compete with the private sector?
Here’s a thought experiment: what if Accumulo were the product of a technology transfer project via a CRADA, an IRAD or SBIR program? In these, the government collaborates with the private sector to develop technology, and often provides the creator of that technology with certain special rights to the work product so they can make money on it. Open source is the same, except the intellectual property is available to everyone, not just the prime contractor. So there’s nothing inherently anti-competitive about open source. In fact, using open source as a technology transfer strategy can contribute to the innovation and growth of the US software sector.
Historically, the government’s projects have helped, not hurt, the private sector. From high-performance computing to security, the government’s open source projects have advanced technology, created businesses, and generally lifted all boats. I’ll go so far as to say that open source should be the preferred method of technology transfer, and I’m not the only one.
This isn’t to say that any open source project is worth doing. §8b1(b)(iii) of A-130 tells us that the government should not create duplicative projects when one already exists in the commercial world:
(iii) Support work processes that it has simplified or otherwise redesigned to reduce costs, improve effectiveness, and make maximum use of commercial, off-the-shelf technology;
Provided the project in question isn’t redundant, open source is a great way of developing new technologies and ensuring that they’re widely available to both the government and the private sector.
Is it fair for the government to support open source with staff and money?
So the government can sponsor innovative projects, provided there’s an obvious mission need. That leaves us with the question of government-sponsored contributors. Many of the Accumulo maintainers are government employees or contractors. We’re now in a strange gray area where the government is providing resources to a commercial software project that may (eventually) compete directly with other commercially available offerings. Is this bad?
If a government program finds a mission need unfulfilled by the commercial world, it makes sense to spend resources to satisfy that need. The US Air Force Network Integration Center, for example, produces the “Lightweight Portable Security” operating system that allows telecommuters to safely connect to DOD networks. It’s based on Linux. Although there are many commercial Linux distributions available, none satisfy LPS’ requirements exactly. Crucially, AFNIC is not simply repackaging Linux. They are re-using as much commercially available software as possible, and spending their time and money adding real, differentiated value to an otherwise generic Linux distribution. This means that LPS is consistent with §8b1(b)(iv) of A-130:
(iv) Reduce risk by avoiding or isolating custom designed components, using components that can be fully tested or prototyped prior to production, and ensuring involvement and support of users;
The Accumulo case, at least at the start, was no different. Cell-level security and the other features that make Accumulo unique were not commercially available to the NSA. Like AFNIC with LPS, the NSA built what it needed on commercially available components. This is very common, and the fact that the NSA had the presence of mind to release that work as an open source project should be rewarded.
The fact that Accumulo began as government-developed software and is now commercially available software has no bearing on the government’s ability to continue working on and improving that software. Improving software is, after all, explicitly encouraged by the DOD “Open Source Memo” of 2009:
(ii) The unrestricted ability to modify software source code enables the Department to respond more rapidly to changing situations, missions, and future threats.
Having said all that: just because the government can customize software for its needs doesn’t mean that it should. Don’t forget that §8b1(b)(i) and §8b1(b)(ii) of A-130 also have to be satisfied. If a project becomes duplicative or is no longer in keeping with an agency’s mission, the agency should use an existing, commercial component instead.
So what happens when a project becomes duplicative?
SASC was worried that Accumulo was somehow interfering with the HBase and Cassandra projects, which solve many, though not all, of the same problems Accumulo does. If HBase or another project begins to respond to the government’s needs and make Accumulo-like features available, A-130 suggests that the government take a close look at those commercially available offerings or upstream its code to eliminate the conflict. §8b1(b)(v) and good open source manners agree on this:
(v) Demonstrate a projected return on the investment that is clearly equal to or better than alternative uses of available public resources…
Of course, the Accumulo folks already released their project to the public, and did so when there was no competition for the solution they provided. When they did that, Accumulo became commercial software under the terms of the FAR and DFARS. That changes the situation entirely. Accumulo is now just like any other commercial offering, and can win or lose on its merits. As we discussed earlier, the government should be free to support that project as long as it’s advancing its mission in a cost-effective manner. No harm done, here. In fact, it’s great that Accumulo is advancing development in this area of software and encouraging projects like HBase and Cassandra to keep up. This is an example of the market working, without SASC’s intervention.
Can the government fork an open source project?
“Forking,” which means creating your own version of a project rather than collaborating with its mainstream community, is slightly more complicated. Forking is widely understood as a last-resort, nuclear option in the open source community because it creates technical debt and robs the mainstream project of valuable attention.
If the government creates its own version of an existing, functional project without a clear reason, it’s not just a bad technical decision, it’s also bad policy. It’s not an agency’s mission, for instance, to create a general-purpose operating system or word processor. There are plenty of these available as commercial products, and there’s no reason the government should undermine the US software industry by crowding out viable alternatives already available to the public.
Better strategies are available. The government (just like a private company) can make every effort to use existing, commercially available code rather than assume responsibility for their own version of the whole project. This may mean adding specific modules that they need or maintaining an optional set of patches to the mainstream project. §8b1(b)(iv) is explicit about this, as we already discussed, and it’s already policy for the DOD-operated source repository, forge.mil. The forge.mil folks actively discourage commercial code from being hosted there, for fear that they’ll – even inadvertently – create a government-only fork of the project. If the government needs to alter commercial software for their own purposes, they should confine their activity to the specific changes they need rather than create a wholly new work.
None of this means that a fork is forbidden. It means that the decision to fork is serious, and the agency that forks should be prepared with an analysis of alternatives, cost justifications, a support and maintenance plan, and a risk mitigation strategy. As a practical matter, it’s far easier to avoid a fork in the first place.
So when should the government merge a project? Who gets to decide?
One can imagine any number of legitimate reasons why the Agency elected to build and support an alternative to HBase or Cassandra. Are they making a bad decision? That’s entirely possible. I think, though, that agencies should be permitted this discretion. As long as they’re creating something relevant to the mission and genuinely new – which was indisputably true of Accumulo – this kind of freedom is, on balance, a benefit.
I can imagine a time when Apache’s Accumulo project would decide that competing with the other projects is too costly. Project leaders would then elect to send some Accumulo-specific features to other upstream projects, in hopes of relieving their maintenance burden. Again, this happens all the time and is one of the great possibilities of the open source process.
It could work the other way, too: the HBase maintainers may decide that Accumulo’s features are valuable, and choose to drag some of them from Accumulo back into HBase. They’re perfectly free to do so, and this is also something that happens all the time in open source projects.
Sadly, SASC has decided that it’s better-qualified to manage the Accumulo project than the project maintainers. This is a dangerous precedent for all open source projects. The intention behind the legislation is good: forks are generally bad, and the government should avoid competing with the private sector. At the same time, Accumulo is open source. That inoculates Accumulo against these criticisms, and in any case §8b1(b) and the DOD open source memo of 2009 provide plenty of guidance the program manager can use to make this kind of decision. Adding language to the 2013 NDAA is completely unnecessary and probably counterproductive.
So what’s to be done?
If the code were closed, you could argue that Accumulo is squandering government resources and interfering with the private sector. Fortunately, that’s not the case – the code is free for anyone, including the Cassandra or HBase projects, to use. It’s useful to think about Accumulo as a shrewd government strategy to advance the state of the art. By signaling that the government is concerned about things like cell-level security, Accumulo has already encouraged other, similar projects to address these same questions. In this way, the competition between Accumulo and HBase is not altogether different from the kind of competition we see in scientific research: many parties, all advancing the state of the art, to the benefit of everyone. We wouldn’t dream of treating government-funded research as a threat to privately funded research, so what makes these software projects different?
While I don’t feel Congressional intervention is necessary, it’s obvious that we need to talk about ways to refine policies to make these situations less uncertain. Specifically, how will we know when government projects are permissible, and do not interfere with the good work of the private sector? I’ll offer a few questions every government software project should ask itself to answer that important question:
- Is it critical to the mission? Obviously, government agencies shouldn’t be wasting their time creating projects that are superfluous to their mission. This one’s easy, and covered by §8b1(b)(i) of A-130.
- Are you sure there’s no commercial solution? There are all kinds of tools available to government project managers that can help them learn more about the market for their problem. Through RFIs, Industry Days, and regular communication with industry, you can make sure that you’ve exhausted your commercial options before embarking on the complex and expensive process of maintaining your own software project. In Section 929, SASC directs the DOD CIO to ensure this homework has been done, but the rules and process for doing this are already very clear under §8b1(b)(ii) and §8b1(b)(iii).
- Are you sure there are no alternatives? I mean, really sure? Don’t just rely on industry to tell you what’s out there. Many of your solutions may be available as open source projects that don’t benefit from a commercial entity that can respond to RFIs or attend industry days. Be deliberate about this: it’s easy to do a quick Google search and declare the market barren. It’s also easy to raise the bar impossibly high, and say that because there are no accredited XML transformation engines with embedded IMAP servers, you have to build your own. Be reasonable.
- If you fork, is it necessary? There are all kinds of literature on this decision, including §8b1(b)(iv), which I won’t cover here, but the decision to fork is a serious one. Can you get your requirements fulfilled with a patch instead? Are you prepared to carry the maintenance burden for the foreseeable future? The decision is a complicated one, and a program office shouldn’t take it lightly. A fork may seem easier at first, but it will be much, much harder over the long term.
- Have you accounted for all the costs? This is notoriously difficult, but take a moment to read §8b1(b)(v) and figure out exactly how complicated you’ve made your life. Because you’re using your own project, do you have a greater maintenance burden? Higher support costs? What about the additional certification or accreditation work? What about your exit costs – will it be harder to move to a better alternative in the future because you’re using a government-specific piece of software?
- Are your assumptions still true? This is the most important, and the question demanded by the Accumulo case. You may have read §8b1(b)(vii), done your Analysis of Alternatives and your market research, and determined your requirements, but that was two years ago. Is there still no alternative? Maybe there’s another project you could work with to reduce your burden? Having a regular, honest re-evaluation of your assumptions will keep you from wandering too far off the reservation.
Now it’s your turn: what else should agencies be thinking about?