Single Source Information: An Agile Practice for Effective Documentation

In agile software development you want to travel as light as possible, and the easiest way to do that is to choose the best artifact to record information. I use the term "artifact" to refer to any model, document, source code, plan, and so on created during a software development project. Furthermore, you want to record information as few times as possible, ideally only once. For example, if you describe a business rule in a use case, then describe it in detail in a business rule specification, then implement it in code, you have three versions of the same business rule to maintain. It would be far better to record the business rule once, ideally as human-readable but implementable code, and then reference it from any other artifact as appropriate.

Why do you want to record a concept once? Three reasons:

  1. Reduce your maintenance burden. The more representations you maintain, the greater the maintenance burden, because you'll want to keep all of the versions in sync with one another. Figure 1 depicts a typical approach to traditional software development artifacts. The letters in each artifact represent a piece of information stored within it. For example, information A (perhaps our business rule mentioned above) is captured in the requirements document, test model, and source code.
  2. Reduce your traceability burden. The more copies you have, the greater your traceability needs will be, because you'll need to relate each version to its alternate representations; otherwise you'll never be able to keep them synchronized when a change does occur. Yes, AM advises you to update only when it hurts, but the more copies you have of something the more likely it is that it will start to hurt earlier.
  3. Increase consistency. The more copies you have the greater the chance you will have inconsistent information because you very likely won't be able to keep the versions synchronized.

Figure 1. Traditional software development artifacts.


It's interesting that traditional processes typically promote recording the same technical information several times, such as representing business rules three different ways. At the same time they also prescribe design concepts such as normalization and cohesion, which lead you to develop a design that implements each concept once. For example, the rules of data normalization motivate you to store data in one place and one place only. Similarly, in object-oriented and component-based design you want to build cohesive items (components, classes, methods, and so on) that fulfill only one goal. If this is ok for your system design, shouldn't it also be ok for your software development artifacts? We clearly need to rethink our approach.


Views, Not Copies

It is clear that you should store system information in one place and one place only, ideally in the most effective place. From the point of view of software development this concept is called normalization, in the programming world it is an important aspect of literate programming, and in the technical documentation world it is called single sourcing. With single sourcing the idea is to record technical information only once and then generate various documentation views as needed from that information. In the case of the business rule example above, the rule would be recorded using some sort of business rule definition language. A human-readable view would be generated for your requirements documentation (this is easier said than done, by the way, but for the sake of argument let's assume that it's possible), and an implementation view would be generated which would either be run by your business rule engine or compiled as application source code.
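To make the idea of views concrete, here is a minimal sketch in Python. It is not a real business rule definition language: the BusinessRule structure, the OPERATORS table, the free shipping rule, and the function names are all assumptions invented for illustration. The point is only that one stored definition can yield both a human-readable view and an executable view.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusinessRule:
    """The single source for one rule (hypothetical structure)."""
    name: str
    field: str
    operator: str
    threshold: float
    outcome: str


# Operators this sketch understands: symbol -> (wording, implementation).
OPERATORS = {
    ">=": ("is at least", lambda value, limit: value >= limit),
    "<": ("is less than", lambda value, limit: value < limit),
}


def render_text(rule: BusinessRule) -> str:
    """Human-readable view, for example for a requirements document."""
    wording, _ = OPERATORS[rule.operator]
    return (f"{rule.name}: when {rule.field} {wording} "
            f"{rule.threshold:g}, {rule.outcome}.")


def compile_rule(rule: BusinessRule) -> Callable[[dict], bool]:
    """Implementation view: a predicate the application can call."""
    _, compare = OPERATORS[rule.operator]
    return lambda record: compare(record[rule.field], rule.threshold)


# An invented example rule, recorded exactly once.
FREE_SHIPPING = BusinessRule(
    name="Free shipping",
    field="order_total",
    operator=">=",
    threshold=100,
    outcome="shipping is free",
)

if __name__ == "__main__":
    print(render_text(FREE_SHIPPING))
    qualifies = compile_rule(FREE_SHIPPING)
    print(qualifies({"order_total": 120}))
    print(qualifies({"order_total": 80}))
```

Changing the threshold in FREE_SHIPPING changes both views at once, which is the whole appeal. A real rule language would need far richer conditions than this single comparison.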

Figure 2 depicts a strategy for single sourcing all of the information contained in Figure 1, then automatically generating the original artifacts of Figure 1 through the use of a generator. The views are generated on an as-needed basis from the most recent source information, ensuring that they are up to date as of the generation time.


Figure 2. An ideal approach to single sourcing information.


An important implication of Figure 2 is that although the information is stored in a single place, it can be rendered in multiple ways for different audiences. This is called the Locality of Reference Documentation Principle. End users will need to see information in a different format than programmers, for example. Some people prefer to see diagrams whereas others prefer information in textual form. Just because information needs to be viewed, and worked with, in multiple ways doesn't mean that it needs to be stored multiple times. Just as you build working software from your source code base, you would "build" your documentation views from your single-sourced information base.

Figure 3 depicts a far more realistic approach. Although you would like to store information in one place and one place only, the reality is that your toolset may not allow you to. Furthermore, because you are only human you are going to make mistakes and record the same information twice. Figure 3 also shows a common situation in software development: although a lot of your documentation can be represented in source code, for example using JavaDoc comments in Java, some critical information will still be stored as external documentation. It is very common for agilists to have concise system overview documentation, release notes, and user guides, for example.


Figure 3. A realistic approach to single sourcing information.


Traditional Single Sourcing

To make the traditional single sourcing vision work you need a common way to record information. The Darwin Information Typing Architecture (DITA) is an XML-based format which is promoted for single sourced technical documentation. There is nothing stopping you from creating your own storage strategy. Single sourcing is often approached in a top-down manner, with the data structure for the documentation typically defined early in a project. The primary challenge with traditional single sourcing is that it requires a fairly sophisticated approach to technical documentation. This is perfectly fine, but unfortunately many organizations aren't yet able to achieve this vision and find that they need to back away from the approach. This doesn't mean that you need to throw out the baby with the bath water: you should still strive to normalize all of your software development artifacts.


Agile Single Sourcing

There is no reason why you couldn't take a more agile approach where the structure of your system artifacts emerges over time. This is where the AM practice Single Source Information comes in. When you are modeling you should always be asking the questions "Do I need to retain this information permanently?", "If so, where is the best place to store this information?" and "Is this information already captured elsewhere that I could simply reference?". Sometimes the best place to store information is in an agile document, often it's in source code.

There are several AM principles and practices which support agile single sourcing. They are:

  1. Executable specifications. This is one of the easiest approaches to single sourcing information to understand, and often one of the most productive. By taking a test-driven development (TDD) approach at the requirements level, your customer acceptance tests are not only tests, they are also requirements specifications. Similarly, with TDD at the design level your developer tests form the majority of your detailed design specification.
  2. Apply the right artifact(s). Technical information should be captured using the most appropriate artifact, be that a hand-drawn sketch, a detailed data model, a use case specification, or source code.
  3. Model with a purpose. You should know why you are creating an artifact, know who it is for, and how they're going to work with it. If you don't understand these three factors then you are very likely to record more information than you need out of fear that someone, somewhere, and at some time may need it. This leads not only to over documentation but very likely to the unintentional capture of the same information several times over.
  4. Support collective ownership. Only when everyone has access to a shared collection of artifacts is it possible to capture information in the right place once and once only. If some people do not have full access then they are motivated to capture their own version of the information.
  5. Build teams of generalizing specialists. When teams are made up of specialists who only know how to work with a small subset of artifacts you will often capture the same information in several places. For example, if your team has expert business analysts, expert coders, and expert testers then each of these groups will capture a business rule using their own approaches - perhaps as a UML activity diagram, as source code, and in the test specification. When people are generalizing specialists who have one or more specialties plus a general understanding of the complete software lifecycle they can work with a wide range of artifacts, reducing the need to capture the same information in several places.
  6. Model with others. Effective software development teams work together in a co-operative and collaborative manner. When people work alone they will capture their own version of the technical information, a version which may be slightly different than that maintained by their co-workers. By modeling with others you not only work together to develop a model or document you also spread skills and knowledge throughout the team, improving the chance that your models and documents will be both consistent and normalized.
  7. Maximize stakeholder ROI. Everyone on a development team should want to ensure that stakeholders' money is spent wisely. Not only is this a good thing to do, it increases the chance that your stakeholders will want to continue working with you in the future. Is it really effective to capture the same information several times, to increase your maintenance burden, and to increase your traceability needs? I don't think so.
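The first item in the list above, executable specifications, can be sketched with Python's standard unittest module. The shipping rule, the shipping_cost function, and the requirement_names helper are hypothetical, made up for this illustration. The test method names double as requirement statements, so a readable requirements list can be derived from the tests rather than maintained separately.

```python
import unittest


def shipping_cost(order_total: float) -> float:
    """Hypothetical rule: orders of 100 or more ship free, others pay 5."""
    return 0.0 if order_total >= 100 else 5.0


class FreeShippingAcceptance(unittest.TestCase):
    """Each test name is written to read as a requirement."""

    def test_orders_of_100_or_more_ship_free(self):
        self.assertEqual(shipping_cost(100), 0.0)

    def test_orders_under_100_pay_flat_shipping_fee(self):
        self.assertEqual(shipping_cost(99.99), 5.0)


def requirement_names(case: type) -> list[str]:
    """Derive a readable requirements list from the test method names."""
    names = unittest.TestLoader().getTestCaseNames(case)
    return [name.removeprefix("test_").replace("_", " ") for name in names]


if __name__ == "__main__":
    # Documentation view: the requirements, generated from the tests.
    for line in requirement_names(FreeShippingAcceptance):
        print("-", line)
    # Verification view: the same source, run as tests.
    suite = unittest.TestLoader().loadTestsFromTestCase(FreeShippingAcceptance)
    unittest.TextTestRunner().run(suite)
```

This sketch assumes Python 3.9 or later (for str.removeprefix). Method names are a crude specification language; dedicated acceptance testing tools offer friendlier formats for stakeholders, but the single sourcing principle is the same.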

Other agile techniques which support single sourcing, at least at the detailed design level, are code refactoring and database refactoring. With code refactoring you make a small change to your source code to improve its design; similarly with database refactoring you make a small change to your database schema to improve its design. Many refactorings, such as Extract Method or Introduce Lookup Table, explicitly increase the normalization of your system's underlying object or data schemas respectively.
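As a small illustration of how Extract Method normalizes code, consider the following Python sketch. The discount rule, the handling fee, and all function names are invented for this example. Before the refactoring the discount rule is recorded twice; afterwards it is recorded once and referenced from both callers.

```python
# Before: the volume discount rule is written out in two places.
def quote_total_before(prices):
    total = sum(prices)
    if total >= 100:
        total = total * 0.9
    return total


def invoice_total_before(prices):
    total = sum(prices)
    if total >= 100:
        total = total * 0.9
    return total + 5  # flat handling fee on invoices


# After Extract Method: the rule lives in exactly one function.
def apply_volume_discount(total):
    """Hypothetical rule: 10% off totals of 100 or more."""
    return total * 0.9 if total >= 100 else total


def quote_total(prices):
    return apply_volume_discount(sum(prices))


def invoice_total(prices):
    return apply_volume_discount(sum(prices)) + 5  # flat handling fee


if __name__ == "__main__":
    # The refactoring must not change behavior.
    for prices in ([60, 60], [10, 20], [100]):
        assert quote_total(prices) == quote_total_before(prices)
        assert invoice_total(prices) == invoice_total_before(prices)
    print("behavior preserved")
```

If the discount threshold changes, only apply_volume_discount needs to be touched, which is the same maintenance argument made for documentation throughout this article.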

It is interesting to observe that when an agile team is made up of generalizing specialists, or at least people striving to become so, and when they are actively trying to do the best job possible, the best place to store information often proves to be your source code. This is exactly what many extreme programmers claim, although in my opinion they've struggled to convey this message in terms which are palatable to traditional IT professionals. Many traditionalists claim that if your documentation is in the source code then it's effectively lost. What they're really saying is that it's lost to people who are unable to read source code, or at least people who don't have tools (such as JavaDoc perhaps) that can extract critical information from the source and present it in an alternative format.
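To show what such extraction can look like, here is a minimal sketch using Python's standard inspect module in place of JavaDoc. The two business functions and the render_reference helper are hypothetical. The documentation is written once, next to the code it describes, and a plain-text reference view is generated from it on demand.

```python
import inspect


def shipping_cost(order_total: float) -> float:
    """Orders of 100 or more ship free; all other orders pay a flat fee of 5."""
    return 0.0 if order_total >= 100 else 5.0


def loyalty_points(order_total: float) -> int:
    """Customers earn one loyalty point per whole currency unit spent."""
    return int(order_total)


def render_reference(functions) -> str:
    """Build a plain-text reference view from signatures and docstrings."""
    lines = []
    for fn in functions:
        lines.append(f"{fn.__name__}{inspect.signature(fn)}")
        lines.append(f"    {inspect.getdoc(fn)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reference([shipping_cost, loyalty_points]))
```

A view like this serves readers who can follow function signatures. It does not by itself make the information accessible to business stakeholders, which is the limitation the traditionalists are pointing at.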

Just as it's extremely rare to find a perfectly normalized relational database, I suspect that you'll never truly be able to fully single source all of your software artifacts. In the case of databases, performance considerations and, to be fair, design mistakes made by project teams result in less-than-normal schemas. Similarly, not everyone is going to be able to work with all types of artifacts - it isn't realistic to expect business stakeholders to be able to read program source code, and the uber-tools required to support this vision continue to elude us (and likely always will).