Agile Legacy System Analysis and Integration Modeling
Most systems do not exist in isolation;
instead they must interact with other systems in some
fashion. Furthermore, there is very little "greenfield"
development where you build a new system from
scratch; instead the vast majority of software
development is more along the lines of "brownfield"
efforts where you improve upon one or more existing
systems. When you are building a system you must
identify the potential interactions it will have with
other existing computing assets, and identify what you
will build upon, to
reuse those legacy assets effectively. The
documentation which describes how to interface to an
external system is referred to as a
contract model in Agile Modeling. If the
contract model(s) exist, and are up-to-date, then you
should consider yourself lucky. In many organizations
legacy systems are poorly documented, if documented at
all, leaving it up to the first team to come along to
update the documentation at least to the level at which
they require it. This effort is often
referred to as "legacy system analysis".
Let's assume that you're developing a
system, or simply enhancing an existing one, which will
need to interface to existing legacy assets within your
organization. There are several tasks, performed
iteratively, involved in doing so:
To identify the interactions which your system
has with others you need to identify the
"interaction perimeter" of your system, depicted in
Figure 1. As you can see there are five categories of interaction:
Reads from external data
source(s). Your system may read from external data
sources such as files or databases. You will need
to understand both the structure and semantics of that data.
Updates to external data
source(s). There are several implications.
First, other systems may depend on the updates that
your system makes, coupling them to yours.
Second, your system may increase the traffic to the
legacy data source and thereby affect the
performance of other systems.
External system interface(s).
Your system may interact with other systems through
provided system interfaces such as web services or
an application programming interface (API). Your system
could even invoke behavior which other systems
depend on, such as a batch job which updates an
Your data source(s). Other
systems may read or write to data sources owned by
your system. These systems depend on the data, or
at least portions of the data.
Your system interface(s).
Other systems may interact with yours by accessing
your data sources or via your own system interface(s).
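One way to keep track of the perimeter is to record each interaction as a small structured entry. The following is a minimal sketch, not a prescribed notation. The names InteractionKind, Interaction, and group_by_kind, and the example systems, are hypothetical and simply mirror the five categories above.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionKind(Enum):
    """The five interaction categories described above."""
    READS_EXTERNAL_DATA = "reads from external data source"
    UPDATES_EXTERNAL_DATA = "updates to external data source"
    EXTERNAL_SYSTEM_INTERFACE = "external system interface"
    OUR_DATA_SOURCE = "our data source"
    OUR_SYSTEM_INTERFACE = "our system interface"


@dataclass(frozen=True)
class Interaction:
    kind: InteractionKind
    other_party: str   # hypothetical name of the external system or asset
    description: str


def group_by_kind(interactions):
    """Group interactions by category so gaps in the perimeter stand out."""
    grouped = {kind: [] for kind in InteractionKind}
    for interaction in interactions:
        grouped[interaction.kind].append(interaction)
    return grouped


# Hypothetical example entries, for illustration only.
perimeter = [
    Interaction(InteractionKind.READS_EXTERNAL_DATA, "CustomerDB",
                "nightly read of customer records"),
    Interaction(InteractionKind.OUR_SYSTEM_INTERFACE, "BillingSystem",
                "calls our order-status service"),
]

grouped = group_by_kind(perimeter)
# Categories with no recorded interaction yet; these still need investigating.
uncovered = [kind.name for kind, items in grouped.items() if not items]
```

An empty category does not prove that no such interaction exists. It only marks where analysis has not yet been done.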
There are several ways to identify your
system interaction perimeter:
Read your system documentation. Your existing
documentation, if any, should indicate how your
system interfaces to other legacy assets and how
they interact with it. Don’t assume that this
documentation is complete and correct. Even though
your organization may be meticulous in maintaining
documentation it is possible that an interaction was
introduced by another team and hasn't been documented.
Seek help from others.
Operations staff are good people to involve
because they have to work with multiple systems on a
daily basis. Your enterprise administrators,
particularly information and network administrators,
may be your best bet as they’re responsible for
managing the assets in production.
Analyze the code. Without accurate
documentation, or access to knowledgeable people,
your last resort may be to analyze the source code
for the legacy system, including code which invokes
your system such as Job Control Language (JCL).
This effort is often referred to as software
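As a minimal sketch of the code analysis technique, the following scans JCL text for dataset names referenced through the DSN= parameter of DD statements. The function name, the simplified dataset name pattern, and the sample job are assumptions made for illustration. Real JCL has continuation lines, temporary datasets, and symbolic parameters which this sketch ignores.

```python
import re

# Matches DSN=... on a JCL statement. The character set for dataset
# names is deliberately simplified (an assumption of this sketch).
DSN_PATTERN = re.compile(r"\bDSN=([A-Z0-9$#@.]+)", re.IGNORECASE)


def find_dataset_references(jcl_text):
    """Return a sorted list of distinct dataset names referenced in the JCL."""
    names = set()
    for line in jcl_text.splitlines():
        if line.startswith("//*"):   # JCL comment statement, skip it
            continue
        for match in DSN_PATTERN.finditer(line):
            names.add(match.group(1).upper())
    return sorted(names)


# Hypothetical JCL fragment, for illustration only.
sample_jcl = """\
//NIGHTLY  JOB (ACCT),'REFRESH'
//* DSN=IGNORED.IN.COMMENT
//STEP1    EXEC PGM=LOADCUST
//INFILE   DD DSN=PROD.CUSTOMER.EXTRACT,DISP=SHR
//OUTFILE  DD DSN=PROD.ORDERS.MASTER,DISP=OLD
"""

references = find_dataset_references(sample_jcl)
```

A list like this is only a starting point. Each dataset found still has to be traced back to the system and team which own it.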
Once you've identified a potential
interface which your system is involved with you need to
identify the other systems/assets involved with that
interface. Knowing the interface exists doesn't
automatically mean that you know which systems are
using/supplying that interface. Ideally you
shouldn't need to know this information, but
realistically you sometimes do. For example, your
organization may have a collection of web services
provided by a variety of systems which yours may reuse.
You will need to know what services are available and
what their signatures are, but you likely don't need to
know the underlying system(s) offering each service.
However, you may be accessing a legacy database which is
owned by another team. Minimally you'll need to
know the structure and semantics of the data which
you're accessing. However, you may also need to
know which systems provide the data you're using and the
way in which they provide it (e.g. a batch refresh
daily at 2 am Greenwich Mean Time) to determine if you
really want to use that data source. Ideally the
contract model describing the database contains this
sort of information; if not, you may need to do the
legacy analysis to obtain it.
The thing to realize is that your system
is part of the interaction perimeters of each of the
external legacy assets, so you must look for the sort of
interactions listed earlier from the point of view of
those assets. It isn’t hard but it is usually tedious
and time-consuming.
Not only do you need to know which systems
are coupled to yours, you need to understand how they're
coupled to your system. Your goal is to identify
the way that these systems are coupled to your system's
interaction perimeter. Issues to look for during your
Direction of the interaction.
As noted previously, the direction of the
interaction has different potential impacts.
Information being exchanged.
You need to identify which data, often down to the
element level, is being accessed and how it is being
used. This will help you to identify replacement
data sources, if any, and how effective they are at
covering the original need. For example, an external
system may access your database to obtain a complete
list of products offered within the city of Atlanta.
If all other data sources only relate products to
the states in which they’re sold then the external
system will no longer have the preciseness which it
Functionality being invoked.
You need to identify the exact services (operations,
procedures, functions, and so on) being invoked at
the interaction perimeter.
Frequency and volume. The
frequency and the volume of the interaction
indicate the load which you
will put on the new sources for those interactions;
they may not be able to handle the additional stress
without infrastructure upgrades. Or, in the case of
removing load the new sources may be over-powered,
motivating your enterprise administrators to
reconfigure the hardware and/or network resources
for those systems and to divert them where they are
This is not always as straightforward as it might
appear. While you may be able to readily identify the
systems that feed data directly to your system, other
interactions can be more subtle. You must track down
systems (or users) that take an occasional data extract
from your system (perhaps for a quarterly report), the
ones that send an infrequent data feed, all systems that
invoke functionality from your system (and vice versa),
and all links to your system, even if it’s just a URL
from a web page.
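The coupling issues above (direction, information exchanged, functionality invoked, frequency and volume) can be captured in one record per interaction. The following is a minimal sketch under stated assumptions. The Coupling class, its field names, the example systems, and the figures are all hypothetical, and rows per day is only a crude stand-in for load.

```python
from dataclasses import dataclass, field


@dataclass
class Coupling:
    other_system: str
    direction: str                                      # "inbound" or "outbound"
    data_elements: list = field(default_factory=list)   # down to element level
    operations: list = field(default_factory=list)      # services being invoked
    calls_per_day: float = 0.0                          # frequency
    rows_per_call: int = 0                              # volume

    def rows_per_day(self):
        return self.calls_per_day * self.rows_per_call


def daily_load_by_system(couplings):
    """Sum the estimated rows per day for each coupled system."""
    totals = {}
    for coupling in couplings:
        name = coupling.other_system
        totals[name] = totals.get(name, 0) + coupling.rows_per_day()
    return totals


# Hypothetical entries, including an infrequent quarterly extract.
couplings = [
    Coupling("ReportingTool", "inbound", ["product.name", "product.city"],
             ["quarterly extract"], calls_per_day=1 / 90, rows_per_call=45000),
    Coupling("OrderService", "inbound", ["product.price"],
             ["getPrice"], calls_per_day=2000, rows_per_call=1),
    Coupling("OrderService", "inbound", ["product.stock"],
             ["getStock"], calls_per_day=500, rows_per_call=1),
]

load = daily_load_by_system(couplings)
```

Averaging a quarterly extract over ninety days hides its peak. In practice you would record both the average and the worst-case burst.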
If you reverse-engineer your code and still can't tell what business you're in then you've got a serious problem. - Jon Kern
Another problem is the architecture
impedance mismatch between systems. Each system is
built based on its own set of architectural
requirements. For example, each system will have made a
decision regarding timeliness: one database is updated
weekly whereas another is updated daily. One system may
be built to validate data in the application source code
whereas another does it in the database. It is quite
common for systems to be architected with the assumption
that they will be the primary controller of the
interactions, yet this clearly can't be the case once
they're integrated. The implication is that the systems
which perform essentially the same services which your
system does may do so in an incompatible manner and
therefore they can't be considered as potential
is usually the most difficult part of
analyzing system interaction. The data structures of
your system and the system(s) it interfaces to will
often be different, the database vendors can vary, the
types of data sources (for example relational databases
versus XML files versus IMS) will vary, and worse yet
the informational semantics are also likely to be
different. For example, consider an ice cream company.
One database maintains a flavors table which contains
rows for chocolate, strawberry, and vanilla. The
replacement data source contains a table with the exact
same layout which maintains the flavors mocha fudge,
ultimate chocolate, double chocolate, wild strawberry,
winter strawberry, French vanilla, and old-fashioned
vanilla. The new table arguably supports chocolate,
strawberry, and vanilla yet there is clearly a semantics
problem which may be very difficult to overcome.
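The ice cream example can be made concrete with a short sketch. The grouping of each detailed flavor under a basic one is an assumption made for illustration (the text does not say, for instance, that mocha fudge counts as chocolate), and the function names are hypothetical.

```python
# Assumed mapping from the replacement source's flavors to the
# original three categories. A domain expert would need to confirm it.
FINE_TO_COARSE = {
    "mocha fudge": "chocolate",
    "ultimate chocolate": "chocolate",
    "double chocolate": "chocolate",
    "wild strawberry": "strawberry",
    "winter strawberry": "strawberry",
    "French vanilla": "vanilla",
    "old-fashioned vanilla": "vanilla",
}


def to_coarse(fine_flavor):
    """Detailed to basic: every detailed flavor has exactly one answer."""
    return FINE_TO_COARSE[fine_flavor]


def candidates_for(coarse_flavor):
    """Basic to detailed: several answers, so the translation is ambiguous."""
    return sorted(fine for fine, coarse in FINE_TO_COARSE.items()
                  if coarse == coarse_flavor)


chocolate_options = candidates_for("chocolate")
```

Going from the detailed flavors to the basic ones is a simple lookup. Going the other way returns several candidates for each basic flavor, and no amount of code can choose between them without a business decision.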
Table 1 summarizes common legacy
data challenges, presented in detail in
Agile Database Techniques, which you may encounter.
There is a wide range of
quality problems when it comes to data. This
includes: a column or table that is used for
more than one purpose, missing or inconsistent
data, incorrect formatting of data, multiple
sources for the same data, important entities or
relationships which are stored as text fields,
data values that stray from their field
descriptions and business rules, several key
strategies for the same type of entity,
different data types for similar columns, and
varying default values. This list is nowhere
near complete but it should give you a feeling
for the challenges which you will face.
Architecture problems are
also serious in many legacy assets.
Common data-oriented problems include
applications which are responsible for data
cleansing (instead of the database) and varying
timeliness of data.
Similarly you are likely
to find problems in legacy code, such as
inconsistent naming conventions,
inconsistent or missing internal
documentation, inconsistent operation
semantics and/or signatures, highly coupled
and brittle code, low cohesion operations
which do several unrelated things, and code
that is simply difficult to read.
Applications may be
poorly layered, with the user interface
directly accessing the database for
example. An asset may be of high quality
but its original design goals may be at odds
with current project needs; for example it
may be a batch system whereas you need
Architecture problems are
also serious in many legacy
assets. Common problems include:
incompatible or difficult to integrate
(often proprietary) platforms, fragmented
and/or redundant data sources, inflexible
architectures which are difficult to change,
lack of event notification making it
difficult to support real-time integration,
and insufficient security.
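Several of the data quality problems in Table 1, such as missing data, incorrect formatting, and columns used for more than one purpose, can be surfaced with simple profiling. The following is a minimal sketch of one approach, reducing each value to a coarse format signature and counting the signatures. The function names and sample rows are hypothetical.

```python
import re
from collections import Counter


def shape_of(value):
    """Reduce a value to a format signature: digits become 9, letters become A."""
    if value is None or str(value).strip() == "":
        return "<missing>"
    text = str(value).strip()
    text = re.sub(r"[0-9]", "9", text)
    text = re.sub(r"[A-Za-z]", "A", text)
    return text


def profile_column(rows, column):
    """Count how often each format signature occurs in one column."""
    return Counter(shape_of(row.get(column)) for row in rows)


# Hypothetical legacy rows, for illustration only.
rows = [
    {"id": 1, "phone": "416-555-0101"},
    {"id": 2, "phone": "(416) 555-0102"},
    {"id": 3, "phone": ""},
    {"id": 4, "phone": "416-555-0103"},
    {"id": 5, "phone": "call after 5"},
]

profile = profile_column(rows, "phone")
```

More than one signature for a column suggests inconsistent formatting, and a free-text signature in a structured column suggests that the column is being used for a second purpose.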
Work in an evolutionary manner. You
don't need to develop all of the documentation
for all of the system interactions up front.
Instead you can work iteratively, fleshing out
the contract model(s) a bit at a time. You
should also work incrementally, creating the
contract model(s) and then in turn the actual
integration code as you need them.
Work closely with the legacy system owners.
You need to work collaboratively and
cooperatively with the legacy system owners if
you are to succeed. Fundamentally they own
the system which you need to access, so they can
easily prevent you from doing so.
Furthermore, they are the experts, so
they are the ones who should be actively
involved with the legacy analysis efforts.
This is yet another example of
active stakeholder participation, where the
stakeholders are the owners of the legacy system.
Consider look-ahead modeling.
A good reason to do so is to
ensure that a development team has sufficient
information about an existing legacy asset which
they need to integrate with.
Don't get hung up on the "one truth".
Many projects go astray when the data
professionals involved with them focus too
heavily on the
one truth above all else.