 |
Data modeling is the act of
exploring data-oriented structures. Like other modeling
artifacts data models can be used for a variety of
purposes, from high-level conceptual models to physical
data models (PDMs). Physical data modeling is
conceptually similar to design class modeling, the goal
being to design the internal schema of a database,
depicting the data tables, the data columns of those
tables, and the relationships between the tables.
|
|
Figure 1
presents a partial PDM for the university – you know
that it isn’t complete by the fact that the Seminar
table includes foreign keys to tables that aren’t shown,
and quite frankly it’s obvious that many domain concepts
such as course and professor are clearly not modeled.
All but one of the boxes represent tables, the one
exception is UniversityDB which lists the stored
procedures implemented within the database. Because the
diagram is given the stereotype Physical Data Model
you know that the class boxes represent tables, without
the diagram stereotype I would have needed to use the
stereotype Table on each table. Relationships
between tables are modeled using standard UML notation,
although not shown in the example it would be reasonable
to model composition and inheritance relationships
between tables. Relationships are implemented via the
use of keys (more on this below).
Figure 1. A partial PDM for
the university.

When you are physical data modeling
the following tasks are performed in an iterative
manner:
-
Identify tables. Tables are the database
equivalent of classes; data is stored in physical
tables. As you can see in Figure 1
the university has a Student table to store
student data, a Course table to store course
data, and so on. Figure 1 uses
a
UML-based notation (this is a publicly defined
profile which anyone can provide input into). If you
have a class model in place a good start is to do a
one-to-one mapping of your classes to data tables, an
approach that works well in “greenfield” environments
where you have the luxury of designing your database
schema from scratch. Because this rarely happens in
practice you need to be prepared to be constrained by
one or more legacy database schemas which you will
then need to map your classes to. In these situations
it is unlikely that you will need to do much data
modeling, you will simply need to learn to live with
the existing data sources, but you will need to be
able to read and understand existing models. In some
cases you may need to perform
legacy data analysis and model the existing schema
before you can start working with it.
-
Normalize tables.
Data normalization is a process in which data
attributes within a data model are organized to
increase the cohesion of tables and to reduce the
coupling between tables. The fundamental goal is to
ensure that data is stored in one and only one place.
This is an important consideration for application
developers because it is incredibly difficult to
stores objects in a relational database if a data
attribute is stored in several places. The tables in
Figure 1 are in third normal
form (3NF).
-
Identify columns. A column is the database
equivalent of an attribute, and each table will have
one or more columns. For example, the Student
table has attributes such as FirstName and
StudentNumber. Unlike attributes in classes,
which can either be primitive types or other objects,
a column may only be a primitive type such as a char
(a string), an int (integer), or a float.
-
Identify stored procedures. A stored procedure is
conceptually similar to a global method implemented by
the database. In Figure 1 you
see that stored procedures such as averageMark()
and studentsEnrolled() are modeled as
operations of the class UniversityDB. These
stored procedures implement code that work with data
stored in the database, in this case they calculate
the average mark of a student and count the number of
students enrolled in a given seminar respectively.
Although some of these stored procedures clearly act
on data contained in a single table they are not
modeled as part of the table (along the lines of
methods being part of classes). Instead, because
stored procedures are a part of the overall database
and not a single table, they are modeled as part of a
class with the name of the database.
-
Apply naming conventions. Your organization should
have standards and guidelines applicable to data
modeling, and if not you should lobby to have some put
in place. As always, you should follow AM’s practice
of
Apply Modeling Standards.
-
Identify relationships. There are relationships
between tables just like there are relationships
between classes. The advice presented relationships
in
UML class diagrams applies.
-
Apply data model patterns. Some data modelers will
apply common data model patterns, David Hay’s (1996)
book
Data Model Patterns is the best reference on
the subject. Data model patterns are conceptually
closest to analysis patterns because they describe
solutions to common domain issues. Hay’s book is a
very good reference for anyone involved in
analysis-level modeling, even when you’re taking an
object approach instead of a data approach because his
patterns model business structures from a wide variety
of business domains.
-
Assign keys. A key is one or more data attributes
that uniquely identify a row in a table. A key that
is two or more attributes is called a composite key.
A primary key is the preferred key for an entity type
whereas an alternate key (also known as a secondary
key) is an alternative way to access rows within a
table. In a physical database a key would be formed
of one or more table columns whose value(s) uniquely
identifies a row within a relational table. Primary
keys are indicated using the <<PK>> stereotype and
foreign keys via <<FK>>.
Read here for more about keys.
Although similar notation is used
it is interesting to note the differences between the
PDM of Figure 21 and the
UML class diagram from which is ti based:
-
Keys. Where it is common
practice to not model scaffolding properties on class
models it is common to model keys (the data equivalent
of scaffolding).
-
Visibility. Visibility
isn’t modeled for columns because they’re all public.
However, because most databases support access control
rights you may want to model them using UML
constraints, UML notes, or as
business rules. Similarly stored procedures are
also public so they aren’t modeled either.
-
No many-to-many associations.
Relational databases are unable to natively support
many-to-many associations, unlike objects, and as a
result you need to resolve them via the addition of an
associative table. The closest thing to an
associative table in is WaitList which was
introduced to resolve the on waiting list
many-to-many association depicted in the
class diagram. A pure associative table is
comprised of the primary key columns of the two tables
which it maintains the relationship between, in this
case StudentNumber from Student and
SeminarOID from Seminar. Notice how in
WaitList these columns have both a PK and
an FK stereotype because they make up the
primary key of WaitList while at the same time
are foreign keys to the other two tables. WaitList
isn’t truly an associative table because it contains
non-key columns, in this case the Added column
which is used to ensure that the first people on the
waiting list are the ones that are given the
opportunity to enroll if a seat becomes available.
Had WaitList been a pure associative table I
would have applied the associative table
stereotype to it.
Remaining Agile
I will often use a CASE tool to
create physical data models. The two features I require
of a data modeling tool are the ability to generate data
definition language (DDL) code required to create the
database schema and the ability to reverse engineer a
data model from an existing database schema. Virtually
all data modeling tools still on the market today
support these features.
This artifact description is excerpted from Chapter 12 of
The Object Primer 3rd Edition: Agile Model Driven
Development with UML 2.
 |
|
The Object Primer 3rd Edition: Agile Model Driven
Development with UML 2 is an
important reference book for agile modelers,
describing how to develop 35
types of agile
models including all 13
UML 2 diagrams.
Furthermore, this book describes the techniques
of the
Full Lifecycle Object Oriented Testing
(FLOOT) methodology to give you the fundamental
testing skills which you require to succeed at
agile software development. The book also
shows how to move from your agile models to
source code (Java examples are provided) as well
as how to succeed at implementation techniques
such as
refactoring and
test-driven development
(TDD). The Object Primer also includes a
chapter overviewing the critical database
development techniques (database refactoring,
object/relational mapping,
legacy analysis, and
database access coding) from my award-winning
Agile Database Techniques
book. |
 |
|
This book describes the philosophies and skills required for
developers and database administrators to work together effectively on
project teams following evolutionary software processes such as Extreme
Programming (XP), the
Rational Unified Process (RUP), the
Agile Unified
Process (AUP), Feature Driven
Development (FDD), Dynamic System Development
Method (DSDM), or The Enterprise Unified Process (EUP). In March 2004
it won a Jolt Productivity award. |
 |
|
This book describes, in detail, how to
refactor a database schema
to improve its design. The first section of the book overviews the fundamentals evolutionary database techniques in
general and of database refactoring in detail. More importantly it
presents strategies for implementing and deploying database refactorings, in
the context of both "simple" single application databases and in "complex"
multi-application databases. The second section, the majority of the
book, is a database refactoring reference catalog. It describes over
60 database refactorings, presenting
data models overviewing the each refactoring and the code to implement it.
|
 |
|
Agile Modeling: Effective Practices for Extreme
Programming and the Unified Process is the seminal
book describing how agile software developers approach
modeling and
documentation. It describes principles and
practices which you can tailor into your existing
software process, such as
XP, the
Rational Unified Process (RUP), or the
Agile Unified Process (AUP), to streamline your
modeling and documentation efforts. Modeling and
documentation are important aspects of any software
project, including agile projects, and this book
describes in detail how to
elicit requirements,
architect, and then
design your system in an agile manner. |
 |
|
The Elements of UML 2.0 Style describes a collection
of standards, conventions, and
guidelines
for creating effective
UML diagrams. They are based on sound, proven
software engineering principles that lead to diagrams
that are easier to understand and work with. These
conventions exist as a collection of simple, concise
guidelines that if applied consistently, represent an
important first step in increasing your productivity as
a modeler. This book is oriented towards
intermediate to advanced UML modelers, although there
are numerous examples throughout the book it would not
be a good way to learn the UML (instead, consider
The Object Primer). The book is a brief 188
pages long and is conveniently pocket-sized so it's easy
to carry around. |
Translations
I actively work with clients around the world to
improve their information technology (IT) practices as
both a mentor/coach and trainer. A full
description of what I do, and how to contact me, can be
found here.
|