Ch 23 -- Introduction to Revision Control
UNIX Unleashed, Internet Edition
Eric Goebelbecker
Web sites, programming projects, and even networks revolve around collections
of files. Many of these files depend upon information that is stored in other files,
such as targets for hypertext links, arguments to functions, or network names and
addresses. These relationships can be very difficult to manage, especially
when more than one person is involved or as small projects evolve into large systems.
One of the tools commonly found on a UNIX system for managing those relationships
is a Revision Control System (also called a Source Control System;
this chapter uses both terms interchangeably). These systems allow a person
(or group of people) to track the changes made to a set of files, quickly and accurately
undo a set of changes, and maintain an audit trail regarding why changes were made.
This chapter will explore the common characteristics and concepts behind these
systems and how you can use them to help manage your projects more effectively. This
will be done without going too far into the specifics of any particular system. RCS,
SCCS, and CVS, three of the most widely used source control systems, are covered fully
in the next three chapters.
Source control is often closely associated with software development. While it
is an indispensable tool for any programming project, this chapter will illustrate
how it can also be useful for many other projects.
This chapter will:
- Explain what revision control is and what it is frequently used for.
- Demonstrate essential revision control concepts, such as creating revisions,
checking changes in and out of the system, organizing file changes logically,
and moving easily between file revisions.
- Cover advanced topics such as using revision control to prevent conflicts created
by a group of people working on a single set of files, documenting changes to files
and creating revision branches.
What Is Revision Control?
Managing change is a common part of computing. Programmers have to manage bug
fixes while producing new versions of applications that are frequently based on the
code that contains what is being fixed. System administrators have to manage a variety
of configuration changes, such as adding new users to systems and adding new systems
to networks, without interfering with day-to-day operations. Web authors have to
make continuous revisions to documents in order to keep up with the constantly growing
and improving Internet competition. Just about any computer-related job (or any job
that can use a computer, for that matter) goes through a seemingly endless cycle
of revision, refinement, and renewal.
Fortunately for UNIX users, most of the files used in these processes are text
files, files that consist of (mostly) human-readable characters. (A more technical
description would be files that are limited to the ASCII character set.) Programs
in C/C++, Perl, and Java are written in text files, as are HTML and JavaScript
documents. UNIX configuration files for system and network management are usually
human-readable, as are many of the languages used for document creation and formatting,
such as troff and PostScript.
Why is this fortunate? Because revision control systems can manage any text file.
They are sets of utilities that allow users to manage the creation and maintenance
of any document, either alone or in groups. The systems covered in this book are
SCCS, RCS, and CVS.
These systems provide some common features:
- The ability to save multiple versions of a file, and easily select between them.
- The ability to resolve (and prevent) conflicts caused by more than one person
altering a file simultaneously.
- The ability to review the history of changes made to a file.
- The ability to link versions of different files together.
Revision Control Concepts--an Example
In order to illustrate the concepts behind version control, let's use an example
HTML project. Concepts will be introduced without actually demonstrating any commands
or utilities. Instead we will simply describe the operations that we could perform
in order to maintain our project.
Our project will start with the following file, hello.html.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>An Html Page</title>
</head>
<body>
<h1>Hello World!</h1>
<hr>
<address><a href="mailto:eric@prophet">Eric Goebelbecker</a></address>
</body>
</html>
Registering the Initial Revision
The first step is to register hello.html. When a file is registered,
a control file is created and the revision is numbered. If we specify that we want
a copy of the file to stay behind, that copy is marked read-only.
Revisions (or deltas in SCCS terminology) are the building blocks
of source control projects. Files (and groups of files) are stored and retrieved
in terms of the changes made to them. Each time a file is changed and checked
in, a new revision is created.
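To make the idea of storing a file "in terms of the changes made to it" a little more concrete, here is a minimal Python sketch. It does not reproduce how RCS or SCCS actually lay out their control files; it only assumes that a delta records enough information to rebuild either revision, and it uses the standard difflib module for that. The sample lines and variable names are our own.

```python
import difflib

# Two revisions of a small file, as lists of lines (illustrative content).
revision_1_1 = [
    "<h1>Hello World!</h1>\n",
    "<hr>\n",
]
revision_1_2 = [
    "<h1>Hello World!</h1>\n",
    "<p>Welcome to our page.</p>\n",
    "<hr>\n",
]

# The delta describes how to get from 1.1 to 1.2, line by line.
delta = list(difflib.ndiff(revision_1_1, revision_1_2))

# Lines starting with "+ " were added, "- " removed, "  " left unchanged.
added = [line[2:] for line in delta if line.startswith("+ ")]

# Either revision can be rebuilt from the delta alone.
rebuilt_old = list(difflib.restore(delta, 1))
rebuilt_new = list(difflib.restore(delta, 2))

print("".join(delta), end="")
```

A real system chains many such deltas together, which is why the control file alone is enough to recover any revision.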
Since this is the original file, it is referred to as the root of
the revision tree. It would typically be numbered version 1.1. Revision control
systems allow these numbers to be overridden when files are registered or checked
out. (We'll explain how and when files are checked out in the next section.)
Revision numbers, such as 1.1, are used as names for versions of files. (Actual
names can also be used in some situations; see the "Symbolic Names, Baselines, and
Releases" section later in this chapter.) The leftmost number usually signifies a major release for
a product. If we were working on a new version of an existing product, we might override
this number to be 2 or 3, depending upon what internal policies exist for version
numbers. The second number represents the minor version, where 2.5 might represent
the fifth revision of a file within version 2. (Revision numbers have taken on a
life of their own since the early days of RCS and SCCS, and really don't mean as
much as they used to.)
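The numbering scheme just described can be sketched in a few lines of Python. This is only an illustration of the major.minor convention for revisions on the main line of development; the function names are hypothetical, and the rule that an overriding major number must be higher than the current one is an assumption of the sketch.

```python
from typing import Optional, Tuple


def parse_revision(number: str) -> Tuple[int, int]:
    """Split a trunk revision such as '2.5' into (major, minor)."""
    major, minor = number.split(".")
    return int(major), int(minor)


def next_revision(current: str, new_major: Optional[int] = None) -> str:
    """Return the default next number, or start a new major version."""
    major, minor = parse_revision(current)
    if new_major is not None:
        if new_major <= major:
            raise ValueError("a new major number must be higher than the current one")
        return f"{new_major}.1"
    return f"{major}.{minor + 1}"


# Comparing as integers keeps 1.10 after 1.9, which plain string sorting gets wrong.
ordered = sorted(["1.10", "1.2", "1.9"], key=parse_revision)

print(next_revision("1.1"))               # 1.2
print(next_revision("1.7", new_major=2))  # 2.1
print(ordered)                            # ['1.2', '1.9', '1.10']
```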
It is significant that the version control system marks any remaining copies of
the file as read-only. A version control system is only as accurate as the changes
it's aware of, and registering changes is very important. On a superficial level
the file's permissions act as a reminder to us to keep the file in sync with the
revision control system. More importantly, the file permissions play a crucial
part when more than one person is involved in working on a project.
Edits to a file cannot be saved if the file is marked read-only, and the permissions
on the file can only be changed by the owner (or by the super-user). The right way
to edit the file is to check it out from the control system, which marks the file
as only being writeable by the person who has checked it out. Therefore, if one user
checks a file out, others will not be able to alter it until it is checked back in.
This is the most fundamental operation in what is called file locking.
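The permission changes described above can be imitated with ordinary file system calls. The following Python sketch is not what any revision control system actually runs; it simply assumes that "read-only" means clearing the write bits and that a locking check-out gives the owner's write bit back. The helper names and the temporary file are made up for the illustration.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def mark_read_only(path):
    """Clear every write bit, as happens to the copy left behind at check-in."""
    os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) & ~WRITE_BITS)


def mark_writable_by_owner(path):
    """Give the owner write permission back, as a locking check-out would."""
    os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | stat.S_IWUSR)


def owner_can_write(path):
    """Report whether the owner's write bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IWUSR)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "hello.html")
    with open(path, "w") as handle:
        handle.write("<h1>Hello World!</h1>\n")

    mark_read_only(path)
    writable_after_check_in = owner_can_write(path)

    mark_writable_by_owner(path)
    writable_after_check_out = owner_can_write(path)

print(writable_after_check_in, writable_after_check_out)  # False True
```

Permission bits do not restrain the super-user, so they only protect the file when everyone works from an ordinary account.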
NOTE: When a group is working together,
for instance on a set of Web pages, an application development project, or
any other non-system administration-related project, all of the users should have
a proper account and should be using it. No one should be working as root,
since file locking essentially becomes useless when a user can override it at will.
Revision control systems store the series of changes to objects in control files.
(Each system has different options and stores these files differently. See Chapter
24 for details on RCS files and Chapter 26 for details on SCCS files. CVS, which
is covered in Chapter 25, uses RCS files.) These files contain complete histories
of the project, which allows them to serve as both a backup and an audit trail. In
fact, keeping a current copy of the file isn't really necessary, just as long as
the history file is available. Many programming utilities, such as make
and emacs, are aware of revision control and can automatically retrieve
the latest version of a file.
Registering hello.html starts the revision control process. This process
essentially enforces a discipline on users who are working on that project. Files
cannot be altered unless they are checked out, and others cannot work on them unless
they are checked in. If you do not check a file in, your coworkers will most likely
tell you to. Also, as you will see in the next section, when files are checked in,
the system allows you to add comments regarding the changes you made. If the comments
are missing or incomplete, trouble frequently ensues, especially when the changes
are implicated in a problem.
Creating a New Revision
The e-mail address on line #9 in hello.html will not work for external
systems because the domain name is incomplete, so we must update the file. (Otherwise,
how are people going to tell us what they think of our masterpiece?)
The file is still marked read-only from when we registered it with the revision
control system. In order to edit it, we need to check out the latest version of hello.html.
Checking a file out (or getting it, in SCCS terms) provides us with a modifiable
working copy of the file. It also marks the file as being edited within the revision
control system, locking other users out from checking in revisions that could conflict
with ours. (Files can also be checked out read-only, so the file can be examined
at any time, but only one user can hold the lock at a time.)
After checking out the file, the line is modified:
<address><a href="mailto:eric@niftydomain.com">Eric Goebelbecker</a></address>
Then we check in (or delta) the file. As a part of the check-in process, the system
prompts us for a comment. (The SCCS prompt is shown.)
comments? Fixed e-mail address.
The project now has a second revision, which is numbered version 1.2 since we
didn't override the default.
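The check-out, edit, and check-in cycle we just walked through can be modeled with a toy class. This Python sketch keeps everything in memory and is only an assumption about how the pieces fit together; the ControlFile class, its method names, and the second user are hypothetical and do not correspond to any real RCS or SCCS interface.

```python
class LockedError(Exception):
    """Raised when a file is locked by someone else, or not locked at all."""


class ControlFile:
    """Toy in-memory stand-in for a revision control file."""

    def __init__(self, text, author):
        # Registering the file creates the root revision, 1.1.
        self.revisions = [("1.1", author, "Initial revision", text)]
        self.locked_by = None

    def check_out(self, user):
        """Lock the file for one user and hand back the latest text."""
        if self.locked_by is not None and self.locked_by != user:
            raise LockedError(f"locked by {self.locked_by}")
        self.locked_by = user
        return self.revisions[-1][3]

    def check_in(self, user, text, comment):
        """Record a new revision with its comment and release the lock."""
        if self.locked_by != user:
            raise LockedError(f"{user} does not hold the lock")
        major, minor = self.revisions[-1][0].split(".")
        number = f"{major}.{int(minor) + 1}"
        self.revisions.append((number, user, comment, text))
        self.locked_by = None
        return number


page = ControlFile('<a href="mailto:eric@prophet">Eric</a>', "eric")
working_copy = page.check_out("eric")

try:
    page.check_out("beverly")
    second_checkout = "allowed"
except LockedError as error:
    second_checkout = str(error)

fixed = working_copy.replace("eric@prophet", "eric@niftydomain.com")
new_number = page.check_in("eric", fixed, "Fixed e-mail address.")

print(second_checkout)  # locked by eric
print(new_number)       # 1.2
```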
The Revision Tree
Let's imagine that this process continues and hello.html grows into a
more sophisticated HTML page.
Figure 23.1.
A Simple Revision Tree.
Each revision is a node on the revision tree. The node labeled version 1.1 (root)
in Figure 23.1 represents the initial revision of hello.html. The
node labeled version 1.2 represents the version with the corrected e-mail address;
version 1.3 could represent a version with some graphics added, and so on.
NOTE: As you may have figured out already,
version control systems use a tree metaphor, much like UNIX directories.
For a simple file such as hello.html, viewing the history of revisions
as a tree may seem like a bit of a stretch. Later, when we cover revision branching
in the "Advanced Concepts" section, the metaphor will have more
meaning.
Returning to an Earlier Revision
Version 1.3 contained a very large graphic, which worked fine on our local LAN,
but took too long to download elsewhere on the Internet.
When the large graphic was added to the page, a lot of formatting was also added,
so simply removing the graphic or adding a smaller one would seriously affect the
page. In order to make the page usable quickly, use the revision control system to
retrieve version 1.2 until you have time to solve the problem with version 1.3.
The systems make this easy, because the file can be checked out at a specific
revision level. You can also check out revision 1.2 as a read-only file so users
can view it, while addressing the problem with revision 1.3.
Advanced Concepts
Now that we've covered the basic concepts, let's move on to some more advanced
applications of revision control, such as how to use it to resolve problems, how
to maintain more than one version of a project, and how it makes a project
that involves more than one person much easier to manage than e-mail and sticky-pad
notes do.
Revision History
Having only three versions of hello.html made the transition back to
an earlier version too easy. Let's move on to a more comprehensive example.
An accounting package has a major new feature added. (Let's imagine that it now
calculates the value of a customer's account in U.S. dollars and German Marks.) Following
the addition of that enhancement, a few minor features are added and a pair of bugs are fixed.
One day a customer points out that the calculation in German currency has a problem.
Since the program has gone through some changes since that feature was added, how
can the bug be isolated quickly? Viewing the revision history could help. Below is
a theoretical revision history from SCCS.
D 1.5 97/08/03 16:23:32 fred 5 4 00024/00025/00200
MRs:
COMMENTS:
Added compatibility with fvwm
D 1.4 97/08/03 16:23:32 fred 4 3 00024/00025/00200
MRs:
COMMENTS:
Fixed divide by zero bug in entry module
D 1.3 97/07/15 19:14:21 mike 3 2 00002/00002/00223
MRs:
COMMENTS:
Added report formatting features and support for HP680C
D 1.2 97/06/27 19:03:26 melvin 2 1 00012/00003/00213
MRs:
COMMENTS:
Added Deutsch Mark valuation module
The bug was introduced back in version 1.2 when Melvin added the support for Deutsche
Marks. However, since then Mike and Fred added reporting features and support for
fvwm and fixed another bug. This shows how revision comments can aid
a project by isolating when and where a problem may have been introduced. The
"How do I use RCS?" section in Chapter 24 explains the use of the rlog
command for viewing revision histories in RCS and CVS. "Examining Revision Details
and History" in Chapter 26 explains how to view this information in SCCS.
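Searching a history like this by hand gets tedious as it grows. As a rough sketch, the following Python reads a few entries in the layout shown above and looks for comments that mention a word. The parsing rules are assumptions based only on that sample (a header line of eight fields starting with D, followed by MRs: and COMMENTS: lines), so a comment that happened to look like a header line would confuse it.

```python
HISTORY = """\
D 1.4 97/08/03 16:23:32 fred 4 3 00024/00025/00200
MRs:
COMMENTS:
Fixed divide by zero bug in entry module
D 1.3 97/07/15 19:14:21 mike 3 2 00002/00002/00223
MRs:
COMMENTS:
Added report formatting features and support for HP680C
D 1.2 97/06/27 19:03:26 melvin 2 1 00012/00003/00213
MRs:
COMMENTS:
Added Deutsch Mark valuation module
"""


def parse_history(text):
    """Return one dict per delta: revision, date, author and comment."""
    entries = []
    in_comments = False
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] == "D":
            entries.append({"revision": fields[1], "date": fields[2],
                            "author": fields[4], "comment": ""})
            in_comments = False
        elif line.startswith("COMMENTS:"):
            in_comments = True
        elif in_comments and entries:
            entries[-1]["comment"] = (entries[-1]["comment"] + " " + line).strip()
    return entries


def find_revisions(entries, word):
    """List (revision, author) for every comment that mentions the word."""
    return [(entry["revision"], entry["author"]) for entry in entries
            if word.lower() in entry["comment"].lower()]


entries = parse_history(HISTORY)
print(find_revisions(entries, "mark"))  # [('1.2', 'melvin')]
```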
Multiple Versions of a Single File or Project
In the previous examples, you only needed a revision tree with a single path,
the trunk. Let's look at a situation where a project needs more advanced solutions.
A small ISP (Internet Service Provider) provides two varieties of service
to its customers. One is a shell account where a customer can dial in and
log into a UNIX host. The other is a PPP account, where the customer dials
in for a network connection, but never logs into one of the ISP's systems. (Note
to nitpickers: the PPP login is handled by a terminal server.)
All users do, however, need to have accounts on the POP mail server, because
all of them will receive mail and the mail must be saved with proper ownership and
file permissions until the users retrieve it, either with a mail agent from their
shell account or to their systems at home. Therefore, the ISP needs to maintain two
UNIX passwd files, one for shell users only and one with all users. (Second
note to nitpickers: yes, if the terminal server uses a passwd file, we need
three. It's only an example!)
The initial revision of the passwd file, prior to the ISP offering PPP
accounts, might have looked like this:
abe:x:200:200:Abraham Lincoln:/export/home/abe:/bin/sh
ben:x:201:200:Benjamin Franklin:/export/home/ben:/bin/ksh
sue:x:202:200:Susan B Anthony:/export/home/sue:/bin/ksh
ike:x:203:200:Dwight D Eisenhower:/export/home/ike:/bin/ksh
fdr:x:204:200:Franklin D Roosevelt:/export/home/fdr:/bin/ksh
harry:x:205:200:Harry S Truman:/export/home/harry:/bin/sh
john:x:206:200:John Galt:/export/home/john:/bin/csh
At a certain point, however, the ISP administrator needed to add users to the
passwd file who did not belong on the shell host, only on the POP host:
abe:x:200:200:Abraham Lincoln:/export/home/abe:/bin/sh
ben:x:201:200:Benjamin Franklin:/export/home/ben:/bin/ksh
sue:x:202:200:Susan B Anthony:/export/home/sue:/bin/ksh
ike:x:203:200:Dwight D Eisenhower:/export/home/ike:/bin/ksh
fdr:x:204:200:Franklin D Roosevelt:/export/home/fdr:/bin/ksh
harry:x:205:200:Harry S Truman:/export/home/harry:/bin/sh
john:x:206:200:John Galt:/export/home/john:/bin/csh
bill:x:207:200:William Clinton:/tmp:/bin/nosuchshell
hillary:x:208:200:Hillary Clinton:/tmp:/bin/nosuchshell
al:x:209:200:Albert Gore:/tmp:/bin/nosuchshell
hank:x:210:200:Hank Reardon:/tmp:/bin/nosuchshell
(The users with nosuchshell only have access to POP mail.)
Revision control provides two possible solutions for this problem.
Branching the Revision Tree
If the administrator just wanted to use the shell accounts as a base for the POP
mail file, she could add a branch to the revision tree.
Figure 23.2.
A revision tree with branches.
As Figure 23.2 shows, a branch creates a new development path for the project.
It also has an impact on revision numbers. The branch that extends from revision
1.2 is labeled 1.2.1.1, because it is the initial revision derived from number 1.2.
The second set of two numbers is used exactly as the first, with a major and minor
number.
NOTE: Revision numbers can be thought
of as extending revision control's similarity to UNIX file systems. The revision
numbers label versions much the same way directory names identify subdirectories.
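The way branch numbers grow out of the revision they start from can be sketched as simple string handling. This Python fragment is only an illustration of the numbering described above; the function names are hypothetical, and the assumption that a second branch from the same revision would be numbered 1.2.2.1 is ours.

```python
def first_branch_revision(branch_point, branch=1):
    """Number of the first revision on a branch leaving branch_point."""
    return f"{branch_point}.{branch}.1"


def next_on_same_path(number):
    """Increment the last component: 1.2 -> 1.3, 1.2.1.1 -> 1.2.1.2."""
    parts = number.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def branch_point_of(number):
    """Return the revision a branch revision grew from, or None on the trunk."""
    parts = number.split(".")
    if len(parts) <= 2:
        return None  # already on the trunk
    return ".".join(parts[:-2])


print(first_branch_revision("1.2"))   # 1.2.1.1
print(next_on_same_path("1.2.1.1"))   # 1.2.1.2
print(branch_point_of("1.2.1.2"))     # 1.2
```

Reading a number from left to right walks down the tree, in the same way that reading a path name walks down a directory hierarchy.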
By branching, the administrator is able to include the contents of the existing
file in the new version without adding unneeded entries in the original tree. But
what happens when a new shell user signs up? The administrator still has to add the
same information in two places.
Merges
No one wants to do the same thing twice, least of all an already overloaded
system administrator. But what mechanism would allow users who are added to the shell
system to show up on the POP system without inadvertently adding POP users to the
list of shell users?
Most revision control systems support merging branches in order to avoid
having to manually add changes. This process allows the administrator to add entries
from the main tree to the branch, without also adding them back to the main tree.
In Figure 23.3, version 1.4 is merged with 1.2.1.2 to create version 1.2.1.3.
Figure 23.3.
Branched revision tree with a One-way merge.
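A one-way merge of this kind can be approximated in a few lines of Python. The sketch below is not the algorithm any of the three systems uses; it assumes the only changes on the trunk since the branch point are added lines, and it ignores edits and deletions entirely. The passwd entries are shortened, made-up examples in the style of the listings above.

```python
def one_way_merge(base, trunk, branch):
    """Copy lines added on the trunk since `base` into the branch.

    Only additions are handled; changed or deleted lines are ignored.
    """
    added_on_trunk = [line for line in trunk if line not in base]
    merged = list(branch)
    for line in added_on_trunk:
        if line not in merged:
            merged.append(line)
    return merged


# Shortened, illustrative entries in the style of the passwd examples.
base_1_2 = ["abe:x:200:200::/export/home/abe:/bin/sh"]
trunk_1_4 = base_1_2 + ["john:x:206:200::/export/home/john:/bin/csh"]
branch_1_2_1_2 = base_1_2 + ["bill:x:207:200::/tmp:/bin/nosuchshell"]

branch_1_2_1_3 = one_way_merge(base_1_2, trunk_1_4, branch_1_2_1_2)

for entry in branch_1_2_1_3:
    print(entry)
```

The new shell user reaches the POP file, while the trunk list is never touched, so the POP-only user stays off the shell host.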
Merging files can be a very intricate process, and it is a powerful feature that
can be used in many more ways than the one we just covered. For more information,
see Chapter 24's "How do I use RCS?" for details on merging files managed
by RCS, the "Merging" section of Chapter 25 for CVS information, and the
Chapter 26 "Merging Revisions" heading for a method used in SCCS.
File Locking
We've already covered how checking out a file for editing prior to making changes
prevents conflicts. Let's examine a situation where files are changed without the
benefit of file locking. We'll refer to Figure 23.4, where Arthur and Beverly are
trying to finish a web project for a major client.
Figure 23.4.
Two-person Web project without file locking.
Arthur grabs a copy of revision 1.5 of index.html and begins editing
it. While he is making changes, Beverly also grabs a copy of revision 1.5 of index.html
and begins making her changes, independently of Arthur. Arthur checks in his changes
as revision 1.6, reports to his manager that the changes are complete, and confidently
flies to Belize for his two-week scuba diving vacation. Beverly checks in her changes
as revision 1.7, which now contains none of Arthur's changes! Charlie, their manager,
discovers that Arthur's changes are not in the weekly release and calls Arthur to
find out why, completely ruining Arthur's vacation. Note that even though revision
1.7 is the descendant of 1.6, it doesn't contain the changes Arthur made, since the
revision control system simply replaced 1.6 with 1.7. (The system has no way of evaluating
what changes should be applied.)
One way to resolve this conflict is to check out both versions 1.6 and 1.7 (to
different filenames, of course) and merge them. Arthur's vacation, however, is still
ruined.
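Both the lost change and the manual repair can be shown with a small Python sketch. It rests on a strong simplifying assumption: each person only replaced existing lines in place, so all three revisions have the same number of lines. Real merge tools cope with insertions and deletions as well; the page content and the function name here are invented for the example.

```python
def merge_line_edits(base, first, second):
    """Combine two edits of the same file, assuming lines were only
    replaced in place (no insertions or deletions)."""
    if not (len(base) == len(first) == len(second)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for index, original in enumerate(base):
        a, b = first[index], second[index]
        if a == b or b == original:
            merged.append(a)
        elif a == original:
            merged.append(b)
        else:
            merged.append(a)          # keep the first edit, but report it
            conflicts.append(index)
    return merged, conflicts


rev_1_5 = ["<h1>Welcome</h1>", "<p>Old intro</p>", "<p>Old footer</p>"]
rev_1_6 = ["<h1>Welcome</h1>", "<p>Arthur's intro</p>", "<p>Old footer</p>"]
rev_1_7 = ["<h1>Welcome</h1>", "<p>Old intro</p>", "<p>Beverly's footer</p>"]

# Without locking, 1.7 was built from 1.5, so Arthur's line is missing from it.
lost = [line for line in rev_1_6 if line not in rev_1_5 and line not in rev_1_7]

merged, conflicts = merge_line_edits(rev_1_5, rev_1_6, rev_1_7)

print(lost)       # ["<p>Arthur's intro</p>"]
print(merged)
print(conflicts)  # []
```

Had Arthur and Beverly changed the same line, its index would show up in the conflict list and someone would have to choose between the two versions by hand.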
Figure 23.5.
Two-Person Web project with file locking.
Compare this with the second timeline (Figure 23.5). Arthur grabs a locked copy
of revision 1.5 of index.html and begins editing it. While he is making
changes, Beverly tries to grab a copy of revision 1.5 of index.html, but
the source control system informs her that the revision is locked by Arthur and that
she cannot check it out. Beverly waits for Arthur to finish, or if her changes are
urgent, she contacts Arthur to work out a way to get her changes done quickly. Arthur
checks in his changes as revision 1.6, reports to his manager that the changes are
complete, and blissfully flies to Australia for his four-week scuba diving vacation
(on which he is spending the bonus he received for implementing a source control
system for the company). Beverly learns that index.html is no longer locked
and checks out revision 1.6. Beverly checks in her changes as revision 1.7, which
contains both her modifications and Arthur's. Charlie notices that Arthur's changes
are in the weekly release and remembers what a great thing it was that they finally
implemented that source control system after Arthur's previous vacation. (Beverly
tours Spain for two weeks, and Charlie goes home to play golf, leaving the new developer
in charge.)
Keywords
RCS and SCCS enable you to embed codes into working files that are expanded
(converted) into information about the file when it is checked out. These codes can
help identify the file once it has left the revision control system and also help
you figure out what state the file is in without having to resort to revision control
commands.
Some of the options available are:
- Branch and version information--The system will insert the version of the file
and any applicable branch information.
- Line Number--The line number where the keyword is placed, which can be very useful
for debugging languages that do not have a preprocessor, such as Perl.
- Date and time information--The date and/or time that the file was checked out
and the date that the latest revision was created.
- Module name--The name of the file.
- Author--The author of the file and also the name of the last person to lock
it (not in SCCS).
- Log message--The revision comments (not in SCCS).
The codes available differ for different systems by a wide margin. See the specific
chapter (and manual pages) for the system you are using for more information.
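Keyword expansion is essentially a search-and-replace performed at check-out time. The Python sketch below imitates it for markers written in the RCS style, such as $Revision$ and $Author$. The regular expression, the function name, and the sample values are assumptions made for the illustration, and the real systems recognize their own fixed sets of keywords.

```python
import re


def expand_keywords(text, values):
    """Replace $Name$ (or an already expanded $Name: old $) with $Name: value $."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)   # leave unknown markers untouched
        return f"${name}: {values[name]} $"

    return re.sub(r"\$(\w+)(?::[^$\n]*)?\$", substitute, text)


source = "<!-- $Revision$ last changed by $Author$ -->"
checked_out = expand_keywords(source, {"Revision": "1.2", "Author": "eric"})

print(checked_out)
# <!-- $Revision: 1.2 $ last changed by $Author: eric $ -->
```

Because the expanded form still begins and ends with the same markers, the text can be expanded again at the next check-out without piling up old values.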
Symbolic Names, Baselines, and Releases
A symbolic name is a name that is attached to a particular revision of
a file that can be used to refer to it without having to know the revision number.
Therefore, a major milestone in a file's history can be referred to with a name.
NOTE: SCCS does not support symbolic names.
See the section on releases in this chapter and Chapter 26 for a possible workaround.
A baseline is a captured set of revisions that have some special association,
such as "submitted to editor," "compiles successfully," "ran
for two hours without crashing," "released for beta testing." (Of
course, the last two might mean the same thing for some development organizations.)
The ability to create symbolic names is probably the most compelling reason to
use a more sophisticated revision control system, such as RCS or CVS, instead of
SCCS, although SCCS does provide a workaround that should satisfy most situations.
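One way to picture symbolic names and baselines is as a lookup table kept next to each file's history. The following Python sketch is only a mental model; the file names, revision numbers, and function names are hypothetical, and real systems store this information inside their own control files.

```python
# Hypothetical revision numbers for three files in one project.
symbolic_names = {
    "index.html": {"beta1": "1.4", "release1": "1.7"},
    "order.html": {"beta1": "1.2", "release1": "1.5"},
    "style.css": {"release1": "1.1"},
}


def resolve(filename, name_or_number):
    """Translate a symbolic name into a revision number for one file."""
    return symbolic_names.get(filename, {}).get(name_or_number, name_or_number)


def baseline(name):
    """Collect the revision of every file that carries the given name."""
    return {filename: names[name]
            for filename, names in symbolic_names.items() if name in names}


print(resolve("index.html", "beta1"))  # 1.4
print(resolve("index.html", "1.3"))    # 1.3
print(baseline("release1"))
```

Asking for a baseline by name returns a consistent set of revisions even though each file reached the milestone at a different revision number.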
Using Releases to Replace Symbolic Names
Without symbolic names, you can achieve a similar effect using release numbers.
A release is a baseline, usually with the property of being released for distribution,
which, depending upon the type of file, is a program that has been provided to
customers in either binary or source form, a document that has been printed and sold
or distributed, or perhaps a document that has simply been submitted to someone for
approval.
Symbolic names can be replaced by manipulating the revision numbers. When the
project hits a milestone, you can either synchronize all of the files' revision numbers
(bring them all to the same level, such as 1.7) or increase the major version number
of the next revision (the next change for all of the files is checked in at 2.1).
The second method works quite well, since most systems will automatically retrieve
the highest minor revision when only a major revision number is specified. So if
a project was released with three files at versions 1.1, 1.5, and 1.7, the system
will automatically retrieve those versions the next time the major revision number
1 is retrieved since no minor number was specified.
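The rule of retrieving the highest minor revision for a given major number can be sketched as follows. This Python fragment only illustrates the selection logic under the assumption that branch revisions are ignored; the file names and histories are invented, chosen so that release 1 ends at 1.1, 1.5, and 1.7 as in the example above.

```python
def latest_in_release(revisions, major):
    """Return the highest trunk revision with the given major number, or None."""
    best = None
    for number in revisions:
        parts = tuple(int(part) for part in number.split("."))
        if len(parts) == 2 and parts[0] == major:
            if best is None or parts > best:
                best = parts
    return None if best is None else f"{best[0]}.{best[1]}"


# Invented histories: every file moved to 2.1 after release 1 shipped.
history = {
    "main.c": ["1.1", "2.1", "2.2"],
    "report.c": ["1.1", "1.2", "1.3", "1.4", "1.5", "2.1"],
    "entry.c": ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "2.1"],
}

release_1 = {name: latest_in_release(revs, 1) for name, revs in history.items()}
print(release_1)  # {'main.c': '1.1', 'report.c': '1.5', 'entry.c': '1.7'}
```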
Summary
In this chapter we've covered the basic concepts behind revision control, and
how it can be used to manage a variety of activities. We demonstrated how users first
register a file with the system, then check it out for editing and then check it
back in when the changes are done so the system becomes aware of the file's new state.
We then discussed how this series of revisions can be viewed as a revision tree,
and how files can be extracted from the system at any point on that tree. From there
we covered advanced concepts, such as "branching" the tree in order to
create more than one version of a project, and how to view a file's revision history.
The advanced section also covered file locking in order to prevent editing conflicts
and how to have the version control system automatically add annotations to files
when they are checked out. We also touched on the process of merging file revisions
and the use of symbolic names and baselines for versions of projects. By understanding
these concepts you should not only be able to pick a source control system and learn
it rapidly, but also be able to identify situations where adopting a revision control
system will help make you more productive.
RCS, which is covered in depth in Chapter 24, has become the most widely used
"free" revision control system, primarily because of its advanced features
such as symbolic names and its availability on all UNIX variants. It is also the
basis for CVS, which is covered in depth in Chapter 25. CVS is found in many networked
development environments because it simplifies the process of distributing files
in a controlled manner while tracking changes.
Chapter 26 covers SCCS, which is the simplest of the revision control systems
to learn, and is the system that is most frequently bundled with UNIX variants. It
is commonly used for one- or two-person projects that need basic file locking and
backup capabilities.
©Copyright,
Macmillan Computer Publishing. All rights reserved.