Digital Preservation and Copyright
If all information
in the world were written on clay tablets or carved into marble, its
preservation would be greatly simplified. Even paper, when manufactured and
stored properly, can have a life measured in hundreds of years. Today, however,
much of the information being produced is digital,[1] and digital
formats are notoriously fragile. Either the media on which the information is
stored becomes unreadable, or the hardware and software needed to read the work
becomes obsolete. Think of that old 8" floppy disk in the back of the
drawer with your attempt from twenty years ago to write the Great American
Novel (in WordStar). The magnetic data might not still be readable; drives that
can read the disk are scarce; and few word processing packages today can
understand WordStar documents.
To preserve analog
information resources, it is often sufficient to house them in a benign
environment. In particularly bad cases, it might be necessary to make a
microfilm or xerographic copy of the original, but copying is the exception
rather than the rule. Digital preservation, however, starts with copying. At a
minimum, files need to be copied from obsolete or decaying media, such as
8" floppy disks or 5¼" floppies, to current storage media. Good
preservation practice requires much more, including making multiple copies of
files. Digital documents may need to be changed from WordStar to WordPerfect to
Word format, or perhaps even converted to PDF or XML format. Every time you use
a digital file, you must copy it. When a digital document is displayed on a
computer, it is copied from the storage medium into the computer's RAM, where
it is then displayed. Digital preservation and access are all
about copying.
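The minimum practice described above (copy the file off the old medium, and keep more than one copy) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the checksum comparison is an added assumption about how one might confirm that each copy matches the original, not something the law or this article prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_preservation_copies(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination directory and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps where it can
        if sha256_of(target) != expected:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling make_preservation_copies(Path("novel.ws"), [Path("disk_a"), Path("disk_b")]) would leave one verified copy in each directory. Note that this only moves the bits; it does nothing about obsolete file formats such as WordStar.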
In copyright law,
copying is known as "reproduction," and it's one of the exclusive
rights of the copyright owner.[2]
The right to publicly display a work is also an exclusive right of the
copyright owner,[3]
as is the right to make an adaptation, known as a "derivative work."[4] Our desire to
keep digital information around for the future runs smack into the exclusive
rights of the copyright owner.
Fortunately, while
there is no general exemption for preservation activities in copyright law,
there are exemptions that can help individuals and especially libraries and
archives legally preserve expressive works for the future. There are some
specific exemptions for certain types of actions and for certain actors.
Furthermore, in the absence of a specific exemption, one can always consider
fair use as a defense when making a preservation copy.
Is it copyrighted? Do you own the copyright?
Even before you
start looking for exemptions in the copyright law, it is always a good idea to
check first to determine if an item really is copyrighted. Since there have not
been any registration or notice requirements for copyright protection since
1989, most digital information is copyrighted as soon as it is created. But
there are exceptions. Works created by the federal government are in the public
domain,[5]
as are facts,[6]
ideas,[7]
and expired works. Digital copies of public domain works may themselves be in
the public domain.[8]
You also don't need
to worry about legal restrictions on preservation if you own the copyright in
the work. The copyright in your draft of the Great American Novel most likely
belongs to you, and you can do with it what you want. The same goes for the
digital photographs you took on vacation last summer.
Let's assume,
though, that what you are interested in preserving is copyrighted and that you
do not own the copyright in the work. What then? There are at least three
specific sections of the copyright law that may be of assistance.
Archiving Computer Programs: 17 USC § 117
If the digital file
you are interested in saving is a computer program, 17 USC § 117 of the United
States copyright law can help. This section states that in spite of the
copyright owner's exclusive rights, it is permissible for you to make a copy
for archival purposes of a copyrighted computer program. A computer program is
defined in the law as "a set of statements or instructions to be used
directly or indirectly in a computer in order to bring about a certain
result."[9]
The law allows you to make a copy of the WordStar program (if you legally own
it), and even adapt it to run on your Windows XP or Linux machine (if you can),
but not share the file with anyone else. The section only applies to the
computer program itself. It does not authorize the reproduction or adaptation of
documents created with WordStar when the copyright in those documents is owned
by someone other than you.
Libraries and Archives Making Preservation Copies: 17 USC § 108
Libraries and
archives have additional preservation options under 17 USC § 108 of United
States copyright law. One of the few good things included in the Digital Millennium Copyright Act
("DMCA") was a provision that explicitly allows libraries and
archives to make up to three copies of a work for preservation purposes. Unlike
the rest of the provisions of Section 108, the items being preserved can be in
any format (text, images, sound, etc.). Furthermore, the copies can be digital,
so long as they are not distributed digitally nor made available to the public
in a digital format outside the premises of the library or archives.
In order to take
advantage of the exception, libraries and archives must follow certain ground
rules. They must be either open to the public or allow access to non-affiliated
researchers; the copying cannot be for "direct or indirect commercial
advantage"; the library or archives must own a legal copy of the original
item; and any copies made must carry with them a notice of copyright.[10] If the work
is unpublished, preservation copies can be made for the purpose of preservation
or security.[11]
If the work is published, preservation copies can be made to replace an
original that is "damaged, deteriorating, lost, or stolen, or if the
existing format in which the work is stored has become obsolete." The law
stipulates that a format is obsolete "if the machine or device necessary
to render perceptible a work stored in that format is no longer manufactured or
is no longer reasonably available in the commercial marketplace." The
library or archives must also conduct a reasonable investigation to confirm
that an unused copy cannot be obtained at a fair price. If digital copies are
made, access to the digital version must be limited to the premises of the
library or archives.[12]
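The ground rules in this section amount to a checklist, which can be sketched as a small Python function. Every field and function name below is hypothetical, and the logic is only a simplification of the conditions as summarized in this article, not a restatement of the statute itself. An empty result means none of the listed conditions was flagged, nothing more.

```python
from dataclasses import dataclass


@dataclass
class CopyRequest:
    """Facts about one proposed preservation copy (all names are illustrative)."""
    open_to_public_or_outside_researchers: bool
    for_commercial_advantage: bool
    owns_legal_copy: bool
    copies_carry_copyright_notice: bool
    published: bool
    total_copies: int = 1
    # Unpublished works: is the copy made for preservation or security?
    for_preservation_or_security: bool = False
    # Published works: is the original damaged, deteriorating, lost, stolen,
    # or stored in an obsolete format?
    original_at_risk_or_obsolete: bool = False
    # Published works: did a reasonable investigation find that no unused
    # copy can be obtained at a fair price?
    no_unused_copy_at_fair_price: bool = False
    # Digital copies: would they be made available outside the premises?
    digital_access_off_premises: bool = False


def unmet_conditions(req: CopyRequest) -> list[str]:
    """List the conditions from the summary above that the request fails."""
    problems = []
    if not req.open_to_public_or_outside_researchers:
        problems.append("not open to the public or to non-affiliated researchers")
    if req.for_commercial_advantage:
        problems.append("copying is for direct or indirect commercial advantage")
    if not req.owns_legal_copy:
        problems.append("institution does not own a legal copy of the original")
    if not req.copies_carry_copyright_notice:
        problems.append("copies lack a notice of copyright")
    if req.total_copies > 3:
        problems.append("more than three copies")
    if req.digital_access_off_premises:
        problems.append("digital copy available outside the premises")
    if req.published:
        if not req.original_at_risk_or_obsolete:
            problems.append("original is not damaged, deteriorating, lost, "
                            "stolen, or in an obsolete format")
        if not req.no_unused_copy_at_fair_price:
            problems.append("no finding that an unused copy is unavailable "
                            "at a fair price")
    elif not req.for_preservation_or_security:
        problems.append("copy is not for preservation or security")
    return problems
```

Returning a list of reasons rather than a single yes or no keeps the sketch honest: it shows which condition needs attention and leaves the judgment to a person.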
Using Section 108, libraries and archives can start preserving old digital files in their collections. It does not help them, however, preserve materials that they do not own, such as networked resources or Web sites. Nor does Section 108 help individuals who want to preserve digital files they may have legally acquired or obtained from the Internet. For this sort of preservation, we must rely on fair use.
Fair Use Preservation by Individuals and Libraries: 17 USC § 107
Since individuals
cannot use Section 108 to make copies, even for preservation purposes, they
must turn to the Fair Use provision in US copyright law. Mary Minow provides a
highly readable overview of fair use in "How I learned to love FAIR USE..."[13] At the heart
of the fair use exemption is the assessment of the four factors that constitute
fair use: Purpose of the use, Nature of the work, Amount
or substantiality used, and Market impact (PNAM). What might a
fair use argument for digital preservation look like?
It is likely that most preservation copying would meet Minow's PNAM test. As Robert Oakley has noted about preservation copying in general:
Virtually everyone views
preservation copying as socially beneficial. It is consistent with the
Constitutional purposes for copyright since the preservation of printed
knowledge is necessary for the progress of science and the useful arts.[14]
If preservation is
being done for non-commercial, socially beneficial reasons, it seems likely
that the "Purpose" factor would lean towards fair use.
The nature of
digital works, the second fair use factor, can vary greatly, but Congress seems
open to preserving a wide variety of material when preservation is at stake.[15] The "Nature"
factor, then, might also support a fair use.
The third factor,
the Amount and substantiality copied, might normally weigh against a
finding of fair use, since the item is being copied in its entirety. But the
Supreme Court has noted, "the extent of permissible copying varies with
the purpose and character of the use."[16] Obviously,
if the purpose is to preserve a work, then the entire work must be copied. The
amount copied is appropriate for the purpose, and so a court might even find
this use fair.
The fourth factor,
the Market impact of making a preservation copy, is likely to be the
most important in any fair use assessment, and unfortunately it is almost
impossible to guess how a court might rule on this. Would the courts conclude
that digital information is like the computer programs protected by 17 USC §
117, which can be migrated and adapted to run on new platforms without
compensation to the copyright owner? Or would the courts conclude that
purchasing a copy of a work does not give you the right to copy it onto new
media or transform it into new formats in perpetuity? Would they decide that
individuals, like libraries copying under 17 USC § 108, must first determine if
an unused copy can be purchased before a preservation copy can be made?
Unfortunately, there have been no cases involving digital preservation that can
serve as indicators of how the courts might rule.
As is always the
case with fair use, you can't really know if your use is fair until a court
determines if it is fair. Nevertheless, when considering the preservation
problems of motion pictures, the Senate concluded that given the great danger
of loss, "making of duplicate copies for purposes of archival preservation
certainly falls within the scope of 'fair use.' "[17] We can hope
that the courts might accept a similar argument for equally fragile digital
information.
Preserving the World Wide Web: A Specific Case
As the World Wide
Web has become an ever-more important information resource, there has been
growing interest in preserving parts of it. Web site preservation projects
have captured sites documenting flooding in the Red River Valley,[18] the national
election in 2000,[19]
the response to the events of 11 September 2001,[20] and the web
pages of the Clinton White House.[21] The Internet Archive has sought to
capture and preserve a sizeable portion of the entire World Wide Web.[22] Other
projects have sought to preserve the national Web of Sweden,[23] Denmark,[24] and the
Nordic Countries.[25]
On a more local scale, many universities and other organizations are beginning
to wonder how they might capture and preserve Web pages associated with, but not
necessarily owned by, them.
Most information
found on the Web is automatically copyrighted when created. The groups that
want to preserve Web pages, however, are often not the copyright owners of
those Web pages. We can presume that the copyright owner has granted an implied
license to allow people to copy a Web page to a local machine and display it
there; after all, if they did not want people to be able to read a page (which
in the Web environment means making a temporary copy on your local machine), they
would not have put the document up on the Web. But is there implied permission
to copy and preserve Web pages whose copyright you do not own? If not, can such
actions qualify as a fair use?
The most ambitious
attempt to preserve the Web, the Internet Archive and its "Wayback
Machine," allows you to retrieve outdated Web pages from multiple points
in time. The Internet Archive has attempted to bolster a possible fair use
defense in a number of ways. First, it allows Web page producers to "opt
out" of the archives. It does this by offering instruction on how to use a
"robots.txt" file to prevent its crawlers from retrieving new pages.
Presence of a "robots.txt" file will also prevent Wayback Machine
users from accessing previously harvested pages. In addition, under certain
conditions, the Internet Archive will remove material from its holdings.[26]
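To make the opt-out mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a "robots.txt" file before harvesting a page, using Python's standard urllib.robotparser module. The sample file, the example.org URLs, and the helper name may_archive are hypothetical. The user-agent name "ia_archiver" is assumed here for illustration only; a site owner should check the Archive's own current instructions for the name its crawler actually uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archiving crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network
    return parser.can_fetch(user_agent, url)


print(may_archive(ROBOTS_TXT, "ia_archiver", "http://example.org/index.html"))
print(may_archive(ROBOTS_TXT, "OtherBot", "http://example.org/index.html"))
```

With this file, the first call reports that the archiving crawler is excluded from the whole site, while the second shows that other crawlers may still fetch pages outside /private/. Compliance is voluntary: the file expresses the site owner's wishes, and it is up to each crawler to honor them.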
The Internet
Archive's willingness to respect the wishes of those copyright owners who want to
limit and control the reproduction of their copyrighted works reduces the
Archive's risk of infringement suits. At the same time, it diminishes the
utility of the archive as a whole by excluding important parts of the Web. For
example, eBay is one of the most successful commerce initiatives in Internet
history. The use policy for eBay stipulates that users must "not use any
robot, spider, scraper or other automated means to access the Site for any
purpose without our express written permission." [27] Any Web
archiving initiative that respects the terms of eBay's policy will not capture
the eBay site, diminishing the value of the archive as documentation of the
history of the Internet.
It is unclear how
much protection the Internet Archive's policies really provide. As one recent
analysis concluded:
In short, the Internet Archive
largely ignores copyright law in the process of collecting its material,
provides only a limited (and, arguably, effectively valueless) protection for
the material once stored, and in effect disclaims any responsibility for what
is done with the material by the end user, as well as any liability that the
end user may incur in accessing the material. Given the litigious nature of the
US, it will be interesting to see if the Internet Archive's success in avoiding
litigation over its activities will continue for much longer.[28]
Licensing and the DMCA: The Final Hurdles
Even if you
determine that 17 USC §§ 117, 108, or 107 can be used to legally preserve a
digital resource, you may still not be out of the woods. Licensing contracts,
for example, can override these sections. You should carefully read any
licensing agreements you or your institution has signed before making
preservation copies of licensed resources.
Further, under the
DMCA, if the digital resource is protected by a technology that controls access
to the resource, you cannot legally bypass the access control mechanism, even
to preserve it.[29]
On October 28, 2003, the Librarian of Congress issued a rulemaking that
rejected the idea of a general preservation exception to the terms of the DMCA.[30] Two of the
four exceptions found in the rulemaking might be of some assistance in
preservation. For the next three years, users may legally bypass access control
mechanisms that rely on dongles that have malfunctioned. They may also bypass
access control mechanisms in computer programs and video games distributed in
formats that have become obsolete and which require the original media or
hardware as a condition of access.
Conclusion
Good preservation
practice has often existed in a legal gray area. Libraries usually made three
copies when microfilming long before the law gave explicit permission for the
practice, and many radio programs have been saved only because individuals
systematically taped them from the air, without the permission of the copyright
owner.[31]
Digital preservation resides in an even murkier legal gray area because of the
fundamental need to copy digital information (one of the exclusive rights of
the copyright owner) in order to preserve it. In addition, there is greater
interest in preserving works that you may not own, particularly web pages. The
lack of legal certainty, however, should not prevent individuals and libraries
from undertaking the socially beneficial task of preserving digital
information. The law explicitly authorizes some preservation actions
(especially if the materials are not made digitally available to others), and a
strong fair use defense can be built outside the library or archives.
More Resources
In addition to the
items listed in the notes below, the following publications provide useful
information on the legal issues associated with preservation, particularly the
preservation of Internet resources:
June Besek,
"Copyright issues relevant to the creation of a digital archive: a
preliminary assessment" (Washington, D.C.: Council on Library and
Information Resources and the Library of Congress, 2003)
<http://www.clir.org/pubs/reports/pub112/contents.html>
Adrienne Muir,
"Copyright and licensing for digital preservation" Library and
Information Update, June 2003
<http://www.cilip.org.uk/update/issues/jun03/article2june.html>
Dwayne Buttler
and Kenneth Crews, "Copyright Protection and Technological Reform of
Library Services" in Tomas Lipinski, ed., Libraries, Museums, and
Archives: Legal Issues and Ethical Challenges in the New Information Era
(Lanham, Maryland: Scarecrow Press, 2002)
Acknowledgements
Mary Minow and Nancy
McGovern provided invaluable advice during the preparation of this article.
[1] Peter Lyman and Hal Varian, "How Much Information 2003?"
<http://www.sims.berkeley.edu/research/projects/how-much-info-2003/>
Peter B. Hirtle is Director for Instruction and Learning in the Instruction,
Research, and Information Services Division of Cornell University Library.
Hirtle also serves as the Intellectual Property Officer for the Cornell
University Library. Previously while at Cornell, Hirtle served as Director of
the Cornell Institute for Digital Collections where he explored the use of
emerging technologies to expand access to cultural and scientific sources
through the development and management of distinctive digital collections. He
also served as the Associate Editor of D-Lib Magazine.
This work is licensed under a Creative Commons
License.