Digital Preservation and Copyright

By Peter B. Hirtle

If all information in the world were written on clay tablets or carved into marble, its preservation would be greatly simplified. Even paper, when manufactured and stored properly, can have a life measured in hundreds of years. Today, however, much of the information being produced is digital,[1] and digital formats are notoriously fragile. Either the media on which the information is stored become unreadable, or the hardware and software needed to read the work become obsolete. Think of that old 8" floppy disk in the back of the drawer with your attempt from twenty years ago to write the Great American Novel (in WordStar). The magnetic data might no longer be readable; drives that can read the disk are scarce; and few word processing packages today can understand WordStar documents.

To preserve analog information resources, it is often sufficient to house them in a benign environment. In particularly bad cases, it might be necessary to make a microfilm or xerographic copy of the original, but copying is the exception rather than the rule. Digital preservation, however, starts with copying. At a minimum, files need to be copied from obsolete or decaying media, such as 8" floppy disks or 5¼" floppies, to current storage media. Good preservation practice requires much more, including making multiple copies of files. Digital documents may need to be changed from WordStar to WordPerfect to Word format, or perhaps even converted to PDF or XML format. Every time you use a digital file, you must copy it. When digital documents are displayed on a computer, they are copied from the storage medium into the RAM of the computer, where they are then displayed. Digital preservation and access are all about copying.
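To make the idea of "multiple copies" concrete, here is a minimal sketch in Python of copying one file into several storage locations and confirming that each copy matches the original. The checksum comparison is an assumption added for illustration, not something the article prescribes, and the function names (sha256_of, make_preservation_copies) are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_preservation_copies(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination directory and verify every copy.

    Hypothetical helper: raises OSError if a copy differs from the original.
    """
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        if sha256_of(target) != expected:
            raise OSError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Each call to shutil.copy2 is, in copyright terms, a reproduction, which is exactly why the legal questions discussed in this article arise even for routine preservation work.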

In copyright law, copying is known as "reproduction," and it's one of the exclusive rights of the copyright owner.[2] The right to publicly display a work is also an exclusive right of the copyright owner,[3] as is the right to make an adaptation, known as a "derivative work."[4] Our desire to keep digital information around for the future runs smack into the exclusive rights of the copyright owner.

Fortunately, while there is no general exemption for preservation activities in copyright law, there are exemptions that can help individuals and especially libraries and archives legally preserve expressive works for the future. There are some specific exemptions for certain types of actions and for certain actors. Furthermore, in the absence of a specific exemption, one can always consider fair use as a defense when making a preservation copy.

Is it copyrighted? Do you own the copyright?

Even before you start looking for exemptions in the copyright law, it is always a good idea to check first to determine if an item really is copyrighted. Because there have been no registration or notice requirements for copyright protection since 1989, most digital information is copyrighted as soon as it is created. But there are exceptions. Works created by the federal government are in the public domain,[5] as are facts,[6] ideas,[7] and works whose copyright has expired. Digital copies of public domain works may themselves be in the public domain.[8]

You also don't need to worry about legal restrictions on preservation if you own the copyright in the work. The copyright in your draft of the Great American Novel most likely belongs to you, and you can do with it what you want. The same goes for the digital photographs you took on vacation last summer.

Let's assume, though, that what you are interested in preserving is copyrighted and that you do not own the copyright in the work. What then? There are at least three specific sections of the copyright law that may be of assistance.

Archiving Computer Programs: 17 USC 117

If the digital file you are interested in saving is a computer program, 17 USC 117 of the United States copyright law can help. This section states that in spite of the copyright owner's exclusive rights, it is permissible for you to make a copy for archival purposes of a copyrighted computer program. A computer program is defined in the law as "a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result."[9] The law allows you to make a copy of the WordStar program (if you legally own it), and even adapt it to run on your Windows XP or Linux machine (if you can), but not share the file with anyone else. The section only applies to the computer program itself. It does not authorize the reproduction or adaptation of documents created with WordStar when the copyright in those documents is owned by someone other than you.

Libraries and Archives Making Preservation Copies: 17 USC 108

Libraries and archives have additional preservation options under 17 USC 108 of United States copyright law. One of the few good things included in the Digital Millennium Copyright Act ("DMCA") was a provision that explicitly allows libraries and archives to make up to three copies of a work for preservation purposes. Unlike the rest of the provisions of Section 108, the items being preserved can be in any format (text, images, sound, etc.). Furthermore, the copies can be digital, so long as they are not distributed digitally nor made available to the public in a digital format outside the premises of the library or archives.

In order to take advantage of the exception, libraries and archives must follow certain ground rules. They must be either open to the public or allow access to non-affiliated researchers; the copying cannot be for "direct or indirect commercial advantage"; the library or archives must own a legal copy of the original item; and any copies made must carry with them a notice of copyright.[10] If the work is unpublished, preservation copies can be made for the purpose of preservation or security.[11] If the work is published, preservation copies can be made to replace an original that is "damaged, deteriorating, lost, or stolen, or if the existing format in which the work is stored has become obsolete." The law stipulates that a format is obsolete "if the machine or device necessary to render perceptible a work stored in that format is no longer manufactured or is no longer reasonably available in the commercial marketplace." The library or archives must also conduct a reasonable investigation to confirm that an unused copy cannot be obtained at a fair price. If digital copies are made, access to the digital version must be limited to the premises of the library or archives.[12]

Using Section 108, libraries and archives can start preserving old digital files in their collections. It does not help them, however, preserve materials that they do not own, such as networked resources or Web sites. Nor does Section 108 help individuals who want to preserve digital files they may have legally acquired or obtained from the Internet. For this sort of preservation, we must rely on fair use.

Fair Use Preservation by Individuals and Libraries: 17 USC 107

Since individuals cannot use Section 108 to make copies, even for preservation purposes, they must turn to the Fair Use provision in US copyright law. Mary Minow provides a highly readable overview of fair use in "How I learned to love FAIR USE..."[13] At the heart of the fair use exemption is the assessment of the four factors that constitute fair use: Purpose of the use, Nature of the work, Amount or substantiality used, and Market impact (PNAM). What might a fair use argument for digital preservation look like?

It is likely that most preservation copying would meet Minow's PNAM test. As Robert Oakley has noted about preservation copying in general:

Virtually everyone views preservation copying as socially beneficial. It is consistent with the Constitutional purposes for copyright since the preservation of printed knowledge is necessary for the progress of science and the useful arts.[14]

If preservation is being done for non-commercial, socially beneficial reasons, it seems likely that the "Purpose" factor would lean towards fair use.

The nature of digital works, the second fair use factor, can vary greatly, but Congress seems open to preserving a wide variety of material when preservation is at stake.[15] The "Nature" factor, then, might also support a fair use.

The third factor, the Amount and substantiality copied, might normally weigh against a finding of fair use, since the item is being copied in its entirety. But the Supreme Court has noted, "the extent of permissible copying varies with the purpose and character of the use."[16] Obviously, if the purpose is to preserve a work, then the entire work must be copied. The amount copied is appropriate for the purpose, and so a court might even find this use fair.

The fourth factor, the Market impact of making a preservation copy, is likely to be the most important in any fair use assessment, and unfortunately it is almost impossible to guess how a court might rule on this. Would the courts conclude that digital information is like the computer programs protected by 17 USC 117, which can be migrated and adapted to run on new platforms without compensation to the copyright owner? Or would the courts conclude that purchasing a copy of a work does not give you the right to copy it onto new media or transform it into new formats in perpetuity? Would they decide that individuals, like libraries copying under 17 USC 108, must first determine if an unused copy can be purchased before a preservation copy can be made? Unfortunately, there have been no cases involving digital preservation that can serve as indicators of how the courts might rule.

As is always the case with fair use, you can't really know if your use is fair until a court determines if it is fair. Nevertheless, when considering the preservation problems of motion pictures, the Senate concluded that given the great danger of loss, "making of duplicate copies for purposes of archival preservation certainly falls within the scope of 'fair use.' "[17] We can hope that the courts might accept a similar argument for equally fragile digital information.

Preserving the World Wide Web: A Specific Case

As the World Wide Web has become an ever-more important information resource, there has been growing interest in preserving parts of it. Web site preservation projects have documented flooding in the Red River Valley,[18] the national election in 2000,[19] the response to the events of 11 September 2001,[20] and the web pages of the Clinton White House.[21] The Internet Archive has sought to capture and preserve a sizeable portion of the entire World Wide Web.[22] Other projects have sought to preserve the national Web of Sweden,[23] Denmark,[24] and the Nordic Countries.[25] On a more local scale, many universities and other organizations are beginning to wonder how they might capture and preserve Web pages associated with, but not necessarily owned by, them.

Most information found on the Web is automatically copyrighted when created. The groups that want to preserve Web pages, however, are often not the copyright owners of those Web pages. We can presume that the copyright owner has granted an implied license to allow people to copy a Web page to a local machine and display it there; after all, if they did not want people to be able to read a page (which in the Web environment means making a temporary copy on your local machine), they would not have put the document up on the Web. But is there implied permission to copy and preserve Web pages whose copyright you do not own? If not, can such actions qualify as a fair use?

The most ambitious attempt to preserve the Web, the Internet Archive and its "Wayback Machine," allows you to retrieve outdated Web pages from multiple points in time. The Internet Archive has attempted to bolster a possible fair use defense in a number of ways. First, it allows Web page producers to "opt out" of the archives. It does this by offering instruction on how to use a "robots.txt" file to prevent its crawlers from retrieving new pages. Presence of a "robots.txt" file will also prevent Wayback Machine users from accessing previously harvested pages. In addition, under certain conditions, the Internet Archive will remove material from its holdings.[26]
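The opt-out mechanism described above can be sketched with Python's standard urllib.robotparser module, which evaluates "robots.txt" rules the way a well-behaved crawler would. This is only an illustration: the user-agent token "ia_archiver" (the name historically associated with the Internet Archive's crawler), the sample rules, the example.com URL, and the helper name may_crawl are assumptions, not details taken from the Internet Archive's own instructions.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt a site owner might publish to opt out of archiving.
# The "ia_archiver" token is an assumption made for this illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory rules, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print(may_crawl(ROBOTS_TXT, "ia_archiver", page))
    print(may_crawl(ROBOTS_TXT, "some-other-bot", page))
```

With these sample rules, the archive's crawler is excluded from the whole site while other crawlers remain unaffected, which shows how narrowly or broadly a copyright owner can express an opt-out.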

The Internet Archive's willingness to respect the wishes of those copyright owners who want to limit and control the reproduction of their copyrighted works reduces the Archive's risk of infringement suits. At the same time, it diminishes the utility of the archive as a whole by excluding important parts of the Web. For example, eBay is one of the most successful commerce initiatives in Internet history. The use policy for eBay stipulates that users must "not use any robot, spider, scraper or other automated means to access the Site for any purpose without our express written permission."[27] Any Web archiving initiative that respects the terms of eBay's policy will not capture the eBay site, diminishing the value of the archive as documentation of the history of the Internet.

It is unclear how much protection the Internet Archive's policies really provide. As one recent analysis concluded:

In short, the Internet Archive largely ignores copyright law in the process of collecting its material, provides only a limited (and, arguably, effectively valueless) protection for the material once stored, and in effect disclaims any responsibility for what is done with the material by the end user, as well as any liability that the end user may incur in accessing the material. Given the litigious nature of the US, it will be interesting to see if the Internet Archive's success in avoiding litigation over its activities will continue for much longer.[28]

Licensing and the DMCA: The Final Hurdles

Even if you determine that 17 USC 117, 108, or 107 can be used to legally preserve a digital resource, you may still not be out of the woods. Licensing contracts, for example, can override these sections. You should carefully read any licensing agreements you or your institution has signed before making preservation copies of licensed resources.

Further, under the DMCA, if the digital resource is protected by a technology that controls access to the resource, you cannot legally bypass the access control mechanism, even to preserve it.[29] On October 28, 2003, the Librarian of Congress issued a rulemaking that rejected the idea of a general preservation exception to the terms of the DMCA.[30] Two of the four exceptions found in the rulemaking might be of some assistance in preservation. For the next three years, users may legally bypass access control mechanisms that rely on dongles that have malfunctioned. They may also bypass access control mechanisms in computer programs and video games distributed in formats that have become obsolete and which require the original media or hardware as a condition of access.


Good preservation practice has often existed in a legal gray area. Libraries usually made three copies when microfilming long before the law gave explicit permission for the practice, and many radio programs have been saved only because individuals systematically taped them from the air, without the permission of the copyright owner.[31] Digital preservation resides in an even murkier legal gray area because of the fundamental need to copy digital information (one of the exclusive rights of the copyright owner) in order to preserve it. In addition, there is greater interest in preserving works that you may not own, particularly web pages. The lack of legal certainty, however, should not prevent individuals and libraries from undertaking the socially beneficial task of preserving digital information. The law explicitly authorizes some preservation actions (especially if the materials are not made digitally available to others), and a strong fair use defense can be built outside the library or archives.

More Resources

In addition to the items listed in the notes below, the following publications provide useful information on the legal issues associated with preservation, particularly the preservation of Internet resources:

June Besek, "Copyright issues relevant to the creation of a digital archive: a preliminary assessment" (Washington, D.C.: Council on Library and Information Resources and the Library of Congress, 2003) <http://>

Adrienne Muir, "Copyright and licensing for digital preservation" Library and Information Update, June 2003

Dwayne Buttler and Kenneth Crews, "Copyright Protection and Technological Reform of Library Services" in Tomas Lipinski, ed., Libraries, Museums, and Archives: Legal Issues and Ethical Challenges in the New Information Era (Lanham, Maryland: Scarecrow Press, 2002)


Mary Minow and Nancy McGovern provided invaluable advice during the preparation of this article.

Peter B. Hirtle is Director for Instruction and Learning in the Instruction, Research, and Information Services Division of Cornell University Library. Hirtle also serves as the Intellectual Property Officer for the Cornell University Library. Previously while at Cornell, Hirtle served as Director of the Cornell Institute for Digital Collections where he explored the use of emerging technologies to expand access to cultural and scientific sources through the development and management of distinctive digital collections. He also served as the Associate Editor of D-Lib Magazine, a monthly magazine about innovation and research in digital libraries.

