Friday 9 October 2009

A tale of 10,000,000 books

“The fundamental reasons why the electric car has not attained the popularity it deserves are (1) The failure of the manufacturers to properly educate the general public regarding the wonderful utility of the electric; (2) The failure of [power companies] to make it easy to own and operate the electric by an adequate distribution of charging and boosting stations. The early electrics of limited speed, range and utility produced popular impressions which still exist.”
This quotation would hardly surprise anyone who follows electric vehicles. But it may be surprising to hear that in the year when it was written thousands of electric cars were produced, and that year was nearly a century ago. This appeared in a 1916 issue of the journal Electrical World, which I found in Google Books, our searchable repository of millions of books. It may seem strange to look back a hundred years on a topic that is so contemporary, yet I often find that the past has valuable lessons for the future. In this case, I was lucky — electric vehicles were studied and written about extensively early in the 20th century, and there are many books on the subject from which to choose. Because books published before 1923 are in the public domain, I am able to view them easily.

But the vast majority of books ever written are not accessible to anyone except the most tenacious researchers at premier academic libraries. Books written after 1923 quickly disappear into a literary black hole. With rare exceptions, one can buy them only for the small number of years they are in print. After that, they are found only in a vanishing number of libraries and used book stores. As the years pass, contracts get lost and forgotten, authors and publishers disappear, the rights holders become impossible to track down.

Inevitably, the few remaining copies of the books are left to deteriorate slowly or are lost to fires, floods and other disasters. While I was at Stanford in 1998, floods damaged or destroyed tens of thousands of books. Unfortunately, such events are not uncommon — a similar flood happened at Stanford just 20 years prior. You could read about it in The Stanford-Lockheed Meyer Library Flood Report, published in 1980, but this book itself is no longer available.

Because books are such an important part of the world’s collective knowledge and cultural heritage, Larry Page, the co-founder of Google, first proposed that we digitize all books a decade ago, when we were a fledgling startup. At the time, it was viewed as so ambitious and challenging a project that we were unable to attract anyone to work on it. But five years later, in 2004, Google Books (then called Google Print) was born, allowing users to search hundreds of thousands of books. Today, they number over 10 million and counting.

The next year we were sued by the Authors Guild and the Association of American Publishers over the project. While we have had disagreements, we have a common goal — to unlock the wisdom held in the enormous number of out-of-print books, while fairly compensating the rights holders. As a result, we were able to work together to devise a settlement that accomplishes our shared vision. While this settlement is a win-win for authors, publishers and Google, the real winners are the readers who will now have access to a greatly expanded world of books.

There has been some debate about the settlement, and many groups have offered their opinions, both for and against. I would like to take this opportunity to dispel some myths about the agreement and to share why I am proud of this undertaking. This agreement aims to make millions of out-of-print but in-copyright books available either for a fee or for free with ad support, with the majority of the revenue flowing back to the rights holders, be they authors or publishers.

Some have claimed that this agreement is a form of compulsory license because, as in most class action settlements, it applies to all members of the class who do not opt out by a certain date. The reality is that rights holders can at any time set pricing and access rights for their works or withdraw them from Google Books altogether. For those books whose rights holders have not yet come forward, reasonable default pricing and access policies are assumed. This allows access to the many orphan works whose owners have not yet been found and accumulates revenue for the rights holders, giving them an incentive to step forward.

Others have questioned the impact of the agreement on competition, or asserted that it would limit consumer choice with respect to out-of-print books. In reality, nothing in this agreement precludes any other company or organization from pursuing their own similar effort. The agreement limits consumer choice in out-of-print books about as much as it limits consumer choice in unicorns. Today, if you want to access a typical out-of-print book, you have only one choice — fly to one of a handful of leading libraries in the country and hope to find it in the stacks.

I wish there were a hundred services with which I could easily look at such a book; it would have saved me a lot of time, and it would have spared Google a tremendous amount of effort. But despite a number of important digitization efforts to date (Google has even helped fund others, including some by the Library of Congress), none have been at a comparable scale, simply because no one else has chosen to invest the requisite resources. At least one such service will have to exist if there are ever to be one hundred.

If Google Books is successful, others will follow. And they will have an easier path: this agreement creates a books rights registry that will encourage rights holders to come forward and will provide a convenient way for other projects to obtain permissions. While new projects will not immediately have the same rights to orphan works, the agreement will be a beacon of compromise in case of a similar lawsuit, and it will serve as a precedent for orphan works legislation, which Google has always supported and will continue to support.

Last, there have been objections to specific aspects of the Google Books product and the future service as planned under the settlement, including questions about the quality of bibliographic information, our choice of classification system and the details of our privacy policy. These are all valid questions, and being a company that obsesses over the quality of our products, we are working hard to address them — improving bibliographic information and categorization, and further detailing our privacy policy. And if we don’t get our product right, then others will. But one thing that is sure to halt any such progress is to have no settlement at all.

In the Insurance Year Book 1880-1881, which I found on Google Books, Cornelius Walford chronicles the destruction of dozens of libraries and millions of books, in the hope that such a record will “impress the necessity of something being done” to preserve them. The famous library at Alexandria burned three times, in 48 B.C., A.D. 273 and A.D. 640, as did the Library of Congress, where a fire in 1851 destroyed two-thirds of the collection.

I hope such destruction never happens again, but history would suggest otherwise. More important, even if our cultural heritage stays intact in the world’s foremost libraries, it is effectively lost if no one can access it easily. Many companies, libraries and organizations will play a role in saving and making available the works of the 20th century. Together, authors, publishers and Google are taking just one step toward this goal, but it’s an important step. Let’s not miss this opportunity.


(This first appeared in the New York Times.)