Nearly a decade ago, computer scientists at Carnegie Mellon University embarked on a project with an astonishingly lofty goal: Digitize the published works of humankind and make them freely available online.
The architects of the Universal Digital Library initiative said on Nov. 27 that they have surpassed their latest target, having scanned more than 1.5 million books, many of them in Chinese, and that they continue to scan thousands more daily.
“Anyone who can get on the internet now has access to a collection of books the size of a large university library,” said Raj Reddy, a computer science and robotics professor at the university who led the project.
Much of the recent work on the Million Book Project, as the scanning effort is known, has been carried out by workers at scanning centers in India and China, helped by $3.5 million in seed funding from the U.S. National Science Foundation and in-kind contributions from computer hardware and software makers.
The United States, China, and India each have contributed $10 million to the project, undertaken with partners at China’s Zhejiang University, India’s Indian Institute of Science, and Egypt’s Library at Alexandria.
At least half the books are out of copyright or scanned with the permission of copyright holders. Excerpts of copyright-protected works are available, though organizers expect complete texts to become available eventually.
The project is not the first of its kind. Online search-engine operator Google Inc. and software giant Microsoft Corp. have begun similar endeavors—though Carnegie Mellon representatives say theirs is the largest university-based digital library of free books and that its purpose is noncommercial.
An alternative book-scanning effort called the Open Content Alliance also seeks to scan books from public and university libraries, and many universities have spurned Google and Microsoft in favor of this approach. (See “Google’s book scanning faces competition.”)
The Universal Digital Library’s Million Book Project is a step toward the creation of an online library that would make traditionally published books available to anyone, said Reddy. “The economic barriers to the distribution of knowledge are falling,” he said in a statement.
Michael Shamos, a Carnegie Mellon computer science professor and copyright lawyer working on the project, said the library’s mission includes making vast amounts of information freely available and preserving rare and decaying texts, among other things.
Books have been borrowed for scanning from various institutions and individuals worldwide, though institutions in Europe declined to participate, he said.
The digital library so far contains books published in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in the southern Indian language of Telugu, and 40,000 in Arabic.