Google putting its digital library to the test

Google has scanned more than 12 million books for Google Book Search.

Google Inc. is giving researchers nearly a half-million dollars to test the academic value of its rapidly growing online library.

The grants, announced July 14, will help pay for 12 humanities projects that study questions requiring researchers to sift through thousands of books to reach meaningful conclusions.

Google hopes the research will validate its long-held belief that making electronic copies of old books will bring greater enlightenment to the world. The company's critics, though, have argued that the internet search leader has trampled copyright laws to build a commanding early lead in digital books so it can boost profits.

The winners of Google's "digital humanities" awards include a project at George Mason University that seeks to draw a more accurate portrait of the Victorian age through a deeper analysis of the vocabulary used in books from that period.

In another project, researchers at the University of California, Los Angeles and the University of Washington are developing tools and techniques for automated literary analysis.

The 23 scholars receiving $479,000 in grants are scattered across 15 universities. They’re eligible to apply next year for the remaining $521,000 that Google has budgeted for its digital humanities awards.

“Over the next year, we’ll provide selected subsets of the Google Books corpus—scans, text, and derived data such as word histograms—to both the researchers and the rest of the world as laws permit,” wrote Google’s Jon Orwant in a blog post about the grants.

Google set out in 2004 to make electronic copies of all the books in the world to feed more compelling material into its online search engine and make more human knowledge available to anyone with an internet connection.

But those book-scanning ambitions quickly became bogged down in legal battles over copyright that still haven't been resolved. Google has made digital copies of more than 12 million books so far, but not all of them can be viewed because of a dispute centered on out-of-print books that are still protected by copyright.

Google is still awaiting a federal judge's ruling on a proposed settlement that would give the company the digital rights to those out-of-print books. The U.S. Department of Justice, consumer watchdog groups, and several of Google's rivals have objected to the agreement, arguing that it would give Google too much power in the emerging market for digital books.

The projects and researchers receiving Google’s Digital Humanities Research Awards are:

• Steven Abney and Terry Szymanski, University of Michigan: Automatic Identification and Extraction of Structured Linguistic Passages in Texts.

• Elton Barker, The Open University; Eric C. Kansa, University of California-Berkeley; Leif Isaksen, University of Southampton, United Kingdom: Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus.

• Dan Cohen and Fred Gibbs, George Mason University: Reframing the Victorians.

• Gregory R. Crane, Tufts University: Classics in Google Books.

• Miles Efron, Graduate School of Library and Information Science, University of Illinois: Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques.

• Brian Geiger, University of California-Riverside; Benjamin Pauley, Eastern Connecticut State University: Early Modern Books Metadata in Google Books.

• David Mimno and David Blei, Princeton University: The Open Encyclopedia of Classical Sites.

• Alfonso Moreno, Magdalen College, University of Oxford: Bibliotheca Academica Translationum: link to Google Books.

• Todd Presner, David Shepard, Chris Johanson, James Lee, University of California-Los Angeles: Hypercities Geo-Scribe.

• Amelia del Rosario Sanz-Cabrerizo and José Luis Sierra-Rodríguez, Universidad Complutense de Madrid: Collaborative Annotation of Digitalized Literary Texts.

• Andrew Stauffer, University of Virginia: JUXTA Collation Tool for the Web.

• Timothy R. Tangherlini, University of California-Los Angeles; Peter Leonard, University of Washington: Northern Insights: Tools & Techniques for Automated Literary Analysis, Based on the Scandinavian Corpus in Google Books.
