K-12 administrators working to integrate non-English-speaking students into the mainstream classroom environment might benefit from a new technology that allows spontaneous translation from one language to another.

The technology, recently tested in a video conference at Carnegie Mellon University, enabled testers to ask questions using casual speech patterns and receive spontaneous translations in return.


The Consortium for Speech Translation Advanced Research (C-STAR) was established in 1991 to conduct research in spoken language translation. Over the past seven years it has grown from four partner laboratories to 20 partner or affiliate labs in the United States, Japan, Germany, Korea, Italy, France, and Switzerland.

In the recent international video conference, scientists at several labs posed as “tourists” planning trips to Heidelberg, Kyoto, and New York City. They conversed with each other in their native languages and let the computerized translation devices bridge the language barrier.

For example, a Carnegie Mellon graduate student asked a scientist posing as a travel agent in Kyoto, in English, “What time is it in Japan?” Shortly thereafter, he received a digitally translated version of the Kyoto scientist’s Japanese answer: “It’s 1 a.m. in Japan.”

Participants speaking six different languages booked imaginary flights, made hotel reservations, and received tour information via the web-based spontaneous translation system.

Perhaps the most impressive demonstration of the speech-translation technology available was the live video link to a Carnegie Mellon student in Germany, touring the grounds of Heidelberg Castle and using speech translation devices to ask German speakers to take his picture, give him directions, and relay information on tourist sites.

He did all this wearing a headset, with a laptop in his backpack, and a computer the size of a paperback book strapped to his arm. Researchers are currently working to develop smaller versions of the paperback-size machine for the commercial marketplace.

School applications

Advanced technology such as this might one day foster instant communication between English-speaking teachers and non-English-speaking students. This would be a significant development, because it is no longer rare for the students in a single school district to speak more than 100 distinct languages.

In the school division serving Fairfax County, Va., for example, English as a Second Language classes are available in 145 schools county-wide for 11,099 students, grades 1-12, who collectively speak more than 100 languages. With the growing need to accommodate English as a Second Language students, many such schools stand to benefit from new speech translation technology.

Envision this scenario: You are in charge of developing a curriculum in which 25 seventh-grade students, all of whom speak little or no English, must develop a basic understanding of the Renaissance as part of their history requirement. Through the use of speech translation devices, these ESL students can be led on a virtual tour of the Vatican Museum by an authority on Renaissance art and antiquities via a web link directly to their classroom. These students can ask questions in their native language and receive answers in the same language translated from Italian.

The new technologies becoming available are the most user-friendly translation devices ever developed. Though computerized speech translation has been around for a while, earlier systems handled only a limited vocabulary and required speakers to use perfect syntax and grammar.

The latest devices, such as Carnegie Mellon’s JANUS speech translation system, are far superior to previous devices, according to C-STAR chairman Alex Waibel. “Speech recognition systems have been improved to handle the sloppy speech people produce when talking spontaneously with each other,” he said. “The um’s, er’s, interruptions, hesitations, and stutterings of spontaneous speech are automatically recognized, filtered, and properly prepared for translation.”

Machine translation has advanced as well, with two primary approaches.


The Interlingua approach first converts a speaker’s words into an intermediary language, which represents the intended meaning, and then translates from that intermediary into the second language. The benefit is that researchers do not have to build translators between every pair of languages, only between each language and the Interlingua.
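A little arithmetic makes the savings concrete. The sketch below is purely illustrative and not drawn from any C-STAR system; the function names are hypothetical. It assumes translation is directional, so a direct design needs one translator per ordered pair of languages, while an interlingua design needs one component into the intermediary and one out of it for each language.

```python
def direct_translators(n_languages: int) -> int:
    """Translators needed when every language is paired directly with every other.

    Assumes one translator per direction (e.g. English->German and German->English).
    """
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages: int) -> int:
    """Components needed when every language goes through a shared intermediary.

    Assumes one analyzer (language -> interlingua) and one generator
    (interlingua -> language) per language.
    """
    return 2 * n_languages


if __name__ == "__main__":
    # Six languages, as in the demonstration described above.
    for n in (2, 3, 6):
        print(n, direct_translators(n), interlingua_components(n))
    # prints:
    # 2 2 4
    # 3 6 6
    # 6 30 12
```

Under these assumptions the two designs break even at three languages, and the interlingua design pulls ahead with every language added after that.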

The second approach is example-based. It uses parallel bodies of text in two languages to identify phrases that correspond, so the system can learn translations from examples.
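The idea of learning from paired phrases can be shown with a toy sketch. This is not how JANUS or any production system works; the phrase table, the function name, and the longest-match strategy are all assumptions made for illustration. Each entry pairs an English phrase with a German one, and the translator reuses the longest stored example it can find at each position.

```python
# Hypothetical parallel examples: English phrases aligned with German phrases.
PARALLEL_EXAMPLES = {
    "what time is it": "wie viel uhr ist es",
    "in japan": "in japan",
    "thank you": "danke",
}


def translate_by_example(sentence: str, examples: dict) -> str:
    """Translate by reusing the longest matching stored phrase at each position.

    Words with no matching example are kept in square brackets.
    """
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in examples:
                output.append(examples[phrase])
                i = j
                break
        else:
            # No stored example covers this word.
            output.append(f"[{words[i]}]")
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    print(translate_by_example("What time is it in Japan", PARALLEL_EXAMPLES))
    # "Learning" here is simply adding another aligned example.
    PARALLEL_EXAMPLES["good morning"] = "guten morgen"
    print(translate_by_example("good morning", PARALLEL_EXAMPLES))
```

In this sketch, adding a new aligned pair is all it takes to cover a new phrase, which is the sense in which an example-based system improves as it sees more parallel text.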

For now, speech recognition programs are limited to talk related to travel, such as itineraries, flight information, and bookings, but the far-reaching implications for educators are obvious.

“This type of technology works on one semantic domain at a time,” said Maxine Eskenazi, systems scientist at Carnegie Mellon’s Language Technologies Institute. How soon the scientists get to education might depend on money. “Working on a new domain depends on funding,” said Eskenazi.

If language barriers were effectively eliminated with the help of advanced speech recognition systems, students could link directly to sources of information all over the world.

“Using a CD-ROM is passive,” Eskenazi observed. “Speech makes you active, because the opportunity to ask questions and receive spontaneous answers is there.”

Consortium for Speech Translation Advanced Research (C-STAR)


Carnegie Mellon University Language Technologies Institute