Friday, 16 October 2009

Google Translator Toolkit and minority languages

Today, we've added 285 new languages to Google Translator Toolkit, bringing the total number of languages supported by this product to 345 — and making it possible to translate between 10,664 language pairs. Google Translator Toolkit is a language translation service for professional and amateur translators that builds on Google Translate and makes translation faster and easier.

In addition, we've made the Translator Toolkit interface available in 35 languages, so that more people can access the service in their own language.

At Google, we're focusing on how Translator Toolkit can help preserve and revitalize small and minority languages. Minority languages, also called regional, indigenous, heritage or threatened languages, are languages spoken by a minority of the population in a particular region of a sovereign state or country. Were these endangered languages to become extinct, it would mean an immeasurable loss of knowledge, culture and way of life for minority peoples worldwide.

For this project we worked with Dr. Te Taka Keegan, a Māori language activist and senior lecturer in computer science at the University of Waikato who has spent much of his career studying how technology can assist in minority language revitalization. At Google, Dr. Keegan researched how computer-aided translation tools can help preserve minority languages.

To support his research, we released an alpha version of Translator Toolkit to various members of the Māori translation community in Aotearoa (New Zealand). Māori, an Eastern Polynesian language spoken predominantly in Aotearoa (New Zealand), is a good starting point because it is one of the world's 7,000 languages under threat of extinction. According to the 2006 census, 132,000 people can hold a conversation in Māori. That's roughly 24% of Māori or 4% of New Zealanders.

Dr. Keegan found that tools such as Translator Toolkit can help minority languages in several ways:
  • Translation memories and glossaries, when shared across members of a language community, can help unify the language’s written form, increasing translation speed and quality of documents published in that language and preserving the language in the long run.
  • Because computer-aided translation can improve translation speed and quality, translators become more productive. When automatic translation is available, as it is for 87 of Google Translator Toolkit's 345 languages, it increases speed further by producing instant translations that people can use as a starting point for their work. And at Google, we use these human translations to improve the translation algorithm of Google Translate over time, creating a virtuous cycle that benefits both human translators and machine translation.
  • An online presence keeps small languages relevant in the age of the Internet and globalization, encouraging their use by children, who are ultimately responsible for carrying the language to future generations.
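
To make the first point a little more concrete, here is a minimal sketch of how a shared translation memory might be consulted. It is not how Translator Toolkit is implemented: the function name `best_match`, the 0.75 similarity threshold, and the three sample entries are all assumptions chosen for illustration, and the matching relies on Python's standard `difflib` module rather than any Google API.

```python
from difflib import SequenceMatcher


def best_match(memory, segment, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None.

    `memory` maps previously translated source segments to their translations.
    The threshold is an illustrative assumption, not a Translator Toolkit setting.
    """
    best = None
    for source, target in memory.items():
        # Case-insensitive character-level similarity, from 0.0 to 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# Hypothetical memory shared by a community of English-to-Māori translators.
shared_memory = {
    "Hello": "Kia ora",
    "Welcome": "Haere mai",
    "Good morning": "Ata mārie",
}

# A near-duplicate segment reuses the wording the community already agreed on.
print(best_match(shared_memory, "Good morning!"))

# A segment unlike anything stored yields no suggestion.
print(best_match(shared_memory, "Where is the library?"))
```

Because every translator draws on the same stored pairs, a phrase tends to be written the same way each time it recurs, which is the unifying effect described in the first point above.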

Languages provide identity, pride, a sense of belonging and spiritual guidance to minority language communities. We hope that by giving both majority and minority language speakers around the world the tools to make online content accessible in their language, we will enable more people to share their culture and knowledge with others worldwide.

Ko te reo te hā te mauri o te Māoritanga
Language is the very life-breath of being Māori. 
(Māori)

Mak-muwekma mak-noono ya roote 'innutka, mak-'uyyaki_,
Nuhu, mak pekre ne tuuxi,
'At mak roote 'innutka hu_i_tak.

Our culture and our language are the way to our past,
From it we embrace the present,
And follow the road to the future.
(Muwekma Ohlone Indian tribe, original residents of San Francisco and Santa Clara Counties, California, the home of Google)