Nov 21
Machine translation services like Google Translate, Babel Fish, and many other commercial applications seem to do a pretty good job of translating formal texts like news articles and letters. However, the quality and understandability of their output diminishes when they face general ideas, emotions, or slang. The developers of these services have made great strides toward solving this problem by adding the missing "human element". For example, Google's translation software has an index of the web at its disposal. Using this enormous corpus, Google is able to provide translations based on texts that were translated by actual human beings.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments. [1]
Based on personal experience, this method appears to work pretty well... most of the time. What about translating song lyrics? Colloquial sayings or terms that never appear in formal text? Love poems? Google's clever translation algorithm fails miserably when faced with input like this. Simply substituting words from one language for another, or even substituting whole phrases, would not solve this problem, for a number of reasons. For example, if one were attempting to translate a creative text like a poem or song, the author's use of colorful language would confuse the machine translator, whereas a human translator would be able to make connections between figurative ideas and the actual text.

How can we teach computers to understand figurative language? How can we teach computers to love? Well, like Google's algorithm for language translation, we can use the content of the web to yield interesting results. Currently, Google depends on having the target language in some sort of context with the source language to complete its translation. So, for example, if we are trying to translate a Turkish sentence into English, some news outlet may have an article that has been translated into both languages by humans. Google's algorithm would detect that, and use its corpus of translated texts and other data (like a Turkish-English dictionary or a list of phrases) to piece together an English sentence. The obvious flaw in this technique is that not every piece of content on the web has been translated into the language we need (English in our example).
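
As a rough illustration of that "piece together" step, here is a minimal Python sketch. It is not Google's algorithm. The phrase table, its three Turkish-English entries, and the translate function are all made up for this example, and a real system would mine millions of phrase pairs from human-translated texts and score them statistically rather than do a greedy lookup.

```python
# Toy phrase table (hypothetical), standing in for phrase pairs that a real
# system would extract from human-translated documents.
PHRASE_TABLE = {
    ("teşekkür", "ederim"): "thank you",
    ("merhaba",): "hello",
    ("dünya",): "world",
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match phrase substitution (an assumed, simplified scheme)."""
    tokens = sentence.lower().split()
    longest = max(len(phrase) for phrase in table)
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            # No human translation covers this token, so pass it through as-is.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


print(translate("merhaba dünya"))
print(translate("merhaba 쭉쭉빵빵"))
```

The second call shows the flaw described above: anything that never appears in the translated texts simply falls through untranslated.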

Suppose we are trying to translate a Korean phrase, "쭉쭉빵빵". According to a Korean slang dictionary I found on the web (I do not speak Korean), this phrase means "sexy", or "looks so cool". Running this phrase through Google Translate yielded this:

... not very helpful. Whenever Google is completely unable to translate something, it will simply return the Romanized text (which is what it did above).

Let's consider one type of content on the web that has been forgotten in this process: images. Images are universal and language-independent, and given a large enough sample, any human can get the general idea or "theme" behind a series of images. If we run our example Korean phrase through Google's image search, we get the following (relatively safe for work) results:

We appear to be getting a lot of images of women in our results. Not ordinary women though, sexy women. A human can intuitively infer the meaning of the phrase from this series of images.

Unfortunately, computers aren't able to "infer" the meaning behind images. However, using image fingerprinting technology such as that developed by a company called Idée (which spawned a product called TinEye), we could take the results of an image search query and approximately match them against a database of images that humans have tagged with keywords. In our example, we would most likely match an image of a scantily clad woman, which in our database could be tagged "sexy", "hot", etc., thus yielding a pretty accurate translation of the actual meaning of the word.
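
To make the idea a bit more concrete, here is a small Python sketch of the matching step. Everything in it is an assumption for illustration: the 16-bit integer "fingerprints", the tiny tagged database, the distance threshold, and the infer_tags function are all hypothetical, and none of it reflects how Idée's or TinEye's technology actually works. It only shows the general shape: fingerprint each search result, find the closest human-tagged image, and let the tags vote.

```python
from collections import Counter

# Hypothetical database of human-tagged images. Each entry pairs a made-up
# 16-bit fingerprint with the keywords people assigned to that image.
TAGGED_DB = [
    (0b1111000011110000, ("sexy", "hot")),
    (0b1111000011110011, ("sexy", "model")),
    (0b0000111100001111, ("car", "fast")),
]


def hamming(a, b):
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def infer_tags(result_fingerprints, db=TAGGED_DB, max_distance=3, top_n=2):
    """Vote on keywords using the nearest tagged image for each search result."""
    votes = Counter()
    for fp in result_fingerprints:
        nearest_fp, tags = min(db, key=lambda entry: hamming(fp, entry[0]))
        # Ignore results that do not resemble anything in the database.
        if hamming(fp, nearest_fp) <= max_distance:
            votes.update(tags)
    return [tag for tag, _ in votes.most_common(top_n)]


# Pretend these fingerprints came from the image search results for the phrase.
search_results = [0b1011000011110000, 0b1111000011110011, 0b0000111100001110]
print(infer_tags(search_results, top_n=1))
```

Voting across many results, rather than trusting a single match, is one way to cope with the odd unrelated image that turns up in any search.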

Of course, there are some problems with this approach as well. One that comes to mind is deciding which part of a sentence should be run through this image processing algorithm and which part should be translated using a context-sensitive, text-only web search. Image search results can also be highly ambiguous, and some emotions and ideas simply cannot be expressed in graphical form. Despite this method's flaws, I do believe it could be a step toward more accurate translation. It would be an interesting research topic to explore some day.


