Nov 21
Machine translation services like Google Translate, Babel Fish, and many other commercial software applications seem to do a pretty good job at translating formal texts like news articles and letters. However, the quality and understandability of their output diminishes when they're faced with translating general ideas, emotions, or slang. The developers of these services have made great strides toward solving this problem by adding the missing "human element". For example, Google's translation software has an index of the entire web at its disposal. Using this enormous corpus, Google is able to provide translations based on texts that were translated by actual human beings.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments. [1]
Based on personal experience, this method appears to work pretty well... most of the time. What about translating song lyrics? Colloquial sayings and terms that never appear in formal writing? Love poems? Google's clever translation algorithm fails miserably when faced with input like this. Simple substitution of words from one language to another, or even of whole phrases, would not solve the problem, for a number of reasons. For example, if one were attempting to translate some creative text like a poem or song, the author's use of colorful language would confuse the machine translator, whereas a human translator would be able to make connections between figurative ideas and the actual text.

How can we teach computers to understand figurative language? How can we teach computers to love? Well, like Google's algorithm for language translation, we can use the content of the web to yield interesting results. Currently, Google depends on having the target language appear in some shared context with the source language to complete its translation. So, for example, if we are trying to translate a Turkish sentence into English, some news outlet may have published an article that was translated into both languages by humans. Google's algorithm would detect that, and use its corpus of translated texts along with other data (like a Turkish-English dictionary or a list of phrases) to piece together an English sentence. The obvious flaw in this technique is that not every piece of content on the web has been translated into other languages (English, in our example).
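To make that fallback chain concrete, here is a toy sketch in Python. This is not Google's actual pipeline; the corpus, dictionary, and function names are all made up for illustration. It tries memorized human translations first, then word-level dictionary substitution, and finally gives up and echoes the romanized input, which is the failure mode we'll see shortly.

    # Toy illustration of the fallback chain described above. All data is invented.
    PARALLEL_CORPUS = {  # sentences already translated by humans somewhere on the web
        "bugün hava çok güzel": "the weather is very nice today",
    }
    DICTIONARY = {       # word-level fallback
        "hava": "weather",
        "güzel": "nice",
    }

    def romanize(text: str) -> str:
        # Stand-in for returning the untranslated (Romanized) source text.
        return text

    def translate(sentence: str) -> str:
        key = sentence.lower().strip()
        if key in PARALLEL_CORPUS:          # best case: a human already translated it
            return PARALLEL_CORPUS[key]
        words = [DICTIONARY.get(w) for w in key.split()]
        if any(words):                      # partial dictionary coverage
            return " ".join(t if t else romanize(w)
                            for w, t in zip(key.split(), words))
        return romanize(sentence)           # give up: echo the input

    print(translate("bugün hava çok güzel"))  # -> "the weather is very nice today"
    print(translate("쭉쭉빵빵"))               # -> falls all the way through to romanize()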

Suppose we are trying to translate a Korean phrase, "쭉쭉빵빵". According to a Korean slang dictionary I found on the web (I do not speak Korean), this phrase means "sexy", or "looks so cool". Running this phrase through Google Translate yielded this:
[Screenshot of the Google Translate result for the phrase]
... not very helpful. Whenever Google is completely unable to translate something, it will simply return the Romanized text (which is what it did above).

Let's consider one type of content on the web that has been forgotten in this process: images. Images are universal, language independent, and given a large enough sample size, any human can get the general idea or "theme" behind a series of images. If we run our example Korean phrase through Google's Image search engine, we get the following (relatively safe for work) results:
[Screenshots of the Google Image search results for the phrase]
We appear to be getting a lot of images of women in our results. Not ordinary women, though: sexy women. From the preceding series of images alone, we can intuitively infer the meaning of the phrase.

Unfortunately, computers aren't able to "infer" any meaning from images. However, using image fingerprinting technology such as that developed by a company called Idée (which spawned a product called TinEye), we could take the results gathered from an image search query and loosely match them against a database of images that humans have tagged with keywords. In our example, we would most likely match an image of a scantily clad woman, which in our database could be tagged "sexy", "hot", etc., yielding a pretty accurate translation of the actual meaning of the phrase.
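As a rough sketch of how that matching could work (my own toy version, not Idée's technology): assume the image search results and a small human-tagged reference set have already been downloaded to disk, and use the open-source Pillow and imagehash libraries to compute perceptual fingerprints. The hashes, tags, folder name, and distance threshold below are all made up for illustration.

    from collections import Counter
    from pathlib import Path

    from PIL import Image
    import imagehash

    # Hypothetical human-tagged reference database: perceptual hash -> keywords.
    TAGGED_DB = [
        (imagehash.hex_to_hash("ffd8c0c0c0c0f8ff"), ["sexy", "model"]),
        (imagehash.hex_to_hash("0f0f1f3f3f1f0f0f"), ["cute", "puppy"]),
    ]

    MAX_DISTANCE = 12  # Hamming-distance cutoff, tuned by hand rather than principled

    def guess_meaning(result_dir: str) -> list[str]:
        """Aggregate the tags of reference images that resemble the search results."""
        votes = Counter()
        for path in Path(result_dir).glob("*.jpg"):
            query_hash = imagehash.average_hash(Image.open(path))
            for ref_hash, tags in TAGGED_DB:
                if query_hash - ref_hash <= MAX_DISTANCE:  # hashes differ by few bits
                    votes.update(tags)
        return [tag for tag, _ in votes.most_common(3)]

    # e.g. guess_meaning("image_results/jjuk_jjuk_bbang_bbang") might return
    # ["sexy", "model"] if the reference set covers enough similar images.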

Of course, there are some problems with this approach as well. One that comes to mind is deciding which part of a sentence should be run through this image-processing algorithm and which part should be translated using a context-sensitive, text-only web search. Image search results also carry a great deal of ambiguity, and some emotions and ideas simply cannot be expressed in graphical form. Despite its flaws, I believe this method could be a step toward more accurate translations. It would be an interesting research topic to explore some day.

Sources:
[1] http://www.independent.co.uk/life-style/gadgets-and-tech/features/how-google-translate-works-2353594.html


Jun 18
I've worked in the IT industry for almost three years. When dealing with clients who are far from technically savvy, I noticed a couple of interesting things about how "regular" people attack user interfaces.

Strangely, one of the most frustrating things about how these clients interact with a user interface is that they do so very logically. A logical strategy for learning how a user interface works seems like it would be the best one. However, many user interfaces today are designed with intuition in mind rather than a logical, more procedural layout. This may sound counter-intuitive, but consider the controls of an automobile. If the layout of a car were more "logical", all controls would be labeled buttons: a button to steer left, a button to steer right. Although just about anyone would be able to figure out how to use this interface simply by reading the labels on the buttons, it lacks a sort of self-evidence. This is important because, in my observations of user interaction, people will touch objects before they read labels. This is why the concept of the steering wheel is so brilliant; it's completely axiomatic. No one in their right mind would misinterpret a steering wheel as an accelerator: it only turns left and right, instilling the concept of horizontal control. As soon as one grabs a steering wheel, it becomes clear how the automobile is steered left and right; there's no reading labels or menus, no sounding out instructions in one's head. It is an innate control. What makes it innate? My thought is that it all boils down to how humans manipulate objects in space.

Moving things typically involves picking the object up first, then dropping it somewhere else (hence the brilliance of drag-and-drop in the traditional mouse-driven user interface). It is interesting how this contrasts with a touch screen interface, however. Most people can't fathom picking up objects on a two-dimensional plane, so how would drag-and-drop work in that case? An interesting example of this is how Apple handles moving icons around on iPhone OS devices. For those unfamiliar: to move an application icon on the home screen, the user presses and holds the icon for about a second; the icon "lifts" up, and the surrounding icons shuffle out of the way as the user slides it around the screen. This is incredibly intuitive, because it's similar to how humans interact with flat objects on a two-dimensional surface, like playing cards on a desk. Where this contrasts with mouse-based drag-and-drop is that the mouse is usually not within the user's line of sight, so that level of interaction is abstracted away. That abstraction allows user interfaces on a desktop computer to be more a product of invention than a simulation of objects in the real world. This is also why I don't think virtual reality gaming will ever take off on touch screen devices.
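As a concrete sketch of that press-and-hold-then-drag pattern, here is a minimal example using Python's tkinter as a stand-in for a real touch framework; the one-second hold delay and the colour change that signals the "lift" are my own assumptions, not Apple's actual behaviour.

    import tkinter as tk

    HOLD_MS = 1000  # how long the press must last before the item "lifts"

    class HoldDragCanvas(tk.Canvas):
        def __init__(self, master):
            super().__init__(master, width=300, height=300, bg="white")
            self.icon = self.create_rectangle(40, 40, 100, 100, fill="steelblue")
            self.lifted = False
            self.hold_job = None
            self.tag_bind(self.icon, "<ButtonPress-1>", self.on_press)
            self.tag_bind(self.icon, "<B1-Motion>", self.on_drag)
            self.tag_bind(self.icon, "<ButtonRelease-1>", self.on_release)

        def on_press(self, event):
            # Start the hold timer; dragging is ignored until it fires.
            self.hold_job = self.after(HOLD_MS, self.lift_icon)

        def lift_icon(self):
            self.lifted = True
            self.itemconfigure(self.icon, fill="lightsteelblue")  # visual "lift"

        def on_drag(self, event):
            if self.lifted:  # only move after the hold completed
                self.coords(self.icon, event.x - 30, event.y - 30,
                            event.x + 30, event.y + 30)

        def on_release(self, event):
            if self.hold_job:
                self.after_cancel(self.hold_job)
            self.lifted = False
            self.itemconfigure(self.icon, fill="steelblue")  # "drop" back in place

    if __name__ == "__main__":
        root = tk.Tk()
        HoldDragCanvas(root).pack()
        root.mainloop()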

Does this mean there's more room for innovation with the old, outdated mouse-driven hardware, since there's no tangible, real-world metaphor to emulate? I think the birth of the multitouch user interface is not only innovative in the mobile space, it's also a harbinger of a better desktop UI experience. It will be interesting, however, to see what happens to the mouse five years from now.
