CALENDAR OF EVENTS

 


 

When: Thursday, May 15, 09:00 a.m.

Where: 3002 Newell-Simon Hall

Jae Dong Kim

LTI PhD Thesis Proposal

Abstract:
In general, corpus-based machine translation systems prefer longer units because they naturally convey local context and local reordering. Our lexical Example-Based Machine Translation (EBMT) system likewise uses long matches of the input to preserve local context and reordering. However, the score it computes to find a target phrase for a given match was based on heuristics, and we needed a more mathematically grounded model.

On the other hand, analyzing sentences into chunks instead of N-gram phrases may help a translation system in several ways. Because there are fewer translation units per sentence, there is less distortion (reordering) to be reckoned with, and hence less noise is to be expected from the mathematical modeling techniques. Another advantage is that we can, to some degree, systematically translate tokens that exist on only one side and therefore have no direct translation. For example, when translating an English sentence into Korean, word-to-word translation systems cannot produce a Korean nominative case marker unless human experts supply rules or the systems "hallucinate" markers and use language modeling to guess whether the case marker should in fact be present.

In this proposal, we show how our new phrasal aligner, SPA, improved the system and discuss what remains to be investigated for SPA and for a chunk-based system. For the chunk-based system, we propose methods for chunk alignment using SPA and other SMT techniques, chunk generalization with chunk labels, and the use of chunks as the basic translation unit in conjunction with a word-based system as a back-off model.
