Thursday, March 27, 2014

Cross Language Information Retrieval

1. Major Approaches and Challenges in CLIR
Translation mistakes
Ambiguity in translation

Three major challenges in CLIR:
1. What to translate
2. How to obtain translation knowledge
3. How to apply the translation knowledge

Identifying the translation unit:
Three types of resources:
(1) bilingual dictionaries
(2) corpora
(3) word translation and phrase translation

Tokenization:
The process of recognizing words. This is more difficult for Chinese, which is written without spaces between words.
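The notes do not say which segmentation method is meant. As one illustration, here is a minimal sketch of greedy forward maximum matching, a common dictionary-based way to segment Chinese text. The function name `fmm_segment`, the `max_len` window, and the toy dictionary are hypothetical.

```python
def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Toy dictionary, for illustration only.
vocab = {"信息", "检索", "跨语言", "语言"}
print(fmm_segment("跨语言信息检索", vocab))
```

Greedy matching is fast but can choose a wrong boundary when two dictionary words overlap; statistical segmenters are often used instead.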


Stemming:
Three types of resources: corpora, part-of-speech tagging, and syntactic parsing.
To identify phrases with the bigram method, split the text into word bigrams, rank the bigrams in decreasing order of score, and treat the highest-scoring bigrams as phrases.
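A minimal sketch of the bigram ranking idea, assuming raw bigram frequency as the score (the notes do not say which score is used; association measures such as mutual information are common alternatives). `top_bigram_phrases` and the cutoff `k` are hypothetical.

```python
from collections import Counter

def top_bigram_phrases(tokens: list[str], k: int = 3) -> list[tuple[str, str]]:
    """Rank adjacent word pairs by decreasing frequency and keep the top k.

    Raw frequency is an assumed score; ties are broken alphabetically so
    the output is deterministic.
    """
    counts = Counter(zip(tokens, tokens[1:]))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]

text = ("cross language retrieval uses machine translation and "
        "machine translation helps cross language retrieval")
print(top_bigram_phrases(text.split()))
```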

Stop-words:


Obtaining Translation Knowledge: 
This involves acquiring translation resources and extracting the translation knowledge from them.

Obtaining bilingual resources (dictionaries and corpora):

Extracting the translation knowledge:
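One well-known way to extract word translations from a sentence-aligned parallel corpus is co-occurrence scoring with the Dice coefficient, 2·c(s,t) / (c(s) + c(t)), where the counts are numbers of aligned sentence pairs containing the words. The sketch below assumes that technique; it is not necessarily the method these notes refer to, and `dice_translations`, `min_score`, and the toy English-French corpus are hypothetical.

```python
from collections import Counter
from itertools import product

def dice_translations(pairs, min_score=0.5):
    """Score source/target word pairs by the Dice coefficient over aligned
    sentence pairs and return, for each source word, its candidate
    translations sorted by decreasing score."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src), set(tgt)  # count each word once per sentence
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    table = {}
    for (s, t), c in joint.items():
        score = 2 * c / (src_count[s] + tgt_count[t])
        if score >= min_score:
            table.setdefault(s, []).append((t, score))
    for candidates in table.values():
        candidates.sort(key=lambda x: (-x[1], x[0]))
    return table

corpus = [
    ("the house".split(), "la maison".split()),
    ("the car".split(), "la voiture".split()),
    ("a house".split(), "une maison".split()),
]
table = dice_translations(corpus)
print(table["house"])
```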

Dealing with out-of-corpus terms:
Transliteration: orthographic mapping (for languages that share the same alphabet) and phonetic mapping (for languages with different writing systems).
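For orthographic mapping, one simple approach (assumed here, not taken from the notes) is to match a name against target-language candidates by length-normalized edit distance, since names in languages sharing an alphabet are often spelled similarly. `orthographic_match`, the `max_ratio` threshold, and the candidate list are hypothetical. Phonetic mapping across scripts would need a sound-based representation and is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def orthographic_match(term, candidates, max_ratio=0.34):
    """Return the candidate with the smallest normalized edit distance to
    term, or None if no candidate is within max_ratio."""
    best, best_ratio = None, None
    for cand in candidates:
        length = max(len(term), len(cand), 1)  # avoid dividing by zero
        ratio = edit_distance(term.lower(), cand.lower()) / length
        if ratio <= max_ratio and (best_ratio is None or ratio < best_ratio):
            best, best_ratio = cand, ratio
    return best

print(orthographic_match("Gorbachev", ["Gorbatchev", "Gorki", "Moscou"]))
```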
Backoff Translation:
(1) match the surface form of the input term
(2) match the stem form of the input term
(3) use web mining to help find translations for the remaining terms
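The backoff steps can be sketched as a chain of lookups that stops at the first one that succeeds. This is a minimal sketch: `crude_stem` is a hypothetical stand-in for a real stemmer (such as Porter's), and `web_lookup` is a hypothetical callable standing in for whatever web mining step is used.

```python
from typing import Callable, Optional

def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips "ing" or a plural "s"."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def backoff_translate(
    term: str,
    dictionary: dict[str, list[str]],
    stem: Callable[[str], str] = crude_stem,
    web_lookup: Optional[Callable[[str], list[str]]] = None,
) -> list[str]:
    # (1) Match the surface form exactly.
    if term in dictionary:
        return dictionary[term]
    # (2) Match the stem of the term against the stems of dictionary entries.
    #     If two entries share a stem, the later one wins in this sketch.
    stemmed = {stem(entry): translations for entry, translations in dictionary.items()}
    if stem(term) in stemmed:
        return stemmed[stem(term)]
    # (3) Fall back to web mining, represented here by an optional callable.
    if web_lookup is not None:
        return web_lookup(term)
    return []

bilingual = {"house": ["maison"], "car": ["voiture"]}
print(backoff_translate("houses", bilingual))
```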
Pre-translation query expansion: expand the query with related source-language terms before translating it.
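A minimal sketch of pre-translation expansion via pseudo-relevance feedback: assume the top-ranked source-language documents are relevant and append their most frequent new terms to the query before it is translated. The overlap-based ranking, `pre_translation_expand`, and its parameters are hypothetical simplifications.

```python
from collections import Counter

def pre_translation_expand(query, collection, top_docs=2, n_terms=2):
    """Pseudo-relevance feedback in the source language, run before translation.

    Documents are ranked by how many query terms they contain (a deliberately
    crude score); the most frequent non-query terms in the top documents are
    appended to the query.
    """
    q = set(query)
    ranked = sorted(collection, key=lambda doc: -len(q & set(doc)))  # stable for ties
    feedback = Counter()
    for doc in ranked[:top_docs]:
        feedback.update(t for t in doc if t not in q)
    ranked_terms = sorted(feedback.items(), key=lambda item: (-item[1], item[0]))
    return list(query) + [t for t, _ in ranked_terms[:n_terms]]

docs = [
    "solar energy panels cost".split(),
    "solar energy panels efficiency".split(),
    "football match results".split(),
]
print(pre_translation_expand(["solar", "energy"], docs))
```

The expanded query gives the translation step more context, which can help when some original terms are missing from the bilingual resources.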

Using Translation Knowledge: 
Translation Disambiguation:

Weighting Translation Alternatives:
TF-IDF weighting
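For reference, a minimal TF-IDF sketch using raw term frequency times log(N / df). Many variants exist, and the notes do not specify how these weights are combined across translation alternatives. `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    weight = raw term frequency * log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts once per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [["maison", "maison", "rouge"], ["maison", "bleue"]]
w = tf_idf(docs)
print(w[0])
```

A term that occurs in every document, such as "maison" here, gets weight 0, so this weighting favours the more discriminative terms.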

Evaluation: the Cranfield method and evaluation of interactive systems:
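The Cranfield method evaluates systems on a fixed test collection: a document set, a set of queries, and relevance judgments for each query. Here is a minimal sketch of two common measures computed from such judgments, precision at k and average precision; the function names and the toy run are hypothetical.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of the precision values at each rank where a relevant document
    appears, divided by the number of relevant documents (so relevant
    documents that are never retrieved contribute 0)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, 1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

run = ["d3", "d1", "d7", "d2"]
qrels = {"d1", "d2"}
print(precision_at_k(run, qrels, 2), average_precision(run, qrels))
```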





