1. Major Approaches and Challenges in Cross-Language Information Retrieval (CLIR)
Translation errors
Ambiguities exist in translation
Three major challenges in CLIR:
1. What to translate
2. How to obtain translation knowledge
3. How to apply the translation knowledge
Identify the translation unit:
Three types of resources:
(1) bilingual dictionaries
(2) corpora
(3) word translation and phrase translation
Tokenization:
Recognizing the word boundaries in a text. This is harder for Chinese, which is written without spaces between words.
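The notes do not name a Chinese segmentation method. A minimal sketch, assuming forward maximum matching (one classic dictionary-based approach): at each position, take the longest lexicon entry that fits, falling back to a single character. The function name and the toy lexicon are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry (up to max_len characters), else a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted so the loop always advances.
        for j in range(min(len(text), i + max_len), i, -1):
            candidate = text[i:j]
            if candidate in lexicon or j == i + 1:
                tokens.append(candidate)
                i = j
                break
    return tokens

# Hypothetical toy lexicon, for illustration only.
LEXICON = {"信息", "检索", "跨语言", "语言"}
print(forward_max_match("跨语言信息检索", LEXICON))  # → ['跨语言', '信息', '检索']
```

Greedy matching is fast but can mis-segment when a longer lexicon entry overlaps the true boundary; statistical segmenters address that at higher cost.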
Stemming:
Three types of techniques for identifying phrases: corpus statistics, part-of-speech tagging, and syntactic parsing
Bigram method: score the word bigrams, rank them in decreasing order of score, and recognize the highest-scoring bigrams as phrases.
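The bigram ranking step can be sketched as follows. The notes do not say which score is used; pointwise mutual information (PMI) is an assumption here, and the `min_count` cut-off is a hypothetical parameter that discards pairs too rare to score reliably.

```python
import math
from collections import Counter

def rank_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by PMI and return them sorted in decreasing
    order of score. Both probabilities are approximated with corpus size n."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:  # skip rare pairs
            continue
        # PMI = log( P(w1 w2) / (P(w1) P(w2)) ) with P(x) ~ count / n
        pmi = math.log((c * n) / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)

tokens = "new york city new york state new york times the city the state".split()
ranked = rank_bigrams(tokens)
phrases = [pair for pair, _ in ranked[:5]]  # highest-scoring pairs become phrases
print(phrases)
```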
Stop-words:
Obtaining Translation Knowledge:
Two steps: acquire the translation resources, then extract the translation knowledge from them
Obtaining bilingual resources (dictionaries and corpora):
Extracting the translation knowledge:
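The notes do not say how translation knowledge is extracted from a corpus. As one illustration, assuming a sentence-aligned parallel corpus, word translation probabilities can be estimated with a simplified IBM Model 1 (expectation-maximization, no NULL word). The function name and toy sentence pairs are hypothetical.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e), the probability that source word e translates to
    target word f, from sentence-aligned (source, target) token lists."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e), for normalizing
        for es, fs in pairs:
            for f in fs:
                # E-step: share f's alignment mass among the source words
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize so that t(. | e) sums to 1 for each e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Toy English-German sentence pairs, for illustration only.
PAIRS = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = ibm_model1(PAIRS)
print(round(t[("das", "the")], 2), round(t[("haus", "the")], 2))
```

Co-occurrence across sentence pairs is what drives the estimates: "das" appears with "the" twice, so it gradually absorbs most of the probability mass for "the".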
Dealing with out-of-corpus terms:
Transliteration: orthographic mapping (when the two languages share the same alphabet) and phonetic mapping.
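Orthographic mapping can be sketched as finding the target-language word spelled most like the untranslatable term. The similarity measure (Levenshtein edit distance), the `max_ratio` threshold, and the function names are assumptions for illustration, not the method from the notes.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def orthographic_match(term, target_vocab, max_ratio=0.3):
    """Return the target word closest in spelling to `term`, or None if even
    the best candidate differs in more than max_ratio of its characters."""
    term = term.lower()
    best = min(target_vocab, key=lambda w: edit_distance(term, w.lower()), default=None)
    if best is None:
        return None
    ratio = edit_distance(term, best.lower()) / max(len(term), len(best))
    return best if ratio <= max_ratio else None

print(orthographic_match("Tolstoy", ["tolstoi", "dostojewski", "moskau"]))  # → tolstoi
```

Phonetic mapping needs a different comparison (for example, mapping both words to sound representations first), since languages with different scripts share no characters.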
Backoff translation: (1) match the surface form of the input term; (2) match the stem form of the input term; (3) use web mining to help identify the translation.
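The backoff order can be sketched as a chain of progressively looser dictionary lookups. The dictionary, the one-rule `toy_stem` function, and the `web_lookup` hook are hypothetical stand-ins; the web-mining step itself is not implemented here.

```python
def toy_stem(word):
    """Hypothetical one-rule stemmer: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") else word

def backoff_translate(term, dictionary, stem, web_lookup=None):
    """Try progressively looser matches against a bilingual dictionary."""
    # (1) surface form, exactly as typed
    if term in dictionary:
        return dictionary[term]
    # (2) stem form: compare the stemmed term with stemmed dictionary entries
    stemmed = stem(term)
    for entry, translations in dictionary.items():
        if stem(entry) == stemmed:
            return translations
    # (3) web mining is out of scope; defer to an optional caller-supplied hook
    if web_lookup is not None:
        return web_lookup(term)
    return []

DICTIONARY = {"document": ["documento"], "retrieval": ["recuperación"]}
print(backoff_translate("documents", DICTIONARY, toy_stem))  # → ['documento']
```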
Pre-translation query expansion: expanding the source-language query before it is translated
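The notes do not say how the source query is expanded. A minimal sketch, assuming pseudo-relevance feedback: treat the top `k` documents retrieved in the source language as relevant and add their most frequent new terms to the query, which is then translated. The parameters `k` and `n_terms` and the toy documents are hypothetical.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=3, n_terms=2, stopwords=frozenset()):
    """Pseudo-relevance feedback: add the n_terms most frequent terms from
    the top-k ranked documents that are not already in the query."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query_terms and t not in stopwords)
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

DOCS = [
    ["ozone", "layer", "depletion", "cfc"],
    ["ozone", "cfc", "hole"],
    ["ozone", "depletion", "cfc"],
    ["football"],
]
print(expand_query(["ozone"], DOCS))  # → ['ozone', 'cfc', 'depletion']
```

The added terms give the translation step more context, so a term the dictionary cannot translate may still be covered by a related one.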
Using Translation Knowledge:
Translation Disambiguation:
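One way to disambiguate, sketched here as an assumption since the notes name no method, is to pick the combination of translations that co-occur most often in a target-language corpus. The co-occurrence counts below are made up for illustration, and the exhaustive search only suits short queries.

```python
from itertools import product

def disambiguate(candidates, cooccur):
    """Choose one translation per query term, maximizing the summed pairwise
    co-occurrence of the chosen translations.

    candidates: one list of translation alternatives per source term.
    cooccur: function (w1, w2) -> co-occurrence score in the target corpus.
    """
    def coherence(choice):
        return sum(cooccur(a, b) for i, a in enumerate(choice) for b in choice[i + 1:])
    # Exhaustive search: exponential in query length.
    return max(product(*candidates), key=coherence)

# Made-up co-occurrence counts, for illustration only.
COUNTS = {("banco", "interés"): 50, ("orilla", "interés"): 2, ("banco", "afición"): 1}

def cooccur(a, b):
    return COUNTS.get((a, b), 0) + COUNTS.get((b, a), 0)

# Query "bank interest": each English term has two Spanish alternatives.
print(disambiguate([["banco", "orilla"], ["interés", "afición"]], cooccur))
# → ('banco', 'interés')
```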
Weighting translation alternatives:
TF-IDF weighting
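Assuming the weighting scheme here is TF-IDF, one way to combine it with translation alternatives is to let each alternative contribute its TF-IDF weight scaled by a translation probability. The probability scaling, the data layout, and the toy collection are assumptions for illustration.

```python
import math
from collections import Counter

def weighted_tfidf_score(query_translations, doc, docs):
    """Score one target-language document for a translated query.

    query_translations: {source_term: {translation: probability}}
    doc: list of tokens; docs: the whole collection (list of token lists).
    """
    n = len(docs)
    tf = Counter(doc)
    score = 0.0
    for alternatives in query_translations.values():
        for word, p in alternatives.items():
            df = sum(1 for d in docs if word in d)
            if df == 0 or tf[word] == 0:
                continue
            # Unlikely alternatives (small p) contribute less to the score.
            score += p * tf[word] * math.log(n / df)
    return score

DOCS = [["banco", "interés", "tipo"], ["orilla", "río"], ["banco", "crédito"], ["fútbol"]]
QUERY = {"bank": {"banco": 0.7, "orilla": 0.3}, "interest": {"interés": 0.9, "afición": 0.1}}
scores = [weighted_tfidf_score(QUERY, d, DOCS) for d in DOCS]
print([round(s, 3) for s in scores])
```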
Evaluation: the Cranfield method, and evaluation of interactive systems:
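Cranfield-style evaluation scores a system's rankings against a fixed test collection of documents, queries, and relevance judgments. The notes do not name a metric; average precision, averaged over queries as MAP, is used here as one common choice. The document IDs are hypothetical.

```python
def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, 1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # → 0.5
```

Relevant documents that are never retrieved still count in the denominator, which is why the function divides by `len(relevant)` rather than by the number of hits.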