Tuesday, March 25, 2014

Multilingual Information Access

This essay is mainly about multilingual retrieval models, specifically the cross-language retrieval model.

The model helps users search text written in languages other than the one they query in.

Benefits for polyglots:
Polyglots can benefit from multilingual information access (MLIA) in at least three ways: (1) they can find documents in more than one language with a single search, (2) they can formulate queries in the language(s) for which their active vocabulary is largest, and (3) they can move more seamlessly across languages over the course of an information-seeking episode than would be possible if documents written in different languages were available only from different information systems.

Background: why people want multilingual search:
1. The Cold War
2. World War II
3. The emergence of the World Wide Web
4. IBM: machines can learn to translate through statistical analysis.

Cross-language information retrieval:
Preparing documents for retrieval involves three steps: (1) language and character set identification, (2) language-specific processing, and (3) construction of an “inverted index” that allows rapid identification of which documents contain specific terms. Sometimes the language in which a document is written and the character set used to encode it can be inferred from its source (e.g., New York Times articles are almost always written in English, and typically encoded in ASCII), and sometimes the language and character set are indicated using metadata (e.g., the HTML standard used for Web pages provides metadata fields for these purposes).
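To make step (3) concrete, here is a minimal sketch of an inverted index in Python. The function names and the two sample documents are made up for illustration, and the simple lowercase-and-split tokenizer only stands in for real language-specific processing, which would differ from language to language.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    This is a crude stand-in for language-specific processing.
    """
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "Statistical machine translation learns from text",
    "d2": "Search engines index text in many languages",
}

index = build_inverted_index(docs)
print(sorted(index["text"]))         # ['d1', 'd2']
print(sorted(index["translation"]))  # ['d1']
```

Looking up a term is a single dictionary access, which is why the index allows rapid identification of the documents that contain it.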

The distinguishing techniques of cross-language information retrieval:
The system can (1) translate each term using the context in which that word appears to help select the right translation, (2) count the terms and then translate the aggregate counts without regard to the context of individual occurrences, or (3) compute some more sophisticated aggregate “term weight” for each term, and then translate those weights.
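Option (2) can be sketched as follows: count the terms in the document language first, then spread each count over the possible translations. The translation table, its probabilities, and the function name are all hypothetical values chosen for illustration; a real system would estimate such probabilities from data.

```python
from collections import Counter, defaultdict

# Hypothetical toy table: Spanish term -> {English term: translation probability}.
TRANSLATIONS = {
    "banco": {"bank": 0.75, "bench": 0.25},
    "río": {"river": 1.0},
}


def translate_counts(term_counts, table):
    """Distribute each source-term count over its translations.

    The context of individual occurrences is ignored: only the
    aggregate count of each term is translated.
    """
    translated = defaultdict(float)
    for term, count in term_counts.items():
        for target, prob in table.get(term, {}).items():
            translated[target] += count * prob
    return dict(translated)


# Term counts for one made-up Spanish document.
doc_counts = Counter(["banco", "banco", "río", "banco", "banco"])
english_counts = translate_counts(doc_counts, TRANSLATIONS)
print(english_counts)  # {'bank': 3.0, 'bench': 1.0, 'river': 1.0}
```

Option (3) would follow the same pattern, with a precomputed term weight passed in where the raw count is used here. Terms missing from the table are simply dropped in this sketch.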

Problems that still exist in cross-language search:
1. The egg or the basket problem
2. Present ranking problem
3. The similar drives for the development of the new search engine
