Friday, February 7, 2014

Probabilistic Information Retrieval and other models

Muddiest Points:
1. What are the differences between the Bayes model and Bayesian network models?
2. Which IR models are most closely related to other similar methods?
3. What are the disadvantages of choosing probabilistic models?

The probability models:
The chain rule for joint probabilities: P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
From these we can derive Bayes’ Rule for inverting conditional probabilities:
Bayes' Rule:
P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|¬A)P(¬A)]
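
As a quick numeric illustration, here is a minimal Python sketch of this inversion. The function name and every probability value below are invented for the example:

    # A minimal sketch of Bayes' Rule: inverting a conditional probability.
    # All names and numbers here are invented for illustration.
    def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
        """Return P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]."""
        numerator = p_b_given_a * p_a
        evidence = numerator + p_b_given_not_a * (1.0 - p_a)  # P(B), by total probability
        return numerator / evidence

    # Prior P(A) = 0.3, likelihoods P(B|A) = 0.8 and P(B|not A) = 0.1:
    print(bayes_posterior(p_b_given_a=0.8, p_a=0.3, p_b_given_not_a=0.1))  # ~0.774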

The 1/0 loss case
In the simplest case of the PRP, there are no retrieval costs or other utility concerns that would differentially weight actions or errors. You lose a point either for returning a nonrelevant document or for failing to return a relevant document (such a binary situation, where you are evaluated on your accuracy, is called 1/0 loss). The goal is to return the best possible results as the top k documents, for any value of k the user chooses to examine. The PRP then says to simply rank all documents in decreasing order of P(R = 1|d, q). If a set of retrieval results is to be returned rather than an ordering, the Bayes Optimal Decision Rule is to return exactly those documents that are more likely relevant than nonrelevant, i.e., those d with P(R = 1|d, q) > P(R = 0|d, q). With differential retrieval costs, let C1 be the cost of not retrieving a relevant document and C0 the cost of retrieving a nonrelevant document; the PRP then says d should be the next document retrieved if, for all documents d′ not yet retrieved:

C0 · P(R = 0|d) − C1 · P(R = 1|d) ≤ C0 · P(R = 0|d′) − C1 · P(R = 1|d′)
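
To make the ranking and decision rules concrete, here is a hedged Python sketch; the document ids, the probability estimates, and the cost values are all invented:

    # PRP under 1/0 loss: rank documents by estimated P(R = 1|d, q).
    docs = {"d1": 0.9, "d2": 0.4, "d3": 0.7}  # d -> invented P(R = 1|d, q)
    ranking = sorted(docs, key=docs.get, reverse=True)
    print(ranking)  # ['d1', 'd3', 'd2']

    # Bayes optimal set under 1/0 loss: keep d iff P(R = 1|d, q) > P(R = 0|d, q).
    print([d for d, p in docs.items() if p > 1 - p])  # ['d1', 'd3']

    # With costs C1 (missing a relevant doc) and C0 (returning a nonrelevant
    # doc), retrieve next the d minimizing C0 * P(R = 0|d) - C1 * P(R = 1|d).
    C0, C1 = 1.0, 2.0
    print(min(docs, key=lambda d: C0 * (1 - docs[d]) - C1 * docs[d]))  # 'd1'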

The Binary Independence Model
We first present a model which assumes that the user has a single-step information need. As discussed in Chapter 9, seeing a range of results might let the user refine their information need. Fortunately, as mentioned there, it is straightforward to extend the Binary Independence Model so as to provide a framework for relevance feedback; that extended model is presented later.

Since each xt is either 0 or 1, we can separate the terms to give:
O(R|~x,~q) = O(R|~q) · ∏_{t: xt=1} [P(xt = 1|R = 1,~q) / P(xt = 1|R = 0,~q)] · ∏_{t: xt=0} [P(xt = 0|R = 1,~q) / P(xt = 0|R = 0,~q)]
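
A small Python sketch of how these per-term odds factors combine. The terms, the query, and the probabilities p[t] = P(xt = 1|R = 1,~q) and u[t] = P(xt = 1|R = 0,~q) are all invented for illustration:

    # Invented per-term probabilities for a toy query vocabulary.
    p = {"rain": 0.6, "umbrella": 0.7, "sun": 0.2}  # P(xt = 1 | R = 1, ~q)
    u = {"rain": 0.3, "umbrella": 0.1, "sun": 0.4}  # P(xt = 1 | R = 0, ~q)

    def doc_odds(doc_terms, query_terms, prior_odds=1.0):
        """Multiply O(R|~q) by the xt = 1 and xt = 0 factors from the formula."""
        odds = prior_odds
        for t in query_terms:
            if t in doc_terms:                       # xt = 1 factor
                odds *= p[t] / u[t]
            else:                                    # xt = 0 factor
                odds *= (1.0 - p[t]) / (1.0 - u[t])
        return odds

    print(doc_odds({"rain", "umbrella"}, ["rain", "umbrella", "sun"]))  # ~18.67

Restricting the product to query terms reflects the usual BIM assumption that pt = ut for terms not in the query, so those factors cancel.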

For trials with categorical outcomes (such as noting the presence or absence of a term), one way to estimate the probability of an event from data is simply to count the number of times the event occurred divided by the total number of trials. This is referred to as the relative frequency of the event. Estimating the probability as the relative frequency gives the maximum likelihood estimate (or MLE), because this value makes the observed data maximally likely. Adding ½ to each observed count is a simple form of smoothing, which keeps rare or unseen events from being assigned zero probability.
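
A short Python sketch contrasting the MLE with the add-½ estimate; the counts are made up:

    def mle(successes, trials):
        """Relative frequency: the value that makes the observed data most likely."""
        return successes / trials

    def add_half(successes, trials):
        """Add 1/2 to each of the two outcome counts before taking the ratio."""
        return (successes + 0.5) / (trials + 1.0)

    print(mle(3, 10), add_half(3, 10))  # 0.3  ~0.318
    print(mle(0, 10), add_half(0, 10))  # 0.0  ~0.045 -- smoothing avoids the zero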

Finite automata and language models
If instead each node has a probability distribution over generating different terms, we have a language model. The notion of a language model is inherently probabilistic: a language model is a function that puts a probability measure over strings drawn from some vocabulary. That is, for a language model M over an alphabet Σ, the probabilities of all strings sum to one: ∑s∈Σ* P(s) = 1.

Under the unigram language model the order of words is irrelevant, and so such models are often called "bag of words" models, as discussed in Chapter 6 (page 117). Even though there is no conditioning on preceding context, this model nevertheless still gives the probability of a particular ordering of terms. However, any other ordering of this bag of terms will have the same probability. So, really, we have a multinomial distribution over words. So long as we stick to unigram models, the language model name and motivation could be viewed as historical rather than necessary.
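
The following Python sketch shows the "bag of words" property directly: a toy corpus (invented here) yields MLE unigram probabilities, and any reordering of a word sequence gets the same probability:

    import math
    from collections import Counter

    corpus = "the cat sat on the mat".split()      # invented toy corpus
    counts = Counter(corpus)
    total = sum(counts.values())
    P = {w: c / total for w, c in counts.items()}  # MLE unigram probabilities

    def seq_prob(words):
        """P(w1 ... wn) = P(w1) * ... * P(wn); word order never matters."""
        return math.prod(P.get(w, 0.0) for w in words)

    print(seq_prob(["the", "cat", "sat"]))  # identical to...
    print(seq_prob(["sat", "the", "cat"]))  # ...any other ordering of the bag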


