Tuesday, March 18, 2014

The Anatomy of a Large-Scale Hypertextual Web Search Engine

A case study of how to build an effective search engine, using Google as the example.

Scaling a web search engine:
The World Wide Web Worm (WWWW), one of the first web search engines, had an index of about 110,000 web pages.

The main components for building an effective search engine:
(1) A fast crawler
(2) Large storage
(3) Fast query handling
(4) A fast indexing system

Another problem:
The number of web pages keeps growing, but the number of results people are willing to look at has not changed.

Google's key techniques:
(1) PageRank
(2) Using links to improve the search results

PageRank calculation:
The assumption:
A user starts on a random web page and keeps clicking on links, never hitting back, but eventually has to start again on another random page.
The probability that this random surfer visits a page is its PageRank.

The damping factor models the surfer eventually getting bored and starting all over again.
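The random-surfer assumption can be simulated directly. The following is only a rough sketch: the toy `graph`, the function name `random_surfer`, and the choice to read d as the probability of continuing to follow a link (0.85 here) are assumptions made for illustration, not details from the paper.

```python
import random

def random_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate how often a random surfer visits each page.

    links maps every page to the list of pages it links to.
    Assumption of this sketch: every link target is also a key of links.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            # Keep surfing: follow one outgoing link at random
            current = rng.choice(links[current])
        else:
            # Bored (or no outgoing links): start again on a random page
            current = rng.choice(pages)
    # Convert visit counts to visit frequencies
    return {p: visits[p] / steps for p in pages}

# Hypothetical three-page web
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
freq = random_surfer(graph)
```

Pages that receive more link weight (here page C, linked from both A and B) end up with a higher visit frequency, which is the intuition behind PageRank.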

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d
is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are
more details about d in the next section. Also C(A) is defined as the number of links going
out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
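Note that the PageRanks form a probability distribution over web pages, so the sum of all
web pages’ PageRanks will be one. (With the formula exactly as written above, the values sum to
the number of pages N when every page has outgoing links; dividing each value by N gives a sum of one.)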
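The formula can be evaluated by repeated substitution until the values settle. Below is a minimal sketch under stated assumptions: the function name `pagerank`, the toy `graph`, the starting value of 1.0 per page, and the fixed iteration count are all illustrative choices, and each page is assumed to list a given target at most once.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    # C(T): number of links going out of page T
    out_count = {p: len(links.get(p, [])) for p in pages}
    # Assumed starting point: every page begins with rank 1.0
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Sum PR(T)/C(T) over every page T that links to A
            incoming = sum(
                pr[t] / out_count[t] for t in pages if a in links.get(t, [])
            )
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical three-page web
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)

# With this form of the formula the values on this toy graph sum to the
# number of pages; dividing by that number gives values that sum to one.
normalized = {p: r / len(ranks) for p, r in ranks.items()}
```

On this toy graph page C ranks highest, A second, and B lowest, since B is only reached through half of A's links.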

The Google architecture overview:
