Scaling a web search engine:

The WWWW (World Wide Web Worm), an early web search engine, had an index of about 110,000 web pages.

The main components for building an effective search engine:

(1) A fast crawler

(2) Large storage

(3) Fast query handling

(4) A fast indexing system

Another problem:

The number of web pages keeps growing, but the number of results a person is willing to look at does not.

Google's approach:

(1) PageRank

(2) Using link information to improve the search results

**PageRank Calculation:**

The assumption:

A user starts at a random web page and keeps clicking on links, never hitting "back", but eventually gets bored and starts again at another random page.

The probability that this random surfer visits a page is that page's PageRank.

The damping factor models the surfer getting bored and starting all over again at a random page.

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section of the paper. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

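Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one. (Strictly, this holds when the (1-d) term is divided by the total number of pages N; with the formula exactly as written above, the PageRanks sum to N instead.)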
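To make the formula concrete, here is a minimal sketch in Python that applies it repeatedly until the values settle. The function name `pagerank`, the `links` dictionary, and the three-page example web are made up for illustration and do not come from the paper. The sketch also assumes that every page has at least one outgoing link, so C(T) is never zero.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    # Start every page with the same rank.
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over all pages T that link to this page.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
```

In this small example C ends up with the highest rank because both A and B link to it, and B ends up lowest because it only receives half of A's vote. The three values add up to 3, the number of pages; dividing each one by the number of pages gives values that add up to one.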

The Google architecture overview:
