Research on Node Ranking – Peer to Peer …. Hoàng Cường
Chapter 1
Table of Contents
Hoàng Cường 1
HÀ NỘI - 2010 1
1.1.2 Architecture of P2P systems 10
Simplified algorithm 16
Damping factor 17
Motivation 22
Mathematical details 22
Example 23
multi-document summarization, which brings out salient
sentences from sentence graphs; 29
mining symbolizeative experts in the data-mining commu-
nity from co-author graphs. In comparison to the state-of-
the-art graphed based methods, our method produces su-
perior results in these applications 29
We propose a framework based on an exact and an ap-
proximate solution to estimate PageRank on a sub-graph.
The Rank_local_idea algorithm is an exact solution. It as-
sumes that the PageRank scores of outside data are known.
We prove that the Rank_local_idea scores for data in the
sub-graph converge to the true PageRank scores 35
Since the PageRank scores of outside data may not typi-
cally be available, we propose the Pagerank_local algo-
rithm to estimate PageRank scores for the sub-graph. Both
iii
Research on Node Ranking – Peer to Peer …. Hoàng Cường
Rank_local_idea and Pagerank_local symbolize the set of
outside data with an outside node graph EG and extend
the sub-graph with links to graph EG. They also modify
the PageRank transition matrix with respect to (the links
to) graph EG. 35
The Rank_local_idea and Pagerank_local framework for-
malizes the problem of ranking a sub-graph. It allows us to
model multiple scenarios where ranking a sub-graph is im-
portant. Rank_local_idea can be used to model scenarios
where PageRank scores of the global graph are known a
priori and can potentially be re-used. This includes the
case where the sub-graph symbolizes the data that have
been updated, or the sub-graph symbolizes the data that
contain all the semantic types of interest to a domain ex-
pert for a personalized or semantic ranking such as Objec-
tRank. Pagerank_local can be applied in general to all
these problems, when we do not know the PageRank scores
of outside data. 35
We compare our approach with the stochastic comple-
mentation (SC) approach. SC builds a super-graph by
carefully examining candidate outside data and adding
them into the super-graph if adding this page has a signifi-
cant influence on the PageRank scores of the sub-graph.
Our approach in contrast models the outside data using a
iv
Research on Node Ranking – Peer to Peer …. Hoàng Cường
node graph EG, and it can be used in situations when a su-
per-graph cannot be obtained easily. Our approach also
avoids the cost of a global computation. The Pagerank_lo-
cal computation is also much cheaper than SC since SC
bears the high cost of constructing the super-graph. 35
We experimentally study the effect of size and type of the
sub-graphs on the accuracy of Pagerank_local. We study
several types of sub-graphs including domain specific sub-
graphs, topic specific sub-graphs, and sub-graphs gathered
by a Breadth First Search crawler. We compare our re-
sults with SC and two baseline ranking algorithms; one
was discussed in [18], and the other is local PageRank on
the sub-graph (ignoring the outside data). We show that
Pagerank_local has similar or superior ranking accuracy
to SC and typically its runtime performance is an order of
magnitude better than SC. Pagerank_local also outper-
forms the two baseline algorithms on ranking accuracy.
Our contributions are as follows: 35
Sum = 0.206 + 0.2862 = 0.4922 44
E = ( , = (0.418529053, 0.581470947) 44
Sum = 0.4922 46
1.http://www.google.com Google. Google search engine 57
3.Amy N. Langville & Carl D. Meyer (2006) Google's PageRank and Beyond: The Science of
Search Engine Rankings information retrieval. Research and Development in Information Retrieval,
pp. 334-342 57
v
Research on Node Ranking – Peer to Peer …. Hoàng Cường
List Images
vi
Research on Node Ranking – Peer to Peer …. Hoàng Cường
List tables
Table 3.2.1: The Pagerank converge and HITS converge Error: Reference source not
found
Chapter 1:
7
Research on Node Ranking – Peer to Peer …. Hoàng Cường
Peer to Peer and Ranking Problem
1.1. Peer to Peer
A peer-to-peer, commonly abbreviated to P2P, is any distributed network archi-
tecture conceive in associate that make a portion of their resources (such as processing
power, disk storage or network bandwidth) directly available to other network partners,
without the need for central coordination instances (such as servers or stable hosts).
Peers are both suppliers and consumers of resources, in contrast to the traditional client–
server model where only servers put out, and clients snack.
Peer-to-peer was popularized by file sharing systems like Napster. File sharing is
the practice of distributing or providing access to digitally stored information, such as
computer programs, multi-media (audio, video), documents, or electronic books. It may
be implemented through a variety of storage, transmission, and distribution models and
common methods of file sharing incorporate manual sharing using removable media,
centralized computer file server installations on computer networks, World Wide Web-
based hyperlinked documents, and the use of distributed peer-to-peer networking.
1.1.1. Peer to Peer overview
In its simplest form, a peer-to-peer (P2P) network is created when two or more
PCs are connected and share resources without going through a separate server. A P2P
network can be an ad hoc connection—a couple of computers connected via a Universal
Serial Bus to transfer files. A P2P network also can be a permanent infrastructure that
links a half-dozen computers in a small office over copper wires. Or a P2P network can
be a network on a much grander scale in which special protocols and applications set up
direct relationships among users over the Internet.
The initial use of P2P networks in business followed the deployment in the early
1980s of free-standing PCs. In contrast to the mini-mainframes of the day (e.g.
Fuitsu/ICL, IBM AS/400, IBM Mainframe, Unisys, … ), which used by over 16,000 or-
ganizations, the list, or selections from this list, will be of particular interest to those
companies who either supply medium sized to large scale systems or who offer products
and/or services related to the use of such systems; any data is supplied with the named
head of IT, as standard
8
Research on Node Ranking – Peer to Peer …. Hoàng Cường
Mini- mainframes served up word processing and other applications to dumb ter-
minals from a central computer and stored files on a central hard drive, the then-new
PCs had self-contained hard drives and built-in CPUs. The smart boxes also had on-
board applications, which meant they could be deployed to desktops and be useful with-
out an umbilical cord linking them to a mainframe.
Shared file and printer access within a local area network may either be based on
a centralized file server or print server, sometimes denoted client–server paradigm, or on
a decentralized model, denoted peer-to-peer network topology or Workgroup (computer
networking). In client–server communications, a client process on the local user com-
puter takes the initiative to start the communication, while a server process on the file
server or print server remote computer passively waits for requests to start a communi-
cation session. In a peer-to-peer network, any computer can be server as well as client.
It’s fantastic and efficient.
In effect, every connected PC is at once a server and a client. There's no special
network operating system residing on a robust machine that supports special server-side
applications like directory services (specialized databases that control who has access to
what).
Image 1.1.1 : “Peer to Peer” means “connected together”
In a P2P environment, Unlike client-server networks, where network information
is stored on a centralized file server PC and made available to tens, hundreds, or thou-
sands client PCs, the information stored across peer-to-peer networks is uniquely decen-
tralized. Because peer-to-peer PCs have their own hard disk drives that are accessible by
all computers, each PC acts as both a client (information requestor) and a server (infor-
9
Research on Node Ranking – Peer to Peer …. Hoàng Cường
mation provider). In the diagram below, three peer-to-peer workstations are shown. Al-
though not capable of handling the same amount of information continuance that a
client-server network might, all three computers can communicate directly with each
other and share one another's resources.
1.1.2 Architecture of P2P systems
Peer-to-Peer Architecture distinguishes itself by its distribution of power and
function. Rather than concentrating its power in the server, Peer-to-Peer
models rely on the power and bandwidth of participants. They form ad
hoc connections between nodes for sharing all kinds of information and files.
Peer-to-Peer discards hierarchical notions of clients and servers (clients at
the top, servers on the bottom) and replaces it with equal peer nodes that
function simultaneously as clients and servers. This also discards the idea of
a central server, which exists in Client-Server Architecture.
There are several classifications of Peer-to-Peer networks. These include
pure/hybrid and structured/unstructured Peer-to-Peer networks. Pure
P2P networks merge the role of clients and servers as equals and do not provide
a central server for managing the network or a central router that forwards
requests to other networks.
Hybrid P2P models, on the other hand, do contain a central server that stores peer
information and responds to request for information stored on that server. In this config-
uration, peers host available resources since there no central server provides this func-
tion.
Peers also make central servers aware of what resources they want to share and
make those resources available to peers that request them. Also, route terminals function
as used addresses and are indexed to find an absolute address.
The structure of P2P networks is determined by the nature of the overlay net-
work, which consists of all participating peers as equal nodes. Nodes in an overlay net-
work are connected through virtual or logical links that create a path to the underlying
network.
Essentially, overlay networks are network built on top of other networks. Peer-to-
Peer networks are considered overlay networks because they are usually built on top of
the Internet. Structured P2P networks use a global protocol so that searches can be
routed TO any peers/nodes BY any peers/nodes on the network.
To retrieve rare files, more structured overlay links are required. The most com-
mon structured P2P network is the distributed hash table (DHT). DHTs are decentralized
distributed systems that store names and values. Any participating node in the network
can lookup and retrieve values. Maintenance of the DHT mapping is distributed among
the nodes. The ownership of each file is assigned to a peer, but the addition or deletion
of peers or files doesn’t cause major disruptions. This makes them very scalable.
Unstructured P2P networks establish links more arbitrarily. To join, a peer only
has to copy the links of an existing node and then add its own links as it develops. To
find a desired file, however, the request must be flooded throughout the network. This
10
Research on Node Ranking – Peer to Peer …. Hoàng Cường
doesn’t always necessarily return the desired results if the file being requested is rare.
There is no correlation between peer and content. Also, flooding increases network traf-
fic, slowing down responses and file sharing.
The primary advantage of P2P networks is that all clients contribute their re-
sources. These resources include computing power, bandwidth, and storage space. In
traditional Client-Server models there are a fixed number of servers, so the addition of
clients slows down network processing. In Peer-to-Peer models, as nodes are added, sys-
tem resources increase (contributed by the added nodes) to accommodate demand.
Comparisons
Client-Server Architecture provides certain advantages over these other network
models. For example, Client-Server models offer easier maintenance, security, and ad-
ministration. For example, encapsulation makes it possible for servers to be repaired, up-
graded, or replaces without clients being affected.
Encapsulation is the process by which an object can hide its data and methods
without revealing them to users. Also, because all data is stored on servers, the data is
more secure. Servers control access and ensure that only screened clients can access and
manipulate data. Again, since data is centralized on servers, updates occur on the server
and are then transmitted to clients as they request services.
In P2P models, updates must be applied and copied to peers in the network,
which requires a lot labor and is prone to errors. However, Client-Server paradigms
often suffer from network traffic congestion. This is not a problem for P2P, since
network resources are in direct proportion to the number of peers in the network. Also,
Client-Server paradigms lack the robustness of P2P networks. Robustness refers to a
network’s ability to bounce back or continue functioning if one of the components fails.
If a server fails in Client-Server models, the request cannot be completed. In P2P, a node
can fail or abandon the request. Other nodes still have access to resources needed to
complete the download.
1.1.3. Distributed hash tables
Image 1.1.3: Distributed hash tables example
Distributed hash tables (DHTs) are a class of decentralized distributed systems
that provide a lookup service similar to a hash table: (key, value) pairs are stored in the
DHT, and any participating node can efficiently retrieve the value associated with a
11
Research on Node Ranking – Peer to Peer …. Hoàng Cường
given key. Responsibility for maintaining the mapping from keys to values is distributed
among the nodes, in such a way that a change in the set of participants causes a minimal
amount of disruption. This allows DHTs to scale to extremely large numbers of nodes
and to handle continual node arrivals, departures, and failures.
DHTs form an infrastructure that can be used to build peer-to-peer networks. No-
table distributed networks that use DHTs include BitTorrent's distributed tracker, the
Kad network, the Storm botnet, YaCy, and the Coral Content Distribution Network.
1.2. Ranking in Peer to Peer networks
1.2.1 Introduction
This thesis discusses the execution distributed page ranking technology in the
peer-to-peer network crown which constructs. The distribution page ranking needs,
because net's size grows by the remarkable speed, and the centralized page ranking is
not may promote. Open style system PageRank is proposed in the paper base in Google
use traditional PageRank.
Image 1.2.1: System must to have the ranking engine to find the one
We then proposed that some distributed page ranking algorithm, proves their
convergence partially, and discusses some interesting products they. The indirect
12
Không có nhận xét nào:
Đăng nhận xét