[ba-ohs-talk] Graph structure in the web


SAN JOSE, PALO ALTO & SAN MATEO, CA, May 11, 2000 -
Scientists from IBM Research, Compaq Corporate Research Laboratories and
AltaVista Company have completed the first comprehensive "map" of the
World Wide Web, and uncovered divisive boundaries between regions of the
Internet that can make navigation difficult or, in some cases,
impossible.    (01)

This study, the largest ever conducted on the topography of the Web, is
part of an ongoing, collaborative project by AltaVista, Compaq and IBM.
The researchers expect to update the study on a regular basis from data
collected using AltaVista's search engine and advanced connectivity
server software, running on a Compaq AlphaServer system with
16 gigabytes of RAM, enough to hold the entire Web map in memory. IBM
Research analyzed the data and contributed to the development of the
"Bow Tie" Theory.    (02)

"Bow Tie" Theory Explains the Four Regions of the Web
"The image of the Web that emerged through the research was that of a
bow tie. Four distinct regions make up approximately 90% of the Web (the
bow tie), with approximately 10% of the Web completely isolated from the
entire bow tie.    (03)

The "strongly-connected core" (the knot of the bow tie) contains almost
one-third of all Web sites. Web surfers can easily travel between these
sites via hyperlinks; this large "connected core" is at the heart of the
Web.    (04)

One side of the bow contains "origination" pages, constituting almost
one-quarter of the Web. "Origination" pages are pages that allow users
to eventually reach the connected core, but cannot be reached from it.
The other side of the bow contains "termination" pages, constituting
almost one-quarter of the Web. "Termination" pages can be accessed from
the connected core, but do not link back to it. The fourth and final
region contains "disconnected" pages, constituting approximately 22% of
the Web. Disconnected pages can be connected to origination and/or
termination pages but are not accessible to or from the connected
core."    (05)
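
The four regions described above can be illustrated on a toy link graph.
What follows is only a minimal sketch, not the method the researchers
used: the function names (reachable, bow_tie), the region labels and the
sample edges are all made up for illustration, and the quadratic
core-finding step would not scale to a crawl of hundreds of millions of
pages. It assumes the "connected core" is the largest strongly connected
component, "origination" pages are those that can reach the core,
"termination" pages are those reachable from it, and everything else is
"disconnected".

```python
from collections import defaultdict


def reachable(adj, sources):
    """Return every node reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph, given as (source, target) pairs, into four regions."""
    fwd, rev = defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        fwd[src].add(dst)   # links out of src
        rev[dst].add(src)   # links into dst
        nodes.update((src, dst))

    # A node's strongly connected component is the set of nodes it can
    # reach that can also reach it. Keep the largest one as the core.
    # (Simple but quadratic in the worst case; fine for a toy graph.)
    core, assigned = set(), set()
    for node in nodes:
        if node in assigned:
            continue
        scc = reachable(fwd, [node]) & reachable(rev, [node])
        assigned |= scc
        if len(scc) > len(core):
            core = scc

    origination = reachable(rev, core) - core   # can reach the core
    termination = reachable(fwd, core) - core   # reachable from the core
    disconnected = nodes - core - origination - termination
    return {
        "core": core,
        "origination": origination,
        "termination": termination,
        "disconnected": disconnected,
    }


# Hypothetical toy graph: a-b-c form a cycle (the core), x links into it,
# y is linked from it, t hangs off x, and p-q are isolated from the rest.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("x", "a"), ("c", "y"),
    ("x", "t"), ("p", "q"),
]
regions = bow_tie(edges)
for name, pages in regions.items():
    print(name, sorted(pages))
```

In this sketch the page t lands in the "disconnected" region even though
an origination page links to it, which matches the description above:
such pages may touch origination or termination pages without being
reachable to or from the core.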

< http://research.compaq.com/news/map/www9%20paper.htm >    (06)

Abstract
"The study of the web as a graph is not only fascinating in its own
right, but also yields valuable insight into web algorithms for
crawling, searching and community discovery, and the sociological
phenomena which characterize its evolution.  We report on experiments on
local and global properties of the web graph using two Altavista crawls
each with over 200M pages and 1.5 billion links.  Our study indicates
that the macroscopic structure of the web is considerably more intricate
than suggested by earlier experiments on a smaller scale."    (07)

.... "There are several reasons for developing an understanding of this
graph:    (08)

1. Designing crawl strategies on the web.
2. Understanding of the sociology of content creation on the web.
3. Analyzing the behavior of web algorithms that make use of link
   information.  To take just one example, what can be said of the
   distribution and evolution of PageRank values on graphs like the web?
4. Predicting the evolution of web structures such as bipartite cores and
   webrings, and better algorithms for discovering and organizing them.
5. Predicting the emergence of new, yet unexploited phenomena in the web
   graph."    (09)

