Talk title: How Big is the Web? What does it look like?
Speaker: André Trudel [Acadia University in Canada]
Time: Thursday, August 19, 10:00~11:00
Venue: Room 734, Xingjian Building, Yanchang Campus, Shanghai University
Host: Prof. Zhang Wu
Abstract:
Before this question can be answered, we must decide on a measurement. Possible measurements, in decreasing order of size, are the number of URLs, pages, sites, or servers. We are interested in the last measurement. Specifically, we want to count the number of publicly accessible web servers. Every device attached to the Internet is assigned a unique identifier called an IP address. There are 4.3 billion addresses. For various reasons, only 3.7 billion of them could potentially host a web server. Obviously, not all do. But exactly how many?
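As a rough illustration of where these two figures can come from, the sketch below counts the 32-bit IPv4 address space and subtracts a few well-known reserved blocks. The list of excluded blocks is an assumption made for this example; the abstract does not say which ranges the census actually skips.

```python
import ipaddress

# Total number of IPv4 addresses: 2^32, roughly 4.3 billion.
TOTAL = 2 ** 32

# Assumed exclusions (illustrative only, not the census's actual list):
# reserved, private, loopback, link-local, and multicast blocks.
EXCLUDED_BLOCKS = [
    "0.0.0.0/8",       # "this network"
    "10.0.0.0/8",      # private
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private
    "192.168.0.0/16",  # private
    "224.0.0.0/4",     # multicast
    "240.0.0.0/4",     # reserved for future use
]

# The blocks above do not overlap, so their sizes can simply be added.
excluded = sum(ipaddress.ip_network(b).num_addresses for b in EXCLUDED_BLOCKS)
candidates = TOTAL - excluded

print(f"total addresses:     {TOTAL / 1e9:.1f} billion")
print(f"candidate addresses: {candidates / 1e9:.1f} billion")
```

Under these assumed exclusions the remainder comes out at about 3.7 billion, in line with the figure quoted above.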
In theory, measuring the Web is a trivial problem: we check each of the 3.7 billion addresses for a web server. In practice, though, this is a difficult problem. If we had one computer that ran 24/7 and checked one IP address per minute on average, it would take about 7,000 years to complete a web census. This amount of time is clearly unreasonable. In the past, researchers therefore relied on estimation techniques.
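The 7,000-year figure follows from simple arithmetic, which can be checked with a short sketch. The function name and the `machines` parameter are hypothetical, added only to show how the duration would scale if the work were spread over several computers; the abstract does not describe how the actual census is parallelized.

```python
ADDRESSES = 3_700_000_000          # candidate IP addresses from the abstract
MINUTES_PER_YEAR = 60 * 24 * 365   # a machine running 24/7, ignoring leap years

def census_years(addresses, per_minute=1.0, machines=1):
    """Years needed to visit every address once.

    per_minute: addresses checked per minute by one machine
    machines:   hypothetical number of machines working in parallel
    """
    return addresses / (per_minute * machines) / MINUTES_PER_YEAR

years_single = census_years(ADDRESSES)
print(f"one machine, one address per minute: about {years_single:,.0f} years")
```

With a single machine the result is a little over 7,000 years, matching the estimate above; the duration falls in direct proportion to the checking rate and the number of machines.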
We discuss the challenges and results from the world’s first Web census that visits every IP address. When one census terminates, we begin a new one in order to gather historical Web data. The results of each census are stored in a database.
Once we have this information, a natural question to then ask is: If we drew the Web on a single sheet of paper, what would it look like?
We needed a 9 foot by 9 foot sheet of paper! We show pictures of this poster. We also consider the geographical distribution of the web servers. The web servers are categorized by country, which lets us see which countries have the most web servers and which have none. We also present other visualizations of the data we have collected.
About the speaker:
Professor André Trudel works at the Jodrey School of Computer Science at Acadia University in Canada. He has been at Acadia since 1989 and received his PhD from the University of Waterloo in 1990. His primary research area is temporal knowledge representation and reasoning. The long-term goal of that research program is to produce a rich, robust and efficient temporal knowledge representation and reasoning system capable of solving a wide range of complex real-world problems. His secondary research area has the goal of answering the questions "How big is the Web?", "How is it changing over time?", and "What does it look like?".
His research group at Acadia is the first in the world to measure the exact size of the Web and visualize the results. The group's census software visits every valid IP address to check for the presence of a publicly accessible Web server on port 80. When a Web server is found, information such as the server's root page is recorded in a database. The database is a snapshot in time of the entire Web. Web census databases will be compared to measure the Web's growth or decline over time. The census database has historical importance because, with the imminent changeover to IPv6, future Web censuses will probably be impossible. The latest census database can be used as a benchmark for comparing future IPv6 Web size estimates.