Sunday, December 10, 2006

Powered by Python

I am so hooked on Google, but I never thought much about the technology behind it. The need never arose; that may be the reason. While googling about Google AdWords I found this paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine. It is a great source for learning about Google's search technology, and it comes straight from the founders of Google.
Now here comes the shocker: "In order to scale to hundreds of millions of web pages, Google has a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (we typically ran about 3). Both the URLserver and the crawlers are implemented in Python. Each crawler keeps roughly 300 connections open at once. This is necessary to retrieve web pages at a fast enough pace. At peak speeds, the system can crawl over 100 web pages per second using four crawlers. This amounts to roughly 600K per second of data." Does anyone still doubt the power of scripting languages?
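To get a feel for the pattern the paper describes, here is a minimal sketch in modern Python of a crawler that holds many connections open at once, with a semaphore capping concurrency at 300 the way each of Google's crawlers did. This is only an illustration, not Google's actual code: it assumes the third-party aiohttp library, and the hard-coded URL list stands in for the feed a central URLserver would provide.

```
import asyncio

import aiohttp

MAX_CONNECTIONS = 300  # the paper says each crawler kept ~300 connections open


async def fetch(session: aiohttp.ClientSession,
                semaphore: asyncio.Semaphore, url: str) -> None:
    # The semaphore bounds how many requests are in flight at once,
    # mimicking the crawler's fixed pool of open connections.
    async with semaphore:
        try:
            timeout = aiohttp.ClientTimeout(total=10)
            async with session.get(url, timeout=timeout) as resp:
                body = await resp.read()
                print(f"{url}: {resp.status}, {len(body)} bytes")
        except Exception as exc:
            print(f"{url}: failed ({exc})")


async def crawl(urls: list[str]) -> None:
    semaphore = asyncio.Semaphore(MAX_CONNECTIONS)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, semaphore, u) for u in urls))


if __name__ == "__main__":
    # In the paper's architecture these URLs would come from the URLserver;
    # here a short hypothetical list stands in for that stream.
    asyncio.run(crawl(["https://example.com", "https://www.python.org"]))
```

The 2006-era crawler would have used very different plumbing (asyncio did not exist yet), but the core idea is the same: a scripting language plus non-blocking I/O is enough to keep hundreds of fetches going at once.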
