With search engines like Google, finding information has become an easy task. All you have to do is enter a keyword, and a complete list of results from all over the web appears almost instantly. Ever wondered how Google compiles such a huge amount of data within a few seconds? It’s almost like searching a book with a huge number of pages. Google follows three basic processes to deliver the search results for your query:
- Crawling
- Indexing
- Serving
Crawling: A large number of websites are created and updated on the World Wide Web every day, so Google has to keep track of them and update its database accordingly. This job is performed by an automated web browser, or crawler, called Googlebot. Its aim is to discover new and updated pages to add to the Google index. An algorithm determines which sites to crawl and how often. The process starts from a list of web pages gathered during previous crawls. Googlebot then visits each of these pages to detect new links, dead links and updated content. This information is added to the list of pages to be crawled and is used to update the Google index. How often Google crawls your site depends on its relative importance: a quality website with good content is likely to attract the crawler more frequently. The crawler uses two metrics to measure the importance of a website. The first is the number of links pointing to your website; Google gives little importance to a site unless quality links point to it. The second is the number of pages within your website: the crawl rate increases for large and constantly growing sites.
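The crawl loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Googlebot is implemented: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`) rather than real HTTP requests, and the `crawl` function and the example URLs are invented for this sketch.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of outgoing links.
# A URL that is linked to but missing from this dict stands for a dead link.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["a.example/", "b.example/gone"],
}


def crawl(seeds, web):
    """Breadth-first crawl starting from pages known from a previous crawl.

    Returns (crawled, dead): the pages fetched successfully, in visit order,
    and the linked URLs that could not be found.
    """
    frontier = deque(seeds)   # pages still waiting to be visited
    seen = set(seeds)         # every URL ever queued, to avoid revisits
    crawled, dead = [], []
    while frontier:
        url = frontier.popleft()
        links = web.get(url)
        if links is None:     # page does not exist: record a dead link
            dead.append(url)
            continue
        crawled.append(url)
        for link in links:    # queue newly discovered links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled, dead


crawled, dead = crawl(["a.example/"], TOY_WEB)
```

Starting from the single seed `a.example/`, the sketch discovers the two other pages through links and reports `b.example/gone` as dead. A real crawler would also decide how often to revisit each page, which this sketch leaves out.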
Indexing: Once crawling is done, the next step is to compile an index of all the words Googlebot has seen, along with their location on each page. The pages fetched by Googlebot are stored in the index database. The main aim of compiling this database is to provide rapid access to the documents that contain the terms in a user’s query. To improve search speed, common words known as stop words (the, in, is, etc.) and some punctuation are ignored. The indexer also converts letters to lowercase to improve performance.
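The structure described above is usually called an inverted index. Here is a minimal sketch, assuming a tiny illustrative stop-word list and a simple tokenizer that keeps only letters and digits; `STOP_WORDS`, `build_index` and the sample pages are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict

# Illustrative subset only; a real stop-word list is an assumption here.
STOP_WORDS = {"the", "in", "is", "a", "of", "to", "and"}


def build_index(pages):
    """Map each word to the pages it appears on and its positions there.

    `pages` is a dict of URL -> page text. Positions are word offsets
    counted over all words on the page, stop words included.
    """
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        # Lowercase the text and drop punctuation while splitting into words.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                index[word][url].append(position)
    return index


pages = {
    "p1": "The crawler is in the index.",
    "p2": "Index the Web!",
}
index = build_index(pages)
# index["index"] now maps "p1" to [5] and "p2" to [0]
```

Because the word is the lookup key, finding every page that contains a term is a single dictionary access instead of a scan over all stored pages.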
Serving: Once a keyword is entered in the search box, Google’s query processor comes into action. It looks in the index for matching pages and returns the relevant results within a fraction of a second. The order in which these results are displayed depends on a number of factors. One important factor is PageRank, a measure of the importance Google gives to a particular page. It is determined by the number and quality of the external links pointing to that page.
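To see how links translate into importance, here is a simplified sketch of the PageRank idea using repeated iteration over a toy link graph. It is not Google’s production ranking. The `pagerank` function, the graph, and the iteration count are assumptions made for illustration; the damping factor of 0.85 is the value commonly quoted for the published algorithm. The sketch also assumes every linked page appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    Every link target must also be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Pages "a" and "b" both link to "c"; "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, `c` ends up with the highest score because two pages link to it, `a` comes second because the important page `c` links to it, and `b` comes last because nothing links to it.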
The whole process a search engine follows can be summed up in three simple steps. Once the user enters a query, the web server sends it to the index server. The index server then retrieves the stored documents matching the query. Finally, the relevant search results are presented to the user within a fraction of a second.
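These three steps can be sketched as a single lookup function. This is a rough illustration under stated assumptions: the index is a hypothetical dictionary of word -> set of pages, the importance scores are made-up numbers, and `search` is a name invented for this example. Real serving involves many more ranking factors.

```python
def search(query, index, scores, limit=10):
    """Return pages containing every query term, most important first.

    `index` maps a lowercase word to the set of pages containing it.
    `scores` maps a page to an importance score.
    """
    terms = query.lower().split()
    if not terms:
        return []
    matches = None
    for term in terms:
        pages = set(index.get(term, ()))
        # Keep only the pages that contain all terms seen so far.
        matches = pages if matches is None else matches & pages
    ranked = sorted(matches, key=lambda page: scores.get(page, 0.0), reverse=True)
    return ranked[:limit]


# Hypothetical index and scores, for illustration only.
toy_index = {"google": {"p1", "p2"}, "crawler": {"p2", "p3"}}
toy_scores = {"p1": 0.5, "p2": 0.3, "p3": 0.2}

results = search("Google crawler", toy_index, toy_scores)  # -> ["p2"]
```

Only `p2` contains both terms, so it is the single result; a one-word query such as `google` would return `p1` ahead of `p2` because of its higher score.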




