We all know why Google is the world’s favourite search engine. The next question is: how do they do it? Google likes to keep its secrets close to its chest, but occasionally the Public Relations department lets something slip. Here’s what we have managed to find out:
Google’s processing power
Rather than using one ‘supercomputer’ to handle all search queries, Google relies on a network of computer ‘clusters’ worldwide. Each cluster is a group of 1,000+ individual machines networked together to act as one computer.
The advantage of using computer clusters is that if one machine dies, the rest keep on going. Google engineers can either replace the machine or bring it back to life without upsetting the cluster’s ability to process information. Google continues to deliver fast results and the user is unaware there’s even been a problem.
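To make the idea concrete, here is a minimal sketch of a cluster that keeps answering queries when a machine dies. The `Cluster` class, the machine names and the round-robin rule are all our own assumptions for illustration; Google has not published how its clusters balance work.

```python
class Cluster:
    """Toy cluster: a pool of machines that together answer queries."""

    def __init__(self, machine_names):
        # Map each machine name to a health flag (True = alive).
        self.alive = {name: True for name in machine_names}
        self._next = 0  # round-robin cursor (an assumed balancing rule)

    def fail(self, name):
        self.alive[name] = False

    def repair(self, name):
        self.alive[name] = True

    def handle(self, query):
        """Hand the query to the next healthy machine, skipping dead ones."""
        names = list(self.alive)
        for offset in range(len(names)):
            candidate = names[(self._next + offset) % len(names)]
            if self.alive[candidate]:
                self._next = (self._next + offset + 1) % len(names)
                return candidate, query
        raise RuntimeError("no healthy machines left in the cluster")


cluster = Cluster(["m1", "m2", "m3"])
cluster.fail("m2")  # one machine dies...
served_by = [cluster.handle("mortgage")[0] for _ in range(4)]
print(served_by)    # ...and the other two share the work
```

The caller never sees the failure: every query still gets an answer, just from a different machine.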
How many pages does Google have indexed?
At last count (2016), Google had indexed a staggering 130 trillion web pages, making it by far the largest index of any search engine.
Google’s index is fed by an army of ‘spiders’, which are automated programs that travel around the web, ‘crawling’ from hyperlink to hyperlink. On their travels they record everything they see and save it to Google’s main database (or index).
Websites with few inbound links, poor traffic and irregular updates may only be crawled every few days. However, busy sites can be visited by a spider several times an hour, ensuring that Google’s database always stays fresh and up-to-date.
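The crawling idea can be sketched in a few lines. This is a toy breadth-first crawl over a made-up link graph (`TOY_WEB` and its page names are invented for illustration); a real spider fetches pages over the network and is far more involved.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["home", "deep-page"],
    "deep-page": [],
    "unlinked": ["home"],  # nothing links here, so the spider never finds it
}


def crawl(web, start):
    """Follow links breadth-first from `start`, recording each page once."""
    index = []          # pages in the order they were first visited
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        index.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


crawled = crawl(TOY_WEB, "home")
print(crawled)
```

Notice that the `unlinked` page never makes it into the index: a spider that travels from hyperlink to hyperlink can only find pages that something else links to.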
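One way to picture the different crawl rates is a queue ordered by ‘next visit due’. The sketch below is our own guess at the general shape, not Google’s scheduler; the site names and the revisit intervals are assumptions chosen to match the rough figures above.

```python
import heapq

# Hypothetical revisit intervals in minutes; real crawl schedules are not public.
REVISIT_MINUTES = {
    "busy-news-site": 20,                 # several times an hour
    "quiet-brochure-site": 3 * 24 * 60,   # every few days
}


def crawl_schedule(intervals, until_minute):
    """Return (minute, site) visits in time order up to `until_minute`."""
    heap = [(0, site) for site in intervals]  # every site is due at minute 0
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= until_minute:
        due, site = heapq.heappop(heap)
        visits.append((due, site))
        # Re-queue the site for its next visit.
        heapq.heappush(heap, (due + intervals[site], site))
    return visits


visits = crawl_schedule(REVISIT_MINUTES, until_minute=60)
for minute, site in visits:
    print(minute, site)
```

In the first hour the busy site is visited four times and the quiet one just once.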
How does Google handle search queries?
Bear with us; things are about to get a bit technical. When you type a query into a search box, the first thing Google does is relay your query to the best-suited cluster. Usually that’s the one that is geographically closest or the least busy. Google then retrieves all of the documents it has tagged as containing the key elements of your search query. And because Google looks at one word at a time, this can be a vast amount of data.
Take the following example: search for ‘a mortgage for a first-time buyer’ and you’ll get approximately 50 million results. Google strips out what it deems unimportant (in this case ‘a’ and ‘for’) and returns every document that contains combinations of all or some of the following: mortgage, first-time and buyer.
Searchers can refine their results by using what Google terms ‘advanced operators’. Perhaps the best-known of these is to place the search query in quotation marks, which tells Google that you only want results which exactly match the search query. Use quotation marks on the above example and the number of results is reduced to 1.2 million (down from 50 million).
The next step is for Google to order the results, and this is where the ranking algorithm comes in (see below). Google’s algorithm sorts the results to display the most relevant and trusted sites first. Specific details, including the page address, title and a short description, are grabbed from Google’s database and compiled into the results. And thanks to Google’s mighty processing power, all this takes less time than it takes to type the search query in the first place.
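Here is a small sketch of that lookup, using a word-to-documents index. The stop-word list and the four sample documents are invented for illustration; this is a toy model of the principle, not a description of Google’s actual index.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "for", "the", "of"}  # illustrative list, not Google's

DOCS = {
    1: "A guide to getting a mortgage",
    2: "Tips for the first-time buyer",
    3: "Mortgage deals for a first-time buyer",
    4: "Gardening for beginners",
}


def tokenize(text):
    """Lowercase, split into words (keeping hyphens), and drop stop words."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map every remaining word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing at least one query term."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)


INDEX = build_index(DOCS)
terms = tokenize("a mortgage for a first-time buyer")
results = search(INDEX, "a mortgage for a first-time buyer")
print(terms)    # ['mortgage', 'first-time', 'buyer']
print(results)  # [1, 2, 3]
```

Document 1 only mentions ‘mortgage’, yet it still comes back, which is why a loose query like this matches so many pages.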
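The difference between a loose and a quoted search can be sketched like this. The sample documents and the `search` helper are our own invention, and the matching is deliberately naive (it splits on spaces only).

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` appear in `text` consecutively and in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return False
    span = len(target)
    return any(words[i:i + span] == target for i in range(len(words) - span + 1))


def search(docs, query):
    """Quoted query -> exact phrase match; unquoted -> any word may match."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        phrase = query[1:-1]
        return [d for d in docs if contains_phrase(d, phrase)]
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]


DOCS = [
    "first-time buyer mortgage advice",
    "mortgage advice for the first-time buyer",
    "buyer beware: first-time mistakes",
]
loose = search(DOCS, "first-time buyer")
exact = search(DOCS, '"first-time buyer"')
print(len(loose), len(exact))  # 3 2
```

The third document contains both words, but not next to each other, so the quoted search drops it.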
How does Google’s search algorithm work?
The success of Google’s search results is largely down to its ranking algorithm, a mathematical calculation which is applied to every web page to decide its relevancy and importance. Google’s algorithm is based on hundreds of factors, some of which we can guess at and others we can’t. However, it’s Google’s constant tweaking of the algorithm that keeps webmasters on their toes.
A simplified way to understand the algorithm is to think of it as a ‘points’ system. The more points you get, the higher you rank. Now let’s take one element of a web page that we know Google places weight on: the title tag.
Google may choose to award you ‘one ranking point’ if you use a keyword phrase once in the title tag. If you use the same phrase twice you get ‘two ranking points’. But use the phrase three times and Google deducts a point because it’s beginning to look like spam.
Over time more and more webmasters realise that ‘two keyword phrases in the title tag’ is a good thing and change their web pages accordingly. Google’s search results begin to get skewed, so Google alters the weight placed on keyword density in the title tag. Now you only get a point if your keyword phrase appears once; if it appears twice you get nothing. And the search results shuffle accordingly.
Keyword density is one possible algorithmic ranking factor, but there are myriad others in the title tag alone, such as the length of the title, whether it contains keyword synonyms, how often it is updated, and so on. And that’s just the title tag!
Most search engines rely on algorithms that work in a similar way; what makes Google’s different is the concept of PageRank. Google assigns each web page a relative strength based on the quality of links pointing to it, and it’s this value which helps Google to order the search results (find out more about Google PageRank).
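The ‘points’ idea and the reshuffle can be sketched as a lookup table that Google is free to change. Everything here is hypothetical: the rule tables, the page titles, and in particular our reading of ‘deducts a point’ as a score of -1 for three or more occurrences.

```python
OLD_RULES = {1: 1, 2: 2}   # occurrences in title -> points, before the tweak
NEW_RULES = {1: 1, 2: 0}   # after the tweak: using the phrase twice earns nothing
SPAM_PENALTY = -1          # assumed score for three or more occurrences


def score(title, phrase, rules):
    """Points for one title under a given rule table (toy model)."""
    count = title.lower().count(phrase.lower())
    if count == 0:
        return 0
    return rules.get(count, SPAM_PENALTY)


def rank(pages, phrase, rules):
    """Order page names by score, highest first (ties keep input order)."""
    return sorted(pages, key=lambda name: score(pages[name], phrase, rules), reverse=True)


PAGES = {
    "site-a": "Cheap mortgage deals",
    "site-b": "Cheap mortgage, best cheap mortgage rates",
}
before = rank(PAGES, "cheap mortgage", OLD_RULES)
after = rank(PAGES, "cheap mortgage", NEW_RULES)
print(before)  # ['site-b', 'site-a']
print(after)   # ['site-a', 'site-b']
```

Neither site changed a word, yet they swapped places. That is the shuffle webmasters see whenever the weights are adjusted.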
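For the curious, the published PageRank formula is simple enough to try on a toy link graph. The sketch below uses the commonly cited damping factor of 0.85 and an invented four-page web; whatever Google runs in production today is certainly more elaborate, so treat this as an illustration of the principle only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal strength everywhere
    for _ in range(iterations):
        # Rank held by pages with no outbound links is shared among all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, outbound in links.items():
            for target in outbound:
                # Each page splits its strength evenly across its links.
                new_rank[target] += damping * rank[page] / len(outbound)
        rank = new_rank
    return rank


# Hypothetical mini-web: a, b and d all link to c; c links back to a.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c comes out on top because three pages link to it. Page a comes second despite having only one inbound link, because that link comes from the strongest page, which is the ‘quality of links’ idea in action.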
It’s all about Google: Quick Recap
- Google has a user-centric philosophy, focusing on delivering great results as quickly as possible
- Google has cemented its position as the darling of the search engine world through innovation and investment
- Their ranking algorithm and use of PageRank mean that nobody else does it quite like Google
- Google remains tight-lipped about exactly how they rank websites, but their Webmaster Guidelines give us some valuable clues
- It isn’t just their software that makes Google unique, but their hardware too. Worldwide computer clusters and multiple data centres mean Google can run seamlessly 24/7
- Google is continually evolving, and SEO with it; resting on your laurels really isn’t an option
If you are hungry for more and want the ‘official word’, head over to Google’s jazzy microsite ‘How Search Works’, complete with video content from the king of search himself, Matt Cutts.