Web indexation: where does Qwant’s independence stand?

We tell you everything!

Team Qwant

16 November 2018


When it comes to search engines, there is still a lot of confusion between meta-search engines, which simply display results provided by others in a different layout, and independent search engines, which index web content themselves and rank the results with their own algorithms. At Qwant, we have been building a truly independent search engine since day one, by indexing the Web ourselves and developing our own algorithms. This allows us to provide you with relevant results without having to collect your personal data.

This is extremely important to ensure real European technological sovereignty. It was in fact abnormal for our knowledge of the Web to depend on one or two American players, who decide for most Europeans what is relevant to their searches, imposing their own perception and self-interest.

We have invested significantly in the creation of our index and are investing more and more every day. At the time of publication, Qwant has 20 billion indexed web pages on its servers, and every day our crawlers go through more than a billion web pages to add new ones, delete those that no longer exist, or update the information we hold about them. To our knowledge, Qwant has the largest indexing capacity in Europe.
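The add, delete and update cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Qwant's crawler: the names `fingerprint` and `reconcile`, the in-memory dictionary used as an index, and the use of a content hash to detect changes are all hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a hash of the page content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def reconcile(index: dict, crawl: dict) -> dict:
    """Apply one crawl pass to a toy index (hypothetical structure).

    index: url -> fingerprint of the stored copy (modified in place)
    crawl: url -> fetched content, or None if the page no longer exists
    """
    actions = {"added": [], "updated": [], "deleted": []}
    for url, content in crawl.items():
        if content is None:
            # The page is gone: drop it from the index if we had it.
            if url in index:
                del index[url]
                actions["deleted"].append(url)
            continue
        digest = fingerprint(content)
        if url not in index:
            index[url] = digest
            actions["added"].append(url)
        elif index[url] != digest:
            index[url] = digest
            actions["updated"].append(url)
    return actions
```

A page whose content has not changed since the last visit produces no action at all, which is why the three lists are usually much shorter than the number of pages crawled.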

However, we still read too often that Qwant uses Bing, as if Qwant were just a simple meta-search engine without its own technologies. This mistake was repeated, for example, by the law blog Precisement.org, which makes a simple comparison between Bing’s and Qwant’s results without knowing how things really work. It notes that 51% of the results are the same (which means that 49% are different). “Qwant’s index and its search technologies, by all appearances (…) are provided by Microsoft’s Bing,” it writes.

Thankfully, this is not the case! Here, for example, is what a small part (2,000 links) of what Qwant indexes on Precisement.org looks like. It is a visual representation generated with Graphee, a tool developed in-house and released as open source, which lets you visualize the links between pages of a website or between different websites:

Each point you see in this image is a page on the site, with an assigned weight calculated by our algorithms to determine the importance of the page.
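To give an idea of how a weight can be derived from links alone, here is a minimal sketch of a generic PageRank-style iteration. It is not Qwant's algorithm or Graphee's code, neither of which is described here; the function name `page_weights`, the damping factor and the iteration count are assumptions chosen for illustration.

```python
def page_weights(links, damping=0.85, iterations=50):
    """Compute PageRank-style weights for a small link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> weight; the weights sum to 1.
    """
    # Collect every page, including those that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    weights = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline weight.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its weight equally among its outgoing links.
                share = damping * weights[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its weight everywhere.
                share = damping * weights[page] / n
                for other in pages:
                    new[other] += share
        weights = new
    return weights
```

In a toy graph such as `{"home": ["about", "blog"], "about": ["home"], "blog": ["home"]}`, the page that receives the most links ends up with the highest weight.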

These points are generated from the indexing data that Qwant collected. For example, here is an extract from a CSV file that currently lists more than 6,100 indexed pages that link to Precisement.org:

And of course, we store a copy of the site’s content to index it and evaluate its relevance to the keywords searched for by our users:

Tens of millions of websites are thus present in our index, and our crawlers come back to them very often (more often for big, very popular sites, less often for small sites that are rarely updated). In reality, Qwant uses Bing to complement search results for which our own relevance is not yet sufficient, and for images, where storage requirements are very large. Moreover, the main SEO principles are often the same, which explains why you often find the same search results, ranked slightly differently according to the weight given to one criterion or another. But we are changing our algorithms every day. The shift towards total independence is therefore progressive, and this is indeed the direction taken by Qwant, even if it is difficult to see from the outside!
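The remark about identical results ranked differently can be illustrated with a toy example. The sketch below assumes that each page has a few numeric signals and that an engine scores a page as a weighted sum of them; the signal names (`links`, `freshness`), the weights and the `rank` function are hypothetical and do not describe how Qwant or Bing actually rank pages.

```python
def rank(pages, weights):
    """Order pages by a weighted sum of their signals, best first.

    pages: dict url -> dict of signal name -> value
    weights: dict of signal name -> weight given by one engine
    """
    def score(url):
        return sum(weights.get(name, 0.0) * value
                   for name, value in pages[url].items())

    return sorted(pages, key=score, reverse=True)


# Hypothetical signals for three pages.
pages = {
    "a": {"links": 0.9, "freshness": 0.2},
    "b": {"links": 0.4, "freshness": 0.9},
    "c": {"links": 0.1, "freshness": 0.1},
}

# Two engines looking at the same signals with different weights.
engine_one = rank(pages, {"links": 0.8, "freshness": 0.2})
engine_two = rank(pages, {"links": 0.3, "freshness": 0.7})
```

Both engines return the same three pages, but the first puts `a` ahead of `b` while the second puts `b` ahead of `a`: the same results, in a different order.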
