The simplest description of how the Yandex search engine works

Hello, dear friends! In this article we continue our look at the Yandex search engine. As you remember, past articles covered the history of this great company, which ranks first among its competitors in Russia and beyond.

That is all well and good, but both newbies and seasoned site builders care most about one question: how to get their projects to the top of the search results.

So let's look at how the Yandex search engine works, to understand which rakes you might step on and what to expect from a Russian search engine.

We discussed this in the last article. The topic turned out to be quite interesting and useful, so I decided to expand on it and go deeper, so to speak.

With the question “Why does a search engine index documents?” I probably got carried away; that one is a no-brainer. What remains is to clarify the "how".

Website ranking algorithms

First, let's take a look at some of the algorithms that are fundamental to any search engine:

- Direct search algorithm.

What is it? Suppose you remember reading a wonderful story in one of your books, and you start looking through them in turn. You take one book, leaf through it, find nothing, take another... The principle is clear, but this method is extremely slow, which is also understandable.

- Reverse (inverted) search algorithm.

For this algorithm, a text file is created from every page of your blog. The file lists, in alphabetical order, ALL the words you have used. Even the position of each word in the text (its coordinates) is recorded.

This is a fairly quick method, but the search now comes with some error.

The main thing to understand is that this algorithm does not search the Internet or your blog directly. It searches a separate text file that was created earlier, when the robot visited you. These files (reverse indexes) are stored on Yandex servers.

So, these were the basic search algorithms, i.e. how Yandex simply finds the documents it needs. There shouldn't be any problems with this.
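To make the difference between the two algorithms more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names, the `pages` dictionary and the page addresses are invented for the example, and Yandex's real index format is not described in this article.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def direct_search(pages, word):
    """Direct search: read every page in turn, like leafing through books."""
    return sorted(url for url, text in pages.items() if word in tokenize(text))


def build_inverted_index(pages):
    """Reverse index: word -> {page address: [positions of the word in the text]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(position)
    return index


def index_search(index, word):
    """Look the word up in the prebuilt index instead of re-reading the pages."""
    return sorted(index.get(word, {}))


# Hypothetical pages, invented for the example.
pages = {
    "blog/a": "Yandex indexes documents",
    "blog/b": "The robot visits the blog and the robot leaves",
}
index = build_inverted_index(pages)
print(index_search(index, "robot"))  # pages that contain the word
print(index["robot"]["blog/b"])      # positions of the word on that page
```

Note that `index_search` never touches `pages`: it answers from the index built earlier, which is exactly why the result can lag behind the live page.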

But Yandex knows not just one document, or even 100 documents. According to the latest data from my sources, Yandex knows about 11 billion documents (10,727,736,489 pages).

Among all of these, you need to select the documents that suit the request. More importantly, you need to rank them somehow, i.e. order them by importance, or rather by usefulness to the reader.

Search Mathematical Models

To solve this issue, mathematical models come to the rescue. We will now talk about the simplest models.

Boolean model: if the word occurs in the document, the document is considered found. A simple match and nothing complicated.

But there are problems here. For example, if you, as a user, enter some popular word, or better still the preposition "in", which is the most common word in Russian and is found in EVERY document, you will get more results than you can even imagine. Therefore, the next mathematical model appeared.

Vector model: this model determines the "weight" of the document. A match alone is not enough; the word must also appear several times. Moreover, the more often a word occurs, the higher the relevance (correspondence).

It is the vector model that ALL search engines use.
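The difference between the Boolean and the vector model can be sketched in a few lines of Python. This is a simplified illustration, not the formula any real search engine uses: the names `boolean_match` and `term_weight` are my own, and taking the weight as the share of query words among all words of the document is an assumption made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def boolean_match(document, query):
    """Boolean model: the document is found if every query word occurs in it."""
    words = set(tokenize(document))
    return all(term in words for term in tokenize(query))


def term_weight(document, query):
    """Vector-style weight: the share of document words that are query words."""
    counts = Counter(tokenize(document))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[term] for term in set(tokenize(query))) / total


# Hypothetical documents, invented for the example.
docs = {
    "a": "yandex search and more yandex search",
    "b": "a page that mentions search once",
}
# Both documents pass the Boolean test for "search"...
print([name for name in docs if boolean_match(docs[name], "search")])
# ...but the weight lets us put the better match first.
ranked = sorted(docs, key=lambda name: term_weight(docs[name], "search"), reverse=True)
print(ranked)
```

The Boolean test only answers "found or not", while the weight gives a number that documents can be sorted by.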

Probabilistic model: this one is more complex. The principle is this: the search engine itself has found a reference page. For example, you are looking for information about the history of Yandex. Yandex has some kind of standard; let's say it is my previous article about Yandex.

It will compare all other documents with this article. The logic is this: the more a page of your blog resembles my article, the more LIKELY it is that your page will also be useful to the reader and also tells the history of Yandex.
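The idea of comparing pages with a reference document can be sketched as follows. Please treat this as a rough stand-in only: I use cosine similarity of word counts to measure how much two texts resemble each other, which is an assumption of mine and not the actual probabilistic model; the texts and page addresses are invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how many times each word occurs in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Similarity of two texts by word counts: 0.0 means no words in common."""
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference document and candidate pages.
reference = "history of the yandex company"
candidates = {
    "blog/history": "the history of yandex",
    "blog/cake": "a recipe for napoleon cake",
}
scores = {url: cosine_similarity(text, reference) for url, text in candidates.items()}
best = max(scores, key=scores.get)
print(best)
```

The page that shares more words with the reference gets the higher score, so it is treated as the more likely useful one.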

To reduce the number of documents that need to be shown to the user, the concept of relevance, i.e. correspondence, was introduced.

It means how well your blog page actually matches the topic. This is an important concept when it comes to search quality.

Assessors: who they are and what they are responsible for

Relevance is also needed to assess the quality of the algorithms.

For this there is a special unit whose members are called assessors. These are people who review the search results by hand.

They have instructions on how to check sites, how to rate them, and so on. They manually determine, one by one, whether your pages suit the search queries or not.

The quality of the search algorithms is judged by the opinion of the assessors. If all the assessors say that the search results do not match the queries, then the ranking algorithm is wrong, and only Yandex is to blame.

If the assessors say that only one site does not match the request, that site drops far down in the search results. More precisely, not the entire site but only one article, but that is beside the point.

Of course, assessors cannot view and evaluate ALL articles with their own hands and eyes. That is understandable.

So other parameters come to the rescue, and pages are ranked according to them.

There are a lot of them, for example:

  • page weight (VIC, PageRank and similar metrics);
  • domain authority;
  • the relevance of the text to the request;
  • the relevance of the text of external links to the request;
  • as well as many other ranking factors.

Assessors make comments, and the people responsible for tuning the mathematical ranking model in turn edit the formula, as a result of which the search engine works better.

The main criteria for evaluating the work of the formula:

1. Accuracy of search results: the percentage of returned documents that match the request (are relevant). I.e. the fewer non-matching pages are present, the better.

2. Completeness of search results: the ratio of the relevant web pages returned for a given request to the total number of relevant documents in the collection (the set of pages in the search engine).

For example, if there are more relevant pages in the entire collection than in the search results, the search results are incomplete. This happens because some of the relevant web pages fell under a filter.

3. Freshness of search results: how well the actual web page corresponds to what is written in the snippet. For example, a document may have changed significantly or may no longer exist at all, yet still be present in the SERP.

The freshness of the results directly depends on how often the search robot rescans the documents in its collection.
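Accuracy and completeness from the first two points are what is usually called precision and recall, and they are easy to compute once you know which pages are relevant. Here is a small Python sketch; the page names and the set of relevant pages are invented for the example (in practice that judgment would come from people such as assessors).

```python
def precision(returned, relevant):
    """Accuracy: the share of returned pages that are relevant."""
    returned, relevant = set(returned), set(relevant)
    if not returned:
        return 0.0
    return len(returned & relevant) / len(returned)


def recall(returned, relevant):
    """Completeness: the share of all relevant pages that were returned."""
    returned, relevant = set(returned), set(relevant)
    if not relevant:
        return 0.0
    return len(returned & relevant) / len(relevant)


# Hypothetical collection: four pages are relevant to the request...
relevant_pages = {"p1", "p2", "p3", "p4"}
# ...and the search results contain two of them plus one irrelevant page.
results = ["p1", "p2", "p9"]

print(precision(results, relevant_pages))  # 2 of 3 returned pages are relevant
print(recall(results, relevant_pages))     # 2 of 4 relevant pages were returned
```

The two numbers pull in different directions: returning more pages tends to raise completeness, but every irrelevant page that slips in lowers accuracy.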

The collection is gathered (site pages are indexed) by a special program, the search robot.

The search robot receives a list of addresses to index and copies the pages; the contents of the copied web pages are then passed to an algorithm that converts them into reverse indexes.

So here, "in a nutshell", if I may say so, we have discussed the principles of the search engine.

Let's summarize:

  1. A search robot comes to your blog.
  2. The robot saves a reverse index of the page for later search.
  3. Using a mathematical model, the document is processed and displayed in the search results according to the formulas and taking into account the opinion of the assessor.

This is a very, very simplified picture, just to give a basic understanding of how the Yandex search engine works.

I have written a lot of text, and perhaps many things are still unclear. So I suggest you return to this article a little later and watch this video.

This is an excellent guide that I once studied from myself.

I hope this information helps you better understand why your sites hold the positions they do in search, and do everything to improve them.

With this I say goodbye. If you have any questions, I am always happy to answer them in the comments. Or maybe you want to add to the article?

In any case, share your opinion!

We are not as unique as we think: millions of people before us have puzzled and millions after us will puzzle the search engine with almost the same questions. On the other hand, we are too unpredictable: the formulation of our request is influenced by a huge number of factors that we do not understand. And at least for this reason, the request of each of us, no matter how banal it may be, requires an individual approach.

In fact, all the work of the Yandex search engine comes down to two simple things: understanding what a person really wants to know, and finding suitable documents for him among the billions of documents on the Web within a few seconds.

Take prints

The search engine's system is somewhat similar to the Matrix, and the search robot (a complex program that makes decisions on its own) is like Agent Smith.

In order not to search the entire Internet every time someone needs to find something out, the search engine does part of the work in advance: with the help of thousands of search robots it checks what is on the Web and where it is located. The robots are of two types: main and fast. The main robot crawls and processes the Internet as a whole, while the fast one handles documents that appeared a minute or even a couple of seconds ago. The task of the robot programs is to select information that is useful to users and to process it, filtering out everything outdated and unnecessary. In some ways this resembles sorting garbage: paper in one container, glass in another, plastic in a third, food waste in a fourth...

The information collected by the robots forms the so-called “snapshot of the Internet”. It is stored on thousands of Yandex servers and is constantly updated. The snapshot is like a list that tells you where to find which information. In this list, each keyword has not one but millions of "pages" listed against it. For all updates to the snapshot to be available to users, they are transferred from the repository to the "basic search". Data from the main robot is transferred every few days, and data from the fast robot in real time.

Bringing to light



ILLUSTRATION: EUGENE TONKONOGIY

When looking for an answer to a question in its prepared base, the machine faces two main difficulties. The first is language. Before looking for an answer, the machine has to understand which language to search in. For example, for a Russian speaker the query "Prince Igor's squad" will return documents about his army, while for a Ukrainian the same query will also return documents mentioning Princess Olga, his spouse, since the word "дружина" means "squad" in Russian but "wife" in Ukrainian. And in the rich Russian language, the same word or its derivatives can mean different things. For example, the word “стали” is both a form of the noun “steel” and a form of the verb “to become”. The second difficulty is human psychology. When we enter a request, we expect a quick and accurate answer, without worrying, of course, whether the wording of the request fits the principles of mathematical analysis by which the machine's brain works. For example, when typing the word "napoleon" into the search bar, what does a person want: a cake recipe or the biography of a French emperor, to buy brandy or to find the address of a mental hospital?


In such situations, several technologies come into play at once. The search engine can offer a few hints below the search bar that refine the query: choose what you need, Napoleon recipes or Napoleon Bonaparte. If the user does not respond to the machine's prompt and adds nothing to "Napoleon", the "Spectrum" technology helps: without counting on help, the machine immediately searches for information in several categories (about the cake, about the emperor, about the brandy...). In addition, personalization mechanisms help to understand the user, i.e. the machine's knowledge of what this user searched for from this computer a day, two, three or a month ago: if you often asked Yandex questions about cooking, the machine will first show you results saying that Napoleon is a cake.

Combinations: hobby clubs

The task of a search engine is not limited to simply selecting documents that contain the words and phrases from the search query. The machine needs to understand which documents meet our conflicting requirements and why. Do we want information about Napoleon the cake, or have we perhaps been going to a fitness club with that pretentious name for a couple of years, or are we actually concerned with the complexes of short people? In any case, solving the problem requires a non-trivial approach.


The creators of the Yandex search program found this approach by delegating the choice to the machine. On the one hand, a soulless, but very fast and intelligent machine does not know and does not want to know anything about us as individuals, and on the other, it tries to find out as much as possible about each one.

In addition to the user's geographic location and linguistic analysis of his requests, the search engine uses several thousand criteria that are not at all obvious to humans.

The trick is that the machine develops and updates these criteria on its own.

It simply uses data on the preferences and user behavior of millions of people and connects this “arithmetic mean” to our query history. The principles that guide the Matrix within itself, comparing the thousands of categories of user interests it has developed, often do not fit into traditional human notions of what “interests” can in principle be. There are tens of thousands of them. They create different, sometimes funny, combinations with each other. For example, one of such combinations may be that the search results match the interests of the person who bred newts. At the same time, a person is not just interested in newts, but already breeds them, but only for the first year.

Estimates. Helping hands


The Matrix, of course, decides for itself (with the help of higher mathematics) what to show users and in what order, based on tens of thousands of criteria. But the Matrix also uses living people: 1000 Yandex employees, the so-called assessors, evaluate the search results for particular queries (of course, not every query is evaluated, and this is not done in real time) for how well they meet the expectations of an ordinary user, who is not as rational as a machine, not as precise in wording, and is contradictory and emotional.

Search engines have long been an integral part of the Russian Internet. They are now huge and complex mechanisms that serve not only as a tool for finding information but also as tempting areas for business.

Most search engine users have never thought (or have thought, but found no answer) about how search engines work, how user requests are processed, what these systems consist of and how they function...

This master class is designed to answer the question of how search engines work. However, you will not find here the factors that influence the ranking of documents. Still less should you count on a detailed explanation of the Yandex algorithm. According to Ilya Segalovich, director of technology and development at Yandex, it could be learned only "under torture" from Ilya Segalovich himself...

2. The concept and functions of the search engine

A search engine is a software and hardware complex designed to search the Internet. It responds to a user's request, given as a text phrase (search query), by returning a list of links to information sources in order of relevance (correspondence to the request). The major international search engines are Google, Yahoo and MSN. On the Russian Internet, these are Yandex, Rambler and Aport.

Let's take a closer look at the concept of a search query, using Yandex as an example. The user should formulate the search query according to what he wants to find, as briefly and simply as possible. Let's say we want to find information in Yandex on how to choose a car. To do this, we open the Yandex home page and enter the search query "how to choose a car". Our next task is to open the links to information sources on the Internet returned for our request. However, it is quite possible not to find the information we need. If this happens, then either the request needs to be rephrased, or the search engine's database really contains no relevant information for it (this can happen with very "narrow" queries, such as "how to choose a car in Arkhangelsk").

The primary task of any search engine is to deliver exactly the information people are looking for. Teaching users to make "correct" requests, i.e. requests that match the way search engines work, is impossible. Therefore, developers create search algorithms and principles that allow users to find the information they are looking for.

This means the search engine must "think" the way the user thinks when looking for information. When a user makes a request to a search engine, he wants to find what he needs as quickly and easily as possible. Having received the result, he judges the system by several basic parameters. Did he find what he was looking for? If not, how many times did he have to rephrase the query? How up to date was the information he found? How quickly did the search engine process the request? How convenient were the search results? Was the desired result the first or the hundredth? How much junk was found along with the useful information? Will the needed information still be found when he turns to the search engine again in, say, a week or a month?

To provide good answers to all these questions, search engine developers are constantly improving their search algorithms and principles, adding new functions and capabilities, and trying in every possible way to speed up the system.

3. The main characteristics of the search engine

Let's describe the main characteristics of search engines:

  • Completeness

    Completeness is one of the main characteristics of a search engine: the ratio of the number of documents found for a request to the total number of documents on the Internet that satisfy it. For example, if there are 100 pages on the Internet containing the phrase “how to choose a car”, and only 60 of them were found for the corresponding query, then the completeness of the search is 0.6. Obviously, the more complete the search, the less likely it is that the user will fail to find the document he needs, provided that it exists on the Internet at all.

  • Accuracy

    Accuracy is another main characteristic of a search engine, determined by the degree to which the found documents match the user's request. For example, if the query “how to choose a car” returns 100 documents, 50 of which contain the phrase “how to choose a car” while the rest simply contain these words (“how to choose the right radio tape recorder and install it in a car”), then the search accuracy is 50/100 (= 0.5). The more accurate the search, the faster the user finds the documents he needs, the less "garbage" turns up among them, and the less often the found documents fail to match the request.

  • Freshness

    Freshness is an equally important component of search, characterized by the time that passes from the moment documents are published on the Internet until they are entered into the search engine's index. For example, the day after some interesting news appears, a large number of users turn to search engines with related queries. Objectively, less than a day has passed since the news was published, but the main documents have already been indexed and are available for search, thanks to the so-called "quick base" in large search engines, which is updated several times a day.

  • Search speed

    Search speed is closely related to the system's ability to withstand load. For example, according to Rambler Internet Holding LLC, the Rambler search engine currently receives about 60 queries per second during business hours. Such a workload requires reducing the processing time of each individual request. Here the interests of the user and the search engine coincide: the visitor wants to get results as quickly as possible, and the search engine must process the query as quickly as possible so as not to delay the following queries.

  • Visibility

4. A brief history of search engine development

In the initial period of the development of the Internet, the number of its users was small, and the amount of available information was relatively small. For the most part, only research workers had access to the Internet. At this time, the task of searching for information on the Internet was not as urgent as it is now.

One of the first ways to organize access to the network's information resources was to create open catalogs of sites, in which links to resources were grouped by topic. The first such project was the site Yahoo.com, which opened in the spring of 1994. After the number of sites in the directory had grown significantly, an option to search the catalog for the information you need was added. It was not yet a search engine in the full sense, since the search area was limited to the resources present in the directory rather than all Internet resources.

Link directories were widely used in the past but have almost completely lost their popularity, since even modern catalogs, huge as they are, cover only an insignificant part of the Internet. The largest directory on the network, DMOZ (also called the Open Directory Project), contains information about 5 million resources, while Google's search database consists of over 8 billion documents.

In 1995, the search engines Lycos and AltaVista appeared. The latter was for many years the leader in information search on the Internet.

In 1997, Sergey Brin and Larry Page created the Google search engine as part of a research project at Stanford University. Google is currently the most popular search engine in the world.

In September 1997, the Yandex search engine, the most popular on the Russian-language Internet, was officially announced.

Currently, there are three main search engines (international) - Google, Yahoo and with their own databases and search algorithms. Most of the other search engines (of which there are a large number) use in one form or another the results of the three listed. For example, AOL search (search.aol.com) uses a Google base, while AltaVista, Lycos, and AllTheWeb use a Yahoo base.

In Russia, the leading search engine is Yandex, followed by Rambler.ru, Google.ru, Aport.ru and Mail.ru. At the moment, Mail.ru uses the Yandex search engine and database.

5. Components and operating principles of a search engine

Almost every major search engine has its own structure that differs from the others. However, it is possible to single out the main components common to all search engines. The differences lie only in how the mechanisms by which these components interact are implemented.

Indexing module

The indexing module consists of three auxiliary programs (robots):

Spider - a program designed to download web pages. The spider downloads a page and retrieves all internal links from it. The HTML code of each page is downloaded. Robots use the HTTP protocol to download pages. The spider works as follows: the robot sends the server a “GET /path/document” request along with some other HTTP request commands. In response, the robot receives a text stream containing service information and the document itself.

For each downloaded page, the following is stored:

  • the page URL
  • the date the page was downloaded
  • the HTTP header of the server response
  • the page body (HTML code)
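The exchange described above, and the four fields kept for each page, can be sketched in Python. This is a minimal illustration: the names `PageRecord`, `build_request` and `parse_response` are hypothetical, and a canned response string stands in for a real network call.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PageRecord:
    url: str                  # page URL
    downloaded_at: datetime   # date the page was downloaded
    http_header: str          # HTTP header of the server response
    body: str                 # page body (HTML code)


def build_request(host: str, path: str) -> str:
    """Compose a minimal HTTP/1.1 GET request as plain text."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def parse_response(url: str, raw: str) -> PageRecord:
    """Split the text stream into service information and the document."""
    header, _, body = raw.partition("\r\n\r\n")
    return PageRecord(url, datetime.now(timezone.utc), header, body)


# Canned server response used here instead of a real download.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hi</body></html>"
)

request = build_request("example.com", "/path/document")
record = parse_response("http://example.com/path/document", raw)

print(request.splitlines()[0])             # GET /path/document HTTP/1.1
print(record.http_header.splitlines()[0])  # HTTP/1.1 200 OK
```

A real spider would send the request over a socket or through an HTTP client library; only the shape of the request and of the stored record is shown here.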

Crawler ("traveling" spider) - a program that automatically follows all the links found on a page. It extracts every link present on the page. Its task is to determine where the spider should go next, based either on those links or on a predefined list of addresses. By following the links it finds, the crawler discovers new documents not yet known to the search engine.
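A crawler of this kind can be sketched as a breadth-first walk over links. In the Python sketch below, `LinkExtractor`, `crawl` and the `fetch` callable are hypothetical names, and an in-memory dictionary stands in for the web, so nothing is actually downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page as an absolute URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def crawl(start_url: str, fetch, limit: int = 100) -> list[str]:
    """Visit pages breadth-first; `fetch` returns HTML or None if unavailable."""
    frontier = deque([start_url])  # addresses the spider should visit next
    seen = {start_url}             # addresses already queued
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:   # only documents not yet known are queued
                seen.add(link)
                frontier.append(link)
    return visited


# In-memory stand-in for the web.
pages = {
    "http://site.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://site.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://site.test/b": '<a href="/c">missing</a>',
}

order = crawl("http://site.test/", pages.get)
print(order)
```

The `seen` set is what keeps the crawler from revisiting pages that link to each other; a production crawler would also need politeness delays and robots.txt handling, which are left out here.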

Indexer - a program that analyzes the web pages downloaded by spiders. The indexer breaks a page down into its constituent parts and analyzes them using its own lexical and morphological algorithms. Various page elements are analyzed, such as text, headings, links, structural and style features, special service HTML tags, and so on.

Thus, the indexing module makes it possible to crawl a given set of resources by following links, download the pages encountered, extract links to new pages from the received documents, and perform a complete analysis of these documents.

Database

A database, or an index of a search engine, is a data storage system, an information array that stores specially converted parameters of all documents downloaded and processed by the indexing module.
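One common way to organize such an index is an inverted index, which maps each word to the documents and positions where it occurs. The text does not say which structure any particular search engine uses, so the Python sketch below is only an assumption for illustration; `tokenize` and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to the documents and word positions where it occurs."""
    index: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


# Two tiny documents standing in for downloaded pages.
documents = {
    "doc1": "How to choose a car",
    "doc2": "Install a radio in a car",
}

index = build_index(documents)
print(index["car"])  # {'doc1': [4], 'doc2': [5]}
print(index["a"])    # {'doc1': [3], 'doc2': [1, 4]}
```

Storing positions rather than just document identifiers is what makes phrase queries such as “how to choose a car” possible without rereading the documents.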

Search Server

The search server is an essential element of the entire system, since the quality and speed of search depend directly on the algorithms that underlie it.

The search engine works as follows:

  • The request received from the user undergoes morphological analysis. The information environment of each document contained in the database is generated (it will later be displayed as a snippet, that is, as text information matching the request on the search results page).
  • The resulting data is passed as input parameters to a special ranking module. The data for all documents is processed, and for each document a rating is calculated that characterizes how relevant the various components of that document stored in the search engine index are to the query entered by the user.
  • Depending on the user's choice, this rating can be adjusted by additional conditions (for example, the so-called "advanced search").
  • Next, a snippet is generated: for each document found, the title, a short annotation that best matches the request and a link to the document itself are extracted from the document table, and the found words are highlighted.
  • The search results are returned to the user in the form of a SERP (Search Engine Results Page).
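The steps above can be sketched end to end in Python. Everything here is a simplified assumption: `normalize` is a crude stand-in for morphological analysis, the rating is a plain count of matching words, and highlighting is done with asterisks. Real ranking modules are far more elaborate.

```python
import re


def normalize(text: str) -> list[str]:
    """Crude stand-in for morphological analysis: lowercase, strip a plural 's'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Rate each document by how many of its words match the query terms."""
    terms = set(normalize(query))
    scores = []
    for url, text in documents.items():
        score = sum(1 for word in normalize(text) if word in terms)
        if score:
            scores.append((url, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def snippet(query: str, text: str) -> str:
    """Highlight the found words by wrapping them in asterisks."""
    terms = set(normalize(query))
    marked = []
    for word in text.split():
        base = normalize(word)
        marked.append(f"*{word}*" if base and base[0] in terms else word)
    return " ".join(marked)


def search(query: str, documents: dict[str, str]) -> list[dict]:
    """Build a results page: link, rating and snippet for each document found."""
    return [
        {"url": url, "score": score, "snippet": snippet(query, documents[url])}
        for url, score in rank(query, documents)
    ]


# Hypothetical document table keyed by URL.
documents = {
    "http://site.test/cars": "Choosing cars - how to choose a car",
    "http://site.test/radio": "Install a radio in a car",
    "http://site.test/tea": "Brewing green tea",
}

for result in search("choose car", documents):
    print(result["score"], result["url"], result["snippet"])
```

The "advanced search" adjustment mentioned in the list could be modeled as an extra filter applied to the output of `rank`; it is omitted to keep the sketch short.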

As you can see, all these components are closely interrelated and work together, forming a well-defined but rather complex mechanism whose operation requires a huge expenditure of resources.

6. Conclusion

Now let's summarize all of the above.

  • The primary task of any search engine is to deliver to people exactly the information they are looking for.
  • The main characteristics of search engines:
    1. Completeness
    2. Accuracy
    3. Relevance
    4. Search speed
    5. Visibility
  • The first full-fledged search engine was the WebCrawler project, launched in 1994.
  • The search engine includes the following components:
    1. Indexing module
    2. Database
    3. Search Server

We hope that this master class helps you get better acquainted with the concept of search engines and better understand their main functions, characteristics and operating principles.