Test drive of the voice assistant "Alice" from Yandex. Alice: how Yandex teaches artificial intelligence to talk to people

  • Development of mobile applications
  • Data Mining
  • Machine learning

    In the future, it seems to us, people will interact with devices using their voice. Applications already recognize the exact voice commands that developers have built into them, but as artificial intelligence technologies develop, they will learn to understand the meaning of arbitrary phrases and even maintain a conversation on any topic. Today we will tell the readers of Habr how we are bringing this future closer, using the example of Alice, the first voice assistant that is not limited to a set of predefined answers and uses neural networks for communication.

    Despite its apparent simplicity, the voice assistant is one of the most ambitious technology projects at Yandex. In this post, you will find out what difficulties the developers of voice interfaces face, who actually writes the answers for virtual assistants, and what Alice has in common with the artificial intelligence from the movie "Her".

    At the dawn of their existence, computers were mainly used in large scientific or defense enterprises. At that time, only science fiction writers were thinking about voice control; in reality, operators loaded programs and data using pieces of cardboard, that is, punched cards. Not the most convenient way: one mistake, and everything has to start over.

    Over the years, computers became more affordable and came into use at smaller companies. Specialists controlled them using text commands entered in a terminal. It is a good, reliable way - it is used in professional environments to this day - but it requires training. Therefore, when computers began to appear in the homes of ordinary users, engineers began to look for simpler ways for a machine and a person to interact.

    The concept of the WIMP graphical interface (Windows, Icons, Menus, Point-n-Click) emerged in the Xerox laboratory and went on to be widely used in the products of other companies. Memorizing text commands to control your home computer was no longer required - they were replaced by gestures and mouse clicks. For its time, this was a real revolution. And now the world is getting closer to the next one.

    Now almost everyone has a smartphone in their pocket with enough computing power to land a ship on the moon. Fingers have replaced the mouse and keyboard, but with them we make all the same gestures and clicks. It is convenient to do this while sitting on the couch, but not on the road or on the go. In the past, humans had to master the language of machines to interact with computer interfaces. We believe that now is the time to teach devices and applications to communicate in the language of people. It was this idea that formed the basis of the voice assistant Alice.

    You can ask Alice [Where can I have coffee nearby?] rather than dictate something like [coffee house cosmonauts street]. Alice will look into Yandex and suggest a suitable place, and in response to the question [Great, but how to get there?] she will give a link to a route already built in Yandex.Maps. She knows how to distinguish exact factual questions from the desire to see classic search results, rudeness from a polite request, and a command to open a site from the desire to just chat.

    It may even seem that somewhere in the cloud a miracle neural network is working, which single-handedly solves any problem. But in reality, behind each of Alice's answers there is a whole chain of technological problems, which we have been learning to solve for 5 years. We will begin our excursion from the very first link - the ability to listen.

    Hello Alice

    Science fiction artificial intelligence knows how to listen - people don't have to press special buttons to turn on "recording mode." And this requires voice activation - the application must understand that a person is referring to it. This is not as easy as it might seem.

    If you simply start recording the entire incoming audio stream and processing it on the server, you will very quickly drain the device's battery and use up all of its mobile traffic. In our case, this is solved using a special neural network, which is trained exclusively to recognize key phrases ("Hello, Alice", "Listen, Yandex" and some others). Supporting only a limited number of such phrases makes it possible to do this work locally, without going to the server.

    If the network is only learning to understand a few phrases, you might think that this is easy and fast enough. But no. People do not pronounce phrases in ideal conditions, but surrounded by completely unpredictable noise. And everyone's voice is different. Therefore, to understand just one phrase, thousands of training recordings are needed.

    Even a small local neural network consumes resources: you cannot simply start processing the entire stream from the microphone. Therefore, a lighter algorithm works on the front line, cheaply and quickly recognizing the "speech started" event. It is this algorithm that switches on the neural network engine for recognizing key phrases, which in turn launches the hardest part - speech recognition.
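
    The cascade described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Alice's actual implementation: the energy threshold, the function names and the stub spotter and recognizer passed in as callables are all hypothetical. The point is that each stage runs only when the cheaper stage before it has fired.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one short chunk of audio samples (hypothetical format)


def is_speech(frame: Frame, energy_threshold: float = 0.01) -> bool:
    """Stage 1: a cheap energy check standing in for a 'speech started' detector."""
    if not frame:
        return False
    energy = sum(sample * sample for sample in frame) / len(frame)
    return energy > energy_threshold


def run_cascade(
    frames: Sequence[Frame],
    spot_keyword: Callable[[Frame], bool],      # stage 2: local key-phrase model (stub)
    recognize_in_cloud: Callable[[Frame], str], # stage 3: full recognition (stub)
) -> Tuple[List[str], Dict[str, int]]:
    """Run the three stages and count how many times each one was invoked."""
    calls = {"vad": 0, "spotter": 0, "cloud": 0}
    transcripts: List[str] = []
    awake = False  # True after the key phrase was heard in the current utterance
    for frame in frames:
        calls["vad"] += 1
        if not is_speech(frame):
            awake = False  # silence ends the current utterance
            continue
        if not awake:
            calls["spotter"] += 1
            awake = spot_keyword(frame)
            continue
        calls["cloud"] += 1
        transcripts.append(recognize_in_cloud(frame))
    return transcripts, calls
```

    In this sketch the expensive recognizer is called only for speech that follows a detected key phrase, which is the reason for ordering the stages from cheapest to most expensive.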

    If thousands of examples are needed to teach just one phrase, you can imagine how laborious it is to train a neural network to recognize any words and phrases. For the same reason, recognition is performed in the cloud: the audio stream is transmitted there, and ready-made answers are returned from there. The accuracy of the answers directly depends on the quality of recognition. That is why the main challenge is to learn to recognize speech as well as a person does. By the way, people also make mistakes. It is believed that a person recognizes 96-98% of speech (accuracy measured using the WER metric, the word error rate). We have managed to achieve an accuracy of 89-95%, which is not only comparable to the level of a live interlocutor, but also unique for the Russian language.
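
    For readers unfamiliar with the WER metric mentioned above: it is the word-level edit distance between the reference transcript and the recognized text, divided by the number of words in the reference. Below is a minimal sketch; the function name and the simple lowercase-and-split tokenization are assumptions made for illustration. If the percentages above are read as 1 - WER, an accuracy of 96-98% would correspond to a WER of 2-4%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

    For example, comparing the reference "what is the weather tomorrow" with the hypothesis "what is weather today" involves one deleted word and one substituted word out of five.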

    But even speech that has been perfectly converted into text will mean nothing if we cannot understand the meaning of what was said.

    What is the weather tomorrow in St. Petersburg?

    If you want your application to display the weather forecast in response to the voice request [weather], then everything is simple - you compare the recognized text with the word "weather" and, if you get a match, display the answer. But this is a very primitive way of interaction, because in real life people ask questions differently. A person can ask the assistant [What is the weather tomorrow in St. Petersburg?], and it should not get confused.

    The first thing Alice does when she gets a question is recognize the scenario. Should she send a search query and show the classic results page with 10 results? Look for one exact answer and give it to the user right away? Take an action, such as opening a website? Or maybe just talk? It is incredibly difficult to teach a machine to recognize behavior scenarios accurately, and any mistake here is unpleasant. Luckily, we have the full power of the Yandex search engine, which handles millions of queries every day, finds millions of answers and learns to tell good ones from bad ones. This is a huge knowledge base on which another neural network can be trained - one that "understands" with high probability what a person wants. Mistakes are, of course, inevitable, but people make them too.

    With the help of machine learning, Alice "understands" that the phrase [What is the weather tomorrow in St. Petersburg?] is a request for the weather (a deliberately simple example, chosen for clarity). But which city are we talking about? And which date? This is where the step of extracting named entities from the user's utterances (Named Entity Recognition) begins. In our case, two such entities carry the important information: "Peter" and "tomorrow". And Alice, with search technologies behind her, "understands" that "Peter" is a synonym for "St. Petersburg", and that "tomorrow" is "the current date + 1".
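
    The normalization step described above can be illustrated with a toy example. This is a dictionary-based sketch, not the trained model the post refers to: the synonym table, the slot names and the function name are hypothetical, and a real system would learn these mappings rather than hard-code them.

```python
import re
from datetime import date, timedelta
from typing import Dict, Union

# Hypothetical lookup tables; a production system would learn these from data.
CITY_SYNONYMS = {
    "st. petersburg": "St. Petersburg",
    "peter": "St. Petersburg",
    "moscow": "Moscow",
}
DAY_OFFSETS = {"today": 0, "tomorrow": 1}


def extract_slots(text: str, today: date) -> Dict[str, Union[str, date]]:
    """Find a city and a relative day in the text and normalize both."""
    lowered = text.lower()
    slots: Dict[str, Union[str, date]] = {}
    for alias, canonical in CITY_SYNONYMS.items():
        # Word boundaries keep "peter" from matching inside longer words.
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            slots["city"] = canonical
            break
    for word, offset in DAY_OFFSETS.items():
        if re.search(r"\b" + word + r"\b", lowered):
            slots["date"] = today + timedelta(days=offset)  # "tomorrow" = today + 1
            break
    return slots
```

    Passing the current date in as a parameter, rather than reading the clock inside the function, keeps the sketch easy to test.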

    Natural language is not only the outward form of our utterances, but also their coherence. In life, we do not exchange short phrases but conduct a dialogue, and that is impossible without remembering the context. Alice remembers it, and this helps her deal with complex linguistic phenomena: for example, to handle ellipsis (recovering missing words) or to resolve coreference (identifying an object from a pronoun). So, if you ask [Where is Elbrus?] and then clarify [And what is its height?], the assistant will find the correct answers in both cases. And if, after the query [What is the weather today?], you ask [And tomorrow?], Alice will understand that this is a continuation of the dialogue about the weather.
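
    One simple way to picture how an elliptical follow-up such as [And tomorrow?] can be handled is a dialogue state that carries the previous intent and its slots forward. The sketch below is an assumption made for illustration, not a description of Alice's internals; the class, the function and the convention that an unrecognized intent is passed as None are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def update_state(
    state: DialogueState, intent: Optional[str], slots: Dict[str, str]
) -> DialogueState:
    """Merge a new utterance into the state.

    intent=None models an elliptical follow-up: the previous intent is kept
    and only the slots mentioned in the new utterance are overwritten.
    """
    if intent is None or intent == state.intent:
        return DialogueState(state.intent, {**state.slots, **slots})
    # A different intent starts a new topic, so the old slots are dropped.
    return DialogueState(intent, dict(slots))
```

    With this scheme, a weather request for today followed by a bare "tomorrow" yields a weather request for tomorrow in the same city, while a request with a different intent resets the context.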

    And one more thing. The assistant must not only understand natural language but also be able to speak it - like a person, not like a robot. For Alice, we synthesize a voice that originally belongs to the dubbing actress Tatyana Shitova (the official Russian voice of Scarlett Johansson). She voiced the artificial intelligence in the movie "Her", although you may also remember her voice as the sorceress Yennefer in "The Witcher". Moreover, this is fairly deep synthesis using neural networks rather than splicing of pre-recorded phrases - it is impossible to record all their diversity in advance.

    Above, we described the features of natural communication (unpredictable form of utterances, missing words, pronouns, mistakes, noise, voice) that you need to be able to work with. But live communication has one more property - we do not always require a specific answer or action from the interlocutor; sometimes we just want to talk. If the application sends such requests to the search engine, all the magic will be destroyed. That is why popular voice assistants use a database of editorial answers to popular phrases and questions. But we went even further.

    How about chatting?

    We taught the machine to answer our questions, conduct a dialogue in the context of certain scenarios and solve user problems. This is good, but is it possible to make her less soulless and endow her with human qualities: give her a name, teach her to talk about herself, maintain a conversation on free topics?

    In the voice assistant industry, this challenge is addressed with editorial responses. A special team of authors takes hundreds of the most popular questions from users and writes several answer options for each. Ideally, this should be done in the same style so that the whole personality of the assistant is formed from all the answers. For Alice, we also write answers - but we have something else. Something special.

    In addition to the top of popular questions, there is a long tail of low-frequency or even unique phrases to which it is impossible to prepare an answer in advance. You already guessed how we solve this problem, right? Using another neural network model. To answer questions and replies unknown to her, Alice uses a neural network trained on a huge database of texts from the Internet, books and films. Machine learning connoisseurs may be interested in the fact that we started with a 3-layer neural network, and now we are experimenting with a huge 120-layer one. We will save the details for specialized posts, but here we will say that the current version of Alice tries to respond to arbitrary phrases using a "neural network chatter" - as we call her internally.

    Alice learns from a huge number of very different texts, in which people and characters do not always behave politely. A neural network can learn something completely different from what we want to teach it.

    - Order me a sandwich.
    - You will manage.

    Like any child, Alice cannot be taught not to be rude by shielding her from all manifestations of rudeness and aggression - that is, by training the neural network on a "clean" dataset free of the rudeness, provocations and other unpleasant things that are often found in the real world. If Alice does not know that such expressions exist, she will answer them thoughtlessly, with random phrases - for her they will remain unknown words. It is better to let her know what they are and develop a definite position on them. If you know what swearing is, you can either swear back or say that you will not talk to someone who swears. And we model Alice's behavior so that she chooses the second option.

    Sometimes Alice's reply is completely neutral in itself, but in the context set by the user it stops being harmless. Once, back during closed testing, a user asked her to find some establishments - a cafe or something similar. He said: "Find another one like that." At that moment a bug occurred in Alice, and instead of launching the organization search scenario, she gave a rather cheeky answer - something like "look on the map" - and did not search for anything. The user was surprised at first, and then surprised us too by praising Alice's behavior.

    When Alice uses the "neural network chatter", a million different personalities can appear in her, since the neural network has absorbed a little from the author of each utterance in the training set. Depending on the context, Alice can be polite or rude, cheerful or depressed. We want the personal assistant to be a coherent personality with a well-defined set of qualities. This is where our editorial texts come in. Their distinguishing feature is that they were written from the start on behalf of the personality we want to recreate in Alice. It turns out that you can continue to train Alice on millions of lines of random texts, but she will answer with an eye to the standard of behavior set by the editorial responses. And this is what we are already working on.

    Alice became the first voice assistant we know of that tries to maintain a conversation not only with editorial responses, but also with a trained neural network. Of course, we are still very far from what is portrayed in modern science fiction. Alice does not always accurately recognize the essence of an utterance, which affects the accuracy of the answer. So we still have a lot of work to do.

    We plan to make Alice the most human-like assistant in the world: to instill empathy and curiosity in her, and to make her proactive - to teach her to set goals in a dialogue, show initiative and involve the interlocutor in the conversation. Right now we are both at the very beginning of the path and at the cutting edge of the sciences that study this area. To move on, we have to push this edge forward.

    16.03.2018

    The new version of Yandex Browser comes with the voice assistant Alice. Some time ago, Alice was included in the mobile version of the browser; now it is also in the Windows version.

    Alice is a voice assistant that integrates into the operating system and lets you control your computer and Yandex Browser using speech. The main task of the feature is to execute commands such as opening sites and third-party applications, performing search queries and getting answers to a variety of questions. The new feature sits in the right corner of the taskbar and responds to the voice command "Listen, Alice" or to a mouse click on the icon.

    For example, Alice can be given such voice commands as "Open", "Mute the sound", "Open VKontakte", "Turn off the computer" and many others. The format of the commands is not at all strict - Alice successfully understands both "Open VKontakte" and "Go to the VKontakte website", and to the question "What is VKontakte" she will give brief information without opening the site.

    In fact, command execution is only a small part of the voice assistant's capabilities. Another key function is answering the user's questions. It is worth noting that the program answers by voice and duplicates the answer as text. The simplest case is reference information such as the weather, exchange rates, traffic jams, the exact date and time, and so on. For more interesting questions, such as "What is ...", it uses information from Wikipedia and Yandex's own services, read out in a shortened form. More difficult questions turn into search queries, with the search results opened in Yandex Browser.

    What's more, a lonely user can even chat with the smart robot. The program willingly answers simple questions and can easily tell you about her favorite color or why her name is Alice. She can even tell a couple of jokes and play word games such as Cities and many others. And if the user asks her to sing a song, the voice assistant will send them to the Yandex.Music service.

    The voice assistant from Yandex is focused on the Russian language. It was developed using technologies created by Yandex itself, taking into account the peculiarities of Russian semantics. Alice's voice recognition and speech synthesis are at a fairly high level. Simple commands are recognized and executed quickly and reliably, and flaws can be noticed only with queries containing very complex word forms or when the user has significant problems with diction.

    If the Yandex Browser user does not need a voice assistant, then it can be disabled in the program settings.

    And yes, do not call Alice by the names Cortana and Siri - this makes her upset.

    Hello Alice.

    Getting answers to many questions becomes easier when the voice assistant Alice from Yandex is at hand. Yandex Alice is a personal assistant with artificial intelligence developed by Yandex, an alternative to its competitor, Okay Google. Alice helps with everyday tasks and carries on a meaningful dialogue. The program is built on neural networks that recognize speech and accents in the voice, create responses and synthesize the assistant's voice. Thanks to these skills, Alice is able to improvise and communicate in spoken language that everyone can understand. With each update the voice assistant gains new capabilities, and now, in addition to performing search queries

    Alice can:

    This is not the whole list of her capabilities, she is constantly learning new skills and improving herself.

    If you are bored or sad, she will tell a joke or play a game with you. Would you like to watch a movie? Easy - showtimes, tickets and prices in an instant. For children, Alice can play a fairy tale. Her answers are always varied: the creators of the application worked for a long time and managed to give the voice assistant modern, lively speech that many people will understand.

    Russian actress Tatyana Shitova took part in creating the voice. She previously dubbed the American actress Scarlett Johansson. Coincidence or not, in the science fiction film "Her" the virtual assistant Samantha speaks in Tatyana Shitova's voice. Thanks to this voice, Alice turned out to be very lively. Sadness, joy and even insolence can be heard in her intonations.

    The creators explained why they decided to focus on the virtual assistant. First, the industry is moving towards voice interaction, because today's generation of users prefers voice search to typing. Second, the algorithms are built around meaningful dialogue: the virtual assistant understands that subsequent phrases may be interrelated, and this is what the dialogue is based on. The voice assistant Yandex Alice is now built into the browser by default, and with her the browser has become much more convenient.


    How to install Alice Yandex

    1. Download the Alice application from the link below.
    2. Install the application.
    3. Allow the application to determine your geolocation.
    4. For full functionality, allow it to record sound.
    5. For convenience, you can add a widget or shortcut to the home screen.

    Not so long ago, such a well-known search engine as Yandex released its own voice assistant and it is called very simply - Alice.

    I think you are very interested in such things and therefore, I decided to go through the questions that people are most often interested in.

    Voice assistant Alice from Yandex - what is it?

    Like other similar assistants, she can talk to you and, using voice or text dialogue, give you answers to the questions you need.

    Features of the voice assistant Alice from Yandex

    Alice does not stand out in any special way: it offers the same functions that you can find in similar assistants from Google or Apple.

    Basically, it works with all services from Yandex. If you try to interact with other applications, problems may arise.

    All functions can be described with the following points:

    • conduct a simple dialogue;
    • give answers to various questions;
    • everything related to the weather forecast (in different cities, the weather for tomorrow, etc.);
    • clarification of the date and day (which is very important);
    • any information related to maps (get directions, find out the distance, advice on where to eat, etc.);
    • money matters (find out the exchange rate, convert from one currency to another, etc.);
    • other.

    Although we already have a full-fledged version, the assistant still has room to grow, and despite its limited capabilities, the reviews are only positive.

    The question is different: "How will it compete with the already existing options?"

    How to enable voice assistant Alice from Yandex?

    At the moment there are versions for iOS, Android and Windows (beta), and in the future it is planned to build the assistant into Yandex Browser.


    If you are looking for a version for a mobile device, you can find it in the Yandex application. The developers decided to simply embed the assistant into the existing program.

    To talk to Alice, you need to do one of these actions (while the Yandex application is running):

    • click on the purple round button with a microphone;
    • say "Hello Alice".

    In both cases the result is exactly the same: you start asking questions, and Alice starts answering.

    If the assistant does not know how to implement your request, then the Yandex search engine opens with your question and a list of results.

    Everything looks like a regular chat. I think there will be some changes in the future, but so far everything looks quite simple and tasteful.

    Who voiced Yandex's voice assistant Alice?

    Alice is voiced by a very famous actress Tatyana Shitova, and if you do not know who this is, I can say that she is the voice of Scarlett Johansson in the Russian dub.


    So when watching movies like Ghost in the Shell or Lucy, you can think about Alice and compare the voices. But this is optional.

    How to download voice assistant Alice from Yandex on iOS or Android?

    If you try to find the assistant simply by entering Alice in the App Store or Google Play search, the results will show an application named Yandex.

    Do not be alarmed, because this is it. Previously, this program was dedicated only to the search engine, but now it also has a built-in assistant.

    Its size differs from device to device (for example, on the iPhone 5S it is a little more than 60 MB), so it won't take up much space. Here are the links so you don't get confused:

    Good afternoon. The official release of the voice assistant Alice on smartphones has taken place, which pleased me, and today the beta version of the assistant for Windows was released as well. I installed it, tested it a bit and was just as pleasantly surprised.

    Voice assistant Alice for PC

    To install "Alice" on a PC, go to the site https://alice.yandex.ru/windows and click "Install"; the setup file will then download. Launch it and install.

    ATTENTION! Yandex removed Alice's installer, the official link now downloads a browser with built-in Alice!

    I still have the installer, if anyone needs it —

    (screenshot)

    After installation, you will see the search bar at the bottom left, near the Start button. On Win 10 it is integrated into the standard search; on Win7 it is placed in a separate widget. Let's take a look at what this assistant, still in the Beta stage, can do now.

    The first tab shows frequently visited sites and, as I understand it, trending news or search queries:

    The second tab has a list of programs, which you can open by clicking on the program with the mouse or by asking "Voice control" to open the app for you.

    If you click the left "question mark" icon on the main (first) tab, you will see a short list of what Alice can do:

    As a test, I decided to ask her for the latest news, to which Alice said that she would give the floor to her colleague from "Yandex.News", and a male voice began to read the news.

    Then I tried talking to her; on the whole, she answers the same way as on the phone. She opens applications without trouble, and sites too. If you ask her to turn on the radio or a certain song or band, Alice opens the browser, opens Yandex.Music in it and plays what you asked for. She does not know how to work with video yet.

    As for PC control, she can turn the sound on and off, shut down or restart the PC, and put it into sleep mode.

    Conclusion:

    What can I say? Yandex did a great job on its assistant. I hope they will not abandon it, but will develop it further. It is a decent analogue of Cortana, which we are unlikely to see in Windows 10. Microsoft has long been promising to release it in Russian, but so far there has been no progress. And Yandex and Alice arrived just in time.

    Install, try, test.

    Share in the comments what other interesting functions and "tricks" it has, what it can do and how it really helps you in your everyday work with your PC.