Oleg Is for Sale

Tinkoff VoiceKit, a set of proprietary speech-to-text and text-to-speech technologies by Tinkoff Bank, is now available to its corporate customers.

Tinkoff VoiceKit features deep neural network models for speech recognition and synthesis. Tinkoff developed them in recent years as part of its AI First strategy and used them to create Oleg, the world’s first proprietary financial voice assistant.

It can be used to:

  • create voice assistants;
  • create robots for automated call centres;
  • accelerate production of audiobooks and voice-overs and speed up video editing;
  • build a speech analytics system based on transcripts (e.g. to supervise operators at a call centre);
  • create applications for people with disabilities;
  • transcribe audio recordings of public speeches;
  • facilitate SEO activities and full-text search of audio and video recordings.

Tinkoff VoiceKit is available for purchase at https://speech.tinkoff.ru. A version for individuals is presently under development, with release scheduled for this autumn.

Educational institutions and students will be able to get the technology free of charge. Tinkoff seeks to strengthen the Russian education system by spearheading its initiatives, supporting nationwide contests and cooperating with the country’s leading universities and educational centres.

Tinkoff has been working on its speech recognition technology since 2016. Trained on terabytes of data and many thousands of hours of human speech, it can correctly recognise up to 95% of the words in a spoken phrase regardless of the audio quality, filtering out noise in phone conversations as well as handling crystal-clear speech.
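The announcement does not say how the 95% figure is measured. A common convention for this kind of claim is word accuracy, that is, one minus the word error rate (the minimum number of substituted, inserted and deleted words divided by the number of words in the reference transcript). The sketch below assumes that convention; the function names and the sample phrases are illustrative and are not part of Tinkoff VoiceKit.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (Levenshtein distance
    computed over words rather than characters)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - word error rate (an assumed metric, see above)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


# Hypothetical example: one wrong word out of ten.
reference = "please transfer one hundred roubles to my savings account today"
hypothesis = "please transfer one hundred roubles to my saving account today"
accuracy = word_accuracy(reference, hypothesis)
```

Under this definition, a recogniser that gets one word wrong in a ten-word phrase scores 90%, so a 95% figure would correspond to roughly one error in every twenty words.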

In 2018, Tinkoff adopted neural network models such as WaveNet, Tacotron 2 and Deep Voice to roll out a proprietary speech synthesis technology. Thanks to the knowledge and expertise acquired over the previous two years, the job took only about nine months. The voice synthesised using Tinkoff’s neural network architectures is nearly indistinguishable from genuine human speech.

The developers of Tinkoff VoiceKit leveraged the resources of the Kolmogorov cluster, one of the most powerful supercomputers in Russia. Created by Tinkoff Group in March 2019, it was also used to train the neural network models.

Beyond Tinkoff’s voice assistant, the speech technologies help automate its customer service. The speech recognition algorithms process about a million phone calls per day and support quality assurance in the handling of customer requests, while the proprietary biometric system trained on customer voices helps detect fraudulent activity at the call centre.

“We had a strong team of developers, 80 video cards, more than 15 thousand hours of audio from public sources, many thousands of hours of phone conversations coming through our call centre, the Kolmogorov supercomputer and a voice-over actor ready to spend five months recording the speech synthesis material. Over the three years, we have timestamped more than 4.5 thousand hours of speech and trained deep neural network models to create what is now available as Tinkoff VoiceKit, the set of our proprietary speech technologies. Our first customers pointed out its superior recognition quality over other solutions they had used, especially when it comes to phone conversations – this is where we have plenty of data to train our neural network models on using the Kolmogorov supercomputer. No matter the intended application, our solutions will only be available as APIs, both for live recognition and for batch offline processing. If the customer needs system reconfiguration or an on-site solution, we will seek to engage major integrators to help out. Mobile SDKs for iOS and Android are in the pipeline too.” – Vyacheslav Tsyganov, VP and CIO at Tinkoff.