The Electronics and Telecommunications Research Institute (ETRI) announced on the 3rd that it has developed a conversational AI technology that recognizes speech in 24 of the world's major languages and converts it into text.
ETRI said its speech recognition technology outperforms that of global companies such as Google in Korean and performs comparably in the other languages.
The research team overcame the difficulty of expanding to new languages by combining self-supervised learning on speech data, pseudo-labeling, a large-scale multilingual pre-trained model, and data augmentation that synthesizes additional speech with text-to-speech (TTS).
The team also improved usability by addressing the shortcomings of the widely used end-to-end speech recognition approach. To overcome its slow response speed, they developed a streaming inference technology that enables real-time processing.
In addition, the team developed and applied a hybrid end-to-end recognition technology that makes it easy to specialize the system for particular domains such as medicine, law, and science and technology.
Sang-Hoon Kim, a senior researcher at ETRI's Complex Intelligence Lab, said, "It is significant that we have used domestic technology to develop speech recognition comparable to that of the global leaders. I hope it will be of great help."