
Voice Recognition Engine

Onkyo SPEECH is our proprietary speech recognition algorithm and software that converts voice data into text.

Unique AI Application Technology

We developed our voice recognition engine entirely in-house. It also performs well on telephone audio.

Recognition of a wide range of ages

The engine uses deep-learning-based acoustic analysis. It achieves a recognition rate of 84% even for the voices of seniors, whose voice and articulation change with age.

Customization support

The engine can be customized by training it on dialects and industry-specific vocabulary.

High recognition accuracy is achieved by training on customer data with custom labels. The engine can recognize speech from a variety of environments and age groups.


Language Model Update

The language model is retrained on text for the parts that were misrecognized in the speech recognition results.

Acoustic Model Update

Industry- or environment-specific audio is transcribed and used as training data.

A characteristic of speech recognition is that the recognition rate drops when the voice differs greatly from the voices used in training, or when a sentence contains unknown words. Onkyo SPEECH improves the recognition rate through customized training.
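
One rough way to check whether unknown words are likely to be a problem is to measure how many words in typical sentences fall outside a known word list. The sketch below is only an illustration: the function name `oov_rate`, the sample vocabulary, and the whitespace-based word splitting are all assumptions, and Japanese text would first need a tokenizer because it is not separated by spaces.

```python
def oov_rate(sentence: str, vocabulary: set[str]) -> float:
    """Return the fraction of words in `sentence` that are not in `vocabulary`."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    unknown = [w for w in words if w not in vocabulary]
    return len(unknown) / len(words)


# Hypothetical vocabulary and sentence, chosen only for illustration.
vocab = {"the", "patient", "needs", "a", "refill"}
rate = oov_rate("The patient needs a metformin refill", vocab)
print(f"Out-of-vocabulary rate: {rate:.2f}")
```

A high rate on domain text suggests that vocabulary customization is worth considering.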

Deep Learning

The system uses a deep learning acoustic model to learn and classify the voice features extracted by acoustic analysis. The model is a "factored TDNN" with a bottleneck layer, which lets it learn the important parts of the voice efficiently. It achieves a higher recognition rate than other speech recognition systems, particularly for elderly speakers.
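
To give a feel for why a bottleneck helps, the sketch below compares the number of weights in one full layer with a factored version that passes through a low-dimensional bottleneck. It is a simplified illustration and not the Onkyo SPEECH implementation: the layer sizes (1536 and 160) and the function names are assumptions, and the time-delay context of a real TDNN is ignored.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights in one full (unfactored) layer: an n_out x n_in matrix."""
    return n_in * n_out


def factored_params(n_in: int, n_out: int, bottleneck: int) -> int:
    """Weights when the layer is split into two smaller matrices:
    n_in -> bottleneck, then bottleneck -> n_out."""
    return n_in * bottleneck + bottleneck * n_out


# Hypothetical sizes, chosen only for illustration.
full = dense_params(1536, 1536)
factored = factored_params(1536, 1536, 160)
print(full, factored, round(factored / full, 3))
# prints: 2359296 491520 0.208
```

With these assumed sizes the factored layer has roughly one fifth of the weights, so the model is forced to keep only the most useful information when passing through the bottleneck.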

Automatic visualization and creation of meeting minutes

Simplification support tools

Visualization of customer interactions on sales calls

Visualization of communication using AR / VR goggles

For this type of business

Ideal for situations where conversations and spoken voices must be preserved in text.

Usage Fees

We offer two versions: batch and streaming. Please contact us if you are not sure which one better suits your needs.

API provision (batch version)

Unlimited use for one month. A "channel" is the number of simultaneous accesses.

Papers Accepted at Conferences

Onkyo SPEECH's speech recognition has been the subject of papers presented at conferences.

O-COCOSDA 2021 Best Paper Award
Nobuya Tachimori (Onkyo Corporation, Japan), Sakriani Sakti and Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
"Multi-Encoder Sequential Attention Network for Context-Aware Speech Recognition in Japanese Dialog Conversation"


  • The batch version takes one API request per file, while the streaming version takes one API request per audio stream. We recommend the streaming version if your application requires real-time performance, and the batch version for after-the-fact review and log visualization.

  • The batch version is file-based, so a 30-minute audio file takes approximately 30 minutes to an hour to process. The real-time (streaming) version returns results 5 to 10 seconds after the audio stream ends or is interrupted. Demos of both versions are available free of charge, so please contact us for more information.

  • It takes 1 to 2 days to issue the API key required to use the API.

  • The price includes communication fees and is an unlimited-use plan.

  • This is currently under development and will take time to become available. Please contact us, as we may be able to consider it depending on the volume.

  • We evaluate recognition accuracy using WER (Word Error Rate) and aim for a recognition rate of 85% or higher. If the rate is 80% or lower, we will carry out text training free of charge.

  • We estimate about 2 weeks for text training and about 3 weeks for audio training.

  • Yes, we can. Please contact us for specific costs.
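
The WER evaluation mentioned in the answers above can be sketched as follows. WER is the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. This is a minimal sketch and not Onkyo's evaluation tool: the function name and the sample sentences are hypothetical, words are split on whitespace (Japanese text would need tokenization first), and treating the recognition rate as 1 minus WER is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as the word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row 0 of the edit-distance table: turning an empty reference
    # into the first j hypothesis words takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("the" -> "an") and one deletion ("by").
wer = word_error_rate(
    "please send the invoice by friday",
    "please send an invoice friday",
)
print(f"WER: {wer:.2f}, recognition rate (assumed 1 - WER): {1 - wer:.2f}")
# prints: WER: 0.33, recognition rate (assumed 1 - WER): 0.67
```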


Onkyo IVR is used by a wide variety of companies.
