Recent advances in deep learning research have improved automatic speech recognition (ASR) technology so significantly that it is approaching human-level accuracy. This opens the door to many new applications of the technology.

For example, some speech-to-text application programming interfaces (APIs) already claim 92% accuracy relative to human transcription, as measured by word error rate (WER), which corresponds to a WER of about 8%. Newer machine learning research, such as Meta AI's Data2vec and DeepMind's Perceiver, aims to push accuracy further and increase the utility of ASR systems.
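WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a system's transcript into a human reference transcript, divided by the number of words in the reference (N): WER = (S + D + I) / N. Accuracy figures like the one above are commonly reported as 1 - WER, although that is an assumption about how any particular vendor computes them. The following minimal sketch computes WER with a word-level edit distance; the `word_error_rate` name is hypothetical, and the only text normalization it applies is lowercasing and splitting on whitespace (real benchmarks also normalize punctuation and numbers).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER = {wer:.3f}, accuracy = {1 - wer:.3f}")
# WER = 0.167, accuracy = 0.833
```

Because insertions are counted, WER can exceed 1.0 for a very noisy transcript, so "accuracy" expressed as 1 - WER is a convenient summary rather than a true percentage of correct words.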


As ASR systems become more accurate, they are also becoming more affordable, which increases their reach and accessibility. Expect to see ASR technology built into new smart TVs, laptops, and automobiles, further integrating it into our daily routines.

ASR applications are also appearing in less obvious places, such as self-checkout kiosks in grocery stores. In the near future, voice interfaces may become more popular than touchscreens, changing the way people interact with the world.

Audio intelligence features are becoming transformative tools

Today's ASR systems go beyond basic speech-to-text transcription. Businesses may find great value in artificial intelligence (AI)-backed features that provide smart analytics, including the following:

  • Sentiment analysis identifies the sentiment (such as positive, negative, or neutral) expressed in each segment of a speaker's speech. For example, it can gauge the emotions customers and agents express during calls in the telecom industry. A company can use this data to better inform agent training, targeted marketing messages, and customer interactions in call centers.

  • Entity detection identifies and classifies entities in text. For example, engineer could be classified as an occupation, while arm and foot could be classified as body parts. In medicine, entity detection can identify conditions and treatments to help automatically sort patient information and perform statistical analysis. Voice bots use entity detection to identify specific people or companies and then automatically trigger actions to personalize interactions.

  • Speaker diarization determines which distinct speaker is talking at each point in an audio or video file. Call centers use speaker diarization to separate speakers and then analyze each speaker's behavior to make future predictions. Podcast platforms might use it to automatically label a transcript with the speakers' names, making the transcript more readable.

  • Content safety detection identifies and filters potentially harmful or sensitive content, such as hate speech, violence, and drug references. Online podcast platforms may use content safety detection for content moderation.

  • Personal information removal identifies and redacts personally identifiable information (PII), such as Social Security numbers, credit card numbers, and addresses. Communications and telecom platforms use PII redaction to meet security and privacy requirements and regulations.

  • Summarization breaks audio or video transcripts into logical "chapters" and generates a summary for each one. Virtual meeting platforms use summarization to automatically create useful summaries after each meeting. Call center companies can use summarization to aid conversation reviews.
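Production PII redaction usually relies on trained entity-recognition models, but the basic idea of the personal information removal feature described above can be sketched with rule-based pattern matching over a finished transcript. This is a minimal sketch, assuming the transcript already renders numbers as digits; the `PII_PATTERNS` list and `redact_pii` helper are hypothetical and cover only a few U.S.-style formats. Street addresses are left out because they are hard to capture reliably with regular expressions.

```python
import re

# Hypothetical, deliberately simple patterns (not any vendor's implementation).
# Card numbers run first so their digits are replaced before shorter patterns apply.
PII_PATTERNS = [
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    ("CREDIT_CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    # U.S. Social Security number in the 123-45-6789 format.
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # Simple email address: local part, @, then a dotted domain.
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact_pii(transcript: str) -> str:
    """Replace every pattern match with a bracketed label such as [SSN]."""
    for label, pattern in PII_PATTERNS:
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


print(redact_pii("My SSN is 123-45-6789 and my card is 4111 1111 1111 1111."))
# My SSN is [SSN] and my card is [CREDIT_CARD].
```

Patterns like these miss spelled-out numbers ("one two three") and can flag unrelated long digit strings such as order numbers, which is why production systems typically combine trained models with checks like the Luhn checksum for card numbers.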


Speech recognition

With increased accuracy, accessibility, and analytical capabilities, ASR products are quickly becoming deeply integrated into IT architecture. Open source frameworks such as DeepSpeech also make ASR accessible to organizations that want to incorporate it into their business and IT systems.


About the author

Dylan is the founder and CEO of AssemblyAI, a Y Combinator-backed startup building the #1 rated API for automatic speech recognition. Dylan is an experienced AI researcher who previously led machine learning teams at Cisco Systems in San Francisco.
