Recent advances in deep learning research have improved automatic speech recognition (ASR) technology so significantly that it is approaching human-level accuracy. This opens the door to many new applications for the technology.

For example, speech-to-text application programming interfaces (APIs) already boast 92% accuracy relative to human transcription, as measured by word error rate (WER). Recent strides in machine learning research, such as Data2vec and Perceiver, aim to boost accuracy further and increase the utility of ASR systems.
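
To make the WER metric concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a human reference transcript and the ASR output, divided by the number of words in the reference. The function name and the lowercase-and-split-on-whitespace normalization are assumptions made for illustration; real evaluations usually apply more careful text normalization. Under the common convention that accuracy is 1 minus WER, the 92% figure would correspond to a WER of roughly 8%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Normalization here (lowercase, whitespace split) is a simplifying
    assumption, not a standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the quick brown fox jumps over the lazy dog"
    machine = "the quick brown fox jumped over lazy dog"
    wer = word_error_rate(human, machine)
    print(f"WER: {wer:.3f}, accuracy: {1 - wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the ASR output is much longer than the reference, so "accuracy" derived this way is a convention rather than a true percentage.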

As ASR systems are becoming more accurate, they're also becoming more affordable. This in turn increases their reach and accessibility. During the transition, expect to see pioneering ASR technology pop up in new smart TVs, laptops, and automobiles, further integrating the technology into our daily routines.

ASR applications are also likely to appear in unexpected places, like self-checkout kiosks in grocery stores. In the near future, voice interfaces may become more popular than touch-screen devices, which could change the way people interact with the world.

Audio intelligence features are becoming transformative tools

Today's ASR systems go beyond basic speech-to-text transcription. Businesses may find great value in artificial intelligence (AI)-backed features that provide smart analytics, including the following:

  • Sentiment analysis detects the sentiment expressed in each segment of a speaker's speech. An example would be the emotions expressed during customer-agent interactions in the telecom industry. A company can use this analytical data to better inform agent training, targeted marketing messages, and customer interactions in call centers.

  • Entity detection identifies and classifies entities in a text. For example, engineer is an entity that could be classified as an occupation, while arm and foot could be classified as body parts. Entity detection can be used by the medical field to identify conditions and treatments to help automatically sort patient information and perform statistical analysis. Voice bots use entity detection to identify specific people or companies and then automatically trigger actions to personalize interactions.

  • Speaker diarization identifies distinct speakers in an audio or video file. For example, a podcast platform might automatically label a transcription with the speakers' names to make it more readable. Call centers use speaker diarization to identify speakers and then analyze each speaker's behavior in order to make future predictions.

  • Content safety detection identifies and filters content for potentially harmful and sensitive information, such as hate speech, violence, drugs, and so on. Online podcast platforms may use content safety detection for content moderation.

  • Personal information removal identifies and redacts personally identifiable information (PII), such as social security numbers, credit card numbers, and addresses. Communications and telecom platforms use PII redaction to meet security and privacy requirements and regulations.

  • Summarization breaks audio or video transcripts into logical "chapters" and generates a summary for each one. Virtual meeting platforms use summarization to automatically create useful summaries after each meeting. Call center companies can use summarization to aid conversation reviews.
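
As a rough illustration of the personal information removal feature described above, the following sketch redacts two kinds of PII from a transcript using regular expressions. It is a toy example under stated assumptions: the patterns, the placeholder labels, and the `redact_pii` function name are hypothetical, and it only handles US-style social security numbers and 16-digit card numbers written as digits. Commercial ASR products typically rely on trained models rather than fixed patterns.

```python
import re

# Assumed formats for illustration only: 123-45-6789 for social security
# numbers, and 16 digits in groups of four (optionally separated by a
# space or hyphen) for credit card numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def redact_pii(transcript: str) -> str:
    """Replace matched PII spans with placeholder labels."""
    redacted = SSN_PATTERN.sub("[SSN]", transcript)
    redacted = CARD_PATTERN.sub("[CARD_NUMBER]", redacted)
    return redacted


if __name__ == "__main__":
    text = "My social is 123-45-6789 and my card is 4111 1111 1111 1111."
    print(redact_pii(text))
```

A pattern-based approach like this misses PII that is spoken out in words ("four one one one") or formatted differently, which is one reason production systems pair the transcript with entity detection.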

Speech recognition

With increased accuracy, accessibility, and analytical prowess, ASR products are quickly becoming deeply integrated into IT architecture. And open source frameworks like DeepSpeech make ASR highly accessible to those who wish to incorporate ASR into their business and IT systems.


About the author

Dylan is the founder and CEO of AssemblyAI, a Y Combinator-backed startup building the #1 rated API for automatic speech recognition. Dylan is an experienced AI researcher, with prior experience leading machine learning teams at Cisco Systems in San Francisco.
