Recent advances in deep learning research have improved automatic speech recognition (ASR) technology so significantly that it is approaching human-level accuracy. This opens the door to many new applications of the technology.
For example, speech-to-text application programming interfaces (APIs) already boast 92% accuracy relative to human transcription, as measured by word error rate (WER). Recent machine learning research, such as Data2vec and Perceiver, aims to boost accuracy further and increase the utility of ASR systems.
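To make the WER figure concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a human reference transcript and the ASR output, divided by the number of words in the reference. The function name and the simple lowercase-and-split normalization are assumptions made for illustration; real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length."""
    # Simplistic normalization (an assumption): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over the lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}")
```

If accuracy is taken to mean 1 minus WER, which is a common convention that the figure above appears to follow, a 92% accuracy corresponds to a WER of 0.08, or roughly 8 word errors per 100 reference words.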
As ASR systems become more accurate, they are also becoming more affordable, which in turn increases their reach and accessibility. As this happens, expect to see pioneering ASR technology pop up in new smart TVs, laptops, and automobiles, further integrating the technology into our daily routines.
ASR applications will also appear in less obvious places, like self-checkout kiosks in grocery stores. In the near future, voice interfaces may become more popular than touch-screen devices and could change the way people interact with the world.
Audio intelligence features are becoming transformative tools
Today's ASR systems go beyond basic speech-to-text transcription. Businesses may find great value in artificial intelligence (AI)-backed features that provide smart analytics, including the following:
- Sentiment analysis extracts the sentiment in each of a speaker's speech segments. An example would be the emotions expressed during customer-agent interactions in the telecom industry. A company can use this analytical data to better inform agent training, targeted marketing messages, and customer interactions in call centers.
- Entity detection identifies and classifies entities in a text. For example, "engineer" is an entity that could be classified as an occupation, while "arm" and "foot" could be classified as body parts. The medical field can use entity detection to identify conditions and treatments, helping to automatically sort patient information and perform statistical analysis. Voice bots use entity detection to identify specific people or companies and then automatically trigger actions to personalize interactions.
- Speaker diarization identifies distinct speakers in an audio or video file. For example, a podcast platform might automatically label a transcription with the speakers' names to make it more readable. Call centers use speaker diarization to identify speakers and then analyze each speaker's behavior in order to make predictions.
- Content safety detection identifies and filters potentially harmful and sensitive content, such as hate speech, violence, and drugs. Online podcast platforms may use content safety detection for content moderation.
- Personal information removal identifies and redacts personally identifiable information (PII), such as social security numbers, credit card numbers, and addresses. Communications and telecom platforms use PII redaction to meet security and privacy requirements and regulations.
- Summarization breaks audio or video transcripts into logical "chapters" and generates a summary for each one. Virtual meeting platforms use summarization to automatically create useful summaries after each meeting. Call center companies can use summarization to aid conversation reviews.
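As a rough illustration of what personal information removal does to a transcript, the sketch below replaces two kinds of PII with placeholder labels. It is a deliberately simplified stand-in: the regular expressions, the `PII_PATTERNS` table, and the `redact_pii` function are hypothetical names introduced here, and they only catch digits written in common US formats. Production ASR features typically rely on trained models and also handle numbers that are spoken out as words.

```python
import re

# Illustrative patterns only (assumptions): US-style social security numbers
# and 16-digit card numbers, optionally grouped in fours by spaces or hyphens.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact_pii(transcript: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


text = "My SSN is 123-45-6789 and my card is 4111 1111 1111 1111."
print(redact_pii(text))
```

Keeping the label in the redacted text (rather than deleting the span outright) preserves the fact that something was said, which can matter for later review of a call.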
Speech recognition
With increased accuracy, accessibility, and analytical prowess, ASR products are quickly becoming deeply integrated into IT architecture. And open source frameworks like DeepSpeech make ASR highly accessible to those who wish to incorporate ASR into their business and IT systems.
About the author
Dylan is the founder and CEO of AssemblyAI, a Y Combinator-backed startup building the #1 rated API for automatic speech recognition. Dylan is an experienced AI researcher, with prior experience leading machine learning teams at Cisco Systems in San Francisco.