Swarajya Staff
Oct 20, 2016, 05:15 PM | Updated 05:15 PM IST
What if someone told you that a technology existed which could transcribe a conversation you have with, say, a friend, making no more errors than someone who transcribes speech for a living?
That would be a hard sell. But in a major breakthrough in speech recognition, Microsoft has cracked the code to do just that.
Researchers at Microsoft have built a system that recognises the words in a conversation as accurately as people do. How error-prone is the transcription? Microsoft says the system operates at a word error rate of 5.9 per cent, about the same as that of a professional transcriptionist. It’s also the lowest rate ever recorded on the industry-standard “Switchboard” speech recognition task.
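For readers curious about what a word error rate measures: it is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that general definition only. The function name `word_error_rate` and the sample sentences are made up for illustration, and it is not a description of how Microsoft scored its system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace and
    compares words exactly, with no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six gives a rate of 1/6, about 16.7 per cent.
rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{rate:.1%}")
```

On this measure, a rate of 5.9 per cent corresponds to roughly six word errors for every hundred words in the reference transcript.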
The company’s chief speech scientist, Xuedong Huang, said in a Microsoft blog post:
“We’ve reached human parity. This is an historic achievement.”
The achievement meets a goal Microsoft set less than a year ago, and exceeds it by quite a margin.
Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group, said:
“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.”
This accomplishment builds on decades of research in speech recognition, beginning in the early 1970s with DARPA, the US agency tasked with producing breakthroughs in technology in the interest of national security.
So what are the implications of this latest feat? It is expected to make an impact on consumer entertainment devices like the Xbox, on tools designed to transcribe speech to text, and on personal digital assistants like Microsoft’s Cortana.
The eventual goal of this ambitious team at Microsoft is to build a technology that not only recognises words but also understands them.
That’s taking human parity to an altogether different level.
With inputs from IANS