Australian firm VeCommerce, which began developing speech recognition systems in the 1990s, says it is again pushing the envelope by building systems that can understand conversations, not just recognise individual pieces of speech.
“Speech recognition is nearly as accurate as the human ear in determining what gets said,” managing director of VeCommerce, Paul Magee, told iTnews.
Getting computer systems to actually understand what is said is challenging.
The reason is that the human ear typically picks up only a certain percentage of a conversation. The brain then uses stored ‘context’ to fill in the missing words and assemble a complete understanding of what has been said.
The more unfamiliar a subject is, the greater the need to listen with complete accuracy, because there is less context, or in some cases none, to fill in any gaps, according to Magee.
Transferring that ability into speech recognition algorithms and systems represents the next logical evolution for the technology.
Magee uses a horse-racing gambling system as an example.
“The limitations of core speech recognition systems means it can be very hard to ‘hear’ the difference between ‘seven’ and ‘eleven’,” said Magee.
“In a gambling system, did the person calling in mean race seven or eleven? The challenge is how can the system make a sensible decision without asking the person what they said?”
Magee said newer systems were able to use ‘fuzzy logic’ to apply probabilities to all the potential options to work out the most likely one and gain the degree of certainty needed to make a correct judgement.
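For example, if race seven has already been run, the caller is probably phoning about race eleven. Alternatively, the system could check whether race eleven has a horse numbered seven, which would suggest a different reading of what the caller said.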
“The system can infer certain things from what the user is saying,” said Magee.
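The kind of weighting Magee describes can be illustrated with a minimal sketch. It is not VeCommerce's method: it swaps in simple probability weighting rather than fuzzy logic proper, and the function names, the scores and the 0.9 confidence threshold are all hypothetical. Each candidate word's acoustic score is multiplied by how plausible it is in context, the results are normalised, and the system commits only if one option is clearly ahead.

```python
def rank_hypotheses(acoustic_scores, context_priors):
    """Weight each candidate's acoustic score by its contextual plausibility,
    then normalise so the results sum to 1."""
    combined = {
        word: acoustic_scores[word] * context_priors.get(word, 0.0)
        for word in acoustic_scores
    }
    total = sum(combined.values())
    if total == 0:
        return {}  # context rules out every candidate
    return {word: score / total for word, score in combined.items()}


def choose(posteriors, threshold=0.9):
    """Return the best candidate if it clears the (assumed) confidence
    threshold; otherwise return None, meaning the caller must be asked."""
    if not posteriors:
        return None
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None


# Hypothetical numbers: the recogniser slightly prefers "seven"...
acoustic = {"seven": 0.55, "eleven": 0.45}
# ...but race seven has already been run, so it is an unlikely request.
priors = {"seven": 0.05, "eleven": 0.95}

posteriors = rank_hypotheses(acoustic, priors)
decision = choose(posteriors)
print(decision)
```

With no useful context (equal priors), neither option reaches the threshold and `choose` returns `None`, which corresponds to the system falling back on asking the caller to repeat the race number.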
The other major trend in the industry is towards biometric voice verification, according to Magee.
“Every one of us has a unique voice pattern,” he explained.
“It’s more accurate than a fingerprint in determining individual identity, and people aren’t scared about using their own voice [for verification].
“Iris or fingerprint scanners have hugely negative connotations when it comes to privacy. Voice is a non-invasive way to establish identity and people are voting with their feet,” said Magee.
Human understanding drives speech recognition research
By Ry Crozier on Nov 24, 2008 2:28PM