
Speech Technology

16 July 2010

Advances in phonetic analysis mean speech technology is no longer hindered by accents and vernacular. Paul Golden examines what’s new in voice recognition.

 

Many people in the speech technology industry now freely acknowledge that the systems they have developed do not yet perform flawlessly in all circumstances and that they need to be implemented appropriately. Processing an entire insurance claim within an automated speech solution, for example, may not be desirable, but there are parts of the process that could be handled reasonably efficiently by such solutions.

 

It’s also widely accepted that while unusually heavy accents may trip up a speech system, a live agent may not do much better.

 

Katalyst Communications speech analytics consultant Neil Barnes believes technology that analyses speech based on phonetics has rendered the debate over how to deal with variables such as accents and vernacular redundant because it is interested in how words sound, not how they look on paper.

 

Language packs can be tuned to specific countries or regions, enabling them to recognise proper names, slang, jargon and even words specific to a city or town.

 

The process of determining at what point a call needs to be handled by an agent rather than a speech technology application can be refined, but whether that process can develop emotional intelligence is still the subject of conjecture.

 

Some observers point out that emotion is only one factor in deciding the switching point to a human backup system and that such decisions should be driven by content of the call or metrics, such as time spent in the system compared to the time expected for the transaction. The emotion may be undetectable here but the customer may still be experiencing difficulty with the automated system.

 

Darren Standing, vice president of products and marketing at Aurix, explains that some initial work has been undertaken to build databases identifying scales of human emotion that can take into account the ‘grey’ areas between those emotions.

 

There are a small number of databases that have attempted to define these emotions, although there is a degree of argument over their classification, especially as emotion perception can be a very subjective matter. Also, emotions are generally expressed more through physical gestures than tone of voice alone, making them very hard to define over the phone.

 

As a result, the task of ‘emotion recognition’ today is roughly where speech recognition was 20 years ago - some applications are possible, but only under highly controlled conditions. By acknowledging these limitations and adapting to them, Standing reckons commercial applications may be feasible, but adds that it is likely to be some time before an effective system for general purpose interpretation of real emotions from audio is available.

 

In the meantime, companies are focusing on providing more intelligent applications, using techniques that are sensitive to a caller’s experience earlier in the call and that ideally also take account of previous interactions that have already been logged. Factors to be monitored include:

- How many errors were encountered before recognition
- How much help was provided
- How much time was required to respond
- How close to the end of the transaction the interaction has progressed

 

These factors can be used to decide where assisted service should be triggered, suggests Peter Galloway, head of voice self-service at Sabio. By increasing intelligence, speech solutions are far more likely to make smarter decisions about what callers are saying, improving performance and removing the need for more reactive emotional intelligence techniques.
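As a rough illustration of how the factors listed above might feed a transfer decision, here is a minimal Python sketch. The class name, field names, thresholds and the 'two signals out of three' rule are all assumptions made for this example; none of them comes from Sabio or from any particular product.

```python
from dataclasses import dataclass


@dataclass
class CallState:
    """Hypothetical snapshot of one caller's progress through an automated dialogue."""
    recognition_errors: int   # errors encountered before recognition
    help_prompts: int         # how much help was provided
    response_seconds: float   # time the caller needed to respond
    progress: float           # 0.0 = just started, 1.0 = transaction complete


def should_transfer(state: CallState,
                    max_errors: int = 3,
                    max_help: int = 2,
                    max_response_seconds: float = 10.0,
                    near_completion: float = 0.8) -> bool:
    """Return True when the call should be handed to a live agent.

    All thresholds are illustrative defaults, not recommended values.
    """
    # Count how many of the three difficulty signals have been tripped.
    struggling = sum([
        state.recognition_errors >= max_errors,
        state.help_prompts >= max_help,
        state.response_seconds >= max_response_seconds,
    ])
    # A caller who is nearly finished is given more leeway: all three
    # signals must be tripped, rather than any two.
    required = 3 if state.progress >= near_completion else 2
    return struggling >= required
```

In this sketch a caller who has hit three recognition errors and needed two help prompts early in the call would be transferred, while the same caller near the end of the transaction would be allowed to finish. A real deployment would tune both the thresholds and the rule against logged calls.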

 

According to Sean Keane, EMEA general manager at Salmat, the sheer computing power required to build a system that could identify and process emotional responses in real time places it way beyond the reach of call centre operators. In the absence of such systems, a prudent strategy might be to say ‘when in doubt, transfer the call over to an agent’.

 

Martin Roberts, vice president of marketing and business development at Nice Systems, says that once multi-channel analytics are developed to the point where they can function in real time, it will be possible to provide agents with support during the call (for example, suggesting the next best action) as well as automating certain processes. In this scenario, a knowledge base could be searched automatically based on the caller’s initial conversation and any previous interactions they may have had, which could have been via voice, email or chat.

 

Whilst measuring first call resolution post-call may be useful, being able to identify at an early stage of a call where there have been multiple previous contacts is far more powerful as it can dynamically change the agent’s behaviour, he explains.

 

Compliance is another area where real-time analysis could be useful, by guiding the agent to follow defined processes and state specific phrases.

 

Some call centre industry experts remain sceptical about the value of speech technology. Paul Hudson, consultancy director at Intersperience Research, describes it as a solution that suits the organisation, not the consumer: a technology designed to fix a business problem, not necessarily a customer experience problem.

 

“Many customers claim speech technology has to be 100 per cent accurate in understanding what they are saying. With up to 20 per cent of calls to some contact centres coming via mobile phones, they refer to feeling self-conscious when having to repeat words or talk louder to be heard. In these ‘social’ surroundings, SMS and text communication would be preferable if an agent wasn’t available - people actually call contact centres because they want human interaction.”

 

Galloway agrees that further refinement of noise resilience is crucial with so many customers using their mobiles to contact organisations. For speech applications, the absolute key is to distinguish between the voice and any other background noise.

 

Your voice print - As unique as your fingerprints

There are a number of niche applications where voice biometrics (a process by which someone is identified by their voice print) has seen some take-up, but it is a solution that is yet to achieve widespread penetration. Given that the technology has the potential to dramatically improve the accessibility of contact centres, where protecting against identity fraud and protecting data are increasingly becoming driving factors, companies such as Sabio are convinced that biometrics has a strong future in the call centre environment.

 

Identity verification is clearly an important application for biometrics and once customers start to use it for services such as banking and insurance, they will quickly expect the same level of security from their other service providers.

 

However, the technology has potential value in other applications, for example passive identification of callers. It could be used to support fraud prevention, with biometrics spotting the voice prints of users who have already demonstrated a propensity towards fraud and immediately advising agents that the caller may be fraudulent.

 

The firm Nuance expects voice authentication to be commonplace in the UK within the next three years. Research it commissioned found that 60% of consumers now regard voice authentication as a secure form of identity verification.

 

Some users will inevitably be concerned that their voice could be recorded and used fraudulently.

 

However, there are a number of voice biometric engines that can detect an attempt to play back a voice recording, and Sean Keane says customers can take additional steps to further reduce the likelihood of identity theft.

 

“The voice print created by the user should include a degree of vocal inclination (which can be achieved by reading a piece of text rather than simply rattling off the alphabet). This makes accurate identification easier.”

 

The challenge for call centre operators is managing the trade-off between having the highest level of security and having the best customer experience, but even with modest rejection rates many experts claim the technology is significantly more secure than any other system that already exists.
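The trade-off can be pictured as a single acceptance threshold applied to a voice match score. The short Python sketch below assumes a hypothetical engine that returns a score between 0 and 1 and accepts the caller when the score reaches the threshold. The function name and the scores are made up purely for illustration; they are not measured data and do not describe any vendor's product.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for a given threshold.

    A caller is accepted when their match score is >= threshold.
    """
    false_rejects = sum(s < threshold for s in genuine_scores)
    false_accepts = sum(s >= threshold for s in impostor_scores)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Made-up match scores, for illustration only (not measured data).
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]    # real customers
impostor = [0.30, 0.45, 0.62, 0.70, 0.20]   # fraud attempts

for threshold in (0.5, 0.65, 0.8):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  "
          f"false rejects={frr:.0%}  false accepts={far:.0%}")
```

With these invented scores, raising the threshold shuts out every impostor but starts turning away genuine customers, while lowering it does the opposite. Choosing where to sit between those two outcomes is the balance between security and customer experience described above.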

 

Keane reckons the greatest value would be derived by organisations that customers rarely need to contact, since those customers are more likely to have forgotten their password or security question when they do attempt to gain access.

 

The potential boost to customer satisfaction levels from a system that removes the need to remember multiple passwords is an obvious one.

 

Customers could also set up multiple designated numbers depending on where they are, which is particularly useful for banks but also for other organisations such as public sector bodies or tax authorities, and would enable companies to open up more self-service applications via the web.


     