The Future of Speech Recognition Software

A theory about the viability of an adequate speech recognition system.

Speech recognition software is still in its infancy. It’s not quite at the Star Trek level yet, but as the technology evolves, common sense dictates that it will get there.

I would like to think that it will ultimately form the basis of very easy and accessible communication between different languages and cultures for the common person.

But before we talk about that, we need to see where we stand now as far as technology is concerned.

It starts with what we have now: fairly accurate, but not yet perfected, Speech Recognition (SR) software. These are the programs that turn the spoken word into text on your computer, which is very handy for people who don’t like to write. At the level we are at now, there are still some problems, but a few companies are making great strides in this area, so for the sake of argument, we’ll assume all of those problems are solved.

The next piece we need is Speech Translation (ST) technology, which already exists. ST programs convert text in one language into any number of others. These programs are fairly accurate, but not perfect. We will, however, assume that they are.

Next we need to consider the use of Speech Synthesis (SS). We have had speech synthesizers for years, although they are not very good. One day (I assume) they will be able to correctly reproduce the human voice, so we will assume that they can. But to go a step further, by recording a few words of a person’s voice, the computer would be able to speak the text in that person’s own voice.

Now, when we bundle these technologies together, the following scenario might be possible:

You need to talk to a friend or a colleague in, let’s say, Italy.

Using your computer (or even your telephone), you speak to Gabriella.

The software initially stores your voice patterns for synthesis later.

Your spoken word is then converted to text (this step should happen internally, out of sight).

The system then translates your language into hers.

The synthesizer then takes over, recreating your voice speaking the translated text.

Gabriella then hears your voice, speaking her language, and vice versa.
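For the programmers in the audience, the steps above can be sketched as a simple chain of functions. This is only a toy illustration under heavy assumptions: the "audio" is just a string, the translation engine is a one-entry phrase table, and every name here (store_voice, recognize, translate, synthesize, relay) is hypothetical rather than taken from any real product.

```python
from dataclasses import dataclass

# Toy phrase table standing in for a real translation engine (illustrative only).
PHRASES = {("en", "it"): {"good morning": "buongiorno"}}


@dataclass
class VoiceProfile:
    speaker: str   # who the stored voice belongs to
    samples: list  # stored voice patterns (plain strings in this sketch)


@dataclass
class Utterance:
    voice: str     # whose voice the listener hears
    language: str  # language the listener hears
    text: str      # words the synthesizer would speak


def store_voice(speaker: str, audio: str) -> VoiceProfile:
    """Step 1: keep a sample of the speaker's voice for later synthesis."""
    return VoiceProfile(speaker=speaker, samples=[audio])


def recognize(audio: str) -> str:
    """Step 2: speech to text. The 'audio' is already a string, so we only normalize it."""
    return audio.strip().lower()


def translate(text: str, source: str, target: str) -> str:
    """Step 3: look the phrase up; fall back to the untranslated text if it is unknown."""
    return PHRASES.get((source, target), {}).get(text, text)


def synthesize(text: str, language: str, profile: VoiceProfile) -> Utterance:
    """Step 4: 'speak' the translated text in the stored voice."""
    return Utterance(voice=profile.speaker, language=language, text=text)


def relay(speaker: str, audio: str, source: str, target: str) -> Utterance:
    """Run the whole chain: store voice, recognize, translate, synthesize."""
    profile = store_voice(speaker, audio)
    text = recognize(audio)
    translated = translate(text, source, target)
    return synthesize(translated, target, profile)


# What Gabriella would hear when you say "Good morning".
heard = relay("You", "Good morning", "en", "it")
print(heard)
```

Each function stands in for one of the technologies we assumed to be perfect, so swapping in a real recognizer, translator, or synthesizer would only mean replacing the body of the matching step. The "vice versa" direction would be the same chain with the languages reversed.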

Of course, this technology will need to be adaptive. There are many variations in dialect and differing sound conditions, so the computer will have to be powerful. As for the future, for the Star Trek set, I’m fairly sure that this technology will one day be available, but it would sure come in handy today.
