Voice recognition for language professionals

André Guyon
(Language Update, Volume 6, Number 2, 2009, page 26)

I am one of those old-timer language technology specialists who wondered how they could use the new technologies that have come onto the scene over the years.

I have a paradoxical relationship with voice recognition. I have found it fascinating for almost 15 years, but have never completely integrated it into my work.

And yet, I have spent more time on voice recognition software than on most other types of applications. It all started in 1995 with a software program that understood only English and an entirely useless microphone. The program was called IBM ViaVoice, and it forced me to pause after each word.

The software program was provided "free of charge" with a computer selling for about $4,000. I used it a bit to program, but I could not imagine using it to translate or even to write emails.

Over the years, I carried out new tests. Each time, I noticed significant improvements. However, for many reasons (including laziness, perhaps), I never thought that the time was ripe to use the tool for translation.

Among the many wonderful improvements, the sampling frequency[1] increased significantly, continuous dictation replaced the jerky style I mentioned earlier, and the technology became fairly well integrated into word-processing programs.

The two main competitors were ViaVoice from IBM and Dragon, now sold by Nuance after changing ownership a few times. Dragon, the first to offer continuous dictation, acquired a large share of the market.

Over the 2008-09 Christmas holiday break, I decided to try voice recognition again. I obtained the most recent (10th) “Preferred” version of Dragon. Here is what happened:

As was the case with each previous attempt, I noticed that progress had been made. As is my habit, I asked myself if work still needed to be done and if everyone would adopt the technology. Of course, people writing about voice recognition software are certain that the average person will soon be using it, Dragon in particular. They have been saying the same thing since 2000.

Nonetheless, you will forgive me for saying that this time, a greater number of language professionals may indeed opt for voice recognition because

  1. the technology now comes standard with the Microsoft operating system (at least with the 64-bit edition of Vista);
  2. it is increasingly difficult to find people who are good at transcribing dictation.

I have a recent computer equipped with the 64-bit Vista operating system (there is also a 32-bit version). Before buying the software, I checked the box. The instructions stated that it worked with Vista, but failed to specify which edition.

In information technology, you must always assume that what is not written on the box does not exist. The software I had bought did not work with the 64-bit Vista platform.

I turned to my backup PC, which was equipped with the Windows XP operating system. This time, I successfully installed Dragon, though it took a substantial amount of effort. Among other things, I had the privilege of digging through the company’s “knowledge database.”[2]

The software is still sold with a dictation headset whose quality I would consider worse than useless.[3] Ironically, however, Dragon now checks that the headset sound quality is sufficient before letting the user start the practice session.

It is a good way to avoid disappointment. Taking an incredible leap of faith, I tried using Dragon with the headset provided. The software program told me that the sound quality was insufficient. So I switched to a chat headset, which the software deemed of satisfactory quality. It then authorized me to continue.

Now that I had passed the headset test, the application asked me to read a sentence or two. The training was now starting. But alas, I have a Montréal-French accent, and Dragon expected to hear a Parisian accent. I was forced to start over several times before I thought to imitate a Parisian accent. The trick worked: the voice recognition software gave me the green light to continue.

After I had read the practice text, which took two to three minutes, the software program set up its voice recognition models. At this point, the “dragon” let me start the actual dictation.

By using its special editor, I could now teach the software to adapt to my particular accent. This step was not mandatory, but it did help to eliminate entry errors almost entirely.

I could also teach Dragon new words, such as my surname and the given names of my children.

It is safe to say that with this version (the 10th), I finished in about two hours what used to take me ten hours. This was very encouraging.

The Preferred edition let me register several profiles, and I could even work in English. This suited me nicely, because I often have to write in English.

I attempted to create an AndréEN user profile, and the software let me do it.

However, it liked my English accent even less than my French one. It never let me move on to the short practice text and was unswayed by flattery or threats.

Being a true language technology specialist, I look for ways to solve the problems I am faced with. I noticed in the software program an option designed for people with a Spanish accent, so I tried that option. This time, Dragon was quite willing to tolerate my English. Though I was delighted with the result, I was jealous of how the software treated Spanish speakers better than French speakers.

The correction process cruelly and very precisely reveals dictation errors that a human transcriber would generally correct without comment. A transcriber tolerates pronunciation mistakes that voice recognition software cannot handle.

For example, one day I dictated “au moment et à l’heure qui vous conviendront le mieux” a little too fast, and “la maman et le beurre qui vous conviendront le mieux” appeared on the screen. Conversely, when a transcriber heard “le système a tété réinitialisé” instead of “le système a été réinitialisé,” she laughed a bit, and simply corrected the mistake.

When reviewing a word or group of words that have not been entered properly, you can listen to what was dictated and choose a suggested correction. You can even show the software program how to write what you just dictated.

One of the interesting new features developed in recent years allows you to dictate a text into a portable recorder and then connect the recorder to the software for transcription. Who knew that you could dictate on a bus, in the subway or on an airplane? But don’t try it unless you have trained the software in those environments.

Continuous noises like my computer fan do not cause any problems. However, my neighbour’s ferocious cough or a loud account of his vacation can produce unexpected results.

As well, people like me who get distracted and who read what they had wanted to write rather than what they have actually written can benefit from having their text revised or from setting it aside for a few days.

To conclude, language professionals who enjoy dictation should certainly look into voice recognition and invest in a good unidirectional microphone (one that captures only sound coming from a specific direction, not all the ambient noise).

The average dictating translator can easily achieve a speed of 70 words a minute, twice the actual typing speed of most translators.[4]

However, translators often overwrite the source text when they work, a habit that will be very difficult to change. If there were macros that allowed you to move freely within a text by selecting sentence after sentence and replacing each one with what is dictated, most translators would have an easier time using the software.
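To make the idea concrete, here is a minimal Python sketch of what such a macro might do. It is only an illustration: the `SentenceOverwriter` class and the `split_sentences` helper are hypothetical names, not features of Dragon or of any word processor, and the sentence splitter is a rough heuristic that breaks after a period, exclamation mark or question mark followed by whitespace. The dictated text is assumed to arrive as a plain string from the voice recognition software.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class SentenceOverwriter:
    """Hypothetical helper: steps through a source text one sentence at a
    time and overwrites the selected sentence with dictated text."""

    def __init__(self, source_text):
        self.sentences = split_sentences(source_text)
        self.position = 0  # index of the currently selected sentence

    def current(self):
        """Return the selected sentence, or None once the text is finished."""
        if self.position < len(self.sentences):
            return self.sentences[self.position]
        return None

    def replace_and_advance(self, dictated):
        """Overwrite the selected sentence, then select the next one."""
        if self.position >= len(self.sentences):
            raise IndexError("no sentence left to replace")
        self.sentences[self.position] = dictated
        self.position += 1

    def text(self):
        """Return the document in its current, partly translated state."""
        return " ".join(self.sentences)


# Usage: the strings passed in stand for what the translator dictates.
doc = SentenceOverwriter("Le système a été réinitialisé. Veuillez patienter.")
doc.replace_and_advance("The system has been reset.")
doc.replace_and_advance("Please wait.")
print(doc.text())  # The system has been reset. Please wait.
```

Keeping the selection and the replacement in a single step mirrors the overwriting habit described above: the translator never has to touch the keyboard or mouse to move to the next sentence.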

Obviously, some people will never get used to dictaphones or voice recognition; it is just not for them.

On the other hand, co-workers who can no longer use keyboards have quickly adapted to voice recognition and become experts at it. This proves that people learn when they have to.

NOTES

  1. The quality of the recorded sound depends on the amount of information per second, usually expressed in kHz, that the computer can store in real time, much like for music on an MP3 or a CD.
  2. Warning: Consulting the knowledge database can cause side effects, including anger, tears, aggressiveness and a feeling of powerlessness. Avoid using without appropriate emotional preparation.
  3. I would have preferred that they sell me the software at a lower price, without the headset, and let me buy a better-quality microphone or headset myself instead of wasting my time, but I cannot plead ignorance.
  4. In general, we systematically overestimate our typing speed. The average person types at a speed of 25 to 35 words a minute. However, it sounds good in conversation to say that we can type up to 100 words a minute.