REBOL Technologies

A Speech Recognition Test - Dictating a Tutorial

Carl Sassenrath, CTO
REBOL Technologies
25-Jan-2005 18:13 GMT

Article #0115

Back in November, as a community, we started a CGI tutorial project to describe how to write a REBOL-based web bulletin board system (BBS). The purpose was to show a more substantial example of using REBOL for creating CGI programs.

Unfortunately, the project pretty much got lost in the holiday shuffle. So today I decided to spend a few hours wrapping it up while, at the same time, evaluating speech recognition technology. The ultimate question: would I become more productive?

For my speech recognition tool, I selected the Dragon NaturallySpeaking software from ScanSoft because it is widely considered one of the best programs on the market. (In fact, I'm using it to dictate this very blog.)

I must say that I have had only limited success with Dragon. Several people have written to tell me that Dragon works very well for them, but I have yet to duplicate that experience. Although the software does a fairly good job in the actual speech-to-text conversion, there are a few "workflow" problems that really slow me down and almost negate the benefit of the software for real projects.

For example, here is a small problem that really gets in the way. The Dragon software absolutely cannot remember the word REBOL. I have trained and retrained the software over and over on this simple word. I have deleted similar words (like "rebel") from the vocabulary, just so it won't get confused. But as soon as I think I have it trained, I return a few hours later or the next day and find that it has reverted to its old behavior.

The big puzzle for me as a language designer is why speech recognition software can't be more tuned to the context of my writing. I typically use the word REBOL every few sentences. In a RISC way of thinking (as I always think), I would call REBOL a high-frequency word within the context of my writing. The software should recognize that and not continually give me the words Roybal, Preble, rabble, and others. I have never used those words in any of my documents. They are low-frequency words. The software should not be suggesting them. In fact, when I tell the software to correct the word, the menu of choices usually does not even include the word REBOL! In other words, the software doesn't even have a clue.
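To make the high-frequency idea concrete, here is a minimal sketch in Python of what I mean. It is not how Dragon works internally (I have no knowledge of that); it only illustrates re-ranking the recognizer's candidate words by how often each one appears in the user's own documents. The function names, the `weight` parameter, the sample documents, and the acoustic scores are all hypothetical, made up for the illustration.

```python
from collections import Counter
import re


def build_user_frequencies(documents):
    """Count how often each word appears in the user's own writing."""
    counts = Counter()
    for text in documents:
        counts.update(w.lower() for w in re.findall(r"[A-Za-z']+", text))
    return counts


def rerank(candidates, user_counts, weight=1.0):
    """Re-order (word, acoustic_score) pairs, boosting words the user writes.

    The boost is the word's relative frequency in the user's documents,
    scaled by `weight` (a hypothetical tuning knob).
    """
    total = sum(user_counts.values()) or 1  # avoid dividing by zero

    def score(item):
        word, acoustic = item
        return acoustic + weight * user_counts[word.lower()] / total

    return sorted(candidates, key=score, reverse=True)


# Made-up sample of the user's past documents.
docs = [
    "REBOL makes CGI simple.",
    "A REBOL script can run a BBS.",
    "Write the BBS in REBOL.",
]
counts = build_user_frequencies(docs)

# Made-up acoustic scores: the recognizer slightly prefers the wrong words.
candidates = [("rebel", 0.42), ("Roybal", 0.40), ("REBOL", 0.38), ("rabble", 0.35)]
ranked = rerank(candidates, counts)
print([word for word, _ in ranked])
```

With these made-up numbers, the word that actually occurs in my documents moves to the top of the list, while the words I have never written stay at the bottom. A real product would need something more careful, but even this crude weighting would put REBOL in the correction menu.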

I'm sure that some of you will write to me and tell me that I'm not using the software correctly, that I'm not speaking properly, that I have a bad microphone, or that my environment is too noisy, etc. I've been pretty careful about those, and as I said before, the speech-to-text conversion works quite well.

The problem is in the overall design of the product. This is just not a great implementation of a speech recognition tool. Good software designs need to consider the most frequent workflow of their users and optimize for that. Otherwise, they are just wasting our time.


Updated 6-Mar-2024   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM