Punch-in

Your self-directed employment assistant

Power of Your Voice: Part Two

Accuracy and Commitment

 

Your voice is a powerful tool that can empower you. It promotes independence by letting you control your environment and write passages of text hands-free. This is vital for people with spinal cord injuries, carpal tunnel syndrome, brain injuries, or learning disabilities, though it is not an option for everyone. This resource explains the pros and cons of voice control programs. It is part of a series of articles on using voice recognition and voice activation.

Part One -- Reaching for the Stars

Part Two -- Accuracy and Commitment

Part Three -- Look Mom, No Hands

Part Four -- It Doesn't Have To Be Dragon NaturallySpeaking

Part Five -- Environmental Aids of Daily Living is a Word

Reaching for the Stars

You are ready to begin speaking to your computer and seeing the words appear on your screen. There are many opportunities to use this software on laptops, desktops, Chromebooks, Apple computers, and tablets.

System Requirements

For voice recognition, these are the basic recommendations from the Nuance website. However, it is wise to purchase more power than the minimum: you will be running other software packages alongside it, and each will require more RAM to keep the program from lagging behind what is spoken before it appears on the screen.

Minimum System Requirements:

  • CPU: 2.2 GHz Intel® dual-core or equivalent AMD processor (SSE2 instruction set required)
  • Processor cache: 512 KB
  • Memory (RAM): 2 GB for 32-bit Windows 7, 8, 8.1, 10 and Windows Server 2012. ...
  • Free hard disk space: 4 GB
  • Supported operating systems: see the Nuance website for the current list
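If you want a quick sanity check of your machine against minimums like these before buying, something along the following lines works. This is a minimal sketch using only the Python standard library, not a Nuance tool; the thresholds are taken from the list above, and the RAM query is Unix-only, so it reports "unknown" elsewhere.

```python
import os
import shutil

# Thresholds from the requirements list above (illustrative only)
MIN_RAM_GB = 2    # 2 GB RAM (32-bit minimum)
MIN_DISK_GB = 4   # 4 GB free hard disk space
MIN_CORES = 2     # dual-core CPU

def check_system():
    """Return a dict mapping each requirement to True/False/None (unknown)."""
    results = {}
    results["cpu_cores"] = (os.cpu_count() or 1) >= MIN_CORES

    # Free space on the drive holding the current directory
    free_bytes = shutil.disk_usage(".").free
    results["free_disk"] = free_bytes >= MIN_DISK_GB * 1024**3

    try:  # total RAM: available via sysconf on Unix-like systems only
        ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        results["ram"] = ram >= MIN_RAM_GB * 1024**3
    except (ValueError, OSError, AttributeError):
        results["ram"] = None  # cannot determine on this platform
    return results

if __name__ == "__main__":
    for name, ok in check_system().items():
        status = "OK" if ok else ("unknown" if ok is None else "below minimum")
        print(f"{name}: {status}")
```

Remember that these are floor values: dictation alongside a word processor and a browser will want considerably more RAM than the minimum.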

Starting from The Beginning

Being Informed before Spending All of Your Dollars

Training a Voice File

Where Do I Start?

Making It Most Accurate for Your Text Input

Is 200 Words per Minute Realistic?

What Else Can I Do on the Computer?

Options for Better Performance

 

Speech recognition has come a long way since the earliest systems, which could understand only digits. (Nov 2, 2011[1])

Voice recognition is the process of taking the spoken word as input to a computer program. This process is important to virtual reality because it provides a natural and intuitive way of controlling the simulation while allowing the user's hands to remain free. Voice recognition was first developed for soldiers in war zones, followed by gaming.

Listening to Siri, with her slightly snarky sense of humor, made us wonder how far speech recognition has come over the years. Here's a look at the developments of past decades that have made it possible for people to control devices using only their voices.

1950s and 1960s: Baby Talk

The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) Bell Laboratories designed in 1952 the "Audrey" system, which recognized digits spoken by a single voice. Ten years later, IBM demonstrated at the 1962 World's Fair its "Shoebox" machine, which could understand 16 words spoken in English.

Labs in the United States, Japan, England, and the Soviet Union developed other hardware and software dedicated to recognizing spoken sounds, expanding speech recognition technology to support four vowels and nine consonants.

They may not sound like much, but these first efforts were an impressive start, especially when you consider how primitive computers themselves were at the time.

1970s: Speech Recognition Takes Off

Speech recognition technology made major strides in the 1970s, thanks to interest and funding from the U.S. Department of Defense. The DoD's DARPA Speech Understanding Research (SUR) program, from 1971 to 1976, was one of the largest of its kind in the history of speech recognition, and among other things it was responsible for Carnegie Mellon's "Harpy" speech-understanding system. Harpy could understand 1011 words, approximately the vocabulary of an average three-year-old.

Harpy was significant because it introduced a more efficient search approach, called beam search, to "prune the finite-state network of possible sentences," according to Readings in Speech Recognition by Alex Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to advances in search methodology and technology, as Google's entrance into speech recognition on mobile devices proved just a few years ago.)
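The beam search idea Harpy pioneered can be illustrated with a toy example. The sketch below is not Harpy's actual grammar; the words and log-probability scores are invented. At each step it keeps only the best few partial sentences and prunes the rest, which is exactly what makes the search over a large network tractable.

```python
import heapq

def beam_search(step_scores, beam_width=2):
    """Keep only the best `beam_width` partial hypotheses at each step.

    step_scores: list of dicts mapping a candidate word to its (toy)
    log-probability at that position in the sentence.
    """
    beams = [(0.0, [])]  # (cumulative score, word sequence so far)
    for scores in step_scores:
        # Extend every surviving hypothesis with every candidate word
        candidates = [
            (total + s, seq + [word])
            for total, seq in beams
            for word, s in scores.items()
        ]
        # Prune: keep only the highest-scoring hypotheses
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams

# Toy lattice of per-step word scores (made-up numbers)
steps = [
    {"please": -0.1, "peas": -1.2},
    {"call": -0.3, "fall": -0.9},
    {"home": -0.2, "hum": -1.5},
]
best_score, best_seq = beam_search(steps)[0]
print(best_seq)  # -> ['please', 'call', 'home']
```

With a beam width of 2, only two of the eight possible three-word sentences are ever scored to completion; the trade-off is that a hypothesis pruned early can never be recovered, which is why beam width matters for accuracy.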

1970s Development and Accomplishments

The '70s also marked a few other important milestones in speech recognition technology, including the founding of the first commercial speech recognition company, Threshold Technology, as well as Bell Laboratories' introduction of a system that could interpret multiple people's voices[2].


1980s: Speech Recognition Turns Toward Prediction

Over the next decade, thanks to new approaches to understanding what people say, speech recognition vocabulary jumped from a few hundred words to several thousand, with the potential to recognize an unlimited number of words. One major reason was a new statistical method known as the hidden Markov model (HMM).

Rather than simply using templates for words and looking for sound patterns, the HMM considered the probability of unknown sounds being words. This foundation would remain in place for the next two decades.
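The probabilistic idea behind the HMM can be shown with a toy model. Everything below is invented for illustration: two hidden states standing in for phonemes, two coarse acoustic observations, and made-up probabilities. The Viterbi algorithm then picks the most likely hidden sequence for what was heard, rather than matching a fixed template.

```python
# A toy hidden Markov model: hidden states stand in for phonemes,
# observations are coarse acoustic classes. All numbers are invented.
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3},
           "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"low": 0.8, "high": 0.2},
          "s2": {"low": 0.3, "high": 0.7}}

def viterbi(observations):
    """Most likely hidden state sequence for the observed sounds."""
    # v[t][s]: probability of the best path ending in state s at time t
    v = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # backpointers for path recovery
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, p = max(
                ((ps, v[-1][ps] * trans_p[ps][s]) for ps in states),
                key=lambda x: x[1],
            )
            col[s] = p * emit_p[s][obs]
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Trace the best path backwards from the most probable final state
    last = max(v[-1], key=v[-1].get)
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["low", "low", "high"]))  # -> ['s1', 's1', 's2']
```

The key contrast with template matching is visible in the `max` over previous states: the model weighs how likely each sound is *given* what probably came before it, which is what let vocabularies grow beyond a few hundred stored patterns.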

 Equipped with this expanded vocabulary, speech recognition started to work its way into commercial applications for business and specialized industry (for instance, medical use). It even entered the home, in the form of Worlds of Wonder's Julie doll (1987), which children could train to respond to their voice.

Voice-Recognition Today

Three short years after Julie, the world was introduced to Dragon, which debuted its first speech recognition system, Dragon Dictate, in 1990. Around the same time, AT&T was experimenting with over-the-phone speech recognition software to help field customer service calls. In 1997, Dragon released NaturallySpeaking, which processed natural speech without the need for pauses between words. What started as a painfully simple and often inaccurate system is now easy for customers to use.

Speech Recognition Software Today

Developments in speech recognition software plateaued for over a decade as technology fought to catch up to our hopes for innovation. Recognition systems were limited by their processing power and memory, and still had to "guess" what words were being said based on phonemes. This proved difficult for travelers around the globe with thick accents and/or different vocabularies. Speech recognition products were not localized or globalized by any means, and thus were only successful in specific markets.

In 2010, Google made a game-changing development that brought speech recognition technology to the forefront of innovation: the Google Voice Search app. It aimed to reduce the hassle of typing on your phone's tiny keyboard and was the first of its kind to use cloud data centers. It was also personalized to your voice and could 'learn' your speech patterns for higher accuracy. All of this paved the way for Siri.

One year later, in 2011, Apple debuted Siri. 'She' became instantly famous for her incredible ability to accurately process natural utterances, and for her ability to respond using conversational – and often shockingly sassy – language. You're sure to have seen a few screen captures of her pre-programmed humor floating around the internet. Her success, boosted by zealous Apple fans, brought speech recognition technology to the forefront of innovation. With the ability to respond using natural language and to 'learn' using cloud-based processing, Siri catalyzed the birth of other like-minded technologies such as Amazon's Alexa and Microsoft's Cortana.

The Future: Accurate, Localized, and Ubiquitous

Thanks to ongoing data collection projects and cloud-based processing, many larger speech recognition systems no longer struggle with accents. They have, in a way, undergone a series of 'brain transplants' that have improved their ability to 'hear' a wider variety of words, languages, and accents. At the time of writing, Apple CarPlay is available in five languages, and Siri is available in around 20.

We have certainly made significant strides in speech recognition. However, we are still far from inventing the intelligent systems of so many of our favorite sci-fi movies. Tell Siri that you love her, and she'll respond, "I hope you don't say that to those other mobile phones." She does not truly understand you. To know love and other emotions is to go beyond software; the voice we hear is merely linked to a few lines of code.

There is no doubt that we will be surprised by where this technology takes us in the future, especially when we consider the role speech recognition will play in AI and deep learning. We have already begun to nudge our way deeper into a world where we are more dependent on our technology.



[1] What Is Voice-Recognition, Google University, November 2, 2011.

[2] (Palima, 2011)


© 2019 Created by Great Lakes ADA Center.
