A short history of text to speech

    Voice synthesis technology, more commonly known as text-to-speech, has evolved rapidly over the years. Learn more about the history of text-to-speech.

    Speech synthesis, or the artificial production of the human voice, has come a long way over the last 70 years. Whether you currently use text-to-speech services to listen to books, study, or proofread your own written work, there’s no doubt that text-to-speech services have made life easier for people in a variety of professions.

    Here, we’ll take a look at how text-to-speech processing works, and how the assistive technology has changed over time.

    Introduction

    In the late 1700s, professor Christian Kratzenstein created acoustic resonators that mimicked the vowel sounds of the human voice.

    In the 1830s, Charles Wheatstone built a mechanical speech synthesizer based on Wolfgang von Kempelen’s earlier speaking machine. This kick-started a rapid evolution of articulatory synthesis tools and technologies.

    A century later, in 1939, the VODER (Voice Operating Demonstrator) made big headlines at the New York World’s Fair, where creator Homer Dudley’s device showed crowds how human speech could be created through artificial means. The device was tough to play: the operator had to control the fundamental frequency using a foot pedal.

    It can be tough to pin down exactly what makes a good text-to-speech program, but like many things in life, you know it when you hear it. A high-quality text-to-speech program offers natural-sounding voices with real-life inflection and tone.

    Text-to-speech technology can help people who are visually impaired or live with other disabilities get the information they need to thrive at work and to communicate with others. The software also allows students and others with heavy reading workloads to listen to their material as speech when they’re on the go. Synthetic speech allows people to get more done in less time, and can be useful in a variety of settings, from video game creation to helping people with language processing differences.

    1950s and 60s

    In the late 1950s, the first computer-based speech synthesis systems were created. In 1961, John Larry Kelly Jr., a physicist at Bell Labs, used an IBM computer to synthesize speech. His vocoder (voice encoder) synthesizer recreated the song Daisy Bell.

    At the time that Kelly was perfecting his vocoder, Arthur C. Clarke, author of 2001: A Space Odyssey, drew on Kelly’s demonstration for the story’s screenplay. In the scene, the HAL 9000 computer sings Daisy Bell.

    In 1966, linear predictive coding came onto the scene. This form of speech coding began its development under Fumitada Itakura and Shuzo Saito. Bishnu S. Atal and Manfred R. Schroeder also contributed to the development of linear predictive coding.
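
    Linear predictive coding rests on a simple idea: each speech sample is approximated as a weighted sum of the samples just before it, so a coder only needs to keep the weights and a small prediction error. As a rough illustration (not Itakura and Saito’s actual method), the Python sketch below fits a two-weight predictor to a pure tone by least squares. The function names `lpc_order2` and `predict` and the test tone are assumptions made for this example; real speech coders use more weights and work on short frames of audio.

```python
import math


def lpc_order2(signal):
    """Fit x[n] ~ a1*x[n-1] + a2*x[n-2] by least squares."""
    # Accumulate the sums that make up the 2x2 normal equations.
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(signal)):
        x0, x1, x2 = signal[n], signal[n - 1], signal[n - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        b1 += x0 * x1
        b2 += x0 * x2
    # Solve the 2x2 system with Cramer's rule.
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2


def predict(signal, a1, a2):
    """Predict every sample (from the third onward) from its two predecessors."""
    return [a1 * signal[n - 1] + a2 * signal[n - 2] for n in range(2, len(signal))]


# Hypothetical test input: a pure tone sampled 200 times.
omega = 0.3
tone = [math.sin(omega * n) for n in range(200)]

a1, a2 = lpc_order2(tone)
residual = max(abs(x - p) for x, p in zip(tone[2:], predict(tone, a1, a2)))
```

    For a pure tone, the fitted weights land very close to 2·cos(0.3) and -1, and the prediction error is close to zero. Speech is less regular than a single tone, but the same principle lets a coder describe a stretch of audio with a handful of numbers.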

    1970s

    In 1975, the line spectral pairs method was developed by Itakura. This high-compression speech coding method helped Itakura learn more about speech analysis and synthesis, finding weak spots and figuring out how to make them better.

    That same year, MUSA was also released. This standalone speech synthesis system used an algorithm to read Italian out loud. A version released three years later was able to sing in Italian.

    In the 70s, the first articulatory synthesizer based on the human vocal tract was developed. The first known synthesizer of this kind was developed by Tom Baer, Paul Mermelstein, and Philip Rubin at Haskins Laboratories. The trio used information from the vocal tract models created at Bell Laboratories in the 60s and 70s.

    In 1976, Kurzweil Reading Machines for the Blind were introduced. While these devices were far too expensive for the general public, libraries often provided them for people with visual impairments to listen to books.

    Linear predictive coding became the starting point for synthesizer chips. Texas Instruments LPC Speech Chips and the Speak & Spell toys of the late 1970s both used synthesizer chip technology. These toys were examples of human voice synthesis that had accurate intonations, differentiating the voice from the commonly robotic-sounding synthesized voices of the time. Many handheld electronics with the ability to synthesize speech became popular during this decade, including the Telesensory Systems Speech+ calculator for the blind. The Fidelity Voice Chess Challenger, a chess computer that was able to synthesize speech, was released in 1979.

    1980s

    In the 1980s, speech synthesis began to rock the video game world. Stratovox, a shooting-style arcade game, was released by Sun Electronics in 1980. Manbiki Shoujo (translated in English to Shoplifting Girl) was the first personal computer game with the ability to synthesize speech. The electronic game Milton was also released in 1980; it was The Milton Bradley Company’s first electronic game that had the ability to synthesize the human voice.

    In 1983, the standalone speech synthesizer DECtalk was released. DECtalk understood phonetic spellings of words, allowing customized pronunciation of unusual words. These phonetic spellings could also include a tone indicator, which DECtalk would use when enunciating the phonetic components. This allowed DECtalk to sing.
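
    To make the idea of phonetic spellings with a tone indicator concrete, here is a minimal Python sketch. It uses a simplified, hypothetical notation of the form `sound<duration,pitch>`; this is an assumption for illustration and should not be read as DECtalk’s real command syntax. The names `Note`, `TOKEN`, and `parse_sung_phonemes`, and the sound labels in the example, are likewise made up.

```python
import re
from typing import NamedTuple


class Note(NamedTuple):
    phoneme: str      # hypothetical label for the sound to produce
    duration_ms: int  # how long to hold the sound
    pitch_hz: int     # the tone indicator: pitch to sing the sound at


# Matches tokens such as "ha<300,220>" in the assumed notation.
TOKEN = re.compile(r"([a-z]+)<(\d+),(\d+)>")


def parse_sung_phonemes(text: str) -> list[Note]:
    """Turn a phonetic spelling with tone indicators into a list of notes."""
    return [Note(p, int(d), int(hz)) for p, d, hz in TOKEN.findall(text)]


# Three sounds, each with its own length and pitch.
notes = parse_sung_phonemes("ha<300,220> pee<300,220> ber<600,247>")
total_ms = sum(note.duration_ms for note in notes)
```

    Once every sound carries its own length and pitch, a synthesizer has what it needs to follow a melody instead of speaking in a flat voice.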

    In the late 80s, Steve Jobs’s company NeXT introduced its computer platform, for which Trillium Sound Research developed a text-to-speech system. While NeXT didn’t take off commercially, Apple acquired the company in the 90s.

    1990s

    Earlier versions of synthesized text-to-speech systems sounded distinctly robotic, but that began to change in the late 80s and early 90s. Softer consonants allowed speaking machines to lose the electronic edge and sound more human. In 1990, Ann Syrdal at AT&T Bell Laboratories developed a female speech synthesizer voice. Engineers worked to make voices more natural-sounding throughout the 90s.

    In 1999, Microsoft released Narrator, a screen reader solution that is now included in every copy of Microsoft Windows.

    2000s

    Speech synthesis ran into some hiccups during the 2000s, as developers struggled to create agreed-upon standards for synthesized speech. Since speech is highly individual, it’s hard for people around the world to come together and agree on proper pronunciation of phonemes, diphones, intonation, tone, pattern playback, and inflection.

    Quality of formant synthesis speech audio also became more of a concern in the 90s, as engineers and researchers noticed that the quality of the systems used in a lab to play back synthesized speech was often far more advanced than the equipment the user had. When thinking of speech synthesis, many people think of Stephen Hawking’s voice synthesizer, which provided a robotic-sounding voice with little human tone.

    In 2005, researchers finally came to some agreement and began to use a common speech dataset, allowing them to work from the same basic ideals when creating high-level speech synthesis systems.

    In 2007, a study was done showing that listeners can figure out whether a person who is speaking is smiling. Researchers are continuing to work to figure out how to use this information to create speech recognition and speech synthesis software that is more natural.

    2010s

    Today, speech synthesis products that use speech signals are everywhere, from Siri to Alexa. Electronic speech synthesizers don’t just make life easier; they also make life more fun. Whether you’re using a TTS system to listen to novels on the go or you’re using apps that make it easier to learn a foreign language, it’s likely that you’re using text-to-speech technology to activate your neural networks on a daily basis.

    The future

    In coming years, it’s likely that voice synthesis technology will focus on creating a model of the brain to better understand how we record speech data in our minds. Speech technology will also work to better understand the role that emotion plays in speech, and will use this information to create AI voices that are indistinguishable from actual humans.

    The Latest In Voice Synthesis Technology: Speechify

    When learning about earlier speech synthesis technology, it’s amazing to see how far science has come. Today, apps like Speechify make it easy to convert any text into audio files. With just the touch of a button (or tap on an app), Speechify is able to take websites, documents, and images of text and turn them into natural-sounding speech. Speechify’s library syncs across all your devices, making it simple for you to keep learning and working on the go. Check out the Speechify app in both Apple’s App Store and Android’s Google Play.

    FAQs

    Who invented text-to-speech?

    Text-to-speech for English was invented by Noriko Umeda. The system was developed in the Electrotechnical Laboratory in Japan in 1968.

    What is the purpose of text-to-speech?

    Many people use text-to-speech technology. For people who prefer to get their information in audio format, TTS technology can make it simple to get the information necessary to work or learn, without having to spend hours in front of a book. Busy professionals also use TTS technology to stay on top of their work when they’re unable to sit in front of a computer screen. Many types of TTS technology were originally developed for people with visual impairments, and TTS is still a fantastic way for people who struggle to see to get the information that they need.

    How do you synthesize a speech?

    Pieces of recorded speech are stored in a database in various units. Software prepares audio files through unit selection. From there, a voice is created. Often, the larger the output range of a program, the more the program struggles to provide users with vocal clarity.
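
    As a rough illustration of unit selection, the Python sketch below picks one stored recording per sound so that each recording is close to the desired pitch and neighbouring recordings do not jump in pitch. Everything here is an assumption made for the example: the `Unit` fields, the `select_units` function, the clip names, and the use of pitch as the only feature. Real systems compare many more properties of each recording.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    name: str        # which sound the recording contains, e.g. "h-e"
    clip_id: str     # hypothetical identifier of the stored audio snippet
    pitch_hz: float  # average pitch of the recording


def select_units(targets, database, target_pitch):
    """Choose one unit per target, minimising pitch mismatch plus pitch jumps."""
    if not targets:
        return []
    # Each row holds (best total cost so far, index of previous choice, unit).
    rows = []
    for i, name in enumerate(targets):
        row = []
        for unit in database[name]:
            target_cost = abs(unit.pitch_hz - target_pitch)
            if i == 0:
                row.append((target_cost, None, unit))
                continue
            # Cheapest way to arrive at this unit from any unit in the previous row.
            back, cost = min(
                ((j, prev_cost + abs(unit.pitch_hz - prev.pitch_hz))
                 for j, (prev_cost, _, prev) in enumerate(rows[-1])),
                key=lambda item: item[1],
            )
            row.append((cost + target_cost, back, unit))
        rows.append(row)
    # Walk backwards from the cheapest final unit to recover the whole sequence.
    index = min(range(len(rows[-1])), key=lambda k: rows[-1][k][0])
    chosen = []
    for row in reversed(rows):
        _, back, unit = row[index]
        chosen.append(unit)
        index = back
    return chosen[::-1]


# Hypothetical database of recorded snippets for the word "hello".
database = {
    "h-e": [Unit("h-e", "clip_01", 120.0), Unit("h-e", "clip_02", 180.0)],
    "e-l": [Unit("e-l", "clip_03", 125.0), Unit("e-l", "clip_04", 200.0)],
    "l-o": [Unit("l-o", "clip_05", 118.0)],
}
chosen = select_units(["h-e", "e-l", "l-o"], database, target_pitch=120.0)
clip_ids = [unit.clip_id for unit in chosen]
```

    With this made-up data, the search keeps the three recordings whose pitches sit near 120 Hz, so the joined clips sound like one steady voice. The more candidate recordings a database holds, the more combinations the software has to weigh.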

    Tyler Weitzman

    Tyler Weitzman is the Co-Founder, Head of Artificial Intelligence & President at Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews. Weitzman is a graduate of Stanford University, where he received a BS in mathematics and an MS in Computer Science in the Artificial Intelligence track. He has been selected by Inc. Magazine as a Top 50 Entrepreneur, and he has been featured in Business Insider, TechCrunch, LifeHacker, and CBS, among other publications. Weitzman’s master’s degree research focused on artificial intelligence and text-to-speech, and his final paper was titled “CloneBot: Personalized Dialogue-Response Predictions.”
