A short history of text to speech

    Voice synthesis technology, more commonly known as text-to-speech, has evolved rapidly over the years. Learn more about the history of text-to-speech.

    Speech synthesis, or the artificial production of the human voice, has come a long way over the last 70 years. Whether you currently use text-to-speech services to listen to books, study, or proofread your own written work, there’s no doubt that text-to-speech services have made life easier for people in a variety of professions.

    Here, we’ll take a look at how text-to-speech processing works, and how the assistive technology has changed over time.

    Introduction

    In the late 1700s, the German-born professor Christian Kratzenstein created acoustic resonators that mimicked the vowel sounds of the human voice. Much later, in 1939, the VODER (Voice Operating Demonstrator) made big headlines at the New York World’s Fair, where Homer Dudley’s invention showed crowds how human speech could be created through artificial means. The device was tough to play: the operator had to control the fundamental frequency using a foot pedal.

    In the early 1800s, Charles Wheatstone built a mechanical speech synthesizer, improving on Wolfgang von Kempelen’s earlier speaking machine. This kick-started a rapid evolution of articulatory synthesis tools and technologies.

    It can be tough to pin down exactly what makes a good text-to-speech program, but like many things in life, you know it when you hear it. A high-quality text-to-speech program offers natural-sounding voices with real-life inflection and tone.

    Text-to-speech technology can help people who are visually impaired or live with other disabilities get the information they need to thrive at work and to communicate with others. The software also allows students and others with heavy reading workloads to listen to their material as speech when they’re on the go. Synthetic speech allows people to get more done in less time, and can be useful in a variety of settings, from video game creation to helping people with language processing differences.

    1950s and 60s

    In the late 1950s, the first computer-based speech synthesis systems were created. In 1961, John Larry Kelly Jr., a physicist at Bell Labs, used an IBM computer to synthesize speech. His vocoder (voice encoder) recreated the song Daisy Bell.

    Arthur C. Clarke, author of 2001: A Space Odyssey, drew on Kelly’s demonstration in the story’s screenplay. In one scene, the HAL 9000 computer sings Daisy Bell.

    In 1966, linear predictive coding came onto the scene. This form of speech coding began its development under Fumitada Itakura and Shuzo Saito. Bishnu S. Atal and Manfred R. Schroeder also contributed to the development of linear predictive coding.

    1970s

    In 1975, the line spectral pairs method was developed by Itakura. This high-compression speech coding method helped Itakura learn more about speech analysis and synthesis, finding weak spots and figuring out how to make them better.

    During this year, MUSA was also released. This stand-alone speech synthesis system used an algorithm to read Italian out loud. A version released three years later was able to sing in Italian.

    In the 70s, the first articulatory synthesizer based on the human vocal tract was developed. It was built by Tom Baer, Paul Mermelstein, and Philip Rubin at Haskins Laboratories. The trio used information from the vocal tract models created at Bell Laboratories in the 60s and 70s.

    In 1976, Kurzweil Reading Machines for the Blind were introduced. While these devices were far too expensive for the general public, libraries often provided them for people with visual impairments to listen to books.

    Linear predictive coding became the starting point for synthesizer chips. Texas Instruments LPC Speech Chips and the Speak & Spell toys of the late 1970s both used synthesizer chip technology. These toys were examples of human voice synthesis that had accurate intonations, differentiating the voice from the commonly robotic-sounding synthesized voices of the time. Many handheld electronics with the ability to synthesize speech became popular during this decade, including the Telesensory Systems Speech+ calculator for the blind. The Fidelity Voice Chess Challenger, a chess computer that was able to synthesize speech, was released in 1979.

    1980s

    In the 1980s, speech synthesis began to rock the video game world. In 1980, Sun Electronics released Stratovox, a shooting-style arcade game with synthesized speech. Manbiki Shoujo (translated into English as Shoplifting Girl) was the first personal computer game with the ability to synthesize speech. The electronic game Milton was also released in 1980; it was the Milton Bradley Company’s first electronic game that could synthesize the human voice.

    In 1983, the standalone acoustic-mechanical speech machine called DECtalk was released. DECtalk understood phonetic spellings of words, allowing customized pronunciation of unusual words. These phonetic spellings could also include a tone indicator, which DECtalk would use when enunciating the phonetic components. This allowed DECtalk to sing.

    In the late 80s, Steve Jobs’s company NeXT released computers for which Trillium Sound Research developed a text-to-speech system. While NeXT didn’t take off commercially, Apple acquired the company in the 90s.

    1990s

    Earlier versions of synthesized text-to-speech systems sounded distinctly robotic, but that began to change in the late 80s and early 90s. Softer consonants allowed speaking machines to lose the electronic edge and sound more human. In 1990, Ann Syrdal at AT&T Bell Laboratories developed a female speech synthesizer voice. Engineers worked to make voices more natural-sounding during the 90s.

    In 1999, Microsoft released Narrator, a screen reader solution that is now included in every copy of Microsoft Windows.

    2000s

    Speech synthesis ran into some hiccups during the 2000s, as developers struggled to create agreed-upon standards for synthesized speech. Since speech is highly individual, it’s hard for people around the world to come together and agree on proper pronunciation of phonemes, diphones, intonation, tone, pattern playback, and inflection.

    Quality of formant synthesis speech audio also became more of a concern in the 90s, as engineers and researchers noticed that the quality of the systems used in a lab to play back synthesized speech was often far more advanced than the equipment the user had. When thinking of speech synthesis, many people think of Stephen Hawking’s voice synthesizer, which provided a robotic-sounding voice with little human tone.

    In 2005, researchers finally came to some agreement and began to use a common speech dataset, allowing them to work from the same basic ideals when creating high-level speech synthesis systems.

    In 2007, a study showed that listeners can tell whether a speaker is smiling just from the sound of the voice. Researchers are continuing to work out how to use this information to create speech recognition and speech synthesis software that sounds more natural.

    2010s

    Today, speech synthesis products that use speech signals are everywhere, from Siri to Alexa. Electronic speech synthesizers don’t just make life easier; they also make life more fun. Whether you’re using a TTS system to listen to novels on the go or you’re using apps that make it easier to learn a foreign language, it’s likely that you’re using text-to-speech technology to activate your neural networks on a daily basis.

    The future

    In coming years, it’s likely that voice synthesis technology will focus on creating a model of the brain to better understand how we record speech data in our minds. Speech technology will also work to better understand the role that emotion plays in speech, and will use this information to create AI voices that are indistinguishable from actual humans.

    The Latest In Voice Synthesis Technology: Speechify

    When learning about transitions from earlier speech synthesis technology, it’s amazing to imagine how far science has come. Today, apps like Speechify make it easy to translate any text into audio files. With just the touch of a button (or tap on an app), Speechify is able to take websites, documents, and images of text and translate them into natural-sounding speech. Speechify’s library syncs across all your devices, making it simple for you to keep learning and working on the go. Check out the Speechify app in both Apple’s App Store and Android’s Google Play.  

    FAQs

    Who invented text-to-speech?

    The first general text-to-speech system for English was developed by Noriko Umeda and colleagues at the Electrotechnical Laboratory in Japan in 1968.

    What is the purpose of text-to-speech?

    Many people use text-to-speech technology. For people who prefer to get their information in audio format, TTS technology can make it simple to get the information necessary to work or learn, without having to spend hours in front of a book. Busy professionals also use TTS technology to stay on top of their work when they’re unable to sit in front of a computer screen. Many types of TTS technology were originally developed for people with visual impairments, and TTS is still a fantastic way for people who struggle to see to get the information that they need.

    How do you synthesize a speech?

    Pieces of recorded speech are stored in a database in various units. Software prepares audio files through unit selection. From there, a voice is created. Often, the larger the output range of a program, the more the program struggles to provide users with vocal clarity.
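
    The idea of unit selection can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than a description of any real product: the Unit record, the select_units function, the file names, and the use of pitch as the only feature are all hypothetical. The sketch picks one recording per requested sound so that each snippet is close to the desired pitch and neighbouring snippets do not jump too far apart in pitch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded snippet in a hypothetical speech database."""
    label: str       # the sound the snippet contains (a word, for simplicity)
    pitch_hz: float  # average pitch of the recording
    clip: str        # stand-in for the audio itself (a file name here)


def select_units(targets, database, desired_pitch_hz):
    """Choose one recording per target label with the lowest total cost.

    Assumed cost model: distance from the desired pitch (target cost)
    plus the pitch jump between neighbouring snippets (join cost).
    The best sequence is found with dynamic programming.
    """
    if not targets:
        return []
    candidates = [[u for u in database if u.label == t] for t in targets]
    for label, options in zip(targets, candidates):
        if not options:
            raise ValueError(f"no recording for {label!r}")

    # best[i][j] = (cheapest cost of a path ending at candidates[i][j], back pointer)
    best = [[(abs(u.pitch_hz - desired_pitch_hz), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            target_cost = abs(u.pitch_hz - desired_pitch_hz)
            path_cost, prev = min(
                (best[i - 1][k][0] + abs(u.pitch_hz - p.pitch_hz), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((path_cost + target_cost, prev))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chosen = []
    for i in range(len(candidates) - 1, -1, -1):
        chosen.append(candidates[i][j])
        j = best[i][j][1]
    return chosen[::-1]


# A tiny made-up database: two recordings of each word at different pitches.
DATABASE = [
    Unit("hello", 110.0, "hello_a.wav"),
    Unit("hello", 180.0, "hello_b.wav"),
    Unit("world", 120.0, "world_a.wav"),
    Unit("world", 200.0, "world_b.wav"),
]

low_voice = select_units(["hello", "world"], DATABASE, desired_pitch_hz=115.0)
print([u.clip for u in low_voice])  # ['hello_a.wav', 'world_a.wav']
```

    A real system would work with much smaller units, compare many more features than pitch, and then stitch the chosen audio together. A larger database gives the search more candidates to choose from at each position.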

    Tyler Weitzman

    Tyler Weitzman is the Co-Founder, Head of Artificial Intelligence & President at Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews. Weitzman is a graduate of Stanford University, where he received a BS in mathematics and an MS in Computer Science in the Artificial Intelligence track. He has been selected by Inc. Magazine as a Top 50 Entrepreneur, and he has been featured in Business Insider, TechCrunch, LifeHacker, and CBS, among other publications. Weitzman’s master’s degree research focused on artificial intelligence and text-to-speech, and his final paper was titled “CloneBot: Personalized Dialogue-Response Predictions.”

    MS in Computer Science, Stanford University Dyslexia & Accessibility Advocate, CEO/Founder of Speechify
