Everything to Know About Google Cloud Text to Speech API

    Generative AI and artificial intelligence have come a long way, but text to speech is an older concept that has been around for a while. There’s a lot to unpack and categorise here, so I’ll break it down and look at it from all angles. Whether you are a beginner or a pro, this should bring overall clarity to the Google Text to Speech API.

    Okay, before we dive into any topic, it’s a must that we establish the ground rules. Let’s define a few terms and build up our foundation so we can rest firmly on it.

    Let’s separate the two technologies here, text to speech and APIs, and then look at the role Google Cloud plays.

    Editor’s note: Looking for the leading text to speech API? Check out Speechify’s well-documented and easy-to-use text to speech API.

    Text to Speech

    I’ve written extensively on this topic and you can read my What is text to speech blog and also read up on speech synthesis to get a firm grasp on this topic. These go more in depth and you can skip them for now. I’ll summarise them in a few sentences.

    Text to speech relies on a technology called speech synthesis to convert words into AI generated speech. The use cases are abundant, from helping people with reading barriers such as dyslexia or low vision to helping those who simply want to get through text more efficiently.

    API

    API stands for Application Programming Interface. It simply acts as a bridge between two applications. If you were developing an app that had audio content and required text to speech functionality then you would have to build the text to speech functionality yourself, or you could simply connect to an existing text to speech API.

    You would focus on building your app and rely on a third-party API as a bridge, to import the text to speech functionality to synthesize your text.

    Google Cloud API

    This is where Google Cloud comes into play. Google has developed a robust text to speech API and offers it up to developers in various fee structures. Any developer looking to build custom apps or web apps that require text to speech functionality could simply bridge that gap by using Google’s TTS features. Yes, TTS is short for text to speech.

    Find the quickstart at the Google Cloud Console, https://cloud.google.com/. There you can find tutorials, manage your service account, access WaveNet voices and more.

    Google Cloud itself is a cloud platform offered by Google and it offers a host of modular services. You can choose to use one, many, or all of its services. All you’d need to do is to create access keys for authentication of each API – the bridge. Most, if not all, services come with a cost though there might be a free threshold.

    Google acquired DeepMind in 2014 for its work in neural network development, and DeepMind later introduced WaveNet, the neural model behind the WaveNet voices. So, if you come across DeepMind, it is now Google DeepMind and they are all one and the same.

    Now that we have a solid understanding, let’s dive deep into the Google Cloud Text to Speech API.

    Google Text to Speech API Features

    Google is a global tech pioneer and leader, there’s no doubt about that. When it comes to the TTS API, you can expect to find world class features that continue to evolve.

    High Fidelity Speech

    Google’s text to speech voices are some of the best in the industry. They sound very human-like, with natural-sounding intonation. Neural TTS is still in its early stages, and whoever can best synthesize audio that sounds like a human speaking is going to win this race.

    Selection of Voices

    Google claims the widest selection of voices so your project does not have to sound the same as the other 1000 out there or worse yet, your competitors’ app.
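    If you want to see which voices are actually on offer for a language, the Node.js client library exposes a listVoices method. Here is a minimal sketch that wraps it. The helper name listVoiceNames is my own, not Google's, and it takes the client as a parameter so you can try it without network access.

```javascript
// Hypothetical helper: returns the sorted names of voices for one language.
// `client` is expected to be a TextToSpeechClient from
// '@google-cloud/text-to-speech', or any object with a compatible listVoices.
async function listVoiceNames(client, languageCode) {
  // listVoices resolves to an array whose first element is the response.
  const [response] = await client.listVoices({languageCode});
  return (response.voices || []).map(voice => voice.name).sort();
}

// Usage with the real client (needs credentials and network access):
// const textToSpeech = require('@google-cloud/text-to-speech');
// listVoiceNames(new textToSpeech.TextToSpeechClient(), 'en-US')
//   .then(names => console.log(names));
```

    Passing the client in, rather than creating it inside the helper, also keeps the function easy to test with a stub.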

    Create Your Own Voice

    This borders on voice cloning tech. You can create a custom voice by recording yourself or someone else, with their permission. The recordings then become the voice that reads all your text aloud.

    Neural Voices

    Neural voices offer the best quality among the vast selection of voices. You can also internationalize these voices to grow your international audience.

    Studio Voices

    Studio voices are the top-of-the-line voices. They sound very professional, as if they were recorded the traditional way in a studio.

    Voice Tuning

    Pick a voice and then adjust the speed, the pitch, and more so that you can customise the tone of the voice.
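    In the API, this tuning lives in the audioConfig part of the request, through the speakingRate and pitch fields. Below is a small sketch of how you might build that object. The helper names and the clamping behaviour are my own illustration, and the ranges are the ones I understand the documentation to list, so double-check them before relying on this.

```javascript
// Ranges for the audioConfig tuning fields (assumed to match the current docs).
const SPEAKING_RATE_RANGE = [0.25, 4.0]; // 1.0 is the normal speed
const PITCH_RANGE = [-20.0, 20.0]; // semitones relative to the default pitch

// Keeps a value inside the inclusive [min, max] range.
function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: builds an MP3 audioConfig with tuned speed and pitch.
function tunedAudioConfig({speakingRate = 1.0, pitch = 0.0} = {}) {
  return {
    audioEncoding: 'MP3',
    speakingRate: clamp(speakingRate, SPEAKING_RATE_RANGE),
    pitch: clamp(pitch, PITCH_RANGE),
  };
}

// Slightly faster and a little lower than the default voice.
console.log(tunedAudioConfig({speakingRate: 1.25, pitch: -2}));
```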

    How much does the Google Text to Speech API Cost?

    It all comes down to voice quality and the length of your text. The more natural sounding you want your voice to be, the more expensive it will be. Though, expensive is relative here. Even the high quality voices are relatively inexpensive.

    Voice type        Free per month               After free usage has been reached
    Neural2 voices    0 to 1 million bytes         $16 per one million bytes
    Polyglot voices   0 to 1 million bytes         $16 per one million bytes
    Studio voices     0 to 100,000 bytes           $160 per one million bytes
    Standard voices   0 to 4 million characters    $4 per one million characters
    WaveNet voices    0 to 1 million characters    $16 per one million characters

    What’s the Difference Between Characters & Bytes?

    As you can see, the pricing varies significantly based on the quality of the voice. The audio encoding and processing it takes to turn text into speech varies from tier to tier. For the lower tiers, such as the Standard voices, the pricing is lower and is counted by characters.

    This means that if your project uses 4 million characters beyond the free allowance, it would cost you $16 to convert those characters into speech using the Standard voices.

    The Studio voices, on the other hand, require greater processing power and are charged based on bytes. In some languages, Japanese for example, a single character can be composed of multiple bytes.

    So for the most accurate pricing, it’s important to know which language you are working in, have a basic idea of the average number of bytes per character, and estimate accordingly.
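    To make the characters versus bytes point concrete, here is a rough estimator sketch. The rates are copied from the table in this article and may be out of date, the names PRICING, countUnits and estimateMonthlyCost are made up for illustration, and treating a "character" as one Unicode code point is my assumption rather than Google's stated rule.

```javascript
// Rates copied from the pricing table in this article; they may be out of date.
const PRICING = {
  neural2: {unit: 'bytes', freePerMonth: 1_000_000, usdPerMillion: 16},
  polyglot: {unit: 'bytes', freePerMonth: 1_000_000, usdPerMillion: 16},
  studio: {unit: 'bytes', freePerMonth: 100_000, usdPerMillion: 160},
  standard: {unit: 'characters', freePerMonth: 4_000_000, usdPerMillion: 4},
  wavenet: {unit: 'characters', freePerMonth: 1_000_000, usdPerMillion: 16},
};

// Counts billable units: UTF-8 bytes, or Unicode code points for characters
// (the code point interpretation is an assumption).
function countUnits(text, unit) {
  return unit === 'bytes' ? Buffer.byteLength(text, 'utf8') : [...text].length;
}

// Hypothetical estimator: USD cost of one month's text in a given tier.
function estimateMonthlyCost(text, tier) {
  const {unit, freePerMonth, usdPerMillion} = PRICING[tier];
  const billable = Math.max(0, countUnits(text, unit) - freePerMonth);
  return (billable * usdPerMillion) / 1_000_000;
}

console.log(countUnits('hello', 'bytes')); // 5
console.log(countUnits('こんにちは', 'characters')); // 5
console.log(countUnits('こんにちは', 'bytes')); // 15
```

    The last two lines show why the language matters: the same five Japanese characters take three times as many bytes as five ASCII letters.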

    How to Set Up Your Google Cloud Platform Text to Speech API Project

    1. Create a Google Cloud account, or log in if you already have one
    2. Create a new project and name it appropriately
    3. Add a billing method. You will only get charged for what you use.
    4. Then choose your project and associate it with a billing account.
    5. Activate the Text-to-Speech API. Go to the search products and resources bar located at the top of the page, and type in “speech.”
    6. From the displayed results, choose the Cloud Text-to-Speech API
    7. Set up authentication for your development environment. For instructions, see Set up authentication for Text-to-Speech.

    You can also try Text-to-Speech without linking it to your project:

    1. Choose the TRY THIS API option.
    2. To enable the Text-to-Speech API for use with your project, click ENABLE.

    Check out the Google Cloud Documentation for further help.

    How to Disable the Text to Speech API

    To deactivate the Text-to-Speech API, go to your Google Cloud Platform dashboard and click on the “Go to APIs overview” link within the APIs box. Locate the Text-to-Speech API and then click on it, followed by selecting the “DISABLE API” button at the top of the page.

    Get Started with Google Text to Speech API

    Now that you have your project set up, you can use the command line to get started. First, initialise the gcloud CLI:

    gcloud init

    Then create local authentication credentials:

    gcloud auth application-default login

    Now you can install a client library. In this example, we’ll look at Node.js:

    npm install --save @google-cloud/text-to-speech

    Google Cloud Text to Speech API Supports These Languages:

    1. Go
    2. Java
    3. Node.js
    4. C++
    5. C#
    6. PHP
    7. Python
    8. Ruby
    9. TypeScript
    10. Terraform
    11. YAML

    How Does the Google Cloud API Work?

    It all begins with a simple API call. You send your text in a synthesis request and receive the spoken audio in the response. With your request, you can set specific requirements: choose a voice, a language, an audio format and more, and the text to speech API sends back the matching audio.
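    The request described above has three parts: the input text, the voice selection, and the audio configuration. Here is a minimal sketch of a helper that assembles them. The function name buildSynthesisRequest, its option names, and the short list of encodings it accepts are my own choices for illustration; the field names in the returned object follow the API's request shape.

```javascript
// A few encodings accepted in audioConfig.audioEncoding (not exhaustive).
const ENCODINGS = new Set(['MP3', 'OGG_OPUS', 'LINEAR16']);

// Hypothetical helper: assembles the three parts of a synthesis request.
function buildSynthesisRequest(text, options = {}) {
  const {languageCode = 'en-US', voiceName, audioEncoding = 'MP3'} = options;
  if (!text) {
    throw new Error('text must be a non-empty string');
  }
  if (!ENCODINGS.has(audioEncoding)) {
    throw new Error(`Unsupported audioEncoding: ${audioEncoding}`);
  }
  const voice = {languageCode};
  if (voiceName) {
    voice.name = voiceName; // optional specific voice, e.g. 'en-US-Wavenet-D'
  }
  return {input: {text}, voice, audioConfig: {audioEncoding}};
}

// Builds a request for OGG output with the default en-US voice.
console.log(
  JSON.stringify(buildSynthesisRequest('Hello there', {audioEncoding: 'OGG_OPUS'}))
);
```

    Validating the options before the call means a typo in the encoding fails locally instead of costing you a round trip to the API.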

    You can learn how to install and use the text to speech client libraries here. Our code samples are for Node.js, but you can choose anything else, from Python to PHP, whatever you are comfortable with.

    // Import the Google Cloud client library and the Node.js built-ins we need
    const textToSpeech = require('@google-cloud/text-to-speech');
    const fs = require('fs');
    const util = require('util');

    // Create a client; it picks up your Application Default Credentials
    const client = new textToSpeech.TextToSpeechClient();

    // Replace these values before running the sample
    const text = 'Hello, world!';
    const outputFile = 'output.mp3';

    async function synthesize() {
      const request = {
        input: {text: text},
        voice: {languageCode: 'en-US', ssmlGender: 'FEMALE'},
        audioConfig: {audioEncoding: 'MP3'},
      };
      // Send the text to the API and wait for the audio to come back
      const [response] = await client.synthesizeSpeech(request);
      // Write the binary audio content to a local file
      const writeFile = util.promisify(fs.writeFile);
      await writeFile(outputFile, response.audioContent, 'binary');
      console.log(`Audio content written to file: ${outputFile}`);
    }

    // await only works inside an async function here, so call it and log errors
    synthesize().catch(console.error);

    And that’s it. You set up Google Cloud Text to Speech API and sent your first request to convert text to speech. You can get the file back in various formats; from OGG to MP3.

    Here are a Few Ways to Use the Google Text to Speech API

    The Google Text-to-Speech (TTS) API offers a versatile solution for various use cases across different industries. Some common use cases include:

    1. Text-to-Speech for Visually Impaired Users: Implementing TTS in applications to convert written content into spoken words, making digital information accessible for visually impaired users.
    2. Automated Phone Systems: Utilizing TTS to create natural-sounding prompts and responses for interactive voice response systems in customer service or information hotlines.
    3. Voiceovers for Media Content: Generating natural-sounding voiceovers for videos, podcasts, or other multimedia content to enhance user experience.
    4. Text-to-Speech for Translated Content: Converting translated text into spoken words to facilitate language learning, international communication, or content consumption in various languages.
    5. Reading Assistance for Dyslexic Users: Providing TTS functionality to assist individuals with dyslexia or reading difficulties in consuming written content.
    6. Voice Navigation in Applications: Integrating TTS into navigation applications to provide turn-by-turn directions or location-based information audibly.
    7. Text-to-Speech for Educational Content: Enhancing e-learning experiences by converting educational text content into spoken words, aiding comprehension and engagement.
    8. Speech Synthesis for Productivity Apps: Integrating TTS into productivity tools, such as note-taking or task management apps, to enable spoken feedback or information retrieval.
    9. Natural Voice for Virtual Assistants: Powering voice assistants with natural-sounding TTS to improve user interactions and provide information in a conversational manner.
    10. Auditory Alerts and Notifications: Using TTS to provide audible alerts, notifications, or status updates on Internet of Things (IoT) devices for enhanced user awareness.

    Best Alternatives to Google Cloud TTS API

    There are several alternatives to the Google Text-to-Speech API. Keep in mind that the popularity and capabilities of these services change over time. Here are some notable alternatives:

    1. Speechify Text to Speech API: We’re thrilled to unveil the development of a text-to-speech API that delivers Speechify’s most natural and beloved AI voices directly to developers worldwide. Save your seat today.
    2. Amazon Polly: Offered by Amazon Web Services (AWS), Polly provides natural-sounding speech synthesis in various languages and voices. It integrates well with other AWS services.
    3. Microsoft Azure Speech Service: Azure Speech Service includes Text-to-Speech capabilities and supports a variety of applications, including voice assistants, navigation systems, and more.
    4. IBM Watson Text to Speech: IBM Watson offers a Text to Speech service that allows developers to convert written text into natural-sounding speech using various voices.
    5. Nuance Communications: Nuance provides a range of speech and voice recognition solutions, including text-to-speech, for applications in healthcare, automotive, and customer service.
    6. CereProc: CereProc is a text-to-speech technology company that offers high-quality synthetic voices for applications like accessibility, entertainment, and communication.
    7. iSpeech: iSpeech provides cloud-based text-to-speech services with support for multiple languages and voices. It is suitable for various applications, including mobile apps and websites.
    8. ResponsiveVoice: ResponsiveVoice is a simple and affordable text-to-speech API that supports multiple languages and can be used in various web-based applications.
    9. Neospeech: Neospeech offers text-to-speech solutions with a focus on natural-sounding voices. Their technology is used in applications like e-learning and entertainment.
    10. ReadSpeaker: ReadSpeaker provides online and offline text-to-speech solutions for diverse applications, including websites, e-learning, and accessibility services.
    11. Acapelabox: Acapela Group offers a cloud-based text-to-speech API, Acapelabox, which supports multiple languages and voices for applications in various industries.

    Google Text to Speech API FAQs

    Google has multiple tiers of voices, and almost every tier has a free limit. For example, the Standard voices are free for the first 4 million characters each month. After that it is $4 per million characters. So yes, it can be free with limited characters or bytes.

    Simply create an account at https://cloud.google.com/text-to-speech/ and follow the steps there. Also, I’ve outlined the process in detail in this blog, just above.

    You can get a Google text to speech API key by logging into your Google Cloud account and creating a project. Once you have created your project, you can generate an API key.

    The URL for Google text to speech API is https://cloud.google.com/text-to-speech/

    There is technically no free trial period for Google Cloud. There are multiple services within Google Cloud and each service has its own terms and free tiers.

    No. The Google Cloud text to speech API requires an internet connection.

    Authentication to Google Cloud services, including the Text-to-Speech API, can be done using API keys, OAuth 2.0, or service accounts. The appropriate authentication method depends on the use case and the type of application.

    I’d rate it 5 stars. It’s easy to use, the search feature is great and is used the most. The pricing is decent and it’s overall a great product.

    Google Text-to-Speech API provides client libraries for various programming languages, including Python. It also supports RESTful API requests, making it compatible with languages that can make HTTP requests.

    On Android, the platform’s TextToSpeech class drives the on-device speech engine, while integrating the Google Cloud Text-to-Speech API into an Android app involves making API requests from your app. Detailed instructions can be found in the official documentation for Android developers.

    To implement Google Text-to-Speech API in a JavaScript application, you can make HTTP requests to the API endpoint. The process involves constructing the appropriate API request and handling the response in your JavaScript code. Refer to the official documentation for details.
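    As a rough illustration of the HTTP route, the sketch below builds a POST to the v1 text:synthesize REST endpoint and decodes the base64 audio that comes back. The helper names are hypothetical, passing the API key as a key query parameter assumes your project allows API key access, and the decoding uses Node.js Buffer, so a browser would need a different base64 decoder.

```javascript
// Public REST endpoint for synthesis (v1).
const ENDPOINT = 'https://texttospeech.googleapis.com/v1/text:synthesize';

// Hypothetical helper: builds the URL and fetch options for one request.
function buildRestCall(text, apiKey) {
  const body = {
    input: {text},
    voice: {languageCode: 'en-US'},
    audioConfig: {audioEncoding: 'MP3'},
  };
  return {
    url: `${ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    options: {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(body),
    },
  };
}

// The REST response carries the audio as a base64 string in `audioContent`.
function decodeAudio(responseJson) {
  return Buffer.from(responseJson.audioContent, 'base64');
}

// Sends the request; needs Node.js 18+ (global fetch) and a valid API key.
async function synthesizeOverRest(text, apiKey) {
  const {url, options} = buildRestCall(text, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return decodeAudio(await res.json());
}
```

    Calling synthesizeOverRest('Hello', yourApiKey) resolves to a Buffer that you can write to an .mp3 file. Keep the key on a server you control rather than shipping it in front-end code.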

    Cliff Weitzman

    Cliff Weitzman is a dyslexia advocate and the CEO and founder of Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews and ranking first place in the App Store for the News & Magazines category. In 2017, Weitzman was named to the Forbes 30 under 30 list for his work making the internet more accessible to people with learning disabilities. Cliff Weitzman has been featured in EdSurge, Inc., PC Mag, Entrepreneur, Mashable, among other leading outlets.

    Dyslexia & Accessibility Advocate, CEO/Founder of Speechify
