
gtts

Cliff Weitzman

CEO/Founder of Speechify


What is gTTS?

gTTS is an open-source Python library and command-line tool that turns text into spoken MP3 audio by calling Google Translate's text-to-speech endpoint. You can write the output to a file, to a file-like object for further audio processing, or straight to stdout. It's authored by Pierre Nicolas Durette, distributed under the MIT license, and is one of the most downloaded TTS packages on PyPI with roughly 175,000 weekly downloads at the time of writing. If you've ever needed to turn a string into an MP3 in three lines of Python, gTTS is probably the first result you found.

It's important to note, though, that gTTS is not Google Cloud Text-to-Speech. It talks to the same undocumented backend that powers the little "Listen" button in Google Translate. That distinction shapes everything below: what gTTS is great at, where it breaks, and when you should reach for something else.

When Should You Use gTTS?

Use gTTS if you need free, fast prototyping; a one-liner to generate MP3 files from text; multilingual demos; or a hobby project, classroom example, or accessibility script that reads a Google Docs export aloud. Don't use gTTS if you need production reliability, a documented SLA, voice cloning, SSML control, neural or expressive voices, streaming audio, or unambiguous commercial licensing.

How Does gTTS Work?

gTTS does not synthesize speech locally. It builds a request to the same backend that powers Google Translate's "Listen" feature, downloads the resulting MP3, and hands you the bytes. That means you need an active internet connection because there is no offline mode, and the audio is generated on Google's servers, not your machine. The endpoint is also unofficial. The project is not affiliated with Google or Google Cloud, and upstream changes can break it without warning.

Installation

bash

pip install gTTS

gTTS requires Python 3.7 or newer and works on macOS, Windows, and Linux. The current PyPI release is 2.5.4 (November 2024). On Debian-based systems, including Raspberry Pi OS, note the case mismatch: the pip package is gTTS, while the apt package is python3-gtts. If pip install fails with an externally-managed-environment error on a recent OS, install into a virtual environment instead.
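If you're not sure which environment the package landed in, a quick check from Python can help. The sketch below uses the standard library's importlib.metadata (available in Python 3.8 and newer); the helper name installed_version is made up for this example and is not part of gTTS.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "gTTS"):
    """Return the installed version string for a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("gTTS is not installed in this environment.")
    else:
        print(f"gTTS {found} is installed.")
```

Run it with the same interpreter your project uses; a None result inside a fresh virtual environment usually just means pip installed the package somewhere else.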

Basic Usage

The minimum viable example:

python

from gtts import gTTS

# Create the request for the default language, then fetch and save the MP3.
tts = gTTS("Hello, world.")
tts.save("hello.mp3")

From the Command Line:

bash

gtts-cli "hello" --output hello.mp3

Choosing a Language and Accent

python

from gtts import gTTS

tts = gTTS("Bonjour le monde", lang="fr")
tts.save("bonjour.mp3")

gTTS also exposes regional sub-tags through the tld parameter, for example tld="co.uk" for a British English accent or tld="ca" (together with lang="fr") for a Canadian French accent, by routing the request through different Google Translate top-level domains.
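Because an accent is really a pair of settings (lang plus tld), it can be handy to keep the pairs in one place. Here is a minimal sketch of that idea. The accent names, the ACCENTS table, and both helper functions are hypothetical, and "com" is assumed to be the library's default tld.

```python
# Accent name -> keyword arguments for gTTS. The "co.uk" and "ca" values come
# from the text above; "com" is assumed to be the library default.
ACCENTS = {
    "english-default": {"lang": "en", "tld": "com"},
    "english-uk": {"lang": "en", "tld": "co.uk"},
    "french-canada": {"lang": "fr", "tld": "ca"},
}


def accent_kwargs(name: str) -> dict:
    """Look up the lang/tld pair for a named accent (hypothetical helper)."""
    try:
        return dict(ACCENTS[name])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"unknown accent: {name!r}") from None


def save_with_accent(text: str, name: str, path: str) -> None:
    """Synthesize text with the chosen accent. Needs gTTS and a network connection."""
    from gtts import gTTS  # imported lazily so the lookup works without gTTS

    gTTS(text, **accent_kwargs(name)).save(path)
```

Calling save_with_accent("Good morning", "english-uk", "morning_uk.mp3") would then write a British-accented file, assuming the endpoint is reachable.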

Slow Mode

python

from gtts import gTTS

tts = gTTS("Read this slowly.", lang="en", slow=True)
tts.save("slow.mp3")

That's effectively the entire speech-controls surface area. There is no pitch parameter, no rate slider beyond slow=True, no per-voice selection, and no SSML.

Stream to a Buffer Instead of Disk

python

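from io import BytesIO

from gtts import gTTS

buf = BytesIO()
gTTS("Stream me").write_to_fp(buf)  # write the MP3 bytes into memory instead of a file
buf.seek(0)  # rewind so the next reader starts at the first byte
# now feed buf into pydub, ffmpeg, a web response, etc.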
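If you do this in more than one place, it may be worth wrapping the buffer pattern in a small function that just returns bytes. The sketch below is one way to do that. The function name and the tts_factory parameter are inventions for this example; the factory is there only so the function can be exercised without a network call.

```python
from io import BytesIO


def _default_factory(text, **kwargs):
    # Imported lazily so this module can be loaded without gTTS installed.
    from gtts import gTTS
    return gTTS(text, **kwargs)


def synthesize_to_bytes(text, lang="en", tts_factory=_default_factory):
    """Return MP3 bytes for text without touching the disk.

    tts_factory is a hypothetical injection point: anything that returns an
    object with a write_to_fp(file_like) method will work.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    buf = BytesIO()
    tts_factory(text, lang=lang).write_to_fp(buf)
    return buf.getvalue()
```

In a web handler you could return synthesize_to_bytes("Stream me") with an audio/mpeg content type; in a unit test you could pass a fake factory and skip the network entirely.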

Pre-processing and Long Text

One of gTTS's better-engineered features is its tokenizer. It splits arbitrarily long input into chunks the backend will accept (the upstream endpoint caps each request at around 100 characters), preserves intonation across the seams, and handles abbreviations, decimals, and other punctuation edge cases. You can also plug in custom pre-processors to fix recurring pronunciation problems — for example, mapping product names or acronyms to phonetic spellings.
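To make the custom pre-processor idea concrete: a pre-processor is just a function that takes a string and returns a string. The sketch below maps a few terms to phonetic spellings. The PRONUNCIATIONS table and the spell_out_terms name are hypothetical, and the spellings are guesses you would tune by ear.

```python
import re

# Hypothetical pronunciation fixes: written form -> phonetic spelling.
PRONUNCIATIONS = {
    "gTTS": "gee tee tee ess",
    "PyPI": "pie pee eye",
    "SSML": "ess ess em el",
}

# Match whole words only, longest terms first, so a term never matches
# inside a longer token.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(PRONUNCIATIONS, key=len, reverse=True))
    + r")\b"
)


def spell_out_terms(text: str) -> str:
    """Replace known product names and acronyms with phonetic spellings."""
    return _PATTERN.sub(lambda match: PRONUNCIATIONS[match.group(1)], text)
```

You would hand this to the library with something like gTTS(text, pre_processor_funcs=[spell_out_terms]). As far as I know, that argument replaces the library's default pre-processor list rather than extending it, so check the gTTS documentation if you also want the built-in handling of abbreviations and line endings.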

What are the Pros of gTTS?

gTTS (Google Text-to-Speech) is popular among developers because it is lightweight, easy to implement, and integrates well into Python workflows. It can generate MP3 audio files and save output directly to files, file-like objects, or stdout, making it flexible for automation and scripting projects. With support for around 60 languages and multiple dialect variants through language and top-level domain settings, it provides broad multilingual coverage for simple applications. Developers also benefit from its command-line interface (gtts-cli), which works smoothly with shell scripts, along with customizable tokenizers and pre-processors for handling abbreviations, numbers, and text substitutions. Its minimal Python API makes it straightforward to add speech functionality to Jupyter notebooks, Flask apps, Discord bots, and other lightweight projects without a steep learning curve.

What are the Cons of gTTS?

Despite its simplicity, gTTS has notable limitations compared to modern AI voice platforms. The voices are based on standard Google Translate speech output, meaning they sound functional but lack the natural intonation, emotion, and realism of newer neural text to speech systems. Users cannot choose between multiple voice styles within a language, and there are no advanced controls such as SSML support, pitch adjustment, or precise speech rate customization. gTTS also requires downloading the full MP3 before playback rather than supporting real-time streaming, which can increase latency for interactive applications. Additionally, because every request depends on an internet connection and requires a network call, gTTS cannot operate offline, making it less suitable for environments where reliability or low-latency speech generation is critical.

What are the Limitations of gTTS for Developers?

1. Rate limiting on an undocumented endpoint

This is the single biggest gotcha for anyone moving past "hello world." gTTS doesn't publish a usage quota because the upstream service doesn't either. In practice, a single IP can usually push a few tens of thousands of characters per hour before Google starts returning HTTP 429s, and the exact ceiling varies with traffic patterns. If your app generates audio for many users from one server, you will eventually hit those limits with no SLA to appeal to.
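A common way to soften rate-limit errors is to retry with exponential backoff. The helper below is a generic sketch, not something gTTS provides: every name and default in it is illustrative, and backing off only spreads requests out; it does not raise whatever ceiling Google applies.

```python
import random
import time


def call_with_backoff(func, retry_on=(Exception,), attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff and jitter on failure.

    All parameter names and defaults are illustrative, not part of gTTS.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

With gTTS this might look like call_with_backoff(lambda: gTTS(text).save("out.mp3"), retry_on=(gTTSError,)), where gTTSError is the exception class the gtts package exports. Retrying only on that class keeps unrelated bugs from being silently retried.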

2. The endpoint can change without warning

Because gTTS targets an internal Google Translate route rather than a versioned public API, Google can break gTTS overnight by changing request signatures or response shapes, and historically it has. The maintainer ships a fix, you run pip install -U gTTS, and life goes on. That's fine for a hobby script. It is not fine for a production deploy at 2 a.m.

3. Maintenance cadence

The project still ships releases, at least one in the past 12 months, but issue triage is slow and the bus factor is essentially one person. Some package-health trackers classify the repo as "inactive." For a free MIT-licensed library, that's normal; as a load-bearing dependency in a paid product, it's worth thinking about.

4. Commercial and TOS ambiguity

Because gTTS hits Google Translate's frontend instead of Google Cloud TTS, the licensing of the generated audio for commercial use is not clearly spelled out anywhere. The library itself is MIT-licensed; the audio bytes you receive are governed by Google's terms for a service that isn't formally exposed as a TTS API. If your legal team needs a clean answer, gTTS won't give them one.

5. Sensitive data leaves your machine

Every string you synthesize is sent to Google's servers. If you're voicing internal documents, customer PII, or content pulled out of Google Docs and other knowledge stores, that's a data-governance question worth answering before you ship.
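One partial mitigation is to scrub obvious identifiers before the text ever leaves your machine. The sketch below is deliberately crude and entirely hypothetical: two regular expressions that catch email addresses and long digit runs such as phone numbers. It is an illustration of the idea, not a substitute for a real data-governance review.

```python
import re

# Rough patterns for illustration only; real PII detection needs more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d[\d -]{7,}\d\b")


def redact(text: str) -> str:
    """Mask email addresses and long digit sequences before synthesis."""
    text = EMAIL.sub("[email removed]", text)
    return LONG_NUMBER.sub("[number removed]", text)
```

You would call redact on the input first and pass the result to gTTS, accepting that the audio will literally say "email removed" where an address used to be.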

What is the Difference Between gTTS and Google Cloud Text-to-Speech?

While gTTS and Google Cloud Text-to-Speech are often confused, they are not the same product. The differences are as follows: 


| | gTTS | Google Cloud TTS |
|---|---|---|
| Endpoint | Undocumented Google Translate route | Versioned, documented public API |
| Auth | None | Service account / API key |
| Cost | Free | Paid (per character) |
| Voices | One per language | Neural (WaveNet, Studio, Chirp) |
| SSML | No | Yes |
| SLA | None | Published SLA |
| Commercial use | Ambiguous | Explicitly licensed |

If you need Google's voices in production, you almost certainly want Google Cloud TTS, not gTTS.

When Should You Upgrade to a Professional TTS API?

The right time to move from gTTS to a professional text to speech API depends on how critical audio quality, reliability, and customization are to your project. gTTS works well for prototypes, portfolio projects, personal accessibility tools, educational demos, and lightweight experiments because it is simple, free, and easy to implement. However, if you are launching a product for paying customers, relying on speech quality as part of your user experience, or need predictable latency backed by service-level agreements, a professional solution becomes more important. Upgrading also makes sense when you need advanced capabilities such as multiple voice options, voice cloning, SSML support, streaming audio, detailed control over pacing and pronunciation, or clear commercial licensing terms for legal and business requirements. As projects move from experimentation to production, these features often shift from being optional to essential.

Should You Choose gTTS or Speechify's API?

Speechify's text to speech API is an officially supported, paid service with neural voices, multiple voice options per language, SSML support, and commercial licensing baked into the contract, not a wrapper around an undocumented endpoint. If gTTS's rate limits, voice quality, or TOS ambiguity are starting to block you, that's the kind of migration path worth evaluating.

FAQ

Is gTTS free to use? 

Yes, gTTS is a free, MIT-licensed Python library, but for commercial-grade, licensed audio you'll want a paid service like the Speechify API.

Does gTTS work offline? 

No, gTTS requires an internet connection because it calls Google's servers, and the same is true of the Speechify API, which is a cloud service.

Can I use gTTS in a commercial product? 

The licensing of gTTS output for commercial use is ambiguous since it relies on an undocumented Google endpoint, whereas the Speechify API provides explicit commercial licensing.

How do I change voices in gTTS? 

You can't really. gTTS gives you one voice per language, while the Speechify API offers a catalog of neural voices to choose from.

Does gTTS support SSML? 

No, gTTS has no SSML support, no pitch control, and no fine-grained rate control, but the Speechify API supports SSML for full prosody control.

Why is gTTS returning HTTP 429 errors? 

You've hit Google Translate's undocumented rate limit, which is a common reason developers migrate to a service with a real SLA like the Speechify API.

Is gTTS the same as Google Cloud Text-to-Speech? 

No, gTTS wraps an unofficial Google Translate endpoint, while Google Cloud TTS is a separate paid product, and the Speechify API is another paid alternative with neural voices.

What's the best Python TTS library for production? 

gTTS is fine for prototypes but not production; for production workloads most developers move to a paid API such as the Speechify API.

Can gTTS clone a voice? 

No, voice cloning is not supported in gTTS, but it is available through the Speechify API.

How do I stream audio with gTTS? 

gTTS doesn't support real-time streaming; it returns a completed MP3, so for low-latency streaming use the Speechify API instead.



Cliff Weitzman is a dyslexia advocate and the CEO and founder of Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews and ranking first place in the App Store for the News & Magazines category. In 2017, Weitzman was named to the Forbes 30 under 30 list for his work making the internet more accessible to people with learning disabilities. Cliff Weitzman has been featured in EdSurge, Inc., PC Mag, Entrepreneur, Mashable, among other leading outlets.
