What is gTTS?
gTTS is an open-source Python library and command-line tool that turns text into spoken MP3 audio by calling Google Translate's text-to-speech endpoint. You can write the output to a file, to a file-like object for further audio processing, or straight to stdout. It's authored by Pierre Nicolas Durette, distributed under the MIT license, and is one of the most downloaded TTS packages on PyPI with roughly 175,000 weekly downloads at the time of writing. If you've ever needed to turn a string into an MP3 in three lines of Python, gTTS is probably the first result you found.
One important caveat: gTTS is not Google Cloud Text-to-Speech. Instead, it talks to the undocumented backend that powers the little "Listen" button in Google Translate. That distinction shapes everything below: what gTTS is great at, where it breaks, and when you should reach for something else.

When Should You Use gTTS?
Use gTTS for free, fast prototyping; one-line MP3 generation from text; multilingual demos; or a hobby project, classroom example, or accessibility script that reads a Google Docs export aloud. Don't use gTTS if you need production reliability, a documented SLA, voice cloning, SSML control, neural or expressive voices, streaming audio, or unambiguous commercial licensing.
How Does gTTS Work?
gTTS does not synthesize speech locally. It builds a request to the same backend that powers Google Translate's "Listen" feature, downloads the resulting MP3, and hands you the bytes. That means you need an active internet connection because there is no offline mode, and the audio is generated on Google's servers, not your machine. The endpoint is also unofficial. The project is not affiliated with Google or Google Cloud, and upstream changes can break it without warning.
Installation
bash
pip install gTTS
gTTS requires Python 3.7 or newer and works on macOS, Windows, and Linux. The current PyPI release is 2.5.4 (November 2024). On Debian-based systems, including Raspberry Pi OS, note the case mismatch: the pip package is gTTS, while the apt package is python3-gtts. If pip install fails with an externally-managed-environment error on a recent OS, install into a virtual environment instead.
Basic Usage
The minimum viable example:
python
from gtts import gTTS
tts = gTTS("Hello, world.")
tts.save("hello.mp3")
From the Command Line:
bash
gtts-cli "hello" --output hello.mp3
Choosing a Language and Accent
python
from gtts import gTTS
tts = gTTS("Bonjour le monde", lang="fr")
tts.save("bonjour.mp3")
gTTS also exposes regional accents through the tld parameter, which routes the request through a different Google Translate top-level domain. For example, tld="co.uk" gives a British English accent, and tld="ca" combined with lang="fr" gives a Canadian French accent.
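As a rough illustration of how lang and tld pair up, the sketch below keeps a small lookup table of accents and turns an accent name into gTTS keyword arguments. The ACCENTS table and the accent_kwargs and speak helpers are hypothetical names made up for this example. The table only covers the two tld values mentioned above plus "com", which is assumed here to be the library's default domain.

```python
# Hypothetical lookup table: friendly accent name -> gTTS keyword arguments.
ACCENTS = {
    "english-default": {"lang": "en", "tld": "com"},  # assumed default domain
    "english-uk": {"lang": "en", "tld": "co.uk"},
    "french-canada": {"lang": "fr", "tld": "ca"},
}


def accent_kwargs(accent: str) -> dict:
    """Return a copy of the lang/tld pair for a known accent name."""
    try:
        return dict(ACCENTS[accent])
    except KeyError:
        raise ValueError(f"Unknown accent: {accent!r}") from None


def speak(text: str, accent: str, path: str) -> None:
    """Synthesize text with the chosen accent and save it as an MP3."""
    from gtts import gTTS  # imported lazily; saving needs network access

    gTTS(text, **accent_kwargs(accent)).save(path)


# Example calls (each one makes a network request):
# speak("Good morning, everyone.", "english-uk", "morning_uk.mp3")
# speak("Bonjour tout le monde.", "french-canada", "bonjour_ca.mp3")
```

Keeping the pairs in one table means an unsupported accent fails early with a ValueError instead of producing a request that the backend may reject or silently answer with the default voice.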
Slow Mode
python
from gtts import gTTS
tts = gTTS("Read this slowly.", lang="en", slow=True)
tts.save("slow.mp3")
That's effectively the entire speech-controls surface area. There is no pitch parameter, no rate slider beyond slow=True, no per-voice selection, and no SSML.
Stream to a Buffer Instead of Disk
python
from io import BytesIO
from gtts import gTTS
buf = BytesIO()
gTTS("Stream me").write_to_fp(buf)
buf.seek(0)
# now feed buf into pydub, ffmpeg, a web response, etc.
Pre-processing and Long Text
One of gTTS's better-engineered features is its tokenizer. It splits arbitrarily long input into chunks the backend will accept (the upstream endpoint caps each request at around 100 characters), preserves intonation across the seams, and handles abbreviations, decimals, and other punctuation edge cases. You can also plug in custom pre-processors to fix recurring pronunciation problems — for example, mapping product names or acronyms to phonetic spellings.
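To make the custom pre-processor idea concrete, here is a minimal sketch of one that swaps a few hard-to-pronounce terms for phonetic spellings before the text is tokenized. The PRONUNCIATIONS table and the fix_pronunciations and speak helpers are hypothetical and exist only for this example. The four library pre-processors listed after it are what I understand the gTTS defaults to be, so check them against your installed version.

```python
import re

# Hypothetical substitutions: written form -> phonetic spelling.
PRONUNCIATIONS = {
    "SQL": "sequel",
    "nginx": "engine x",
    "PyPI": "pie pea eye",
}

# Match any listed term as a whole word only (case-sensitive).
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in PRONUNCIATIONS) + r")\b"
)


def fix_pronunciations(text: str) -> str:
    """Replace whole-word matches with their phonetic spellings."""
    return _PATTERN.sub(lambda m: PRONUNCIATIONS[m.group(1)], text)


def speak(text: str, path: str) -> None:
    """Run the custom pre-processor ahead of the library's own, then save."""
    from gtts import gTTS  # imported lazily; saving needs network access
    from gtts.tokenizer import pre_processors

    funcs = [
        fix_pronunciations,
        # Assumed to mirror the library defaults; verify for your version.
        pre_processors.tone_marks,
        pre_processors.end_of_line,
        pre_processors.abbreviations,
        pre_processors.word_sub,
    ]
    gTTS(text, lang="en", pre_processor_funcs=funcs).save(path)


# Example call (makes a network request):
# speak("Install nginx, then query SQL.", "setup.mp3")
```

A pre-processor is just a function that takes a string and returns a string, so you can unit-test the substitution logic on its own without ever touching the network.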
What are the Pros of gTTS?
gTTS (Google Text-to-Speech) is popular among developers because it is lightweight, easy to implement, and integrates well into Python workflows. It can generate MP3 audio files and save output directly to files, file-like objects, or stdout, making it flexible for automation and scripting projects. With support for around 60 languages and multiple dialect variants through language and top-level domain settings, it provides broad multilingual coverage for simple applications. Developers also benefit from its command-line interface (gtts-cli), which works smoothly with shell scripts, along with customizable tokenizers and pre-processors for handling abbreviations, numbers, and text substitutions. Its minimal Python API makes it straightforward to add speech functionality to Jupyter notebooks, Flask apps, Discord bots, and other lightweight projects without a steep learning curve.
What are the Cons of gTTS?
Despite its simplicity, gTTS has notable limitations compared to modern AI voice platforms. The voices are based on standard Google Translate speech output, meaning they sound functional but lack the natural intonation, emotion, and realism of newer neural text to speech systems. Users cannot choose between multiple voice styles within a language, and there are no advanced controls such as SSML support, pitch adjustment, or precise speech rate customization. gTTS also requires downloading the full MP3 before playback rather than supporting real-time streaming, which can increase latency for interactive applications. Additionally, because every request depends on an internet connection and requires a network call, gTTS cannot operate offline, making it less suitable for environments where reliability or low-latency speech generation is critical.
What are the Limitations of gTTS for Developers?
1. Rate limiting on an undocumented endpoint
This is the single biggest gotcha for anyone moving past "hello world." gTTS doesn't publish a usage quota because the upstream service doesn't either. In practice, a single IP can usually push a few tens of thousands of characters per hour before Google starts returning HTTP 429s, and the exact ceiling varies with traffic patterns. If your app generates audio for many users from one server, you will eventually hit those limits with no SLA to appeal to.
2. The endpoint can change without warning
Because gTTS targets an internal Google Translate route rather than a versioned public API, Google can, and historically has, broken gTTS overnight by changing request signatures or response shapes. The maintainer ships a fix, you pip install -U gTTS, and life goes on. That's fine for a hobby script. It is not fine for a production deploy at 2 a.m.
3. Maintenance cadence
The project still ships releases, at least one in the past 12 months, but issue triage is slow and the bus factor is essentially one person. Some package-health trackers classify the repo as "inactive." For a free MIT-licensed library, that's normal; as a load-bearing dependency in a paid product, it's worth thinking about.
4. Commercial and TOS ambiguity
Because gTTS hits Google Translate's frontend instead of Google Cloud TTS, the licensing of the generated audio for commercial use is not clearly spelled out anywhere. The library itself is MIT-licensed; the audio bytes you receive are governed by Google's terms for a service that isn't formally exposed as a TTS API. If your legal team needs a clean answer, gTTS won't give them one.
5. Sensitive data leaves your machine
Every string you synthesize is sent to Google's servers. If you're voicing internal documents, customer PII, or content pulled out of Google Docs and other knowledge stores, that's a data-governance question worth answering before you ship.
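For the rate-limiting problem in item 1, a common stopgap is to retry with exponential backoff. The sketch below is one way to do that, not a fix for the underlying quota. retry_with_backoff and save_with_retry are hypothetical helpers written for this example, and the retry count and delays are arbitrary. It relies on gTTS raising gTTSError when a request fails, which covers other failures as well as HTTP 429.

```python
import time


def retry_with_backoff(fn, retries=4, base_delay=2.0, retry_on=(Exception,),
                       sleep=time.sleep):
    """Call fn(); on a listed exception, wait and retry with doubling delays.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))


def save_with_retry(text: str, path: str) -> None:
    """Synthesize text to path, backing off when gTTS reports a failure."""
    from gtts import gTTS, gTTSError  # imported lazily; needs network access

    # gTTSError is not specific to rate limiting, so this retries any
    # request failure the library reports.
    retry_with_backoff(lambda: gTTS(text).save(path), retry_on=(gTTSError,))


# Example call (makes one or more network requests):
# save_with_retry("Hello again.", "hello_again.mp3")
```

Backoff only smooths over short bursts. If you're routinely hitting the ceiling, waiting longer doesn't raise it, and there is still no SLA behind the endpoint.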
What is the Difference Between gTTS and Google Cloud Text-to-Speech?
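gTTS and Google Cloud Text-to-Speech are often confused, but they are not the same product. gTTS is a free, unofficial wrapper around the Google Translate endpoint, with no SLA and no SSML; Google Cloud Text-to-Speech is a separate, paid, officially supported Google product.
If you need a Google voice in production, you almost certainly want Google Cloud TTS, not gTTS.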
When Should You Upgrade to a Professional TTS API?
The right time to move from gTTS to a professional text to speech API depends on how critical audio quality, reliability, and customization are to your project. gTTS works well for prototypes, portfolio projects, personal accessibility tools, educational demos, and lightweight experiments because it is simple, free, and easy to implement. However, if you are launching a product for paying customers, relying on speech quality as part of your user experience, or need predictable latency backed by service-level agreements, a professional solution becomes more important. Upgrading also makes sense when you need advanced capabilities such as multiple voice options, voice cloning, SSML support, streaming audio, detailed control over pacing and pronunciation, or clear commercial licensing terms for legal and business requirements. As projects move from experimentation to production, these features often shift from being optional to essential.
Should You Choose gTTS or Speechify's API?
Speechify's text to speech API is an officially supported, paid service with neural voices, multiple voice options per language, SSML support, and commercial licensing baked into the contract, not a wrapper around an undocumented endpoint. If gTTS's rate limits, voice quality, or TOS ambiguity are starting to block you, that's the kind of migration path worth evaluating.
FAQ
Is gTTS free to use?
Yes, gTTS is a free, MIT-licensed Python library, but for commercial-grade, licensed audio you'll want a paid service like the Speechify API.
Does gTTS work offline?
No, gTTS requires an internet connection because it calls Google's servers, and the same is true of the Speechify API, which is a cloud service.
Can I use gTTS in a commercial product?
The licensing of gTTS output for commercial use is ambiguous since it relies on an undocumented Google endpoint, whereas the Speechify API provides explicit commercial licensing.
How do I change voices in gTTS?
You can't, really. gTTS gives you one voice per language, with regional accents available through the tld parameter, while the Speechify API offers a catalog of neural voices to choose from.
Does gTTS support SSML?
No, gTTS has no SSML support, no pitch control, and no fine-grained rate control, but the Speechify API supports SSML for full prosody control.
Why is gTTS returning HTTP 429 errors?
You've hit Google Translate's undocumented rate limit, which is a common reason developers migrate to a service with a real SLA like the Speechify API.
Is gTTS the same as Google Cloud Text-to-Speech?
No, gTTS wraps an unofficial Google Translate endpoint, while Google Cloud TTS is a separate paid product, and the Speechify API is another paid alternative with neural voices.
What's the best Python TTS library for production?
gTTS is fine for prototypes but not production; for production workloads most developers move to a paid API such as the Speechify API.
Can gTTS clone a voice?
No, voice cloning is not supported in gTTS, but it is available through the Speechify API.
How do I stream audio with gTTS?
gTTS doesn't support real-time streaming; it returns a completed MP3. For low-latency streaming, use the Speechify API instead.

