gTTS

What is gTTS?

gTTS is an open-source Python library and command-line tool that turns text into spoken MP3 audio by calling Google Translate's text-to-speech endpoint. You can write the output to a file, to a file-like object for further audio processing, or straight to stdout. It's authored by Pierre Nicolas Durette, distributed under the MIT license, and is one of the most downloaded TTS packages on PyPI with roughly 175,000 weekly downloads at the time of writing. If you've ever needed to turn a string into an MP3 in three lines of Python, gTTS is probably the first result you found.

It is important to note, though, that gTTS is not Google Cloud Text-to-Speech. It talks to the same undocumented backend that powers the little "Listen" button in Google Translate. That distinction shapes everything below: what gTTS is great at, where it breaks, and when you should reach for something else.

When Should You Use gTTS?

Use gTTS for free, fast prototyping; one-liners that generate MP3 files from text; multilingual demos; and hobby projects, classroom examples, or accessibility scripts that read a Google Docs export aloud. Don't use gTTS if you need production reliability, a documented SLA, voice cloning, SSML control, neural or expressive voices, streaming audio, or unambiguous commercial licensing.

How Does gTTS Work?

gTTS does not synthesize speech locally. It builds a request to the same backend that powers Google Translate's "Listen" feature, downloads the resulting MP3, and hands you the bytes. That means you need an active internet connection because there is no offline mode, and the audio is generated on Google's servers, not your machine. The endpoint is also unofficial. The project is not affiliated with Google or Google Cloud, and upstream changes can break it without warning.

Installation

bash

pip install gTTS

gTTS requires Python 3.7 or newer and works on macOS, Windows, and Linux. The current PyPI release is 2.5.4 (November 2024). On Debian-based systems, including Raspberry Pi OS, note the case mismatch: the pip package is gTTS, while the apt package is python3-gtts. If pip install fails with an externally-managed-environment error on a recent OS, install into a virtual environment instead.

Basic Usage

The minimum viable example:

python

from gtts import gTTS

tts = gTTS("Hello, world.")
tts.save("hello.mp3")

From the Command Line:

bash

gtts-cli "hello" --output hello.mp3

Choosing a Language and Accent

python

from gtts import gTTS

tts = gTTS("Bonjour le monde", lang="fr")
tts.save("bonjour.mp3")

gTTS also exposes regional accents through the tld parameter, which routes the request through a different Google Translate top-level domain. For example, tld="co.uk" gives a British English accent, and tld="ca" combined with lang="fr" gives a Canadian French accent.
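The lang and tld pairing can be wrapped in a small lookup so callers pick an accent by name. This is a minimal sketch: the ACCENTS table and the accent_kwargs and speak helpers are hypothetical names, and the two entries simply mirror the examples in the text above. Check the gTTS documentation before adding other pairs.

```python
# Hypothetical lookup from a friendly accent name to gTTS keyword arguments.
# The two entries mirror the examples given in the text above.
ACCENTS = {
    "en-GB": {"lang": "en", "tld": "co.uk"},  # British English
    "fr-CA": {"lang": "fr", "tld": "ca"},     # Canadian French
}


def accent_kwargs(accent: str) -> dict:
    """Return a copy of the gTTS keyword arguments for a named accent."""
    try:
        return dict(ACCENTS[accent])
    except KeyError:
        raise ValueError(f"unknown accent: {accent!r}") from None


def speak(text: str, accent: str, path: str) -> None:
    """Synthesize text with the chosen accent and save it as an MP3."""
    # Imported lazily so the lookup itself works without gTTS installed.
    from gtts import gTTS

    gTTS(text, **accent_kwargs(accent)).save(path)
```

Calling speak("Good morning", "en-GB", "morning.mp3") would then make one network request and write the file, provided gTTS is installed and the endpoint is reachable.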

Slow Mode

python

from gtts import gTTS

tts = gTTS("Read this slowly.", lang="en", slow=True)
tts.save("slow.mp3")

That's effectively the entire speech-controls surface area. There is no pitch parameter, no rate slider beyond slow=True, no per-voice selection, and no SSML.

Stream to a Buffer Instead of Disk

python

from io import BytesIO

from gtts import gTTS

buf = BytesIO()
gTTS("Stream me").write_to_fp(buf)
buf.seek(0)
# now feed buf into pydub, ffmpeg, a web response, etc.
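If the goal is a web response or an in-memory pipeline, it can help to wrap the buffer pattern in a function that returns raw bytes. The sketch below is one possible shape, not part of gTTS: synthesize_to_bytes and its tts_factory parameter are hypothetical, with tts_factory there only so the function can be exercised without network access.

```python
from io import BytesIO


def synthesize_to_bytes(text: str, lang: str = "en", tts_factory=None) -> bytes:
    """Return the MP3 bytes for text without touching the disk.

    tts_factory is a hypothetical hook for tests. By default the real
    gTTS class is used, which needs network access.
    """
    if tts_factory is None:
        # Imported lazily: only needed when doing real synthesis.
        from gtts import gTTS
        tts_factory = gTTS

    buf = BytesIO()
    tts_factory(text, lang=lang).write_to_fp(buf)
    return buf.getvalue()
```

The returned bytes can be sent as the body of an audio/mpeg response or handed to an audio library that accepts in-memory MP3 data.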

Pre-processing and Long Text

One of gTTS's better-engineered features is its tokenizer. It splits arbitrarily long input into chunks the backend will accept (the upstream endpoint caps each request at around 100 characters), breaks at natural pause points such as punctuation so that intonation survives the seams, and handles abbreviations, decimals, and other punctuation edge cases. You can also plug in custom pre-processors to fix recurring pronunciation problems, for example by mapping product names or acronyms to phonetic spellings.
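Both ideas in this section can be sketched in a few lines. The block below is an illustration under stated assumptions, not gTTS's own implementation: chunk_text is a deliberately simplified stand-in for the tokenizer that only packs words under a length cap, and SUBSTITUTIONS, expand_terms, and speak_with_substitutions are hypothetical names. The one library hook it relies on is the pre_processor_funcs argument of gTTS, which takes a list of functions that each accept and return a string. Passing it replaces the defaults, so the sketch re-adds the built-in pre-processors after the custom one.

```python
import re
from typing import List

# Hypothetical substitutions: spoken-friendly spellings for tricky terms.
SUBSTITUTIONS = {
    "gTTS": "G T T S",
    "PyPI": "pie pea eye",
}


def expand_terms(text: str) -> str:
    """Custom pre-processor: swap whole-word terms for phonetic spellings."""
    for term, spoken in SUBSTITUTIONS.items():
        text = re.sub(rf"\b{re.escape(term)}\b", spoken, text)
    return text


def chunk_text(text: str, max_chars: int = 100) -> List[str]:
    """Simplified illustration of length-capped chunking (not gTTS's tokenizer).

    Packs whitespace-separated words into chunks of at most max_chars.
    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def speak_with_substitutions(text: str, path: str) -> None:
    """Run the custom pre-processor ahead of gTTS's built-in ones."""
    # Imported lazily so the pure-Python helpers work without gTTS installed.
    from gtts import gTTS
    from gtts.tokenizer import pre_processors

    gTTS(
        text,
        pre_processor_funcs=[
            expand_terms,
            pre_processors.tone_marks,
            pre_processors.end_of_line,
            pre_processors.abbreviations,
            pre_processors.word_sub,
        ],
    ).save(path)
```

You do not need chunk_text in practice, since gTTS chunks input for you. It is here only to make the length cap concrete.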

What are the Pros of gTTS?

gTTS (Google Text-to-Speech) is popular among developers because it is lightweight, easy to implement, and integrates well into Python workflows. It can generate MP3 audio files and save output directly to files, file-like objects, or stdout, making it flexible for automation and scripting projects. With support for around 60 languages and multiple dialect variants through language and top-level domain settings, it provides broad multilingual coverage for simple applications. Developers also benefit from its command-line interface (gtts-cli), which works smoothly with shell scripts, along with customizable tokenizers and pre-processors for handling abbreviations, numbers, and text substitutions. Its minimal Python API makes it straightforward to add speech functionality to Jupyter notebooks, Flask apps, Discord bots, and other lightweight projects without a steep learning curve.

What are the Cons of gTTS?

Despite its simplicity, gTTS has notable limitations compared to modern AI voice platforms. The voices are based on standard Google Translate speech output, meaning they sound functional but lack the natural intonation, emotion, and realism of newer neural text to speech systems. Users cannot choose between multiple voice styles within a language, and there are no advanced controls such as SSML support, pitch adjustment, or precise speech rate customization. gTTS also requires downloading the full MP3 before playback rather than supporting real-time streaming, which can increase latency for interactive applications. Finally, because every request is a network call to Google's servers, gTTS cannot operate offline, making it less suitable for environments where reliability or low-latency speech generation is critical.

What are the Limitations of gTTS for Developers?

1. Rate limiting on an undocumented endpoint

This is the single biggest gotcha for anyone moving past "hello world." gTTS doesn't publish a usage quota because the upstream service doesn't either. In practice, a single IP can usually push a few tens of thousands of characters per hour before Google starts returning HTTP 429s, and the exact ceiling varies with traffic patterns. If your app generates audio for many users from one server, you will eventually hit those limits with no SLA to appeal to.
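A common way to soften rate limiting is to retry with exponential backoff. The sketch below is generic and makes no promise about where Google's ceiling sits: with_backoff and is_rate_limited are hypothetical helpers, and is_rate_limited assumes the raised exception exposes the HTTP response as an rsp attribute with a status_code, which is how gTTSError is understood to behave. Verify that against the gTTS version you have installed.

```python
import random
import time


def is_rate_limited(exc: Exception) -> bool:
    """Assumption: the exception carries the HTTP response as exc.rsp."""
    response = getattr(exc, "rsp", None)
    return getattr(response, "status_code", None) == 429


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry retryable failures with exponential backoff.

    is_retryable(exc) decides whether an exception is worth retrying.
    sleep is injectable so the delays can be observed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # base_delay * 1, 2, 4, ... plus up to one second of jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

With gTTS installed, a call might look like with_backoff(lambda: gTTS(text).save(path), is_rate_limited). Backoff only spreads requests out. It does not raise the limit, so a busy shared server can still exhaust it.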

2. The endpoint can change without warning

Because gTTS targets an internal Google Translate route rather than a versioned public API, Google can break gTTS overnight by changing request signatures or response shapes, and historically it has. The maintainer ships a fix, you run pip install -U gTTS, and life goes on. That's fine for a hobby script. It is not fine for a production deploy at 2 a.m.

3. Maintenance cadence

The project still ships releases, at least one in the past 12 months, but issue triage is slow and the bus factor is essentially one person. Some package-health trackers classify the repo as "inactive." For a free MIT-licensed library, that's normal; as a load-bearing dependency in a paid product, it's worth thinking about.

4. Commercial and TOS ambiguity

Because gTTS hits Google Translate's frontend instead of Google Cloud TTS, the licensing of the generated audio for commercial use is not clearly spelled out anywhere. The library itself is MIT-licensed; the audio bytes you receive are governed by Google's terms for a service that isn't formally exposed as a TTS API. If your legal team needs a clean answer, gTTS won't give them one.

5. Sensitive data leaves your machine

Every string you synthesize is sent to Google's servers. If you're voicing internal documents, customer PII, or content pulled out of Google Docs and other knowledge stores, that's a data-governance question worth answering before you ship.
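One partial mitigation is to mask obvious identifiers before any text leaves your process. The sketch below is an assumption-laden illustration, not a compliance tool: redact, EMAIL, and PHONE are hypothetical names, the patterns are deliberately loose (the phone pattern will also catch other long digit runs such as dates), and regex masking will miss names, addresses, and anything else it was not written for.

```python
import re

# Simple email pattern: local part, "@", domain, dot, top-level label.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Deliberately loose phone pattern: 7+ digits with optional separators.
PHONE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with speakable placeholders."""
    text = EMAIL.sub("redacted email", text)
    return PHONE.sub("redacted number", text)
```

Running redact on a string before passing it to gTTS keeps the masked values off the wire, but whether the remaining text is acceptable to send is still a policy decision.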

What is the Difference Between gTTS and Google Cloud Text-to-Speech?

While gTTS and Google Cloud Text-to-Speech are often confused, they are not the same product. The differences are as follows:

Feature	gTTS	Google Cloud TTS
Endpoint	Undocumented Google Translate route	Versioned, documented public API
Auth	None	Service account / API key
Cost	Free	Paid (per character)
Voices	One per language	Neural (WaveNet, Studio, Chirp)
SSML	No	Yes
SLA	None	Published SLA
Commercial use	Ambiguous	Explicitly licensed

If you need a Google voice in production, you almost certainly want Google Cloud TTS, not gTTS.

When Should You Upgrade to a Professional TTS API?

The right time to move from gTTS to a professional text to speech API depends on how critical audio quality, reliability, and customization are to your project. gTTS works well for prototypes, portfolio projects, personal accessibility tools, educational demos, and lightweight experiments because it is simple, free, and easy to implement. However, if you are launching a product for paying customers, relying on speech quality as part of your user experience, or need predictable latency backed by service-level agreements, a professional solution becomes more important. Upgrading also makes sense when you need advanced capabilities such as multiple voice options, voice cloning, SSML support, streaming audio, detailed control over pacing and pronunciation, or clear commercial licensing terms for legal and business requirements. As projects move from experimentation to production, these features often shift from being optional to essential.

Should You Choose gTTS or Speechify's API?

Speechify's text to speech API is an officially supported, paid service with neural voices, multiple voice options per language, SSML support, and commercial licensing baked into the contract, not a wrapper around an undocumented endpoint. If gTTS's rate limits, voice quality, or TOS ambiguity are starting to block you, that's the kind of migration path worth evaluating.

FAQ

Is gTTS free to use?

Yes, gTTS is a free, MIT-licensed Python library, but for commercial-grade, licensed audio you'll want a paid service like the Speechify API.

Does gTTS work offline?

No, gTTS requires an internet connection because it calls Google's servers, and the same is true of the Speechify API, which is a cloud service.

Can I use gTTS in a commercial product?

The licensing of gTTS output for commercial use is ambiguous since it relies on an undocumented Google endpoint, whereas the Speechify API provides explicit commercial licensing.

How do I change voices in gTTS?

You can't, really. gTTS gives you one voice per language, while the Speechify API offers a catalog of neural voices to choose from.

Does gTTS support SSML?

No, gTTS has no SSML support, no pitch control, and no fine-grained rate control, but the Speechify API supports SSML for full prosody control.

Why is gTTS returning HTTP 429 errors?

You've hit Google Translate's undocumented rate limit, which is a common reason developers migrate to a service with a real SLA like the Speechify API.

Is gTTS the same as Google Cloud Text-to-Speech?

No, gTTS wraps an unofficial Google Translate endpoint, while Google Cloud TTS is a separate paid product, and the Speechify API is another paid alternative with neural voices.

What's the best Python TTS library for production?

gTTS is fine for prototypes but not production; for production workloads most developers move to a paid API such as the Speechify API.

Can gTTS clone a voice?

No, voice cloning is not supported in gTTS, but it is available through the Speechify API.

How do I stream audio with gTTS?

gTTS doesn't support real-time streaming; it returns a completed MP3. For low-latency streaming, use the Speechify API instead.

Cliff Weitzman
