Generate professional, studio-quality podcasts from text scripts using high-performance AI voice models.
- About The Project
- Key Features
- Example Output
- Getting Started
- Usage
- Project Structure
- Contributing
- License
- Contact
Podcast AI is an open-source tool that automates audio content creation. It uses Piper TTS, a fast, high-quality neural text-to-speech system, together with efficient ONNX voice models to turn plain-text scripts into ready-to-publish podcasts.
Whether you're a content creator looking to streamline your workflow or a developer interested in AI-powered audio generation, this project provides a solid foundation.
- High-Fidelity Speech Synthesis: Generates natural-sounding speech from text using state-of-the-art models.
- Fast & Efficient: Built on Piper TTS and ONNX Runtime for rapid, local inference without relying on cloud APIs.
- Customizable Voices: Easily switch between different voice models by providing the corresponding .onnx and .json files.
- Audio Assembly: Seamlessly combines generated speech segments, intro/outro music, and sound effects.
- Cross-Platform: Runs on any system with Python, including Windows, macOS, and Linux.
- Fully Open-Source: Free to use, modify, and distribute under the MIT License.
Listen to a sample podcast generated with this tool:
Listen to final_podcast.mp3
The audio was generated from a simple script like this:
(intro_music)
Welcome to AI Spotlight, the show where we explore the latest breakthroughs in artificial intelligence.
(transition_sound)
Today, we're discussing generative models. These models can create brand new content, from text and images to music and even code. It's a revolution in creativity.
(outro_music)

Follow these steps to get the project running on your local machine.
- Python 3.8+
- pip and venv (usually included with Python)
- git for cloning the repository
- Clone the Repository

      git clone https://github.com/santhoshsharuk/podcast-ai.git
      cd podcast-ai

- Download Core Dependencies & Voice Models

  Large files like the Piper engine and voice models are hosted on GitHub Releases to keep the repository lightweight.
  - Go to the Releases Page.
  - Download the latest piper.zip and the desired voice model (e.g., voice-en-us-amy-medium.zip).
  - Extract them into the project directory.

- Organize Your Files

  Create a voices directory and place your model files inside. Your project structure should look like this:

      .
      ├── assemble_podcast.py
      ├── script.txt
      ├── voices/
      │   └── en_US-amy-medium/
      │       ├── en_US-amy-medium.onnx
      │       └── en_US-amy-medium.onnx.json
      └── piper/
          ├── piper
          └── ... (other piper files)

- Create a Virtual Environment & Install Requirements

  It's best practice to use a virtual environment to manage dependencies.

      # Create and activate the virtual environment
      python -m venv venv
      source venv/bin/activate  # On Windows, use: venv\Scripts\activate
      # Install the required Python packages
      pip install -r requirements.txt

You are now ready to generate your first podcast!
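Before the first run, it can help to confirm that the voices directory is laid out as described. The following is a minimal sketch, not part of the project itself: the function name find_voice_models is hypothetical, and it assumes each model is a .onnx file with a matching .onnx.json config next to it, as in the layout above.

```python
from pathlib import Path


def find_voice_models(voices_dir="voices"):
    """Return (onnx_path, config_path) pairs for each complete voice model.

    Hypothetical helper: a model counts as complete only when
    <name>.onnx and <name>.onnx.json sit in the same subdirectory.
    """
    models = []
    root = Path(voices_dir)
    if not root.is_dir():
        return models
    # Sort so the result order is stable across platforms.
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for onnx_path in sorted(model_dir.glob("*.onnx")):
            config_path = onnx_path.with_name(onnx_path.name + ".json")
            if config_path.is_file():
                models.append((onnx_path, config_path))
    return models


if __name__ == "__main__":
    found = find_voice_models()
    if not found:
        print("No complete voice models found under voices/")
    for onnx_path, config_path in found:
        print(f"{onnx_path}  (config: {config_path.name})")
```

An empty result usually means the archive was extracted to a different folder or the .onnx.json file is missing.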
The main script assemble_podcast.py reads your script.txt, generates audio for each line, and combines them into a final file.
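To give a feel for the per-line synthesis step, here is a rough sketch of how one line of text could be sent to the Piper command-line tool from Python. How assemble_podcast.py actually calls Piper is not shown in this README, so treat the helper names (build_piper_command, synthesize_line) and the piper/piper binary path as assumptions. The sketch relies on Piper reading text from standard input and on its --model and --output_file options.

```python
import subprocess


def build_piper_command(model_path, output_wav, piper_bin="piper/piper"):
    """Build the argument list for one Piper invocation.

    piper_bin is an assumed location based on the project layout;
    on Windows the executable would typically be piper.exe.
    """
    return [piper_bin, "--model", str(model_path), "--output_file", str(output_wav)]


def synthesize_line(text, model_path, output_wav, piper_bin="piper/piper"):
    """Synthesize a single line of text to a WAV file with Piper."""
    command = build_piper_command(model_path, output_wav, piper_bin)
    # Piper reads the text to speak from standard input.
    subprocess.run(command, input=text.encode("utf-8"), check=True)


if __name__ == "__main__":
    # Only print the command here; running it requires the Piper binary.
    print(build_piper_command(
        "voices/en_US-amy-medium/en_US-amy-medium.onnx", "segment_001.wav"
    ))
```

Passing the command as a list rather than a shell string avoids quoting problems when the script text contains apostrophes or quotes.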
Simply run the script with Python:

    python assemble_podcast.py

This will:
- Read script.txt.
- Use the default voice model found in the voices/ directory.
- Output the final audio to final_podcast.mp3.
- Play a success sound upon completion.
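The sample script shown earlier mixes cue lines in parentheses, such as (intro_music), with lines of spoken text. One plausible way to separate the two when reading a script is sketched below. The exact rules used by assemble_podcast.py are not documented here, so the parse_script function and the assumption that a cue occupies a whole line are illustrative only.

```python
import re

# Assumption: a cue is a whole line of the form "(name)", as in the sample script.
CUE_PATTERN = re.compile(r"^\((\w+)\)$")


def parse_script(text):
    """Split script text into ("cue", name) and ("speech", sentence) items."""
    items = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        match = CUE_PATTERN.match(line)
        if match:
            items.append(("cue", match.group(1)))
        else:
            items.append(("speech", line))
    return items


if __name__ == "__main__":
    sample = "(intro_music)\nWelcome to AI Spotlight.\n(outro_music)"
    for kind, value in parse_script(sample):
        print(kind, value)
```

Each speech item would then go to the text-to-speech step, while each cue would be mapped to an audio file such as intro music or a transition sound.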
You can customize the behavior using command-line arguments for greater flexibility.

    python assemble_podcast.py \
      --script my_new_episode.txt \
      --voice voices/en_GB-alan-medium \
      --output episode_01.wav \
      --no-sound

Available Arguments:
| Argument | Shorthand | Description | Default |
|---|---|---|---|
| --script | -s | Path to the input script file. | script.txt |
| --voice | -v | Path to the voice model directory. | First directory in voices/ |
| --output | -o | Path for the final output audio file. | final_podcast.mp3 |
| --no-sound | | Disable the success sound upon completion. | N/A (flag) |
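For readers who want to extend the command-line interface, the arguments in the table map naturally onto Python's argparse module. The sketch below mirrors the table but is not copied from assemble_podcast.py; the build_parser name and the help strings are assumptions, and resolving the default voice to the first directory in voices/ is left to the caller.

```python
import argparse


def build_parser():
    """Build a parser that accepts the arguments listed in the table."""
    parser = argparse.ArgumentParser(
        description="Generate a podcast from a text script."
    )
    parser.add_argument("-s", "--script", default="script.txt",
                        help="Path to the input script file.")
    # None means "not given"; the caller would then pick the first
    # directory in voices/ as described in the table.
    parser.add_argument("-v", "--voice", default=None,
                        help="Path to the voice model directory.")
    parser.add_argument("-o", "--output", default="final_podcast.mp3",
                        help="Path for the final output audio file.")
    parser.add_argument("--no-sound", action="store_true",
                        help="Disable the success sound upon completion.")
    return parser


if __name__ == "__main__":
    # Parse an explicit example list instead of the real command line.
    args = build_parser().parse_args(
        ["--script", "my_new_episode.txt", "--no-sound"]
    )
    print(args.script, args.voice, args.output, args.no_sound)
```

Using store_true for --no-sound makes it a plain flag: it takes no value and is False unless present.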

    .
    ├── .gitignore              # Files to ignore for Git
    ├── assemble_podcast.py     # Main script to generate the podcast
    ├── LICENSE                 # Project license file
    ├── README.md               # You are here!
    ├── requirements.txt        # Python package dependencies
    ├── script.txt              # Default input text script
    ├── final_podcast.mp3       # Example output file
    ├── assets/                 # (Optional) For sounds like intros, outros
    │   └── success.wav
    ├── piper/                  # Piper TTS engine (from releases)
    └── voices/                 # Directory for voice models
        └── en_US-amy-medium/
            ├── en_US-amy-medium.onnx
            └── en_US-amy-medium.onnx.json

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Please open an issue first to discuss any major changes you would like to make.
This project is licensed under the MIT License. See the LICENSE file for more details.
Santhosh Sharuk
Generated by Podcast AI - Where words find their voice.