MLAGA — Multi-Lingual Audiobook GenerAtor
Demonstrating the ease & utility of modern AI APIs
MLAGA is an open source, multilingual audiobook generator, built to demonstrate the power & ease of integrating AI APIs into your products.
MLAGA converts YouTube videos to multilingual audiobooks. Learn a language on the go while still consuming the content you love.
ML APIs are playing an increasingly large role in deploying ML-powered products. There are a few clear reasons for this:
- “Big tech” has access to far more ML resources (compute, engineers & data), which makes it impractical for smaller companies to compete in building expensive General Purpose Large Models (GPLMs). Scale is a prerequisite for building these types of models.
- Many ML architectures are built on the cloud (AWS, GCP, Azure). Since the same companies build GPLMs, their model APIs are designed to integrate seamlessly into cloud architectures.
- Model APIs are easy to access, which allows for much broader adoption: no ML expertise is required to get things running.
- ML APIs actually work. Just a few years ago, most general-purpose models were fragile & impractical, and proved unreliable in production. The quality of these models has improved dramatically since then.
Everybody Benefits
It follows that a large part of the ecosystem relies on ML APIs.
- Non-ML products/teams can add ML functionality by integrating the relevant SDKs.
- ML products/teams can use GPLMs either to outsource (a component of) a model or as a starting point (transfer learning, etc.) when building a custom model.
To demonstrate the power & ease of use of these APIs, I built MLAGA.
Getting Started
Configure your AWS account according to the instructions under “Getting Started” in the MLAGA repo.
Stage 1. YouTube to Audio (S3): run locally using the built-in CLI. Usage:
exe: '.FLAC' # -e: The desired audio file extension.
yt_url: 'youtube url' # -y: YouTube video URL.
bucket_name: 's3-bucket' # -b: S3 bucket in which to store the audio file.
path: './temp_store' # -p: Path to (temporarily) save the audio file.
cached_audio: # -c: If set, use the file that is already downloaded in path.
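To make the option list concrete, here is a minimal sketch of how such a CLI could be declared with Python's argparse. The short flags and defaults come from the list above. The long option names (reusing the config keys), the choice of which options are required, and the sample URL and bucket name are assumptions for illustration; the actual MLAGA CLI may be defined differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the Stage 1 options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Stage 1: download YouTube audio and store it in S3."
    )
    # Long option names are assumed; only the short flags appear in the post.
    parser.add_argument("-e", "--exe", default=".FLAC",
                        help="The desired audio file extension.")
    parser.add_argument("-y", "--yt_url", required=True,
                        help="YouTube video URL.")
    parser.add_argument("-b", "--bucket_name", required=True,
                        help="S3 bucket in which to store the audio file.")
    parser.add_argument("-p", "--path", default="./temp_store",
                        help="Path to (temporarily) save the audio file.")
    # A boolean switch: present means "reuse the file already in --path".
    parser.add_argument("-c", "--cached_audio", action="store_true",
                        help="Use the file that is already downloaded in path.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The URL and bucket name are placeholders.
args = build_parser().parse_args(
    ["-y", "https://www.youtube.com/watch?v=VIDEO_ID", "-b", "my-bucket", "-p", "./input"]
)
print(args.exe, args.bucket_name, args.path, args.cached_audio)
```

Modelling -c as a switch rather than a valued option matches the "if used" wording in the option list.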
Stage 2 (Transcribe Audio) and Stage 3 (Translate & TTS) are triggered automatically if Stage 1 succeeds.
The (original & translated) audio files will be available in an S3 bucket.
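For a feel of what the Translate & TTS stage might involve, here is a rough sketch. It assumes the stage is backed by Amazon Translate and Amazon Polly through boto3, which this post does not state, so treat it as one possible shape and not as MLAGA's actual code. The function name, its parameters, and the default voice are hypothetical. The clients are passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def translate_and_synthesize(
    text: str,
    translate_client: Any,
    polly_client: Any,
    source_lang: str = "en",
    target_lang: str = "fr",
    voice_id: str = "Celine",  # assumed French Polly voice
) -> Dict[str, Any]:
    """Translate `text`, then synthesize speech for the translated text.

    Hypothetical helper: the clients are expected to behave like
    boto3.client("translate") and boto3.client("polly").
    """
    # Amazon Translate returns the translated string under "TranslatedText".
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    # Amazon Polly returns a streaming body under "AudioStream".
    speech = polly_client.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return {"text": translated, "audio": speech["AudioStream"].read()}


# Usage with real clients (requires boto3 and configured AWS credentials):
#   import boto3
#   result = translate_and_synthesize(
#       "Hello, world.", boto3.client("translate"), boto3.client("polly")
#   )
```

Both services limit how much text a single request can carry, so a full transcript would likely need to be split into chunks before calling a helper like this.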
Example Usage
Given a YouTube video URL:
Simply run:
clear; python \
-y "" \
-e ".FLAC" \
-b $[S3/BUCKET-1] \
-p "./input"
The pipeline will be triggered automatically. On completion, four files will be written to $[S3/BUCKET-3]
(example output available here):
- English text transcription.
- French text transcription.
- English audio (speech synthesis).
- French audio (speech synthesis).
Voilà ! Vous disposez d’un générateur de livres audio multilingues ! (There you go: you now have a multilingual audiobook generator!) 🎉🎉🎉
The full implementation is available here.