MLAGA — Multi-Lingual Audiobook GenerAtor

Demonstrating the ease & utility of modern AI APIs

Mar 30, 2023

MLAGA is an open source, multilingual audiobook generator, built to demonstrate the power & ease of integrating AI APIs into your products.
MLAGA converts YouTube videos to multilingual audiobooks. Learn a language on the go while still consuming the content you love.

ML APIs are playing an increasingly large role in deploying ML-powered products. There are a few clear reasons for this:

“Big tech” has access to far more ML resources (compute, engineers & data), which makes it nearly impossible for smaller companies to compete in building expensive General Purpose Large Models (GPLMs). Scale is a prerequisite for building these types of models.
Many ML architectures are built on the cloud (AWS, GCP, Azure). Since the same companies that run these clouds also build GPLMs, their model APIs are designed to integrate seamlessly into cloud architectures.
Easy access to model APIs allows for much broader adoption: no ML expertise is required to get things running.
ML APIs actually work. Just a few years ago most general-purpose models were fragile & impractical, proving unreliable in production. The quality of these models has improved dramatically since then.

Everybody Benefits

It follows that a large part of the ecosystem relies on ML APIs:

Non-ML products/teams can add ML functionality by integrating the relevant SDKs.
ML products/teams can use GPLMs either to outsource (a component of) a model or as a starting point (transfer learning etc.) when building a custom model.

To demonstrate the power & ease of use of these APIs, I built MLAGA.

Getting Started

Configure your AWS account according to the specification under “Getting Started” in the MLAGA repo README.md.

Stage 1. YouTube to Audio (S3): run locally using the built-in CLI. Usage:

arguments:
  exe:          '.FLAC'           # -e:   The desired audio file extension.
  yt_url:       'youtube url'     # -y:   YouTube video URL.
  bucket_name:  's3-bucket'       # -b:   S3 bucket in which to store the audio file.
  path:         './temp_store'    # -p:   Path to (temporarily) save the audio file.
  cached_audio:                   # -c:   If set, use the file that is already downloaded in path.
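
As a rough illustration of how the flags above could be wired up, here is a minimal `argparse` sketch. It is not taken from the MLAGA repo: the long option names, the choice of which flags are required, and the `build_parser` helper are assumptions made for this example. Only the short flags and the two defaults come from the usage listing.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented stage 1 flags."""
    parser = argparse.ArgumentParser(description="Stage 1: YouTube to audio in S3")
    # Defaults for -e and -p follow the usage listing; everything else is assumed.
    parser.add_argument("-e", "--exe", default=".FLAC",
                        help="Desired audio file extension.")
    parser.add_argument("-y", "--yt_url", required=True,
                        help="YouTube video URL.")
    parser.add_argument("-b", "--bucket_name", required=True,
                        help="S3 bucket in which to store the audio file.")
    parser.add_argument("-p", "--path", default="./temp_store",
                        help="Path to (temporarily) save the audio file.")
    parser.add_argument("-c", "--cached_audio", action="store_true",
                        help="Use the file that is already downloaded in --path.")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration only.
args = build_parser().parse_args(
    ["-y", "https://www.youtube.com/shorts/y3gMoSopy8I", "-b", "my-bucket"]
)
```

Treating `-c` as a boolean switch matches the listing, where `cached_audio` has no value of its own.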

Stage 2 (Transcribe Audio) & Stage 3 (Translate & TTS) are triggered automatically if stage 1 succeeds.

The (original & translated) audio files will be available in an S3 bucket.
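
To give a feel for how little code stages 2 and 3 might need, here is a hedged sketch. It assumes the pipeline uses Amazon Transcribe, Amazon Translate and Amazon Polly through boto3, which this post does not confirm. The function names, the job name, the language codes and the `Lea` voice are illustrative choices, not MLAGA's actual settings. The clients are passed in, so they would be created elsewhere with something like `boto3.client("translate")`.

```python
from typing import Any, Tuple


def start_transcription(transcribe_client: Any, job_name: str,
                        audio_s3_uri: str, output_bucket: str) -> None:
    """Stage 2 (assumed): ask Amazon Transcribe to transcribe a FLAC file in S3."""
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": audio_s3_uri},
        MediaFormat="flac",
        LanguageCode="en-US",            # assumed source language
        OutputBucketName=output_bucket,  # transcript JSON is written here
    )


def translate_and_speak(translate_client: Any, polly_client: Any, text: str,
                        source_lang: str = "en", target_lang: str = "fr",
                        voice_id: str = "Lea") -> Tuple[str, bytes]:
    """Stage 3 (assumed): translate the transcript, then synthesize speech."""
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]
    speech = polly_client.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,  # a French voice, chosen for illustration
    )
    # AudioStream is a file-like object; read() returns the audio bytes.
    return translated, speech["AudioStream"].read()
```

Passing the clients as parameters keeps the functions easy to exercise with stand-ins, without touching a real AWS account.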

Example Usage

Given a YouTube video URL: https://youtube.com/shorts/y3gMoSopy8I

Simply run:

clear; python stage_01.py \
    -y "https://www.youtube.com/shorts/y3gMoSopy8I" \
    -e ".FLAC" \
    -b $[S3/BUCKET-1] \
    -p "./input"
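
What a run like this might do internally can be sketched as follows. This is an assumption-laden outline, not the contents of `stage_01.py`: the `run_stage_one` function, the injected `download_audio` callable and the `audio` file name are all hypothetical. The only real API relied on is the boto3 S3 client's `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path
from typing import Any, Callable


def run_stage_one(s3_client: Any,
                  download_audio: Callable[[str, Path], None],
                  yt_url: str, bucket_name: str, path: str,
                  exe: str = ".FLAC", cached_audio: bool = False) -> str:
    """Download (or reuse) the audio file, upload it to S3, return the S3 key."""
    local_file = Path(path) / f"audio{exe.lower()}"  # hypothetical file name

    # With cached_audio set, reuse an existing file; otherwise download it.
    # If the cached file is missing, fall back to downloading.
    if not (cached_audio and local_file.exists()):
        local_file.parent.mkdir(parents=True, exist_ok=True)
        download_audio(yt_url, local_file)

    key = local_file.name
    s3_client.upload_file(str(local_file), bucket_name, key)
    return key
```

The downloader is injected because this post does not say which YouTube library MLAGA uses.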

Output

The pipeline is triggered automatically. On completion, 4 files are written to $[S3/BUCKET-3] (example output available here):

English text transcription.
French text transcription.
English audio (speech synthesis).
French audio (speech synthesis).

Voilà! Vous disposez d’un générateur de livres audio multilingues! (There you go! You now have a multilingual audiobook generator!) 🎉🎉🎉

GitHub

The full implementation is available here.