Deploying a Custom LLM on AWS
How to deploy a custom LLM in a serverless architecture
LLMs are getting better and increasingly useful. Everyone now seems to have a custom LLM for everything, which is absolutely ridiculous. Regardless, personalised LLMs (custom prompts, fine-tuned open-source models, transfer learning or feature extraction) are awesome.
Here is a simple guide to deploying a serverless LLM.
LLM Application: Tour Guide
Application: A personalised tour guide deployed as a Telegram bot.
To demonstrate, I’ve deployed a personalised tour guide that I can interact with via Telegram. When I’m travelling I find myself Googling the same initial questions before exploring a new area. I’m generally looking for:
- Pescatarian/Vegetarian-friendly local restaurants.
- History of the area.
- Fun facts about the culture.
- Work-friendly cafes.
- Trendy cafes/restaurants/streets & neighbourhoods.
- Iconic landmarks or attractions.
These are easy enough to find but require several different Google searches and saving locations on a map — why not get an LLM to do this for me?
To solve this I built TourGuide77Bot, a public Telegram bot that compiles a personalised travel agenda on demand. The bot is deployed publicly and you can interact with it freely via Telegram.
In this article, I will explain how I built TourGuide77Bot, as a demonstration of how to deploy custom LLMs.
Summary
Our build is pretty straightforward:
- The LLM: Google Gemini API: I used Google Gemini’s API with custom prompts as my LLM. This is the easiest way to get started but of course costs per query. If I find the project useful I’ll likely migrate to an open-source Hugging Face model.
- AWS Lambda + Docker: Docker is a great way to containerise the application (the code containing the queries etc.) and Lambda is a great way to host the application logic and run it on request.
- AWS API Gateway: API Gateway is used to trigger the application via webhooks, and provides a layer of security.
- DynamoDB: A DynamoDB table is used to monitor requests per user, enforce limits and prevent spam.
- Telegram Bot: We need a UI to interact with our bot. Telegram offers a comprehensive, free API, providing a world-class interface to interact with our bot.
Prerequisites
If you would like to follow along I suggest cloning the open-source repo:
git clone https://github.com/ZachWolpe/TourGuide77Bot.git
More detailed instructions are available in the GitHub README.md.
I’m using Python. You will need to:
- Set up an AWS account and configure the CLI.
- Set up a Gemini account and generate API keys.
- Set up a Telegram bot and configure API keys.
Store the resulting keys in a .env file:
GEMINI_API_KEY=<YOUR_GOOGLE_GEMINI_API_KEY>
BOT_API_TOKEN=<YOUR_TELEGRAM_BOT_API_TOKEN>
BOT_USERNAME=<YOUR_TELEGRAM_BOT_USERNAME>
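In Python these variables can be read with `os.environ`; a minimal sketch of a fail-fast config check (the `load_config` helper is mine, not from the repo):

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "BOT_API_TOKEN", "BOT_USERNAME")


def load_config() -> dict:
    """Read the required secrets from the environment, failing fast if any is missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing at startup is much easier to debug than a cryptic API error deep inside a Lambda invocation.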
Step 1. Building the LLM Application
I used Google Gemini’s API with custom prompts as my LLM. This project is only possible because LLMs can now return very specific information given the right prompt.
The codebase is structured as follows:
- lambda_function.py: application entry point. It can be used both on Lambda and locally for testing.
- lambda_function_for_polling.py: provides an alternative solution where polling is deployed.
- query_gemini.py: accesses the Gemini API with a custom prompt.
- telegram_*.py: the Telegram files handle interaction with the Telegram API.
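The query logic in query_gemini.py might look something like this. The prompt wording, function names and model name below are my assumptions, not the repo’s actual code, and the Gemini call requires the google-generativeai package and a valid API key:

```python
def build_tour_prompt(location: str) -> str:
    """Compose the custom prompt that turns a generic LLM into a tour guide.

    The exact wording is a guess at what query_gemini.py does.
    """
    return (
        f"You are a knowledgeable local tour guide for {location}. "
        "List: (1) pescatarian/vegetarian-friendly local restaurants, "
        "(2) a short history of the area, (3) fun cultural facts, "
        "(4) work-friendly cafes, (5) trendy streets and neighbourhoods, "
        "(6) iconic landmarks. Keep each section brief."
    )


def query_gemini(location: str, api_key: str) -> str:
    """Send the prompt to Gemini (needs network access and a real key)."""
    import google.generativeai as genai  # imported lazily so the rest is testable offline
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    response = model.generate_content(build_tour_prompt(location))
    return response.text
```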
Step 2. AWS Lambda + Docker
Docker & AWS ECR
Packaging our app as a Docker container is the easiest way to run it anywhere, living by the mantra “build once, debug everywhere”.
Make sure to use an official AWS Image base.
# An official AWS Lambda Python base image is required.
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.12
# copy requirements.txt
COPY build/requirements.txt ${LAMBDA_TASK_ROOT}
# install dependencies
RUN pip install -r requirements.txt
# copy the source code
COPY src/*.py ${LAMBDA_TASK_ROOT}
# specify lambda handler
CMD [ "lambda_function.lambda_handler" ]
Host your image on AWS ECR for quick Lambda integration.
```
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
```
Lambda
Create a lambda function with the AWS CLI.
aws lambda create-function \
--function-name <AWS-Lambda-function-name> \
--package-type Image \
--code ImageUri=your-account-id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag> \
--role your-role-arn \
--region your-region
Step 3. Rerun Build for Convenience
For convenience, you may wish to make changes and push these changes through the pipeline with a single bash script.
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.your-region.amazonaws.com
docker build -f build/dockerfile -t <Local-Docker-Image-Name>:<Tag> .
docker tag <Local-Docker-Image-Name>:<Tag> aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
docker push aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
aws lambda update-function-code --function-name <AWS-Lambda-function-name> --image-uri aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag> --region your-region
Step 4. Configure AWS API Gateway
We want a service that launches a Lambda job whenever we get a message. The easiest way to achieve that is to spin up a server (EC2 instance) and poll for requests. I’ve implemented this in lambda_function_for_polling.py.
A more elegant solution is to use webhooks to query lambda via an API — launching an inference call whenever a message is posted to Telegram.
Set up an HTTP or REST API on AWS API Gateway. Add the API as a Lambda trigger:
Create a Webhook API and connect it to your Lambda function:
1. Go to the AWS Management Console and navigate to API Gateway.
2. Click "Create API" and choose "HTTP API".
3. Under "Integrations", select "Add integration" and choose:
- Integration type: Lambda
- Lambda function: Select your Lambda function (telegram-bot-lambda)
4. Under "Configure routes":
- Method: POST
- Resource path: `/telegram`
- Click "Next"
5. Review and create:
- API name: TelegramBotWebhookAPI (or your preferred name)
- Click "Create"
6. After creation, go to the "Stages" section:
- You should see a default stage (often named $default)
- Note the "Invoke URL" - this is your API endpoint
7. Go to your Lambda function in the AWS Lambda console:
- Under "Configuration", click on "Permissions"
- Scroll down to "Resource-based policy"
- You should see a policy allowing API Gateway to invoke your function
You should now have an endpoint route:
https://<API-ID>.execute-api.<REGION>.amazonaws.com/<stage>/telegram
Create a stage and deploy the API.
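Behind that route, the Lambda handler receives each Telegram update as the POST body. A minimal sketch of the parsing step (the field names follow Telegram’s Update object; the LLM query and reply logic are omitted):

```python
import json


def lambda_handler(event, context):
    """Parse a Telegram webhook update forwarded by API Gateway."""
    update = json.loads(event.get("body") or "{}")
    message = update.get("message", {})
    chat_id = message.get("chat", {}).get("id")
    text = message.get("text", "")
    if chat_id is not None and text.startswith("/tour"):
        location = text.removeprefix("/tour").strip()
        # ...pass `location` to the LLM and send the reply to `chat_id`
        # via the Telegram API...
    # Always return 200, otherwise Telegram keeps retrying the webhook.
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```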
Setup Webhooks
Set the webhook URL to the API Gateway URL:
curl -X POST "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/setWebhook?url=https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/telegram"
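Equivalently, the webhook can be registered from Python using only the standard library. The URL layout follows Telegram’s Bot API; the helper names are mine:

```python
import urllib.parse
import urllib.request


def build_set_webhook_url(bot_token: str, webhook_url: str) -> str:
    """Compose the Telegram setWebhook call for a given bot token and endpoint."""
    query = urllib.parse.urlencode({"url": webhook_url})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"


def register_webhook(bot_token: str, webhook_url: str) -> bytes:
    """POST the setWebhook request (requires network access and a real token)."""
    req = urllib.request.Request(build_set_webhook_url(bot_token, webhook_url), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```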
Step 5. Set API traffic limits: DynamoDB
We’re using paid services here, so you might want to set up traffic limits. Set up a DynamoDB table to track requests made to the Lambda API.
This is implemented in the application layer so that it can be managed in Lambda directly. Another option is to implement traffic throttling at the API Gateway level; however, we’re working with an HTTP API, so the application-level check was the easiest to configure on AWS. There is an obvious drawback: the check only runs after the Lambda instance has already launched.
Our lambda_function.py
function contains the logic to:
- Connect to DynamoDB.
- Write the request/user name to the database.
- Check if the user does not exceed request limits.
- Terminate the request if the limit is exceeded.
from datetime import datetime, timedelta, timezone
from botocore.exceptions import ClientError
from dotenv import load_dotenv
import logging
import boto3
import os

# load env --------------------------------------------------------------------------------------------->>
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
load_dotenv()
DYNAMODB_TABLE: str = os.getenv("RATE_LIMIT_TABLE")
# load env --------------------------------------------------------------------------------------------->>

# Initialise DynamoDB client --------------------------------------------------------------------------->>
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(DYNAMODB_TABLE)
RATE_LIMIT = 10   # requests
TIME_WINDOW = 60  # seconds
# Initialise DynamoDB client --------------------------------------------------------------------------->>


def check_rate_limit(user_id):
    now = datetime.now(timezone.utc)
    start_time = now - timedelta(seconds=TIME_WINDOW)
    try:
        # Atomically increment the request count; the condition rejects the
        # update when the user is still inside the window and over the limit.
        table.update_item(
            Key={'user_id': str(user_id)},
            UpdateExpression='SET request_count = if_not_exists(request_count, :zero) + :inc, last_request = :now',
            ExpressionAttributeValues={
                ':inc': 1,
                ':now': now.isoformat(),
                ':zero': 0,
                ':window_start': start_time.isoformat(),
                ':limit': RATE_LIMIT
            },
            ConditionExpression='attribute_not_exists(last_request) OR last_request < :window_start OR request_count < :limit',
            ReturnValues='UPDATED_NEW'
        )
        return True
    except ClientError as e:
        # A failed condition check means the rate limit was exceeded.
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False
        raise
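To exercise the same logic locally without a DynamoDB table, the fixed-window counter can be sketched in memory (helper names are mine, not from the repo):

```python
import time

RATE_LIMIT = 10   # requests
TIME_WINDOW = 60  # seconds

_requests: dict = {}  # user_id -> (window_start, count)


def check_rate_limit_local(user_id, now=None) -> bool:
    """Return True if the user is still under the limit for the current window."""
    now = time.time() if now is None else now
    window_start, count = _requests.get(user_id, (now, 0))
    if now - window_start >= TIME_WINDOW:
        window_start, count = now, 0  # new window: reset the counter
    if count >= RATE_LIMIT:
        return False
    _requests[user_id] = (window_start, count + 1)
    return True
```

Unlike the DynamoDB version, this state is lost on every cold start, which is exactly why a persistent table is needed in Lambda.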
Final Step. Get Travel Recommendations!
Our bot is now deployed on Telegram. Search for TourGuide77Bot on Telegram and use the prompt /tour <location> to get a list of travel recommendations.
Here is an example output:
Repository
Clone or visit the code.
git clone https://github.com/ZachWolpe/TourGuide77Bot.git
More detailed instructions are available in the GitHub README.md.
Connect
The project is open source and I will develop more interesting features. If you’d like to offer feedback or contribute, contact me at:
email: zachcolinwolpe@gmail.com