Deploying a Custom LLM on AWS
How to deploy a custom LLM in a serverless architecture
LLMs are getting better and increasingly useful. Everyone now seems to have a custom LLM for everything, which is absolutely ridiculous. Regardless, personalised LLMs (custom prompts, fine-tuned open-source models, transfer learning or feature extraction) are awesome.
Here is a simple guide to deploying a serverless LLM.
LLM Application: Tour Guide
Application: A personalised tour guide deployed as a Telegram bot.
To demonstrate, I’ve deployed a personalised tour guide that I can interact with via Telegram. When I’m travelling I find myself Googling the same initial questions before exploring a new area. I’m generally looking for:
- Pescatarian/Vegetarian-friendly local restaurants.
- History of the area.
- Fun facts about the culture.
- Work-friendly cafes.
- Trendy cafes/restaurants/streets & neighbourhoods.
- Iconic landmarks or attractions.
These are easy enough to find but require several different Google searches and saving locations on a map — why not get an LLM to do this for me?
To solve this I built TourGuide77Bot, a public Telegram bot that compiles a personalised travel agenda on demand. The bot is deployed publicly and you can interact with it freely via Telegram.
In this article, I will explain how I built TourGuide77Bot, as a demonstration of how to deploy custom LLMs.
Summary
Our build is pretty straightforward:
- The LLM: Google Gemini API: I used Google Gemini’s API with custom prompts as my LLM. This is the easiest way to get started but of course costs per query. If I find the project useful I’ll likely migrate to an open-source Hugging Face model.
- AWS Lambda + Docker: Docker is a great way to containerise the application (the code containing the queries etc.) and Lambda is a great way to host the application logic and run it on request.
- AWS API Gateway: API Gateway is used to trigger the application via webhooks, and provides a layer of security.
- DynamoDB: A DynamoDB table is used to monitor requests per user, enforce limits and prevent spam.
- Telegram Bot: We need a UI to interact with our bot. Telegram offers a comprehensive, free API, providing a world-class interface to interact with our bot.
Prerequisites
If you would like to follow along I suggest cloning the open-source repo:
git clone https://github.com/ZachWolpe/TourGuide77Bot.git
More detailed instructions are available in the GitHub README.md.
I’m using Python. You will need to:
- Set up an AWS account and configure the CLI.
- Set up a Gemini account and generate API keys.
- Set up a Telegram bot and configure API keys.
Store the resulting keys in a .env file:
GEMINI_API_KEY=<YOUR_GOOGLE_GEMINI_API_KEY>
BOT_API_TOKEN=<YOUR_TELEGRAM_BOT_API_TOKEN>
BOT_USERNAME=<YOUR_TELEGRAM_BOT_USERNAME>
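In Python these variables can be read with `os.environ`; a minimal sketch of a fail-fast config check (the `load_config` helper is mine, not from the repo):

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "BOT_API_TOKEN", "BOT_USERNAME")


def load_config() -> dict:
    """Read the required secrets from the environment, failing fast if any is missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing at startup is much easier to debug than a cryptic API error deep inside a Lambda invocation.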
Step 1. Building the LLM Application
I used Google Gemini’s API with custom prompts as my LLM. This project is only possible because LLMs can now return very specific information given the right prompt.
The codebase is structured as follows:
- lambda_function.py: application entry point. It can be used both on Lambda and locally for testing.
- lambda_function_for_polling.py: provides an alternative solution where polling is deployed.
- query_gemini.py: accesses the Gemini API with a custom prompt.
- telegram_*.py: the Telegram files handle interaction with the Telegram API.
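The query logic in query_gemini.py might look something like this. The prompt wording, function names and model name below are my assumptions, not the repo’s actual code, and the Gemini call requires the google-generativeai package and a valid API key:

```python
def build_tour_prompt(location: str) -> str:
    """Compose the custom prompt that turns a generic LLM into a tour guide.

    The exact wording is a guess at what query_gemini.py does.
    """
    return (
        f"You are a knowledgeable local tour guide for {location}. "
        "List: (1) pescatarian/vegetarian-friendly local restaurants, "
        "(2) a short history of the area, (3) fun cultural facts, "
        "(4) work-friendly cafes, (5) trendy streets and neighbourhoods, "
        "(6) iconic landmarks. Keep each section brief."
    )


def query_gemini(location: str, api_key: str) -> str:
    """Send the prompt to Gemini (needs network access and a real key)."""
    import google.generativeai as genai  # imported lazily so the rest is testable offline
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    response = model.generate_content(build_tour_prompt(location))
    return response.text
```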
Step 2. AWS Lambda + Docker
Docker & AWS ECR
Packaging our app as a Docker container is the easiest way to run it anywhere, living by the mantra “build once, debug everywhere”.
Make sure to use an official AWS Image base.
# An official AWS Lambda Python base image is required.
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.12
# copy requirements.txt
COPY build/requirements.txt ${LAMBDA_TASK_ROOT}
# install dependencies
RUN pip install -r requirements.txt
# copy the source code
COPY src/*.py ${LAMBDA_TASK_ROOT}
# specify lambda handler
CMD [ "lambda_function.lambda_handler" ]
Host your image on AWS ECR for quick Lambda integration.
```
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
```
Lambda
Create a lambda function with the AWS CLI.
aws lambda create-function \
--function-name <AWS-Lambda-function-name> \
--package-type Image \
--code ImageUri=your-account-id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag> \
--role your-role-arn \
--region your-region
Step 3. Rerun Build for Convenience
For convenience, you may wish to make changes and push these changes through the pipeline with a single bash script.
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.your-region.amazonaws.com
docker build -f build/dockerfile -t <Local-Docker-Image-Name>:<Tag> .
docker tag <Local-Docker-Image-Name>:<Tag> aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
docker push aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag>
aws lambda update-function-code --function-name <AWS-Lambda-function-name> --image-uri aws_account_id.dkr.ecr.your-region.amazonaws.com/<Image-name-on-ECR>:<Tag> --region your-region
Step 4. Configure AWS API Gateway
We want a service that launches a Lambda job whenever we get a message. The easiest way to achieve that is to spin up a server (EC2 instance) and poll for requests. I’ve implemented this in lambda_function_for_polling.py.
A more elegant solution is to use webhooks to query lambda via an API — launching an inference call whenever a message is posted to Telegram.
Set up an HTTP or REST API on AWS API Gateway. Add the API as a Lambda trigger:
Create a Webhook API and connect it to your Lambda function:
1. Go to the AWS Management Console and navigate to API Gateway.
2. Click "Create API" and choose "HTTP API".
3. Under "Integrations", select "Add integration" and choose:
- Integration type: Lambda
- Lambda function: Select your Lambda function (telegram-bot-lambda)
4. Under "Configure routes":
- Method: POST
- Resource path: `/telegram`
- Click "Next"
5. Review and create:
- API name: TelegramBotWebhookAPI (or your preferred name)
- Click "Create"
6. After creation, go to the "Stages" section:
- You should see a default stage (often named $default)
- Note the "Invoke URL" - this is your API endpoint
7. Go to your Lambda function in the AWS Lambda console:
- Under "Configuration", click on "Permissions"
- Scroll down to "Resource-based policy"
- You should see a policy allowing API Gateway to invoke your function
You should now have an endpoint route:
https://<API-ID>.execute-api.<REGION>.amazonaws.com/<stage>/telegram
Create a stage and deploy the API.
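Behind that route, the Lambda handler receives each Telegram update as the POST body. A minimal sketch of the parsing step (the field names follow Telegram’s Update object; the LLM query and reply logic are omitted):

```python
import json


def lambda_handler(event, context):
    """Parse a Telegram webhook update forwarded by API Gateway."""
    update = json.loads(event.get("body") or "{}")
    message = update.get("message", {})
    chat_id = message.get("chat", {}).get("id")
    text = message.get("text", "")
    if chat_id is not None and text.startswith("/tour"):
        location = text.removeprefix("/tour").strip()
        # ...pass `location` to the LLM and send the reply to `chat_id`
        # via the Telegram API...
    # Always return 200, otherwise Telegram keeps retrying the webhook.
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```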
Setup Webhooks
Set the webhook URL to the API Gateway URL:
curl -X POST "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/setWebhook?url=https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/telegram"
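Equivalently, the webhook can be registered from Python using only the standard library. The URL layout follows Telegram’s Bot API; the helper names are mine:

```python
import urllib.parse
import urllib.request


def build_set_webhook_url(bot_token: str, webhook_url: str) -> str:
    """Compose the Telegram setWebhook call for a given bot token and endpoint."""
    query = urllib.parse.urlencode({"url": webhook_url})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"


def register_webhook(bot_token: str, webhook_url: str) -> bytes:
    """POST the setWebhook request (requires network access and a real token)."""
    req = urllib.request.Request(build_set_webhook_url(bot_token, webhook_url), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```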
Step 5. Set API traffic limits: DynamoDB
We’re using paid services here, so you might want to set up traffic limits. Set up a DynamoDB table to track requests made to the Lambda API.
This is implemented in the application layer so that it can be managed in Lambda directly. Another option is to implement traffic throttling at the API Gateway level; however, we’re working with an HTTP API, so the application-level check was the easiest to configure on AWS. There is an obvious drawback: the check only runs after the Lambda instance has already launched.
Our lambda_function.py
function contains the logic to:
- Connect to DynamoDB.
- Write the request/user name to the database.
- Check if the user does not exceed request limits.
- Terminate the request if the limit is exceeded.
from datetime import datetime, timedelta, timezone
from botocore.exceptions import ClientError
from dotenv import load_dotenv
import logging
import boto3
import os

# load env --------------------------------------------------------------------------------------------->>
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
load_dotenv()
DYNAMODB_TABLE: str = os.getenv("RATE_LIMIT_TABLE")
# load env --------------------------------------------------------------------------------------------->>

# Initialise DynamoDB client --------------------------------------------------------------------------->>
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(DYNAMODB_TABLE)
RATE_LIMIT = 10   # requests
TIME_WINDOW = 60  # seconds
# Initialise DynamoDB client --------------------------------------------------------------------------->>


def check_rate_limit(user_id):
    now = datetime.now(timezone.utc)
    start_time = now - timedelta(seconds=TIME_WINDOW)
    try:
        # Atomically increment the request count; the condition rejects the
        # update when the user is still inside the window and over the limit.
        table.update_item(
            Key={'user_id': str(user_id)},
            UpdateExpression='SET request_count = if_not_exists(request_count, :zero) + :inc, last_request = :now',
            ExpressionAttributeValues={
                ':inc': 1,
                ':now': now.isoformat(),
                ':zero': 0,
                ':window_start': start_time.isoformat(),
                ':limit': RATE_LIMIT
            },
            ConditionExpression='attribute_not_exists(last_request) OR last_request < :window_start OR request_count < :limit',
            ReturnValues='UPDATED_NEW'
        )
        return True
    except ClientError as e:
        # A failed condition check means the rate limit was exceeded.
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False
        raise
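To exercise the same logic locally without a DynamoDB table, the fixed-window counter can be sketched in memory (helper names are mine, not from the repo):

```python
import time

RATE_LIMIT = 10   # requests
TIME_WINDOW = 60  # seconds

_requests: dict = {}  # user_id -> (window_start, count)


def check_rate_limit_local(user_id, now=None) -> bool:
    """Return True if the user is still under the limit for the current window."""
    now = time.time() if now is None else now
    window_start, count = _requests.get(user_id, (now, 0))
    if now - window_start >= TIME_WINDOW:
        window_start, count = now, 0  # new window: reset the counter
    if count >= RATE_LIMIT:
        return False
    _requests[user_id] = (window_start, count + 1)
    return True
```

Unlike the DynamoDB version, this state is lost on every cold start, which is exactly why a persistent table is needed in Lambda.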
Final Step. Get Travel Recommendations!
Our bot is now deployed on Telegram. Search for TourGuide77Bot on Telegram and use the prompt /tour <location> to get a list of travel recommendations.
Here is an example output:
Repository
Clone or visit the code.
git clone https://github.com/ZachWolpe/TourGuide77Bot.git
More detailed instructions are available in the GitHub README.md.
Connect
The project is open source and I will develop more interesting features. If you’d like to offer feedback or contribute, contact me at:
email: zachcolinwolpe@gmail.com