Are you building AI chatbots or custom LLM workflows but struggling to measure and improve their performance? LangWatch is a dedicated platform designed to help developers and API teams monitor, evaluate, and optimize large language model (LLM) pipelines—making it easier than ever to fine-tune your AI applications for accuracy and reliability.
💡 Looking for a robust API testing platform that also generates beautiful API Documentation? Want your developer team to collaborate efficiently on an all-in-one platform for maximum productivity? Apidog combines these features and offers a more affordable alternative to Postman!
What Is LangWatch? Why It Matters for LLM Developers
LangWatch is built for teams who need to evaluate and monitor generative AI systems—especially those that go beyond standard, deterministic models. Unlike traditional ML models, which can be scored with fixed metrics like F1 or BLEU, LLM outputs are non-deterministic and usually require custom evaluation aligned with your data and business goals.
Key benefits for technical teams:
- Experiment & Optimize: Rapidly test different LLM pipelines and configurations.
- Real-Time Monitoring: Track how your LLM behaves in production.
- Custom Evaluation: Use datasets and evaluators tailored to your workflows.
- Flexible Integrations: Works with your data, models, and pipelines.
Whether you’re building chatbots, translation tools, or complex AI-driven APIs, LangWatch helps ensure your LLMs deliver consistent, high-quality results.
Step-by-Step Guide: Installing and Using LangWatch
Prerequisites
To follow this guide, you’ll need:
- Python 3.8+
- LangWatch Account (Sign up at app.langwatch.ai)
- OpenAI API Key (get one at platform.openai.com)
- Code Editor (VS Code, PyCharm, etc.)
- Git & Docker (optional, for local setup)
1. Sign Up and Get Your LangWatch API Key
- Visit app.langwatch.ai and sign up.
- A default project ("AI Bites") is created. Use it or create your own.
- Find your LANGWATCH_API_KEY in Project Settings—you’ll need this for integration.
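A quick way to avoid silent failures later is to check for the key at startup. A minimal sketch (the SDK reads LANGWATCH_API_KEY from the environment; set it with export LANGWATCH_API_KEY="your-key"):

import os

# Fail fast if the LangWatch key is missing; the SDK picks it up
# from the LANGWATCH_API_KEY environment variable.
if not os.environ.get("LANGWATCH_API_KEY"):
    raise RuntimeError("LANGWATCH_API_KEY is not set")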

2. Create a Python Chatbot Project with LangWatch
Set up a simple chatbot and integrate LangWatch to track its messages.
a. Create Your Project Folder
mkdir langwatch-demo
cd langwatch-demo
b. Set Up a Virtual Environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
c. Install Dependencies
pip install langwatch chainlit openai
d. Build the Chatbot (app.py)
Paste the following code in app.py:
import os
import asyncio

import chainlit as cl
from openai import AsyncClient

# AsyncClient picks up OPENAI_API_KEY from the environment
openai_client = AsyncClient()
model_name = "gpt-4o-mini"

settings = {
    "temperature": 0.3,
    "max_tokens": 500,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}


@cl.on_chat_start
async def start():
    # Seed each session's history with the system prompt
    cl.user_session.set(
        "message_history",
        [
            {
                "role": "system",
                "content": "You are a helpful assistant that only replies in short tweet-like responses, using lots of emojis.",
            }
        ],
    )


async def answer_as(name: str):
    message_history = cl.user_session.get("message_history")
    msg = cl.Message(author=name, content="")

    # Stream the completion token by token into the Chainlit UI
    stream = await openai_client.chat.completions.create(
        model=model_name,
        messages=message_history + [{"role": "user", "content": f"speak as {name}"}],
        stream=True,
        **settings,
    )
    async for part in stream:
        if token := part.choices[0].delta.content or "":
            await msg.stream_token(token)

    message_history.append({"role": "assistant", "content": msg.content})
    await msg.send()


@cl.on_message
async def main(message: cl.Message):
    message_history = cl.user_session.get("message_history")
    message_history.append({"role": "user", "content": message.content})

    # gather() makes it easy to fan out to multiple personas; here we use one
    await asyncio.gather(answer_as("AI Bites"))
e. Set Your OpenAI API Key
export OPENAI_API_KEY="your-openai-api-key"
f. Run the Chatbot
chainlit run app.py
Open http://localhost:8000 to test your chatbot.

3. Integrate LangWatch Tracing
To track messages, decorate your chatbot handler with LangWatch:
Update app.py:
import langwatch
# ...other imports...

@cl.on_message
@langwatch.trace()
async def main(message: cl.Message):
    # ...rest of handler...
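Optionally, you can also attach the OpenAI client to each trace so the full request and response details are captured alongside your messages. LangWatch’s Python SDK documents an autotracking helper for this; a sketch of the handler with it wired in (verify autotrack_openai_calls against the SDK version you install):

@cl.on_message
@langwatch.trace()
async def main(message: cl.Message):
    # Attach the OpenAI client to the current trace so every completion
    # made inside this handler is captured with inputs, outputs, and timings.
    langwatch.get_current_trace().autotrack_openai_calls(openai_client)

    message_history = cl.user_session.get("message_history")
    message_history.append({"role": "user", "content": message.content})
    await asyncio.gather(answer_as("AI Bites"))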
Restart the app:
chainlit run app.py
Ask: “What’s the French word for today?”
Check your LangWatch dashboard under Messages—your input and the chatbot’s response (e.g., "Aujourd’hui! 🇫🇷😊") should appear.

4. Evaluate Your Chatbot with LangWatch Datasets
LangWatch lets you define datasets and evaluators to systematically test your LLM’s output.
a. Create a Dataset
- In LangWatch, go to Datasets → New Dataset
- Add questions and expected answers (e.g., “What’s the French word for today?” → “Aujourd’hui”)
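Conceptually, each dataset entry is just a question paired with an expected answer. A small illustrative sketch (the field names are examples only; match them to the columns you define in LangWatch):

# Illustrative entries; the column names are placeholders, not LangWatch's schema.
dataset = [
    {"question": "What's the French word for today?", "expected_answer": "Aujourd'hui"},
    {"question": "What's the French word for tomorrow?", "expected_answer": "Demain"},
]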
b. Add an Evaluator
- Go to Evaluators
- Drag the “LLM Answer Match” evaluator into your workspace
- Configure input and expected output fields to match your dataset (e.g., question/answer pairs)
- Optionally select different LLMs for evaluation
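Conceptually, an answer-match evaluator checks whether the model’s output agrees with the expected answer, either via normalized string comparison or by asking a judge LLM. A simplified sketch of the string-matching idea (not LangWatch’s actual implementation):

def normalize(text: str) -> str:
    # Lowercase and drop punctuation/emojis so "Aujourd'hui! 🇫🇷😊" matches "aujourd'hui"
    return "".join(ch for ch in text.lower() if ch.isalnum())

def answer_matches(output: str, expected: str) -> bool:
    # Pass if the normalized expected answer appears in the normalized output
    return normalize(expected) in normalize(output)

assert answer_matches("Aujourd'hui! 🇫🇷😊", "Aujourd'hui")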

c. Run the Evaluation Workflow
- Click “Run Workflow Until Here”
- Review results to see if chatbot output matches expectations


d. Evaluate Workflow on All Test Entries
- In the top navbar, click “Evaluate Workflow” → “Test Entries”
- Results appear after processing

5. Optimize Your LLM Workflow
Once you have evaluation results, use LangWatch to automatically optimize your chatbot’s prompts.
a. Run Optimization
- Click “Optimize” in the top navbar
- Choose “Prompt Only” to optimize just the prompt
- Wait for optimization to finish
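Behind the scenes, this step runs automated prompt optimization against your dataset and evaluator: candidate prompts are generated, scored, and the best performer is kept (LangWatch describes its Optimization Studio as built on DSPy-style optimizers).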

b. Review Improvements
- See updated results and suggestions for improved prompt or model settings

6. (Optional) Local LangWatch Setup with Docker
If you need to test with sensitive data, you can run LangWatch locally:
a. Clone the Repo
git clone https://github.com/langwatch/langwatch.git
cd langwatch
b. Set Up Environment
cp langwatch/.env.example langwatch/.env
c. Launch with Docker
docker compose up -d --wait --build
d. Access the Dashboard
Visit http://localhost:5560 and follow onboarding.
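To send traces from your chatbot to the local instance instead of LangWatch Cloud, point the SDK at it before importing. A sketch assuming the LANGWATCH_ENDPOINT environment variable documented by the SDK (verify against your installed version):

import os

# Point the SDK at the local Docker instance instead of LangWatch Cloud.
os.environ["LANGWATCH_ENDPOINT"] = "http://localhost:5560"
os.environ["LANGWATCH_API_KEY"] = "your-local-project-api-key"  # from the local dashboard

import langwatch  # imported after the env vars so the SDK picks them up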
Note: Local Docker setup is for testing only. For production, use LangWatch Cloud or Enterprise On-Prem.
Why LangWatch Stands Out for LLM Monitoring
LangWatch consolidates LLM monitoring, evaluation, and optimization into one developer-focused platform. Instead of juggling custom scripts and ad-hoc metrics, you get:
- Centralized tracking of every prompt and response
- Repeatable evaluations to benchmark model changes
- Prompt and pipeline optimization for higher accuracy
Integrating LangWatch with modern Python stacks (e.g., Chainlit, OpenAI SDK) is straightforward, making it easy to bring observability to your LLM projects.
If you’re working with APIs, a reliable evaluation tool like LangWatch complements platforms such as Apidog across the broader API lifecycle—ensuring both your endpoints and your AI logic are robust and production-ready.
Conclusion
LangWatch empowers API and AI developers to monitor, evaluate, and optimize LLM workflows with confidence. From setting up a Python chatbot to tracking, testing, and improving its performance, LangWatch makes LLM observability accessible and actionable. Try it today at app.langwatch.ai.
💡 For teams managing both APIs and AI workflows, Apidog streamlines your development process—offering beautiful documentation, advanced collaboration for maximum productivity, and a better value than Postman.