From Agno CLI Script to Interactive Streamlit UI in Minutes
Transform your Agno CLI scripts into interactive Streamlit prototypes with streaming chat interfaces.
I recently published a tutorial on transforming Agno CLI scripts into production-ready FastAPI endpoints, and the response was encouraging. Several developers reached out with positive feedback and requested more content like this.
One question kept coming up: "This is great for APIs, but what about creating quick prototypes and demos? How do I turn my Agno agent into something non-technical people can actually use?"
That's exactly what we're tackling today. We're taking a more sophisticated Agno agent—one with knowledge bases and persistent storage—and transforming it into an interactive Streamlit web application that anyone can use.
Why Streamlit Changes Everything for AI Prototypes
While FastAPI excels at production APIs, Streamlit dominates the prototyping and demo space for Python developers. And for good reasons: it requires zero frontend knowledge, automatically handles state management, includes beautiful built-in components, and deploys anywhere with minimal configuration.
For AI applications specifically, Streamlit's chat interface, real-time streaming, and session management make it perfect for showcasing agent capabilities to stakeholders, clients, or team members who need to see the AI in action, not just read about it.
What We're Building: A Knowledge-Powered Assistant
Today we're working with a Level 2 Agno agent—significantly more sophisticated than our previous stock price example. This agent includes:
Knowledge Base: Loads and searches through Agno documentation
Persistent Storage: Maintains conversation history across sessions
Vector Search: Uses hybrid search for intelligent knowledge retrieval
Memory: Remembers context from previous interactions
This represents the kind of AI assistant you'd actually deploy in a business environment.
The Starting Point: A Powerful CLI Agent
Here's our starting script—a knowledge-powered assistant that can answer questions about Agno by searching through the official documentation:
from agno.agent import Agent
from agno.embedder.openai import OpenAIEmbedder
from agno.knowledge.url import UrlKnowledge
from agno.models.openai import OpenAIChat
from agno.storage.sqlite import SqliteStorage
from agno.vectordb.lancedb import LanceDb, SearchType
from dotenv import load_dotenv

load_dotenv()

# Load Agno documentation in a knowledge base
knowledge = UrlKnowledge(
    urls=["https://docs.agno.com/introduction.md"],
    vector_db=LanceDb(
        uri="tmp/lancedb",
        table_name="agno_docs",
        search_type=SearchType.hybrid,
        # Use OpenAI for embeddings
        embedder=OpenAIEmbedder(id="text-embedding-3-small", dimensions=1536),
    ),
)

# Store agent sessions in a SQLite database
storage = SqliteStorage(table_name="agent_sessions", db_file="tmp/agent.db")

agent = Agent(
    name="Agno Assist",
    model=OpenAIChat(id="gpt-4o-mini"),
    instructions=[
        "Search your knowledge before answering the question.",
        "Only include the output in your response. No other text.",
    ],
    knowledge=knowledge,
    storage=storage,
    add_datetime_to_instructions=True,
    # Add the chat history to the messages
    add_history_to_messages=True,
    # Number of history runs
    num_history_runs=3,
    markdown=True,
)

if __name__ == "__main__":
    # Load the knowledge base, comment out after first run
    # Set recreate to True to recreate the knowledge base if needed
    agent.knowledge.load(recreate=False)
    agent.print_response("What is Agno?", stream=True)
This agent works beautifully in the terminal, but it's locked away from anyone who doesn't want to run Python scripts. Let's fix that.
Step 1: Import Streamlit and Set Up the Basic Structure
First, we add Streamlit to our imports and create the foundation of our web app:
import streamlit as st

st.title("📊 Level 2: Agent with knowledge and storage")

# Initialize session state for messages
if "messages" not in st.session_state:
    st.session_state.messages = []
st.session_state is Streamlit's way of maintaining data between user interactions. We'll use it to store our chat history.
Step 2: Add Resource Caching to Prevent Recreation
This step is crucial for performance. Without caching, Streamlit would recreate your agent, reload the knowledge base, and reinitialize storage on every user interaction:
@st.cache_resource
def initialize_components():
    knowledge = UrlKnowledge(
        urls=["https://docs.agno.com/introduction.md"],
        vector_db=LanceDb(
            uri="tmp/lancedb",
            table_name="agno_docs",
            search_type=SearchType.hybrid,
            embedder=OpenAIEmbedder(id="text-embedding-3-small", dimensions=1536),
        ),
    )
    knowledge.load(recreate=False)

    storage = SqliteStorage(table_name="agent_sessions", db_file="tmp/agent.db")

    agent = Agent(
        name="Agno Assist",
        model=OpenAIChat(id="gpt-4o-mini"),
        instructions=[
            "Search your knowledge before answering the question.",
            "Only include the output in your response. No other text.",
        ],
        knowledge=knowledge,
        storage=storage,
        add_datetime_to_instructions=True,
        add_history_to_messages=True,
        num_history_runs=3,
        markdown=True,
    )
    return knowledge, storage, agent

# Get agent instance
knowledge, storage, agent = initialize_components()
The @st.cache_resource decorator ensures these expensive operations happen only once, not on every user interaction.
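One caveat: a cached resource lives until you clear it, so a change to the knowledge base URLs won't be picked up automatically. Functions decorated with @st.cache_resource expose a .clear() method, which you could wire to a button. This is a minimal sketch of our own, not part of the tutorial app:

# Hypothetical "reload" control: clearing the cache forces
# initialize_components() to run again on the next rerun.
if st.sidebar.button("Reload knowledge base"):
    initialize_components.clear()
    st.rerun()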
Step 3: Create the Chat Interface
Now we build the interactive chat interface using Streamlit's built-in chat components:
# Display chat message history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Handle user input
if prompt := st.chat_input("What do you want to learn about Agno?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Display user message in chat
    with st.chat_message("user"):
        st.markdown(prompt)
This creates a chat interface that feels familiar to anyone who's used ChatGPT or similar tools.
Step 4: Implement Streaming Responses (The Magic)
Here's where things get really interesting. Instead of waiting for the complete response, we'll stream the agent's output in real time. Note that this block continues inside the `if prompt := ...` branch from Step 3:
# Process and display agent response
with st.chat_message("assistant"):
    with st.spinner("Thinking..."):
        # Create a placeholder for the streaming response
        message_placeholder = st.empty()
        full_response = ""

        # Get streaming response instead of waiting for complete response
        for chunk in agent.run(prompt, stream=True):
            if chunk.content:
                # Accumulate the response content
                full_response += chunk.content
                # Update the display with each chunk
                message_placeholder.markdown(full_response + "▌")

        # Final update without cursor
        message_placeholder.markdown(full_response)

# Add assistant response to chat history
st.session_state.messages.append(
    {"role": "assistant", "content": full_response}
)
This streaming implementation is crucial for AI applications. Users see the response building in real-time rather than staring at a loading spinner, creating a much more engaging experience.
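As an aside, Streamlit 1.31 and later also ship st.write_stream, which handles the accumulation and trailing cursor for you. Here's a minimal alternative sketch, assuming the same agent object and the same .content attribute on each chunk as above:

def agno_chunks(prompt: str):
    # Yield only the text content from the agent's streaming run
    for chunk in agent.run(prompt, stream=True):
        if chunk.content:
            yield chunk.content

with st.chat_message("assistant"):
    # st.write_stream renders the stream and returns the concatenated text
    full_response = st.write_stream(agno_chunks(prompt))

st.session_state.messages.append({"role": "assistant", "content": full_response})

Because st.write_stream returns the full text, you can store it in the history exactly as before.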
Step 5: Add User Experience Enhancements
Finally, we add quality-of-life features that make the app production-ready:
# "Clear Chat" button below the chat
if st.session_state.messages and st.button(
"Clear Chat", use_container_width=True, key="clear_chat"
):
# Only clear the messages, not all session state
st.session_state.messages = []
st.rerun()
# Optional: Add a sidebar for additional functionality
with st.sidebar:
st.header("🎨 Sidebar Demo")
st.write("This is a simple sidebar to check the color scheme.")
st.write("You can add various widgets and information here.")
The Complete Transformation
Here's the full Streamlit version of our CLI agent:
import streamlit as st
from agno.agent import Agent
from agno.embedder.openai import OpenAIEmbedder
from agno.knowledge.url import UrlKnowledge
from agno.models.openai import OpenAIChat
from agno.storage.sqlite import SqliteStorage
from agno.vectordb.lancedb import LanceDb, SearchType
from dotenv import load_dotenv

load_dotenv()

st.title("📊 Level 2: Agent with knowledge and storage")

# Initialize session state for messages
if "messages" not in st.session_state:
    st.session_state.messages = []

# Sidebar placeholder
with st.sidebar:
    st.header("🎨 Sidebar Demo")
    st.write("This is a simple sidebar to check the color scheme.")
    st.write("You can add various widgets and information here.")

# Initialize components with caching to prevent recreation on each rerun
@st.cache_resource
def initialize_components():
    knowledge = UrlKnowledge(
        urls=["https://docs.agno.com/introduction.md"],
        vector_db=LanceDb(
            uri="tmp/lancedb",
            table_name="agno_docs",
            search_type=SearchType.hybrid,
            embedder=OpenAIEmbedder(id="text-embedding-3-small", dimensions=1536),
        ),
    )
    knowledge.load(recreate=False)

    # Store agent sessions in a SQLite database
    storage = SqliteStorage(table_name="agent_sessions", db_file="tmp/agent.db")

    agent = Agent(
        name="Agno Assist",
        model=OpenAIChat(id="gpt-4o-mini"),
        instructions=[
            "Search your knowledge before answering the question.",
            "Only include the output in your response. No other text.",
        ],
        knowledge=knowledge,
        storage=storage,
        add_datetime_to_instructions=True,
        # Add the chat history to the messages
        add_history_to_messages=True,
        # Number of history runs
        num_history_runs=3,
        markdown=True,
    )
    return knowledge, storage, agent

# Get agent instance
knowledge, storage, agent = initialize_components()

# Display chat message history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Handle user input
if prompt := st.chat_input("What do you want to learn about Agno?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Display user message in chat
    with st.chat_message("user"):
        st.markdown(prompt)

    # Process and display agent response
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            # Create a placeholder for the streaming response
            message_placeholder = st.empty()
            full_response = ""

            # Get streaming response instead of waiting for complete response
            for chunk in agent.run(prompt, stream=True):
                if chunk.content:
                    # Accumulate the response content
                    full_response += chunk.content
                    # Update the display with each chunk
                    message_placeholder.markdown(full_response + "▌")

            # Final update without cursor
            message_placeholder.markdown(full_response)

    # Add assistant response to chat history
    st.session_state.messages.append(
        {"role": "assistant", "content": full_response}
    )

# "Clear Chat" button below the chat
if st.session_state.messages and st.button(
    "Clear Chat", use_container_width=True, key="clear_chat"
):
    # Only clear the messages, not all session state
    st.session_state.messages = []
    st.rerun()
Running Your New Streamlit App
Save the code to a file (e.g., agno_assistant_app.py). You'll need a virtual environment with the required dependencies:
pip install agno streamlit openai sqlalchemy lancedb tantivy pylance
Add your OpenAI API key to a .env file (OPENAI_API_KEY=your_key_here), then run:
streamlit run agno_assistant_app.py
Your interactive agent will be available at http://localhost:8501 with a responsive interface that works on both desktop and mobile.
Why This Streaming Implementation Matters
The streaming response capability we've implemented here is not just a nice-to-have—it's essential for professional AI applications. Here's why:
User Experience: Users see progress immediately instead of waiting for complete responses, which can take 10-30 seconds for complex queries.
Performance Perception: Streaming makes the application feel faster and more responsive, even when the total response time is the same.
Engagement: Users stay engaged throughout the response rather than switching tabs or getting distracted.
Professional Feel: The streaming interface matches the experience users expect from modern AI applications.
Key Patterns for Streamlit Success
Resource Caching: Always use @st.cache_resource for expensive operations like agent initialization and knowledge base loading.
Session State Management: Use st.session_state to maintain conversation history and user preferences across interactions.
Streaming Responses: Implement real-time streaming for all agent interactions to create a professional user experience.
Error Handling: Wrap agent calls in try/except blocks and display user-friendly error messages (see the sketch after this list).
Mobile Responsiveness: Streamlit's components are automatically mobile-responsive, but test your interface on different screen sizes.
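For the error-handling point, here's a minimal sketch of wrapping the streaming loop from Step 4 so a failed model call degrades into a friendly message instead of a stack trace. It assumes the same agent and prompt variables as above; the fallback wording is of course up to you:

with st.chat_message("assistant"):
    message_placeholder = st.empty()
    full_response = ""
    try:
        for chunk in agent.run(prompt, stream=True):
            if chunk.content:
                full_response += chunk.content
                message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    except Exception as exc:
        # Show a friendly message in the UI; log the details server-side
        message_placeholder.error("Sorry, something went wrong. Please try again.")
        print(f"Agent error: {exc}")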
Scaling This Pattern
This transformation pattern scales beautifully with any Agno agent complexity:
Multi-Agent Teams: Create tabs or pages for different specialized agents (see the sketch after this list)
Complex Workflows: Use Streamlit's form components and progress bars for multi-step processes
Advanced Knowledge Bases: Add file upload capabilities for users to extend the knowledge base
Rich Media: Leverage Streamlit's support for images, audio, and video in agent responses
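For instance, the multi-agent idea might look like the following sketch. The names research_agent and support_agent are hypothetical stand-ins for agents you'd build with cached factories along the lines of initialize_components above:

# Hypothetical sketch: one tab per specialized agent.
research_tab, support_tab = st.tabs(["🔎 Research", "🛟 Support"])

with research_tab:
    st.write("Ask the research agent anything.")
    # ...reuse the chat loop from above with research_agent...

with support_tab:
    st.write("Get help from the support agent.")
    # ...reuse the chat loop from above with support_agent...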
Production Considerations
For production deployment, consider:
Authentication: Add user authentication using Streamlit's built-in auth features
Configuration: Use Streamlit's secrets management for API keys and sensitive configuration (see the sketch after this list)
Monitoring: Implement logging and analytics to track usage patterns
Deployment: Use Streamlit Cloud, Docker, or cloud platforms for easy deployment (I’m a big fan of Render)
Performance: Monitor memory usage and implement session cleanup for long-running applications
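As one concrete example of the configuration point, here's a minimal sketch of reading the API key from Streamlit's secrets manager instead of a .env file. It assumes you've created .streamlit/secrets.toml containing OPENAI_API_KEY = "your_key_here" (the key name here is our choice, not a Streamlit convention):

import os
import streamlit as st

# Read the key from .streamlit/secrets.toml (or Streamlit Cloud's secrets UI)
# and expose it as an environment variable so the OpenAI client picks it up.
if "OPENAI_API_KEY" in st.secrets:
    os.environ["OPENAI_API_KEY"] = st.secrets["OPENAI_API_KEY"]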
The Power of Rapid Prototyping
What we've built here represents something powerful: the ability to take a sophisticated AI agent and make it accessible to anyone with a web browser in minutes, not hours or days.
This matters in business environments where you need to quickly demonstrate AI capabilities to stakeholders, gather feedback from users, or validate ideas before investing in full production development.
The same agent logic, the same business intelligence, now available through an interface that anyone can use.
Next Steps
Take any Agno agent from the cookbook, apply this Streamlit transformation pattern, and you'll have a working prototype in minutes. The gap between experimentation and demonstration just disappeared.
Whether you're building internal tools, client demos, or proof-of-concept applications, this pattern gives you a professional interface that scales with your needs.
Your AI agents are ready for their close-up.