☕ Building BrewBuddy: A Multi-Agent AI Coffee Shop Assistant

Combining the art of coffee with the science of AI to create the perfect digital barista experience

Live Demo

Check out the live demo 🚀 to experience BrewBuddy firsthand!

💡 Tip: You can open the link in a new tab by CTRL+click (Windows/Linux) or CMD+click (MacOS).

⏳ Note: The first response may take a few seconds due to the cold start of the serverless deployment.

📖 Introduction: Reimagining the Coffee Shop Experience

Picture this: It's a busy morning, and you're craving your perfect coffee. You walk into your favorite coffee shop where the barista greets you with a warm smile, recommends popular drinks based on what others love, and even suggests a pastry that pairs perfectly with your drink. This personalized touch is what makes a great coffee shop experience memorable.

BrewBuddy brings personalized, conversational coffee shop service to the digital world. Our AI assistant goes beyond a basic chatbot, offering intelligent recommendations and making ordering easy—all powered by a multi-agent architecture designed for natural, human-like interactions.

Our Coffee Shop Inspiration

We found our solution by looking at how real-world coffee shops operate. Think about your last visit to a coffee shop:

In a real coffee shop, you might interact with different staff members depending on your needs—a barista for coffee recommendations, a cashier for ordering, and a manager for store information. Each person specializes in their role, creating a seamless experience through their combined expertise. Why shouldn't a digital assistant work the same way?

This insight led us to develop BrewBuddy's multi-agent architecture, where specialized AI agents handle different aspects of the customer experience—each one an expert in its domain, just like the staff in a well-run coffee shop.

By dividing responsibilities among specialized agents, we could create a system that's more knowledgeable, more contextually aware, and more personalized than a single monolithic model could ever be.

🏗️ System Architecture: The Blueprint for Intelligence

🔍 Multi-Agent Architecture Diagram

Comprehensive view of BrewBuddy's multi-agent architecture and information flow

Before we dive into the details of each agent, let's take a bird's-eye view of the system architecture. The diagram above illustrates how our multi-agent system works together to create a seamless experience.

At its core, BrewBuddy follows a pipeline architecture where each user message flows through a series of specialized agents, with each one contributing its expertise to the final response. The system begins with security and intent classification, routes to specialized agents for domain-specific tasks, and coordinates everything through a central orchestration layer.

Let's explore how each component works in detail.

🧠 The Multi-Agent Architecture: Our Secret Ingredient

The heart of BrewBuddy is its sophisticated multi-agent system that divides responsibilities among specialized AI agents, each focusing on what it does best. This approach mimics how a real coffee shop operates—with specialists handling different aspects of the customer experience.

🛡️ Guard Agent: The Cybersecurity Bouncer

Every good establishment needs security, and in the digital world, this is even more critical. The Guard Agent serves as BrewBuddy's first line of defense, ensuring all interactions remain focused on the coffee shop domain and filtering out potentially harmful or irrelevant queries.

Technical Implementation

The Guard Agent is powered by Google's Generative AI (Gemini), but with a critical twist: instead of using the model for open-ended generation, we constrain it with a carefully designed system prompt that turns it into a binary classifier. This approach transforms a general-purpose LLM into a specialized security filter.

Each incoming message is evaluated against a set of predefined criteria that determine whether it's:

Relevant to the coffee shop domain
Safe and appropriate
Within the operational boundaries of our system

Responsibilities:

🔍 Evaluates incoming messages for relevance to coffee shop topics
🚫 Blocks queries about unrelated subjects (politics, general knowledge questions, etc.)
🛑 Prevents potential misuse through prompt injection or harmful content
🎯 Ensures a focused customer experience by maintaining domain boundaries

# From guard_agent.py
class GuardAgent():
    def get_response(self, messages):
        # ...
        system_prompt = """You are a helpful AI assistant for a coffee shop application which serves drinks and pastries.
        Your task is to determine whether the user is asking something relevant to the coffee shop or not.
        
        The user is allowed to:
        1. Ask questions about the coffee shop, like location, working hours, menu items and coffee shop related questions.
        2. Ask questions about menu items, they can ask for ingredients in an item and more details about the item.
        3. Make an order, modify an order, or complete/finish an order (including phrases like "that's all", "I'm done", "that's it", etc.)
        4. Ask about recommendations of what to buy.
        5. Respond to questions about their order or confirm their choices.

        The user is NOT allowed to:
        1. Ask questions about anything else other than our coffee shop.
        2. Ask questions about the staff or how to make a certain menu item.
        # ...

Under the Hood: JSON Structured Output

One of the technical challenges we faced was ensuring consistent, parseable output from the LLM. We solved this by instructing the model to return structured JSON rather than free-form text:

{
  "chain of thought": "The user is asking about coffee recommendations which is directly related to our coffee shop",
  "decision": "allowed",
  "message": ""
}

This structured output allows the Agent Controller to programmatically process the Guard Agent's decision without ambiguity. If the query is disallowed, the message field contains a friendly explanation that's passed directly to the user.

Error Handling and Robustness

LLMs can sometimes produce unexpected outputs, especially when processing unusual inputs. To handle this, we implemented comprehensive error handling:

JSON Validation: Every response is validated to ensure it contains the required fields
Fallback Mechanisms: If parsing fails, a default "not allowed" response is generated
Logging: All unusual interactions are logged for later analysis and improvement

This robustness ensures that even if the Guard Agent encounters an edge case, the system degrades gracefully rather than failing completely.

🔄 Classification Agent: The Intelligent Router

Once a query passes the Guard Agent's security check, it arrives at the Classification Agent—our intelligent traffic director. This agent analyzes the customer's intent and routes the conversation to the most appropriate specialist agent, ensuring each request is handled by the agent best equipped to respond.

Technical Implementation: Intent Classification

The Classification Agent is essentially an intent classification system powered by a large language model. Unlike traditional intent classifiers that rely on predefined categories and keyword matching, our approach uses the contextual understanding capabilities of Gemini to perform nuanced classification based on semantic meaning.

This is critical because customers can express the same intent in countless different ways:

"What do you recommend?"
"I can't decide what to order"
"What's good here?"
"Help me choose something"

All of these phrasings express the same basic intent (seeking a recommendation), but traditional keyword-based classifiers might struggle with the variations. Our LLM-based approach understands the semantic meaning behind diverse phrasings.

Responsibilities:

🧐 Analyzes the user's intent through semantic understanding
📊 Classifies queries into different functional categories
🚦 Routes the conversation to the appropriate specialized agent
✅ Ensures the most qualified agent handles each request

The Classification Agent evaluates each message and directs it to one of three specialized agents:

📚 Details Agent (for informational queries)
🛒 Order Taking Agent (for ordering and checkout)
🎁 Recommendation Agent (for personalized suggestions)

# From classification_agent.py
class ClassificationAgent():
    def get_response(self, messages):
        # ...
        system_prompt = """You are a helpful AI assistant for a coffee shop application.
        Your task is to determine what agent should handle the user input. You have 3 agents to choose from:
        
        1. details_agent: This agent is responsible for answering questions about the coffee shop, like location, delivery places, working hours, details about menu items. Or listing items in the menu items. Or by asking what we have.
        2. order_taking_agent: This agent is responsible for taking orders from the user. It's responsible to have a conversation with the user about the order until it's complete. This includes handling order completion phrases like "that's all", "I'm done", "that's it", etc.
        3. recommendation_agent: This agent is responsible for giving recommendations to the user about what to buy. If the user asks for a recommendation, this agent should be used.
        # ...

Conversation Context and Multi-Turn Classification

An important technical challenge we solved was handling multi-turn conversations where the intent might evolve over time. For example, a conversation might start with a recommendation request, then transition to ordering, and finally shift to questions about store hours.

The Classification Agent analyzes not just the most recent message, but the entire conversation history to determine the current context. This allows it to understand messages like "Yes, I'll take that" as continuations of an ordering process rather than isolated statements.

Performance Optimization

Classification happens for every user message, so performance was a critical consideration. We optimized this process by:

Minimal Token Usage: Keeping the system prompt concise while maintaining clarity
Efficient Memory Management: Passing only relevant conversation history to the model
Temperature Settings: Using a low temperature (0) to maximize deterministic outputs
Response Caching: Implementing a caching mechanism for common patterns

These optimizations ensure that the classification process adds minimal latency to the overall response time, maintaining a natural conversation flow.

📚 Details Agent: The Knowledge Base

The Details Agent is our coffee shop manager—the knowledgeable expert who can answer any question about the business, menu items, ingredients, pricing, and policies. This agent leverages advanced semantic search technology to retrieve precisely the information a customer needs.

Technical Implementation: Vector Embeddings and Semantic Search

Traditional search systems rely on keyword matching, which falls short when handling natural language queries. For example, if a customer asks "What's your strongest coffee?" a keyword search might not find the relevant information if the database uses terms like "high caffeine content" instead of "strong."

To solve this problem, we implemented a vector database search system using the following components:

Vector Embeddings: We use Google's embedding models to convert text into high-dimensional numerical vectors that capture semantic meaning
Pinecone Vector Database: These embeddings are stored in Pinecone, a specialized vector database optimized for similarity search
Semantic Matching: When a query arrives, it's converted to the same vector space and matched against stored information

# From details_agent.py
class DetailsAgent():
    def __init__(self):
        # ...
        self.embedding_model_name = os.getenv("EMBEDDING_MODEL_NAME")
        self.embedding_client = GoogleGenerativeAIEmbeddings(model=self.embedding_model_name) 
        
        # Pinecone vector database setup
        self.pc = Pinecone(api_key=pinecone_api_key)
        self.index_name = os.getenv("PINECONE_INDEX_NAME")
        # ...

Knowledge Representation and Retrieval

Our knowledge base contains hundreds of information snippets about our coffee shop, including:

Menu item descriptions, ingredients, and nutritional information
Store policies, locations, and hours
Coffee origins and brewing methods
Seasonal offerings and promotions

When a customer asks a question, the Details Agent:

Converts the question into a vector embedding
Searches the Pinecone database for the most semantically similar content
Retrieves the relevant information
Formulates a natural, conversational response using the retrieved context

# Retrieval process in Details Agent
def get_closest_results(self, index_name, input_embeddings, top_k=2):
    """Get closest results from Pinecone with error handling."""
    try:
        index = self.pc.Index(index_name)
        
        results = index.query(
            namespace="ns1",
            vector=input_embeddings,
            top_k=top_k,
            include_values=False,
            include_metadata=True
        )

        return results
    except Exception as e:
        logger.error(f"Error getting closest results: {str(e)}")
        return {"matches": []}

Response Generation with Context

Once the relevant information is retrieved, the agent doesn't simply return the raw data. Instead, it passes the retrieved information as context to the Gemini model, allowing it to generate a natural, conversational response that addresses the customer's specific question.

This approach combines the precision of vector search with the natural language generation capabilities of large language models, creating responses that are both accurate and conversational.

Responsibilities:

📊 Converts natural language questions into vector embeddings
🔍 Retrieves semantically relevant information from the knowledge base
⏰ Provides accurate answers about menu items, store policies, and offerings
📋 Maintains consistent, up-to-date information about the coffee shop

Example Interaction

Customer: "I have a nut allergy. Which pastries should I avoid?"

Details Agent Process:

Converts query to embedding vector
Searches knowledge base for allergy information
Retrieves information about nut-containing products
Generates a helpful response with the retrieved information

Response: "I understand you have a nut allergy—thank you for letting me know. You should avoid our Almond Croissants, Hazelnut Biscotti, and any seasonal items with 'nutty' in the description. Our Chocolate Chip Biscotti, Chocolate Croissants, Oatmeal Scones, and Cranberry Scones are all nut-free. However, all items are prepared in a kitchen that handles nuts, so there is a possibility of cross-contamination."

🛒 Order Taking Agent: The Conversational Cashier

The Order Taking Agent is our expert cashier—handling the entire ordering process from start to finish. Unlike simple form-filling interfaces, this agent maintains a natural conversation throughout the ordering process, remembering your selections, suggesting modifications, and guiding you to a complete order.

Technical Implementation: Stateful Conversation Management

One of the most significant challenges in building the Order Taking Agent was maintaining state throughout a potentially long and complex conversation. Traditional chatbots often struggle with this, losing track of earlier parts of an order or failing to update items when modifications are requested.

We solved this through a sophisticated state management system:

# From order_taking_agent.py
def get_response(self, messages):
    # ...
    # Get current order status from previous messages
    current_order = []
    asked_recommendation_before = False
    current_step = "1"
    
    for message in reversed(messages):
        if (message.get('role') == 'assistant' and 
            message.get('memory', {}).get('agent') == 'order_taking_agent'):
            current_order = message['memory'].get('order', [])
            asked_recommendation_before = message['memory'].get('asked_recommendation_before', False)
            current_step = message['memory'].get('step number', '1')
            break
    # ...

This approach allows the agent to:

Maintain Order State: Keep track of items, quantities, and modifications
Track Conversation Progress: Know which step of the ordering process we're in
Remember Past Interactions: Recall if recommendations were already offered

Structured Order Data Model

To ensure consistent order processing, we implemented a structured data model for orders:

{
  "item": "Cappuccino",
  "quantity": 2,
  "price": 4.50
}

This structured approach ensures that orders are consistent and can be easily processed, validated, and modified.

Multi-Step Ordering Process

The Order Taking Agent implements a guided, multi-step ordering process:

Initial Order Taking: Capturing the customer's initial requests
Order Validation: Ensuring all items exist in the menu
Modifications: Handling special requests or changes
Confirmation: Verifying the order is complete
Summarization: Providing a complete order summary with total
Completion: Thanking the customer and closing the conversation

Each step has specific logic and prompting strategies to guide the customer naturally through the process.

Menu Validation and Error Handling

A critical function of the Order Taking Agent is validating orders against our available menu. This prevents errors and ensures customers only order items we can actually provide.

When a customer orders an item not on the menu, the agent:

Identifies the unavailable items
Informs the customer which items aren't available
Maintains the valid items in the order
Suggests alternatives when possible

Integration with Recommendation Engine

One of the most innovative aspects of the Order Taking Agent is its integration with the Recommendation Agent. At appropriate points in the conversation, it can seamlessly introduce personalized recommendations:

# Get recommendations if appropriate
response = output['response']

if not asked_recommendation_before and combined_order:
    try:
        recommendation_output = self.recommendation_agent.get_recommendations_from_order(messages, combined_order)
        response = f"{response}\n\n{recommendation_output['content']}"
        asked_recommendation_before = True
    except Exception as e:
        logger.error(f"Error getting recommendations: {str(e)}")

This creates natural upselling opportunities similar to a barista suggesting "Would you like a pastry with your coffee?" but powered by sophisticated recommendation algorithms.

Responsibilities:

📝 Maintains the full order state throughout the conversation
✅ Validates items against the available menu
🧮 Tracks quantities and calculates accurate totals
📋 Provides clear order summaries
🛣️ Guides customers through a natural ordering flow
🤝 Integrates with the Recommendation Agent for personalized suggestions

Example Interaction

Customer: "I'd like two lattes and a chocolate croissant."

Order Taking Agent Process:

Parses the order to identify items and quantities
Validates items against the menu
Adds items to the order state
Generates a response confirming the items
Asks if the customer wants anything else
(Potentially) Offers a personalized recommendation

Response: "I've added 2 Lattes at $4.75 each and 1 Chocolate Croissant at $3.75 to your order. Would you like anything else? Our Hazelnut Biscotti pairs wonderfully with lattes and is one of our most popular combinations."

🎯 Recommendation Agent: The Personalization Engine

The Recommendation Agent is our coffee connoisseur—the expert who understands customer preferences and purchasing patterns to suggest products that delight. This sophisticated agent combines multiple recommendation strategies to create personalized suggestions that feel remarkably human.

Technical Implementation: Multiple Recommendation Strategies

Creating truly personalized recommendations requires more than a one-size-fits-all approach. The Recommendation Agent implements three distinct recommendation strategies, each optimized for different scenarios:

1. Association Rule Mining with the Apriori Algorithm

The cornerstone of our recommendation system is the Apriori algorithm, a classic data mining technique used to discover relationships between items in large datasets. Here's how it works:

Data Collection: We analyze thousands of past orders to identify items frequently purchased together
Pattern Discovery: The algorithm identifies "itemsets" that appear together frequently
Rule Generation: These itemsets are converted into "if-then" rules (e.g., "if a customer orders a latte, then they're likely to also enjoy a chocolate croissant")
Rule Pruning: Rules are filtered based on confidence and support metrics

This approach allows us to make recommendations based on actual customer behavior rather than arbitrary assumptions. For example, if 70% of customers who order a cappuccino also order a chocolate croissant, we can confidently recommend that pairing.

def get_apriori_recommendation(self, items, max_recommendations=3):
    """Get recommendations based on items frequently bought together."""
    try:
        recommendations = set()
        for item in items:
            if item in self.apriori_recommendations:
                # Extract just the product names from the recommendations
                item_recommendations = [rec['product'] for rec in self.apriori_recommendations[item]]
                recommendations.update(item_recommendations)
        
        # Remove items already in the input list
        recommendations = recommendations - set(items)
        
        return list(recommendations)[:max_recommendations]
    except Exception as e:
        logger.error(f"Error getting apriori recommendations: {str(e)}")
        return []

2. Popularity-Based Recommendations

For new customers or when personalized data is limited, we fall back to popularity-based recommendations. This simpler but effective approach:

Tracks Purchase Frequency: We maintain a running count of all product purchases
Ranks Products: Items are ranked by overall popularity
Filters by Relevance: The most popular items are recommended

def get_popular_recommendation(self, max_recommendations=3):
    """Get overall popular recommendations."""
    try:
        if not self.popular_recommendations.empty:
            return self.popular_recommendations['product'].head(max_recommendations).tolist()
        return []
    except Exception as e:
        logger.error(f"Error getting popular recommendations: {str(e)}")
        return []

3. Category-Based Recommendations

When customers express interest in a specific category (e.g., "I'm looking for something with chocolate"), we use category-based recommendations:

Category Mapping: Each product is mapped to one or more categories
In-Category Popularity: We track popularity within each category
Category Filtering: Recommendations are filtered to match the requested categories

def get_popular_by_category_recommendation(self, categories, max_recommendations=3):
    """Get popular recommendations within specified categories."""
    try:
        if not self.popular_recommendations.empty:
            recommendations = []
            for category in categories:
                category_items = self.popular_recommendations[
                    self.popular_recommendations['product_category'] == category
                ]['product'].head(max_recommendations).tolist()
                recommendations.extend(category_items)
            return recommendations[:max_recommendations]
        return []
    except Exception as e:
        logger.error(f"Error getting category recommendations: {str(e)}")
        return []

Intelligent Recommendation Classification

A key innovation in our recommendation system is the ability to automatically determine which recommendation strategy is most appropriate for each customer interaction. Rather than using simple rules, we use the LLM to analyze the customer's query and determine the best strategy:

def recommendation_classification(self, messages):
    """Classify the type of recommendation needed."""
    try:
        # Get the last user message
        user_message = ""
        for msg in reversed(messages):
            if msg.get('role') == 'user':
                user_message = msg.get('content', '')
                break

        if not user_message:
            return None

        system_prompt = f"""You are a helpful AI assistant for a coffee shop application which serves drinks and pastries. 
        We have 3 types of recommendations:

        1. Apriori Recommendations: Based on items frequently bought together
        2. Popular Recommendations: Based on overall item popularity
        3. Popular Recommendations by Category: Based on popularity within a specific category

        Available items: {", ".join(self.products)}
        Available categories: {", ".join(self.product_categories)}

        Your output MUST be a valid JSON string with this exact format:
        {{
            "chain of thought": "your analysis of the recommendation type needed",
            "recommendation_type": "apriori OR popular OR popular by category",
            "parameters": []  // List of items for apriori, categories for popular by category, empty for popular
        }}
        """

This approach allows for intelligent, context-aware recommendations that adapt to the customer's specific request.

Contextual Order-Based Recommendations

One of the most powerful features of the Recommendation Agent is its ability to analyze the current order and suggest complementary items. For example, if a customer has ordered a coffee but no food items, the agent might suggest a pastry that pairs well with that specific coffee.

def get_recommendations_from_order(self, messages, order):
    """Get recommendations based on current order."""
    try:
        products = [item['item'] for item in order]
        recommendations = self.get_apriori_recommendation(products)
        
        if not recommendations:
            # Get popular recommendations instead
            recommendations = self.get_popular_recommendation(max_recommendations=2)
            
            if not recommendations:
                return {
                    "role": "assistant",
                    "content": "Would you like anything else with your order? We have a variety of pastries and drinks that might interest you!",
                    "memory": {"agent": "recommendation_agent"}
                }
        
        recommendations_str = ", ".join(recommendations)
        
        system_prompt = """You are a helpful AI assistant for a coffee shop application.
        Your task is to recommend additional items that would go well with the customer's current order.
        Be friendly, enthusiastic, and explain why these items would complement their order.
        Focus on creating an appealing combination of items.
        DO NOT list or summarize their current order - just focus on the recommendations."""

The Science Behind Apriori: How Association Rule Mining Works

Let's take a deeper look at how the Apriori algorithm works, as it's the foundation of our most sophisticated recommendation strategy:

Support Calculation: For each itemset (group of items), we calculate its "support"—the percentage of transactions containing that itemset. For example, if 30% of all orders contain both a latte and a chocolate croissant, the support for this itemset is 0.3.
Frequent Itemset Generation: We identify all itemsets with support above a minimum threshold. The algorithm works iteratively, first finding frequent individual items, then pairs, then triplets, and so on.
Rule Generation: From these frequent itemsets, we generate association rules in the form "if A, then B" along with two key metrics:
- Confidence: The probability of B given A (e.g., 70% of customers who order a latte also order a chocolate croissant)
- Lift: How much more likely B is given A, compared to B's overall popularity (controlling for items that are just generally popular)
Rule Selection: We select rules with high confidence and lift values to ensure our recommendations are both accurate and meaningful.

In our implementation, we pre-compute these association rules and store them in a JSON format for quick retrieval:

{
  "Latte": [
    {"product": "Chocolate Croissant", "confidence": 0.72, "lift": 2.3},
    {"product": "Hazelnut Biscotti", "confidence": 0.65, "lift": 3.1}
  ],
  "Cappuccino": [
    {"product": "Almond Croissant", "confidence": 0.68, "lift": 2.5},
    {"product": "Chocolate Chip Biscotti", "confidence": 0.58, "lift": 2.1}
  ]
}

This pre-computation allows us to make real-time recommendations without the computational overhead of running the algorithm for each request.

Responsibilities:

🧪 Implements multiple sophisticated recommendation strategies
🔄 Analyzes purchase patterns to identify complementary products
📊 Tracks product popularity for general recommendations
📁 Organizes products into categories for targeted suggestions
📈 Analyzes current order context to suggest complementary items
💬 Generates natural language explanations for recommendations

Example Interaction

Customer: "Can you recommend something that goes well with a cappuccino?"

Recommendation Agent Process:

Classifies this as an apriori recommendation request with "Cappuccino" as the parameter
Retrieves pre-computed association rules for "Cappuccino"
Selects the top recommendations based on confidence and lift
Generates a natural language response explaining the recommendations

Response: "A cappuccino pairs wonderfully with our Almond Croissant—the nutty flavor complements the coffee's rich, creamy texture perfectly. If you prefer something less sweet, our Chocolate Chip Biscotti is also a popular choice with cappuccino lovers. The biscotti is perfect for dipping into your coffee for an extra flavor dimension."

🎭 The Agent Controller: The Orchestra Conductor

Coordinating all these specialized agents is the Agent Controller—the maestro that orchestrates the entire system, ensuring each agent performs its role at the right time and in harmony with the others. This component is the "brain" that enables the entire multi-agent system to function as a cohesive whole.

Technical Implementation: The Pipeline Architecture

The Agent Controller implements a sophisticated pipeline architecture that processes each user message through a series of specialized agents:

# From agent_controller.py
class AgentController():
    def __init__(self):
        # ...
        # Initialize agents with error handling
        self.guard_agent = GuardAgent()
        self.classification_agent = ClassificationAgent()
        self.recommendation_agent = RecommendationAgent(apriori_path, popularity_path)
        
        self.agent_dict: dict[str, AgentProtocol] = {
            "details_agent": DetailsAgent(),
            "order_taking_agent": OrderTakingAgent(self.recommendation_agent),
            "recommendation_agent": self.recommendation_agent
        }
        # ...

    def get_response(self, input):
        # ...
        # Get GuardAgent's response
        guard_agent_response = self.guard_agent.get_response(messages)
        if guard_agent_response["memory"]["guard_decision"] == "not allowed":
            return guard_agent_response
        
        # Get ClassificationAgent's response
        classification_agent_response = self.classification_agent.get_response(messages)
        chosen_agent = classification_agent_response["memory"]["classification_decision"]
        
        # Get the chosen agent's response
        agent = self.agent_dict[chosen_agent]
        response = agent.get_response(messages)
        # ...

This pipeline approach provides several critical advantages:

Sequential Processing: Each message is processed in a logical sequence:
- First, the Guard Agent verifies the query is appropriate
- Then, the Classification Agent determines which specialized agent should handle it
- Finally, the appropriate specialized agent generates the response
Early Termination: If the Guard Agent determines a query is inappropriate, the pipeline stops immediately, saving processing time and maintaining system boundaries.
Dynamic Routing: The Classification Agent dynamically routes each query to the most appropriate specialized agent, ensuring optimal handling.

Dependency Injection and Agent Protocol

To maintain flexibility and enable easy extension of the system, we implemented a dependency injection pattern with a common agent protocol:

# From agent_protocol.py
from typing import Protocol, List, Dict, Any

class AgentProtocol(Protocol):
    def get_response(self, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
        ...

This approach allows us to:

Standardize Agent Interfaces: All agents implement the same basic interface
Enable Easy Testing: Mock agents can be injected for testing
Support Future Extensions: New agents can be added by simply implementing the protocol

Memory Management and Context Preservation

The Agent Controller preserves conversation context through a sophisticated memory management system:

Agent-Specific Memory: Each agent maintains its own memory in a standardized format
Conversation History: The full conversation history is passed to each agent
Memory Persistence: Critical data (like order details) is preserved across agent transitions
Agent-Specific Memory: Each agent maintains its own specialized memory fields

# Example memory structure from Order Taking Agent
{
    "agent": "order_taking_agent",
    "step number": "3",
    "order": [
        {"item": "Latte", "quantity": 2, "price": 4.75},
        {"item": "Chocolate Croissant", "quantity": 1, "price": 3.75}
    ],
    "asked_recommendation_before": true
}

This unified memory architecture ensures that regardless of which agent is handling a particular message, the full conversation context is available, creating a seamless experience for the user.

Error Handling and System Robustness

A critical function of the Agent Controller is ensuring system robustness through comprehensive error handling:

def get_response(self, input):
    try:
        # Extract User Input
        job_input = input["input"]
        messages = job_input["messages"]
        
        # Log incoming request
        logger.info(f"Processing request with {len(messages)} messages")
        
        # Agent pipeline processing...
        
    except KeyError as e:
        logger.error(f"Invalid input format: {str(e)}")
        return {
            "role": "assistant",
            "content": "I apologize, but I encountered an error with the input format. Please try again.",
            "memory": {"error": str(e)}
        }
    except Exception as e:
        logger.error(f"Error processing request: {str(e)}")
        return {
            "role": "assistant",
            "content": "I apologize, but I encountered an error. Please try again.",
            "memory": {"error": str(e)}
        }

This robust error handling ensures that even if individual agents encounter issues, the system as a whole continues to function, providing a graceful degradation of service rather than outright failures.

Responsibilities:

🔄 Orchestrates the flow of information between specialized agents
💾 Maintains conversation context throughout interactions
🧩 Ensures a cohesive user experience across agent transitions
🛠️ Handles error conditions gracefully to maintain system stability
📊 Provides logging and monitoring of system performance

🌟 Advanced Features: Engineering for Intelligence

Beyond the core agent architecture, BrewBuddy implements several advanced features that elevate the system from a simple chatbot to a truly intelligent assistant. These features work together to create a seamless, personalized experience that feels remarkably human.

🧠 Stateful Memory Management: The Key to Contextual Understanding

One of the most challenging aspects of building conversational AI is maintaining coherent context across a multi-turn conversation. Traditional chatbots often suffer from "amnesia," forgetting earlier parts of the conversation and forcing users to repeat information.

Technical Implementation: Agent Memory Dictionaries

BrewBuddy solves this through a sophisticated memory system where each agent response includes a standardized memory dictionary:

return {
    "role": "assistant",
    "content": response_text,
    "memory": {
        "agent": "order_taking_agent",
        "step number": output["step number"],
        "order": combined_order,
        "asked_recommendation_before": asked_recommendation_before
    }
}

This memory system provides several key advantages:

State Persistence: Critical information (like order details) persists across turns
Agent Awareness: Each agent knows which other agents have been involved
Conversation Tracking: The system tracks where in a process (like ordering) the conversation is
Optimization Opportunities: We avoid redundant actions (like offering recommendations multiple times)

Handling Complex Conversation Flows

This memory system is particularly powerful for handling complex, multi-step processes like ordering. For example:

User: "I'd like a latte"
System: (Stores "latte" in order memory) "Great choice! Would you like anything else?"
User: "Make that two lattes"
System: (Updates quantity in memory) "Updated to two lattes. Anything else?"
User: "That's all"
System: (Recognizes completion trigger, accesses full order from memory) "Your order of 2 lattes at $4.75 each comes to $9.50 total."

Without this memory system, the agent would lose track of the evolving order, creating a frustrating user experience.

🔍 Vector Search Technology: Beyond Keyword Matching

Traditional information retrieval systems rely on keyword matching, which fails to capture the semantic meaning behind natural language queries. BrewBuddy implements vector search technology to truly understand the meaning behind customer questions.

Technical Implementation: Embedding Generation and Similarity Search

Our approach to semantic search involves:

Corpus Vectorization: We convert our entire knowledge base into numerical vector representations (embeddings) that capture semantic meaning
Query Vectorization: Each customer query is similarly converted to a vector
Similarity Search: We find the most similar content vectors to the query vector
Context-Enhanced Response Generation: The retrieved content is used as context for response generation

def get_embedding(embedding_client, text_input):
    """Get embeddings with error handling."""
    try:
        output = embedding_client.embed_query(text_input)
        return output
    except Exception as e:
        logger.error(f"Error in get_embedding: {str(e)}")
        return None

This approach provides several significant advantages:

Semantic Understanding: The system understands the meaning behind questions, not just keywords
Handling Synonyms: Questions can use different terminology than our knowledge base
Context Awareness: The system understands related concepts and topics

For example, if a customer asks "What's your strongest coffee?" our vector search understands this is semantically related to caffeine content, even if our knowledge base uses different terminology.

💻 Serverless Deployment Architecture: Scaling on Demand

Traditional deployment architectures often struggle with the variable load patterns of consumer applications. BrewBuddy leverages a modern serverless architecture to provide optimal performance and cost-efficiency.

Technical Implementation: RunPod Serverless Functions

We deployed the entire backend as a serverless function on RunPod, using the following architecture:

# From main.py
from agent_controller import AgentController
import runpod

def main():
    agent_controller = AgentController()
    runpod.serverless.start({"handler": agent_controller.get_response})


if __name__ == "__main__":
    main()

This serverless approach provides several key benefits:

On-Demand Scaling: Resources scale automatically with demand
Cost Optimization: We only pay for actual compute time used
High Availability: The service is distributed across multiple nodes
Simplified Operations: No server management overhead

For a coffee shop application with variable traffic patterns (busy mornings, quieter afternoons), this approach ensures responsiveness during peak times while minimizing costs during slower periods.

🧩 Structured Output Patterns: Reliable Data Extraction

Large language models are powerful but can produce inconsistent outputs. For critical operations like order processing, we need structured, reliable data. BrewBuddy implements a structured output pattern to solve this challenge.

Technical Implementation: JSON Output Templates

We instructed our language models to produce structured JSON outputs for all operational data:

system_prompt = """Your output MUST be a raw JSON string with NO markdown formatting or code blocks.
Do NOT wrap the JSON in ```json or ``` markers.
Just return the raw JSON with this exact format:
{
    "chain of thought": "your analysis of the order status and next steps",
    "step number": "current step number (1-6)",
    "order": [
        {"item": "item name", "quantity": number, "price": price}
    ],
    "response": "your response to the user"
}"""

This approach ensures:

Reliable Data Extraction: Critical information (like order details) is in a consistent format
Input Validation: The JSON structure can be validated before processing
Error Recovery: When parsing fails, we can implement fallback mechanisms

We also implemented robust error handling for cases where the model might produce invalid JSON:

def double_check_json_output(client, json_string):
    """Validate and correct JSON output with error handling."""
    try:
        # First try to parse it as JSON
        try:
            json.loads(json_string)
            return json_string
        except json.JSONDecodeError:
            pass

        # If parsing failed, try to fix it
        prompt = """You will check this json string and correct any mistakes that will make it invalid..."""

This combination of structured outputs and robust error handling ensures reliable operation even with the inherent variability of language model outputs.

🧩 Challenges and Engineering Solutions

Building a sophisticated multi-agent AI system presented numerous technical challenges. Each challenge required innovative engineering solutions to create a system that was not just intelligent but also reliable, performant, and user-friendly.

Challenge 1: Maintaining Conversation Context Across Agents

Creating a system where multiple specialized agents could seamlessly share conversation context was one of our biggest challenges. Without proper context management, the conversation would feel disjointed, with different agents asking for the same information repeatedly or failing to understand the conversation history.

Solution: Unified Memory Architecture

We implemented a sophisticated memory architecture with several key components:

Standardized Memory Format: All agents use the same memory dictionary structure
Full History Passing: The complete conversation history is passed to each agent
Memory Persistence: Critical data (like order details) is preserved across agent transitions
Agent-Specific Memory: Each agent maintains its own specialized memory fields

# Example memory structure from Order Taking Agent
{
    "agent": "order_taking_agent",
    "step number": "3",
    "order": [
        {"item": "Latte", "quantity": 2, "price": 4.75},
        {"item": "Chocolate Croissant", "quantity": 1, "price": 3.75}
    ],
    "asked_recommendation_before": true
}

This unified memory architecture ensures that regardless of which agent is handling a particular message, the full conversation context is available, creating a seamless experience for the user.

Challenge 2: Balancing Response Time and Intelligence

Each additional agent in our pipeline adds processing time, potentially creating noticeable latency for users. However, simplifying the system would reduce its intelligence and capabilities. Finding the right balance was a significant challenge.

Solution: Optimized Pipeline and Sequential Processing

We implemented several optimizations to maintain quick response times:

Optimized Prompts: We carefully crafted minimal, efficient prompts for each agent to reduce token usage
Early Pipeline Termination: The Guard Agent can terminate processing early for inappropriate queries
Gemini Flash Preview Model: We used the Gemini Flash Preview model, which offers low cost and fast response times, further improving the system's efficiency and user experience.

These optimizations ensure that our multi-agent system remains responsive while still leveraging the full power of specialized agents.

Challenge 3: Handling Edge Cases and Unexpected Inputs

Users can phrase questions and requests in countless ways, many of which might not be anticipated during development. Building robust agents that could handle unexpected inputs required extensive error handling and fallback mechanisms.

Solution: Comprehensive Error Handling and Graceful Degradation

We implemented a multi-layered approach to error handling:

Input Validation: All user inputs are validated before processing
Exception Handling: Every agent function includes comprehensive exception handling
JSON Validation: All structured outputs are validated before further processing
Fallback Responses: When specific processing fails, we provide helpful fallback responses
Logging and Monitoring: Unusual interactions are logged for later analysis and improvement

try:
    # Try to parse the response as JSON
    output = json.loads(response_text)
    
    # Validate required fields
    if not all(key in output for key in ["chain of thought", "decision", "message"]):
        raise ValueError("Missing required fields in response")
    
    return {
        "role": "assistant",
        "content": output["message"] if output["decision"] == "not allowed" else "",
        "memory": {
            "agent": "guard_agent",
            "guard_decision": output["decision"]
        }
    }
    
except json.JSONDecodeError:
    logger.error(f"Invalid JSON response from model: {response_text}")
    return {
        "role": "assistant",
        "content": "I apologize, but I encountered an error. Could you please try again?",
        "memory": {
            "agent": "guard_agent",
            "guard_decision": "not allowed"
        }
    }

This approach ensures that even when faced with unexpected inputs or processing errors, the system degrades gracefully rather than failing completely.

Challenge 4: Seamless Frontend-Backend Integration

Creating a seamless experience between the Next.js frontend and Python backend required careful API design, state management, and error handling. The two systems needed to work together flawlessly despite being built on different technology stacks.

Solution: Clean API Contract and Robust State Management

Our solution involved several key components:

Standardized API Contract: We defined a clear, consistent API contract between frontend and backend
React Context for State: Frontend state is managed through React Context for consistency
Optimistic Updates: The UI updates optimistically while awaiting backend confirmation
Error Recovery: Failed requests are gracefully handled with user-friendly error messages
Persistent Storage: Critical state (like cart contents) is persisted to local storage

This comprehensive approach ensures that the user experiences a seamless interaction, with the complexity of the multi-agent system hidden behind an intuitive, responsive interface.

Challenge 5: Ensuring Reliable Recommendation Quality

Building a recommendation system that consistently provides high-quality, relevant suggestions was challenging. Simple frequency-based approaches often recommend popular items regardless of relevance, while overly specific approaches can fail to provide recommendations for new or unusual combinations.

Solution: Hybrid Recommendation Strategy with Fallbacks

We implemented a multi-tiered recommendation approach with fallbacks:

Primary Strategy: Association rule mining (apriori algorithm) for personalized recommendations
Secondary Strategy: Category-based recommendations when specific categories are mentioned
Tertiary Strategy: Popularity-based recommendations when more specific strategies fail
Fallback Content: Generic recommendations with explanatory text when all else fails

def get_recommendations_from_order(self, messages, order):
    """Get recommendations based on current order."""
    try:
        products = [item['item'] for item in order]
        recommendations = self.get_apriori_recommendation(products)
        
        if not recommendations:
            # Get popular recommendations instead
            recommendations = self.get_popular_recommendation(max_recommendations=2)
            
            if not recommendations:
                return {
                    "role": "assistant",
                    "content": "Would you like anything else with your order? We have a variety of pastries and drinks that might interest you!",
                    "memory": {"agent": "recommendation_agent"}
                }

This layered approach ensures that customers always receive recommendations, with the system automatically adjusting the recommendation strategy based on available data.

🎨 The Frontend Experience

While the backend intelligence is impressive, we also built a beautiful, responsive frontend that complements the smart technology behind the scenes.

Our Next.js frontend:

🖼️ Showcases products in an elegant, coffee-themed UI
💬 Provides an intuitive chat interface for natural conversations
🛒 Maintains a cart system for easy ordering
📱 Ensures a seamless experience across devices

We paid special attention to the user experience:

Warm, coffee-inspired color palette creates an inviting atmosphere
Glass-morphism design elements add a premium feel
Subtle animations enhance engagement without being distracting
Accessible, responsive layout works beautifully on everything from phones to desktops

The frontend is built with Next.js and uses modern React patterns to create a fluid, responsive user experience that complements the intelligent backend.

🔮 Conclusion: The Future of Conversational AI

BrewBuddy demonstrates the power of a multi-agent architecture for creating sophisticated AI assistants. By dividing responsibilities among specialized agents, we created a system that's more capable, maintainable, and extensible than a single monolithic model could be.

The combination of guard rails, intelligent routing, specialized knowledge, contextual memory, and data-driven recommendations creates an AI assistant that truly feels like interacting with a knowledgeable barista who remembers your preferences and can guide you to your perfect cup of coffee.

This approach to AI assistant design opens up exciting possibilities beyond coffee shops—the same architecture could be applied to countless domains where specialized knowledge and personalized recommendations add value to the customer experience.

"The best AI doesn't replace human interaction—it enhances it by bringing together specialized expertise, personalization, and contextual understanding in a way that feels natural and helpful."

🛠️ Tech Stack

Backend

🐍 Python 3.10+
🧠 Google Generative AI (Gemini)
☁️ RunPod for serverless deployment
🔥 Firebase for product data
🔍 Pinecone for vector search
📊 Pandas for data processing

Frontend

⚛️ Next.js 14
📝 TypeScript
🎨 Tailwind CSS
🔄 React Context for state management
📝 ReactMarkdown for formatting

Deployment and Infrastructure

🚀 Vercel for frontend hosting
☁️ RunPod for serverless backend

☕ Building BrewBuddy: A Multi-Agent AI Coffee Shop Assistant

Combining the art of coffee with the science of AI to create the perfect digital barista experience

🚀 Future Directions

We're excited about the future of BrewBuddy and are exploring several enhancements to make the experience even more engaging and personalized.

👤 User Profiles: Storing user preferences to enhance personalization
🗣️ Voice Interface: Adding speech recognition for hands-free ordering
💳 Payment Integration: Completing the e-commerce experience
📚 Expanded Product Knowledge: Adding more detailed information about coffee origins and brewing methods
⚡ Performance Optimization: Further reducing response times through caching and model optimization

☕ Brewed with ❤️ and AI

Combining the art of coffee with the science of artificial intelligence

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
frontend		frontend
python_code		python_code
.env.local		.env.local
README.md		README.md
npm_output.log		npm_output.log
verify_fallback_image.py		verify_fallback_image.py

mohanganesh3/CoffeeShopAI-Assistant

Folders and files

Latest commit

History

Repository files navigation

☕ Building BrewBuddy: A Multi-Agent AI Coffee Shop Assistant

Live Demo

📖 Introduction: Reimagining the Coffee Shop Experience

Our Coffee Shop Inspiration

🏗️ System Architecture: The Blueprint for Intelligence

🔍 Multi-Agent Architecture Diagram

🧠 The Multi-Agent Architecture: Our Secret Ingredient

🛡️ Guard Agent: The Cybersecurity Bouncer

Technical Implementation

Under the Hood: JSON Structured Output

Error Handling and Robustness

🔄 Classification Agent: The Intelligent Router

Technical Implementation: Intent Classification

Conversation Context and Multi-Turn Classification

Performance Optimization

📚 Details Agent: The Knowledge Base

Technical Implementation: Vector Embeddings and Semantic Search

Knowledge Representation and Retrieval

Response Generation with Context

Example Interaction

🛒 Order Taking Agent: The Conversational Cashier

Technical Implementation: Stateful Conversation Management

Structured Order Data Model

Multi-Step Ordering Process

Menu Validation and Error Handling

Integration with Recommendation Engine

Example Interaction

🎯 Recommendation Agent: The Personalization Engine

Technical Implementation: Multiple Recommendation Strategies

1. Association Rule Mining with the Apriori Algorithm

2. Popularity-Based Recommendations

3. Category-Based Recommendations

Intelligent Recommendation Classification

Contextual Order-Based Recommendations

The Science Behind Apriori: How Association Rule Mining Works

Example Interaction

🎭 The Agent Controller: The Orchestra Conductor

Technical Implementation: The Pipeline Architecture

Dependency Injection and Agent Protocol

Memory Management and Context Preservation

Error Handling and System Robustness

🌟 Advanced Features: Engineering for Intelligence

🧠 Stateful Memory Management: The Key to Contextual Understanding

Technical Implementation: Agent Memory Dictionaries

Handling Complex Conversation Flows

🔍 Vector Search Technology: Beyond Keyword Matching

Technical Implementation: Embedding Generation and Similarity Search

💻 Serverless Deployment Architecture: Scaling on Demand

Technical Implementation: RunPod Serverless Functions

🧩 Structured Output Patterns: Reliable Data Extraction

Technical Implementation: JSON Output Templates

🧩 Challenges and Engineering Solutions

Challenge 1: Maintaining Conversation Context Across Agents

Solution: Unified Memory Architecture

Challenge 2: Balancing Response Time and Intelligence

Solution: Optimized Pipeline and Sequential Processing

Challenge 3: Handling Edge Cases and Unexpected Inputs

Solution: Comprehensive Error Handling and Graceful Degradation

Challenge 4: Seamless Frontend-Backend Integration

Solution: Clean API Contract and Robust State Management

Challenge 5: Ensuring Reliable Recommendation Quality

Solution: Hybrid Recommendation Strategy with Fallbacks

🎨 The Frontend Experience

🔮 Conclusion: The Future of Conversational AI

🛠️ Tech Stack

Backend

Frontend

Deployment and Infrastructure

☕ Building BrewBuddy: A Multi-Agent AI Coffee Shop Assistant

🚀 Future Directions

☕ Brewed with ❤️ and AI

About

Resources

Uh oh!

Stars

Packages