Knowledge Bases
Knowledge bases (also called "Stores") allow your assistants to access your specific documents and data, enabling accurate, contextual responses using Retrieval-Augmented Generation (RAG).
What Are Knowledge Bases?
A knowledge base is a collection of documents that your assistant can search and reference when responding to users. This enables:
- Accurate answers based on your actual documentation
- Up-to-date information without retraining the AI
- Specific knowledge about your products, services, or domain
How It Works
When a user asks a question, the system searches your documents for relevant information and uses it to generate accurate, contextual responses.
The Process:
- Upload - You upload documents to a knowledge base
- Process - Documents are chunked and converted to embeddings for search
- Connect - Link the knowledge base to your assistant
- Query - User asks a question
- Retrieve - System finds the most relevant document sections
- Generate - Assistant crafts an answer using your specific information
Advanced: How RAG Works
Retrieval-Augmented Generation (RAG) combines semantic search with AI generation:
- Documents are split into chunks (manageable sections)
- Each chunk becomes a vector embedding (numerical representation capturing meaning)
- Embeddings are stored in OpenAI's vector store
- When users ask questions, their query is also embedded
- The system finds chunks with similar embeddings (semantic similarity)
- Top matching chunks provide context for the AI's response
Why semantic search matters: A search for "refund policy" can also retrieve chunks about "returns," "money back guarantee," or "cancellation procedures," because they are semantically similar to the query rather than exact keyword matches.
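The retrieval steps above can be sketched in a few lines of Python. This is a toy illustration, not the platform's implementation: the `embed` function below is a hypothetical stand-in that counts words, so it only matches shared vocabulary, whereas a real embedding model produces dense vectors that capture meaning. The sample document text and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(document: str) -> list[str]:
    """Treat each blank-line-separated paragraph as one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts.

    Assumption for illustration only; a real model returns a dense vector.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Embed the query, score every chunk, and return the best matches."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share nothing with the query.
    return [chunk for score, chunk in scored[:top_k] if score > 0]


# Made-up sample content, used only to exercise the functions.
DOCUMENT = """Returns are accepted within 30 days of delivery.

Standard shipping takes 3 to 5 business days.

Our support team is available on weekdays."""

chunks = split_into_chunks(DOCUMENT)
top = retrieve("How long does shipping take?", chunks, top_k=1)
print(top)
```

In a real pipeline the top chunks would then be passed to the model as context for its answer.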
What You Can Do
| Task | Description |
|---|---|
| Create Stores | Set up new knowledge bases |
| Upload Files | Add documents to your stores |
| Supported Formats | See what file types you can upload |
| Connect to Assistants | Link stores to your chatbots |
Storage Limits by Tier
| Tier | Max Stores | Storage per Store |
|---|---|---|
| Basic | 1 | 5 MB |
| Medium | 2 | 30 MB |
| Pro | 5 | 150 MB |
See Tier Limits for complete details.
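When planning a layout, it can help to check it against the table above. The sketch below is a hypothetical helper (the function name and data structure are not part of the product). It assumes "Storage per Store" is a cap on each individual store and returns the lowest tier that fits a list of planned store sizes.

```python
from typing import Optional

# Limits copied from the table above: (max stores, max MB per store).
TIER_LIMITS = {
    "Basic": (1, 5),
    "Medium": (2, 30),
    "Pro": (5, 150),
}


def smallest_fitting_tier(store_sizes_mb: list[float]) -> Optional[str]:
    """Return the lowest tier that fits every planned store, or None."""
    # Dicts keep insertion order, so tiers are checked from lowest to highest.
    for tier, (max_stores, max_mb) in TIER_LIMITS.items():
        fits_count = len(store_sizes_mb) <= max_stores
        fits_size = all(size <= max_mb for size in store_sizes_mb)
        if fits_count and fits_size:
            return tier
    return None


print(smallest_fitting_tier([4]))          # Basic
print(smallest_fitting_tier([25, 10]))     # Medium
print(smallest_fitting_tier([50, 15, 5]))  # Pro
print(smallest_fitting_tier([200]))        # None
```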
Best Practices
What to Upload
Good candidates:
- Product documentation and manuals
- FAQ documents
- Policy documents (returns, privacy, terms)
- Training materials and guides
- Technical specifications
Avoid:
- Image-heavy documents (images aren't processed)
- Scanned PDFs without OCR
- Password-protected files
- Raw data exports without context
Content Organization Strategy
Use Multiple Stores When:
- You have distinct topic areas (products vs. policies vs. support)
- Different assistants need different knowledge
- Content has different update cycles
- You want to control which information is available to which assistant
Use One Store When:
- All content is related to a single domain
- Your assistant needs access to everything
- You're at your store limit (Basic tier: 1 store, Medium: 2 stores)
- Content is interconnected and often referenced together
Example Structure:
For an e-commerce business (three stores, one of them over 30 MB, so this layout needs the Pro tier):
Store 1: "Product Catalog" (50 MB)
- Product descriptions, specs, features
- Updated frequently
Store 2: "Customer Support" (15 MB)
- FAQs, troubleshooting guides
- Shipping and returns policies
- Updated occasionally
Store 3: "Company Info" (5 MB)
- About us, brand story, values
- Rarely updated
Document Preparation
Do:
- Use clear headings and section breaks
- Write in complete sentences with context
- Use descriptive file names
- Remove duplicate content
Don't:
- Rely on images for critical information
- Use excessive formatting that obscures text
- Upload multiple versions of the same document
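One way to catch exact duplicates before uploading is to compare file contents by hash. The following is a minimal sketch using only the Python standard library. The function name and the sample file names are invented for illustration, and it only detects byte-identical copies, not edited versions or near-duplicates of a document.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(folder: Path) -> list[list[Path]]:
    """Group files under `folder` (recursively) whose bytes are identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only hashes shared by more than one file.
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Demo on a throwaway folder with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "returns-policy.txt").write_text("Sample policy text.")
    (root / "returns-policy-copy.txt").write_text("Sample policy text.")
    (root / "faq.txt").write_text("Sample FAQ text.")
    for group in find_duplicate_files(root):
        print("Identical files:", [p.name for p in group])
```

Older versions of a document that differ by even one character will not be flagged, so a manual review of similarly named files is still worthwhile.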
Common Use Cases
E-commerce Customer Support
Scenario: You run an online store and want your chatbot to answer product and policy questions.
Recommended setup:
Store 1: "Product Information" (30 MB)
- Product descriptions and specifications
- Size guides and measurement charts
- Care instructions
Store 2: "Policies & Shipping" (10 MB)
- Return and exchange policy
- Shipping information
- Warranty terms
Store 3: "FAQ" (5 MB)
- Common questions and answers
- Troubleshooting guides
Why this works: Separates frequently updated product content from stable policies, which makes each store easier to maintain.
SaaS Product Documentation
Scenario: You provide a software product and want to help users learn how to use it.
Recommended setup:
Store 1: "Getting Started" (15 MB)
- Onboarding guides
- Quick start tutorials
- Basic concepts
Store 2: "Feature Documentation" (50 MB)
- Detailed feature guides
- Advanced usage
- Configuration options
Store 3: "API Reference" (20 MB)
- API documentation
- Code examples
- Integration guides
Why this works: Organizes content by user journey, so beginners get the basics and advanced users get detailed docs.
Professional Services
Scenario: You offer consulting or services and want to automate client FAQs.
Recommended setup:
Store 1: "Services & Pricing" (8 MB)
- Service descriptions
- Pricing packages
- Engagement process
Store 2: "Case Studies & Examples" (25 MB)
- Client success stories
- Project examples
- Testimonials
Store 3: "Resources" (12 MB)
- Whitepapers
- Industry insights
- Best practices guides
Why this works: Helps prospects understand services while showcasing expertise.
Testing Your Knowledge Base
After uploading content, test your assistant to ensure it uses the information correctly.
Create a test plan:
- Baseline questions: Ask about content you know is in your docs
- Variation testing: Ask the same question different ways
- Edge cases: Ask questions that your docs cover only partially or not at all
Example test suite for e-commerce:
Baseline:
- "What is your return policy?"
- "How long does shipping take?"
Variations:
- "Can I return this if I don't like it?"
- "How do I send something back?"
Edge cases:
- "Can I return a sale item?"
- "What if my item arrives damaged?"
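A test suite like this can be kept as data and replayed after every content update. The sketch below is one possible shape, not a product feature: `fake_assistant` is a hypothetical stand-in for however you actually query your assistant, and the expected keywords are assumptions chosen for illustration. Keyword matching is a crude check, so treat a pass as a prompt to read the answer rather than as proof that it is correct.

```python
from typing import Callable

# Questions from the example suite, each paired with keywords we assume
# a good answer would contain (the keywords are illustrative guesses).
TEST_SUITE = {
    "baseline": [
        ("What is your return policy?", ["return"]),
        ("How long does shipping take?", ["shipping"]),
    ],
    "variations": [
        ("Can I return this if I don't like it?", ["return"]),
        ("How do I send something back?", ["return"]),
    ],
    "edge_cases": [
        ("Can I return a sale item?", ["sale"]),
        ("What if my item arrives damaged?", ["damaged"]),
    ],
}


def run_suite(
    ask: Callable[[str], str], suite: dict = TEST_SUITE
) -> list[tuple[str, str, bool]]:
    """Ask each question; pass if every expected keyword is in the answer."""
    results = []
    for category, cases in suite.items():
        for question, keywords in cases:
            answer = ask(question).lower()
            passed = all(keyword.lower() in answer for keyword in keywords)
            results.append((category, question, passed))
    return results


def fake_assistant(question: str) -> str:
    """Hypothetical stand-in; replace with a real call to your assistant."""
    return "You can return items; shipping details are in our policy."


results = run_suite(fake_assistant)
for category, question, passed in results:
    print(f"[{'PASS' if passed else 'FAIL'}] {category}: {question}")
```

With the canned answer above, the baseline and variation questions pass and both edge cases fail, which is the kind of gap this test plan is meant to surface.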
Advanced: Optimizing for Chunking
Documents are automatically split into chunks. How you structure a document affects how cleanly it splits and how well the resulting chunks match user questions.
Good for chunking:
- Clear section breaks with headings
- Paragraphs of reasonable length (3-8 sentences)
- Logical content flow
Bad for chunking:
- Very long paragraphs without breaks
- Important info split across distant sections
- Dense walls of text
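As a rough pre-upload check, a short script can flag paragraphs that exceed the suggested length. This is a minimal sketch under stated assumptions: paragraphs are separated by blank lines, sentences are counted with a simple punctuation rule (which will miscount abbreviations and decimals), and the function name is invented for the example. It does not reproduce how the platform actually chunks documents.

```python
import re


def long_paragraphs(text: str, max_sentences: int = 8) -> list[tuple[int, int]]:
    """Return (paragraph_number, sentence_count) for paragraphs over the limit."""
    flagged = []
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Rough sentence count: split after ., !, or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        if len(sentences) > max_sentences:
            flagged.append((number, len(sentences)))
    return flagged


# Made-up sample: a two-sentence paragraph followed by a ten-sentence one.
sample = "Short intro. Two sentences here.\n\n" + " ".join(
    f"Sentence number {i}." for i in range(1, 11)
)
print(long_paragraphs(sample))  # [(2, 10)]
```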