Supported Formats
Complete reference of file types you can upload to your knowledge bases.
Overview
Functional AI supports 21 file formats across documents, code, and data. All files are processed to extract text content for semantic search.
Key principle: If it contains extractable text, it can be used.
Images Not Processed
Images, charts, and diagrams within documents are not processed. Only text content is extracted and indexed.
Document Formats
PDF Files (.pdf)
| Aspect | Details |
|---|---|
| Best for | Manuals, reports, documentation, whitepapers |
| Text extraction | Excellent for text-based PDFs; scanned (image-only) PDFs yield no text unless OCR is run first |
| Max size | 500 MB |
PDF Quality
Use PDFs with selectable text rather than scanned images. If you have scanned documents, run OCR first.
PDF Optimization & Issues
Optimization tips:
- Remove cover pages and blank pages
- Compress images to reduce file size
- Convert image-heavy PDFs to text-only versions
- For scanned documents, use OCR tools
- Test whether text is selectable by trying to highlight it
Common issues:
- Scanned PDFs are treated as images, so no text is extracted
- Password-protected PDFs cannot be processed
- Very large PDFs (100+ MB) take significant time to process
Microsoft Word (.doc, .docx)
| Aspect | Details |
|---|---|
| Best for | Business documents, policies, guides, procedures |
| Text extraction | Excellent |
| Max size | 500 MB |
Word Optimization Tips
- Use clear headings (Heading 1, Heading 2, etc.)
- Avoid text boxes and complex layouts
- Remove unnecessary images
- Use tables for structured data (converted to text)
- Save as .docx (more reliable than old .doc format)
Plain Text (.txt)
| Aspect | Details |
|---|---|
| Best for | Simple content, logs, notes, transcripts |
| Text extraction | Perfect (already plain text) |
| Max size | 500 MB |
Markdown (.md)
| Aspect | Details |
|---|---|
| Best for | Technical documentation, README files, knowledge bases |
| Text extraction | Excellent (headings preserved as text) |
| Max size | 500 MB |
Why Markdown Works Great
Clean structure with headings makes chunking more effective. No complex formatting to interfere with text extraction.
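To illustrate why headings help, here is a minimal sketch of heading-based chunking. It is an assumption about how a chunker might use headings, not a description of Functional AI's actual pipeline, and `chunk_by_headings` is a hypothetical helper.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown_text):
    """Split Markdown into (heading, body) chunks, one per heading.

    Illustrative only: a real chunker would also handle fenced code
    blocks, size limits, and overlap between chunks.
    """
    chunks = []
    title, body = "(no heading)", []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if body or title != "(no heading)":
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    chunks.append((title, "\n".join(body).strip()))
    return chunks


doc = "# Install\nRun the installer.\n\n## Requirements\n4 GB RAM.\n"
for heading, text in chunk_by_headings(doc):
    print(f"{heading}: {text}")
```

Each chunk carries its heading, so a retrieved passage keeps the context of the section it came from.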
Presentation Formats
PowerPoint (.pptx)
| Aspect | Details |
|---|---|
| Best for | Slide content, training materials, presentations |
| Text extraction | Good (text from slides) |
| Max size | 500 MB |
PowerPoint: What Gets Extracted
Extracted:
- Slide titles and body text
- Bullet points and lists
- Table content

Not extracted:
- Images and diagrams
- Charts and graphs
- Embedded videos
- Animations

Optimization tips:
- Ensure critical information is in text, not images
- Use slide titles effectively (they help with chunking)
- Consider exporting to PDF for more control
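Before uploading, it can help to check how much text a deck actually contains. The sketch below relies on the fact that a .pptx file is a ZIP archive whose slides store text runs in DrawingML `<a:t>` elements. It only approximates what a text extractor might see, and `slide_text` is a hypothetical helper, not part of Functional AI.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) in .pptx slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_NAME = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def slide_text(pptx_file):
    """Return {slide_number: [text runs]} for a .pptx path or file object."""
    result = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            match = SLIDE_NAME.match(name)
            if not match:
                continue
            root = ET.fromstring(archive.read(name))
            # Collect every non-empty text run on the slide.
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            result[int(match.group(1))] = runs
    return dict(sorted(result.items()))
```

Calling `slide_text("deck.pptx")` (the file name is a placeholder) returns an empty list for any slide whose content is entirely images, which flags slides that would contribute nothing to search.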
Data Formats
JSON (.json)
| Aspect | Details |
|---|---|
| Best for | Structured data, product catalogs, configurations, API responses |
| Text extraction | Excellent (preserves structure as text) |
| Max size | 500 MB |
Best use cases:
- Product catalogs with descriptions
- FAQ data in structured format
- Configuration documentation
JSON Optimization
Tips for better results:
- Keep nesting reasonable (3-4 levels max)
- Use descriptive key names
- Include text descriptions alongside data values
Example - Good JSON structure:
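As one illustration of the tips above, the sketch below builds a hypothetical product entry with descriptive keys and text descriptions next to the raw values, then measures its nesting depth. Every field name and value is invented for this example, and `nesting_depth` is a hypothetical helper.

```python
import json

# Hypothetical product entry; every field name here is invented for illustration.
product = {
    "product_name": "Trail Runner 2",
    "category": "Footwear",
    "description": "Lightweight trail running shoe with a reinforced toe cap.",
    "specifications": {
        "weight_grams": 280,
        "weight_note": "Weight is per shoe, size EU 42.",
    },
}


def nesting_depth(value):
    """Return how many levels of dicts/lists are nested inside value."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


print(json.dumps(product, indent=2))
print("nesting depth:", nesting_depth(product))
```

This entry nests two levels deep, comfortably inside the suggested 3-4 level limit.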
Code Files
Code files are useful for technical documentation, API references, or coding assistants.
| Language | Extensions | Best For |
|---|---|---|
| Python | .py | Python modules, API implementations |
| JavaScript/TypeScript | .js, .ts | Frontend code, Node.js modules |
| HTML | .html | Web pages, email templates |
| CSS | .css | Style documentation, design systems |
| Java | .java | Java API documentation |
| C/C++ | .c, .cpp | Systems programming reference |
| C# | .cs | .NET application code |
| Ruby | .rb | Rails applications |
| PHP | .php | PHP application code |
| Go | .go | Go service documentation |
| Shell | .sh | Bash scripts, automation docs |
All code files: Max size 500 MB, perfect text extraction
Code File Best Practices
- Include comprehensive comments and docstrings
- Remove credentials and secrets
- Focus on well-documented, exemplary code
- Consider if Markdown documentation might be clearer
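For the credentials point, a quick scan before upload can catch obvious leaks. The sketch below is a heuristic under stated assumptions: the patterns are illustrative rather than exhaustive, `find_possible_secrets` is a hypothetical helper, and a dedicated secret scanner will be more thorough.

```python
import re

# Heuristic patterns only; they will miss some secrets and flag some false positives.
SECRET_PATTERNS = [
    # Assignments such as api_key = "..." or DB_PASSWORD: '...'
    re.compile(r"(?i)(api[_-]?key|secret|token|password|passwd)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
    # Strings shaped like AWS access key IDs
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key headers
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_possible_secrets(source_text):
    """Return (line_number, line) pairs that look like they contain a secret."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


sample = 'import os\nAPI_KEY = "example-not-a-real-key"\ntimeout = 30\n'
print(find_possible_secrets(sample))
```

Any reported line deserves a manual look; replacing the value with an environment variable lookup keeps the code useful as reference material.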
Other Formats
LaTeX (.tex)
Best for academic papers and technical documents. Max size: 500 MB.
Complete File Type List
| Category | Supported Extensions |
|---|---|
| Documents | .pdf, .doc, .docx, .txt, .md, .pptx |
| Data | .json |
| Code | .py, .js, .ts, .html, .css |
| Code (additional) | .java, .c, .cpp, .cs, .rb, .php, .go, .sh |
| Other | .tex |
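The table above can be turned into a simple pre-upload check. This is a minimal sketch with two assumptions that are not stated in this reference: extensions are compared case-insensitively, and 1 MB is taken as 1,000,000 bytes. `check_upload` is a hypothetical helper.

```python
from pathlib import Path

# Extensions from the table above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".md", ".pptx",
    ".json",
    ".py", ".js", ".ts", ".html", ".css",
    ".java", ".c", ".cpp", ".cs", ".rb", ".php", ".go", ".sh",
    ".tex",
}

# Assumption: 1 MB = 1,000,000 bytes; the service may count MiB instead.
MAX_SIZE_BYTES = 500 * 1_000_000


def check_upload(filename, size_bytes):
    """Return (ok, reason) for a candidate upload."""
    # Assumption: extension matching ignores case.
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False, f"unsupported extension: {extension or '(none)'}"
    if size_bytes > MAX_SIZE_BYTES:
        return False, "file exceeds 500 MB"
    return True, "ok"


print(check_upload("Handbook.PDF", 12_000_000))
print(check_upload("prices.xlsx", 40_000))
```

A passing check only confirms the extension and size. It says nothing about whether the file contains extractable text, so a scanned PDF would still pass.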
Format Recommendations
Best Formats for RAG
| Use Case | Recommended Format |
|---|---|
| Product documentation | PDF, Markdown |
| FAQs | Markdown, Text |
| Policies | PDF, Word |
| Technical docs | Markdown |
| Data/catalogs | JSON |
| Code reference | Native code files |
Formats to Avoid
| Format | Issue | Alternative |
|---|---|---|
| Scanned PDFs | No text extraction | Use OCR first |
| Image files | Not processed | Convert to text |
| Video/Audio | Not supported | Provide transcripts |
| Spreadsheets (CSV, XLSX) | Not supported | Convert to JSON |
| Password-protected | Cannot read | Remove protection |
| Very large files | Slow processing | Split into sections |
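For the spreadsheet row, converting CSV to JSON takes only the Python standard library. The sketch below assumes the first row is a header, and `csv_to_json` is a hypothetical helper. An XLSX file would first need to be exported to CSV or read with a third-party library.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)


sample = "sku,name,description\nA-100,Desk Lamp,Adjustable LED desk lamp\n"
print(csv_to_json(sample))
```

Every value comes out as a string because CSV carries no type information. The column headers become the JSON keys, so descriptive header names pay off here as well.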
File Size Optimization
PDF Optimization
Tools: Adobe Acrobat, SmallPDF, iLovePDF
Techniques:
1. Compress images: Reduce image quality/resolution
2. Remove pages: Delete cover, blank, and unnecessary pages
3. Remove metadata: Author info, comments, etc.
4. Convert to grayscale: If color isn't essential
Example: A 50 MB PDF manual can often be reduced to 5-10 MB without losing text quality.
When to Split Files
Instead of one massive file, consider splitting when:
- A single file exceeds 50 MB (smaller files are easier to manage)
- Content covers distinct topics
- Updates affect only portions
Example: Instead of "Complete Product Manual.pdf" (100 MB), split into:
- "Installation Guide.pdf" (5 MB)
- "User Manual.pdf" (20 MB)
- "Troubleshooting.pdf" (10 MB)
- "Technical Specifications.pdf" (8 MB)
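To find files worth splitting, a short script can list everything in a folder above the 50 MB guideline. This is a sketch that assumes 1 MB is 1,000,000 bytes; `split_candidates` is a hypothetical helper.

```python
from pathlib import Path

# Threshold from the guidance above; assumes 1 MB = 1,000,000 bytes.
SPLIT_THRESHOLD_BYTES = 50 * 1_000_000


def split_candidates(folder, threshold=SPLIT_THRESHOLD_BYTES):
    """Return (path, size_in_bytes) for files larger than threshold, largest first."""
    found = [
        (path, path.stat().st_size)
        for path in Path(folder).rglob("*")
        if path.is_file() and path.stat().st_size > threshold
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `split_candidates("docs")` (the folder name is a placeholder) returns the largest files first, which is usually the most useful order for deciding what to split.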