Optimizing Your Data Sources
Get the best results from Chat Aid by understanding how different data sources are processed and following best practices for content organization.
Understanding Training Methods
Chat Aid uses two approaches to process your data sources, each optimized for different types of content:
Advanced Training™
What it is: Intelligent content processing that understands document structure, conversation context, and metadata relationships.
Benefits:
- 🎯 Context-Aware - Preserves relationships between related content
- 📊 Structure Recognition - Maintains document hierarchy and organization
- 🔄 Thread Continuity - Keeps conversations and discussions together
- 📝 Metadata Integration - Enriches content with contextual information
- ⚡ Optimized Retrieval - Creates ideal-sized chunks for accurate answers
Integrations with Advanced Training:
| Integration | What's Optimized |
|---|---|
| Confluence | Space organization, page structure, version history |
| Google Docs | Document structure, headings, formatting |
| Help Scout Docs | Article structure, collections, categories |
| Jira | Ticket structure, comments, status, priority, metadata |
| Microsoft Teams | Message threads, channel structure, subject lines |
| Notion | Page hierarchy, databases, properties, relationships |
| PDFs | Document layout, sections, page context |
| Slack | Conversation threads, context preservation, author information |
| Word Documents | Document structure, headings, formatting |
| Zendesk Help Center | Article structure, categories, sections, metadata |
Standard Training
What it is: Reliable content processing that works well for straightforward text and tabular data.
Benefits:
- ✅ Universal Compatibility - Works with any text-based content
- 📄 Straightforward Processing - Clean, efficient extraction
- 📊 Table Support - Handles spreadsheets and structured data
Integrations with Standard Training:
- Custom File Uploads (various formats)
- Web Pages (crawled content)
- Most other integrations not listed above
Both training methods produce excellent results. Advanced Training adds optimizations specific to the source format, which may improve answer quality or let Chat Aid answer questions it otherwise could not. Not every data type needs Advanced Training to perform well, and Standard Training is suitable for most content types.
Best Practices by Integration Type
For Communication Platforms (Slack, Teams)
✅ Do:
- Keep important conversations in dedicated channels
- Use descriptive channel names that indicate the content type
- Train on channels with substantive discussions and decisions
- Include channels where knowledge is shared (not just casual chat)
- Mark important individual messages (Slack feature)
❌ Avoid:
- Training on purely social/casual channels
- Including channels with sensitive or temporary information
- Training on channels with mostly notifications or bot messages
Why it matters: Chat Aid's Advanced Training understands conversation flow and keeps questions with their answers, making it easier to find complete context.
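To illustrate why thread structure matters, here is a minimal Python sketch of grouping messages by thread so a question stays with its replies. It is not Chat Aid's implementation. The `group_by_thread` helper and the sample messages are hypothetical, and the `ts` / `thread_ts` field names are assumed to follow the shape of a Slack message export.

```python
from collections import defaultdict


def group_by_thread(messages):
    """Group messages so each reply stays with the message that started its thread."""
    threads = defaultdict(list)
    for msg in messages:
        # A reply carries its parent's timestamp; a top-level message is its own root.
        root = msg.get("thread_ts") or msg["ts"]
        threads[root].append(msg)

    chunks = []
    for root in sorted(threads, key=float):
        # Order each thread chronologically, then render it as one block of text.
        ordered = sorted(threads[root], key=lambda m: float(m["ts"]))
        chunks.append("\n".join(f'{m["user"]}: {m["text"]}' for m in ordered))
    return chunks


# Made-up sample data: one question, one unrelated message, one threaded answer.
messages = [
    {"ts": "1.0", "user": "ana", "text": "How do I reset my VPN token?"},
    {"ts": "2.0", "user": "raj", "text": "Lunch at noon?"},
    {"ts": "3.0", "thread_ts": "1.0", "user": "lee", "text": "Open the IT portal and choose Reset Token."},
]
chunks = group_by_thread(messages)
```

In this sketch the answer from `lee` ends up in the same chunk as the question from `ana`, even though an unrelated message was posted in between. Answers posted outside the thread would not be grouped this way, which is one reason replying in threads helps.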
For Ticketing Systems (Jira)
✅ Do:
- Include projects with important technical decisions and resolutions
- Keep ticket descriptions clear and detailed
- Add meaningful comments that provide context
- Use custom fields to capture important metadata
- Keep ticket statuses and priorities up-to-date
- Link related tickets when appropriate
❌ Avoid:
- Training on test or demo projects
- Including tickets with sensitive customer data
- Training on very old, archived projects unless historically relevant
- Training on projects with mostly spam or invalid tickets
Why it matters: Advanced Training preserves ticket structure, keeps descriptions with their comments, and integrates metadata (status, priority, assignee) for better context understanding.
For Knowledge Bases (Notion, Confluence, Help Scout Docs, Zendesk Help Center)
✅ Do:
- Maintain clear heading hierarchy (H1 → H2 → H3)
- Use descriptive page titles and section headings
- Keep related content within the same space/collection
- Include metadata (tags, categories, status)
- Link related pages for context
- Keep pages up-to-date and archive outdated content
❌ Avoid:
- Overly long pages (consider breaking into sub-pages)
- Duplicate content across multiple pages
- Mixing unrelated topics on single pages
- Pages with only images and no text
Why it matters: Advanced Training preserves your content structure and metadata, helping Chat Aid understand the context and relationships between topics.
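If your pages can be exported as Markdown, a small script can flag headings that skip a level (for example, H2 straight to H4) before you train. The sketch below is only an illustration: `find_skipped_levels` is a hypothetical helper, it assumes `#`-style headings, and it would also match `#` lines inside code blocks.

```python
import re

# Matches Markdown headings such as "## Accounts" (one to six "#" characters).
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def find_skipped_levels(markdown_text):
    """Return (line_number, title, previous_level, level) for each heading that skips a level."""
    problems = []
    previous_level = 0
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Going deeper by more than one level breaks the H1 -> H2 -> H3 chain.
        if level > previous_level + 1:
            problems.append((line_number, match.group(2).strip(), previous_level, level))
        previous_level = level
    return problems


# Made-up sample page: "Password rules" jumps from H2 to H4.
page = """# Onboarding
## Accounts
#### Password rules
## Hardware
"""
problems = find_skipped_levels(page)
```

For the sample page, `problems` contains a single entry pointing at line 3, where the heading jumps from level 2 to level 4.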
For Documents (PDFs, Word, Google Docs)
✅ Do:
- Use clear headings and subheadings
- Structure documents with logical sections
- Include a table of contents for longer documents
- Use descriptive file names
- Keep documents focused on specific topics
- Update documents rather than creating new versions
❌ Avoid:
- Scanned PDFs without legible text
- Documents with complex layouts that obscure text
- Image-only documents
- Password-protected files (we can't read these)
Why it matters: Advanced Training recognizes document structure, making it easier for Chat Aid to cite specific sections and maintain context.
For Spreadsheets (Excel, Google Sheets)
✅ Do:
- Use clear column headers
- Keep one type of data per sheet
- Include a description row or summary
- Use descriptive sheet names
- Keep data organized and consistent
- Remove sensitive columns before training
❌ Avoid:
- Spreadsheets with merged cells or complex formatting
- Multiple unrelated tables on one sheet
- Pivot tables (not supported)
Why it matters: Tabular data is processed row by row to maintain data integrity and relationships.
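To see why clear column headers matter, here is a rough Python sketch of row-by-row processing. It is an assumption about the general technique, not Chat Aid's actual pipeline. The `rows_to_records` helper and the sample sheet are made up for illustration.

```python
import csv
import io


def rows_to_records(csv_text):
    """Turn each data row into a standalone line of 'header: value' pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Pair every non-empty cell with its column header so the row reads on its own.
        parts = [f"{header}: {value}" for header, value in row.items() if value]
        records.append("; ".join(parts))
    return records


# Made-up sample sheet exported as CSV.
sheet = "Plan,Monthly price (USD),Seats\nStarter,29,5\nTeam,99,25\n"
records = rows_to_records(sheet)
```

Each record carries its own headers, so a row such as the `Team` plan can be understood without the rest of the sheet. With vague headers like `Col1`, or with merged cells that leave headers blank, the same row would lose most of its meaning.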
For Web Pages
✅ Do:
- Train on pages with substantial text content
- Include your main website pages (About, Product, FAQ)
- Train on blog posts and documentation
- Set appropriate crawl depth for your site structure
- Keep content fresh and updated
❌ Avoid:
- Pages behind login walls (unless using authenticated crawling)
- Pages that are heavily JavaScript-rendered
- Sites with constantly changing content
- Pages with mostly navigation elements
- Pages with collapsible sections that are collapsed by default
Why it matters: Web content is extracted as-is, so well-structured pages with clear content produce better results.
General Optimization Tips
Content Quality
- Use Clear Language
  - Write in complete sentences
  - Define acronyms on first use
  - Use consistent terminology
  - Avoid jargon when possible
- Structure Your Content
  - Break up long text with headings
  - Use bullet points for lists
  - Include examples where helpful
  - Add context for technical concepts
- Keep Content Current
  - Archive or remove outdated information
  - Update documents rather than creating duplicates
  - Use auto-retrain to keep content fresh
  - Review and refresh regularly
Content Organization
- Logical Grouping
  - Group related documents in the same folders/spaces
  - Use consistent naming conventions
  - Organize by topic or department
  - Use tags or categories where available
- Avoid Duplication
  - Don't train on multiple copies of the same content
  - Link to original sources instead of copying
  - Consolidate information when possible
  - Use one source of truth for each topic
- Metadata Matters
  - Use descriptive titles
  - Add tags and categories
  - Include author information where relevant
  - Add descriptions for folders/spaces
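For the duplication tip above, one way to spot exact copies before training is to compare content hashes. The Python sketch below is a hypothetical pre-upload check, not a Chat Aid feature. `find_duplicates` and the sample documents are invented for illustration, and the check only catches copies that match after lowercasing and whitespace normalization.

```python
import hashlib
from collections import defaultdict


def find_duplicates(documents):
    """Given {name: text}, return groups of names whose normalized text is identical."""
    by_digest = defaultdict(list)
    for name, text in documents.items():
        # Ignore differences in letter case and whitespace when comparing.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up sample documents: the second is a reformatted copy of the first.
documents = {
    "handbook.md": "Expense reports are due on the 5th.",
    "handbook_copy.md": "Expense  reports are due\non the 5th.",
    "travel.md": "Book flights through the travel portal.",
}
duplicates = find_duplicates(documents)
```

Here `duplicates` groups `handbook.md` with `handbook_copy.md`. Near-duplicates with edited wording would need a fuzzier comparison, so treat this as a first pass only.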
Training Configuration
Choose the Right Sources
High Priority:
- Official documentation
- Policy and procedure documents
- Product specifications
- FAQ and help content
- Customer support resources
Medium Priority:
- Team wikis
- Meeting notes with decisions
- Project documentation
- Training materials
Low Priority / Not Recommended:
- Casual conversation channels
- Temporary project files
- Outdated archives
- Personal notes
Set Appropriate Auto-Retrain
- Daily: Frequently updated documentation, active channels
- Weekly: Regularly updated content, active knowledge bases
- Monthly: Stable documentation, reference materials
- Never: Archived content, one-time documents
Learn more: Auto-Retrain Setup →
Improving Answer Quality
If Answers Are Too Generic
Possible causes:
- Content is too broad or unfocused
- Multiple conflicting sources on the same topic
- Lack of specific examples or details
Solutions:
- Consolidate information into more detailed documents
- Remove or archive conflicting/outdated content
- Add specific examples and use cases
- Include more context in your content
If Answers Can't Be Found
Possible causes:
- Information is not in the training data
- Content is too scattered across sources
- Key terms are missing from content
- Content hasn't been indexed yet
Solutions:
- Add the missing information to your sources
- Consolidate related information
- Use terms your users actually search for
- Trigger a manual retrain after adding content
If Answers Are Incorrect
Possible causes:
- Outdated content in your sources
- Conflicting information across sources
- Content is ambiguous or unclear
Solutions:
- Update or remove outdated content
- Establish a single source of truth
- Use auto-retrain to stay current
- Make content more explicit and clear