Train on your website
Crawling, page limits and keeping answers fresh
How crawling works
Chatixy starts at the URL you provide and follows internal links up to your plan’s page limit, indexing the visible text content of each page. Pages behind logins, paywalls or robots.txt blocks are skipped.
The crawl usually finishes in a few minutes for typical marketing sites; very large sites are processed in the background and your agent improves as pages are added.
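To make the crawl behaviour concrete, here is a minimal sketch of a link-following crawl with a page limit. It is not Chatixy's implementation: the breadth-first order, the `crawl` function, the `get_links` and `is_blocked` callbacks, and the in-memory `SITE` map are all assumptions made for illustration, and the example never touches the network.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, get_links, page_limit, is_blocked=lambda url: False):
    """Breadth-first crawl of internal links, stopping at page_limit pages.

    get_links(url) returns the hrefs found on a page (hypothetical callback).
    is_blocked(url) stands in for login, paywall and robots.txt checks.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    while queue and len(indexed) < page_limit:
        url = queue.popleft()
        if is_blocked(url):
            continue  # assumption: skipped pages do not count toward the limit
        indexed.append(url)
        for href in get_links(url):
            link = urljoin(url, href).split("#")[0]  # resolve and drop fragments
            # Only follow links on the same host, and only once each.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# A tiny fake site: page URL -> links found on that page.
SITE = {
    "https://example.com/": ["/pricing", "/docs", "https://other.example.org/"],
    "https://example.com/pricing": ["/"],
    "https://example.com/docs": ["/docs/setup", "/account"],
    "https://example.com/docs/setup": [],
    "https://example.com/account": [],
}

pages = crawl(
    "https://example.com/",
    lambda url: SITE.get(url, []),
    page_limit=3,
    is_blocked=lambda url: url.endswith("/account"),
)
print(pages)
# ['https://example.com/', 'https://example.com/pricing', 'https://example.com/docs']
```

With a limit of 3, the crawl stops before reaching /docs/setup. Raising the limit picks that page up, while /account stays out because it is treated as blocked.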
Excluding pages
You can exclude paths you never want the agent to learn from — careers pages, legal archives, internal tools.
- Add path patterns like /blog/archive/* in the project’s knowledge settings
- Excluded pages are removed from the index on the next crawl
- robots.txt disallow rules are always respected
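The sketch below shows one way exclusion patterns and robots.txt rules could combine into a single "may this page be indexed?" check. The exact pattern syntax Chatixy uses is not documented here, so shell-style wildcards via Python's `fnmatch` are an assumption, as are the names `EXCLUDE_PATTERNS`, `ROBOTS_TXT`, `is_excluded`, `may_index` and the `ExampleBot` user agent.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion patterns, as they might be entered in knowledge settings.
EXCLUDE_PATTERNS = ["/blog/archive/*", "/careers/*", "/legal/*"]

# Hypothetical robots.txt content for the site being crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """True if the URL's path matches any exclusion pattern."""
    path = urlparse(url).path
    # In fnmatch, "*" matches any characters, including "/".
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def may_index(url, user_agent="ExampleBot"):
    """A page is indexable only if robots.txt allows it and no pattern excludes it."""
    return robots.can_fetch(user_agent, url) and not is_excluded(url)


print(may_index("https://example.com/pricing"))                 # True
print(may_index("https://example.com/blog/archive/2019/post"))  # False
print(may_index("https://example.com/internal/tool"))           # False
```

Note that under these matching rules /blog/archive/* does not match /blog/archive itself, since the pattern requires the trailing slash.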
Keeping answers fresh
Chatixy re-crawls your site automatically on a schedule that depends on your plan. You can also trigger a manual re-crawl at any time from the knowledge tab — useful right after you publish new docs or change pricing.
Tip
- Trigger a manual re-crawl after big site updates so the agent never quotes stale content.
Important
- Removing a page from your site does not instantly remove it from the index — it disappears on the next crawl, or you can delete the source manually.
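The following sketch illustrates why a deleted page lingers until the next crawl: the index only changes when a crawl result is applied to it. The `apply_recrawl` function and the dictionary-based index are hypothetical stand-ins, not Chatixy's data model.

```python
def apply_recrawl(index, crawled):
    """Replace the index with the latest crawl and report what changed.

    index and crawled both map a page path to its extracted text.
    """
    added = sorted(set(crawled) - set(index))
    removed = sorted(set(index) - set(crawled))
    changed = sorted(
        path for path in crawled if path in index and index[path] != crawled[path]
    )
    # Pages missing from the crawl drop out here, not when they leave the site.
    new_index = dict(crawled)
    return new_index, {"added": added, "changed": changed, "removed": removed}


index = {"/pricing": "Pro plan: $20", "/old-promo": "Spring sale", "/docs": "Setup guide"}
crawled = {"/pricing": "Pro plan: $25", "/docs": "Setup guide", "/changelog": "v2 released"}

index, diff = apply_recrawl(index, crawled)
print(diff)
# {'added': ['/changelog'], 'changed': ['/pricing'], 'removed': ['/old-promo']}
```

Until `apply_recrawl` runs, the agent in this model would still answer from the old /old-promo and /pricing text, which is the situation a manual re-crawl or a manual source deletion avoids.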