Train on your website

Crawling, page limits and keeping answers fresh

How crawling works

Chatixy starts at the URL you provide and follows internal links up to your plan’s page limit, indexing the visible text content of each page. Pages behind logins, paywalls or robots.txt blocks are skipped.
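To make the crawl order concrete, here is a minimal sketch of a breadth-first crawl that stays on one host, stops at a page limit, honours robots.txt and skips unreadable pages. It is an illustration under stated assumptions, not Chatixy's implementation. The `crawl` function and the `fetch` callback are hypothetical. `fetch` stands in for downloading a page and extracting its visible text and links, and it returns `None` for pages that cannot be read, such as those behind a login or paywall. Only the standard-library `urllib` modules are real APIs.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical fetcher: returns (visible_text, links) or None if the page is unreadable.
Fetcher = Callable[[str], Optional[Tuple[str, List[str]]]]


def crawl(start_url: str, page_limit: int, fetch: Fetcher,
          robots_lines: List[str], user_agent: str = "*") -> Dict[str, str]:
    """Breadth-first crawl of internal links, indexing at most `page_limit` pages."""
    robots = RobotFileParser()
    robots.parse(robots_lines)              # rules from the site's robots.txt
    host = urlparse(start_url).netloc       # "internal" means same host as the start URL
    index: Dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}

    while queue and len(index) < page_limit:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue                        # disallowed by robots.txt: skip
        page = fetch(url)
        if page is None:
            continue                        # login, paywall or error: skip
        text, links = page
        index[url] = text
        for href in links:
            # Resolve relative links and drop #fragments so one page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

For example, with `Disallow: /private/` in robots.txt, a start page linking to `/about`, `/private/admin` and an external site yields an index containing the start page and `/about` but neither the private nor the external page. Breadth-first order is an assumption here; it means pages closest to the start URL are indexed first when the page limit is reached.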

The crawl usually finishes in a few minutes for typical marketing sites. Very large sites are processed in the background, and your agent's answers improve as more pages are indexed.

Excluding pages

You can exclude paths you never want the agent to learn from — careers pages, legal archives, internal tools.

Add path patterns like /blog/archive/* in the project’s knowledge settings
Excluded pages are removed from the index on the next crawl
robots.txt disallow rules are always respected
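The exclusion rules above can be pictured as a path-pattern filter applied to each page's URL path. The sketch below is illustrative only: `is_excluded` and `apply_exclusions` are hypothetical helpers, and it assumes shell-style wildcards where `*` matches any run of characters, including `/`, which is how Python's `fnmatch` behaves. Chatixy's actual matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List
from urllib.parse import urlparse


def is_excluded(url: str, patterns: Iterable[str]) -> bool:
    """True if the URL's path matches any exclusion pattern (case-sensitive)."""
    path = urlparse(url).path or "/"
    # fnmatchcase: '*' also matches '/', so '/blog/archive/*' covers nested pages.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def apply_exclusions(urls: Iterable[str], patterns: List[str]) -> List[str]:
    """Keep only URLs that no exclusion pattern matches."""
    return [url for url in urls if not is_excluded(url, patterns)]
```

With the patterns `/blog/archive/*` and `/careers/*`, a page such as `/blog/archive/2019/old-post` is filtered out while `/blog/new-post` is kept. Note that `/blog/archive` itself does not match `/blog/archive/*`, so in this sketch you would add it as a separate pattern if you wanted it excluded too.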

Keeping answers fresh

Chatixy re-crawls your site automatically on a schedule that depends on your plan. You can also trigger a manual re-crawl at any time from the knowledge tab — useful right after you publish new docs or change pricing.

Tip

Trigger a manual re-crawl after big site updates so the agent never quotes stale content.

Important

Removing a page from your site does not instantly remove it from the index — it disappears on the next crawl, or you can delete the source manually.
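Why a deleted page lingers becomes clearer with a toy model of the index. In this hypothetical sketch (the `KnowledgeIndex` class and its methods are illustrative and not part of any Chatixy API), the index only changes when a crawl result is applied or a source is deleted by hand. Between crawls, a page removed from the live site is still in the index and can still be quoted.

```python
from typing import Dict, Set


class KnowledgeIndex:
    """Toy model: the index changes only on a crawl or a manual delete."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}  # URL path -> indexed text

    def apply_crawl(self, crawled: Dict[str, str]) -> Set[str]:
        """Replace the index with the latest crawl; return paths that dropped out."""
        removed = set(self.pages) - set(crawled)
        self.pages = dict(crawled)
        return removed

    def delete_source(self, path: str) -> bool:
        """Manually remove one page right away; True if it was indexed."""
        return self.pages.pop(path, None) is not None
```

For example, after one crawl indexes `/docs` and `/old-pricing`, deleting `/old-pricing` from the site leaves it in `pages` until the next `apply_crawl`, which then reports it as removed. Calling `delete_source("/old-pricing")` instead drops it immediately.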
