Website Crawling
Use Data Sources -> Website to ingest website content.
Configuration
Set the website URL and optional page limit.
- Default max pages: 100
- Maximum accepted in UI: 5000
After saving, a crawl/index process runs and status updates are visible in the website dashboard.
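The page-limit behavior above can be sketched as a small helper. This is an illustrative assumption, not the product's actual API; only the two numbers (default 100, UI maximum 5000) come from this page.

```python
# Hypothetical helper illustrating the documented page limits.
# The function and constant names are assumptions for illustration.
DEFAULT_MAX_PAGES = 100   # default page limit
UI_MAX_PAGES = 5000       # maximum the UI accepts

def resolve_page_limit(requested=None):
    """Fall back to the default, and clamp requests to the UI maximum."""
    if requested is None:
        return DEFAULT_MAX_PAGES
    if requested < 1:
        raise ValueError("page limit must be at least 1")
    return min(requested, UI_MAX_PAGES)

print(resolve_page_limit())      # 100
print(resolve_page_limit(9000))  # 5000
```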
Crawl Status Stages
Typical lifecycle:
queued -> crawling -> indexing -> complete (or error)
The UI surfaces pages crawled, pages indexed, and total chunks while the process runs.
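The lifecycle above can be modeled as a simple state machine. The stage names match this page; the transition rules themselves are an illustrative assumption (any stage may fail into `error`).

```python
# Sketch of the crawl lifecycle: queued -> crawling -> indexing -> complete,
# with error reachable from any in-progress stage. Transition logic is
# an assumption for illustration, not the product's internals.
VALID_TRANSITIONS = {
    "queued": {"crawling", "error"},
    "crawling": {"indexing", "error"},
    "indexing": {"complete", "error"},
    "complete": set(),  # terminal
    "error": set(),     # terminal
}

def advance(current, nxt):
    """Move to the next stage, rejecting transitions the lifecycle forbids."""
    if nxt not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {nxt}")
    return nxt

stage = "queued"
for nxt in ("crawling", "indexing", "complete"):
    stage = advance(stage, nxt)
print(stage)  # complete
```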
Recommended Scope Strategy
- Start with full website for initial launch.
- Use nested paths for domain-specific agents.
- Exclude low-value pages by disabling them during review.
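The "nested paths" tip amounts to filtering crawled URLs by path prefix. A minimal sketch, assuming hypothetical URLs and a helper name of my own choosing:

```python
# Illustrative path scoping: keep only URLs under one nested path,
# as when building a domain-specific agent from /docs/ content.
# The URLs and helper name are hypothetical examples.
from urllib.parse import urlparse

def in_scope(url, allowed_prefix):
    """True if the URL's path falls under the allowed prefix."""
    return urlparse(url).path.startswith(allowed_prefix)

urls = [
    "https://example.com/docs/getting-started",
    "https://example.com/blog/launch-post",
]
scoped = [u for u in urls if in_scope(u, "/docs/")]
print(scoped)  # ['https://example.com/docs/getting-started']
```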
Quality Tips
- Prioritize pages with definitive product, policy, and contact information.
- Re-crawl after major site updates.
- Use FAQ snippets for policy answers that must be exact.