
Website Crawling

Use Data Sources -> Website to ingest website content.

Configuration

Set the website URL and an optional page limit.

  • Default max pages: 100
  • Maximum accepted in the UI: 5000
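A minimal sketch of how the page-limit rules above could be enforced; the function name and the exact validation behavior are illustrative assumptions, not the product's actual implementation:

```python
# Hypothetical helper mirroring the UI's page-limit rules:
# an omitted limit falls back to 100, and values outside
# 1..5000 are rejected.
DEFAULT_MAX_PAGES = 100
UI_MAX_PAGES = 5000

def resolve_page_limit(requested=None):
    """Return the effective page limit for a crawl request."""
    if requested is None:
        return DEFAULT_MAX_PAGES
    if not 1 <= requested <= UI_MAX_PAGES:
        raise ValueError(f"page limit must be between 1 and {UI_MAX_PAGES}")
    return requested

print(resolve_page_limit())       # 100
print(resolve_page_limit(2500))   # 2500
```

Leaving the limit unset keeps crawls small by default; raising it is an explicit choice up to the UI ceiling.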

After saving, a crawl and indexing process runs; status updates appear in the website dashboard.

Crawl Status Stages

Typical lifecycle:

  • queued
  • crawling
  • indexing
  • complete
  • error

While the process runs, the UI surfaces counts of pages crawled, pages indexed, and total chunks.
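The lifecycle above can be modeled as a small state machine; this is a sketch under the assumption that the stages only move forward and that an error can interrupt any active stage (the class and table names are invented for illustration):

```python
from enum import Enum

class CrawlStatus(Enum):
    QUEUED = "queued"
    CRAWLING = "crawling"
    INDEXING = "indexing"
    COMPLETE = "complete"
    ERROR = "error"

# Assumed forward-only transitions; "error" is reachable from
# any active stage, and "complete"/"error" are terminal.
TRANSITIONS = {
    CrawlStatus.QUEUED:   {CrawlStatus.CRAWLING, CrawlStatus.ERROR},
    CrawlStatus.CRAWLING: {CrawlStatus.INDEXING, CrawlStatus.ERROR},
    CrawlStatus.INDEXING: {CrawlStatus.COMPLETE, CrawlStatus.ERROR},
    CrawlStatus.COMPLETE: set(),
    CrawlStatus.ERROR:    set(),
}

def can_transition(current, nxt):
    """True if the crawl may move from `current` to `nxt`."""
    return nxt in TRANSITIONS[current]

print(can_transition(CrawlStatus.QUEUED, CrawlStatus.CRAWLING))   # True
print(can_transition(CrawlStatus.COMPLETE, CrawlStatus.CRAWLING)) # False
```

A model like this is useful when polling the status from a script: it distinguishes a normal stage change from an unexpected one worth logging.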

Scoping Recommendations

  • Start with the full website for the initial launch.
  • Use nested paths for domain-specific agents.
  • Exclude low-value pages by disabling them during review.
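The nested-path recommendation above amounts to a prefix check on the URL path. A minimal sketch, assuming a hypothetical `in_scope` helper and an example `/docs/` base path (substitute the section of your site the agent should cover):

```python
from urllib.parse import urlparse

def in_scope(url, base_path="/docs/"):
    """Check whether a URL falls under the nested path chosen for the agent.

    `base_path` is an illustrative example, not a product setting.
    """
    path = urlparse(url).path
    # Normalize so "/docs" and "/docs/" compare the same way.
    if not path.endswith("/"):
        path += "/"
    return path.startswith(base_path)

print(in_scope("https://example.com/docs/setup"))  # True
print(in_scope("https://example.com/blog/news"))   # False
```

Scoping by path keeps a domain-specific agent from ingesting unrelated sections such as a blog or careers pages.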

Quality Tips

  • Prioritize pages with definitive product, policy, and contact information.
  • Re-crawl after major site updates.
  • Use FAQ snippets for policy answers that must be exact.