Website Crawling
Use Data Sources -> Website to ingest website content.
Configuration
Set the website URL and optional page limit.
- Default max pages: 100
- Maximum accepted in UI: 5000
After saving, a crawl/index process runs and status updates are visible in the website dashboard.
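The page-limit behavior above can be sketched as a small helper. This is an illustrative assumption, not the product's actual API; only the two numbers (default 100, UI maximum 5000) come from this page.

```python
# Hypothetical helper illustrating the documented page limits.
# The function and constant names are assumptions for illustration.
DEFAULT_MAX_PAGES = 100   # default page limit
UI_MAX_PAGES = 5000       # maximum the UI accepts

def resolve_page_limit(requested=None):
    """Fall back to the default, and clamp requests to the UI maximum."""
    if requested is None:
        return DEFAULT_MAX_PAGES
    if requested < 1:
        raise ValueError("page limit must be at least 1")
    return min(requested, UI_MAX_PAGES)

print(resolve_page_limit())      # 100
print(resolve_page_limit(9000))  # 5000
```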
Crawl Status Stages
Typical lifecycle:
queued -> crawling -> indexing -> complete (or error)
The UI surfaces pages crawled, pages indexed, and total chunks while the process runs.
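The lifecycle above can be modeled as a simple state machine. The stage names match this page; the transition rules themselves are an illustrative assumption (any stage may fail into `error`).

```python
# Sketch of the crawl lifecycle: queued -> crawling -> indexing -> complete,
# with error reachable from any in-progress stage. Transition logic is
# an assumption for illustration, not the product's internals.
VALID_TRANSITIONS = {
    "queued": {"crawling", "error"},
    "crawling": {"indexing", "error"},
    "indexing": {"complete", "error"},
    "complete": set(),  # terminal
    "error": set(),     # terminal
}

def advance(current, nxt):
    """Move to the next stage, rejecting transitions the lifecycle forbids."""
    if nxt not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {nxt}")
    return nxt

stage = "queued"
for nxt in ("crawling", "indexing", "complete"):
    stage = advance(stage, nxt)
print(stage)  # complete
```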
Recommended Scope Strategy
- Start with full website for initial launch.
- Use nested paths for domain-specific agents.
- Exclude low-value pages by disabling them during review.
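The "nested paths" tip amounts to filtering crawled URLs by path prefix. A minimal sketch, assuming hypothetical URLs and a helper name of my own choosing:

```python
# Illustrative path scoping: keep only URLs under one nested path,
# as when building a domain-specific agent from /docs/ content.
# The URLs and helper name are hypothetical examples.
from urllib.parse import urlparse

def in_scope(url, allowed_prefix):
    """True if the URL's path falls under the allowed prefix."""
    return urlparse(url).path.startswith(allowed_prefix)

urls = [
    "https://example.com/docs/getting-started",
    "https://example.com/blog/launch-post",
]
scoped = [u for u in urls if in_scope(u, "/docs/")]
print(scoped)  # ['https://example.com/docs/getting-started']
```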
Quality Tips
- Prioritize pages with definitive product, policy, and contact information.
- Re-crawl after major site updates.
- Use FAQ snippets for policy answers that must be exact.