Web Crawl Tool
Last updated: Jan 2026
Overview
The Web Crawl tool retrieves content from specific URLs or crawls multiple pages from a website by following links. Ideal for analyzing competitors, gathering product information, or monitoring website content.
Powered by Tavily
Web crawling is powered by Tavily's crawl API, optimized for AI applications. No API key configuration required - it works out of the box.
How It Works
When you provide a URL in your prompt and enable the Web Crawl tool, the AI can visit that page and optionally follow links to gather more content.
Enable the Web Crawl Tool
Add the Web Crawl tool to your LLM step from the tool configuration panel.
Provide the Target URL
Include the URL you want to crawl in your prompt. Specify if you want to follow links or just analyze the single page.
AI Processes the Content
The AI receives the page content and uses it to complete your task.
Example prompt:
Visit https://example.com/products and extract information about all their product offerings. Include prices, features, and availability.
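Behind the scenes, the tool turns a request like this into a crawl call against Tavily's API. The sketch below is purely illustrative - it assumes Tavily's public /crawl endpoint and the parameter names described in the next section, and the platform manages the API key for you, so none of this code is something you need to write.

```python
import requests

# Rough sketch of the crawl request issued on your behalf (illustrative only).
# The endpoint and field names follow Tavily's crawl API; the platform
# supplies the API key, so you never configure one in your workflow.
payload = {
    "url": "https://example.com/products",   # taken from your prompt or the url setting
    "instructions": "Extract product names, prices, features, and availability",
    "max_depth": 2,      # how far to follow links (see Crawl Limits)
    "max_breadth": 10,   # how many links to follow per level
    "limit": 10,         # hard cap on total pages
}

response = requests.post(
    "https://api.tavily.com/crawl",
    headers={"Authorization": "Bearer <platform-managed-key>"},
    json=payload,
    timeout=60,
)

# Each result is expected to carry the page URL and its extracted content,
# which is then handed to the model as context for your task.
for page in response.json().get("results", []):
    print(page.get("url"))
```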
Parameters
Configure these parameters in the tool settings panel to control how the crawler navigates and extracts content from websites.
URL Settings
| Parameter | Description | Default |
|---|---|---|
| url | Override the base URL to crawl. Leave empty to use the LLM-provided URL. | - |
| instructions | Instructions for the crawler (what to look for, focus areas). Helps guide which pages are most relevant. | - |
Crawl Limits
Control how far and wide the crawler goes from the starting URL.
| Parameter | Description | Default |
|---|---|---|
| max_depth | Maximum depth to crawl from the starting URL (1-10). Depth 1 means only the starting page and its direct links. | 2 |
| max_breadth | Maximum number of pages per level (1-50). Controls how many links to follow at each depth level. | 10 |
| limit | Maximum total pages to crawl (1-100). Hard limit across all depth levels. | 10 |
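The three limits interact: max_depth and max_breadth bound how far the crawl can branch, while limit is always the hard ceiling. A quick back-of-the-envelope sketch (an upper bound only - the real crawler may visit fewer pages):

```python
# Upper bound on pages reachable with a given depth and breadth,
# capped by the hard page limit. Illustrative arithmetic only.
def max_pages(max_depth: int, max_breadth: int, limit: int) -> int:
    reachable = sum(max_breadth ** level for level in range(max_depth + 1))
    return min(reachable, limit)

print(max_pages(max_depth=2, max_breadth=10, limit=10))   # 10 -> the limit wins
print(max_pages(max_depth=1, max_breadth=5, limit=100))   # 6  -> starting page + 5 links
```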
Path Filters
Include or exclude specific paths to focus the crawl on relevant content.
| Parameter | Description | Example |
|---|---|---|
| select_paths | Only crawl pages matching these paths | /docs, /api |
| exclude_paths | Skip pages matching these paths | /blog, /archive |
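For intuition, here is a minimal sketch of how the include/exclude filters behave, assuming the patterns are matched against each page's URL path, exclusions take priority, and an empty select list allows everything. The crawler's exact matching rules may differ.

```python
import re

select_paths = ["/docs", "/api"]       # only crawl matching paths
exclude_paths = ["/blog", "/archive"]  # always skip these

def should_crawl(path: str) -> bool:
    # Exclusions win; if select_paths is set, a page must match one of them.
    if any(re.search(pattern, path) for pattern in exclude_paths):
        return False
    if select_paths:
        return any(re.search(pattern, path) for pattern in select_paths)
    return True

print(should_crawl("/docs/getting-started"))  # True
print(should_crawl("/blog/2024-roundup"))     # False
```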
Domain Filters
Control which domains the crawler can access.
| Parameter | Description | Default |
|---|---|---|
| select_domains | Only crawl within these domains | - |
| exclude_domains | Exclude these domains from crawling | - |
| allow_external | Follow links to external domains | false |
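Domain filters work the same way at the host level. A hedged sketch, assuming allow_external=false keeps the crawler on the starting domain unless select_domains says otherwise (the hostnames below are placeholders):

```python
from urllib.parse import urlparse

start_domain = "docs.example.com"       # domain of the starting URL
select_domains = ["docs.example.com"]   # only crawl within these domains
allow_external = False                  # do not follow links off-domain

def domain_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    if select_domains:
        return host in select_domains
    return allow_external or host == start_domain

print(domain_allowed("https://docs.example.com/api/auth"))  # True
print(domain_allowed("https://social.example.net/share"))   # False
```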
Content Options
Control how content is extracted from crawled pages.
| Parameter | Description | Default |
|---|---|---|
| extract_depth | Depth of content extraction per page. Basic extracts main content, Advanced includes more details. | basic |
| format | Output format: "markdown" preserves formatting, "text" returns plain text. | markdown |
| include_images | Include images from crawled pages | false |
| include_favicon | Include favicon URLs for crawled pages | false |
| categories | Filter pages by content categories | - |
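Putting the groups together, a focused documentation crawl might look like the settings below. This is an illustrative combination only - the values mirror the tool settings panel, and docs.example.com is a placeholder.

```python
# Example settings for a focused documentation crawl (illustrative values).
crawl_settings = {
    "url": "https://docs.example.com",
    "instructions": "Collect API reference pages and authentication guides",
    "max_depth": 2,
    "max_breadth": 20,
    "limit": 50,
    "select_paths": ["/docs", "/api"],
    "exclude_paths": ["/blog"],
    "allow_external": False,
    "extract_depth": "basic",
    "format": "markdown",
    "include_images": False,
}
```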
Efficient Crawling
Use path and domain filters to focus your crawl on relevant sections. This improves speed, reduces costs, and produces more relevant results.
Crawl Types
The Web Crawl tool supports two crawling modes depending on your needs.
Single Page Crawl
Retrieve content from a single URL. Fast and efficient for analyzing specific pages like a product page, blog post, or documentation page.
Multi-Page Crawl
Follow links from a starting URL to gather content from multiple related pages. Useful for comprehensive site analysis or documentation gathering.
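One way to think about the two modes is in terms of the crawl limits above - roughly, a single-page crawl pins the limits to one page, while a multi-page crawl leaves room to branch. Illustrative values only:

```python
# Illustrative limit settings for each mode.
single_page_crawl = {"max_depth": 1, "max_breadth": 1, "limit": 1}    # just the starting URL
multi_page_crawl  = {"max_depth": 2, "max_breadth": 10, "limit": 25}  # follow links across the site
```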
Rate Limiting
Web crawling is rate-limited to prevent abuse. For large-scale crawling needs, break your task into multiple workflow runs or focus on specific sections of a site.
Use Cases
Common scenarios where Web Crawl excels:
| Use Case | Description |
|---|---|
| Competitor Analysis | Analyze competitor websites for pricing, features, and positioning |
| Product Research | Gather product specifications, reviews, and availability from e-commerce sites |
| Documentation Synthesis | Crawl technical documentation to create summaries or answer questions |
| Content Monitoring | Track websites for news, pricing updates, or other content changes |
| Data Collection | Gather structured data from websites for analysis or reporting |
Best Practices
Follow these guidelines for effective web crawling:
- Provide specific URLs when you know the exact pages you need
- Limit crawl scope to relevant sections of a site
- Use single page crawl for focused analysis
- Specify what information you want to extract from the pages
- Consider using Content Extract for cleaner results on single pages
When to Use Web Crawl vs. Other Tools
| Tool | Best For |
|---|---|
| Web Search | Finding pages on a topic when you don't have specific URLs |
| Web Crawl | Gathering content from known URLs, following links across a site |
| Content Extract | Getting clean, structured content from a single specific page |
Key Takeaways
- Web Crawl visits URLs and optionally follows links
- No configuration required - works automatically
- Best for site analysis, competitor research, and documentation
- Rate-limited to prevent abuse - scope crawls appropriately
- Use Content Extract for cleaner single-page results