katana-web-crawling
Katana web crawling
Katana is a fast crawler/spider from ProjectDiscovery, aimed at automation pipelines (URLs in → discovered endpoints out). Official docs and flags: repository README and `katana -h`.
Scope and ethics
Use only on systems you own or are explicitly authorized to test (contract, bug bounty program rules, internal environment). Crawl gently: set concurrency, rate limits, and depth to reduce load. Misuse can violate law and terms of service; you are responsible for your actions (the tool ships with that warning).
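
The "crawl gently" advice can be sketched as a small shell wrapper. The function name `gentle_crawl`, the `KATANA_*` environment variables, and the default limits (depth 2, 5 fetchers, 10 requests per second) are illustrative assumptions, not upstream recommendations; `-d`, `-c`, and `-rl` are the depth, concurrency, and rate-limit flags, which are worth confirming with `katana -h` on your installed version.

```bash
#!/bin/sh
# Hypothetical wrapper: conservative defaults for an authorized target.
# The numbers are assumptions for illustration; tune them per engagement.
gentle_crawl() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: gentle_crawl <url>" >&2
        return 2
    fi
    # -d: max crawl depth, -c: parallel fetchers, -rl: max requests per second
    katana -u "$target" \
        -d "${KATANA_DEPTH:-2}" \
        -c "${KATANA_CONCURRENCY:-5}" \
        -rl "${KATANA_RATE:-10}" \
        -silent
}
```

Usage: `gentle_crawl https://example.com`, or `KATANA_RATE=3 gentle_crawl https://example.com` for a more fragile target.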
Installation
Go (requires Go 1.25+ per upstream; verify current README if install fails):

```bash
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
```

Docker:

```bash
docker pull projectdiscovery/katana:latest
docker run projectdiscovery/katana:latest -u https://example.com
```

Headless in Docker often needs `-system-chrome` and Chrome/Chromium available; see the upstream Docker section.
Input
- Single/multiple URLs: `-u https://a.com` or comma-separated URLs
- File: `-list urls.txt`
- STDIN: `echo https://example.com | katana` or `cat domains | httpx | katana`
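
Seed lists often carry blank lines, stray whitespace, or duplicates. The following is a minimal sketch of cleaning one before handing it to `-list` or STDIN. The function name `clean_seeds` and the file names are hypothetical, and treating lines that start with `#` as comments is an assumption about your own seed file, not a Katana feature.

```bash
#!/bin/sh
# Hypothetical helper: read seeds on STDIN, write cleaned seeds to STDOUT.
clean_seeds() {
    # 1. strip carriage returns (files edited on Windows)
    # 2. trim leading and trailing whitespace
    # 3. drop blank lines and '#' comment lines
    # 4. de-duplicate (sort -u also sorts the list)
    tr -d '\r' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v -e '^$' -e '^#' \
        | sort -u
}

# Usage (commented out so the snippet has no side effects):
# clean_seeds < urls.txt > urls.clean.txt && katana -list urls.clean.txt
# clean_seeds < urls.txt | katana
```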
Modes
| Mode | When |
|---|---|
| Standard (default) | Fast; uses Go HTTP client; no full JS/DOM render—may miss post-render routes |
| Headless (`-headless`) | Browser context; better for JS-heavy apps; optional `-system-chrome` |

Enable JS file parsing for more endpoints: `-js-crawl` (`-jc`). `-jsluice` is heavier.
Flags to know first
| Flag | Purpose |
|---|---|
| `-d` | Max crawl depth (default 3) |
| `-c` | Parallel fetchers |
| `-rl` | Max requests per second |
| `-ct` | Cap total crawl time (e.g. …) |
| `-cs` / `-cos` | In-scope / out-of-scope URL regex |
| `-ns` | Disable default host scope if you need cross-host (use carefully) |
| `-iqp` | Ignore same path with different query strings |
| … | Reduce near-duplicate paths |
| `-jsonl` | JSONL output for scripting |
| `-o` | Write to file |
| `-sr` | Store HTTP responses for review (uses disk space) |
| `-proxy` | HTTP/SOCKS5 proxy |
| `-H` | Extra headers (auth, cookies) via … |

Run `katana -h` for the full list (filters, form fill, tech detect, TLS options, etc.).
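
Hand-writing the in-scope regex is error-prone because every dot in the hostname needs escaping. The following small sketch derives a pattern of the shape `.*\.example\.com.*` from a bare domain. `scope_regex` is a hypothetical helper name. The pattern is deliberately loose: it requires a dot before the domain, so it matches subdomains such as `www.example.com` but not the bare apex host. Tighten it if you need strict host matching.

```bash
#!/bin/sh
# Hypothetical helper: build an in-scope regex from a bare domain.
scope_regex() {
    # Escape literal dots, then wrap: example.com -> .*\.example\.com.*
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf '.*\\.%s.*\n' "$escaped"
}

# Usage (commented out so the snippet has no side effects):
# katana -list seeds.txt -d 3 -cs "$(scope_regex example.com)" -rl 30 -jsonl
```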
Minimal examples
```bash
katana -u https://example.com -d 2 -silent
```

```bash
katana -u https://example.com -jsonl -o endpoints.jsonl
```

```bash
katana -list seeds.txt -d 3 -cs '.*\.example\.com.*' -rl 30 -jsonl
```

Headless (JS-heavy target):

```bash
katana -u https://example.com -headless -d 2
```
Pipelines
Common pattern: resolve live HTTP first, then crawl:
```bash
cat domains.txt | httpx -silent | katana -jsonl -o crawl.jsonl
```

Combine with other PD tools (naabu, nuclei, etc.) only in authorized assessments.
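
Once a crawl has produced JSONL, a common next step is reducing it to a unique list of discovered URLs. The sketch below assumes each line is compact JSON that carries the URL in an `"endpoint"` string field (as in `request.endpoint`); check one line of your own output before relying on it. The helper name `unique_endpoints` is hypothetical. A real JSON parser such as `jq` is more robust than this text match and is preferable where it is installed.

```bash
#!/bin/sh
# Hypothetical helper: read JSONL on STDIN, print unique endpoint URLs.
# Assumes compact JSON with an "endpoint":"..." string field per line.
unique_endpoints() {
    grep -o '"endpoint":"[^"]*"' \
        | sed -e 's/^"endpoint":"//' -e 's/"$//' \
        | sort -u
}

# Usage (commented out so the snippet has no side effects):
# unique_endpoints < crawl.jsonl > endpoints.txt
```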
Troubleshooting
- `CGO_ENABLED=1` required for `go install` per README.
- Headless failures: try `-system-chrome`, ensure Chrome/Chromium installed, or use Docker image with documented Chrome setup.
- Health check: `-health-check` / `-hc`.
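
Before digging into a failure, it can help to confirm that the binaries are on `PATH` at all. This is a minimal preflight sketch; the function names `have` and `preflight` are hypothetical, and the list of browser binary names is an assumption that varies by OS and distribution (on macOS, for instance, Chrome is usually not on `PATH`).

```bash
#!/bin/sh
# Hypothetical preflight: report which prerequisites are on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

preflight() {
    status=0
    have katana || { echo "katana not found on PATH" >&2; status=1; }
    # Browser binary names differ across systems; this list is an assumption.
    found_browser=""
    for b in chromium chromium-browser google-chrome google-chrome-stable; do
        if have "$b"; then found_browser="$b"; break; fi
    done
    if [ -z "$found_browser" ]; then
        # Warning only: standard (non-headless) mode does not need a browser.
        echo "no Chrome/Chromium binary found; headless mode may fail" >&2
    fi
    return "$status"
}
```

Running `preflight && katana -hc` chains this check with Katana's own health check.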
References
- Source and releases: github.com/projectdiscovery/katana