When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling."
Install: `npx skill4agent add kostja94/marketing-skills robots-txt`

| Point | Note |
|---|---|
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs can still appear in search results, just without a snippet) |
| No-index | Use noindex meta or auth for sensitive content; robots.txt is publicly readable |
| Indexed vs non-indexed | Not all content should be indexed. robots.txt and noindex complement each other: robots for path-level crawl control, noindex for page-level indexing. See indexing |
| Advisory | Rules are advisory; malicious crawlers may ignore |
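The crawl-control semantics above can be checked locally with Python's standard-library robots.txt parser; the rule set below is illustrative, not a recommendation:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: everyone is kept out of /admin/,
# and GPTBot is blocked from the whole site.
rules = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "/docs/page.html"))      # False: GPTBot blocked site-wide
print(rp.can_fetch("ExampleBot", "/admin/settings"))  # False: /admin/ blocked for all
print(rp.can_fetch("ExampleBot", "/docs/page.html"))  # True: everything else crawlable
```

Note that `can_fetch` only tells you what a compliant crawler would do; as the table says, nothing enforces these rules against crawlers that ignore them.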

| Item | Requirement |
|---|---|
| Path | Site root: `/robots.txt` (e.g. `https://example.com/robots.txt`) |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |

| Directive | Purpose | Example |
|---|---|---|
| `User-agent` | Target crawler | `User-agent: GPTBot` |
| `Disallow` | Block path prefix | `Disallow: /admin/` |
| `Allow` | Allow path (can override Disallow) | `Allow: /admin/help/` |
| `Sitemap` | Declare sitemap absolute URL | `Sitemap: https://example.com/sitemap.xml` |
| `Clean-param` | Strip query params (Yandex) | See below |
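Combined into one file, the directives above look like this (domain and paths are placeholders):

```text
User-agent: *
Disallow: /admin/
Allow: /admin/help/

Sitemap: https://example.com/sitemap.xml

# Yandex only: strip the ref parameter on article URLs
Clean-param: ref /articles/
```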

| User-agent | Purpose | Typical |
|---|---|---|
| OAI-SearchBot | ChatGPT search | Allow |
| GPTBot | OpenAI training | Disallow |
| Claude-SearchBot | Claude search | Allow |
| ClaudeBot | Anthropic training | Disallow |
| PerplexityBot | Perplexity search | Allow |
| Google-Extended | Gemini training | Disallow |
| CCBot | Common Crawl | Disallow |
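A policy matching the table — AI search crawlers allowed, training crawlers blocked — would be sketched as below; verify current user-agent strings against each vendor's documentation before deploying:

```text
# AI search crawlers: allow
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# AI training crawlers: block
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```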

The `Clean-param` line referenced above, stripping common tracking parameters (Yandex-specific syntax):

`Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid`