Robots.txt Tester

Question 1

What is a robots.txt file?

Answer

A robots.txt file is a plain-text file placed at the root of a website (e.g., https://example.com/robots.txt) that tells web crawlers which parts of the site they may or may not crawl. It follows the Robots Exclusion Protocol, standardized in 2022 as RFC 9309, and well-behaved crawlers fetch it before crawling other URLs on the site. Each file applies only to the protocol, host, and port it is served from, so a subdomain such as https://shop.example.com needs its own file. It is a set of instructions, not an enforcement mechanism: well-behaved crawlers obey it, but malicious bots may ignore it.
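A tester first has to work out which robots.txt file governs a given page. The Python sketch below does this with the standard library, assuming the rule that a robots.txt file covers only the scheme, host, and port it is served from. The helper name robots_txt_url is hypothetical, not part of any tool's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Only the scheme and network location (host and optional port) are kept;
    the path, query string, and fragment of the page URL are dropped.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://example.com/blog/post?id=7#top"))
    # https://example.com/robots.txt
    print(robots_txt_url("https://shop.example.com:8443/cart"))
    # https://shop.example.com:8443/robots.txt
```

Keeping the port matters: https://example.com and https://example.com:8443 are treated as different origins, each with its own robots.txt.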

Question 2

Does robots.txt prevent my pages from being indexed?

Answer

No. Robots.txt blocks crawling, not indexing. If you disallow a page in robots.txt but an external site links to it, Google may still index the URL (without being able to see the content). To keep a page out of the index, use a noindex robots meta tag or an X-Robots-Tag: noindex HTTP header instead, and do not also disallow that page in robots.txt: a crawler that is not allowed to fetch the page never sees the noindex directive. Robots.txt only controls whether crawlers can access the page to read its content.
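The interaction between robots.txt and noindex can be expressed as a small check. This Python sketch uses only the standard library's html.parser; noindex_is_seen is a hypothetical helper. It assumes you already know whether robots.txt allows the page, reads a single X-Robots-Tag header value, and only inspects generic meta name="robots" tags, not crawler-specific variants such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser already lowercases tag and attribute names
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def noindex_is_seen(crawl_allowed: bool, headers: dict[str, str], html: str) -> bool:
    """True only if a crawler may fetch the page AND finds a noindex directive."""
    if not crawl_allowed:
        # Blocked by robots.txt: the page is never downloaded, so neither
        # the header nor the meta tag is ever read.
        return False
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in d for d in [header.lower(), *finder.directives])


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_is_seen(True, {}, page))   # True: crawlable, noindex found
print(noindex_is_seen(False, {}, page))  # False: disallowed, noindex never seen
```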

Question 3

What is the difference between Allow and Disallow?

Answer

Disallow tells crawlers which paths they cannot access. Allow explicitly permits access to a path, typically to override a broader Disallow rule for a specific subdirectory or file. For example, you can Disallow the entire /private/ directory but Allow /private/public/ to grant selective access. When rules conflict, the rule with the longest matching path wins (the most specific rule takes precedence). If an Allow rule and a Disallow rule match with paths of equal length, Google and RFC 9309 apply the less restrictive Allow rule.
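The precedence rule can be sketched in a few lines of Python. This sketch assumes plain path prefixes only (no * or $ patterns) and represents rules as hypothetical (directive, path) tuples; it is not how any particular crawler stores them.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Resolve plain-prefix Allow/Disallow rules for one URL path.

    The longest matching prefix wins; on a tie in length, Allow wins.
    If no rule matches, the path may be crawled.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching rules have no effect
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES = [("disallow", "/private/"), ("allow", "/private/public/")]
print(is_allowed(RULES, "/private/report.html"))      # False
print(is_allowed(RULES, "/private/public/faq.html"))  # True
print(is_allowed(RULES, "/about"))                    # True
```

Because every rule is compared and only the best match is kept, the result does not depend on the order of the list.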

Question 4

Does the order of rules in robots.txt matter?

Answer

For Google and other crawlers that follow RFC 9309, order does not matter: the rule with the longest (most specific) matching path wins, regardless of where it appears in the file. Some older or simpler parsers, however, apply the first rule that matches. Placing a specific Allow rule before the broader Disallow rule it overrides gives the same result under both behaviors. Grouping rules by user-agent also keeps the file readable and maintainable.
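Python's standard library is a convenient example of a first-match parser: CPython's urllib.robotparser, in the versions we are aware of, returns the verdict of the first rule in file order whose path prefixes the URL. This sketch shows how the same two rules give different answers when reordered. first_match_allows is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser


def first_match_allows(robots_txt: str, path: str) -> bool:
    """Evaluate robots_txt for the wildcard agent with urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


DISALLOW_FIRST = """\
User-agent: *
Disallow: /private/
Allow: /private/public/
"""

ALLOW_FIRST = """\
User-agent: *
Allow: /private/public/
Disallow: /private/
"""

path = "/private/public/faq.html"
print(first_match_allows(DISALLOW_FIRST, path))  # False: /private/ matched first
print(first_match_allows(ALLOW_FIRST, path))     # True: the Allow line matched first
```

A longest-match crawler would allow this path in both cases, which is why putting the specific Allow first is the portable choice.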

Question 5

What user-agents should I include in my robots.txt?

Answer

You typically need at least a wildcard (*) group for general rules, plus specific groups for crawlers you want to treat differently. Common user-agents include Googlebot (Google Search), Bingbot (Bing), and purpose-specific crawlers such as GPTBot/ClaudeBot (AI training), OAI-SearchBot/Claude-SearchBot (AI search), AhrefsBot/SemrushBot (SEO tools), and CCBot (dataset crawlers). A crawler that finds a group naming it obeys only that group and ignores the * group, so repeat any general rules that should still apply to it. Tailor each group to what that crawler needs or should not access.
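Group selection can be sketched as follows, assuming simplified RFC 9309 behavior: a crawler matches groups by its product token, case-insensitively, merges every group that names it, and falls back to the * group only when none does. Real crawlers (Google's in particular) document additional nuances; the Group type and rules_for helper are hypothetical.

```python
# A group is (user-agent tokens, [(directive, path), ...]).
Group = tuple[list[str], list[tuple[str, str]]]


def rules_for(groups: list[Group], product_token: str) -> list[tuple[str, str]]:
    """Return the rules a crawler obeys under simplified RFC 9309 matching."""
    token = product_token.lower()
    named = [rules for agents, rules in groups
             if any(agent.lower() == token for agent in agents)]
    if not named:
        # Only crawlers with no named group inherit the wildcard group.
        named = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in named for rule in rules]


GROUPS: list[Group] = [
    (["*"], [("disallow", "/admin/")]),
    (["GPTBot", "ClaudeBot"], [("disallow", "/")]),
]

print(rules_for(GROUPS, "gptbot"))   # [('disallow', '/')], the * rule is not inherited
print(rules_for(GROUPS, "Bingbot"))  # [('disallow', '/admin/')]
```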

Question 6

Can I use wildcards and patterns in robots.txt?

Answer

Yes. Googlebot and most major search engines support two pattern characters, which RFC 9309 also standardizes. An asterisk (*) matches any sequence of characters, and a dollar sign ($) at the end of a rule anchors it to the end of the URL: Disallow: /*.pdf$ blocks every URL that ends in .pdf. Disallow: /*?session= blocks URLs whose query string starts with a session parameter; add Disallow: /*&session= to also catch the parameter when it appears later in the query string. Not every crawler supports pattern matching, and the syntax is far more limited than full regular expressions.
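One way to evaluate these patterns is to translate them into regular expressions. This Python sketch assumes the semantics described above: * matches any run of characters, $ is special only as the final character, and otherwise a rule matches as a prefix of the path plus query string. rule_matches is a hypothetical helper.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern that may use '*' and a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then join the literal pieces with '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # must end exactly here
    return re.match(regex, path) is not None  # re.match anchors at the start


print(rule_matches("/*.pdf$", "/docs/guide.pdf"))                # True
print(rule_matches("/*.pdf$", "/docs/guide.pdf?download=1"))     # False
print(rule_matches("/*?session=", "/cart?session=abc"))          # True
print(rule_matches("/*?session=", "/cart?id=1&session=abc"))     # False
print(rule_matches("/*&session=", "/cart?id=1&session=abc"))     # True
```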

Question 7

How do I block AI crawlers like GPTBot and ClaudeBot?

Answer

To block AI training crawlers, add a dedicated user-agent group for each one, for example User-agent: GPTBot followed by Disallow: /, and the same for ClaudeBot. You can block Google-Extended (Google's product token for AI training and grounding) the same way; it is a token rather than a separate crawler, and according to Google it does not affect inclusion or ranking in Google Search. Search-specific AI crawlers such as OAI-SearchBot and Claude-SearchBot are separate from training crawlers, so blocking them may remove your site from AI search results. Always test with a tool like this one to verify that your rules work as expected.
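A quick way to check such a policy is Python's urllib.robotparser, which is adequate for whole-site Disallow: / rules even though it does not implement wildcards or longest-match precedence. The robots.txt below is a hypothetical example that blocks three training tokens and leaves everyone else unrestricted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: several User-agent lines may share one group.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ["GPTBot", "ClaudeBot", "Google-Extended",
              "OAI-SearchBot", "Claude-SearchBot", "Googlebot"]:
    allowed = parser.can_fetch(agent, "https://example.com/blog/post")
    print(f"{agent:<18} {'allowed' if allowed else 'blocked'}")
```

The training tokens come out blocked, while the search crawlers fall through to the * group and remain allowed.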

Question 8

Should I block CSS, JS, or image files in robots.txt?

Answer

Generally no. Search engines like Google need CSS and JavaScript files to render pages correctly for mobile-first indexing. Blocking these files can harm your search rankings because Google cannot see the page as a user would. Only block static assets that live in a private area (e.g., /admin/); otherwise, let crawlers fetch your CSS, JS, and image files.
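A simple audit can flag render-critical files that a robots.txt accidentally blocks. This sketch uses urllib.robotparser with plain-prefix rules; the robots.txt, the resource list, and the blocked_resources helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a shared assets folder by mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

# Hypothetical resources a public page needs in order to render.
RENDER_RESOURCES = [
    "/assets/site.css",
    "/assets/app.js",
    "/images/hero.webp",
    "/admin/admin.css",
]


def blocked_resources(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the paths that agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


print(blocked_resources(ROBOTS_TXT, RENDER_RESOURCES))
# ['/assets/site.css', '/assets/app.js', '/admin/admin.css']
```

Here /admin/admin.css is an acceptable block because it belongs to a private area, but the two /assets/ files would stop Google from rendering public pages.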

Question 9

What does a blank Disallow mean?

Answer

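An empty Disallow rule (Disallow: with no path) has no blocking effect. It does not erase or cancel other rules in the group. If no other matching rule blocks the URL, crawling is allowed, which is why a group containing only Disallow: is the traditional way to allow everything. To allow everything more explicitly, use Allow: / instead.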
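The behavior can be sketched with a tiny parser and a longest-match evaluator in which an empty value matches nothing. Both helpers (parse_rules, is_allowed) are hypothetical, handle a single group only, and ignore * and $ patterns.

```python
def parse_rules(group_text: str) -> list[tuple[str, str]]:
    """Extract (directive, value) pairs from one group's Allow/Disallow lines."""
    rules = []
    for raw in group_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() in ("allow", "disallow"):
            rules.append((key.lower(), value))
    return rules


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest prefix wins, Allow wins ties; an empty value never matches."""
    best = max(
        ((len(value), directive == "allow") for directive, value in rules
         if value and path.startswith(value)),
        default=None,
    )
    return True if best is None else best[1]


GROUP = """\
User-agent: *
Disallow:
Disallow: /tmp/
"""
rules = parse_rules(GROUP)
print(rules)                                # [('disallow', ''), ('disallow', '/tmp/')]
print(is_allowed(rules, "/tmp/cache.html")) # False: /tmp/ still applies
print(is_allowed(rules, "/blog/"))          # True: nothing matches
```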
