Robots.txt Tester

Test URLs against a robots.txt file — does Googlebot see this page? Supports wildcards, $ anchors, and user-agent matching.

Runs entirely in your browser. Nothing is sent to our servers.

About this tool

Tests URLs against a robots.txt file to predict whether a given crawler will be allowed to fetch them. Useful when you've just edited robots.txt and want to confirm it does what you think before deploying — or when you're diagnosing why a search engine isn't crawling something it should.

How robots.txt rules work

robots.txt is grouped by User-agent. Each group lists Disallow: and Allow: rules. When a bot fetches a URL, it picks the most specific user-agent group that matches its name (falling back to User-agent: * if nothing else matches), then applies the rules in that group.
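The group-selection step can be sketched in Python. This is a minimal illustration under stated assumptions, not this tool's actual code.

- `Group`, `parse_groups` and `select_group` are hypothetical names.
- Consecutive User-agent lines are assumed to share one group.
- Agent names are compared case-insensitively, as RFC 9309 requires for user-agent matching.
- "Most specific" is taken to mean the longest User-agent value that appears inside the crawler's name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One User-agent group (hypothetical structure): its agent names and rules."""
    agents: list[str] = field(default_factory=list)
    # Each rule is (allow, path), e.g. (False, "/admin/") for "Disallow: /admin/".
    rules: list[tuple[bool, str]] = field(default_factory=list)


def parse_groups(robots_txt: str) -> list[Group]:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups: list[Group] = []
    current: Group | None = None
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not last_was_agent:
                current = Group()  # a User-agent line after rules starts a new group
                groups.append(current)
            current.agents.append(value.lower())  # assumption: case-insensitive names
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key == "allow", value))
            last_was_agent = False
        # Other lines (Sitemap, Crawl-delay, ...) are ignored in this sketch.
    return groups


def select_group(groups: list[Group], crawler: str) -> Group | None:
    """Return the group with the longest User-agent value found inside the crawler's
    name, falling back to the first User-agent: * group if none matches."""
    name = crawler.lower()
    best: Group | None = None
    best_len = 0
    fallback: Group | None = None
    for group in groups:
        for agent in group.agents:
            if agent == "*":
                if fallback is None:
                    fallback = group
            elif agent and agent in name and len(agent) > best_len:
                best, best_len = group, len(agent)
    return best if best is not None else fallback


robots = """
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/

User-agent: Googlebot-Image
Disallow: /photos/
"""
groups = parse_groups(robots)
print(select_group(groups, "Googlebot-Image").rules)  # [(False, '/photos/')]
print(select_group(groups, "Googlebot").rules)  # [(False, '/drafts/')]
print(select_group(groups, "Bingbot").rules)  # [(False, '/private/')]
```

Real parsers handle details this sketch skips. For example, when several groups name the same crawler, Google merges their rules into one group.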

Within a group, the rule with the longest matching path wins — that's why Allow: /admin/public.html overrides Disallow: /admin/ for that one URL. If two rules match with the same length, Allow wins as a tiebreaker.
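The precedence rule can be shown in a few lines. This is a minimal sketch, not the tool's API; it makes these assumptions:

- `is_allowed` is an illustrative name.
- Rules are stored as hypothetical `(allow, path)` pairs.
- Matching is plain prefix matching, with no `*` or `$` handling, so the length comparison stays easy to follow.

```python
from __future__ import annotations


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Longest matching rule wins; on a tie in length, Allow beats Disallow.
    Plain prefix matching only (wildcards are left out of this sketch)."""
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for allow, rule_path in rules:
        if not rule_path:
            continue  # an empty rule value matches nothing
        if path.startswith(rule_path):
            length = len(rule_path)
            # A longer match takes over; an equal-length Allow overrides a Disallow.
            if length > best_len or (length == best_len and allow):
                best_len, allowed = length, allow
    return allowed


rules = [(False, "/admin/"), (True, "/admin/public.html")]
print(is_allowed(rules, "/admin/public.html"))  # True: the longer Allow rule wins
print(is_allowed(rules, "/admin/settings"))  # False: only Disallow: /admin/ matches
print(is_allowed(rules, "/blog/"))  # True: nothing matches
```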

Common gotchas

Disallow: with an empty value means "allow everything" (it's a literal empty path that doesn't match anything).
Disallow: / blocks the entire site.
Paths are case-sensitive (/Admin/ ≠ /admin/).
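Wildcards: * matches any sequence of characters, and $ anchors the rule to the end of the URL. So Disallow: /*.pdf$ blocks every URL ending in .pdf (but not /report.pdf?page=2, whose URL ends in the query string).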
robots.txt is advisory — well-behaved crawlers honour it, malicious ones ignore it.
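The wildcard behaviour and the gotchas above can be sketched as a translation from a rule path to a regular expression. This is an illustration, not the tool's implementation. It assumes the following:

- `rule_to_regex` and `rule_matches` are hypothetical names.
- `$` is treated as special only at the very end of a rule.
- The URL being tested is its path plus any query string.

```python
from __future__ import annotations

import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.
    '*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = rule_path.endswith("$")  # assumption: '$' only counts at the very end
    if anchored:
        rule_path = rule_path[:-1]
    # Split on '*', escape each literal piece, and join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Case-sensitive match of a rule against the URL's path plus query string."""
    if not rule_path:
        return False  # empty Disallow: matches nothing, so it allows everything
    return rule_to_regex(rule_path).match(url_path) is not None


print(rule_matches("/*.pdf$", "/docs/report.pdf"))  # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?dl=1"))  # False: the URL must end in .pdf
print(rule_matches("/admin/", "/Admin/settings"))  # False: paths are case-sensitive
print(rule_matches("/", "/anything"))  # True: Disallow: / covers the whole site
print(rule_matches("", "/anything"))  # False
```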

Frequently asked questions

Does Google actually follow robots.txt?: Yes. Googlebot honours it strictly, but "honouring it" only means Google won't crawl those URLs. Google can still index a URL it learns about through external links, just without crawling it (you'll see "no information available for this page" in results). To truly hide a page, use noindex meta tags or HTTP auth.
What's the difference between robots.txt and meta noindex?: robots.txt says "don't crawl this." Meta noindex says "don't list this in search results." If you want a page kept out of search, use noindex (not Disallow), because Disallow blocks Google from seeing the noindex tag.
How does this handle wildcards and $?: The tool supports * (any sequence of characters) and $ (end-of-URL anchor), the two pattern extensions Google supports. It doesn't cover every unusual edge case, so for very complex rules also check the URL with Google Search Console's URL Inspection tool, which reports whether robots.txt blocks it.
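How is "most specific user agent" determined?: By substring match against the crawler's name. A group applies when its User-agent value appears within the crawler's name, and among the groups that apply, the longer (more specific) value wins. So Googlebot matches a User-agent: Googlebot group, while Googlebot-Image matches both User-agent: Googlebot-Image and User-agent: Googlebot, and the Googlebot-Image group wins.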

Last updated: May 17, 2026