Robots.txt Tester
Test if your URLs are blocked by robots.txt. Paste your file, enter a URL path, and check access for Googlebot, Bingbot, or GPTBot.
This tool runs entirely in your browser. No data is sent to any server.
Find your robots.txt at: yourdomain.com/robots.txt
URL not blocked but still not indexed?
If your robots.txt isn't blocking the page but Google still hasn't indexed it, use IndexBolt to get it crawled in hours.
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a text file at the root of your website (e.g., example.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It's a key part of the Robots Exclusion Protocol.
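A minimal robots.txt might look like this (the paths and sitemap URL are illustrative, not from any real site):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/login

# Optional: tell crawlers where the sitemap lives
Sitemap: https://example.com/sitemap.xml
```

Each `User-agent` line starts a group of rules, and the `Disallow`/`Allow` lines beneath it apply to that crawler.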
Does robots.txt prevent pages from being indexed?
Not exactly. Robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still index the URL (showing it in results without a description). To prevent indexing, use a noindex meta tag instead, and make sure the page is not blocked in robots.txt, since Google must be able to crawl the page to see the tag.
What does 'Disallow: /' mean in robots.txt?
Disallow: / blocks the entire site from being crawled by the specified user-agent. If applied to all user-agents (User-agent: *), no compliant search engine crawler can access any page on your site.
How does robots.txt pattern matching work?
Robots.txt uses simple pattern matching. * matches any sequence of characters, and $ anchors a pattern to the end of the URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. When multiple rules match, the most specific (longest) pattern wins; if an Allow and a Disallow rule tie, the less restrictive Allow rule applies.
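The longest-match behavior described above can be sketched in a few lines of Python. This is an illustrative sketch of RFC 9309-style matching, not Google's actual implementation; the `rule_matches` and `is_allowed` helpers and the rule-list format are hypothetical:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """True if a robots.txt rule pattern matches a URL path.
    '*' matches any character sequence; a trailing '$' anchors the
    end of the URL. (A '$' mid-pattern is not handled here.)"""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # Rules are prefix matches, so anchor at the start only.
    return re.match(regex, path) is not None

def is_allowed(rules, path):
    """rules: list of (directive, pattern) pairs for one user-agent,
    e.g. [("disallow", "/*.pdf$"), ("allow", "/docs/")].
    Longest matching pattern wins; on a tie, Allow wins; with no
    matching rule, the path is allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            longer = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len = len(pattern)
                allowed = (directive == "allow")
    return allowed
```

For example, with the rules `[("disallow", "/p/"), ("allow", "/p/ok")]`, the path `/p/ok/page` is allowed because the Allow pattern is longer, while `/p/other` is blocked.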
Can I block specific bots like GPTBot?
Yes. Add a User-agent: GPTBot section with Disallow: / to block OpenAI's crawler. Similarly, you can target Googlebot, Bingbot, or any specific crawler by name. Each bot reads only the rules in its own section (or the * section as fallback).
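Putting that together, a robots.txt that blocks GPTBot while leaving the site open to everyone else could look like this (a sketch, not a recommendation for any particular site):

```
# Block OpenAI's crawler from the whole site
User-agent: GPTBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Disallow:
```

An empty `Disallow:` line means nothing is blocked for that group, so the `*` section here allows full access.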