Robots.txt Generator
Build a valid robots.txt file with presets for AI crawlers and CMS platforms.
User-agent: *
Allow: /
What Is robots.txt?
The robots.txt file is a simple text file placed at the root of your website that tells search engine crawlers which pages or sections they can or cannot access. It follows the Robots Exclusion Protocol, a standard that's been used since 1994. Every major search engine (Google, Bing, Yahoo, DuckDuckGo) respects robots.txt directives, making it a fundamental tool for managing how your site is crawled and indexed.
robots.txt vs Meta Robots Tag
robots.txt controls crawling at the URL level: it prevents crawlers from even accessing specified pages. The meta robots tag (noindex, nofollow) controls indexing: it tells search engines not to include a page in search results even if they've already crawled it. Use robots.txt to save crawl budget by blocking unimportant sections. Use meta robots to prevent indexing of specific pages you want crawled but not listed.
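Side by side, the two mechanisms look like this (the blocked path is just an illustration):

```
# robots.txt: stop crawlers from fetching a section at all
User-agent: *
Disallow: /internal/
```

```
<!-- meta robots: let the page be crawled, but keep it out of results -->
<meta name="robots" content="noindex, nofollow">
```

The first lives in a single file at your site root; the second goes in the head of each individual page you want excluded from listings.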
Blocking AI Crawlers
With the rise of AI training on web content, many site owners want to block AI crawlers. GPTBot (OpenAI), CCBot (Common Crawl), anthropic-ai (Anthropic), and Google-Extended (Google AI training) can be blocked via robots.txt. However, blocking these crawlers is a trade-off: it prevents your content from being used in AI training but may reduce your visibility in AI-powered search features.
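A block covering the four crawlers named above might look like this (user-agent tokens are the ones each vendor has published; check their documentation, as tokens can change):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Each group applies only to the named agent, so your general rules for search engine crawlers are unaffected.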
Common Mistakes
The most common robots.txt mistakes include accidentally blocking CSS and JavaScript files (which prevents Google from rendering your pages), blocking entire sections that contain valuable content, using incorrect syntax (rules are case-sensitive for paths), and forgetting to include a sitemap declaration. Always test your rules (for example, with the robots.txt report in Google Search Console) before deploying changes.
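You can also sanity-check rules locally before deploying. A minimal sketch using Python's standard-library urllib.robotparser (the rules and URLs here are illustrative):

```python
# Parse a candidate robots.txt and check which URLs a generic crawler may fetch.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The blocked section is disallowed for a generic ("*") crawler...
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
# ...while ordinary pages remain crawlable.
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
```

This catches syntax slips (such as a mistyped path prefix) early, though it does not replicate every search engine's exact matching behavior.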
Sitemap Declaration
Including a Sitemap directive in robots.txt is an easy way to help search engines discover your sitemap. Simply add "Sitemap: https://yourdomain.com/sitemap.xml" at the end of the file. This isn't a substitute for submitting your sitemap through Google Search Console, but it provides an additional discovery mechanism for all compliant crawlers.
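Putting the pieces together, a complete minimal file with a sitemap declaration might look like this (domain and paths are placeholders):

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

The Sitemap line is independent of any User-agent group, so it can appear anywhere in the file, though placing it at the end keeps the rules easy to scan.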