Essential robots.txt rules to control bots on your website

Robots.txt is a powerful tool that acts as a gatekeeper for your website, telling search engines and other bots which parts of your site they can access and which they should avoid. Whether you’re looking to solve technical issues or simply want to keep certain content private, understanding how to craft effective robots.txt rules is essential for any website owner.
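A robots.txt file is simply a plain text file served from the root of your domain, for example https://example.com/robots.txt (example.com and the /private/ directory below are placeholders for illustration). A minimal sketch might look like this:

# Let every bot crawl the site except one placeholder directory
user-agent: *
disallow: /private/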

What can you do with robots.txt?

Robots.txt is incredibly versatile. You can create simple rules or complex instructions targeting specific URL patterns. Here’s what you can achieve:

Target multiple bots with the same rule

Rule template: List multiple user-agents followed by the disallow rule.

Example:

user-agent: [first bot name]
user-agent: [second bot name]
disallow: [path to restrict]

For instance, if you want to keep both Googlebot and Bingbot away from your search results pages, you could write:

user-agent: googlebot
user-agent: bingbot
disallow: /search-results/

Block specific file types

Rule template: Specify a user-agent and use a wildcard to block file extensions.

Example:

user-agent: [bot name]
disallow: /*.[file extension]$

If you wanted to prevent all bots from accessing your PDF documents, you might use:

user-agent: *
disallow: /*.pdf$

Allow crawling of some areas while restricting others

Rule template: Use allow and disallow in sequence for the same bot.

Example:

user-agent: [bot name]
allow: [parent directory]/
disallow: [parent directory]/[subdirectory]/

For a website with public articles but private drafts:

user-agent: *
allow: /articles/
disallow: /articles/drafts/

Block specific bots while allowing others

Rule template: Create a general rule for all bots, then specific rules for exceptions.

Example:

user-agent: *
allow: /

user-agent: [specific bot to restrict]
disallow: /
allow: [limited access path]

To block an AI training bot from everything except your homepage (allow: /$ matches only the root URL) while leaving search engines unrestricted:

user-agent: *
allow: /

user-agent: ai-training-bot
disallow: /
allow: /$

Add comments for clarity

Rule template: Use the # symbol to add notes.

Example:

# [your comment here]
user-agent: [bot name]
disallow: [path]

For example, to leave a note for yourself or your team:

# Blocking access to our upcoming product pages until launch
user-agent: *
disallow: /products/upcoming/

Useful robots.txt rules for website owners

Blocking your entire site from all crawlers

Rule template:

User-agent: *
Disallow: /

This tells all bots not to crawl any page on your site. Keep in mind that it blocks crawling, not necessarily indexing: a page that is linked from elsewhere can still be indexed without being crawled.

Restricting access to specific directories

Rule template:

User-agent: [bot name]
Disallow: /[directory name]/

For example, to keep all bots out of your admin area:

User-agent: *
Disallow: /admin/

Remember that robots.txt isn’t for securing private content—it’s publicly visible and merely a request, not a strict barrier.

Allowing access to specific crawlers only

Rule template:

User-agent: [allowed bot]
Allow: /

User-agent: *
Disallow: /

If you want only Google News to access your site:

User-agent: googlebot-news
Allow: /

User-agent: *
Disallow: /

Blocking a single crawler

Rule template:

User-agent: [bot to block]
Disallow: /

User-agent: *
Allow: /

To block just one problematic bot:

User-agent: aggressive-crawler
Disallow: /

User-agent: *
Allow: /

Blocking specific pages

Rule template:

User-agent: [bot name]
Disallow: /[filename.html]

If you have a temporary page you don’t want crawled:

User-agent: *
Disallow: /temporary-promotion.html

Allowing access to only one directory

Rule template:

User-agent: [bot name]
Disallow: /
Allow: /[public directory]/

For a site under development with only a press area public:

User-agent: *
Disallow: /
Allow: /press-releases/

Managing image crawling

Rule template for blocking a specific image:

User-agent: [image bot]
Disallow: /[path to image]

Rule template for blocking all images:

User-agent: [image bot]
Disallow: /

To prevent Google Images from crawling product prototype images:

User-agent: googlebot-image
Disallow: /images/prototypes/
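And to block just one image rather than a whole directory, assuming a hypothetical file at /images/prototypes/front-view.jpg:

User-agent: googlebot-image
Disallow: /images/prototypes/front-view.jpg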

Blocking specific file types

Rule template:

User-agent: [bot name]
Disallow: /*.[file extension]$

To prevent all bots from crawling your spreadsheets:

User-agent: *
Disallow: /*.xlsx$

Allowing ad bots while blocking other crawlers

Rule template:

User-agent: *
Disallow: /

User-agent: [ad bot]
Allow: /

For a site closed to crawlers that still needs Google’s AdSense crawler to analyse pages for relevant ads:

User-agent: *
Disallow: /

User-agent: mediapartners-google
Allow: /

Conclusion

Robots.txt is a simple yet powerful tool for managing how bots interact with your website. By implementing the right rules, you can control which parts of your site are crawled, by which bots, and under what circumstances. While robots.txt can help manage bot traffic, it shouldn’t be used as a security measure for sensitive content. With these examples and guidelines, you can create an effective robots.txt file tailored to your website’s specific needs. If you need help with this for your site, contact Kahunam for a consultation.
