Robots.txt is a powerful tool that acts as a gatekeeper for your website, telling search engines and other bots which parts of your site they can crawl and which they should avoid. Whether you’re looking to solve technical crawling issues or simply want to keep certain sections out of search engines’ reach, understanding how to craft effective robots.txt rules is essential for any website owner.
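The file itself is plain text served from the root of your domain, for example at https://example.com/robots.txt (example.com is a placeholder), and it consists of one or more groups that each begin with a user-agent line. A minimal sketch, with illustrative paths:
# A minimal robots.txt (the paths below are placeholders)
User-agent: *
Disallow: /tmp/
Disallow: /cart/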
What can you do with robots.txt?
Robots.txt is incredibly versatile. You can create simple rules or complex instructions targeting specific URL patterns. Here’s what you can achieve:
Target multiple bots with the same rule
Rule template: List multiple user-agents followed by the disallow rule.
Example:
User-agent: [first bot name]
User-agent: [second bot name]
Disallow: [path to restrict]
For instance, if you want to keep both Googlebot and Bingbot away from your search results pages, you could write:
User-agent: googlebot
User-agent: bingbot
Disallow: /search-results/
Block specific file types
Rule template: Specify a user-agent and use a wildcard to match the file extension: * stands for any sequence of characters, and the trailing $ anchors the rule to the end of the URL. (Major crawlers such as Googlebot and Bingbot support these wildcards, but not every bot does.)
Example:
User-agent: [bot name]
Disallow: /*.[file extension]$
If you wanted to prevent all bots from accessing your PDF documents, you might use:
User-agent: *
Disallow: /*.pdf$
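One subtlety worth noting: because $ requires the URL to end in .pdf, a URL that carries query parameters slips past the rule. A sketch with hypothetical paths:
# /docs/annual-report.pdf     -> blocked (the URL ends in .pdf)
# /docs/annual-report.pdf?v=2 -> not blocked (the query string means the URL no longer ends in .pdf)
User-agent: *
Disallow: /*.pdf$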
Allow crawling of some areas while restricting others
Rule template: Use allow and disallow in sequence for the same bot.
Example:
User-agent: [bot name]
Allow: [parent directory]/
Disallow: [parent directory]/[subdirectory]/
For a website with public articles but private drafts:
User-agent: *
Allow: /articles/
Disallow: /articles/drafts/
Block specific bots while allowing others
Rule template: Create a general rule for all bots, then specific rules for exceptions.
Example:
User-agent: *
Allow: /
User-agent: [specific bot to restrict]
Disallow: /
Allow: [limited access path]
To block an AI training bot while allowing search engines full access, you could write the rules below. The bot obeys only the group that names it specifically, and Allow: /$ leaves just the homepage crawlable, since $ anchors the rule to the end of the URL:
User-agent: *
Allow: /
User-agent: ai-training-bot
Disallow: /
Allow: /$
Add comments for clarity
Rule template: Use the # symbol to add notes.
Example:
# [your comment here]
User-agent: [bot name]
Disallow: [path]
For example, a note for your own or your team’s reference:
# Blocking access to our upcoming product pages until launch
User-agent: *
Disallow: /products/upcoming/
Useful robots.txt rules for website owners
Blocking your entire site from all crawlers
Rule template:
User-agent: *
Disallow: /
This tells all bots not to crawl any page on your site. Remember that this prevents crawling, not necessarily indexing: a page that other sites link to can still appear in search results. To keep a page out of the index entirely, add a noindex directive to the page itself and leave it crawlable so bots can actually read that directive.
Restricting access to specific directories
Rule template:
User-agent: [bot name]
Disallow: /[directory name]/
For example, to keep all bots out of your admin area:
User-agent: *
Disallow: /admin/
Remember that robots.txt isn’t for securing private content—it’s publicly visible and merely a request, not a strict barrier.
Allowing access to specific crawlers only
Rule template:
User-agent: [allowed bot]
Allow: /
User-agent: *
Disallow: /
If you want only Google News to access your site:
User-agent: googlebot-news
Allow: /
User-agent: *
Disallow: /
Blocking a single crawler
Rule template:
User-agent: [bot to block]
Disallow: /
User-agent: *
Allow: /
To block just one problematic bot (the catch-all Allow group isn’t strictly required, since bots with no matching group can crawl everything by default, but it makes your intent explicit):
User-agent: aggressive-crawler
Disallow: /
User-agent: *
Allow: /
Blocking specific pages
Rule template:
User-agent: [bot name]
Disallow: /[filename.html]
If you have a temporary page you don’t want crawled:
User-agent: *
Disallow: /temporary-promotion.html
Allowing access to only one directory
Rule template:
User-agent: [bot name]
Disallow: /
Allow: /[public directory]/
For a site under development with only a press area public:
User-agent: *
Disallow: /
Allow: /press-releases/
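This works because major crawlers such as Googlebot resolve a conflict between Allow and Disallow by following the most specific rule, meaning the one with the longest matching path. A sketch of how the rules above play out, using hypothetical URLs:
# /press-releases/launch.html -> allowed (Allow: /press-releases/ is the longest match)
# /internal/notes.html        -> blocked (only Disallow: / matches)
User-agent: *
Disallow: /
Allow: /press-releases/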
Managing image crawling
Rule template for blocking a specific image:
User-agent: [image bot]
Disallow: /[path to image]
Rule template for blocking all images:
User-agent: [image bot]
Disallow: /
To keep Google’s image crawler away from your product prototype images:
User-agent: googlebot-image
Disallow: /images/prototypes/
Blocking specific file types
Rule template:
User-agent: [bot name]
Disallow: /*.[file extension]$
To prevent all bots from crawling your spreadsheets:
User-agent: *
Disallow: /*.xlsx$
Allowing ad bots while blocking other crawlers
Rule template:
User-agent: *
Disallow: /
User-agent: [ad bot]
Allow: /
For a site that blocks regular crawlers but still serves ads (Mediapartners-Google is the crawler AdSense uses to analyze pages so it can show relevant ads):
User-agent: *
Disallow: /
User-agent: mediapartners-google
Allow: /
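Rules like these can sit side by side in one file, and each crawler follows only the group that matches it most specifically, ignoring the rest. As a sketch, here is how a few of the examples above might be combined into a single robots.txt (the bot names and paths are the illustrative ones used earlier):
# Default rules for all other bots
User-agent: *
Disallow: /admin/
Disallow: /products/upcoming/
# Keep the hypothetical AI training bot out entirely
User-agent: ai-training-bot
Disallow: /
# Let the AdSense crawler analyze pages for ad targeting
User-agent: mediapartners-google
Allow: /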
Conclusion
Robots.txt is a simple yet powerful tool for managing how bots interact with your website. By implementing the right rules, you can control which parts of your site are crawled, by which bots, and under what circumstances. While robots.txt can help manage bot traffic, it shouldn’t be used as a security measure for sensitive content. With these examples and guidelines, you can create an effective robots.txt file tailored to your website’s specific needs. If you need help with this for your site, contact Kahunam for a consultation.