How to Use Crawl-delay in Your Robots.txt File
Web Performance · 📖 9 min read · 📅 September 15, 2024

Marcus Thorne
System Administrator

What is Crawl-delay?

Crawl-delay is a non-standard but widely recognized directive used in a robots.txt file to ask search engine bots to wait a given number of seconds between successive requests to your server. It is not part of the Robots Exclusion Protocol standard (RFC 9309), so each crawler decides whether and how to honor it. It's a throttling mechanism designed to prevent crawlers from overwhelming your web server.

Basic Syntax

User-agent: bingbot
Crawl-delay: 5

This asks Bing's crawler to wait at least 5 seconds between successive requests. For example, ignoring download time:

  • Request page A at 10:00:00
  • Wait 5 seconds
  • Request page B at 10:00:05
  • Wait 5 seconds
  • Request page C at 10:00:10

How Crawl-delay Affects Crawl Rate

With a Crawl-delay of D seconds, a bot can crawl at most:

  • 60/D pages per minute
  • 3,600/D pages per hour
  • 86,400/D pages per day

For example, with a Crawl-delay of 5 seconds, the maximum crawl rate is 12 pages per minute, 720 per hour, or 17,280 per day.
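
The arithmetic above can be sketched in a few lines of Python. The function name `max_crawl_rate` is made up for this illustration, and the numbers are upper bounds for a single bot that honors the delay exactly.

```python
def max_crawl_rate(delay_seconds: float) -> dict[str, float]:
    """Upper bound on pages one bot can fetch per minute, hour, and day."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return {
        "per_minute": 60 / delay_seconds,
        "per_hour": 3_600 / delay_seconds,
        "per_day": 86_400 / delay_seconds,
    }


# Crawl-delay: 5 -> 12 per minute, 720 per hour, 17,280 per day
print(max_crawl_rate(5))
```

Real crawl rates are usually lower than these ceilings, because download time and the crawler's own scheduling add to the gap between requests.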

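💡 Fractional Seconds: Some crawlers have accepted fractional delays (Yandex documented this). Crawl-delay: 0.5 asks the bot to wait half a second between requests, which is useful for fine-grained throttling. Parsers that expect whole numbers may ignore a fractional value, so check your logs after setting one.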

The Big Catch: Googlebot Ignores Crawl-delay

Googlebot does not support the Crawl-delay directive. Google's robots.txt documentation lists crawl-delay among the fields it does not support.

Why Google Ignores It

Google uses a sophisticated, dynamic crawl rate management system that automatically adjusts based on your site's performance. Their system:

  • Monitors server response times and error rates
  • Automatically reduces crawl rate if your server starts responding slowly or with errors
  • Increases crawl rate when your server is performing well and you have fresh content
  • Uses historical data to predict optimal crawl times

Google believes their automatic system is more effective than a static crawl delay value, which may be too aggressive or too conservative depending on your server's current load.

What Happens If You Add Crawl-delay for Googlebot?

User-agent: Googlebot
Crawl-delay: 10

This directive will be completely ignored. Googlebot will continue crawling at its dynamically determined rate, which could be faster than you'd like.

How to Actually Control Googlebot's Crawl Rate

Google Search Console used to include a crawl rate limiter, but Google retired it in January 2024, so there is no longer a slider to adjust. What Search Console still gives you is visibility into Googlebot's behavior:

  1. Navigate to "Settings" in the left sidebar
  2. Open the "Crawl stats" report under the "Crawling" section
  3. Review total crawl requests, average response time, and host status

To actually slow Googlebot down, Google's documentation describes these options:

  • Let Google optimize crawl rate (recommended): Google's automatic system backs off when your server slows down or returns errors
  • Lower crawl rate temporarily: Return 500, 503, or 429 status codes to Googlebot. Keep this to a day or two, since URLs that return errors for longer can be dropped from the index
  • Persistent overcrawling: Report the problem to Google using the request process described in its crawl rate documentation

⚠️ Important: These measures only affect Googlebot's crawling of your site. They do not affect other crawlers like Bingbot, Yandex, or Baidu. For those, you still need Crawl-delay directives in robots.txt.

Which Bots Support Crawl-delay?

While Google ignores it, several other search engines and crawlers do respect the Crawl-delay directive. Support changes over time, so treat this list as a starting point and confirm against each crawler's current documentation:

Fully Support Crawl-delay

  • Bingbot (Microsoft Bing): Fully supports it. Bing also recommends using Bing Webmaster Tools for more granular control, but Crawl-delay works reliably.
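  • YandexBot (Yandex - Russia): Supported it historically, including fractional seconds (e.g., Crawl-delay: 0.5). Yandex has since pointed site owners to the crawl rate setting in Yandex Webmaster, so confirm in your logs that the directive still has an effect.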
  • Baiduspider (Baidu - China): Generally respects the delay, though documentation is sparse. A delay of 1-3 seconds is recommended.
  • Sogou (Chinese search engine): Supports Crawl-delay.
  • SeznamBot (Czech search engine): Supports Crawl-delay.
  • Naver (Korean search engine): Supports Crawl-delay.

Support Varies (Test First)

  • DuckDuckBot (DuckDuckGo): Documentation suggests it respects Crawl-delay but may have upper limits.
  • GPTBot (OpenAI): Respects robots.txt but doesn't document Crawl-delay support specifically.
  • Amazonbot (Amazon): Respects robots.txt and likely honors Crawl-delay.

Do NOT Support Crawl-delay

  • Googlebot: Ignores it entirely (Google manages its crawl rate automatically)
  • Applebot (Apple Siri/Spotlight): Documentation indicates it does not support Crawl-delay
  • Common malicious scrapers: They ignore robots.txt entirely, including Crawl-delay

Example: Targeting Multiple Bots with Different Delays

# Google: no group here. Googlebot ignores Crawl-delay, and a User-agent
# line with no rules under it can be merged into the next group by some parsers.

# Bing: 5 second delay
User-agent: bingbot
Crawl-delay: 5

# Yandex: 2 second delay (shorter than Bing's)
User-agent: YandexBot
Crawl-delay: 2

# Baidu: 10 second delay (more aggressive crawler)
User-agent: Baiduspider
Crawl-delay: 10

# Default for all other bots (optional)
User-agent: *
Crawl-delay: 1
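
One way to sanity-check a file like this is Python's standard-library `urllib.robotparser`, which exposes a `crawl_delay()` method. The sketch below is only an illustration: the helper name `crawl_delays`, the sample file, and the agent name `ExampleBot` are assumptions, and Python's parser is not the parser any search engine actually uses.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt for the illustration (not fetched from a real site)
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 5

User-agent: Baiduspider
Crawl-delay: 10

User-agent: *
Crawl-delay: 1
"""


def crawl_delays(robots_txt: str, agents: list[str]) -> dict[str, object]:
    """Return the Crawl-delay Python's parser reports for each agent name."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


delays = crawl_delays(ROBOTS_TXT, ["bingbot", "Baiduspider", "ExampleBot"])
print(delays)  # {'bingbot': 5, 'Baiduspider': 10, 'ExampleBot': 1}
```

An agent with no group of its own falls back to the `User-agent: *` group. Python's parser only accepts whole-number delays and reports `None` for a value such as 0.5, which is a reminder that parsers differ in what they accept.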

When Should You Use Crawl-delay?

Crawl-delay is a throttling mechanism with specific use cases. It's not something every website needs.

Situations That Warrant Crawl-delay

1. You're on Shared Hosting

Shared hosting plans often limit CPU usage, concurrent connections, or memory. Aggressive crawlers can quickly trigger these limits, causing 503 Service Unavailable errors for human visitors. A modest Crawl-delay (1-3 seconds) can prevent this.

2. Your Site Has Server Performance Issues

If your server consistently experiences high load or slow response times during normal traffic, adding crawl delays can prevent crawlers from making the problem worse while you address the root cause.

3. You're Being Targeted by Aggressive International Crawlers

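Some site owners report that crawlers such as YandexBot and Baiduspider hit their servers far harder than other search engine bots, in some cases with dozens of requests per second. If your logs confirm this, a Crawl-delay of 5-10 seconds for those specific bots can tame them, provided the bot honors the directive.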

4. Your Analytics Show Bot Overload

Check your server logs. If you see bots making thousands of requests per minute and your server load spikes correlate with those requests, a Crawl-delay is appropriate.
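
A rough way to run that check is to count requests per bot per minute. The sketch below assumes access logs in the common "combined" format with the User-Agent as the last quoted field; the bot list, the sample lines, and the function name `requests_per_minute` are illustrative only.

```python
import re
from collections import Counter

# Captures the timestamp truncated to the minute (e.g. 15/Sep/2024:10:00)
# and the last quoted field on the line, which is the User-Agent.
LINE_RE = re.compile(r'\[(?P<minute>[^\]]{17})[^\]]*\].*"(?P<agent>[^"]*)"\s*$')

BOT_TOKENS = ("bingbot", "YandexBot", "Baiduspider", "Googlebot")  # illustrative


def requests_per_minute(lines):
    """Count requests per (bot, minute) for user agents containing a known token."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for token in BOT_TOKENS:
            if token.lower() in agent:
                counts[(token, match.group("minute"))] += 1
                break
    return counts


SAMPLE = [
    '1.2.3.4 - - [15/Sep/2024:10:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"',
    '1.2.3.4 - - [15/Sep/2024:10:00:41 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"',
    '5.6.7.8 - - [15/Sep/2024:10:01:07 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"',
    '9.9.9.9 - - [15/Sep/2024:10:01:09 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"',
]

for (bot, minute), hits in sorted(requests_per_minute(SAMPLE).items()):
    print(bot, minute, hits)
```

Matching on the User-Agent string only identifies bots that announce themselves honestly; comparing the busiest minutes against your server load graphs shows whether a given bot is really the cause.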

5. You Have a Very Large Site (1M+ Pages)

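Very large sites sometimes use Crawl-delay to keep a crawler from spending its whole daily request budget in one burst. Note that Crawl-delay applies to an entire user-agent group, not to individual URLs, so it cannot slow crawling of parameter-heavy URLs while leaving core content alone. Use Disallow rules for those URL patterns instead.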

When You Should NOT Use Crawl-delay

  • Small websites (<10,000 pages): Crawlers aren't generating enough traffic to cause problems
  • Well-optimized dedicated servers: You have plenty of resources to handle crawlers
  • Sites with fast response times (<200ms): Crawlers aren't harming performance
  • Sites that rely on rapid indexing: Crawl-delay slows down discovery of new content
💡 Pro Tip: Before implementing Crawl-delay, check your server logs to identify which bots are actually causing problems. You may only need to throttle one or two specific crawlers rather than applying a blanket delay.

How to Implement Crawl-delay Safely

Implementing Crawl-delay incorrectly can harm your SEO by slowing down indexing. Follow these best practices:

Best Practice #1: Target Specific Bots

Never use Crawl-delay with User-agent: * unless absolutely necessary. Apply delays only to the specific bots causing problems.

# Good: Targets only aggressive bots
User-agent: YandexBot
Crawl-delay: 3

User-agent: Baiduspider
Crawl-delay: 3

# Bad: Slows down every bot including polite ones
User-agent: *
Crawl-delay: 3

Best Practice #2: Start Small and Increase Gradually

Don't set a high delay immediately. Start with a low value (1-2 seconds), monitor your server load for a week, and increase only if needed.

  • Week 1: Crawl-delay: 1
  • Week 2: Increase to 2 if problems persist
  • Week 3: Increase to 3-5 if necessary

Best Practice #3: Test Before Deploying

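Use the robots.txt report in Google Search Console (which replaced the older Robots.txt Tester) and the robots.txt tester in Bing Webmaster Tools to verify your syntax. Parsers usually skip a malformed line rather than rejecting the whole file, but a typo in a User-agent or Crawl-delay line silently disables that rule, even for bots that support Crawl-delay.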

Best Practice #4: Monitor Crawl Statistics

After implementing Crawl-delay, check:

  • Bing Webmaster Tools → Crawl Information → Crawl Rate
  • Yandex Webmaster → Crawl Statistics
  • Your server logs for reduced request volumes from targeted bots

Full Example: Comprehensive Crawl-delay Configuration

# Googlebot - no group here. Google ignores Crawl-delay, and per Google's
# documentation unsupported rules do not end a group, so an empty Googlebot
# group could be merged with later groups and inherit their Disallow rules.

# Bingbot - 2 second delay (moderate)
# The empty "Disallow:" allows everything and closes the group for every parser.
User-agent: bingbot
Crawl-delay: 2
Disallow:

# Yandex - 5 second delay (aggressive crawler)
User-agent: YandexBot
Crawl-delay: 5
Disallow:

# Baidu - 10 second delay (very aggressive)
User-agent: Baiduspider
Crawl-delay: 10
Disallow:

# SeznamBot - 1 second delay (polite, but still throttle)
User-agent: SeznamBot
Crawl-delay: 1
Disallow:

# AI training bots - block entirely (optional)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Default: allow all but don't apply crawl delay
User-agent: *
Disallow:

Alternatives to Crawl-delay for Googlebot

Since Googlebot ignores Crawl-delay, here are alternative methods to control Google's crawling behavior:

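Method 1: Temporary 500, 503, or 429 Responses (Easiest)

Google retired the Search Console crawl rate limiter in January 2024. Its documentation now recommends returning 500, 503, or 429 status codes when you need Googlebot to back off quickly. It's simple but coarse, and it is only safe for a day or two, because URLs that keep returning errors can be dropped from the index.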

Method 2: Server-Level Rate Limiting (Advanced)

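You can configure your web server to limit request rates based on User-agent. This works for any bot that identifies itself honestly, including those that ignore Crawl-delay or robots.txt altogether. Bots that spoof their User-agent need IP-based rules instead.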

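Apache Example (using mod_ratelimit and mod_setenvif):

# Note: mod_ratelimit caps response bandwidth (KiB/s), not requests per second.
# The 400 KiB/s value is a placeholder; tune it for your server.
SetEnvIf User-Agent "Googlebot" googlerate
SetEnvIf User-Agent "bingbot" bingrate
<Location />
    <IfModule mod_ratelimit.c>
        # Throttle responses served to either bot
        <If "reqenv('googlerate') == '1' || reqenv('bingrate') == '1'">
            SetOutputFilter RATE_LIMIT
            SetEnv rate-limit 400
        </If>
    </IfModule>
</Location>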

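Nginx Example (using limit_req module):

# map and limit_req_zone belong in the http context
map $http_user_agent $limit_key {
    default "";              # empty key: request is not rate limited
    ~*Googlebot "googlebot";
    ~*bingbot "bingbot";
}

# One shared bucket per bot name, 5 requests per second
limit_req_zone $limit_key zone=bot_zone:10m rate=5r/s;

server {
    location / {
        limit_req zone=bot_zone burst=10 nodelay;
        limit_req_status 429;  # default is 503
        # ... rest of config
    }
}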

Method 3: CDN-Level Bot Management

Services like Cloudflare, Akamai, and CloudFront offer bot management features that can rate-limit or challenge specific crawlers before they reach your origin server.

💡 Recommendation: For most websites, Google's automatic crawl rate management is sufficient. For advanced use cases, server-level rate limiting gives you precise control over all bots, not just search engines.

Calculating the Right Crawl-delay Value

Choosing the right Crawl-delay value requires balancing indexing speed against server load.

The Formula

To determine your optimal Crawl-delay, you need to know:

  • R: Maximum requests per second your server can handle from a single bot
  • D: Desired requests per second from the bot (should be less than R for safety margin)
  • C: Crawl-delay in seconds = 1 / D

Example Calculation

Your server can handle 15 requests per second from a single bot (R = 15). You want to leave a 50% safety margin, so you target D = 7.5 requests per second. Your Crawl-delay would be 1 / 7.5 = 0.13 seconds. Since fractional seconds may not be supported by all bots, round to Crawl-delay: 1 (which allows about 1 request per second, well within your capacity).
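
The same calculation as a small Python sketch. The function name `crawl_delay_for` and its parameters are invented for this example, and rounding up to whole seconds is an assumption made for bots that may not accept fractional values.

```python
import math


def crawl_delay_for(capacity_rps: float, safety_margin: float = 0.5,
                    whole_seconds: bool = True) -> float:
    """Derive a Crawl-delay from server capacity R and a safety margin.

    capacity_rps  -- R, requests per second the server can take from one bot
    safety_margin -- fraction of R to hold back (0.5 keeps 50% in reserve)
    whole_seconds -- round up to a whole number of seconds
    """
    if capacity_rps <= 0:
        raise ValueError("capacity_rps must be positive")
    if not 0 <= safety_margin < 1:
        raise ValueError("safety_margin must be in [0, 1)")
    target_rps = capacity_rps * (1 - safety_margin)  # D
    delay = 1 / target_rps                           # C = 1 / D
    return math.ceil(delay) if whole_seconds else delay


print(crawl_delay_for(15))                                  # 1
print(round(crawl_delay_for(15, whole_seconds=False), 2))   # 0.13
```

Rounding up always errs toward a slower crawl, which trades indexing speed for headroom on the server.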

Crawl-delay Guidelines by Site Type

  • Small blog on shared hosting: 1-2 seconds
  • Medium business site on VPS: 0.5-1 second (if needed at all)
  • Large e-commerce on dedicated server: 0.1-0.5 seconds or none
  • News site needing rapid indexing: No Crawl-delay
  • Site being overwhelmed by Yandex/Baidu: 5-10 seconds for those specific bots

Monitoring Impact After Setting Delay

After implementing Crawl-delay, monitor these metrics for 1-2 weeks:

  • Server CPU and memory usage during peak bot activity
  • Response times for human visitors during the same periods
  • Crawl statistics in Bing Webmaster Tools and Yandex Webmaster
  • Indexation of new content (is it being discovered quickly enough?)

If your server load remains high, increase the delay. If new content isn't being indexed quickly enough, decrease the delay.

🏆 Final Takeaway: Crawl-delay is a useful tool for protecting server resources from aggressive crawlers, but it's not a one-size-fits-all solution. Google ignores it entirely, so for Googlebot rely on Google's automatic crawl rate management, temporary 503 or 429 responses, or server-level limits. For other bots, target specific User-agents, start with small delays, monitor carefully, and adjust based on real data. When in doubt, leave it out: most websites don't need Crawl-delay at all.

Marcus is a veteran systems administrator and infrastructure architect who focuses on server efficiency and load balancing for high-traffic networks.
