Bot Detection

VistoShield identifies and blocks malicious bots using User-Agent signature matching, rDNS verification, and configurable rate limiting.

How It Works

  1. Every incoming HTTP request's User-Agent header is checked against the bot signature database.
  2. Matched signatures are classified by action: block, challenge, monitor, or allow.
  3. Legitimate bots (Googlebot, Bingbot, etc.) are verified with a reverse DNS lookup, confirmed by a forward lookup, to check that the source IP really belongs to the bot operator's domain.
  4. Requests matching a signature whose action is "block" are denied, and the source IP is temporarily blocked.

Signature Database

Signatures are stored in /etc/vistoshield/bot-signatures.dat, one per line in the following format:

pattern|action|category|description
Field         Description
pattern       Regex pattern matched against the User-Agent string
action        block, challenge, monitor, or allow
category      Category name (e.g., scraper, crawler, spam, seo, scanner)
description   Human-readable description of the bot

Example entries:

AhrefsBot|block|seo|Ahrefs SEO crawler
SemrushBot|block|seo|Semrush SEO crawler
MJ12bot|block|scraper|Majestic-12 crawler
Googlebot|allow|search|Google search crawler
bingbot|allow|search|Microsoft Bing crawler
python-requests|monitor|library|Python requests library
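
The format above can be read with a few lines of code. The following Python sketch parses signature lines and classifies a User-Agent string. It is an illustration, not VistoShield's implementation: case-insensitive matching, first-match-wins ordering, skipping of blank and "#" lines, and splitting on the last three "|" characters are all assumptions, and Signature, parse_signatures, and classify are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

VALID_ACTIONS = {"block", "challenge", "monitor", "allow"}


@dataclass
class Signature:
    pattern: "re.Pattern[str]"
    action: str
    category: str
    description: str


def parse_signatures(lines: Iterable[str]) -> List[Signature]:
    """Parse pattern|action|category|description lines (hypothetical helper)."""
    signatures = []
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the last three '|' so a regex alternation such as (a|b)
        # stays inside the pattern field; descriptions must not contain '|'.
        pattern, action, category, description = line.rsplit("|", 3)
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {action!r} in line: {line}")
        # Assumption: User-Agent matching is case-insensitive.
        signatures.append(Signature(re.compile(pattern, re.IGNORECASE),
                                    action, category, description))
    return signatures


def classify(user_agent: str, signatures: List[Signature]) -> Optional[Signature]:
    """Return the first signature whose regex matches anywhere in the User-Agent."""
    for sig in signatures:
        if sig.pattern.search(user_agent):
            return sig
    return None


if __name__ == "__main__":
    sigs = parse_signatures([
        "AhrefsBot|block|seo|Ahrefs SEO crawler",
        "Googlebot|allow|search|Google search crawler",
        "python-requests|monitor|library|Python requests library",
    ])
    match = classify("python-requests/2.31.0", sigs)
    print(match.action if match else "no match")  # prints "monitor"
```

Splitting from the right keeps a pattern such as (foo|bar)bot intact, at the cost of not allowing "|" inside descriptions.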

Updating Signatures

Pull the latest signatures from the VistoShield repository:

vistoshield update-signatures

This merges remote signatures with your local customizations. Your manually added entries and action overrides are preserved.

Tip: Set up a cron job to update signatures automatically. This entry runs the update daily at 04:00:

0 4 * * * /usr/local/bin/vistoshield update-signatures --quiet

rDNS Verification

Good bots such as Googlebot and Bingbot identify themselves through the User-Agent header, but any client can forge that header. VistoShield verifies them by:

  1. Performing a reverse DNS lookup on the source IP.
  2. Checking that the rDNS hostname matches the expected domain (e.g., *.googlebot.com, *.search.msn.com).
  3. Performing a forward DNS lookup on the hostname to confirm it resolves back to the source IP.

Bots that fail rDNS verification are treated as impostors and blocked.
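
The three steps above form a forward-confirmed reverse DNS check. The Python sketch below shows one way to implement it with the standard socket module; it is not VistoShield's code. The function verify_bot_ip and its resolver parameters are hypothetical: the resolvers are injectable so the logic can be tested without network access, and matching wildcard domains such as *.googlebot.com with fnmatch is an assumption about how the expected domains are interpreted.

```python
import ipaddress
import socket
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Tuple

ReverseFn = Callable[[str], Tuple[str, List[str], List[str]]]
ForwardFn = Callable[[str], List[str]]


def resolve_forward(hostname: str) -> List[str]:
    """Return every IPv4/IPv6 address the hostname resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def verify_bot_ip(ip: str,
                  expected_domains: Iterable[str],
                  reverse: ReverseFn = socket.gethostbyaddr,
                  forward: ForwardFn = resolve_forward) -> bool:
    """Forward-confirmed reverse DNS check for a claimed good bot (sketch)."""
    canonical_ip = str(ipaddress.ip_address(ip))
    # Step 1: reverse (PTR) lookup on the source IP.
    try:
        hostname = reverse(canonical_ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the hostname must match one of the expected wildcard domains.
    if not any(fnmatchcase(hostname, pattern.lower()) for pattern in expected_domains):
        return False
    # Step 3: the hostname must resolve back to the original source IP.
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    return canonical_ip in addresses
```

Step 3 is what defeats an attacker who controls the PTR record for their own IP: they can make it claim crawl-1.googlebot.com, but they cannot make Google's DNS resolve that name back to their address.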

Good Bot Allowlist

The following bots are allowed by default when they pass rDNS verification:

Bot            Expected rDNS Domain
Googlebot      *.googlebot.com or *.google.com
Bingbot        *.search.msn.com
Yahoo Slurp    *.crawl.yahoo.net
Yandex         *.yandex.com or *.yandex.net
Baidu Spider   *.crawl.baidu.com
DuckDuckBot    *.duckduckgo.com

Rate Limiting for Bots

Even allowed bots can overload your server. VistoShield applies separate rate limits to bot traffic:

BOT_RATE_LIMIT=60        # Max requests/minute for bots
BOT_RATE_BURST=10        # Burst allowance for bots
BOT_BLOCK_TIME=600       # Block duration for rate-limited bots (10 min)
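
These three settings map naturally onto a token bucket. The Python sketch below is one possible interpretation, not VistoShield's documented algorithm: it assumes BOT_RATE_LIMIT is the sustained refill rate, BOT_RATE_BURST is the bucket capacity, and an IP that empties its bucket is blocked for BOT_BLOCK_TIME seconds. BotRateLimiter is a hypothetical class, and its clock is injectable for testing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

BOT_RATE_LIMIT = 60   # requests/minute, from the settings above
BOT_RATE_BURST = 10   # assumed to be the bucket capacity
BOT_BLOCK_TIME = 600  # seconds


@dataclass
class _Bucket:
    tokens: float
    updated: float


class BotRateLimiter:
    """Hypothetical per-IP token bucket with a temporary block on overflow."""

    def __init__(self, rate_per_min: float = BOT_RATE_LIMIT,
                 burst: int = BOT_RATE_BURST,
                 block_time: float = BOT_BLOCK_TIME,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.refill_per_sec = rate_per_min / 60.0
        self.capacity = float(burst)
        self.block_time = block_time
        self.clock = clock
        self.buckets: Dict[str, _Bucket] = {}
        self.blocked_until: Dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        # Reject outright while an earlier block is still in force.
        if self.blocked_until.get(ip, 0.0) > now:
            return False
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = _Bucket(tokens=self.capacity, updated=now)
        # Refill in proportion to elapsed time, capped at the capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        # Bucket is empty: block this IP for block_time seconds.
        self.blocked_until[ip] = now + self.block_time
        return False
```

Under these assumptions the defaults refill one token per second (60 / 60), so a crawler can send 10 requests at once and then about one per second without being blocked.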

Custom Signatures

Add your own entries to the end of bot-signatures.dat or create a separate file:

# /etc/vistoshield/bot-signatures-custom.dat
MyPrivateBot|allow|custom|My internal crawler
BadScraper.*v2|block|scraper|Known bad scraper variant

Custom files are loaded after the main database, so they can override default actions.
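
The override behaviour can be sketched by merging sources in load order, so that a later entry for the same pattern replaces an earlier one. This Python sketch is illustrative only: keying entries by the exact pattern string is an assumption about how VistoShield detects an override, and load_signature_files is a hypothetical helper that accepts any iterables of lines (for example, open file objects).

```python
from typing import Dict, Iterable, Tuple

Entry = Tuple[str, str, str]  # (action, category, description)


def load_signature_files(*sources: Iterable[str]) -> Dict[str, Entry]:
    """Merge signature sources in order; later sources override earlier ones."""
    merged: Dict[str, Entry] = {}
    for lines in sources:
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):  # assumed comment syntax
                continue
            pattern, action, category, description = line.rsplit("|", 3)
            # Assumption: an identical pattern string in a later source is an override.
            merged[pattern] = (action, category, description)
    return merged


if __name__ == "__main__":
    main_db = ["Googlebot|allow|search|Google search crawler",
               "AhrefsBot|block|seo|Ahrefs SEO crawler"]
    custom = ["AhrefsBot|monitor|seo|Ahrefs, monitored for our own audits"]
    print(load_signature_files(main_db, custom)["AhrefsBot"][0])  # prints "monitor"
```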

Viewing Bot Activity

# List recent bot detections
vistoshield list --type bot

# Show bot statistics
vistoshield status --bots

# Watch bot detections in real time
tail -f /var/log/vistoshield/vistoshield.log | grep BOT