Bot Detection
VistoShield identifies and blocks malicious bots using User-Agent signature matching, reverse DNS (rDNS) verification, and configurable rate limiting.
How It Works
- The User-Agent header of every incoming HTTP request is checked against the bot signature database.
- Each matched signature carries an action: block, challenge, monitor, or allow.
- Legitimate bots (Googlebot, Bingbot, etc.) are verified via reverse DNS lookup to confirm that the request really comes from the bot's operator and not from an impostor using the same User-Agent.
- Requests from bots whose action is "block" are denied, and the source IP is temporarily blocked.
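The flow above can be summarized as a single decision function. The following Python is a minimal sketch under stated assumptions, not VistoShield's actual code: the name `decide` and its outcome strings are hypothetical, and because this page does not describe what a challenge involves, "challenge" is returned as an opaque outcome.

```python
from typing import Optional


def decide(action: Optional[str], claims_good_bot: bool, rdns_passed: bool) -> str:
    """Map a matched signature action to a request outcome (hypothetical sketch)."""
    if action is None:
        return "pass"  # no signature matched: not treated as a known bot
    if claims_good_bot and not rdns_passed:
        return "block"  # claims to be a good bot but fails rDNS: impostor
    if action == "block":
        return "block"  # deny and temporarily block the source IP
    if action == "challenge":
        return "challenge"  # challenge mechanics are not described on this page
    if action == "monitor":
        return "pass+log"  # let the request through but record the detection
    if action == "allow":
        return "pass"
    raise ValueError(f"unknown action: {action!r}")
```

For example, a request whose User-Agent matches the Googlebot entry but whose source IP fails rDNS verification ends up as "block" even though the signature's action is "allow".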
Signature Database
Signatures are stored in /etc/vistoshield/bot-signatures.dat, one per line in the following format:
pattern|action|category|description
| Field | Description |
|---|---|
| pattern | Regex pattern matched against the User-Agent string |
| action | block, challenge, monitor, or allow |
| category | Category name (e.g., scraper, crawler, spam, seo, scanner) |
| description | Human-readable description of the bot |
Example entries:
AhrefsBot|block|seo|Ahrefs SEO crawler
SemrushBot|block|seo|Semrush SEO crawler
MJ12bot|block|scraper|Majestic-12 crawler
Googlebot|allow|search|Google search crawler
bingbot|allow|search|Microsoft Bing crawler
python-requests|monitor|library|Python requests library
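To make the format concrete, here is a minimal Python sketch that parses these lines and matches a User-Agent against them. It assumes case-sensitive `re.search` matching where the first matching entry wins, and that lines starting with `#` are comments; VistoShield's real matching order and regex dialect are not documented here, and `parse_signatures` and `match_user_agent` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

VALID_ACTIONS = {"block", "challenge", "monitor", "allow"}


@dataclass
class Signature:
    pattern: re.Pattern
    action: str
    category: str
    description: str


def parse_signatures(lines: Iterable[str]) -> List[Signature]:
    """Parse pattern|action|category|description lines (blank and # lines skipped)."""
    signatures = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split from the right so a regex alternation such as "foo|bar" stays in the
        # pattern field; this assumes the description itself contains no "|".
        pattern, action, category, description = line.rsplit("|", 3)
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {action!r} in: {line}")
        signatures.append(Signature(re.compile(pattern), action, category, description))
    return signatures


def match_user_agent(user_agent: str, signatures: List[Signature]) -> Optional[Signature]:
    """Return the first signature whose regex is found anywhere in the User-Agent."""
    for sig in signatures:
        if sig.pattern.search(user_agent):
            return sig
    return None


sigs = parse_signatures([
    "AhrefsBot|block|seo|Ahrefs SEO crawler",
    "Googlebot|allow|search|Google search crawler",
    "python-requests|monitor|library|Python requests library",
])
hit = match_user_agent("python-requests/2.31.0", sigs)
print(hit.action if hit else "no match")  # prints "monitor"
```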
Updating Signatures
Pull the latest signatures from the VistoShield repository:
vistoshield update-signatures
This merges remote signatures with your local customizations. Your manually added entries and action overrides are preserved.
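To run the update automatically, add a cron entry such as the following, which runs daily at 04:00:
0 4 * * * /usr/local/bin/vistoshield update-signatures --quiet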
rDNS Verification
Good bots such as Googlebot and Bingbot identify themselves in the User-Agent header, but that header is trivial to spoof. VistoShield verifies them by:
- Performing a reverse DNS lookup on the source IP.
- Checking that the rDNS hostname matches the expected domain (e.g., *.googlebot.com, *.search.msn.com).
- Performing a forward DNS lookup on the hostname to confirm it resolves back to the source IP.
Bots that fail rDNS verification are treated as impostors and blocked.
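These three steps are the standard forward-confirmed reverse DNS check. A minimal Python sketch using only the standard library follows; `verify_bot_ip` and `hostname_matches` are hypothetical helpers, not part of VistoShield, and a real deployment would likely cache results because each check costs two DNS round trips. The dot-boundary check in `hostname_matches` matters: without it, a hostname such as evilgooglebot.com would pass for *.googlebot.com.

```python
import ipaddress
import socket
from typing import Sequence


def hostname_matches(hostname: str, expected_domains: Sequence[str]) -> bool:
    """True if hostname is a subdomain of one of the expected domains."""
    host = hostname.rstrip(".").lower()
    # Require a dot boundary so "evilgooglebot.com" does not match "googlebot.com".
    return any(host.endswith("." + d.lower()) for d in expected_domains)


def verify_bot_ip(ip: str, expected_domains: Sequence[str]) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        source = ipaddress.ip_address(ip)
        hostname = socket.gethostbyaddr(ip)[0]  # step 1: reverse (PTR) lookup
    except (ValueError, OSError):
        return False  # malformed IP or no PTR record: treat as unverified
    if not hostname_matches(hostname, expected_domains):  # step 2: domain check
        return False
    try:
        # Step 3: forward lookup; getaddrinfo returns both A and AAAA records.
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    # Strip any IPv6 scope suffix ("%eth0") before comparing addresses.
    return any(ipaddress.ip_address(info[4][0].split("%")[0]) == source for info in infos)
```

Calling `verify_bot_ip("66.249.66.1", ["googlebot.com", "google.com"])` performs live DNS queries, so its result depends on the network it runs on.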
Good Bot Allowlist
The following bots are allowed by default when they pass rDNS verification:
| Bot | Expected rDNS Domain |
|---|---|
| Googlebot | *.googlebot.com or *.google.com |
| Bingbot | *.search.msn.com |
| Yahoo Slurp | *.crawl.yahoo.net |
| Yandex | *.yandex.com or *.yandex.net |
| Baidu Spider | *.crawl.baidu.com |
| DuckDuckBot | *.duckduckgo.com |
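The table can be read as a lookup from a User-Agent token to the domains accepted in the rDNS step. In this hypothetical Python encoding, the UA tokens (for example Baiduspider for Baidu Spider and Slurp for Yahoo Slurp) are assumptions about how each bot identifies itself, not values read from VistoShield's configuration; `GOOD_BOTS` and `expected_domains` are illustrative names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the allowlist: UA token -> accepted rDNS parent domains.
GOOD_BOTS: Dict[str, Tuple[str, ...]] = {
    "Googlebot": ("googlebot.com", "google.com"),
    "bingbot": ("search.msn.com",),
    "Slurp": ("crawl.yahoo.net",),
    "Yandex": ("yandex.com", "yandex.net"),
    "Baiduspider": ("crawl.baidu.com",),
    "DuckDuckBot": ("duckduckgo.com",),
}


def expected_domains(user_agent: str) -> Optional[Tuple[str, ...]]:
    """Return the rDNS domains for the good bot a User-Agent claims to be, if any."""
    ua = user_agent.lower()
    for token, domains in GOOD_BOTS.items():
        if token.lower() in ua:  # case-insensitive substring match (assumption)
            return domains
    return None  # not claiming to be an allowlisted bot: no rDNS check needed
```

A `None` result means the request does not claim to be an allowlisted bot, so only the signature action applies.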
Rate Limiting for Bots
Even allowed bots can overload your server. VistoShield applies separate rate limits to bot traffic:
BOT_RATE_LIMIT=60 # Max requests/minute for bots
BOT_RATE_BURST=10 # Burst allowance for bots
BOT_BLOCK_TIME=600 # Block duration for rate-limited bots (10 min)
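One plausible reading of these settings is a per-IP token bucket: tokens refill at BOT_RATE_LIMIT per minute, the bucket holds at most BOT_RATE_BURST tokens, and an IP that finds the bucket empty is blocked for BOT_BLOCK_TIME seconds. The sketch below implements that interpretation in Python; VistoShield's exact algorithm is not documented here, and `BotRateLimiter` and its methods are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

BOT_RATE_LIMIT = 60   # max requests per minute (from the settings above)
BOT_RATE_BURST = 10   # assumed to be the bucket capacity
BOT_BLOCK_TIME = 600  # seconds an over-limit IP stays blocked


@dataclass
class _Bucket:
    tokens: float
    updated: float
    blocked_until: float = 0.0


class BotRateLimiter:
    """Per-IP token bucket with a temporary block (illustrative sketch)."""

    def __init__(
        self,
        rate_per_min: float = BOT_RATE_LIMIT,
        burst: int = BOT_RATE_BURST,
        block_time: float = BOT_BLOCK_TIME,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.burst = burst
        self.block_time = block_time
        self.clock = clock
        self.buckets: Dict[str, _Bucket] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:  # first request from this IP: start with a full bucket
            bucket = self.buckets[ip] = _Bucket(tokens=float(self.burst), updated=now)
        if now < bucket.blocked_until:
            return False  # still serving a block
        # Refill for the time elapsed since the last update, capped at the burst size.
        bucket.tokens = min(float(self.burst), bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        bucket.blocked_until = now + self.block_time  # limit exceeded: block the IP
        return False
```

Injecting the clock makes the limiter easy to test: with a fake clock frozen at 0, the first ten requests from one IP pass, the eleventh is refused, and that IP stays blocked until the clock reaches 600 seconds.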
Custom Signatures
Add your own entries to the end of bot-signatures.dat or create a separate file:
# /etc/vistoshield/bot-signatures-custom.dat
MyPrivateBot|allow|custom|My internal crawler
BadScraper.*v2|block|scraper|Known bad scraper variant
Custom files are loaded after the main database, so they can override default actions.
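If overrides are keyed on the pattern, loading files in order into a map keyed on the pattern string gives the described "later file wins" behavior. This Python sketch assumes exactly that keying (a custom entry overrides a default only when its pattern text is identical); `load_layers` is a hypothetical name. Because Python dicts keep a re-declared key in its original position, an override here changes an entry's action without moving it in the list.

```python
from typing import Dict, Iterable, Tuple


def load_layers(layers: Iterable[Iterable[str]]) -> Dict[str, Tuple[str, str, str]]:
    """Merge signature files in load order; a later entry with the same pattern wins."""
    merged: Dict[str, Tuple[str, str, str]] = {}
    for lines in layers:
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            # Same field layout as the main file: pattern|action|category|description.
            pattern, action, category, description = line.rsplit("|", 3)
            merged[pattern] = (action, category, description)
    return merged
```

For example, a custom file containing `python-requests|block|library|Block scripted clients`, loaded after the main database, would turn the default "monitor" action for that pattern into "block".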
Viewing Bot Activity
# List recent bot detections
vistoshield list --type bot
# Show bot statistics
vistoshield status --bots
# Watch bot detections in real time
tail -f /var/log/vistoshield/vistoshield.log | grep BOT