# Blocking bad link checker robots
Add the following group to your site's robots.txt. All of the listed user agents share the single `Disallow: /` rule at the end, which denies them the entire site. Note that the backslash-escaped spaces seen in many circulating copies of this list are not valid robots.txt syntax; agent names containing spaces are written literally:

```
User-agent: Rogerbot
User-agent: Exabot
User-agent: MJ12bot
User-agent: Dotbot
User-agent: Gigabot
User-agent: AhrefsBot
User-agent: BlackWidow
User-agent: Bot mailto:[email protected]
User-agent: ChinaClaw
User-agent: Custo
User-agent: DISCo
User-agent: Download Demon
User-agent: eCatch
User-agent: EirGrabber
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Express WebPictures
User-agent: ExtractorPro
User-agent: EyeNetIE
User-agent: FlashGet
User-agent: GetRight
User-agent: GetWeb!
User-agent: Go!Zilla
User-agent: Go-Ahead-Got-It
User-agent: GrabNet
User-agent: Grafula
User-agent: HMView
User-agent: HTTrack
User-agent: Image Stripper
User-agent: Image Sucker
User-agent: Indy Library
User-agent: InterGET
User-agent: Internet Ninja
User-agent: JetCar
User-agent: JOC Web Spider
User-agent: larbin
User-agent: LeechFTP
User-agent: Mass Downloader
User-agent: MIDown tool
User-agent: Mister PiX
User-agent: Navroad
User-agent: NearSite
User-agent: NetAnts
User-agent: NetSpider
User-agent: Net Vampire
User-agent: NetZIP
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Navigator
User-agent: PageGrabber
User-agent: Papa Foto
User-agent: pavuk
User-agent: pcBrowser
User-agent: RealDownload
User-agent: ReGet
User-agent: SiteSnagger
User-agent: SmartDownload
User-agent: SuperBot
User-agent: SuperHTTP
User-agent: Surfbot
User-agent: tAkeOut
User-agent: Teleport Pro
User-agent: VoidEYE
User-agent: Web Image Collector
User-agent: Web Sucker
User-agent: WebAuto
User-agent: WebCopier
User-agent: WebFetch
User-agent: WebGo IS
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebSauger
User-agent: Website eXtractor
User-agent: Website Quester
User-agent: WebStripper
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Wget
User-agent: Widow
User-agent: WWWOFFLE
User-agent: Xaldon WebSpider
User-agent: Zeus
Disallow: /
```
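Keep in mind that robots.txt is purely advisory: well-behaved crawlers honor it, while abusive ones simply ignore it. You can sanity-check that the rules parse the way you intend with Python's standard `urllib.robotparser`; the snippet below is a minimal sketch using a shortened stand-in for the full list above:

```python
import urllib.robotparser

# Shortened stand-in for the full robots.txt group above:
# several User-agent lines sharing one Disallow rule.
rules = """\
User-agent: AhrefsBot
User-agent: MJ12bot
User-agent: HTTrack
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Listed bots are denied everywhere...
print(parser.can_fetch("AhrefsBot", "/any-page.html"))  # → False
# ...while unlisted agents are unaffected.
print(parser.can_fetch("Googlebot", "/any-page.html"))  # → True
```

If a listed bot still comes back as allowed, the group is malformed (for example, a blank line between the `User-agent` lines and `Disallow` splits them into separate groups).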
You can also block bots using the .htaccess file, as shown in this example:
```apache
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} .*Twice.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Yand.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Voil.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*libw.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Java.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Sogou.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*psbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Exabot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*boitho.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*ajSitemap.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*Rankivabot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*DBLBot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*MJ1.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.*
RewriteRule ^(.*)$ http://anysite.com/ [L,R=301]
```

Note that the last condition in the chain must not carry the `[OR]` flag, and each user-agent pattern should appear only once (the original list repeated `Rankivabot`).
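Redirecting this traffic to another site (`anysite.com` above looks like a placeholder) just shifts the load onto someone else. An alternative sketch is to answer the matched agents with 403 Forbidden instead; the `RewriteCond` chain stays exactly the same and only the final rule changes:

```apache
RewriteEngine On
RewriteBase /
# Same RewriteCond chain as above; abbreviated here to one example line.
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
# "-" means no URL substitution; F sends 403 Forbidden, L stops processing.
RewriteRule ^ - [F,L]
```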
You can also deny entire IP ranges from .htaccess. With `Order Allow,Deny`, the `Deny` directives override the general `Allow from all`:

```apache
Order Allow,Deny
Allow from all
Deny from 110.0.0.0/8
Deny from 111.0.0.0/8
Deny from 112.0.0.0/5
Deny from 120.0.0.0/6
Deny from 124.0.0.0/8
Deny from 125.0.0.0/8
Deny from 147.0.0.0/8
Deny from 169.208.0.0
Deny from 175.0.0.0/8
Deny from 180.0.0.0/8
Deny from 182.0.0.0/8
Deny from 183.0.0.0/8
Deny from 202.0.0.0/8
Deny from 203.0.0.0/8
Deny from 210.0.0.0/8
Deny from 211.0.0.0/8
Deny from 218.0.0.0/8
Deny from 219.0.0.0/8
Deny from 220.0.0.0/8
Deny from 221.0.0.0/8
Deny from 222.0.0.0/8
```

Be aware that these are very broad ranges (a /8 covers over 16 million addresses), so blocks like these will also lock out legitimate visitors from those regions.
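The `Order`/`Allow`/`Deny` directives belong to Apache 2.2; on Apache 2.4 they only work through the compatibility module `mod_access_compat`. Under the newer `mod_authz_host` syntax the same idea would be written roughly like this (a sketch covering only the first few ranges above):

```apache
<RequireAll>
    Require all granted
    Require not ip 110.0.0.0/8
    Require not ip 111.0.0.0/8
    Require not ip 112.0.0.0/5
</RequireAll>
```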
Here is a list of bots:
http://foroblackhat.com/hilo-listado-de-...n-internet
There is also a WordPress plugin that lets you do this conveniently from your blog's admin area:
https://www.mcssl.com/content/65522/link-privacy.zip