How to Block Scrapers, Hackers and Spammers with Wordfence


Wordfence is a popular WordPress security plugin. Features include a scanner that monitors for hacked files and a firewall with regularly updated rules that actively blocks malicious bots.

The tool also has a useful feature that provides user-configurable firewall rules that can supercharge your ability to block hackers, scrapers, and spammers.

Scrapers are especially troublesome because they copy your content and publish it elsewhere.

Using a tool like Wordfence can help reduce the amount of material that can be stolen by scrapers.

There are many WordPress security plugins and SaaS solutions to choose from, and several come highly recommended, including Sucuri Security and Cloudflare. Wordfence is one of many security solutions available, and it’s up to you to figure out which one fits most comfortably into your workflow.

Wordfence and other solutions work fine as set-it-and-forget-it solutions.

However, in my own experience, I’ve found that the user-configurable firewall in Wordfence gives anyone an opportunity to tighten bot blocking and really stick it to hackers and scrapers.

But before you tighten the firewall, it’s important to know how far these firewall rules can be taken, and we’ll take a look at that as well.

Wordfence is trusted by over 4 million users to protect their WordPress sites.

The default firewall behavior is to block bots and humans that grab too many pages too quickly or whose activity indicates an intent to hack the site.

The firewall will block the rogue bot’s IP address for a certain period of time, after which Wordfence drops the block.
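The default behavior described above (block an IP address that requests too many pages too quickly, then drop the block after a while) can be sketched in a few lines of Python. This is only an illustration of the general idea, not Wordfence's actual implementation. The class name, the thresholds and the block duration are all assumptions made for the example.

```python
from collections import defaultdict, deque


class TemporaryBlocker:
    """Hypothetical sketch: temporarily block IPs that request pages too quickly."""

    def __init__(self, max_requests=60, window_seconds=60, block_seconds=300):
        # All three limits are made-up defaults, not Wordfence settings.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) is served."""
        expires = self._blocked_until.get(ip)
        if expires is not None:
            if now < expires:
                return False             # still inside the block period
            del self._blocked_until[ip]  # block period is over, so drop the block
            self._hits[ip].clear()

        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that fall outside the sliding time window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()

        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            return False
        return True


blocker = TemporaryBlocker(max_requests=3, window_seconds=10, block_seconds=30)
for second in (0, 1, 2, 3, 20, 33):
    print(second, blocker.allow("203.0.113.7", second))
```

A bot that rotates to a fresh IP address starts with an empty history in a scheme like this, which is why rate limiting alone does not stop bots that keep switching addresses.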

The default settings on the firewall work great.

But sometimes bots still slip through and are able to scrape a site or probe for vulnerabilities by crawling the site slowly.

A common approach by hackers is to set up a bot to hit the site quickly and, when it is blocked, rotate to other IP addresses and user agents, causing the firewall to start the detection process again.

But these bots aren’t always programmed very well, which makes it possible to block them more efficiently than the default Wordfence settings do.

Background information about Wordfence firewall rules

It is possible to accomplish efficient bot blocking with the use of server level tools, multiple plugins and even a .htaccess file.

But editing the .htaccess file can be difficult as there are strict rules to follow and a mistake in the .htaccess file can cause the entire site to fail.

An easy way to block bots is to use firewall rules.

What can you block with Wordfence?

Wordfence allows you to create blocking rules based on any of the following criteria:

  • IP address range
  • host name
  • browser user agent
  • referrer

IP address range

IP address means the IP address of the server or ISP from which the bot or human is coming.

host name

Hostname means the name of the host. The host is not always declared, sometimes the bot/human visitor only displays an IP address.

browser user agent

Each site visitor usually tells the server which browser they are using. Browser user agent refers to the browser that the visitor claims to be using. A bot can claim to be virtually any browser, which bots sometimes do to evade detection.

referrer

This is the page on which a bot or human clicked a link to reach your site.

Wordfence Custom Pattern Blocking

The way to block bad bots using any of the above four variables is to add a custom rule to the Custom Pattern Blocking Tool.

Here’s how to reach it.

Step 1

In WordPress, from the admin menu on the left, click on the link for Firewall

Wordfence Step 1

Step 2

Select the tab labeled Blocking

Wordfence Step 2

Step 3

Select the “Custom Pattern” tab and create a firewall rule in the appropriate field. One of the fields is labeled “Block Reason”. Use that field to add a descriptive phrase such as hostname or user agent. This will help you review all the rules you’ve created, because the rules can be sorted by type of block.

Wordfence Step 3

Step 4

Wordfence Step 4

Step 5

Create your rule by clicking the “Block visitors who match this pattern” button and you’re done.

Wordfence Step 5

Wordfence rules can use asterisks (*) as a wildcard.
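To make the wildcard behavior concrete, here is a minimal Python sketch that treats the asterisk as "any run of characters" and tests a few values against a rule. It assumes the asterisk is the only special character and that matching ignores case. The helper names are made up for this illustration, and Wordfence's own matching code may differ in its details.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a rule in which * is the only wildcard into a compiled regex."""
    # Escape every literal piece, then let each * stand for "any characters".
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(literal_parts) + "$", re.IGNORECASE | re.DOTALL)


def matches(pattern, value):
    """Return True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).match(value) is not None


# Illustrative user agent string in the usual Chrome format.
chrome_90 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36")

print(matches("*.ru", "crawler.example.ru"))  # True
print(matches("*.ru", "example.run"))         # False
print(matches("*Chrome/90.*", chrome_90))     # True
```

Note that a rule without a leading or trailing asterisk has to match from the very start or to the very end of the value, which is why the user agent rules in this article are wrapped in asterisks.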

Should You Block IP Addresses with Wordfence?

Wordfence makes it easy for publishers to set up firewall rules that effectively block bots.

This is a boon but can also be a curse. For example, permanently blocking thousands of IP addresses using the Wordfence firewall is not efficient and probably not a fair use of Wordfence.

It’s okay to temporarily block an IP address. Permanently blocking IP addresses is probably not okay because, as I understand it (and I am going from memory here), it can slow down your WordPress installation.

In general, it’s best to permanently block thousands or even millions of IP addresses with an .htaccess file.

Blocking Hostnames with Wordfence

Blocking hostnames with Wordfence can be one way to block hackers, spammers, and scrapers. You can view the Wordfence live traffic log by clicking on Wordfence > Tools.

It shows you bots and human visitors, including bots that were automatically blocked by Wordfence.

Not all site visitors display their hostname. However, in some cases they do, and this makes it easy to block an entire web host.

For example, one of my sites, for whatever reason, attracts DDoS levels of bot traffic from a single host. No other site of mine gets that much attention from this host, just this one site.

That site received over 250,000 attacks between March 2020 and December 2021, and every one of them was blocked by Wordfence.

Clearly, blocking bots by hostname can be useful if you want to block cloud hosts that send nothing but hackers and scrapers.

However some hosts, such as Amazon Web Services (AWS) send both bad bots and good bots. Blocking AWS servers can also inadvertently block good bots.

That’s why it’s important that you monitor your traffic and ensure that blocking hostnames won’t have any unintended side effects.

On the other hand, if you have no need for traffic coming from Russia or China, it’s easy to block hackers, scrapers, and spammers from those two countries by creating firewall rules using the hostname field.

All you have to do is create rules that block all hostnames ending in .ru and .cn. This will block all Russian and Chinese hostnames that end in .ru and .cn.

This is what you enter in the Hostname field:
*.ru

*.cn
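The sketch below shows, in Python, what the two hostname rules above would and would not catch. It uses Python's fnmatch module as a stand-in for Wordfence's pattern matching, and the function name and example hostnames are hypothetical.

```python
from fnmatch import fnmatchcase

# The two example rules from this article.
BLOCKED_HOSTNAME_PATTERNS = ["*.ru", "*.cn"]


def is_blocked_hostname(hostname, patterns=BLOCKED_HOSTNAME_PATTERNS):
    """Return True if the visitor's hostname matches any blocking pattern."""
    if not hostname:
        # The visitor only showed an IP address, so a hostname rule cannot match.
        return False
    # Hostnames are case-insensitive, so compare everything in lowercase.
    host = hostname.strip().rstrip(".").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


for name in ("bot.example.ru", "CRAWLER.EXAMPLE.CN",
             "ec2-203-0-113-7.compute-1.amazonaws.com", ""):
    print(repr(name), is_blocked_hostname(name))
```

The rules match on the text of the hostname, not on geography: a bot whose hostname ends in .com, or one that shows no hostname at all, passes through regardless of where it is located.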

This is not meant to encourage anyone to use Wordfence to block Russian and Chinese bots via hostnames. This is just an example to show how it is done.

Block Hackers and Scrapers by User Agent

Many rogue bots use old and outdated browser user agents.

After Russia invaded Ukraine, I noticed an increase in hacking bots using the Chrome 90 user agent (UA) from the same group of web hosts. Generally, bot traffic differs from website to website, so when it looked the same across all my sites, it stood out.

Whenever Wordfence automatically blocked these bots for hitting my site too fast, the bots would switch IP addresses and start hitting the sites over and over again.

So I decided to block these bots by their browser user agent (often simply referred to as the UA). First, I checked the StatCounter website to determine how many users around the world are using Chrome 90. According to StatCounter data, the Chrome 90 browser had a 0.09% market share in the USA as of January 2022.

The Chrome browser is at version 100 at the time of this writing. Given that Chrome automatically updates for the vast majority of users, it should come as no surprise that Chrome 90 usage is virtually nil, so it’s highly unlikely that blocking the Chrome 90 browser user agent will block any genuine, legitimate person visiting your site.

So I decided it was safe to block anything that appeared on my site with the Chrome 90 user agent.

However, there are online tools, such as GTMetrix and a server security header checker, that use the Chrome 90 user agent.

So if I blocked all versions of Chrome 90 (using this rule: *Chrome/90.*), I would block those two online tools as well.

Another approach is to look for the specific Chrome 90 variants used by hackers and by online tools.

GTMetrix and other tools use this Chrome UA:

Chrome/90.0.4430.212

Hackers and scrapers use the following Chrome UAs:

Chrome/90.0.4400.8
Chrome/90.0.4427.0
Chrome/90.0.4430.72
Chrome/90.0.4430.85
Chrome/90.0.4430.86
Chrome/90.0.4430.93

So, if you want online tools to still scan your site but also block bad bots, this is one example of how to do this:

*Chrome/90.0.4400.8*
*Chrome/90.0.4427.0*
*Chrome/90.0.4430.72*
*Chrome/90.0.4430.85*
*Chrome/90.0.4430.86*
*Chrome/90.0.4430.93*

Here’s how to block Chrome/90.0.4430.93:

How to Block Chrome 90 with Wordfence
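The difference between the broad rule and the six narrow rules can be checked with a short Python sketch. It assumes, as the block rules above imply, that Chrome/90.0.4430.212 is the variant used by the online tools. The full user agent strings are illustrative, the function name is made up, and fnmatch is only a stand-in for Wordfence's own matching.

```python
from fnmatch import fnmatchcase

# Illustrative user agent in the usual Chrome format; only the version varies.
UA_TEMPLATE = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/{version} Safari/537.36")

BROAD_RULE = ["*Chrome/90.*"]
NARROW_RULES = [
    "*Chrome/90.0.4400.8*",
    "*Chrome/90.0.4427.0*",
    "*Chrome/90.0.4430.72*",
    "*Chrome/90.0.4430.85*",
    "*Chrome/90.0.4430.86*",
    "*Chrome/90.0.4430.93*",
]


def is_blocked(user_agent, rules):
    """Return True if the user agent matches at least one wildcard rule."""
    return any(fnmatchcase(user_agent, rule) for rule in rules)


tool_ua = UA_TEMPLATE.format(version="90.0.4430.212")  # assumed tool variant
bot_ua = UA_TEMPLATE.format(version="90.0.4430.93")    # one of the bot variants

print(is_blocked(tool_ua, BROAD_RULE))    # True
print(is_blocked(tool_ua, NARROW_RULES))  # False
print(is_blocked(bot_ua, NARROW_RULES))   # True
```

The broad rule catches the online tools along with the bots, while the narrow rules leave the tool variant alone.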

Warning about blocking user agents

Before blocking Chrome 90, I checked the Wordfence traffic log (accessible at Wordfence > Tools) to check whether legitimate bots, such as GTMetrix, were using that user agent.

For example, you might not want to block Chrome 96 because some Google tools use Chrome 96 as the user agent.

Always research whether legitimate bots are using a particular user agent or hostname.

An easy way to do that research is to use the Wordfence traffic log.

Wordfence Traffic Log

The Wordfence traffic log shows you at a glance all the user agents accessing your site in near real time. For each visitor, the traffic log shows the user agent, whether the visitor is a bot or a human, the IP address, the hostname, the page being accessed, and other information that helps determine whether the visitor is legitimate or not.

The way to access the traffic log is to click on Wordfence > Tools.

Blocking older browser versions is an easy way to block a lot of bad bots. Chrome versions in the 80, 70, 60, 50, 40 and 30 series are notably numerous on some sites. Here is one example of how to block old Chrome UAs used by bad bots:

*Chrome/8*.*
*Chrome/7*.*
*Chrome/6*.*
*Chrome/5.0*
*Chrome/95.*
*Chrome/5*.*
*Chrome/3*.*
*Chrome/4*.*

Again, the above is not meant to encourage anyone to block these user agents. It is just an example.

The reason I use *Chrome/6*.* is that with one rule I can block the entire Chrome 60 series of user agents (Chrome 60, 61, 62, and so on) without writing out all ten user agents.

I can block the whole 60 series with one rule. Do not block the 10 series this way, with a rule like *Chrome/1*.*, because that will also block the most current version of Chrome, Chrome 100.

The above is one example of how to block bad bots using Chrome user agents. Bad bots also use old and retired Firefox browser user agents, and anything with python-requests/ as a user agent.
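The warning about *Chrome/1*.* is easy to verify with a small Python sketch that lists which Chrome major versions a series rule would catch. It tests simplified, hypothetical version tokens of the form Chrome/<major>.0.0.0 rather than full user agent strings, and it uses fnmatch as a stand-in for Wordfence's matching.

```python
from fnmatch import fnmatchcase


def blocked_major_versions(rule, majors=range(1, 111)):
    """Return the Chrome major versions (1 to 110) that a wildcard rule matches."""
    return [major for major in majors
            if fnmatchcase(f"Chrome/{major}.0.0.0", rule)]


print(blocked_major_versions("*Chrome/6*.*"))
# [6, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69]

print(100 in blocked_major_versions("*Chrome/1*.*"))  # True
```

The same check shows that a series rule also catches the single-digit version (Chrome 6 in the first case), which is harmless for ancient browsers but worth knowing before writing similar rules.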

Be careful when creating firewall rules

Always do your research to determine which bad bots are hitting your site, and make sure no legitimate bots or site visitors are using those old and retired browser user agents.

The way to do your research is to inspect your traffic log files or the Wordfence traffic log to determine which user agents (or hostnames) are associated with malicious traffic you don’t want.
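For the log file route, a short Python sketch like the following can tally which user agents appear most often. It assumes the server writes access logs in the common Apache/Nginx "combined" format, where the user agent is the last quoted field. The sample lines, IP addresses and function name are invented for the example.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referrer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def count_user_agents(lines):
    """Count requests per user agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("user_agent")] += 1
    return counts


CHROME_90 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36")

sample_log = [
    f'203.0.113.7 - - [10/Apr/2022:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "{CHROME_90}"',
    f'198.51.100.9 - - [10/Apr/2022:13:55:37 +0000] "GET /about HTTP/1.1" 200 1180 "-" "{CHROME_90}"',
    '203.0.113.50 - - [10/Apr/2022:13:55:38 +0000] "GET /wp-login.php HTTP/1.1" 403 512 "-" "python-requests/2.27.1"',
    'this line is malformed and is ignored',
]

for user_agent, hits in count_user_agents(sample_log).most_common():
    print(hits, user_agent)
```

To run it against a real log, pass an open file object to count_user_agents instead of the sample list. A user agent that dominates the tally still needs to be checked against legitimate tools and visitors before it is blocked.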


