It’s been a while since I posted anything about hackers or bots, and this is just going to be a little update. About the unrepentantly bad MJ12 bot.
See, there is a bad bot, and its name is MJ12. It belongs to bad owners called Majestic 12, which is supposedly an up-and-coming search engine but mainly just compiles website data for reports, which they then sell to other website owners who are trying to scope out the competition. Believe it or not, this is not illegal: it’s all publicly available information, and what people are actually paying Majestic and the other companies that do this for is the collection and compilation of the data. Website owners don’t like it, of course – not just because these companies are selling data about the structure of their site to the competition, but also because the swarms of greedy little bots can easily eat up a site’s bandwidth allotment and possibly make the site slow down or even crash. People who have websites don’t like things that might make the site crash, unless those things are viral posts and we have them monetized.
I’ve been blocking the MJ12 bot for quite a while now, and in the past year it has tried to hit my site 22,524 times – that’s 1,877 times a month, or an average of roughly 62 times a day. I did try to contact Majestic 12 last June about the ridiculous number of times their bots were hitting my tiny little site, and what I got in return was a copy/paste response about how I should just use robots.txt to block their bots, plus a link to their site so I could understand their project better. Basically, the people who run Majestic 12 have no fucks to give when it comes to the rapacious locust-swarm of bots they have descending on innocent websites all over the world via their distributed crawler system. They can be proactive when they want to be, though: for a while there was a really, really bad bot pretending to be the MJ12 bot, and back then they were quick to jump into discussions about how bad their bots were, insisting that the problem had to have been that other bot and threatening to sue people who said otherwise. Yes, really.
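For anyone wondering what that robots.txt suggestion actually amounts to, here is a rough sketch in Python. The rules in ROBOTS_TXT are my own illustration (not the text Majestic sent me), example.com is a placeholder, and is_allowed is just a helper name I made up. It uses the standard library's urllib.robotparser to show how a crawler that chooses to honor robots.txt would read those rules. Keep in mind that robots.txt is only a request: nothing in it forces a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not Majestic's suggested text):
# shut out the MJ12bot user-agent token, leave every other crawler alone.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if these rules let `user_agent` fetch `url`.

    Pass the short product token (e.g. "MJ12bot"), not the full
    User-Agent header, because that is what the parser matches on.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("MJ12bot", "Googlebot", "bingbot"):
        verdict = "allowed" if is_allowed(bot) else "blocked"
        print(f"{bot}: {verdict}")
```

The point is that the first two lines of that file are the entire "fix" I was handed, and it only works if the bot on the other end plays along.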
Right now you may be saying, okay, why aren’t you complaining about Google’s bots, or Yahoo’s, Bing’s, Baidu’s, or Yandex’s? Good question! Most of us don’t complain about the big search engines because 1) their bots are usually well-behaved and don’t over-crawl our sites, and 2) Google et al. are giving us something back – they’re indexing our sites so our sites will come up in organic searches. I let Googlebot crawl my site, and Google makes it easier for people to find my site. Yay! When their project first started, this is also what Majestic 12 was trying to do; they were trying to build a search engine that could compete with Google, and it was a lovely idea. Unfortunately, they seem to have kind of dead-ended on that.
At the end of the day – end of the year, I should say – I gave these guys and their bots and their grand idea a chance. Several, actually. I wouldn’t have minded letting them crawl my site – I even tried to tell them that. I used my security plugin to block their bot so I could see if they would slow down; they appeared to once, I removed the block, and the swarm came back. The new block had profanity in the description and I also went on a rant about it. This time you get a new rant that isn’t quite as funny, and I’m going to go block these bastards the old-fashioned way. I’m done. Majestic 12 gets no more chances from me.