Forum:Spam bots (let's get this settled)

Seeing a lot of posts in various corners of RW complaining about the rise in spam bots. Recaptcha and a few real basic filters (no href in wiki text, etc) keep out many, many, many bots. But it is not foolproof and there is a rise of them on the wiki. The problem with the battle of spam bots is one of balance between inconveniencing or flat out pissing off legitimate users versus stopping the bots.

I would like to see a centralized discussion of this and the direction people want to go. Are we content with the level of spam and clean up relative to user inconvenience or do we need to adjust it? What alternative methods are people interested in looking at? Or is my impression that this seems to be an issue misplaced? Tmtoulouse (talk) 23:53, 26 March 2012 (UTC)
 * I'm really kinda meh on this, but a common suggestion is disabling page creation for new users. Тy rannis 23:58, 26 March 2012 (UTC)
 * I've said it before, and I'll say it again. Yeah, they're an inconvenience, not a site-threatening issue, but a major pain in the ass nonetheless. I like the idea of X edits before you can create a page, but, as others have pointed out, it wouldn't be long before people found a work-around, so I guess we'll just have to suck it up and deal with it. P-Foster Talk " "Santorum is the cream rising to the top." " 23:58, 26 March 2012 (UTC)
 * Personally I don't see the issue - we are not being hit like aSK for example. But that is just me. No page creation for new users could work though. AceModerator 00:07, 27 March 2012 (UTC)
 * I agree with the Ace. CopperheadHisssssss 13:49, 27 March 2012 (UTC)
 * I don't think most legitimate users would mind that they could not create pages for a few edits. And it doesn't even have to be more than, say 3.[[Image:Pink mowse.png|25px]]Godot     What do cats dream about? 00:23, 27 March 2012 (UTC)
 * IIRC Bob said that replacing the captcha - which is clearly not bot-proof - with a simple "what is the 3rd letter in rational" type question, stopped the bots over at our summer home, teflwiki. -- PsyGremlin  14:02, 27 March 2012 (UTC)
 * Rather than blocking new users from posting at all, could we not block them from posting hyperlinks until their 3rd edit? Although we have enough sysops to deal with spammers it does get tedious to have to wade through RC sometimes to find a change that isn't spam related. Crundy Talk nerdy to me 14:04, 27 March 2012 (UTC)
 * Also, what about using the MediaWiki extension to block users whose IPs / email addresses are in the Stop Forum Spam blocklist? Crundy Talk nerdy to me  14:05, 27 March 2012 (UTC)
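The "no hyperlinks until the 3rd edit" idea above could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: the function name `allow_edit`, the constant `MIN_EDITS_FOR_LINKS`, and the link pattern are all assumptions made for the example, and a real filter would need a far more thorough pattern.

```python
import re

# Assumption: "until their 3rd edit" means two earlier edits must already exist.
MIN_EDITS_FOR_LINKS = 2

# Deliberately rough pattern for external links (hypothetical, not exhaustive).
LINK_PATTERN = re.compile(r"(https?://|www\.|<a\s+href)", re.IGNORECASE)


def allow_edit(prior_edit_count: int, text: str) -> bool:
    """Return True if the edit may be saved under the link-holdback rule."""
    if prior_edit_count >= MIN_EDITS_FOR_LINKS:
        # Established accounts may post links freely.
        return True
    # New accounts may edit, but only if the text contains no link.
    return LINK_PATTERN.search(text) is None
```

New users can still edit and comment under this rule; only link-bearing text is held back, which is the main payload of most spam.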

I would suggest Questy CAPTCHA. Users have to answer questions posed at registration. A bot cannot answer these. However, my understanding (which could be way off) is that spammers have set these things up so that the questions are sent to cyber-sweatshops in the far east where humans answer the questions. However if the answer to the question is not immediately apparent then they don't bother with it. A question such as "What colour is our logo" would be sufficient. Easy for a human actually on the site but impossible for software and too inconvenient for a human in the far east. We installed this some time ago at Teflpedia and it works a treat. --Bob"What can be asserted without evidence can also be dismissed without evidence." 14:04, 27 March 2012 (UTC)
 * As I understand it the CAPTCHAs are farmed out, mostly to porn sites, so any question which requires the respondee to be actually posting in a RW edit window would be useful. It might also be useful to have any challenge time limited because it does seem that there is often a delay between the creation of an account and posting the spam message, presumably while the CAPTCHA is being decoded. My personal gripe with the spammers is the excessive amount of red usernames which will never be utilised. 15:24, 27 March 2012 (UTC)

Question CAPTCHA
Okay, we can give that a try with some context-sensitive questions. People want to help brainstorm a set of questions? Tmtoulouse (talk) 16:11, 27 March 2012 (UTC)
 * What kind of context sensitive questions are you thinking about? Provide a few examples and I might be able to expand the list. MDB (talk) 16:19, 27 March 2012 (UTC)
 * Something like "what are the third and fifth letters of this site's name". 16:29, 27 March 2012 (UTC)
 * They have that at TEFLpedia, Bob M says it's stopped the spammers. Sophie  because liberals  16:35, 27 March 2012 (UTC)


 * "which is better, goats or jerboas?" Spammers are the dregs of society and would therefore say "jerboas" while all right-thinking people know the goat is supreme. Sophie  because liberals  16:38, 27 March 2012 (UTC)
 * Something as simple as "what is the main image in our logo" would also work, as it would be very hard for a computer to parse the brain from the brackets, as it's just one image.[[Image:Pink mowse.png|25px]]Godot    What do cats dream about? 16:44, 27 March 2012 (UTC)
 * Obligatory. It would have to be a very large set (hundreds) to be effective. Anything less, and you're basically giving spammers a key to the place. I guess we could use GK's form and make them generic, e.g. "What are the X and Y letters of this site's name" would generate 132. You'd still need a large core of generic questions. Also, I need to get into the spamming game. Occasionaluse (talk) 16:49, 27 March 2012 (UTC)
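The figure of 132 for the "X and Y letters" form can be checked with a short sketch: "RationalWiki" has 12 letters, and there are 12 × 11 = 132 ordered pairs of distinct positions. The Python below is an illustration only; `letter_pair_questions` and the question wording are assumptions, and the ordinal list only covers names of up to twelve letters.

```python
from itertools import permutations

SITE_NAME = "RationalWiki"  # 12 letters

# Ordinal words for positions 1..12 (enough for this site name only).
ORDINALS = [
    "first", "second", "third", "fourth", "fifth", "sixth",
    "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
]


def letter_pair_questions(name: str = SITE_NAME) -> list[tuple[str, str]]:
    """Build one (question, answer) pair per ordered pair of distinct positions."""
    questions = []
    for i, j in permutations(range(len(name)), 2):
        question = (
            f"What are the {ORDINALS[i]} and {ORDINALS[j]} "
            "letters of this site's name?"
        )
        questions.append((question, name[i] + name[j]))
    return questions
```

Calling `len(letter_pair_questions())` gives 132, and the "third and fifth" entry has the answer "to".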

What we need are questions about the logo. Something that the average human being who knows nothing about the site or its history can answer easily. What are the first or second letters or whatever. It doesn't need to be subtle to work. They need to be questions which can only be answered by someone who is present on the site and actually looking at it.--Bob"What can be asserted without evidence can also be dismissed without evidence." 18:00, 27 March 2012 (UTC)
 * There's absolutely no reason to make the CAPTCHA hard or subtle. It just needs to be a scheme unique to RW to stop spam. Just a flat "how many letters in 'rational'" or something will do the job, as long as the fields are uniquely named and the bot can't grab the question and stick it on a porn site somewhere. -- 18:41, 27 March 2012 (UTC)

A few points, Bob is absolutely correct that we need context questions that are answerable by anyone looking at our website who knows little to nothing about it. Occasionaluse is right that if we have a small database of questions it is easier for spammers, but only if they specifically target our site and build a database of our questions/answers. I think we can rely on "security through obscurity" as far as that goes, much like The Club™: it doesn't really protect your car, but it's easier to just go steal your neighbour's car than deal with it. Tmtoulouse (talk) 18:43, 27 March 2012 (UTC)
 * So how many questions do you need?[[Image:Pink mowse.png|25px]]Godot    What do cats dream about? 18:49, 27 March 2012 (UTC)
 * I'd say "one", something like "How many letters are there in goat? (Hint, there are four letters in goat.)" and just accept 4 and "four" as right answers. -- 18:51, 27 March 2012 (UTC)
 * "How many letters are there in goat?" would be a bad question. You do not need to be looking at the site to answer it. "How many letters are there in our logo?" would work.  In Teflpedia we've only got something like three questions.--Bob"What can be asserted without evidence can also be dismissed without evidence." 20:00, 27 March 2012 (UTC)

In which somebody parties like it's 2007

 * Generic questions like "which is a type of pie: apple pie or Bruce Lee?" are good, but we should also have questions relate to RW, like "which wiki is our mortal enemy?" An American Fallacy  ( super crazy fun time! ) 17:11, 27 March 2012 (UTC)
 * Well, CP, ASK and Citi don't meet it, meta doesn't, RWW doesn't... what wiki are you talking about?-- il' Dictator   Mikal  17:19, 27 March 2012 (UTC)
 * It's obviously CP. Don't try to fucking deny that. GodothasArrived  ( super crazy fun time! ) 17:28, 27 March 2012 (UTC)
 * CP hasn't been the "mortal enemy" of RW in a long time buddy-- il' Dictator   Mikal  17:32, 27 March 2012 (UTC)
 * Yeah, I guess WIGO:CP, the biggest WIGO, and the entire 'Conservapedia' namespace, not to mention the numerous references to CP and its sysops in various other places on this site, magically don't exist anymore, you fucking moron. An American Fallacy  ( super crazy fun time! ) 17:35, 27 March 2012 (UTC)
 * I'm working on it. One day at a time. Тy rannis 21:31, 27 March 2012 (UTC)
 * I find it hard to believe that somebody who shows up in so many conversations would not notice the amount of time put into topics NOT related to CP, or fail to notice how old the CP stuff is compared to the rest. CP isn't our archnemesis, and it hasn't been since we started branching out; our enemy is anti-science in general, not any one place. -- il' Dictator   Mikal  17:40, 27 March 2012 (UTC)
 * It's more the case that a new user who comes over without knowing about CP, and yes, if you read the "how you found us" page, they do exist, won't know enough to answer that question. Scarlet A.pnggnostic 17:41, 27 March 2012 (UTC)
 * A site can focus on a general topic and still have another site as its 'archenemy', you penis-boned stuffed-butt.
 * All they need to do is look at the list of WIGOs on the Main Page. It's not a hard question. GodothasArrived  ( super crazy fun time! ) 17:44, 27 March 2012 (UTC)
 * Again, when did CP become our main enemy again? This isn't 2008, this is 2012, and we aren't solely dedicated to CP-- il' Dictator   Mikal  17:47, 27 March 2012 (UTC)
 * That doesn't change the fact that it is still a large part of this site, as well as its history. It's like saying that The Joker isn't Batman's archenemy because he fights other dudes. GodothasArrived  ( super crazy fun time! ) 17:50, 27 March 2012 (UTC)
 * "Well, it's still part of our history, and some of us are still mostly CP watchers!" is what you just said, and it's the shittiest defense ever. Is France still the mortal enemy of England? Turkey to Russia, Mongolia to China, everybody to Poland?-- il' Dictator   Mikal  17:52, 27 March 2012 (UTC)
 * Never in my life have I wanted to use the trolltop template...until now. Occasionaluse (talk) 17:55, 27 March 2012 (UTC)
 * You're ignoring that we haven't moved on from CP. Maybe expanded from, but not moved on. An American Fallacy  ( super crazy fun time! ) 17:55, 27 March 2012 (UTC)
 * And your suggestion for moving on is something that tells all new users that CP is our "mortal enemy"? 19:16, 27 March 2012 (UTC)
 * Your guess is as good as mine. Тy rannis 21:31, 27 March 2012 (UTC)

Possible Questions

 * 1) What are the xth and yth letters in our name on the logo?
 * 2) What is the main color of our logo?
 * 3) What is the primary picture in the logo?
 * 4) How many brackets are in the logo?
 * 5) Are you a spam bot? Y or N
 * 6) Which is a type of pie: apple pie or Bruce Lee? (but...isn't he related to Sara Lee?)
 * 7) How many capital letters in the logo?
 * 8) How many items are under Navigation/Community/Toolbox on the create account page?
 * 9) What is the airspeed velocity of an unladen swallow?
 * African or European?

Considerations
I don't think these two questions are good: You don't need to be looking at the site to answer them.--Bob"What can be asserted without evidence can also be dismissed without evidence." 07:29, 28 March 2012 (UTC)
 * Are you a spam bot? Y or N
 * Which is a type of pie: apple pie or Bruce Lee? (but...isn't he related to Sara Lee?)

Out of interest....
Can you go in to the database and grab a couple of example profile rows of typical spam accounts and paste them here? I'm interested in seeing whether it looks like they grab just the captcha from the site, or whether they farm out the whole registration form. -- 18:55, 27 March 2012 (UTC)

BoNs
So, would we disable editing outright for BoNs, or just page creation? Peter tanquam ex ungue leonem 21:00, 27 March 2012 (UTC)
 * I'd guess if they can get past the new captcha they're probably human, so why disable page creation for them? It's not something I'd want to see, suppose a BoN wants to make a legitimate comment on a red talk page? Sophie  because liberals  21:33, 27 March 2012 (UTC)
 * There's also the Saloon Bar 82.whatever, and the 50-something who edits reasonably often. ArchieGoodwin (talk) 21:37, 27 March 2012 (UTC)
 * and several articles created by BoNs over the years. BTW This edit comment shows someone knows their RW history. Sophie  because liberals  21:45, 27 March 2012 (UTC)
 * Exactly. BoNs are not the problem.  The vast majority of spam is by accounts registered specifically for that purpose, not bare IPs.   22:54, 27 March 2012 (UTC)
 * But if we put this system up for every non-autoconfirmed edit, we'd need a lot more questions. How about only for page creation and account creation, with the other captchas being retained elsewhere? Peter tanquam ex ungue leonem 05:38, 28 March 2012 (UTC)
 * For information it didn't stop IPs at TP and we ended up blocking them. But we have fewer resources there than RW.  I don't see much IP spam here, but I do see some valid IP edits and comments.  I'd say leave them alone for now.--Bob"What can be asserted without evidence can also be dismissed without evidence." 07:26, 28 March 2012 (UTC)
 * Surely the only questions you need are random combinations of two or three letters from the site name, just like my online banking login.  Lily Inspirate me. 21:54, 30 March 2012 (UTC)
 * Yes.--Bob"What can be asserted without evidence can also be dismissed without evidence." 10:41, 31 March 2012 (UTC)

So...
Are we going to do anything? BTW, I just saw this; I didn't know they did that. Peter tanquam ex ungue leonem 07:52, 29 March 2012 (UTC)

Secret messages to farmhands
I say we put messages asking the people doing the captchas to come to this site and tell us where they saw the captcha, if there be quality porn there, or if their sweatshop is well known to unbiased multicultural orgs such as Human Rights Watch.

The captcha would be a good place for other informative adverts in the form of informative terms that you could scroogle or startpage.com. The Respaminator! (memeware copysate Lumecorp) Unicow (talk) 21:37, 30 March 2012 (UTC)
 * What the screaming tentacles of fuck are you jabbering on about? Sophie  because liberals  21:40, 30 March 2012 (UTC)
 * We are talking about using a "captcha" that asks a question. The people who run the spambots apparently farm these out to be solved by people who don't actually know of this site. With me so far?
 * What I suggest is that instead of doing things the usual way and putting only a question for the captcha, we put a special message to the people who are solving the captcha for the spambots. Ask them to come to RationalWiki and tell us where they come from. If they already know that they are working for the spambots they probably won't tell us anything but if they are farming these things out to porn sites then the people who solve the captchas will learn the porn site is run by the same people who run the spambots and they might come tell us.
 * Second idea is that we put "advertisements" with the captcha question so that the spambots will advertise for us. So the captcha might read like this, "To edit the wiki, answer this question: How many brackets are in the logo? (This message is to spambot captcha solvers: To learn how to convert human poop into food for chickens, fish, pigs, or dogs, google 'soldier grub composting toilet'.)" Perhaps this is more practical if you have an unpopular wiki that is mostly edited by spambots. =D I find it amusing that spammers will be sending spamish messages to their "employees" or "customers" whilst they are trying to circumvent the spambot blocker. Perhaps I am alone in this amusement. Unicow (talk) 02:02, 31 March 2012 (UTC)
 * [[File:Facepalm.png]] 10:49, 31 March 2012 (UTC)
 * Headers like "Secret messages to farmhands" are part of why no one takes you seriously. The other part is the incoherent post beneath the header.   02:36, 31 March 2012 (UTC)
 * I think the "porn sites solve captchas" may be a bit of an urban myth. OK, somebody may have tried it but I doubt it's the norm. The official captcha.net doesn't take it very seriously anyway.
 * Captcha Farms are the most likely possibility.--Bob"What can be asserted without evidence can also be dismissed without evidence." 06:36, 31 March 2012 (UTC)
 * Eggcellint clues about our target market, Bob.
 * Stabby the Retard, it was perfectly coherent to the lumenated. What would y'all do if you were to take "you" "seriously"? Unicow (talk) 17:37, 31 March 2012 (UTC)

Questy CAPTCHA
Let's give it a shot and see what happens; it's easily reversible if there are problems. Tmtoulouse (talk) 22:03, 30 March 2012 (UTC)
 * Are you sure it's working? I had some problems creating an account the other minute - I'm sure I had the right input, but it wasn't having none of it. Peter tanquam ex ungue leonem 08:01, 31 March 2012 (UTC)
 * This took me about five attempts until I met a captcha that would let me pass (it was the 'what is the name of this wiki' one). The one you give below, which I originally got, wouldn't work. Peter tanquam ex ungue leonem 08:05, 31 March 2012 (UTC)
 * Something not quite right. I had the question "what is the third and fifth letter of the site name" (or something similar). "to" didn't work, "TO" didn't work, but "t and o" did.--System (talk) 13:22, 31 March 2012 (UTC)
 * A perfect example of why this idea is so shit. How is the user supposed to know what format the answer should take?  TO, to, t/o, t & o, t + o, etc. etc.  Don't even get me started on "what colour is our logo?"  17:52, 31 March 2012 (UTC)


 * That can be addressed by using multiple choice style answers. Tmtoulouse (talk) 17:55, 31 March 2012 (UTC)
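Besides multiple choice, another way to handle the "TO, to, t/o, t & o, t + o" problem is to normalize replies before comparing them. The sketch below is a standalone Python illustration; `normalize_answer` and `is_correct` are hypothetical names, and nothing here is claimed to be how the installed captcha extension actually compares answers.

```python
import re


def normalize_answer(raw: str) -> str:
    """Lower-case the reply and strip separators so only letters/digits remain."""
    text = raw.strip().lower()
    # Treat the standalone word "and" as a separator ("t and o" -> "t o").
    text = re.sub(r"\band\b", " ", text)
    # Drop spaces and punctuation such as "/", "&", "+" and ",".
    return re.sub(r"[^a-z0-9]", "", text)


def is_correct(raw: str, expected: str) -> bool:
    """Compare a user's reply with the expected answer after normalization."""
    return normalize_answer(raw) == normalize_answer(expected)
```

With this, every format listed in the complaint above reduces to "to", while a reply with the letters in the wrong order ("ot") is still rejected.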

Question answer format
The format for the question/answer looks like this:

$wgCaptchaQuestions[] = array( 'question' => "What is the first and last letter in our name on the logo?", 'answer' => "ri" );

If people want to format up some more questions for variety and paste them up here I would be grateful. Tmtoulouse (talk) 22:04, 30 March 2012 (UTC)
 * $wgCaptchaQuestions[] = array( 'question' => "What is the first and last letter in our name on the logo?", 'answer' => "ri", 'answer' => "r i", 'answer' => "r and i" );
 * Nihilist 22:06, 30 March 2012 (UTC)
 * Yeah can we not have multiple answers for each question? Also, is case taken into account? Crundy Talk nerdy to me 09:20, 2 April 2012 (UTC)
 * If there were four answers, a bot would get through one time in four, which would make things manageable. If it were allowed more than one guess per login attempt, that rate would rise. I'll go with questions like "what is the Xth letter in word Y?" and no multiple choice. Sophie  because liberals  10:40, 2 April 2012 (UTC)
 * You can't use a question which can be answered by looking at the question alone, as spam farms can deal with that. As mentioned you need something like "What is the nth letter in the site logo" or similar. Speaking of which, after this whole conversation, about a week ago I changed the spam questions on one of my forums (which was getting about 10-15 spam signups a day) to the above and I haven't had a single spam signup since. Crundy Talk nerdy to me 12:16, 12 April 2012 (UTC)
 * Exactly right. You only need to ask about one letter.  Asking about two just introduces an unnecessary complication.
 * You've got to think about what is happening at the other end. There is some poor chap in a third world country who spends his life answering captchas. All he sees is the captcha. He gets a pittance for each one answered and he's on the clock.  Anything which means that he's got to open up a web browser to find out the answer is just going to be ignored.  So they have no chance to learn or remember answers and, even if they did, they don't get the same sites each time.
 * Obviously at some point spammers will get round this, but it works for now. --Bob"What can be asserted without evidence can also be dismissed without evidence." 12:52, 12 April 2012 (UTC)
 * Sounds good to me, making it the nth (randomly) letter of the logo ought to do just fine. 00:14, 15 April 2012 (UTC)
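The one-in-four estimate for multiple choice given earlier in this section, and the remark that the rate rises with extra guesses, can be put into numbers. The Python sketch below assumes a bot that guesses uniformly at random and never repeats a choice within one attempt; `blind_pass_rate` is a hypothetical helper name, and real bots may behave differently.

```python
from fractions import Fraction


def blind_pass_rate(choices: int, guesses: int = 1) -> Fraction:
    """Chance that random, non-repeating guesses hit the one right choice."""
    if choices < 1 or guesses < 0:
        raise ValueError("choices must be >= 1 and guesses must be >= 0")
    # Each new guess removes one wrong option, so k guesses cover k of n choices.
    return Fraction(min(guesses, choices), choices)
```

Under these assumptions `blind_pass_rate(4)` is 1/4 and `blind_pass_rate(4, 2)` is 1/2, which is why limiting guesses per attempt matters more for multiple choice than for a free-text letter question.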

borken captcha alert
Please see following conversation plagiarised from RWW 62.212.67.209 (talk) 08:27, 24 April 2012 (UTC)

CAPTCHA on RW is borked
The CAPTCHA on RW seems to be borked. No matter how many times I give the right answer, it just doesn't accept. 93.182.132.100 03:39, 23 April 2012 (UTC)
 * You mean the 'nth letter in the logo' one? Have you tried deliberately inputting the wrong letter, i.e. 'a' when it asks for the first? Tobul Oltarolin 03:49, 23 April 2012 (UTC)
 * This guy managed to get through. Tobul Oltarolin 03:52, 23 April 2012 (UTC)
 * Two insights: FIRST, it doesn't work at all for LiquidThreads. SECOND, it works okay for rationalwiki.org, but not if you use another domain, like mail.evowiki.org. Why would anyone use that domain? It's indexed in Google, so people finding RW pages from Google will end up on it and not realize, or just not know any better. 166.70.207.2 06:30, 24 April 2012 (UTC)
 * Given that the LQT thing means you can't get through to Trent directly, could you post your findings to ? Tobul Oltarolin 07:32, 24 April 2012 (UTC)