Jean-Marc

JM's Blog

Web, Society, Technology, and Innovation

March 19, 2006

A simple trick to help prevent form spam (2)

This is a more sophisticated setup of this blog comment spam prevention tactic. I implemented it several weeks ago and it seems to work well.

  1. After changing the URL of your blog's comment executable, encode the new URL here. To do so, enter your URL where indicated ("Step 1: enter your e-mail address"). Copy the code given to you and extract the strings string1 and string2, which look like FPZEVPKWZXDS2M and %27%23%3E%237%23/1%3C7+1S%3F respectively.
  2. Place the following code after the <head> tag of the page, after replacing string1 and string2 with their actual values:
    <script type='text/javascript'><!--
    function blurl(){var v2="string1";var v7=unescape("string2");
    var v5=v2.length;var v1="";for(var v4=0;v4<v5;v4++)
    {v1+=String.fromCharCode(v2.charCodeAt(v4)^v7.charCodeAt(v4));}return v1}
    //--></script>
  3. Locate the comment form in the template of your blog, and remove the action attribute specifying the URL of the executable used to process the form results.
  4. Finally, add this.action=blurl() to the onsubmit handler of the form:
    <form method="post" onsubmit="this.action=blurl()"> ... </form>
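
The decoding function in step 2 suggests a simple scheme: string1 acts as a key, and string2 is the escaped result of XOR-ing the URL with that key, character by character. Below is a minimal sketch of the encoding side under that assumption. The names encodeUrl and decodeUrl are hypothetical, and the actual encoder page may work differently (for instance in how it picks the key).

```javascript
// Hypothetical encoder matching the blurl() decoder from step 2.
// Assumption: string1 is a key as long as the URL, and string2 is
// escape(url XOR key). The real encoder page may differ.
function encodeUrl(url, key) {
  if (key.length !== url.length) {
    throw new Error("key must have the same length as the URL");
  }
  var xored = "";
  for (var i = 0; i < url.length; i++) {
    xored += String.fromCharCode(url.charCodeAt(i) ^ key.charCodeAt(i));
  }
  // escape() mirrors the unescape() call used in blurl()
  return { string1: key, string2: escape(xored) };
}

// Same logic as blurl(), with the two strings passed as parameters.
function decodeUrl(string1, string2) {
  var mask = unescape(string2);
  var url = "";
  for (var i = 0; i < string1.length; i++) {
    url += String.fromCharCode(string1.charCodeAt(i) ^ mask.charCodeAt(i));
  }
  return url;
}

// Placeholder URL and key, chosen only for illustration.
var encoded = encodeUrl("/bin/post-a-comment", "QWERTYUIOPASDFGHJKL");
var roundTrip = decodeUrl(encoded.string1, encoded.string2);
// roundTrip is "/bin/post-a-comment" again
```

Note that this is obfuscation rather than encryption: the key ships with the page, so it only deters programs that do not run JavaScript.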



Unprotected email addresses on blogs get spammed

How many of the owners of the 10 million existing blogs show their email address on their blog, ready for spammers to collect?

A study shows that, of the many ways email addresses are collected online, 97% of spam originates from addresses harvested on websites or blogs. Our own study shows that it takes as little as two days after an email address is published online before it gets spammed.

It is of course legitimate to publish your email address on a personal blog, but there are few reasons why you should not protect it from spam. This page shows you how to generate an encoded link and integrate it with your template.
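
As a rough illustration of what an encoded link can look like, here is a minimal sketch that writes every character of a mailto link as a numeric HTML entity. This is one common approach and not necessarily the one used on the linked page. The function names and the address jane@example.com are placeholders.

```javascript
// Hypothetical helper: turns every character into a numeric HTML entity,
// e.g. "a" becomes "&#97;". Browsers decode entities when rendering, while
// harvesters that scan the raw HTML for "name@domain" patterns may miss them.
function toEntities(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += "&#" + text.charCodeAt(i) + ";";
  }
  return out;
}

// Builds the HTML for a mailto link whose address is entity-encoded.
// The visible label should not contain the address itself.
function encodedMailtoLink(address, label) {
  return '<a href="' + toEntities("mailto:" + address) + '">' + label + "</a>";
}

var link = encodedMailtoLink("jane@example.com", "Email me");
```

The resulting markup can be pasted into a blog template. A harvester that decodes entities before searching would still find the address, so this only raises the bar.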

February 11, 2006

Adsense spam

There are two great things about Google's Adsense contextual ads. Firstly, they are not aggressive and are quite respectful of users, both in terms of content and bandwidth (compare this with Flash ads or popups). Secondly, they are targeted to be as close as possible to the content of the webpage that hosts them, in the hope that users will find the ads relevant to their particular situation and will see the advertised websites as valuable content rather than as something being imposed on them.

Being a mostly automated system, however, Adsense is being abused by both publishers and advertisers. The case of publishers is well known. Some sites automatically generate thousands of pages by aggregating search results, replicating content from the Open Directory, Wikipedia, RSS feeds, etc. and generously including Adsense banners among these. Google is certainly working on ways to reduce this particular type of spam, being doubly involved as a search provider and an advertising network.

The second type of Adsense spam is the work of some advertisers, and as Google earns revenue from advertiser spam, it seems to be far less concerned about it. Type any keyword of no commercial value into Google and you'll see ads like the one below:

[Image: Adsense spam by Ebay]

Adwords allows Ebay and others to buy hundreds of low-value keywords and automatically generate text ad banners. You'll find these not only alongside Google's search results, but also on other sites publishing Adsense ads. The problem with these ads, of course, is relevance. I'm not aware of idiots being sold on Ebay, but this is what the ad says. Is it what the advertiser wants us to believe? Probably not. They are just using those ads as cheap bait to lure you to their site.

Those ads have a negative impact on the content of the hosting site, which is bad for content publishers who value the general quality of their site more than the cent they'll get for the click.

Some advertisers are buying those cheap keywords to publish unrelated ads. That's how I got ads for a religious group on a foreign-language website, and ads for adult content alongside Google search results.

December 29, 2005

A simple trick to help prevent form spam

If you have a blog or a webpage with a guestbook facility, you will certainly have experienced automated posts known as comment spam.

There is a simple thing you can try to prevent spammers from posting to your blog: make it more difficult for them to find out how to post to your site using automated means. The HTML code for submitting a form looks as follows:

<form method="post" action="http://www.example.com/bin/comment">...</form>

The action attribute above specifies the web address (URL) of the executable used to process the form results. Simply remove it and add an onsubmit attribute as shown below.

<form method="post"
onsubmit="this.action='http://www.example.com/'+'bin/post-a-comment'">
...</form>

I have made two changes. I now use a trivial JavaScript snippet to build the URL of the executable, and I have renamed it. The reason for using JavaScript is that comment-spamming programs are most unlikely to be sophisticated enough to understand it. Since spammers who have found a comment posting URL will keep it and reuse it every now and then, I have also renamed the executable to start with a fresh URL.
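
To illustrate the reasoning above, here is a minimal sketch of how a naive spam program might look for the posting URL, assuming it simply scans the HTML source for a form's action attribute. The function name findActionUrl and its regular expression are hypothetical; real spam tools may be more or less capable than this.

```javascript
// Hypothetical naive scraper: returns the first form action URL found in the
// HTML source, or null if no form tag carries an action attribute.
function findActionUrl(html) {
  var match = html.match(/<form\b[^>]*\saction\s*=\s*["']([^"']+)["']/i);
  return match ? match[1] : null;
}

var plainForm =
  '<form method="post" action="http://www.example.com/bin/comment">...</form>';
var protectedForm =
  '<form method="post"\n' +
  'onsubmit="this.action=\'http://www.example.com/\'+\'bin/post-a-comment\'">\n' +
  '...</form>';

var foundInPlain = findActionUrl(plainForm);         // the comment URL
var foundInProtected = findActionUrl(protectedForm); // null
```

A program that evaluates JavaScript, or that pattern-matches on this.action, would still recover the URL, which is why a stronger encoding is worth the small extra effort.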

If you are thinking of implementing this, you should use a better encoding technique since it doesn't involve much more work.

Update: this seems to work pretty well, and I have described how to encode your blog comment URL in another post.

August 03, 2005

Beware of SMS.ac invitation emails!

For many months I have been receiving regular emails inviting me to join a service named "SMS.ac". All messages claimed to be from people I more or less knew of (e.g. former students of mine). After receiving one of these emails pretending to be from a friend, I queried her about this and she denied having sent or instructed SMS.ac to send me any of these emails.
Trying to understand, I went to their registration page and I was amazed: innocently, as part of the registration process, they ask for your Hotmail login and password!
I assume people who give away their details in that way think SMS.ac are only going to use them to import their contacts in a passive way, as done by some other web services. The site does mention in convoluted ways that they will use the imported email addresses to "invite" people in your address book to SMS.ac. And they do mention the clause that one should obtain his contacts' "implicit consent" (they must agree but you don't need to ask them!) before proceeding… but everything is phrased carefully to avoid startling candidates. So it is probably fair to say that a large share of the people facing this situation are nowhere near realizing that SMS.ac are going to repeatedly send spam to all their friends, in their name, until they sign up or explicitly opt out.
But just imagine someone knowingly allowing sms.ac to send ads to his contacts.

  • this person has made the effort to read and understand the small print and believes it's fine
  • he finds sms.ac so good—even before trying it (remember, we are still in the signup process)—that he wants to tell everyone.
I am not sure how realistic this scenario is, but let me carry on: is this person authorized to disclose the email addresses contained in his address book to a third party? As far as I am concerned the answer is clearly no: if I give you my email address, I grant you the right to use it, but it is a non-transferable right. I'm obviously not trying to show here that you should blame your friends for spamming you, but simply that sms.ac makes them do things they are not entitled to do anyway.
Many reputable sites allow Alice to email a link to Bob or to refer Bob to their site. Such legitimate referrals tend to have the following properties:
  • Alice knows Bob, and believes that he—individually—would be interested in the information or service referred
  • Bob's email address is manually entered in the referral form
  • the email is sent once when Alice submits the form
  • Bob's email address is not kept by the referral system, since further emails would not be legitimate
  • the message makes the circumstances in which the email was sent clear and explicit, so that Bob knows from reading the message that Alice visited a site and genuinely thought he would be interested.

In contrast, sms.ac:
  • operate a bulk, automated, and indiscriminate collection of email addresses
  • keep Bob's email address and send him recurring emails on behalf of Alice, without Alice being aware of it
  • imply Alice was involved in sending the messages
  • require Bob to take explicit action to stop receiving emails.

I wrote to SMS.ac to demand they cease sending me these unsolicited emails. I noted in my email that their messages were unsolicited, automated and repeated commercial emails, with no prior contact between the parties; therefore clearly qualifying as spam. I also noted that by using your acquaintance's name in the emails they imply the messages come from him/her, which is a form of identity theft. I got a childish and contemptuous reply, but I have not received any spam from them since. So complaining to them can help your particular case, but it won't make them change their practice: SMS.ac boast a very large customer base, which is certainly largely due to the unethical methods they employ to enroll them. If you don't like the way they are doing business, tell their prospective customers openly.

Some actions we can take to oppose and help publicize sms.ac's fraudulent practices are:

  1. systematically report their emails as spam ("report spam" button) if you are using a webmail service (Yahoo!,...)
  2. post an appropriate review on Alexa and/or blog about it
  3. make sure the person who gave out your email address to SMS.ac is aware of the scam.

If you have also been spammed by SMS.ac, please post a comment below.

All trademarks, names, and services referenced above belong to their respective owners.

June 04, 2005

Spam: you probably know about email address harvesting, what about email gleaning?

Most people are aware that e-mail addresses posted on the web are being harvested by spammers using software that crawls the web from page to page, following links and retrieving everything that looks like an email address. This happens on any website or newsgroup, including forums and of course blogs.
The one thing spammers need to decide is where on the web to start collecting addresses. On this matter, they proceed as everyone else looking for information: they either use a directory, or use a search engine. Initiating the crawl from a large web directory like the Open Directory and its derivatives gives them the option to target certain categories of victims: personal sites, small businesses, or universities, thereby giving more value to their email dataset. Using search engines allows them to attempt to shortlist sites containing up-to-date contact information (e.g. searching for “contact 2005”). The search results can then be used as pre-processed data for further automated email address extraction.

More surprising is the fact that there are some people out there spending their days manually gleaning email addresses on the web. They are mostly connecting from Internet cafés in places like Ivory Coast or Nigeria and use tools such as Google, Yahoo or search engine aggregators to look for email addresses using queries like “contact john 2005” or “email me 2005”. Look for “2005” in your webserver log and chances are you will find evidence of this happening on your site.
There are ways to help avoid automated email harvesting without sacrificing too much web usability (e.g. using encoded email links). There are also ways to help prevent manual email address collection: a simple thing to do is to remove the year appearing in the copyright notice of your contact page, and replace it with a simple script:

<script type="text/javascript"><!--
document.write((new Date()).getFullYear())
//--></script>
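
To check your own log for the manual gleaning described above, a small script can pick out visits whose referrer is a search query containing the year. The sketch below assumes log lines in the common "combined" format, where the referrer is the second quoted field, and assumes the search terms travel in a q= or p= URL parameter. The function name findGleaningHits is hypothetical; adapt the parsing to your server's actual log format.

```javascript
// Hypothetical log filter. Assumptions: combined log format (the referrer is
// the second double-quoted field) and search terms passed as q=... or p=...
function findGleaningHits(logLines, year) {
  var hits = [];
  for (var i = 0; i < logLines.length; i++) {
    var quoted = logLines[i].match(/"([^"]*)"/g);
    if (!quoted || quoted.length < 2) continue;
    var referrer = quoted[1].slice(1, -1); // strip the surrounding quotes
    var param = referrer.match(/[?&](?:q|p)=([^&]*)/);
    if (!param) continue; // not a search referrer
    var query;
    try {
      query = decodeURIComponent(param[1].replace(/\+/g, " "));
    } catch (e) {
      continue; // malformed percent-encoding, skip the line
    }
    if (query.indexOf(String(year)) !== -1) {
      hits.push({ referrer: referrer, query: query });
    }
  }
  return hits;
}

// Made-up sample lines for illustration only.
var sampleLog = [
  '10.0.0.1 - - [04/Jun/2005:10:00:00 +0000] "GET /contact.html HTTP/1.1" 200 1234 "http://www.google.com/search?q=contact+john+2005" "Mozilla/4.0"',
  '10.0.0.2 - - [04/Jun/2005:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5678 "-" "Mozilla/4.0"'
];
var hits = findGleaningHits(sampleLog, 2005);
// hits contains one entry, with query "contact john 2005"
```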

Copyright ©2007 Syronex