Sigh. I guess it is a good sign that I’m researching CAPTCHAs at the moment. Spammers finally seem to have found our site watvindenwijover.nl and deemed it good enough to spam. Surely a sign of success, but also a nuisance for the normal people using the site. Making the signup procedure a bit more difficult for computers seems the way to go, so some kind of CAPTCHA has to go in. The other approach would be to use a DNS-based blocking list, e.g. through an Apache module like mod-defensible, but in general I’m not a big fan of blocking lists due to false positives.
I’ve ended up cobbling together my own invisible CAPTCHA based on the ideas put forward in the .NET article mentioned above, and so far it seems to work fine, keeping the spambots out while letting normal people in. I’d link to our signup page so you can see for yourself, except that it’s an invisible CAPTCHA. :-)
Today I had to map free text to plausible filenames, with the caveat that the text could contain UTF-8 characters with accents. Even though it is possible to have filenames with these characters, I wanted to end up with ASCII-only filenames for easier handling. Also, the filenames will be exposed via URLs, and just having ASCII there takes away a lot of headaches. But how to convert this?
I quickly found the apparently wonderful Text::Unidecode for Perl, which seemed to do everything I wanted, but since we build our web services with Ruby on Rails I needed a Ruby solution. I hoped that someone would already have created a Ruby version of Text::Unidecode, but that’s not the case (or I could not find it). I did find the Asciify gem, though. Although simpler in design and reach than Text::Unidecode, it does enough for my purposes and custom mappings can be created for it.
Asciify’s documentation is pretty much nonexistent, but some reading of the source code revealed that this was how I could convert my text:
Asciify.new(Asciify::Mapping.new(:default, '_')).convert('some text')
The default replacement character for Asciify is a question mark, which makes sense in general, but not in URLs, so I opted to use the underscore character instead for lack of a better candidate. Since I’ve included the gem as a plugin in the Rails project I’ve just changed the default mapping to include some characters rather than using my own mapping.
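For comparison, here is a minimal sketch of the same idea using only Ruby’s standard library rather than Asciify. It assumes Ruby 2.2 or later (for String#unicode_normalize) and UTF-8 input. The method name ascii_filename and the set of characters it keeps are my own assumptions for illustration, not anything Asciify provides. It decomposes accented characters, drops the accents, and replaces whatever is still not plain ASCII with an underscore.

```ruby
# Hypothetical helper (not part of Asciify): turn free text into an
# ASCII-only filename using only the Ruby standard library.
def ascii_filename(text, replacement = '_')
  # NFD normalization splits "é" into "e" plus a combining accent.
  decomposed = text.unicode_normalize(:nfd)
  # Drop the combining marks, leaving the base letters.
  without_marks = decomposed.gsub(/\p{Mn}/, '')
  # Replace everything that is not an ASCII letter, digit, dot or hyphen.
  without_marks.gsub(/[^A-Za-z0-9.\-]/, replacement)
end

puts ascii_filename("café crème.txt")  # => cafe_creme.txt
```

Characters without a decomposition, such as ß, simply become the replacement character here, so a mapping table like Asciify’s still gives nicer results for those.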