Today I had to map free text to plausible filenames, with the caveat that the text could contain UTF-8 characters with accents. Even though it is possible to have filenames with these characters, I wanted to end up with ASCII-only filenames for easier handling. Also, the filenames will be exposed via URLs, and just having ASCII there takes away a log of headaches. But how to convert this?

I quickly found the apparently wonderful Text::Unidecode for Perl which seemed to do anything I wanted, but since we build our web services with Ruby on Rails I needed a Ruby solution. I hoped that someone would already have created a ruby version of Text::Unidecode, but that’s not the case (or I could not find it). I did find the Asciify gem, though. Although simpler in design and reach than Text::Unidecode, it does enough for my purposes and custom mappings can be created for it.

Asciify’s documentation is pretty much non-existing, but some reading of the source code revealed that this was how I could convert my text:, ‘_’)).convert(‘some text’)

The default replacement character for Asciify is a question mark, which makes sense in general, but not in URLs, so I opted to use the underscore character instead for lack of a better candidate. Since I’ve included the gem as a plugin in the Rails project I’ve just changed the default mapping to include some characters rather than using my own mapping.

Published on 27/02/2007 at 16h59 by Hans de Graaff, tags

comment Asciify

Powered by Publify | Photo Startup stock photos