Behold!

My name, written in Hindi, written in Unicode:

ऐलन डेिवडसन

Yeah, that’s right: real programmers code in binary (or hexadecimal, if they get lazy). The coolest thing about this is that if I had been more confident, I could have done it without getting help from the Internet. But I wasn’t, so I double-checked stuff online. I’m still not entirely sure I got it right, so if you or someone you know is familiar with the Devanagari alphabet, please double-check my spelling. I have deliberately typed the ि before the व, so that people who don’t have Hindi vowel-rendering turned on (which I suspect is the majority of my readers) will see this correctly, while anyone who actually has a computer set up to read Hindi/Sanskrit/&c will think the ि and व should be swapped. I’m aware of the problem, but can’t fix it for everyone.
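To make the ि/व ordering issue concrete, here is a minimal Python sketch. It assumes the surname above is stored as the code points listed in `as_posted` (that is my reading of the text, not something verified against the original bytes), and `logical_order` is what I understand the conventional storage order to be: consonant first, then the vowel sign that attaches to it.

```python
import unicodedata

# Assumption: the surname as typed in this post, with the vowel sign
# U+093F placed BEFORE the consonant U+0935 it belongs to.
as_posted = "\u0921\u0947\u093f\u0935\u0921\u0938\u0928"

# Assumption: the conventional logical order, consonant first, vowel sign
# second. A renderer with Devanagari support moves the sign to the left
# of the consonant when drawing it.
logical_order = "\u0921\u0947\u0935\u093f\u0921\u0938\u0928"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(as_posted):
    print(line)

# Same code points, different order, so the two strings are not equal.
print(as_posted == logical_order)
```

The two strings contain exactly the same characters, which is why a reader with full Devanagari rendering sees one of them as misspelled while a reader without it sees the other as misspelled.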

Unicode is surprisingly intricate: like x86 machine code, UTF-8 (the most common encoding of Unicode, since it’s backwards compatible with ASCII) and UTF-16 use a variable-length encoding for characters. In UTF-8, that means common characters like ASCII take up less room (one byte each) than less common ones like Braille (three bytes each; Braille is not as widespread on the Internet as it is elsewhere). Unicode text files can start off with a Byte-Order Mark, which tells the reader the byte order in which the text was encoded and, by its form, hints at the unit size of the encoding. It is optional, and UTF-8 files often leave it out, but it is part of what lets the same text move between machines with different endianness. Unicode actually raises some pretty challenging questions in terms of “alphabetical” sorting and accent placement, and even presents some security problems by opening the way for homograph phishing attacks (for instance, see this Shmoo article on IDN attacks, which mentions that www.pаypal.com can be registered with a Cyrillic first ‘а’ and could be full of scams. Yes, I have written both the URL and the ‘а’ with the actual Cyrillic letter).
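Here is a rough Python sketch of the three ideas in that paragraph: variable-length encoding, the Byte-Order Mark, and the Cyrillic ‘а’ lookalike. The helper names `encoded_sizes` and `scripts_used` are made up for illustration, and the script check is a crude heuristic based on Unicode character names, not a real defense against IDN attacks.

```python
import codecs
import unicodedata

# Latin a, Cyrillic а, Devanagari व, a Braille pattern, and an emoji.
samples = ["a", "\u0430", "\u0935", "\u2801", "\U0001F600"]


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character, no BOM."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# The BOM is the character U+FEFF written at the start of the data.
# Its byte order on disk tells a decoder which endianness to use.
little = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "a".encode("utf-16-be")
print(little, big)

# The generic "utf-16" decoder reads the BOM, picks the byte order,
# and drops the BOM from the result.
print(little.decode("utf-16") == big.decode("utf-16") == "a")

# Homograph check: these look alike on screen but are different strings.
latin_name = "paypal"
spoofed_name = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A


def scripts_used(label):
    """Crude heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


print(latin_name == spoofed_name)
print(scripts_used(latin_name), scripts_used(spoofed_name))
```

A label that mixes scripts, as the spoofed one does, is at least worth a second look, though plenty of legitimate names mix scripts too.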

Yes, it’s totally dorky to learn about Unicode, but it’s actually kinda cool at the same time.