Posts tagged ‘unicode’

Random stuff

Εὕρηκα! I have found it: final and conclusive proof that I am not the biggest nerd in the world!

This past weekend, Emily surprised me by coming down to visit. We went to the 3rd Street Promenade and had sushi (her first time eating real sushi). We also went to Venice Beach and watched this dude catch a flounder. We tried to go see Grauman’s Chinese Theatre, but they were opening The Fountain, and we couldn’t get through the throngs of Hugh Jackman fans. So instead, we drove up into the foothills and found some excellent views of the city. I introduced her to Jeeves and Wooster and Penn & Teller’s Bullshit. A good time was had by all.

I’ve been making some really stupid mistakes at work. Hopefully I can put a stop to them and get back to doing stuff correctly.

And I know I need to post news soon…

Integer Factorisation program

I wrote an integer factorisation program (Java bytecode also available for those without a compiler) using an algorithm I just made up, and it works surprisingly well (significantly better than brute force, not nearly as good as the best-known algorithms out there today). Yes, it still has exponential running time, but I thought it was a neat idea.

A summary of the algorithm →

Behold!

My name, written in Hindi, written in Unicode:

ऐलन डेिवडसन

Yeah, that’s right: real programmers code in binary (or hexadecimal, if they get lazy). The coolest thing about this is that if I had been more confident, I could have done it without getting help from the Internet, but I wasn’t, so I double-checked stuff online. I’m still not entirely sure I got it right, so if you or someone you know is familiar with the Devanagari script, please double-check my spelling. I have written this so that people who don’t have Hindi vowel-rendering turned on (which I suspect is the majority of my readers) will see this correctly, while anyone who actually has a computer set up to read Hindi/Sanskrit/&c will think the ि and व should be swapped. I’m aware of the problem, but can’t fix it for everyone.
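For the curious, here is a small Python sketch of what that swap looks like at the code-point level. It is only an illustration, not how the name above was produced: the helper name describe is made up for this example, and the two strings are written as escapes so no Hindi font is needed to read the code. Unicode stores the vowel sign ि (U+093F) after its consonant, and a Hindi-aware renderer then draws it to the left of that consonant.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for each character, in stored order.

    'describe' is a hypothetical helper written for this example.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Logical order (what Unicode expects): the consonant VA, then the vowel sign I.
logical = "\u0935\u093f"
# Visual order (what you see on screen): the vowel sign I, then the consonant VA.
visual = "\u093f\u0935"

for label, s in (("logical", logical), ("visual", visual)):
    print(label, describe(s))
```

The name as typed above uses the visual ordering, which is why it looks right without vowel rendering and wrong with it.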

Unicode is surprisingly intricate: like x86 machine code, UTF-8 (the most common encoding of Unicode, since it’s backwards compatible with ASCII) and UTF-16 use variable-length encodings for characters. In UTF-8, ASCII characters take one byte each, while less common ones like Braille (which is not as widespread on the Internet as it is elsewhere) take three; in UTF-16, most characters take two bytes and the rarer ones outside the Basic Multilingual Plane take four. UTF-16 and UTF-32 text files often start off with a Byte-Order Mark, which records the byte order (endianness) used when the text was encoded and, as a side effect, hints at which encoding is in use; UTF-8 has no byte-order ambiguity, so its BOM is optional and acts only as a signature. BOMs are part of what lets the same text move between machines with different byte orders. Unicode actually raises some pretty challenging questions in terms of “alphabetical” sorting and accent placement, and even presents some security problems by opening the way for homograph phishing attacks (for instance, see this Shmoo article on IDN attacks, which mentions that www.pаypal.com can be registered with a Cyrillic first ‘а’ and could be full of scams. Yes, I have written both the URL and the ‘а’ with the actual Cyrillic letter).
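To make those three points a bit more concrete, here is a minimal Python sketch using only the standard library. The function names (encoded_sizes, sniff_bom, scripts_used) are invented for this example. The script check is a rough heuristic based on the first word of each character's Unicode name, since Python's standard library does not expose the real Script property, so treat it as a toy and not a real defence against homograph attacks.

```python
import codecs
import unicodedata

def encoded_sizes(ch):
    """Bytes needed for one character in UTF-8 and in UTF-16 (without a BOM)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

def sniff_bom(data):
    """Guess an encoding from a leading BOM; return None if there is no BOM."""
    # UTF-32 marks are checked first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            return name
    return None

def scripts_used(label):
    """Heuristic script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

print(encoded_sizes("A"))           # (1, 2): ASCII letter
print(encoded_sizes("\u2801"))      # (3, 2): Braille pattern
print(encoded_sizes("\U0001F600"))  # (4, 4): outside the Basic Multilingual Plane

print(sniff_bom("hi".encode("utf-8-sig")))  # utf-8-sig
print(sniff_bom(b"plain ASCII bytes"))      # None

print(scripts_used("paypal"))       # all Latin
print(scripts_used("p\u0430ypal"))  # Latin plus Cyrillic: worth a second look
```

A label that mixes scripts is not automatically malicious, but it is the kind of thing a browser or registrar might flag before showing the name to a user.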

Yes, it’s totally dorky to learn about Unicode, but it’s actually kinda cool at the same time.