For all you computer security types

I’m sure all the CS people reading this (and maybe even some of the non-CS types!) are familiar with buffer overflow attacks, and know how to both protect against them and exploit them in other people’s code, or at least have a vague idea about how to do it. However, fewer people have heard of format string attacks. Here’s a fairly detailed explanation, but I’ll summarize:

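If, in your C or C++ code, you write printf(foo) (where foo is typically a const char*), it will usually just print foo to standard output. The exception is when foo contains a percent sign. printf treats foo as a format string, and each conversion (%d, %x, %s, and so on) tells it to fetch another argument from wherever arguments would have been passed, typically registers and the stack. You didn't pass any, so it prints whatever happens to be there. Add more %'s and it keeps walking up the stack; if foo itself lives in a buffer on the stack, it will eventually start printing out parts of foo. If foo was defined as input from a clever yet malicious user, they can craft strings that do nasty things to your program. Most importantly, they can read memory (%08x dumps successive stack words as hex, and %s follows one of them as a pointer) and even write to it (%n stores the number of characters printed so far at the address it is handed). Since they also control the bytes of foo, they can plant whatever addresses they like for %s and %n to use, which means reading from and writing to arbitrary locations in memory. Given that, they can pretty much do anything your program is allowed to do. Nifty!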
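To see the write primitive in isolation, here is a minimal, benign sketch (the function name chars_before_marker is made up for illustration). %n stores the number of characters output so far through an int* argument. Here we supply that pointer ourselves; in an attack, printf pulls it off the stack, where the attacker has arranged for an address of their choosing. This assumes a C library that honors %n; Microsoft's C runtime, for one, disables it by default.

```c
#include <assert.h>
#include <stdio.h>

/* Returns how many characters snprintf had produced when it reached %n.
   The int* for %n comes from us here, so the write lands somewhere safe.
   In a format string attack, that pointer is one the attacker picked. */
int chars_before_marker(void) {
    char buf[64];
    int count = -1;
    snprintf(buf, sizeof buf, "hello%n world", &count);
    return count; /* "hello" is 5 characters */
}
```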

The simple and obvious way to avoid this attack is to change all instances of printf(foo) in your code to printf("%s", foo) instead (and likewise for fprintf, sprintf, snprintf, syslog, and the rest of the family). The less obvious but much better solution is to not code in C or C++ ever again, and instead use a modern, high-level language like Python or Java (or if you’re Michael and worry about the speed of your program, use an actual low-level language like Assembly).
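A minimal sketch of the fix, using two hypothetical helpers: print_user_text prints directly, and render_safely formats into a caller-supplied buffer so the result is easy to check. In both, the format argument is always a string literal, so any % characters the user types are treated as plain data. GCC and Clang will also flag the unsafe printf(foo) pattern if you compile with -Wformat -Wformat-security.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The fix from the post: the format is always the literal "%s",
   so any % characters in user_input are printed as plain text. */
void print_user_text(const char *user_input) {
    printf("%s", user_input);
}

/* Same idea, formatting into a buffer instead of stdout.
   Returns 1 if all of user_input fit in out, 0 if it was truncated. */
int render_safely(char *out, size_t out_size, const char *user_input) {
    /* Unsafe equivalent would be: snprintf(out, out_size, user_input); */
    int needed = snprintf(out, out_size, "%s", user_input);
    return needed >= 0 && (size_t)needed < out_size;
}
```

For example, render_safely(buf, sizeof buf, "100%n") leaves the literal text 100%n in buf instead of writing through a stray pointer.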


11 Comments

  1. Have a look at “cross-site scripting vulnerabilities”; this is basically the same thing, if I understand your post (and “XSS”, as they call them) correctly.

    Which is to say, (a) NEVER TRUST YOUR USER, and (b) many languages have a “quote” or “escape” function on strings that will go through a string and escape all the control characters it contains.

    That’s a pretty stylish hack, I must say.

    • Alan says:

      They’re similar in that both of them (along with buffer overflow attacks, SQL injection, and most other security problems) are caused by programs not validating their inputs.

      XSS (as discussed in this very informative tech talk around minute 55) is a web-based vulnerability in which websites allow users to post arbitrary code on the site. As a general rule, websites with user logins put unique “session ID” cookies on users’ machines to validate their sessions after they log in (and remove these cookies when they log out). If you can get someone else’s cookie, you can log in as them (at least for a while) without that tedious business of actually logging in.

      XSS works by putting a small piece of JavaScript on a website. When other users go to this website, their browsers run your JavaScript, which sends the users’ cookies, encapsulated within, say, an image request (for a single invisible pixel, typically), to your server. You can now take these cookies and log in as the other users. This only works on websites that allow you to post arbitrary code without validation, and gives you access to accounts of other users who view your code (but only for accounts on this website). It does not give you access to the webserver itself.

      Format string attacks mostly affect programs written in C or C++. They are used to own the box on which the program is run, and will not directly interact with any other users running the program.

      But the way to stop both (and most other) attacks is to simply validate all input from users.

      • I’ve got my terminology mixed up, then; I’m thinking of the case where you echo back user input on a web page, and the user inputs something fun that makes your page do things you didn’t mean for it to. That way, you could, say, run arbitrary PHP code, and have an absolute ball screwing people up. Again, with the “validate your input” meme that it’s good to see people echoing.

        It’s funny, because the first thing I thought when you said “…C++…” was “Prof. O’Neill would kill me if I used printf() in a C++ program”, even though she’s thousands of miles from here. “ostreams and << all the way” was the message that got pounded into our heads (even though formatted printing this way is a pain in the ass).

        An interesting side note is that lots of languages implement variations on printf(); strings in Python have __mod__ set up to take a tuple (or a dict) of values and format them, for instance (more an “sprintf” than a “printf”, strictly speaking).

        OCaml (which I’ve been producing a lot of, lately) has some crazy, crazy voodoo with format strings, because it has to keep them type-safe; the result is that (as far as I’ve been able to tell, and I haven’t looked very hard) they are partially evaluated at compile-time, or something equally weird.

        Off-topic: Duke is offering (undergrad) topics in robotics next term, but I don’t have to start TAing until next year…I wonder if they’ll let me sit in.

  2. mockery0 says:

    You don’t really think assembly is better than C/C++ for something like a game, do you? How would you recommend writing a performance-intensive application that’s shipping on three dramatically different hardware targets (e.g., Xbox, PlayStation, and PC)?

    :)

    I’m pretty sure C++ is still The Right Language to be using for games and other high-performance programs, although a well-integrated higher-level scripting language is nice for things like game logic if you can keep it lean and mean…

  3. jcmdev0 says:

    You can write YOUR operating system in Python.

    • Alan says:

      I’ll grant that it would be slower and probably have poor memory use, but it would certainly be more secure (assuming you could get a boot loader to start up a python script in the first place).

      • Alan says:

        and come to think of it, I’d need to expunge all C++ code from the machine or else get it to run on a VM to avoid this problem. Perhaps it would be simplest to run VMware as my main OS? I’d rather not make an x86 interpreter for all the 3rd party code out there. Perhaps it would be best just to completely rewrite everything with security in mind.

        \sheep{Perhaps I’ll just suffer these security problems, since fixing them is a hard task.}

        • Anonymous says:

          There is a certain appeal to running a Java OS in a virtual machine. I have strong doubts that you will be able to replace the C glue once you get close enough to the raw silicon (you still need to run the VMM on something).

          I wouldn’t be terribly upset if we threw out the x86 architecture. Aside from the economic impact of everyone upgrading, it would let us get rid of a mammoth pile of backwards compatibility garbage in hardware and software.

          • Python would be a terrible language to write an OS in, at the very least because the way imports work makes it completely impossible to run anything set{u,g}id.

            You folks probably saw this (I think it was on /.) but Sun’s been working on embedding a Java VM in the Solaris kernel, so they can write device drivers in Java.

            Why you would want to do that is beyond me, but it’s sort of the same idea.

  4. code65536 says:

    I’m still bitter at Python being a whitespace nazi. :P
