## How to swap two integers in C++

Warning: this post got embarrassingly long. For the short version, read up through the paragraph that starts with “If you use.”

This is something that’s been bugging me for a while, and I’d like to lay it to rest. If you’ve done any work in C++, you’ve probably heard the riddle about how to swap two integers without using a temporary variable. The question is exactly what it sounds like: you have two integers (call them `x` and `y`), you want to swap their values, and you may or may not want to use a third, temporary variable while doing it. Here are some ways of swapping integers that I have actually seen in professional code written by professional coders:

Method Name Code
Method A
```{  // Limit the scope of t
int t = x;
x = y;
y = t;
}```
Method B
```x ^= y;
y ^= x;
x ^= y;```
Method C
`x ^= y ^= x ^= y;`
Method D
```x = x + y;
y = x - y;
x = x - y;```
Method E
`std::swap(x, y);`

Which of these is the fastest way to swap integers? Which way is the fastest method that doesn’t require extra memory? When will you want to use a different method? What is the overall best method?

When you’ve decided on your answers, let’s take a closer look. No, on second thought, this got really long. For those of you who don’t want to read all the way to the end, let me just get my main point out of the way. After that, we’ll take a closer look. If you don’t want to spoil the ending, skip the next paragraph.

If you use the xor trick to swap integers, STOP IT! Unless you have a brain-dead compiler, the naive swap with a temporary integer doesn’t use any more space than the xor swap, but it is easier to read and often requires less CPU time. On top of this, the xor swap may actually have incorrect behavior if you do it all on one line (a la Method C). You’re not being cute or clever by using this trick; you’re actually making your code worse. Yes, the xor swap is an interesting mathematical curiosity, but it should never be used in real code. Just write what you mean (i.e. use either Method A or E) and let the compiler handle it, because it can compile things better than you can. and in the (far-fetched) event that it can’t do this right and you really do need those extra couple bytes of stack space, either get a better compiler or write it in assembly. but stop propagating the myth that the xor swap is a good coding trick.

Now that I’ve gotten that out of my system, we can continue. Throughout this post, I’ll be using g++ 4 to compile stuff for a 32-bit x86 Linux distribution, because that’s what I happen to have on hand at the moment. This post applies to many other compilers for most other architectures and all other OSes, too, but now you have a way of verifying my work. The g++ command line flags I will use are `-Wall` (to turn on all warnings) and `-g` (to add in debugging strings so I can more easily see what’s going on when I disassemble the binaries). I will also be using `objdump -S` to look at the disassembled machine code. We’ll start out with something simple and straightforward, and compile in the simple and straightforward way:

```username@computername\$cat swap.cc
#include <iostream>  // Contains std::cout and std::endl
#include <algorithm>  // Contains std::swap

void PrintValues(int x, int y) {
std::cout << "x is " << x << std::endl;
std::cout << "y is " << y << std::endl;
}

int main() {
int x = 3;
int y = 5;

PrintValues(x, y);

{
int t = x;  // Method A
x = y;
y = t;
}

PrintValues(x, y);

x ^= y;  // Method B
y ^= x;
x ^= y;

PrintValues(x, y);

x ^= y ^= x ^= y;  // Method C

PrintValues(x, y);

x = x + y;  // Method D
y = x - y;
x = x - y;

PrintValues(x, y);

std::swap(x, y);  // Method E

PrintValues(x, y);
}
swap.cc: In function 'int main()':
swap.cc:29: warning: operation on 'x' may be undefined

Well, there's our first snafu—Method C is not actually valid C++! From the fourth paragraph of Section 5 of the C++ standard, "[b]etween the previous and next sequence point a scalar object shall have its stored value modified at most once...; otherwise the behavior is undefined." In case you're not familiar with sequence points, they're defined in Section 1.9 Paragraph 7 as the points where "all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place." In this code, the important sequence points are all the semicolons (there are some others, but they don't have any effect in this example, so I will ignore them). In particular, the one-line xor swap modifies the value of `x` twice without a sequence point in between, so the behavior isn't well-defined. One way to think about it is that we can't know the value of `x` after that statement: without any internal sequence points, we can't tell whether the store on the left or the right happens last. In practice, Method C usually does swap the integers correctly, but the standard has no such guarantees. Indeed, the behavior is undefined, which technically means that running this code could very well wipe your hard drive and then start downloading kiddie porn. It's unlikely that anyone will write a compiler which, when faced with this code, creates a program with that behavior, but such a malicious compiler would totally conform to the C++ standard. So Method C is definitely out, and will be deleted from all further analysis.

With that removed, everything else compiles fine and runs fine and has the expected behavior. Let's disassemble the binary we just compiled and take a look at exactly what it's doing:

```  int x = 3;
80486da: c7 45 f8 03 00 00 00  movl   \$0x3,-0x8(%ebp)
int y = 5;
80486e1: c7 45 f4 05 00 00 00  movl   \$0x5,-0xc(%ebp)

PrintValues(x, y);
80486e8: 8b 45 f4              mov    -0xc(%ebp),%eax
80486eb: 8b 55 f8              mov    -0x8(%ebp),%edx
80486ee: 89 44 24 04           mov    %eax,0x4(%esp)
80486f2: 89 14 24              mov    %edx,(%esp)
80486f5: e8 52 ff ff ff        call   804864c <_Z11PrintValuesii>

{
int t = x;  // Method A
80486fa: 8b 45 f8              mov    -0x8(%ebp),%eax
80486fd: 89 45 fc              mov    %eax,-0x4(%ebp)
x = y;
8048700: 8b 45 f4              mov    -0xc(%ebp),%eax
8048703: 89 45 f8              mov    %eax,-0x8(%ebp)
y = t;
8048706: 8b 45 fc              mov    -0x4(%ebp),%eax
8048709: 89 45 f4              mov    %eax,-0xc(%ebp)
}

PrintValues(x, y);
804870c: 8b 45 f4              mov    -0xc(%ebp),%eax
804870f: 8b 55 f8              mov    -0x8(%ebp),%edx
8048712: 89 44 24 04           mov    %eax,0x4(%esp)
8048716: 89 14 24              mov    %edx,(%esp)
8048719: e8 2e ff ff ff        call   804864c <_Z11PrintValuesii>

x ^= y;  // Method B
804871e: 8b 55 f8              mov    -0x8(%ebp),%edx
8048721: 8b 45 f4              mov    -0xc(%ebp),%eax
8048724: 31 d0                 xor    %edx,%eax
8048726: 89 45 f8              mov    %eax,-0x8(%ebp)
y ^= x;
8048729: 8b 55 f4              mov    -0xc(%ebp),%edx
804872c: 8b 45 f8              mov    -0x8(%ebp),%eax
804872f: 31 d0                 xor    %edx,%eax
8048731: 89 45 f4              mov    %eax,-0xc(%ebp)
x ^= y;
8048734: 8b 55 f8              mov    -0x8(%ebp),%edx
8048737: 8b 45 f4              mov    -0xc(%ebp),%eax
804873a: 31 d0                 xor    %edx,%eax
804873c: 89 45 f8              mov    %eax,-0x8(%ebp)

PrintValues(x, y);
804873f: 8b 45 f4              mov    -0xc(%ebp),%eax
8048742: 8b 55 f8              mov    -0x8(%ebp),%edx
8048745: 89 44 24 04           mov    %eax,0x4(%esp)
8048749: 89 14 24              mov    %edx,(%esp)
804874c: e8 fb fe ff ff        call   804864c <_Z11PrintValuesii>

x = x + y;  // Method D
8048751: 8b 55 f8              mov    -0x8(%ebp),%edx
8048754: 8b 45 f4              mov    -0xc(%ebp),%eax
8048757: 8d 04 02              lea    (%edx,%eax,1),%eax
804875a: 89 45 f8              mov    %eax,-0x8(%ebp)
y = x - y;
804875d: 8b 55 f8              mov    -0x8(%ebp),%edx
8048760: 8b 45 f4              mov    -0xc(%ebp),%eax
8048763: 89 d1                 mov    %edx,%ecx
8048765: 29 c1                 sub    %eax,%ecx
8048767: 89 c8                 mov    %ecx,%eax
8048769: 89 45 f4              mov    %eax,-0xc(%ebp)
x = x - y;
804876c: 8b 55 f8              mov    -0x8(%ebp),%edx
804876f: 8b 45 f4              mov    -0xc(%ebp),%eax
8048772: 89 d1                 mov    %edx,%ecx
8048774: 29 c1                 sub    %eax,%ecx
8048776: 89 c8                 mov    %ecx,%eax
8048778: 89 45 f8              mov    %eax,-0x8(%ebp)

PrintValues(x, y);
804877b: 8b 45 f4              mov    -0xc(%ebp),%eax
804877e: 8b 55 f8              mov    -0x8(%ebp),%edx
8048781: 89 44 24 04           mov    %eax,0x4(%esp)
8048785: 89 14 24              mov    %edx,(%esp)
8048788: e8 bf fe ff ff        call   804864c <_Z11PrintValuesii>

std::swap(x, y);  // Method E
804878d: 8d 45 f4              lea    -0xc(%ebp),%eax
8048790: 89 44 24 04           mov    %eax,0x4(%esp)
8048794: 8d 45 f8              lea    -0x8(%ebp),%eax
8048797: 89 04 24              mov    %eax,(%esp)
804879a: e8 8b 00 00 00        call   804882a <_ZSt4swapIiEvRT_S1_>

PrintValues(x, y);
804879f: 8b 45 f4              mov    -0xc(%ebp),%eax
80487a2: 8b 55 f8              mov    -0x8(%ebp),%edx
80487a5: 89 44 24 04           mov    %eax,0x4(%esp)
80487a9: 89 14 24              mov    %edx,(%esp)
80487ac: e8 9b fe ff ff        call   804864c <_Z11PrintValuesii>```

In case you're not familiar with assembly code, let me give a quick explanation of what's going on here (if you're already comfortable with it, skip this paragraph). The leftmost column is the position of this code within the program; you can ignore it. The bytes that come after it are the actual machine code generated by the compiler; you can ignore them. Then there is a column of the assembly instructions that correspond to the machine code we're ignoring, followed by the arguments to those instructions. These last two columns are the important part, because they're nearly readable and they describe exactly what the computer is doing. Interspersed throughout are the lines of C++ that were compiled to get the machine code just below them. Things that start with % are registers, which are the locations in the processor that store the data you're working with. Parentheses around registers denote that the register contains a pointer to somewhere in memory, and it's talking about that memory location instead of the value in the register itself. The assembly instructions used here can move data from memory to registers and back again (the `mov` and `movl` instructions), call other functions (the `call` instruction), take the exclusive or of two values (the `xor` instruction), load and add values (`lea`, which admittedly is confusing because it is used here both to add numbers together and to pass references into `std::swap`), and subtract values (`sub`). If you're not very comfortable with this, don't worry; I'll walk you through the important parts.

The compiled instructions are pretty straightforward here. `x` is stored 8 bytes below the pointer stored in register `%ebp`, and `y` is stored 12 (0xc) bytes below that same pointer. To call `PrintValues`, we push `y` and then `x` onto the top of the stack (by loading them into registers `%eax` and `%edx` and then putting them near where the stack pointer register, `%esp`, points) and then we call `PrintValues` (whose name has been changed slightly but is still recognizable). Note that the arguments are pushed onto the stack in the reverse order that we listed them in the code. Method A loads `x` into the register `%eax`, then stores it again 4 bytes before `%ebp` (presumably this is `t`). It then loads `y` and stores it where `x` goes, etc, just the way you'd expect. For Method B, we first load the variables into registers `%edx` and `%eax`, we xor them together so that the result is in `%eax`, and we store that value back into memory. We then repeat this twice more for the other two xors. The rest of the code should be equally straightforward (maybe not counting the `lea` instructions, but trust me that it's doing what you'd expect there).

If this were our final compiled program, I would totally agree that the naive swap requires an extra 4 bytes of memory in the stack frame, and therefore truly does require an extra temporary variable that the rest of the methods don't need. However, we compiled with the default optimization level, which has absolutely no optimizations at all whatsoever. If you use g++ and/or GCC, you should really use optimization level 2 for any release-worthy code, and only use level 0 for debugging purposes. So let's try this again with the `-O2` flag passed to g++ as well, and see what machine code we get this time:

```  int x = 3;
int y = 5;

PrintValues(x, y);
80486dc: c7 44 24 04 05 00 00  movl   \$0x5,0x4(%esp)
80486e3: 00
80486e4: c7 04 24 03 00 00 00  movl   \$0x3,(%esp)
80486eb: e8 7c ff ff ff        call   804866c <_z11printvaluesii>

{
int t = x;  // Method A
x = y;
y = t;
}

PrintValues(x, y);
80486f0: c7 44 24 04 03 00 00  movl   \$0x3,0x4(%esp)
80486f7: 00
80486f8: c7 04 24 05 00 00 00  movl   \$0x5,(%esp)
80486ff: e8 68 ff ff ff        call   804866c <_z11printvaluesii>

x ^= y;  // Method B
y ^= x;
x ^= y;

PrintValues(x, y);
8048704: c7 44 24 04 05 00 00  movl   \$0x5,0x4(%esp)
804870b: 00
804870c: c7 04 24 03 00 00 00  movl   \$0x3,(%esp)
8048713: e8 54 ff ff ff        call   804866c <_z11printvaluesii>

x = x + y;  // Method D
y = x - y;
x = x - y;

PrintValues(x, y);
8048718: c7 44 24 04 03 00 00  movl   \$0x3,0x4(%esp)
804871f: 00
8048720: c7 04 24 05 00 00 00  movl   \$0x5,(%esp)
8048727: e8 40 ff ff ff        call   804866c <_z11printvaluesii>

swap(x, y);  // Method E

PrintValues(x, y);
804872c: c7 44 24 04 05 00 00  movl   \$0x5,0x4(%esp)
8048733: 00
8048734: c7 04 24 03 00 00 00  movl   \$0x3,(%esp)
804873b: e8 2c ff ff ff        call   804866c <_z11printvaluesii>```

Whoops! Here, all the variables have been completely optimized away! We just push the values 3 and 5 onto the stack when we want to call `PrintValues`, and continually swap their order. In this case, all four swapping methods have the same performance: none of them take any CPU cycles at all, and none of them take any memory at all. Clearly, this is not what people have in mind when they ask about swapping integers. So, I need a way to force the values to be stored somewhere, and an easy way to do that is to conditionally modify their values. I'll make some test that the compiler hopefully can't figure out, and set `x` and `y` to 0 if the test turns out true. The complier won't know that the result is always false and no variables are modified, so it will need to store their values somewhere just in case we do modify them. To accomplish this, I'll use the following function:

```bool NontrivialFunction(int x, int y, int base) {
// I need a way to force x and y to be stored instead of completely optimized
// away. So, here's the plan: if either of them are not prime, I will return
// true and set them to zero.
if (x < base * base && y < base * base)
return false;
if (x % base == 0)
return true;
if (y % base == 0)
return true;
return NontrivialFunction(x, y, base + 1);
}
...
// and this part is added to main() just after defining x and y
if (NontrivialFunction(x, y, 2)) {
x = 0;
y = 0;
}```

`NontrivialFunction` is reminiscent of a recursive primality checker, and because it's recursive, the compiler can't inline it (and probably won't try to precompute it). Since `x` and `y` may change depending on the result, the compiler will need to store them somewhere so that it's possible to set them to zero if necessary. So now let's compile again (with `-O2` for optimization again), and look at the result:

```  int x = 3;
int y = 5;
if (NontrivialFunction(x, y, 2)) {
8048739: c7 44 24 08 02 00 00  movl   \$0x2,0x8(%esp)
8048740: 00
8048741: c7 44 24 04 05 00 00  movl   \$0x5,0x4(%esp)
8048748: 00
8048749: c7 04 24 03 00 00 00  movl   \$0x3,(%esp)
8048750: e8 b7 fe ff ff        call   804860c <_Z18NontrivialFunctioniii>
8048755: 84 c0                 test   %al,%al
8048757: 74 52                 je     80487ab <main+0x81>
8048759: 31 db                 xor    %ebx,%ebx
804875b: 31 f6                 xor    %esi,%esi
804875d: 31 ff                 xor    %edi,%edi
x = 0;
y = 0;
}

PrintValues(x, y);
804875f: 89 74 24 04           mov    %esi,0x4(%esp)
8048763: 89 1c 24              mov    %ebx,(%esp)
8048766: e8 5b ff ff ff        call   80486c6 <_Z11PrintValuesii>

{
int t = x;  // Method A
x = y;
y = t;
}

PrintValues(x, y);
804876b: 89 5c 24 04           mov    %ebx,0x4(%esp)
804876f: 89 34 24              mov    %esi,(%esp)
8048772: e8 4f ff ff ff        call   80486c6 <_Z11PrintValuesii>

x ^= y;  // Method B
y ^= x;
8048777: 31 fb                 xor    %edi,%ebx
x ^= y;
8048779: 89 de                 mov    %ebx,%esi
804877b: 31 fe                 xor    %edi,%esi

PrintValues(x, y);
804877d: 89 5c 24 04           mov    %ebx,0x4(%esp)
8048781: 89 34 24              mov    %esi,(%esp)
8048784: e8 3d ff ff ff        call   80486c6 <_Z11PrintValuesii>

x = x + y;  // Method D
y = x - y;
x = x - y;

PrintValues(x, y);
8048789: 89 74 24 04           mov    %esi,0x4(%esp)
804878d: 89 1c 24              mov    %ebx,(%esp)
8048790: e8 31 ff ff ff        call   80486c6 <_Z11PrintValuesii>

std::swap(x, y);  // Method E

PrintValues(x, y);
8048795: 89 5c 24 04           mov    %ebx,0x4(%esp)
8048799: 89 34 24              mov    %esi,(%esp)
804879c: e8 25 ff ff ff        call   80486c6 <_Z11PrintValuesii>

}
80487a1: 31 c0                 xor    %eax,%eax
80487a3: 8d 65 f4              lea    -0xc(%ebp),%esp
80487a6: 5b                    pop    %ebx
80487a7: 5e                    pop    %esi
80487a8: 5f                    pop    %edi
80487a9: 5d                    pop    %ebp
80487aa: c3                    ret
int x = 3;
int y = 5;
if (NontrivialFunction(x, y, 2)) {
80487ab: bb 03 00 00 00        mov    \$0x3,%ebx
80487b0: be 05 00 00 00        mov    \$0x5,%esi
80487b5: bf 06 00 00 00        mov    \$0x6,%edi
80487ba: eb a3                 jmp    804875f <main+0x35>```

This assembly is a bit harder to read, so let's start at the top: We push 2, then 5, then 3 onto the stack and call `NontrivialFunction` (remember that arguments on the stack are in reverse order, so this is really calling `NontrivialFunction(3, 5, 2)`) without yet assigning any of those values to anything you could call `x` or `y`. Once that is done, the result is stored in register `%al`, and we `test` whether it's zero (it looks like we're comparing it to itself, but it's actually just checking the sign of `%al`). If it's zero (false), we jump (`je`) down to the bottom (to line 80487ab). Once there, we store the values 3 and 5 into registers `%ebx` and `%esi` (and the value 6, which is the precomputed value of `x ^ y`, into register `%edi`; this will be used in Method B). Then we jump (`jmp`) back up to the first call to `PrintValues`, and continue on. If we had wanted to go through the if statement (i.e. if the result had been nonzero/true), we wouldn't have jumped and instead we'd have gone through the three `xor`'s and zeroed out the values (a number xor'ed with itself is 0). Don't worry about the `pop`'s and `ret` at the end; that's just some clean-up done as the `main` function finishes up, and we can safely ignore it.

So at this point, our trick has worked: `x` is stored in register `%ebx` and `y` is in `%esi`. but now take a look at the rest of the code: for most of it, the swapping code is removed completely, and we just call `PrintValues(x, y)` and then `PrintValues(y, x)` to "swap" them! Most of these swaps take absolutely no CPU time or extra memory. The one exception, however, is Method B, which only got half-optimized away (that first xor is precomputed, but the rest is still there). so in this case, all methods can be completely optimized away by the compiler except for the xor swap, which is slower and unambiguously worse.

"but surely this isn't what real code looks like either," I hear the defenders of xor say. "Surely real code stores the variables in memory instead of directly in registers, since the registers are presumably used for much more than just storing our two piddly little integers." Fair enough; let's take a look! To get the variables stored in memory, we'll need something that makes use of these locations, or in other words something that requires references to the variables and something that the compiler doesn't inline and/or optimize away. The primality tester worked before, so let's just adapt that a little:

```void NontrivialReferenceFunction(int& x, int& y) {
// This is really just a fancy no-op, but the compiler doesn't know that.
if (NontrivialFunction(x, y, 2)) {
NontrivialReferenceFunction(x, y);
}
}
...
// In main replacing the part about NontrivialFunction that we added in last time
NontrivialReferenceFunction(x, y);```

We re-compile, and re-dump the assembly to get this:

```  int x = 3;
8048768: c7 45 fc 03 00 00 00  movl   \$0x3,-0x4(%ebp)
int y = 5;
804876f: c7 45 f8 05 00 00 00  movl   \$0x5,-0x8(%ebp)

NontrivialReferenceFunction(x, y);
8048776: 8d 45 f8              lea    -0x8(%ebp),%eax
8048779: 89 44 24 04           mov    %eax,0x4(%esp)
804877d: 8d 45 fc              lea    -0x4(%ebp),%eax
8048780: 89 04 24              mov    %eax,(%esp)
8048783: e8 de fe ff ff        call   8048666 <_Z27NontrivialReferenceFunctionRiS_>

PrintValues(x, y);
8048788: 8b 45 f8              mov    -0x8(%ebp),%eax
804878b: 89 44 24 04           mov    %eax,0x4(%esp)
804878f: 8b 45 fc              mov    -0x4(%ebp),%eax
8048792: 89 04 24              mov    %eax,(%esp)
8048795: e8 5e ff ff ff        call   80486f8 <_Z11PrintValuesii>

{
int t = x;  // Method A
804879a: 8b 45 fc              mov    -0x4(%ebp),%eax
x = y;
804879d: 8b 55 f8              mov    -0x8(%ebp),%edx
80487a0: 89 55 fc              mov    %edx,-0x4(%ebp)
y = t;
80487a3: 89 45 f8              mov    %eax,-0x8(%ebp)
}

PrintValues(x, y);
80487a6: 89 44 24 04           mov    %eax,0x4(%esp)
80487aa: 89 14 24              mov    %edx,(%esp)
80487ad: e8 46 ff ff ff        call   80486f8 <_Z11PrintValuesii>

x ^= y;  // Method B
80487b2: 8b 45 f8              mov    -0x8(%ebp),%eax
80487b5: 8b 55 fc              mov    -0x4(%ebp),%edx
80487b8: 31 c2                 xor    %eax,%edx
y ^= x;
80487ba: 31 d0                 xor    %edx,%eax
80487bc: 89 45 f8              mov    %eax,-0x8(%ebp)
x ^= y;
80487bf: 31 c2                 xor    %eax,%edx
80487c1: 89 55 fc              mov    %edx,-0x4(%ebp)

PrintValues(x, y);
80487c4: 89 44 24 04           mov    %eax,0x4(%esp)
80487c8: 89 14 24              mov    %edx,(%esp)
80487cb: e8 28 ff ff ff        call   80486f8 <_Z11PrintValuesii>

x = x + y;  // Method D
80487d0: 8b 55 f8              mov    -0x8(%ebp),%edx
80487d3: 8b 45 fc              mov    -0x4(%ebp),%eax
y = x - y;
80487d8: 89 c1                 mov    %eax,%ecx
80487da: 29 d1                 sub    %edx,%ecx
80487dc: 89 4d f8              mov    %ecx,-0x8(%ebp)
x = x - y;
80487df: 29 c8                 sub    %ecx,%eax
80487e1: 89 45 fc              mov    %eax,-0x4(%ebp)

PrintValues(x, y);
80487e4: 89 4c 24 04           mov    %ecx,0x4(%esp)
80487e8: 89 04 24              mov    %eax,(%esp)
80487eb: e8 08 ff ff ff        call   80486f8 <_Z11PrintValuesii>
swap(_Tp& __a, _Tp& __b)
{
// concept requirements
__glibcxx_function_requires(_SGIAssignableConcept<_Tp>)

_Tp __tmp = __a;
80487f0: 8b 45 fc              mov    -0x4(%ebp),%eax
__a = __b;
80487f3: 8b 55 f8              mov    -0x8(%ebp),%edx
80487f6: 89 55 fc              mov    %edx,-0x4(%ebp)
__b = __tmp;
80487f9: 89 45 f8              mov    %eax,-0x8(%ebp)

std::swap(x, y);  // Method E

PrintValues(x, y);
80487fc: 89 44 24 04           mov    %eax,0x4(%esp)
8048800: 89 14 24              mov    %edx,(%esp)
8048803: e8 f0 fe ff ff        call   80486f8 <_Z11PrintValuesii>```

Now we're getting somewhere! This time our variables are stored in memory (4 and 8 bytes before the place that `%ebp` points to), and this time they really are loaded into registers, swapped by various means, and then stored back into memory, just like we wanted. Like the previous calls to `std::swap`, the `lea` command is used to get the address of (reference to) variables to pass to `NontrivialReferenceFunction`. We also now have an instance of the `add` command, which does just what you'd expect. Other than that, though, no new commands are used here. This looks like it's exactly what we wanted, so let's examine this in detail.

Method A loads both variables into registers, and then stores them back in the opposite locations. We don't actually allocate any memory at all for the temporary variable, which has been optimized away yet again. In Method B, we load the variables into registers, xor them, xor them again, store that result in memory (as `y`), xor them a final time, and stick that result back into memory as `x`. Like Method A, this requires two registers and four `mov` commands (two for loading and two for storing the data). Unlike Method A, Method B requires three extra `xor` commands to get the same result. Method D is a lot like Method B, in that it requires the same 4 `mov`'s, but also requires some extra addition and subtraction work that Method A didn't need. On top of that, it uses a third register, `%ecx`, while the previous two methods only used two registers. Method E is barely recognizeable because it's got all these strange pieces of code and variables with underscores in their names. The compiler has inlined `std::swap`, which turns out to be a templated function. Its two arguments have type `_Tp` (for our purposes, `_Tp` is `int`), and are called `__a` and `__b`. If you read the C++ code, it creates a temporary `_Tp` called `__tmp` and basically just does Method A. The assembly does the same 4 `mov`'s as Method A with nothing extra.

So to sum up, Methods A and E were the same, Method B took more time, and Method D took more time and an extra register to accomplish the same thing. In other words, when we got down to it, the naive swap didn't use a temporary variable, and still ran faster than the xor swap. It turns out the same result happens if you make your variables `volatile` (provided that you remove the call to `NontrivialReferenceFunction`, because you can't take references of volatile data). For kicks, I tried the same thing on a 64-bit x86 Linux architecture with g++ 4.1, and found that everything was optimized away into the alternating calls to `PrintValues(x, y)` and `PrintValues(y, x)`, with no actual swaps, even if I called `NontrivialReferenceFunction`. If I made the variables `volatile`, I could actually get the values to swap, but the result was the same: Methods A and E were by far the best, Method B took the same amount of space but more time, and Method D took extra space and extra time.

So now that we've beaten this code to death, I have one final tangent: there is an assembly instruction called XCHG (or EXCH, or something like that, depending on what architecture you're on) which swaps the values of two words. Unfortunately, it's generally a bad idea to use it because if you use it on a location in memory, the processor considers it a memory fence and messes up all sorts of pipelining and slows down all the code around it. So the only time you should consider using XCHG is if you want to swap two registers, and in that case, it's probably better just to remap the registers in the rest of the program (like in the assembly example I showed for the `NontrivialFunction` version), rather than explicitly swapping them.

To bring this full circle, here are my answers to the original questions: Methods A and E are the fastest way to swap two integers. Methods A and E are also the fastest way to do it without using any extra memory (even though you write C++ with a temporary variable, it gets optimized away). I can't think of a realistic situation where I'd want to use a different method, and the only unrealistic situation I can come up with is when your compiler can't optimize anything at all and makes assembler instructions like the original version we examined, and you really, truly need those extra couple bytes of stack space. However, my first course of action there would be to get a better compiler. Only after I've found that I can't get a good compiler and I can't improve the compiler I already have and I can't easily write it in assembly would I consider using Method B. I'm inclined to call Method A the best because unlike Method E it explicitly shows you what it's doing and it doesn't require linking in the entire `algorithm` library. However, there are definitely arguments for Method E, for instance that someone who knows what they're doing wrote the standard libraries and if there's a better way to do it then the standard library will probably be updated before your code. Another reason to pick Method E is that there is a whole system you can implement around `std::swap` that optimizes the swapping of larger objects, though I personally have never yet needed that optimization. but regardless, in this day and age there isn't a good excuse for using Methods B or D, despite any claims by GameDev and others to the contrary (and I strongly suspect that there wasn't any good excuse for the xor swap when that article was written in 2001, either). and under no circumstances should anyone ever use Method C because it isn't guaranteed to do what you want.

If you've actually gotten this far, thank you for reading my rant.

1. muddernh says:

I didn’t read the whole thing, but the variable swapping I did was always using method A. I didn’t know what B or C would even do at first glance, although it’s possible that syntax is somewhere back there under cobwebs…

but it’s nice to know that the simple method I know is decently good. (c=

2. kitty_tape says:

But you do have to admit D has the cute property of making you think about int addition as addition in an abelian group to convince yourself that it works in the face of overflow.

And, for no reason other than it’s been awhile since I have written a proof…

Let x’ and y’ represent the original values in x and y. After x = x + y, x == x’ + y’.

After y = x – y, y == (x’ + y’) + (-y’) == x’ + (y’ + (-y’)) == x’ + 0 == x’.

After x = x – y, x == (x’ + y’) + (-x’) == (y’ + x’) + (-x’) == y’ + (x’ + (-x’)) == y’ + 0 == y’.

Admittedly, a real proof would have provided a proper semantics for addition, subtraction, and assignment, but I am not that bored. =)

• Alan says:

Oh, I totally agree! It’s interesting that both Methods B and D rely solely on operators that define groups over the integers (note that subtraction isn’t abelian, but it’s still a group). I wonder if we can make a swapping mechanism that doesn’t require the semblance of a temporary variable and only uses operators which don’t form groups over the integers? I think we’d be limited to multiplication, division, modulo, bitwise and, bitwise or, and the 3 different shifts (and unary operators like negation and stuff). Any ideas? To make this more interesting, you aren’t allowed to implement the xor swap again by just using lots and lots of NAND gates. :-)

• Alan says:

Wait, I’m being an idiot. Subtraction over the integers is not a group because it’s not associative. However, it would almost count as an abelian group if we defined it as addition with a few unary negations thrown in at opportune times.

3. Seems to me like that’s a lot of effort to double-check that the Compiler Is Smarter Than You, which we already knew.

Also, have you really seen the x ^= y ^= x ^= y in code? There’s no way that’s allowed in the style guide.

• Alan says:

I agree that it was a lot of effort, but there are still lots of people out there who don’t yet seem to realize that the compiler is better at compiling than they are.

What is it with you harping on the style guide all the time? There is code written by other companies and organizations that either have different style guides or don’t have one at all, and I’ve seen Method C used in their code.

• It’s certainly an interesting piece of analysis; I never realized it wasted space; I just figured it was unreadable, and should be skipped on that account.

Besides, I liked the style guide! There’s a lot of high-density coding wisdom in there.

Also, I kept seeing “Method C” and thinking you’d found some new programming language that I’d never heard of. :-D

• Alan says:

I never realized it wasted space; I just figured it was unreadable

Wait, what? The point of the post is that none of these methods use any extra memory (and only method D uses any extra registers), and that the naive swap is just as space efficient as the xor swap because it doesn’t actually allocate the temporary variable you write in the code. I’ve edited the paragraph that starts with “if you use” to make that clearer.

• Well, now you’ve thoroughly shamed me for not reading the post closely enough.
:-) I’ll have to go through it in more detail.

• Alan says:

have you really seen the x ^= y ^= x ^= y in code?

Cases in point: it is (hopefully was?) in the source code for Mozilla’s SHA implementation and BSD Airtools. I wish I could get a real count of how often Google Code Search can find it, but either I can’t escape the regex right or GCS doesn’t actually support all of the POSIX extended regex expressions (in particular, I can’t get \(…\) and \1 to work).

4. dhalps says:

I ran the same test a few years back, and yes, the obvious thing to do is the best. Wheeee!

FYI, I’d recommend you don’t generalize the conclusion this post implies that Method E is as good as Method A. As an example, here’s some C++ pseudocode:

```complex array a;
complex array b;
complex array c;
for i in 1..k
c[i] = a[i] * b[i]; // choice 1
d[i] = a[i].mag(); // choice 2; note that mag is a function in the complex data type
e[i] = a[i].conj() * b[i]; // choice 3; note that conj is a function in the complex data type
end for```

In my experience, versions 2 and 3 of this function are much slower than the typed-out expansions thereof, because g++ isn’t smart enough to vectorize over (even inline) std library calls. Or something like that. This was a few months back on a 64 bit Intel machine running 64 bit linux. This is true even if you apply manual loop unrolling.

5. Anonymous says:

Method D has undefined behavior, also

Method D also has undefined behavior: signed overflow in C++ is undefined. From the 1998 c++ standard: Chapter 5 -5-: “If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19) in which case the program is ill-formed.”

— mec

• Alan says:

Re: Method D has undefined behavior, also

Nice call! I had thought that all integers were supposed to be 2’s complement (in which case it works despite the overflow/underflow), but according to paragraph 3.9.1-7, integral types can also have 1’s complement or signed magnitude representations. Thanks for pointing this out!

6. Anonymous says:

Nice analysis

I did a similar analysis long back and concluded the same (using temp var for swapping is better than others)..

Thanks