How to swap two integers in C++
Warning: this post got embarrassingly long. For the short version, read up through the paragraph that starts with “If you use.”
This is something that’s been bugging me for a while, and I’d like to lay it to rest. If you’ve done any work in C++, you’ve probably heard the riddle about how to swap two integers without using a temporary variable. The question is exactly what it sounds like: you have two integers (call them `x` and `y`), you want to swap their values, and you may or may not want to use a third, temporary variable while doing it. Here are some ways of swapping integers that I have actually seen in professional code written by professional coders:
Method Name | Code |
---|---|
Method A | `{ /* Limit the scope of t */ int t = x; x = y; y = t; }` |
Method B | `x ^= y; y ^= x; x ^= y;` |
Method C | `x ^= y ^= x ^= y;` |
Method D | `x = x + y; y = x - y; x = x - y;` |
Method E | `std::swap(x, y);` |
Which of these is the fastest way to swap integers? Which way is the fastest method that doesn’t require extra memory? When will you want to use a different method? What is the overall best method?
Try to come up with some answers before reading further.
When you’ve decided on your answers, let’s take a closer look. No, on second thought, this got really long. For those of you who don’t want to read all the way to the end, let me just get my main point out of the way. After that, we’ll take a closer look. If you don’t want to spoil the ending, skip the next paragraph.
If you use the xor trick to swap integers, STOP IT! Unless you have a brain-dead compiler, the naive swap with a temporary integer doesn’t use any more space than the xor swap, but it is easier to read and often requires less CPU time. On top of this, the xor swap may actually have incorrect behavior if you do it all on one line (à la Method C). You’re not being cute or clever by using this trick; you’re actually making your code worse. Yes, the xor swap is an interesting mathematical curiosity, but it should never be used in real code. Just write what you mean (i.e. use either Method A or E) and let the compiler handle it, because it can compile things better than you can. In the (far-fetched) event that it can’t do this right and you really do need those extra couple bytes of stack space, either get a better compiler or write it in assembly. But stop propagating the myth that the xor swap is a good coding trick.
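In case you're wondering why the xor swap works at all, here's a quick derivation (nothing below depends on it). It only uses the facts that xor is associative and commutative, that a ⊕ a = 0, and that a ⊕ 0 = a; write a and b for the starting values of x and y:

```latex
\begin{align*}
&\text{initially:}                              & x &= a,                           & y &= b \\
&\text{after } \texttt{x \textasciicircum= y}:  & x &= a \oplus b,                  & y &= b \\
&\text{after } \texttt{y \textasciicircum= x}:  & x &= a \oplus b,                  & y &= b \oplus (a \oplus b) = a \\
&\text{after } \texttt{x \textasciicircum= y}:  & x &= (a \oplus b) \oplus a = b,   & y &= a
\end{align*}
```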
Now that I’ve gotten that out of my system, we can continue. Throughout this post, I’ll be using g++ 4 to compile stuff for a 32-bit x86 Linux distribution, because that’s what I happen to have on hand at the moment. This post applies to many other compilers for most other architectures and all other OSes, too, but now you have a way of verifying my work. The g++ command line flags I will use are `-Wall` (to turn on all warnings) and `-g` (to add in debugging strings so I can more easily see what’s going on when I disassemble the binaries). I will also be using `objdump -S` to look at the disassembled machine code. We’ll start out with something simple and straightforward, and compile in the simple and straightforward way:
```
username@computername$cat swap.cc
#include <iostream>   // Contains std::cout and std::endl
#include <algorithm>  // Contains std::swap

void PrintValues(int x, int y) {
  std::cout << "x is " << x << std::endl;
  std::cout << "y is " << y << std::endl;
}

int main() {
  int x = 3;
  int y = 5;
  PrintValues(x, y);

  {
    int t = x;  // Method A
    x = y;
    y = t;
  }
  PrintValues(x, y);

  x ^= y;  // Method B
  y ^= x;
  x ^= y;
  PrintValues(x, y);

  x ^= y ^= x ^= y;  // Method C
  PrintValues(x, y);

  x = x + y;  // Method D
  y = x - y;
  x = x - y;
  PrintValues(x, y);

  std::swap(x, y);  // Method E
  PrintValues(x, y);
}
username@computername$g++ -Wall -g swap.cc
swap.cc: In function 'int main()':
swap.cc:29: warning: operation on 'x' may be undefined
username@computername$
```
Well, there's our first snafu—Method C is not actually valid C++! From the fourth paragraph of Section 5 of the C++ standard, "[b]etween the previous and next sequence point a scalar object shall have its stored value modified at most once...; otherwise the behavior is undefined." In case you're not familiar with sequence points, they're defined in Section 1.9 Paragraph 7 as the points where "all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place." In this code, the important sequence points are all the semicolons (there are some others, but they don't have any effect in this example, so I will ignore them). In particular, the one-line xor swap modifies the value of `x` twice without a sequence point in between, so the behavior isn't well-defined. One way to think about it is that we can't know the value of `x` after that statement: without any internal sequence points, we can't tell whether the store on the left or the right happens last. In practice, Method C usually does swap the integers correctly, but the standard makes no such guarantee. Indeed, the behavior is undefined, which technically means that running this code could very well wipe your hard drive and then start downloading kiddie porn. It's unlikely that anyone will write a compiler which, when faced with this code, creates a program with that behavior, but such a malicious compiler would totally conform to the C++ standard. So Method C is definitely out, and will be dropped from all further analysis.
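To make the sequence-point distinction concrete, here's a small sketch (an illustration, not code from the listing above) showing why the three-statement form is fine while the one-liner is not:

```cpp
// A minimal sketch of where the sequence points fall in the two xor variants.
int main() {
  int x = 3, y = 5;

  // Method B: each semicolon is a sequence point, so x and y are each
  // modified at most once between consecutive sequence points.
  x ^= y;  // x now holds old_x ^ old_y
  y ^= x;  // y now holds old_y ^ (old_x ^ old_y) == old_x
  x ^= y;  // x now holds (old_x ^ old_y) ^ old_x == old_y

  // Method C: under the C++98/03 sequence-point rules quoted above, x is
  // modified twice with no sequence point in between, so this is undefined.
  // x ^= y ^= x ^= y;

  return (x == 5 && y == 3) ? 0 : 1;
}
```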
With that removed, everything else compiles fine and runs fine and has the expected behavior. Let's disassemble the binary we just compiled and take a look at exactly what it's doing:
```
  int x = 3;
 80486da: c7 45 f8 03 00 00 00  movl   $0x3,-0x8(%ebp)
  int y = 5;
 80486e1: c7 45 f4 05 00 00 00  movl   $0x5,-0xc(%ebp)
  PrintValues(x, y);
 80486e8: 8b 45 f4              mov    -0xc(%ebp),%eax
 80486eb: 8b 55 f8              mov    -0x8(%ebp),%edx
 80486ee: 89 44 24 04           mov    %eax,0x4(%esp)
 80486f2: 89 14 24              mov    %edx,(%esp)
 80486f5: e8 52 ff ff ff        call   804864c <_Z11PrintValuesii>
  {
    int t = x; // Method A
 80486fa: 8b 45 f8              mov    -0x8(%ebp),%eax
 80486fd: 89 45 fc              mov    %eax,-0x4(%ebp)
    x = y;
 8048700: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048703: 89 45 f8              mov    %eax,-0x8(%ebp)
    y = t;
 8048706: 8b 45 fc              mov    -0x4(%ebp),%eax
 8048709: 89 45 f4              mov    %eax,-0xc(%ebp)
  }
  PrintValues(x, y);
 804870c: 8b 45 f4              mov    -0xc(%ebp),%eax
 804870f: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048712: 89 44 24 04           mov    %eax,0x4(%esp)
 8048716: 89 14 24              mov    %edx,(%esp)
 8048719: e8 2e ff ff ff        call   804864c <_Z11PrintValuesii>
  x ^= y; // Method B
 804871e: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048721: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048724: 31 d0                 xor    %edx,%eax
 8048726: 89 45 f8              mov    %eax,-0x8(%ebp)
  y ^= x;
 8048729: 8b 55 f4              mov    -0xc(%ebp),%edx
 804872c: 8b 45 f8              mov    -0x8(%ebp),%eax
 804872f: 31 d0                 xor    %edx,%eax
 8048731: 89 45 f4              mov    %eax,-0xc(%ebp)
  x ^= y;
 8048734: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048737: 8b 45 f4              mov    -0xc(%ebp),%eax
 804873a: 31 d0                 xor    %edx,%eax
 804873c: 89 45 f8              mov    %eax,-0x8(%ebp)
  PrintValues(x, y);
 804873f: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048742: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048745: 89 44 24 04           mov    %eax,0x4(%esp)
 8048749: 89 14 24              mov    %edx,(%esp)
 804874c: e8 fb fe ff ff        call   804864c <_Z11PrintValuesii>
  x = x + y; // Method D
 8048751: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048754: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048757: 8d 04 02              lea    (%edx,%eax,1),%eax
 804875a: 89 45 f8              mov    %eax,-0x8(%ebp)
  y = x - y;
 804875d: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048760: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048763: 89 d1                 mov    %edx,%ecx
 8048765: 29 c1                 sub    %eax,%ecx
 8048767: 89 c8                 mov    %ecx,%eax
 8048769: 89 45 f4              mov    %eax,-0xc(%ebp)
  x = x - y;
 804876c: 8b 55 f8              mov    -0x8(%ebp),%edx
 804876f: 8b 45 f4              mov    -0xc(%ebp),%eax
 8048772: 89 d1                 mov    %edx,%ecx
 8048774: 29 c1                 sub    %eax,%ecx
 8048776: 89 c8                 mov    %ecx,%eax
 8048778: 89 45 f8              mov    %eax,-0x8(%ebp)
  PrintValues(x, y);
 804877b: 8b 45 f4              mov    -0xc(%ebp),%eax
 804877e: 8b 55 f8              mov    -0x8(%ebp),%edx
 8048781: 89 44 24 04           mov    %eax,0x4(%esp)
 8048785: 89 14 24              mov    %edx,(%esp)
 8048788: e8 bf fe ff ff        call   804864c <_Z11PrintValuesii>
  std::swap(x, y); // Method E
 804878d: 8d 45 f4              lea    -0xc(%ebp),%eax
 8048790: 89 44 24 04           mov    %eax,0x4(%esp)
 8048794: 8d 45 f8              lea    -0x8(%ebp),%eax
 8048797: 89 04 24              mov    %eax,(%esp)
 804879a: e8 8b 00 00 00        call   804882a <_ZSt4swapIiEvRT_S1_>
  PrintValues(x, y);
 804879f: 8b 45 f4              mov    -0xc(%ebp),%eax
 80487a2: 8b 55 f8              mov    -0x8(%ebp),%edx
 80487a5: 89 44 24 04           mov    %eax,0x4(%esp)
 80487a9: 89 14 24              mov    %edx,(%esp)
 80487ac: e8 9b fe ff ff        call   804864c <_Z11PrintValuesii>
```
In case you're not familiar with assembly code, let me give a quick explanation of what's going on here (if you're already comfortable with it, skip this paragraph). The leftmost column is the position of this code within the program; you can ignore it. The bytes that come after it are the actual machine code generated by the compiler; you can ignore them. Then there is a column of the assembly instructions that correspond to the machine code we're ignoring, followed by the arguments to those instructions. These last two columns are the important part, because they're nearly readable and they describe exactly what the computer is doing. Interspersed throughout are the lines of C++ that were compiled to get the machine code just below them. Things that start with % are registers, which are the locations in the processor that store the data you're working with. Parentheses around a register mean that the register contains a pointer to somewhere in memory, and the instruction is talking about that memory location instead of the value in the register itself. The assembly instructions used here can move data from memory to registers and back again (the `mov` and `movl` instructions), call other functions (the `call` instruction), take the exclusive or of two values (the `xor` instruction), compute addresses (`lea`, which admittedly is confusing because it is used here both to add numbers together and to pass references into `std::swap`), and subtract values (`sub`). If you're not very comfortable with this, don't worry; I'll walk you through the important parts.
The compiled instructions are pretty straightforward here. `x` is stored 8 bytes below the pointer stored in register `%ebp`, and `y` is stored 12 (0xc) bytes below that same pointer. To call `PrintValues`, we push `y` and then `x` onto the top of the stack (by loading them into registers `%eax` and `%edx` and then putting them near where the stack pointer register, `%esp`, points) and then we call `PrintValues` (whose name has been mangled slightly but is still recognizable). Note that the arguments are pushed onto the stack in the reverse order that we listed them in the code. Method A loads `x` into the register `%eax`, then stores it again 4 bytes below `%ebp` (presumably this is `t`). It then loads `y` and stores it where `x` goes, etc., just the way you'd expect. For Method B, we first load the variables into registers `%edx` and `%eax`, we xor them together so that the result is in `%eax`, and we store that value back into memory. We then repeat this twice more for the other two xors. The rest of the code should be equally straightforward (maybe not counting the `lea` instructions, but trust me that they're doing what you'd expect there).
If this were our final compiled program, I would totally agree that the naive swap requires an extra 4 bytes of memory in the stack frame, and therefore truly does require an extra temporary variable that the rest of the methods don't need. However, we compiled with the default optimization level, which applies no optimizations at all. If you use g++ and/or GCC, you should really use optimization level 2 for any release-worthy code, and only use level 0 for debugging purposes. So let's try this again with the `-O2` flag passed to g++ as well, and see what machine code we get this time:
```
  int x = 3;
  int y = 5;
  PrintValues(x, y);
 80486dc: c7 44 24 04 05 00 00  movl   $0x5,0x4(%esp)
 80486e3: 00
 80486e4: c7 04 24 03 00 00 00  movl   $0x3,(%esp)
 80486eb: e8 7c ff ff ff        call   804866c <_Z11PrintValuesii>
  {
    int t = x; // Method A
    x = y;
    y = t;
  }
  PrintValues(x, y);
 80486f0: c7 44 24 04 03 00 00  movl   $0x3,0x4(%esp)
 80486f7: 00
 80486f8: c7 04 24 05 00 00 00  movl   $0x5,(%esp)
 80486ff: e8 68 ff ff ff        call   804866c <_Z11PrintValuesii>
  x ^= y; // Method B
  y ^= x;
  x ^= y;
  PrintValues(x, y);
 8048704: c7 44 24 04 05 00 00  movl   $0x5,0x4(%esp)
 804870b: 00
 804870c: c7 04 24 03 00 00 00  movl   $0x3,(%esp)
 8048713: e8 54 ff ff ff        call   804866c <_Z11PrintValuesii>
  x = x + y; // Method D
  y = x - y;
  x = x - y;
  PrintValues(x, y);
 8048718: c7 44 24 04 03 00 00  movl   $0x3,0x4(%esp)
 804871f: 00
 8048720: c7 04 24 05 00 00 00  movl   $0x5,(%esp)
 8048727: e8 40 ff ff ff        call   804866c <_Z11PrintValuesii>
  swap(x, y); // Method E
  PrintValues(x, y);
 804872c: c7 44 24 04 05 00 00  movl   $0x5,0x4(%esp)
 8048733: 00
 8048734: c7 04 24 03 00 00 00  movl   $0x3,(%esp)
 804873b: e8 2c ff ff ff        call   804866c <_Z11PrintValuesii>
```
Whoops! Here, all the variables have been completely optimized away! We just push the values 3 and 5 onto the stack when we want to call `PrintValues`, and continually swap their order. In this case, all four swapping methods have the same performance: none of them take any CPU cycles at all, and none of them take any memory at all. Clearly, this is not what people have in mind when they ask about swapping integers. So, I need a way to force the values to be stored somewhere, and an easy way to do that is to conditionally modify their values. I'll make some test that the compiler hopefully can't figure out, and set `x` and `y` to 0 if the test turns out true. The compiler won't know that the result is always false and no variables are modified, so it will need to store their values somewhere just in case we do modify them. To accomplish this, I'll use the following function:
```cpp
bool NontrivialFunction(int x, int y, int base) {
  // I need a way to force x and y to be stored instead of completely optimized
  // away. So, here's the plan: if either of them are not prime, I will return
  // true and set them to zero.
  if (x < base * base && y < base * base) return false;
  if (x % base == 0) return true;
  if (y % base == 0) return true;
  return NontrivialFunction(x, y, base + 1);
}

...

// and this part is added to main() just after defining x and y
if (NontrivialFunction(x, y, 2)) {
  x = 0;
  y = 0;
}
```
`NontrivialFunction` is reminiscent of a recursive primality checker, and because it's recursive, the compiler won't fully inline it (and probably won't try to precompute it). Since `x` and `y` may change depending on the result, the compiler will need to store them somewhere so that it's possible to set them to zero if necessary. So now let's compile again (with `-O2` for optimization again), and look at the result:
```
  int x = 3;
  int y = 5;
  if (NontrivialFunction(x, y, 2)) {
 8048739: c7 44 24 08 02 00 00  movl   $0x2,0x8(%esp)
 8048740: 00
 8048741: c7 44 24 04 05 00 00  movl   $0x5,0x4(%esp)
 8048748: 00
 8048749: c7 04 24 03 00 00 00  movl   $0x3,(%esp)
 8048750: e8 b7 fe ff ff        call   804860c <_Z18NontrivialFunctioniii>
 8048755: 84 c0                 test   %al,%al
 8048757: 74 52                 je     80487ab <main+0x81>
 8048759: 31 db                 xor    %ebx,%ebx
 804875b: 31 f6                 xor    %esi,%esi
 804875d: 31 ff                 xor    %edi,%edi
    x = 0;
    y = 0;
  }
  PrintValues(x, y);
 804875f: 89 74 24 04           mov    %esi,0x4(%esp)
 8048763: 89 1c 24              mov    %ebx,(%esp)
 8048766: e8 5b ff ff ff        call   80486c6 <_Z11PrintValuesii>
  {
    int t = x; // Method A
    x = y;
    y = t;
  }
  PrintValues(x, y);
 804876b: 89 5c 24 04           mov    %ebx,0x4(%esp)
 804876f: 89 34 24              mov    %esi,(%esp)
 8048772: e8 4f ff ff ff        call   80486c6 <_Z11PrintValuesii>
  x ^= y; // Method B
  y ^= x;
 8048777: 31 fb                 xor    %edi,%ebx
  x ^= y;
 8048779: 89 de                 mov    %ebx,%esi
 804877b: 31 fe                 xor    %edi,%esi
  PrintValues(x, y);
 804877d: 89 5c 24 04           mov    %ebx,0x4(%esp)
 8048781: 89 34 24              mov    %esi,(%esp)
 8048784: e8 3d ff ff ff        call   80486c6 <_Z11PrintValuesii>
  x = x + y; // Method D
  y = x - y;
  x = x - y;
  PrintValues(x, y);
 8048789: 89 74 24 04           mov    %esi,0x4(%esp)
 804878d: 89 1c 24              mov    %ebx,(%esp)
 8048790: e8 31 ff ff ff        call   80486c6 <_Z11PrintValuesii>
  std::swap(x, y); // Method E
  PrintValues(x, y);
 8048795: 89 5c 24 04           mov    %ebx,0x4(%esp)
 8048799: 89 34 24              mov    %esi,(%esp)
 804879c: e8 25 ff ff ff        call   80486c6 <_Z11PrintValuesii>
}
 80487a1: 31 c0                 xor    %eax,%eax
 80487a3: 8d 65 f4              lea    -0xc(%ebp),%esp
 80487a6: 5b                    pop    %ebx
 80487a7: 5e                    pop    %esi
 80487a8: 5f                    pop    %edi
 80487a9: 5d                    pop    %ebp
 80487aa: c3                    ret
  int x = 3;
  int y = 5;
  if (NontrivialFunction(x, y, 2)) {
 80487ab: bb 03 00 00 00        mov    $0x3,%ebx
 80487b0: be 05 00 00 00        mov    $0x5,%esi
 80487b5: bf 06 00 00 00        mov    $0x6,%edi
 80487ba: eb a3                 jmp    804875f <main+0x35>
```
This assembly is a bit harder to read, so let's start at the top: We push 2, then 5, then 3 onto the stack and call `NontrivialFunction` (remember that arguments on the stack are in reverse order, so this is really calling `NontrivialFunction(3, 5, 2)`) without yet assigning any of those values to anything you could call `x` or `y`. Once that is done, the result is stored in register `%al`, and we `test` whether it's zero (it looks like we're comparing it to itself, but `test %al,%al` is just the idiomatic way of checking whether `%al` is zero). If it's zero (false), we jump (`je`) down to the bottom (to address 80487ab). Once there, we store the values 3 and 5 into registers `%ebx` and `%esi` (and the value 6, which is the precomputed value of `x ^ y`, into register `%edi`; this will be used in Method B). Then we jump (`jmp`) back up to the first call to `PrintValues`, and continue on. If we had wanted to go through the if statement (i.e. if the result had been nonzero/true), we wouldn't have jumped and instead we'd have gone through the three `xor`'s and zeroed out the values (a number xor'ed with itself is 0). Don't worry about the `pop`'s and `ret` at the end; that's just some clean-up done as the `main` function finishes up, and we can safely ignore it.
So at this point, our trick has worked: `x` is stored in register `%ebx` and `y` is in `%esi`. But now take a look at the rest of the code: for most of it, the swapping code is removed completely, and we just call `PrintValues(x, y)` and then `PrintValues(y, x)` to "swap" them! Most of these swaps take absolutely no CPU time or extra memory. The one exception, however, is Method B, which only got half-optimized away (that first xor is precomputed, but the rest is still there). So in this case, all methods can be completely optimized away by the compiler except for the xor swap, which is slower and unambiguously worse.
"but surely this isn't what real code looks like either," I hear the defenders of xor say. "Surely real code stores the variables in memory instead of directly in registers, since the registers are presumably used for much more than just storing our two piddly little integers." Fair enough; let's take a look! To get the variables stored in memory, we'll need something that makes use of these locations, or in other words something that requires references to the variables and something that the compiler doesn't inline and/or optimize away. The primality tester worked before, so let's just adapt that a little:
```cpp
void NontrivialReferenceFunction(int& x, int& y) {
  // This is really just a fancy no-op, but the compiler doesn't know that.
  if (NontrivialFunction(x, y, 2)) {
    NontrivialReferenceFunction(x, y);
  }
}

...

// In main(), replacing the part about NontrivialFunction that we added last time
NontrivialReferenceFunction(x, y);
```
We re-compile, and re-dump the assembly to get this:
```
  int x = 3;
 8048768: c7 45 fc 03 00 00 00  movl   $0x3,-0x4(%ebp)
  int y = 5;
 804876f: c7 45 f8 05 00 00 00  movl   $0x5,-0x8(%ebp)
  NontrivialReferenceFunction(x, y);
 8048776: 8d 45 f8              lea    -0x8(%ebp),%eax
 8048779: 89 44 24 04           mov    %eax,0x4(%esp)
 804877d: 8d 45 fc              lea    -0x4(%ebp),%eax
 8048780: 89 04 24              mov    %eax,(%esp)
 8048783: e8 de fe ff ff        call   8048666 <_Z27NontrivialReferenceFunctionRiS_>
  PrintValues(x, y);
 8048788: 8b 45 f8              mov    -0x8(%ebp),%eax
 804878b: 89 44 24 04           mov    %eax,0x4(%esp)
 804878f: 8b 45 fc              mov    -0x4(%ebp),%eax
 8048792: 89 04 24              mov    %eax,(%esp)
 8048795: e8 5e ff ff ff        call   80486f8 <_Z11PrintValuesii>
  {
    int t = x; // Method A
 804879a: 8b 45 fc              mov    -0x4(%ebp),%eax
    x = y;
 804879d: 8b 55 f8              mov    -0x8(%ebp),%edx
 80487a0: 89 55 fc              mov    %edx,-0x4(%ebp)
    y = t;
 80487a3: 89 45 f8              mov    %eax,-0x8(%ebp)
  }
  PrintValues(x, y);
 80487a6: 89 44 24 04           mov    %eax,0x4(%esp)
 80487aa: 89 14 24              mov    %edx,(%esp)
 80487ad: e8 46 ff ff ff        call   80486f8 <_Z11PrintValuesii>
  x ^= y; // Method B
 80487b2: 8b 45 f8              mov    -0x8(%ebp),%eax
 80487b5: 8b 55 fc              mov    -0x4(%ebp),%edx
 80487b8: 31 c2                 xor    %eax,%edx
  y ^= x;
 80487ba: 31 d0                 xor    %edx,%eax
 80487bc: 89 45 f8              mov    %eax,-0x8(%ebp)
  x ^= y;
 80487bf: 31 c2                 xor    %eax,%edx
 80487c1: 89 55 fc              mov    %edx,-0x4(%ebp)
  PrintValues(x, y);
 80487c4: 89 44 24 04           mov    %eax,0x4(%esp)
 80487c8: 89 14 24              mov    %edx,(%esp)
 80487cb: e8 28 ff ff ff        call   80486f8 <_Z11PrintValuesii>
  x = x + y; // Method D
 80487d0: 8b 55 f8              mov    -0x8(%ebp),%edx
 80487d3: 8b 45 fc              mov    -0x4(%ebp),%eax
 80487d6: 01 d0                 add    %edx,%eax
  y = x - y;
 80487d8: 89 c1                 mov    %eax,%ecx
 80487da: 29 d1                 sub    %edx,%ecx
 80487dc: 89 4d f8              mov    %ecx,-0x8(%ebp)
  x = x - y;
 80487df: 29 c8                 sub    %ecx,%eax
 80487e1: 89 45 fc              mov    %eax,-0x4(%ebp)
  PrintValues(x, y);
 80487e4: 89 4c 24 04           mov    %ecx,0x4(%esp)
 80487e8: 89 04 24              mov    %eax,(%esp)
 80487eb: e8 08 ff ff ff        call   80486f8 <_Z11PrintValuesii>
  swap(_Tp& __a, _Tp& __b)
  {
    // concept requirements
    __glibcxx_function_requires(_SGIAssignableConcept<_Tp>)
    _Tp __tmp = __a;
 80487f0: 8b 45 fc              mov    -0x4(%ebp),%eax
    __a = __b;
 80487f3: 8b 55 f8              mov    -0x8(%ebp),%edx
 80487f6: 89 55 fc              mov    %edx,-0x4(%ebp)
    __b = __tmp;
 80487f9: 89 45 f8              mov    %eax,-0x8(%ebp)
  std::swap(x, y); // Method E
  PrintValues(x, y);
 80487fc: 89 44 24 04           mov    %eax,0x4(%esp)
 8048800: 89 14 24              mov    %edx,(%esp)
 8048803: e8 f0 fe ff ff        call   80486f8 <_Z11PrintValuesii>
```
Now we're getting somewhere! This time our variables are stored in memory (4 and 8 bytes before the place that `%ebp` points to), and this time they really are loaded into registers, swapped by various means, and then stored back into memory, just like we wanted. Like the previous calls to `std::swap`, the `lea` instruction is used to get the address of (reference to) the variables to pass to `NontrivialReferenceFunction`. We also now have an instance of the `add` instruction, which does just what you'd expect. Other than that, though, no new instructions are used here. This looks like exactly what we wanted, so let's examine it in detail.
Method A loads both variables into registers, and then stores them back in the opposite locations. We don't actually allocate any memory at all for the temporary variable, which has been optimized away yet again. In Method B, we load the variables into registers, xor them, xor them again, store that result in memory (as `y`), xor them a final time, and stick that result back into memory as `x`. Like Method A, this requires two registers and four `mov` instructions (two for loading and two for storing the data). Unlike Method A, Method B requires three extra `xor` instructions to get the same result. Method D is a lot like Method B, in that it requires the same four `mov`'s, but it also requires some extra addition and subtraction work that Method A didn't need. On top of that, it uses a third register, `%ecx`, while the previous two methods only used two registers. Method E is barely recognizable because it's got all these strange pieces of code and variables with underscores in their names. The compiler has inlined `std::swap`, which turns out to be a templated function. Its two arguments have type `_Tp` (for our purposes, `_Tp` is `int`), and are called `__a` and `__b`. If you read the C++ code, it creates a temporary `_Tp` called `__tmp` and basically just does Method A. The assembly does the same four `mov`'s as Method A with nothing extra.
So to sum up, Methods A and E were the same, Method B took more time, and Method D took more time and an extra register to accomplish the same thing. In other words, when we got down to it, the naive swap didn't use a temporary variable, and still ran faster than the xor swap. It turns out the same result happens if you make your variables `volatile` (provided that you remove the call to `NontrivialReferenceFunction`, because a `volatile int` can't bind to its plain `int&` parameters). For kicks, I tried the same thing on a 64-bit x86 Linux architecture with g++ 4.1, and found that everything was optimized away into the alternating calls to `PrintValues(x, y)` and `PrintValues(y, x)`, with no actual swaps, even if I called `NontrivialReferenceFunction`. If I made the variables `volatile`, I could actually get the values to swap, but the result was the same: Methods A and E were by far the best, Method B took the same amount of space but more time, and Method D took extra space and extra time.
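For reference, here's roughly what I mean by the volatile variant — a sketch rather than the exact file I compiled, assuming the same `PrintValues` as before:

```cpp
#include <algorithm>  // std::swap
#include <iostream>

void PrintValues(int x, int y) {
  std::cout << "x is " << x << std::endl;
  std::cout << "y is " << y << std::endl;
}

int main() {
  // volatile forces every read and write of x and y to actually touch memory,
  // so the compiler can't turn the swaps into reordered PrintValues calls.
  volatile int x = 3;
  volatile int y = 5;
  PrintValues(x, y);

  {
    int t = x;  // Method A
    x = y;
    y = t;
  }
  PrintValues(x, y);

  x ^= y;  // Method B
  y ^= x;
  x ^= y;
  PrintValues(x, y);

  std::swap(x, y);  // Method E (the template parameter deduces as volatile int)
  PrintValues(x, y);
}
```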
So now that we've beaten this code to death, I have one final tangent: there is an assembly instruction called XCHG (or EXCH, or something like that, depending on what architecture you're on) which swaps the values of two words. Unfortunately, it's generally a bad idea to use it, because if you use it on a location in memory it implicitly locks the bus (it acts as an atomic operation and a memory fence), which messes up all sorts of pipelining and slows down the code around it. So the only time you should consider using XCHG is if you want to swap two registers, and in that case, it's probably better just to remap the registers in the rest of the program (like in the assembly example I showed for the `NontrivialFunction` version), rather than explicitly swapping them.
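If you're curious what using it would even look like, here's a tiny GCC-style inline assembly sketch of the register-register form (purely illustrative; the function name is mine and I'm not recommending you do this):

```cpp
// Swap two ints with the x86 xchg instruction via GCC extended inline asm.
// The "+r" constraints keep both values in registers, so the implicit
// lock/memory-fence behavior (which only applies to a memory operand) is
// never triggered.
inline void XchgSwap(int& x, int& y) {
  asm("xchgl %0, %1" : "+r"(x), "+r"(y));
}

int main() {
  int a = 3, b = 5;
  XchgSwap(a, b);
  return a;  // returns 5
}
```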
To bring this full circle, here are my answers to the original questions: Methods A and E are the fastest way to swap two integers. Methods A and E are also the fastest way to do it without using any extra memory (even though you write C++ with a temporary variable, it gets optimized away). I can't think of a realistic situation where I'd want to use a different method, and the only unrealistic situation I can come up with is when your compiler can't optimize anything at all and emits assembly like the original version we examined, and you really, truly need those extra couple bytes of stack space. However, my first course of action there would be to get a better compiler. Only after I've found that I can't get a good compiler, I can't improve the compiler I already have, and I can't easily write it in assembly would I consider using Method B. I'm inclined to call Method A the best because, unlike Method E, it explicitly shows you what it's doing and it doesn't require pulling in the entire `<algorithm>` header. However, there are definitely arguments for Method E, for instance that someone who knows what they're doing wrote the standard libraries, and if there's a better way to do it then the standard library will probably be updated before your code. Another reason to pick Method E is that there is a whole system you can implement around `std::swap` that optimizes the swapping of larger objects, though I personally have never yet needed that optimization. But regardless, in this day and age there isn't a good excuse for using Methods B or D, despite any claims by GameDev and others to the contrary (and I strongly suspect that there wasn't any good excuse for the xor swap when that article was written in 2001, either). And under no circumstances should anyone ever use Method C, because it isn't guaranteed to do what you want.
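If you're curious about that swap-customization system, the usual pattern looks roughly like this (a generic sketch; the `BigBuffer` class is just a stand-in, not anything from the code above): give the class a cheap member swap, plus a free `swap` function that unqualified calls can find via argument-dependent lookup.

```cpp
#include <algorithm>  // std::swap
#include <vector>

class BigBuffer {
 public:
  // Cheap member swap: exchanges the internal vectors' bookkeeping rather
  // than copying their contents.
  void swap(BigBuffer& other) {
    data_.swap(other.data_);
  }

 private:
  std::vector<int> data_;
};

// Free function found by argument-dependent lookup, so unqualified swap
// calls on BigBuffer objects pick up the cheap version automatically.
void swap(BigBuffer& a, BigBuffer& b) {
  a.swap(b);
}

int main() {
  BigBuffer a, b;
  using std::swap;  // fall back to std::swap for types without their own
  swap(a, b);       // calls the cheap BigBuffer swap above
}
```

Generic code written as `using std::swap; swap(a, b);` then gets the cheap version for types that provide one and plain `std::swap` for everything else.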
If you've actually gotten this far, thank you for reading my rant.
I didn’t read the whole thing, but the variable swapping I did was always using method A. I didn’t know what B or C would even do at first glance, although it’s possible that syntax is somewhere back there under cobwebs…
But it’s nice to know that the simple method I know is decently good. (c=
But you do have to admit D has the cute property of making you think about int addition as addition in an abelian group to convince yourself that it works in the face of overflow.
And, for no reason other than it’s been a while since I have written a proof…
Let x’ and y’ represent the original values in x and y. After x = x + y, x == x’ + y’.
After y = x - y, y == (x’ + y’) + (-y’) == x’ + (y’ + (-y’)) == x’ + 0 == x’.
After x = x - y, x == (x’ + y’) + (-x’) == (y’ + x’) + (-x’) == y’ + (x’ + (-x’)) == y’ + 0 == y’.
Admittedly, a real proof would have provided a proper semantics for addition, subtraction, and assignment, but I am not that bored. =)
Oh, I totally agree! It’s interesting that both Methods B and D rely solely on operators that define groups over the integers (note that subtraction isn’t abelian, but it’s still a group). I wonder if we can make a swapping mechanism that doesn’t require the semblance of a temporary variable and only uses operators which don’t form groups over the integers? I think we’d be limited to multiplication, division, modulo, bitwise and, bitwise or, and the 3 different shifts (and unary operators like negation and stuff). Any ideas? To make this more interesting, you aren’t allowed to implement the xor swap again by just using lots and lots of NAND gates. :-)
Wait, I’m being an idiot. Subtraction over the integers is not a group because it’s not associative. However, it would almost count as an abelian group if we defined it as addition with a few unary negations thrown in at opportune times.
Seems to me like that’s a lot of effort to double-check that the Compiler Is Smarter Than You, which we already knew.
Also, have you really seen the x ^= y ^= x ^= y in code? There’s no way that’s allowed in the style guide.
I agree that it was a lot of effort, but there are still lots of people out there who don’t yet seem to realize that the compiler is better at compiling than they are.
What is it with you harping on the style guide all the time? There is code written by other companies and organizations that either have different style guides or don’t have one at all, and I’ve seen Method C used in their code.
It’s certainly an interesting piece of analysis; I never realized it wasted space; I just figured it was unreadable, and should be skipped on that account.
Besides, I liked the style guide! There’s a lot of high-density coding wisdom in there.
Also, I kept seeing “Method C” and thinking you’d found some new programming language that I’d never heard of. :-D
> I never realized it wasted space; I just figured it was unreadable
Wait, what? The point of the post is that none of these methods use any extra memory (and only method D uses any extra registers), and that the naive swap is just as space efficient as the xor swap because it doesn’t actually allocate the temporary variable you write in the code. I’ve edited the paragraph that starts with “if you use” to make that clearer.
Well, now you’ve thoroughly shamed me for not reading the post closely enough.
:-) I’ll have to go through it in more detail.
> have you really seen the x ^= y ^= x ^= y in code?
Cases in point: it is (hopefully was?) in the source code for Mozilla’s SHA implementation and BSD Airtools. I wish I could get a real count of how often Google Code Search can find it, but either I can’t escape the regex right or GCS doesn’t actually support all of the POSIX extended regex expressions (in particular, I can’t get \(…\) and \1 to work).
I ran the same test a few years back, and yes, the obvious thing to do is the best. Wheeee!
FYI, I’d recommend you don’t generalize the conclusion this post implies that Method E is as good as Method A. As an example, here’s some C++ pseudocode:
In my experience, versions 2 and 3 of this function are much slower than the typed-out expansions thereof, because g++ isn’t smart enough to vectorize over (even inline) std library calls. Or something like that. This was a few months back on a 64 bit Intel machine running 64 bit linux. This is true even if you apply manual loop unrolling.
Interesting! I hadn’t realized that; thanks for pointing it out.
Method D has undefined behavior, also
Method D also has undefined behavior: signed overflow in C++ is undefined. From the 1998 C++ standard, Chapter 5, paragraph 5: “If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19) in which case the program is ill-formed.”
— mec
Re: Method D has undefined behavior, also
Nice call! I had thought that all integers were supposed to be 2’s complement (in which case it works despite the overflow/underflow), but according to paragraph 3.9.1-7, integral types can also have 1’s complement or signed magnitude representations. Thanks for pointing this out!
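If you wanted a variant of Method D without that particular problem, one option (a sketch of the idea, not something I've benchmarked) is to do the arithmetic on unsigned values, where wraparound is well-defined, and convert back at the end:

```cpp
// Hypothetical tweak to Method D: do the add/subtract on unsigned ints,
// where overflow wraps around mod 2^N and is well-defined, then convert back.
void AddSwap(int& x, int& y) {
  unsigned int ux = static_cast<unsigned int>(x);
  unsigned int uy = static_cast<unsigned int>(y);
  ux = ux + uy;
  uy = ux - uy;  // uy now holds the original ux
  ux = ux - uy;  // ux now holds the original uy
  x = static_cast<int>(ux);  // implementation-defined if the value doesn't fit in int
  y = static_cast<int>(uy);
}
```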
Nice analysis
I did a similar analysis a while back and concluded the same (using a temp var for swapping is better than the others).
Thanks
Padmakumar