C++ is bad: problems with the ternary operator
In today’s installment of “why not to program in C++,” I give you the following quiz, which Dustin, Steve, Tom, and I had to figure out today (Dustin’s code was doing the weirdest things, and we eventually traced it down to this):
Suppose you start out with the following code:
    class Argument;
    Argument x;
    void Foo(const Argument& arg);
    bool test;
You can assume that all of these are defined/initialized elsewhere in the code. For each pair of code snippets below, decide whether the two snippets are equivalent to each other.
| # | Code Snippet A | Code Snippet B |
|---|---|---|
| 1 | if (test) Foo(x); else Foo(Argument()); | Foo(test ? x : Argument()); |
| 2 | { /* limit the scope of y */ Argument y; Foo(test ? x : y); } | Foo(test ? x : Argument()); |
| 3 | Foo(x); | Foo(test ? x : x); |
| 4 | Foo(x); | Foo(true ? x : Argument()); |
Edit: what I meant by the curly braces in Question 2 is that you shouldn’t consider “y is now a defined variable” to be a significant difference between the two snippets.
Don’t read further until you think you have the answers.
Have you decided which are the same? Good.
Only the third pair are equivalent. Are you surprised? I certainly was! Here’s what’s going on:
Since Foo() takes an Argument reference, whatever is passed into Foo() must be an actual object for the reference to refer to. The lvalues here (roughly, named objects that can go on the left side of an equals operator) are x and y, but not Argument(), which is a temporary (for instance, you cannot take its address, so the expression &Argument() would not be valid).
When the ternary operator (the ?: syntax) operates on two lvalues of the same type, the result is another lvalue. However, when one of its operands is not an lvalue, the result isn’t one, either: it is a new temporary object. Foo() can still take a reference to that temporary, but when the selected operand is x, the copy constructor is invoked to copy x into the temporary, so Foo() gets a reference to the copy rather than to x itself.
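The rule above can be checked at compile time. This is only a sketch: it assumes a C++11 compiler (for decltype and static_assert), and the empty Argument struct, the global y, and the helpers RefersToX, LvalueCase, TemporaryCase and CheckBoth are stand-ins made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the post's Argument class.
struct Argument {};

Argument x;
Argument y;

// The value category does not depend on the condition, so `true` is used here.
// Two lvalue operands: the expression is an lvalue, and decltype reports Argument&.
static_assert(std::is_same<decltype(true ? x : y), Argument&>::value,
              "lvalue : lvalue should give an lvalue");

// One temporary operand: the expression is a temporary, so decltype reports Argument.
static_assert(std::is_same<decltype(true ? x : Argument()), Argument>::value,
              "lvalue : temporary should give a temporary");

// Reports whether the reference it receives is bound to x itself.
bool RefersToX(const Argument& arg) { return &arg == &x; }

bool LvalueCase(bool test) { return RefersToX(test ? x : y); }
bool TemporaryCase(bool test) { return RefersToX(test ? x : Argument()); }

// Call this from main() to run the runtime checks.
void CheckBoth() {
    assert(LvalueCase(true));      // Foo-style callee sees x itself
    assert(!TemporaryCase(true));  // callee sees a copy of x at another address
}
```

The address comparison is the "location of the object" difference in action: with two lvalue operands the callee sees x, and with a temporary operand it sees a different object.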
So now, justifications for the answers:
- If test is true, code snippet A does not call the copy constructor, while snippet B does (since the ternary operator won’t necessarily return an lvalue, it needs to copy it into a temporary location). If the copy constructor for Argument has side effects, the behavior of the snippets will differ. If the copy constructor does something unusual (for instance, it does not copy a certain member variable, or it resets the value of some internal state in the copy), Foo() will operate on different data in the two snippets (in B, it would operate on the new, uncopied member variable and the reset/reinitialized state, rather than x’s version). Moreover, the location of the object passed into Foo() is different (one is x itself, while the other is a copy of x, stored somewhere else). It’s unlikely that Foo() will change its behavior based on the location of arg, but you never know. Note that if test is false, the copy constructor is not called in either snippet because even though the default constructor does not return an lvalue, it can be stored in an lvalue without using the copy constructor.
- Again, when test is true, the copy constructor is invoked in snippet B but not in A. In snippet A, both operands in the ternary operator are lvalues, so it returns an lvalue, which can be used directly by Foo(), but this is not the case in snippet B, and the copy constructor needs to be invoked. This has the same issues as Problem 1. Moreover, if test is true, only snippet A invokes Argument’s default constructor and destructor (which might have side effects of their own; in an extreme case, the constructor could change the value of test itself so that one snippet passes a newly constructed Argument to Foo while the other passes x or a copy thereof). Edit: also, if Argument is POD, y will be uninitialized in snippet A, so when test is false snippet A will operate on uninitialized data while snippet B will operate on data that has been zeroed out because it used the default constructor due to the parentheses. Just as before, the snippets have the same behavior if (edit: Argument is not POD and) test is false (both snippets call the default constructor, both call the destructor, and neither calls the copy constructor).
- These really are the same. Since both parts of the ternary operator are lvalues, the result is an lvalue, and the copy constructor is not used.
- Again, we have the same problems with the copy constructor being invoked in snippet B. Note that even in an optimized build, the copy constructor is still used! The test at the start of the ternary operator and the code to call the default constructor if the test turned out false are removed, but the copy constructor is still used in case you’re relying on one of the differences mentioned above.
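If you want to see the answers above for yourself, here is a rough sketch that counts constructor calls. The counting Argument class, the Pair* wrappers, and the CopiesMade, DefaultsMade and CheckQuiz helpers are all hypothetical. The exact counts assume GCC or Clang with default settings; before C++17 the standard permits extra copies that these compilers elide.

```cpp
#include <cassert>

// Hypothetical Argument that counts constructor calls.
struct Argument {
    static int defaults;  // default-constructor calls
    static int copies;    // copy-constructor calls
    Argument() { ++defaults; }
    Argument(const Argument&) { ++copies; }
};
int Argument::defaults = 0;
int Argument::copies = 0;

Argument x;

void Foo(const Argument&) {}

// The snippets from the quiz, wrapped in functions.
void Pair1A(bool test) { if (test) Foo(x); else Foo(Argument()); }
void Pair1B(bool test) { Foo(test ? x : Argument()); }
void Pair2A(bool test) { Argument y; Foo(test ? x : y); }
void Pair3B(bool test) { Foo(test ? x : x); }
void Pair4B(bool) { Foo(true ? x : Argument()); }

// Runs a snippet and returns how many copy-constructor calls it made.
int CopiesMade(void (*snippet)(bool), bool test) {
    const int before = Argument::copies;
    snippet(test);
    return Argument::copies - before;
}

// Runs a snippet and returns how many default-constructor calls it made.
int DefaultsMade(void (*snippet)(bool), bool test) {
    const int before = Argument::defaults;
    snippet(test);
    return Argument::defaults - before;
}

// Call this from main() to check the quiz answers.
void CheckQuiz() {
    assert(CopiesMade(Pair1A, true) == 0);    // x is passed directly
    assert(CopiesMade(Pair1B, true) == 1);    // x is copied into a temporary
    assert(CopiesMade(Pair1B, false) == 0);   // temporary is built in place
    assert(CopiesMade(Pair2A, true) == 0);    // two lvalues, no copy
    assert(DefaultsMade(Pair2A, true) == 1);  // but y is still constructed
    assert(CopiesMade(Pair3B, true) == 0);    // really the same as Foo(x)
    assert(CopiesMade(Pair4B, true) == 1);    // constant condition still copies
}
```

A counter is used rather than printing so that the copy constructor’s side effect is observable and the compiler cannot drop it.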
This is yet another way in which C++ can have weird issues that are really hard to debug. If you are a fan of C++, please consider using a different (read: modern, high level) language. Both Java and Python only give you objects by reference, so the copy constructor would not be called in any of the above cases, which, for me at least, adheres more closely to the Principle of Least Surprise. The curmudgeons out there will want me to note that Java and Python do pass-by-value (not pass-by-reference, as you may have misinterpreted from my previous sentence) but the values themselves are references to the data stored in the objects, so they’re passing-by-value the references to the data. And yes, Python doesn’t really have a copy constructor, but that’s beside the point.
I realize that sometimes you need the speed available in C++, but there are a lot of times when it’s OK to be 2-3 times slower, and in those times you should use a language like Java (or Python, if you can stand being a bit slower than that). Remember that my Java runs just as fast on a new computer as your C++ does on a 2-year-old computer. It’s not that big a performance hit.
Edit: See the addendum for another unexpected issue with the ternary operator.
Why not just avoid the ternary operator? It’s already harder to read.
“It’s already harder to read.”
Really? I think it makes things much easier in the right situations. Consider these two snippets, where log_file is a pointer:
    if (log_file) // If the log file exists (it isn't NULL)
        WriteErrorToLog(error_message, info_about_state, log_file);
    else
        WriteErrorToLog(error_message, info_about_state, default_log_file);

versus

    WriteErrorToLog(error_message, info_about_state, (log_file ? log_file : default_log_file));
I personally think the second one is much more readable because it gets rid of the duplicated code.
I still think the first one is more readable; besides, doesn’t the style guide tell you not to use ternary? ;-)
Nope, it’s ternary agnostic. :-)
No, the write answer here is actually a #define or a function that encapsulates that if statement.
write, right. Yeah. Don’t right that code :).
Instead, Wrong it.
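For what it’s worth, the "function that encapsulates that if statement" suggested above might look something like the sketch below. The name OrDefault is made up, and since the thread only says that log_file is a pointer, the pointee type is left as a template parameter. It assumes C++11 for nullptr.

```cpp
#include <cassert>

// Hypothetical helper: returns ptr when it is non-null, otherwise fallback.
// Both operands of ?: are lvalues of pointer type, so nothing beyond a
// pointer is ever copied here.
template <typename T>
T* OrDefault(T* ptr, T* fallback) {
    return ptr ? ptr : fallback;
}

// Call this from main() to exercise the helper with plain ints.
void CheckOrDefault() {
    int primary = 1;
    int backup = 2;
    assert(OrDefault(&primary, &backup) == &primary);
    assert(OrDefault<int>(nullptr, &backup) == &backup);
}
```

With a helper like this, the call from the thread would read WriteErrorToLog(error_message, info_about_state, OrDefault(log_file, default_log_file)).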
Thinking about what the compiler would do with the constructor, I expect you’d get something on the order of
Or something to that nature. I can’t think of what might happen if they are different types, but that seems like it would be asking for trouble.
Also, what is the scope of log_file? If you are going to be writing to the default anyway you could set log_file to default_log_file, and if you wanted to guard some of the writes to not be written if it was the default you could test log_file against default_log_file.
I’m not sure that this qualifies as fodder for a C++ vs other high level languages debate. FWIW, C++ is getting revved in the near future.
I’m just glad 3 is equivalent. If that failed I’d probably have to change careers :)
It’s not the ternary operator. It’s the temporary object!
Like you, I’m a big fan of ?: for its ability to reduce a 4-line if/then/else block to a one-liner. But I think in this case ?: is just complicating the fact that you’re using temporary objects. They’ve confused me in C++ for a while.
What are they? Well, compare the following:
Code 1)
Code 2)
In Code 1, it’s clear where the constructor for object y is called, and its destructor is called when we exit the block of code controlled by the if(). But in Code 2, an instance of Argument is created and destroyed too, but where? I’d guess it’s constructed after the first closing parenthesis on the indicated line, and destroyed after the semi-colon on the same line. But it’s a slippery object. The compiler creates it and destroys it for you, but you can’t really touch it – it doesn’t even have a name.
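The guess about where the temporary lives can be tested with a small sketch. This is not the code from the comment above, which did not survive; the event log and the CallWithTemporary and CheckOrder helpers are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records what happened, in order

// Hypothetical Argument that logs its construction and destruction.
struct Argument {
    Argument() { events.push_back("construct"); }
    ~Argument() { events.push_back("destroy"); }
};

void Foo(const Argument&) { events.push_back("Foo runs"); }

void CallWithTemporary() {
    Foo(Argument());  // the unnamed temporary lives until the end of this statement
    events.push_back("next statement");
}

// Call this from main() to check the order of events.
void CheckOrder() {
    events.clear();
    CallWithTemporary();
    assert(events.size() == 4);
    assert(events[0] == "construct");
    assert(events[1] == "Foo runs");
    assert(events[2] == "destroy");
    assert(events[3] == "next statement");
}
```

So the temporary is constructed before Foo() is entered and destroyed once the whole statement has finished, before the next statement starts.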
It’s even more confusing when a function returns temporary objects. Just go look at this (Q1 & Q2). I still don’t fully understand the line “binding a temporary object to a reference to const on the stack lengthens the lifetime of the temporary to the lifetime of the reference itself”.
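One way to read that quoted sentence is sketched below. Tracker, MakeTracker, BindToConstRef and CheckLifetime are hypothetical names, and the event order assumes the usual copy elision on the return statement (guaranteed since C++17, and done by default in GCC and Clang before that).

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records what happened, in order

// Hypothetical class that logs its construction and destruction.
struct Tracker {
    Tracker() { events.push_back("construct"); }
    ~Tracker() { events.push_back("destroy"); }
};

Tracker MakeTracker() { return Tracker(); }

void BindToConstRef() {
    // The returned temporary is bound to a reference to const, so it stays
    // alive for as long as ref does instead of dying at the end of this line.
    const Tracker& ref = MakeTracker();
    events.push_back("after binding");
    (void)ref;  // silence unused-variable warnings
}  // ref goes out of scope here, and only now is the temporary destroyed

// Call this from main() to check the order of events.
void CheckLifetime() {
    events.clear();
    BindToConstRef();
    assert(events.size() == 3);
    assert(events[0] == "construct");
    assert(events[1] == "after binding");
    assert(events[2] == "destroy");
}
```

Without the reference, as in a bare MakeTracker(); statement, "destroy" would be logged before "after binding".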
To me it feels like the C++ compiler is doing *some* automatic object lifecycle management, but not going all the way. Whenever you only partially implement an idea, there always seem to be gotchas.
Or you can use the D language, which can be used at as low a level as C and always gives you objects by reference, like Python.
It’s merely a compiler issue
What compiler did you use with these code snippets? I’ve just checked the first one with mingw, digital mars and VC++ 9 express, and only the first one works the way you’ve explained.
Re: It’s merely a compiler issue
I’m using GCC. Are you sure you typed up the example correctly? Everything I described is detailed in Section 5.16 of the C++ Standard.