C++ is bad: inheritance is counterintuitive

In our latest installment of “why not to program in C++,” let’s take a look at a simple class hierarchy:

class Duck {
 public:
  Duck() {
    InitializeMigratoryRoute();
  }

  // Default behavior is nonmigratory; this is
  // overridden in the Mallard class.
  virtual void InitializeMigratoryRoute();
}

class Mallard : public Duck {
 public:
  Mallard();

  // Migrate south in the fall and north in the spring
  virtual void InitializeMigratoryRoute();
}

class DonaldDuck : public Duck {
 public:
  DonaldDuck() {
    outfit_.AddShirt(Outfit::Shirts::SAILOR);
    outfit_.AddHat(Outfit::Hats::SAILOR);
  }

 private:
  Outfit outfit_;
}

You can assume this code compiles and runs (i.e., the Outfit class is defined, it implements AddShirt(), both versions of InitializeMigratoryRoute() are defined, etc). I would say this looks like perfectly reasonable code: its intention is clear and straightforward, and any sane language should have no trouble implementing this. [1] However, we’re dealing with C++, which means there are always more problems, and in fact this code is not in the least alright. In particular, there are two major things wrong with it. Do you see what they are?

Take your time; try to find where this code goes awry.

 

Do you have some ideas about what’s wrong? Let’s take a look. First off, none of the Mallards will migrate. The first thing that happens in the Mallard constructor is a call to the Duck constructor, to set up the base class. This, in turn, calls InitializeMigratoryRoute(). However, since the Mallard object isn’t created yet, it would be unsafe to call the overridden version (which might depend on some member variables in the Mallard object itself that haven’t been constructed yet). The constructor therefore calls Duck::InitializeMigratoryRoute() instead of Mallard‘s version. It doesn’t matter that InitializeMigratoryRoute() is a virtual function that was overridden in the Mallard class; if you construct the base class before constructing the derived class (which is how derived classes are constructed in C++), calling the derived version may rely on data that isn’t created yet, and so the only safe thing to do is to call the implementation in the base class.

This leads to very nefarious bugs: any pure Ducks you create will function perfectly well, and the Mallard code is written correctly, yet it has the wrong behavior. In other words, the correctly implemented code has the wrong behavior while the incorrect code has the right behavior, and consequently it’s really hard to find this kind of bug.

Incidentally, Java does this the other way around— Mallard::InitializeMigratoryRoute() would be called in the constructor, even though it may very well use data whose value hasn’t been initialized (which is to say, the variables have automatically been set to 0/null/false/etc by the virtual machine, but their intended values haven’t been stored because the code in Mallard‘s constructor hasn’t run yet). I’m not convinced Java’s way of doing things is any better than C++’s way; they both can have unexpected surprises if you’re not on guard. Perhaps we should get in the habit of having initializers that are separate from the constructors, to avoid this issue entirely. Edit: if you use initializers, you can’t have const variables. Getting this whole issue right is a tricky, subtle problem.

So there’s the first thing wrong with that code. Any idea about the second problem? Any C++ programmer worth his or her salt knows about memory leaks. Do you see the memory leak in the above code? It doesn’t matter that there isn’t a new or malloc anywhere to be found; I assure you there’s a memory leak there, just waiting to happen. If you need a hint, take a look at this code:

Duck* temp = new DonaldDuck();
delete temp;

Do you see it? The destructor of Duck is not virtual, which means that when you delete a Duck pointer, you call ~Duck() as the destructor, even if you should be calling ~DonaldDuck(). Consequently, outfit_‘s destructor is never called. If it dynamically allocated its own member variables, that memory will never be deallocated. This can be tricky to notice, since the crux of the problem is that the code you didn’t write and can’t see (Duck‘s destructor) breaks the code you didn’t write and can’t see (the internals of Outfit).

Any class with a virtual function should have a virtual destructor to avoid this issue, and these days g++/GCC even has a -Wnon-virtual-dtor flag that warns you if you have a non-virtual destructor in a class with virtual functions. There is even an argument to be made that all destructors should always be virtual, to proactively avoid this problem whether or not you anticipate subclasses. The only drawback is that it would add a vtable to the classes that don’t have other virtual functions, and it adds an extra layer of indirection when calling the destructor. but unless you’re writing really performance intensive code, the time needed to follow one extra pointer doesn’t seem that onerous. The reason this isn’t considered a good practice is that the benefit gotten is even smaller than the cost incurred—very few classes without virtual functions ever get subclassed. Nevertheless I think it’s interesting to consider this idea, even if we reject it afterwards.

As always in my “C++ is bad” posts, the conclusion here is that even simple things in C++ can have very surprising outcomes, and to really understand what your code is doing requires knowledge of the underlying implementation that the compiler creates. I would much rather use a high-level language where I don’t need to worry about these details. The behavior of your code should be as unsurprising as possible, and I don’t think C++ passes this test.

[1] Sure, a good language shouldn’t need to distinguish between public and private inheritance (inheritance should always be an IsA/public relationship, not a HasA/private relationship, which is really encapsulation), and a good language should also not require the virtual keyword. but I’ll accept those problems as necessary evils of an old language.

Leave a Reply

14 Comments

  1. kitty_tape says:

    The reason this isn’t considered a good practice is that the benefit gotten is even smaller than the cost incurred; very few classes without virtual functions ever get subclassed.

    This reminds me of the funny/sad fact that my tech lead believes in writing every class as if it is going to be subclassed in the future. We are working in Java and every single member variable and non-public function is declared protected. And that’s all I will say for now or this will turn into a rant. =)

    (P.S. I was surprised to figure out both of the problems after not programming in C++ for 9 months.)

    • Alan says:

      I was surprised to figure out both of the problems after not programming in C++ for 9 months.

      Congratulations! The first one bit me at work about a year ago, and I wrote to the c-users mailing list with a friendly reminder not to do it (and received several commisserating replies from fellow victims). The second one is obvious in hindsight, but I didn’t learn about it until about a month ago.

  2. inferno0069 says:

    I do still like const.

    I’ve found myself occasionally thinking that we should perhaps be (to use Java terms) separating classes and interfaces and then requiring that publicly classes only implement interfaces, especially when we’re not allowing multiple inheritance (of classes). Privately, inheriting implementation from other classes would still be useful. I think my use of Java has had an effect on this—single public inheritance can be quite annoying—but so has my interest in Haskell, which already has something kind of like this.

    • Alan says:

      You make a good point about const; I hadn’t thought all the way through my initializer idea, and you’re right that initializers wouldn’t work with const. I have edited my post to reflect this.

      I totally agree that class declarations in Haskell are a slick way to do this sort of thing well. It’s kinda too bad that few other languages do it similarly.

  3. Anonymous says:

    I should say

    Thanks for the post

  4. Apropos of nothing except C++, I used the ?: operator for something useful today. I thought you’d approve. :-D

  5. Anonymous says:

    thank you

    thank you, bro

  6. Anonymous says:

    thank you

    thats it, dude

  7. Anonymous says:

    well done

    thank you, bro

  8. Anonymous says:

    I think you misclassify semantics of the language to defects. Especially for the first semantic there is a very good reason. What would be your counterproposal to the behaviour of C++? Objects in C++ are build from top to bottom in the hierarchy and that is quite intuitive. Try the other way around and you’ll realize how much code is going to break. And of course the semantics in C++ is that overriding in the ctor DOES NOT work. And how could it? Overriding requires calling a function of an object that by that time is not created. You are talking about default initialized primitive types in Java. Is that all an object is consisting of? What about files? what about network connections? This is far more dangerous.

    You argue that this is not the correct behaviour, but this is the case because you are not completely aware of the semantics of the language. The language has to solve pragmatic problems like the one above so it does get complicated in cornercases but then again comes the programmer’s understanding of how things work on the memory. I would be more than happy to hear another solution to that particular problem. I hope you agree though that building the objects bottom-up is not an alternative. Consider this:

    class A {
     public:
      A(const std::string& id)
        id(id) {}
      const std::string& GetID() const { return id; }
    
     private:
      std::string id;
    }
    
    class B: public A {
      B(const std::string& id)
        A(id) { std::cout << this->GetID(); }
    }

    So this is the same problem the other way around. If you construct objects bottom-up this is going to fail. At some point you have to accept a convention.

    As for the destructor, again the semantics in C++ is that it is NOT a special function. It is a function like every other, so in order to be overridden it has to be virtual. The only special thing about it is that it’s called at the destruction of the object.

    Anyway, what I’m trying to say is that C++ is indeed a complex language but all the intricacies are coming from the fact that creating a programming language is an engineering problem and you have to make compromises. Start ranting about it doesn’t help. It’s not a language for the web it has to be more flexible to allow solution for a wide range of problems.

    • Alan says:

      I agree that C++ is working as designed. However, programming languages are tools to make it easier to get computers to compute stuff. If a language’s semantics go against common intuition, I’d say that the language is poorly designed and we should get a new one with better semantics instead. By this metric, C++ is a bad language. Allowing constructors to break the contract imposed by the “virtual” keyword seems counterintuitive. and as far as I can tell, there is never, ever a good reason to have a non-virtual destructor in a class hierarchy, and having them by default seems counterintuitive as well (heck, having a “virtual” keyword in the first place seems unnecessary, which is why more modern languages like Java and Python have done away with it entirely).

      The way objects are constructed in C++ is “correct” only because that’s the way C++ was defined/designed. I argue that this is unintuitive and trips people up. The designers of Java saw that this was a problem, and came up with new semantics to try to work around it, but it still trips people up in different ways. I suppose what we really want is a dependency graph of each statement within the constructor, so that we always initialize variables after everything they depend on has already been initialized (or throw a compiler error if the dependency graph has a cycle in it). To the best of my knowledge, no language already does this; I’m just brainstorming ideas that fit my intuition better than the current options.

      • Alan says:

        Upon further consideration, my idea about the dependency graph can have lots of problems if the constructor has side effects (for instance, if you’ve added debugging information in to various parts of it, or if certain member variables get registered with some global datastore). This is a difficult, subtle problem, and it hasn’t been solved adequately yet.

  9. amateri says:

    Hi, great article!! I got you bookmarked. Thanks and best wishes

  10. Anonymous says:

    Thank you for such a good blog. This is what I looked for.

Leave a Reply to Alan

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>