2005-04-11

Divided Classes: Having your subclassing and not eating fragility too

It's a maxim of OOP that "inheritance breaks encapsulation". The difficulty is that in order to subclass a class and override some of its methods, you have to make sure that the methods you are overriding are actually being invoked by the other methods that you are not overriding, and that those methods aren't just bypassing you.

The usual solution to this problem amounts to "Forget inheritance; use delegation of one kind or another instead" or else "Document the connections for subclassers and let them hope they can trust you to get it right."

There is, however, a general method for preventing this problem, which consists in dividing your classes into what I will call, unoriginally, divisions. A division of a class is made up of a part of the class's state variables and all the methods that refer to the state variables in that part. (State variables are instance variables, except for ones that immutably refer to an immutable object; a final String variable in Java, for example, is not part of the state.)

In particular, to divide an existing class into divisions, start with any state variable, then include all the methods that refer to it, then include all the state variables referred to by those methods, and so on until there's nothing more to do. That's one division. Then start with any remaining state variable, do the same thing, and so on until there are no more state variables. Any remaining methods are convenience methods, and are put into divisions by themselves. We can ignore private methods in this process, since they aren't visible to subclasses.
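The procedure above can be sketched in Java as a reachability computation. This is only an illustration under stated assumptions: the class name Divisions, the method divide, and the input format (a map from each non-private method name to the state variables it refers to) are all hypothetical, and the sketch seeds from methods rather than from state variables, which yields the same partition.

```java
import java.util.*;

// Hypothetical helper, not from any library: groups methods into divisions.
class Divisions {
    // refs maps each non-private method name to the state variables it refers to.
    // A method with an empty set is a convenience method.
    static List<Set<String>> divide(Map<String, Set<String>> refs) {
        List<Set<String>> divisions = new ArrayList<>();
        Set<String> placed = new HashSet<>();   // methods already assigned to a division
        for (String seed : refs.keySet()) {
            if (placed.contains(seed)) continue;
            Set<String> methods = new TreeSet<>();
            Set<String> vars = new HashSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(seed);
            while (!work.isEmpty()) {
                String m = work.pop();
                if (!methods.add(m)) continue;          // already visited
                for (String v : refs.get(m)) {
                    if (!vars.add(v)) continue;         // variable already expanded
                    // Pull in every method that refers to the same state variable.
                    for (Map.Entry<String, Set<String>> e : refs.entrySet()) {
                        if (e.getValue().contains(v)) work.push(e.getKey());
                    }
                }
            }
            placed.addAll(methods);
            divisions.add(methods);
        }
        return divisions;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = new LinkedHashMap<>();
        refs.put("push", Set.of("items", "size"));
        refs.put("pop", Set.of("items", "size"));
        refs.put("size", Set.of("size"));
        refs.put("setLabel", Set.of("label"));
        refs.put("getLabel", Set.of("label"));
        refs.put("isEmpty", Set.of());          // convenience method: calls size() only
        System.out.println(divide(refs));
        // prints [[pop, push, size], [getLabel, setLabel], [isEmpty]]
    }
}
```

The convenience method isEmpty ends up in a division by itself because it refers to no state variable directly.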

Now the rule is: when subclassing, you must override all the methods in a division or none of them. With all the methods in a division overridden, all the state shared by those methods is irrelevant to the subclass, and the other methods in the superclass don't refer to that state in any way. So encapsulation isn't broken by subclassing in this style.

Furthermore, you must not merge divisions in the subclass. That is, there must be no shared state between an overriding method in one division and an overriding method in another division. That keeps you safe from having one overridden method call another that corrupts its subclass state. You can add state variables to each overriding division, though, because you control everything.
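As a rough illustration of both rules, here is a sketch in Java. The classes Account and CappedLogAccount and all their members are invented for this example. The superclass has two divisions plus one convenience method; the subclass overrides every method of the second division, adds its own state for that division, and leaves the first division alone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical superclass with two divisions and one convenience method.
class Account {
    private long balance;                                // state of division 1
    private final List<String> log = new ArrayList<>();  // state of division 2 (final, but the list is mutable)

    // Division 1: every non-private method that refers to balance.
    public void deposit(long amount) { balance += amount; }
    public long balance() { return balance; }

    // Division 2: every non-private method that refers to log.
    public void record(String entry) { log.add(entry); }
    public List<String> history() { return new ArrayList<>(log); }

    // Convenience method: refers to no state variable directly, so it is a division by itself.
    public void depositAndRecord(long amount) {
        deposit(amount);
        record("deposit " + amount);
    }
}

// Overrides all of division 2 and none of division 1.
class CappedLogAccount extends Account {
    // New state belonging only to the overriding division; it keeps the two most recent entries.
    private final Deque<String> recent = new ArrayDeque<>();

    @Override public void record(String entry) {
        if (recent.size() == 2) recent.removeFirst();
        recent.addLast(entry);
    }

    @Override public List<String> history() { return new ArrayList<>(recent); }
}
```

Because record and history are both overridden, the superclass's log is never touched, and depositAndRecord keeps working without the subclass having to know how it is implemented.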

I can't claim credit for inventing this; it was written up in a paper (seemingly unavailable online) called "Modular Reasoning in the Presence of Subtyping". I have reinvented the terminology as well. If anyone remembers the source, please tell me and I will credit it. Thanks.

6 comments:

John Cowan said...

In fact no, that's not enough. The shared state between the methods in a division is what's critical.

Here's an example. Suppose you want to override HashSet.add with a method that counts the number of adds. As things stand, you don't know whether to override just add or if you also need to override HashSet.addAll. (The first is correct.)

But since addAll is implemented directly in terms of add and doesn't share any state with it (in fact it is a convenience method), add and addAll are in different divisions. Overriding them both and having them both modify shared state in the subclass (namely the counter) would violate the rule against merging divisions.
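A minimal sketch of the counting subclass, overriding only add. The class name CountingSet and the accessor addCalls are made up for this example, and the count for addAll relies on the assumption stated above, that the inherited addAll calls add once per element (as it does in OpenJDK).

```java
import java.util.HashSet;

// Hypothetical subclass that counts calls to add.
class CountingSet<E> extends HashSet<E> {
    private int addCalls;   // subclass state, touched only by add and its accessor

    @Override public boolean add(E e) {
        addCalls++;
        return super.add(e);
    }

    // addAll is deliberately not overridden: it is a convenience method built on add,
    // so counting there as well would count each element twice.
    public int addCalls() { return addCalls; }
}
```

Calling addAll with a three-element collection on a fresh CountingSet should leave addCalls() at 3 under that assumption.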

Anonymous said...

Following are things that struck me on reading this (don't worry, the bruises are healing as I type). They're first blush reactions, so take 'em as such.

Working in terms of divisions gives you a method for safely subclassing, but I don't see that it does all that much to do away with fragility -- after all, the subclass' implementation still depends on the details of the superclass', such that should the divisions in the superclass change, you'll need to change your subclass' implementation.

Pithy (?) summary of the technique: "It's necessary to destroy the encapsulation in order to save it."

In Java (and this of course depends on the implementation), there are potential state variable nexuses (is that a word?) in such overridable methods as equals() and hashCode(), which might mush a division all over the class' innards (and who wants to clean up that mess?)

John Cowan said...

Quite so. A division-aware programming language (or variant of a programming language) would enforce the rules for divisions, so that when you tried to recompile your subclass against the superclass you'd get a compiler error, and if you tried to use the old subclass against the new superclass, you'd get a run-time link exception.

equals() and hashCode() might or might not belong to the same division, depending on the implementation strategy.

farfetched said...

The source is Modular Reasoning in the Presence of Subtyping. (Anybody have an unencumbered copy?)

In a language with multiple inheritance, there's no reason to have a separate abstraction for divisions; just use classes.

This is simple to implement with metaclasses.

John Cowan said...

I appreciate the reference, and I think I had an unencumbered copy some time back, but I lost track of it.

It occurred to me last night that Fortress, with its objects and traits, is designed to resist this problem, since there's no inheritance of instance variables.

John Cowan said...

Unencumbered copy of the paper here.