2007-08-17

Third normal form for classes

It's been wisely and wittily said, though I don't know who by, that a relation is in third normal form (3NF) when all its non-key fields depend on "the key, the whole key, and nothing but the key". This is generally considered to be a Good Thing, though people do deviate from it for the sake of performance (or what they think performance will be -- but that's a whole different rant).

I'd like to introduce an analogous notion of 3NF for classes in object-oriented programming. A class is in 3NF when its public methods depend on the state, the whole state, and nothing but the state. By state here I mean all the private instance variables of the class, without regard to whether they are mutable or not. A public method depends on the state if it either directly refers to an instance variable, or else invokes a private method that depends on the state.

So what do I mean, "the state, the whole state, and nothing but the state"? Three things:

  • If a method doesn't depend on the state at all, it shouldn't be a method of the class. It should be placed in a utility class, or (in C++) outside the class altogether, or at the very least marked as a utility method. It's really a customer of the class, and leaving it out of the class improves encapsulation. (Scott Meyers makes this point in Item 23 of Effective C++, Third Edition.)

  • Furthermore, if the state can be partitioned into two non-overlapping sub-states such that no methods depend on both of them, then the class should be refactored into two classes with separate states. This also improves encapsulation, as the methods in one class can now be changed without regard to the internals of the other class.

  • Finally, if the behavior of a method depends on something outside the state, encapsulation is once again broken, this time from the other direction. Such a method is difficult to test, since you cannot know which parts of which classes it depends on except by close examination.
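To make the three rules concrete, here is a minimal sketch in Python, where underscore-prefixed attributes stand in for private instance variables. Every name in it (format_cents, AuditTrail, Account and their methods) is hypothetical and invented for illustration; it is one possible reading of the rules, not a definitive one.

```python
# Hypothetical example: none of these names come from the post.


# Rule 1 ("the state"): this function uses no instance state at all,
# so it lives outside any class as a plain utility function.
def format_cents(cents):
    sign = "-" if cents < 0 else ""
    dollars, rest = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{rest:02d}"


# Rule 2 ("the whole state"): assume an earlier Account also kept an audit
# log, and that no method ever touched both the log and the balance.
# The two sub-states are therefore split into two classes.
class AuditTrail:
    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)

    def entries(self):
        return tuple(self._entries)


class Account:
    # Rule 3 ("nothing but the state"): the overdraft limit is handed in
    # at construction and kept as state, rather than being read from a
    # global variable or from the internals of some other object.
    def __init__(self, overdraft_limit_cents):
        self._balance_cents = 0
        self._overdraft_limit_cents = overdraft_limit_cents

    def deposit(self, cents):
        self._balance_cents += cents

    def withdraw(self, cents):
        # Uses both the balance and the limit, so the state of this
        # class cannot be partitioned any further.
        if self._balance_cents - cents < -self._overdraft_limit_cents:
            return False
        self._balance_cents -= cents
        return True

    def balance_cents(self):
        return self._balance_cents
```

Because withdraw depends only on what was passed to the constructor and to the method itself, a test can build an Account with a known limit and check the outcome without inspecting any other class.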

At any rate, this is my current understanding. My Celebes Kalossi model grew out of considering how methods and state belong together, and this is the practical fruit of it.

Update: I didn't talk about protected methods, protected instance variables, or subclassing. The subclasses of a class are different from its customers, and need to be considered separately if any protected methods or state exist. I am a firm believer in "design for subclassing or forbid it": if you follow the rules above, then instead of subclassing a class, you can simply replace it with a work-alike that has different state, with no risk of breaking the original. (For that to work, you probably need to make the original class implement some interface that the work-alike can implement too.)
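The work-alike idea might look something like the following sketch, which uses Python's typing.Protocol as the shared interface. The names Counter, SimpleCounter, HistoryCounter, and count_to are hypothetical, chosen only for this illustration.

```python
from typing import Protocol


class Counter(Protocol):
    """Hypothetical interface shared by the original class and its work-alike."""

    def increment(self) -> None: ...

    def value(self) -> int: ...


class SimpleCounter:
    """The 'original' class: its state is a single integer."""

    def __init__(self) -> None:
        self._count = 0

    def increment(self) -> None:
        self._count += 1

    def value(self) -> int:
        return self._count


class HistoryCounter:
    """A work-alike with different state: it keeps every tick, not just a total.

    It does not subclass SimpleCounter, so it never touches that class's state.
    """

    def __init__(self) -> None:
        self._ticks = []

    def increment(self) -> None:
        self._ticks.append(len(self._ticks) + 1)

    def value(self) -> int:
        return len(self._ticks)


def count_to(counter: Counter, n: int) -> int:
    # A customer of the interface: it works with either implementation.
    for _ in range(n):
        counter.increment()
    return counter.value()
```

Calling count_to(SimpleCounter(), 3) and count_to(HistoryCounter(), 3) gives the same answer, although the two classes share no state and no code.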

Furthermore, the static methods of a class have static state, and the same analysis needs to be performed with respect to them.
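As a rough illustration of the static case, the sketch below treats class-level attributes as the static state and classmethods as the static methods. TicketIds and is_valid_id are hypothetical names, not anything from the discussion above.

```python
# Uses no static state at all, so by the first rule it is a plain function
# rather than a static method of the class.
def is_valid_id(candidate):
    return candidate > 0


class TicketIds:
    # Static state: shared by the whole class, not by any one instance.
    _last_issued = 0

    @classmethod
    def issue(cls):
        # Depends on the static state and on nothing outside it.
        cls._last_issued += 1
        return cls._last_issued

    @classmethod
    def last_issued(cls):
        return cls._last_issued
```

Both classmethods read or write _last_issued, so the static state cannot be split, and neither reaches outside it.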

Comments?

7 comments:

Anonymous said...

Are static methods excluded from these rules? There is no state from the perspective of a static method (though I guess there is a static state to the class).

Anonymous said...

I also thought about this. If you are interested, here is my post on the subject.

John Cowan said...

Stand: I've updated the post.

Vadim: I think you are using "state" in a different sense to make a subtly different point, but it may be that we are making the same point and I don't quite follow your way of saying it.

Lars Marius Garshol said...

I like this! I don't know if I'll ever think along these lines when doing a design, but the general rule certainly seems right.

One nit, though: "If a method doesn't belong to the state at all". Shouldn't this be "depend on" rather than "belong to"?

John Cowan said...

Thanks, Lars Marius. Fixed.

Anonymous said...

As I understand it, John refers to state as an abstract set of class data, and Vadim considers how to map each possible combination in this set to a collection of state identifiers, which can be used in state management.
I should think that John really refers not to 'state' but to the class dataset. If we make this substitution, then John's postulates become an obvious observation on OOD.
Vadim's considerations are, however, more interesting to me. Would anyone be able to suggest a design pattern for a mutable class that maps its state and uses this to control its lifecycle?

Raoul Duke said...

re: vadim's blog post, please also see http://www.infoq.com/presentations/Death-by-Accidental-Complexity