Thursday, July 12, 2007

Class diagrams considered harmful

I still have issues with the UML, specifically with the class diagrams.

Class diagrams show implementation details

Class diagrams can be delivered at the very end of a development effort, once everything seems to work, to show the implementation details to the extremely curious. But writing class diagrams while use cases are still being written is a waste of time: it sets in stone what really cannot be decided so early, and it distracts attention from what is important toward what is irrelevant.

First, the important issue in object modeling is how objects behave in RAM; that is, object diagrams matter far more than class diagrams. And no, one cannot be derived from the other. Object diagrams show how objects will interact in RAM, while class diagrams show only implementation details; if the class hierarchy is set up correctly, they cannot show how objects behave in RAM. Conversely, if your class diagrams and object diagrams are almost equivalent, you have a very *fixed* way of interrelating objects in RAM, one that is completely dictated by the class hierarchy. If the objects then need to relate in another way, the class hierarchy must change, which means the class hierarchy will never become stable. So you either err on the side where you cannot change the class hierarchy, or you err on the side where your class hierarchy never seems to stabilize.
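A minimal Python sketch of this distinction (the `Person` class and its `knows` link are hypothetical names, not from the original): a single class admits many different runtime object graphs, so the class diagram alone cannot tell you how objects will be wired in RAM.

```python
class Person:
    """One class; the runtime graph of Person objects is not fixed by it."""
    def __init__(self, name):
        self.name = name
        self.knows = []  # links to other Person objects, decided at runtime

a, b, c = Person("a"), Person("b"), Person("c")

# Object graph 1: a chain, a -> b -> c
a.knows.append(b)
b.knows.append(c)

# Object graph 2 (fresh objects): a star, every spoke points at one hub
hub = Person("hub")
spokes = [Person(str(i)) for i in range(3)]
for s in spokes:
    s.knows.append(hub)

# Nothing in the class diagram distinguishes the chain from the star;
# only an object diagram captures the difference.
```

Both graphs type-check against the same one-class "hierarchy", which is the point: the class diagram under-determines the object diagram.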

ORMs and class diagrams

There is another problem with confusing the object diagram with the class diagram. Suppose you need to store your object model in a database. Would you model the database after your class diagram or after your object diagram? If you think you want to store your classes, think again: you want to store objects, so the object diagram is the one to map to storage.

There are more pervasive defects in the way most modellers approach design in UML. For example, if I have a Person class, having a single Person table may not be the best option. Tables are just object containers, persistent ones, but containers all the same. Imagine that you wanted to keep a new list of persons (say, people who owe you something) but you were allowed only one list of Person in the whole program. Developers would rightly complain, since it makes no sense, and I sincerely hope you see the parallel here.
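The asymmetry can be made concrete with a short sketch (names like `employees` and `debtors` are illustrative, not from the original):

```python
class Person:
    def __init__(self, name):
        self.name = name

# In memory, nobody expects a single global list of Person objects;
# containers are created freely, one per need:
employees = [Person("Ada"), Person("Grace")]
debtors = [Person("Charles")]  # "people who owe you something"

# The conventional ORM mapping, however, produces the persistent
# analogue of exactly one shared list: a single Person table holding
# every Person in the system, whichever "list" each one belongs to.
```

In-memory containers multiply on demand; the one-class-one-table convention forbids the persistent equivalent, which is the author's complaint.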

Then why don't OO developers complain when there is only one persistent Person container?

I suppose the problem has more to do with specialization than anything else.

Specialization in the software development field means that you either know object orientation (encapsulation + inheritance = polymorphism, plus something about object identity) or you know databases (relational algebra, SQL, the ACID properties). It is very hard for developers to know both worlds intimately, because the language is arcane: a term used in one field does not mean exactly the same thing in the other, and every concept comes with baggage.

Even inside each of those fields there is disagreement. For example, SQL practitioners are comfortable with nulls, but relational-model theorists, notably C. J. Date, rejected the idea. Fortunately OO was influenced by Lisp, so the null concept there is pretty straightforward. There is also controversy about whether SQL is a properly designed computer language, since it breaks rules that other programming languages have respected since the start of the discipline, such as letting variables hold the result of any expression. SQL is a very strange language in this respect: you cannot assign the result of a query to a variable (you can manipulate data iteratively using cursors, but that defeats the purpose of SQL, which is to manage data in sets and never row by row).
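The claim that the Lisp-influenced null is straightforward can be seen in Python, where `None` is just an ordinary value with ordinary two-valued comparisons:

```python
# In a Lisp-influenced object language, "null" is just another value.
x = None
assert x is None                # an ordinary identity test
assert (None == None) is True   # the comparison yields a plain boolean

# In SQL, by contrast, NULL = NULL evaluates to UNKNOWN rather than TRUE,
# one of the reasons relational theorists have criticized SQL's nulls.
```

No three-valued logic leaks into the rest of the language, which is the contrast the text is drawing.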

Also in the OO field (so the database field does not take the criticism personally) there are controversies: about object identity (is it really necessary for a language to be defined as object-oriented?), about how inheritance breaks encapsulation, about whether the important part of object orientation is polymorphism, and about how polymorphism could be achieved without inheritance.

Back to the class diagram fiasco

It does not matter whether class A extends class B or vice versa; all those decisions are just implementation details. The important issue is the protocol (the set of methods and their behavior) that each class can handle, which is normally expressed as the "responsibility" of the class. Everything else can be obtained as a logical consequence of the responsibility of the class.
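One way to sketch "responsibility over hierarchy" is Python's structural `typing.Protocol` (the `Drawable` protocol and both classes are hypothetical): the caller depends on a set of methods, not on any inheritance relation.

```python
from typing import Protocol

class Drawable(Protocol):
    """The responsibility: anything offering draw() satisfies the protocol."""
    def draw(self) -> str: ...

# Neither class inherits from the other, nor from Drawable:
class Circle:
    def draw(self) -> str:
        return "circle"

class Report:
    def draw(self) -> str:
        return "report"

def render(item: Drawable) -> str:
    # Cares only about the protocol, never about the class hierarchy.
    return item.draw()
```

Whether `Circle` and `Report` later share a superclass is exactly the kind of implementation detail the text says can be decided last.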

Inheritance in particular is just an implementation detail used to avoid code repetition, but other mechanisms serve the same purpose: automatic code generation, AOP, dynamic proxies, traits, mixins, and so on.
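One of the listed alternatives, a dynamic proxy, can be sketched in a few lines of Python via attribute delegation (the `Logger`/`Service` names are illustrative): `Service` reuses `Logger`'s code without any is-a relationship.

```python
class Logger:
    def log(self, msg):
        return f"log: {msg}"

class Service:
    """Reuses Logger's code by delegation, not by extending it."""
    def __init__(self):
        self._logger = Logger()

    def __getattr__(self, name):
        # Dynamic proxy: forward attributes not found on Service
        # to the wrapped delegate at runtime.
        return getattr(self._logger, name)
```

`Service().log("hello")` works even though `Service` declares no `log` method and no superclass, showing code reuse decoupled from the class hierarchy.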

Setting the class hierarchy in stone is a sure way to bang your head against the wall.
