Tuesday, July 31, 2007
Fumbling the Future
This rant is about what is likely to happen in the next 5 years:
Guessing the Future
1. Virtualization of computers on top of the JVM. Today it runs at up to 80% of native speed, but it will reach 120% or more through optimizations (how, I don't know, but look at past abstraction layers: first they were slower, then they became faster: Curses, Java, etc.). There will be no reason to run an operating system directly on hardware.
2. Multicore: 2007: 4 CPUs, 2011: 16 CPUs, 2020: 256 CPUs. How to use all that raw power? One solution is virtualization, for consolidating servers on one machine. Another solution is multithreading, for making programs run faster; the problem is that typical multithreading is hard, especially if you want to improve the speed by 10x. Typical speed improvements are not even 2x when coding multithreaded code by hand, and the optimizations are error prone. The EJB model lets you write multithreaded code as if it were single threaded, and although EJB was probably not the best implementation, it is a very smart idea, and I bet someone will come up with a way to implement it correctly so that magically all Java programs are multithreaded (see the sketch after this list).
3. Hard drives replaced by pendrives by 2012: Computers will get smaller and consume less energy. Some people argue that this can't happen because the flash technology pendrives are based on does not allow writing the same memory address more than a few million times. That's a very interesting argument, but it is a technological one. I can't know how it is going to be fixed, but it will be fixed for sure, somehow, simply because there is a market impulse in that direction, so there is money to be invested, and smart ideas will get funded. This means computers will get less expensive, more people will buy computers, the software market will become bigger, and therefore the software makers will become richer. Also, since pendrives are not mechanical, the structure of data on disk will no longer follow the B-tree structure: access to contiguous portions of a hard disk meant faster access, but flash pendrives behave like RAM, so the data could be stored in HashMaps, TreeMaps or LinkedHashMaps and it wouldn't matter; access will be at least 10 times faster (see the index sketch after this list). Also, paging memory will become faster, so computers with 64 bits will have practically unlimited memory (compared with today's miserable 2 GB). You laugh now; we will talk in 2012, and we will have computers with virtual RAMs of 32 GB.
4. 32-inch LCDs on every desktop by 2012: Less power consumption and less space mean lower prices, and less eye strain, since LCD monitors do not flicker.
5. WiMax means always connected and always on. IP-Radio, IP-Cells and IP-TV will be used on the road. TV programs will be stored on pendrives, and everyone will have their own TV shows, which means less quality (can we even go any lower? Yes, unfortunately, but at least you will have many options, and you will have to dig for information, which is really good for Google). When will WiMax take off? Maybe 10 years from now, but by then it will be too late to develop the technology. It must be created and perfected 10 years before, as happened with other technologies.
6. Since computers will be so cheap and powerful, fast operating systems will be considered legacy, and secure operating systems (microkernels written in Java, for example) will be the operating systems du jour, as long as they can run Java (and its legacy). Windows and Linux will run on top of Java, and since Java operating systems like BEA's Liquid VM run on bare hardware, goodbye operating systems.
7. ERPs like SAP have taken the market by storm because they can offer an integrated solution for the whole enterprise, but they are not easily configurable. All the 6-month projects to adapt SAP turned into 3-year projects which either eventually delivered or were simply killed. SAP is migrating to Java. There are many open source ERPs. So you will be able to found your own company on cheap hardware and free software, and once everyone can do it, it is not a good business anymore. Companies will have to use cheap hardware and free software, but also invest in differentiation if they want to survive. The copycats will thrive.
8. All software will run on the web, using Ajax technologies, mimicking the way Windows or Mac OS X work. A new standard for web usability will emerge. Which? I wish I knew, but I guess people will prefer the tried and true (the Desktop???)...
9. Operating systems running on top of Java, and Java running on a browser. This of course means operating systems running on top of a browser. There is a web browser called Lobo that is written in Java. What we are still lacking is an OS written fully in Java; then Java will be able to run independently of all the legacy code, while still supporting all of it. Then the hardware manufacturers will be free to improve the CPU design, removing everything that is not needed to run Java. Would an operating system written in Java be significant? For starters, the code must be short and simple; it must be a microkernel and use virtualization to run several operating systems on top (even itself). This way, if a driver fails, that one driver fails and it doesn't take the whole machine down.
10. The app server market is fragmented. If there were a mechanism for executing all the code from the different app servers on just one app server, it would be a real hit. For the moment there are machines like Azul's that have more than 300 cores and can execute different OSes and different app servers on the same machine, integrating several platforms and making everything execute faster, because one application can communicate with the other servers over the internal data bus. It would be a lot faster still if the applications were developed for the Azul hardware in the first place (really meaning all in Java, since Java can run on top of a bare Azul system).
11. Ethernet at 40 Gbit/s and 100 Gbit/s will be standard by 2010.
12. The market for IT (converting manual processes into automated and semi-automated processes) is far deeper than what is actually delivered today. It is only through constant failure that the market still manages to deliver such poor performance. The main problem is that when users and analysts design new systems, there is a lack of developers to implement the new tasks, and they deliver in weeks, months and years instead of delivering in minutes, hours and days. There is a strong market for a need that has not been satisfied.
13. Devices connected through TCP/IP. This means lower time to market and lower design expense, since all devices and drivers will simply use TCP/IP. Besides, there are improvements in the use of the buses, because information can be fragmented, and therefore you don't have to finish one operation to start the next. This means better throughput and more reliable bus communication. It would also mean that you could connect devices to another computer without having to go through the host's CPU.
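Back to point 2, here is a minimal sketch of the EJB idea, with every name invented for illustration: each bean is written as plain single-threaded code, and a tiny container spreads independent calls across all the cores, guaranteeing that no bean instance is ever touched by two threads at once.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical mini-container: beans look single-threaded, the pool
// spreads independent calls across all available cores.
public class MiniContainer {
    static class InvoiceBean {
        double totalWithTax(double amount) { return amount * 1.21; }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < 100; i++) {
            final double amount = i * 10.0;
            pool.execute(new Runnable() {      // container-side plumbing
                public void run() {
                    // one fresh instance per call: no shared state, no locks
                    System.out.println(new InvoiceBean().totalWithTax(amount));
                }
            });
        }
        pool.shutdown();
    }
}

The scaling then comes from having many independent calls in flight, not from hand-written locking; that is the whole trick.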
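And for point 3, a sketch of why the on-disk structure stops mattering (again, all names are made up): a toy store whose index is a plain HashMap from key to file offset, where a spinning disk would have forced a B-tree just to keep related keys physically close.

import java.util.HashMap;
import java.util.Map;

// Toy index for a flash-backed store: with uniform random access there is
// no locality to exploit, so a hash map of key -> offset is enough.
public class FlashIndex {
    private final Map<String, Long> offsets = new HashMap<String, Long>();

    public void put(String key, long offset) { offsets.put(key, offset); }
    public Long find(String key) { return offsets.get(key); } // O(1), no seek

    public static void main(String[] args) {
        FlashIndex index = new FlashIndex();
        index.put("customer:42", 4096L);
        System.out.println(index.find("customer:42")); // prints 4096
    }
}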
Monday, July 30, 2007
Database Algebra
Explanation
x = variable
X = defined name
x -> y = given x find y
{} = empty set
{x} = a set with one element called x
{x,y,z} = a set with x, y and z
[x] = list of x
(x..y) = range from x to y
(x, y) = vector of x and y
x : y = x is of type y
x | y = x or y
s1 U: s2 = union of set s1 and set s2
s1 I: s2 = intersection of set s1 and set s2
x # y = x such that y
x => y = if x is true then y is true
x = (y,z) => x.y = if x is composed of (y,z), you can address x.y
PrimitiveType = ( int, float, date, string, string[n], blob )
Once the notation is defined (or should we call them axioms?), we can define some properties that we can prove to be true.
Algebra
(x..y) -> z == x -> z
(x, y, z) == ((x, y), z)
(x) == x
x -> y == (
Not very impressive, huh?
Well, the devil is in the details, and therefore the paradise is also in the details, if you know how to find and exorcise those devils ;-)
Database
Type =
Fielddef = (name, Type)
Field = (name, value, Fielddef) # (value : fielddef.type)
Tabledef = (
Row =
Rowtx = ( tx, Row ) # tx : int
Rowdata = ( (lowertx..highertx) -> Rowtx ) # ( lowertx : int && highertx : int )
Tabledata = ( Tabledef, pk -> Rowdata ) # pk <=: Tabledef && row.
Table = (Tabledef, Tabledata) # ( Tabledata.pk <=: Tabledef && Tabledata.Tabledef == Tabledef )
Database = (
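To make the definitions above concrete, here is how they transcribe into Java. This is a sketch only; Row is my own guess where the text is incomplete.

import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative transcription of the definitions above; Row is guessed.
public class DatabaseSketch {
    static class Fielddef { String name; String type; }             // (name, Type)
    static class Field { String name; Object value; Fielddef def; } // # value : def.type
    static class Row { List<Field> fields; }                        // guessed from context
    static class Rowtx { int tx; Row row; }                         // (tx, Row)
    static class Rowdata {                                          // (lowertx..highertx) -> Rowtx
        SortedMap<Integer, Rowtx> versions = new TreeMap<Integer, Rowtx>();
    }

    public static void main(String[] args) {
        Rowdata history = new Rowdata();
        history.versions.put(1, new Rowtx()); // the row version starting at tx 1
    }
}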
| Strategy | Write the code. | Test to find defects | Fix the detected defects |
| No functions | 3 days | 1 day | 16 days |
| Use functions | 3 days | 1 day | 3 days |
This example shows what happens with a very small task and very few developers; now let us suppose that we have many developers and very large tasks. What would happen?
First, let us suppose that only developer A does his task and developer B is not required to do so. Do we see a huge impact?
| Strategy | Write the code. | Test to find defects | Fix the detected defects |
| No functions | 0.3 days | 0.13 days | 2 days |
| Use functions | 0.3 days | 0.13 days | 1 day |
We now see that the numbers are not so different for smaller tasks and smaller teams, but when teams grow larger, the number of copy-pastes increases exponentially, since every line that was copied and pasted is now a potential candidate for a new copy and paste.
I think the real reason that Microsoft prefers to copy and paste instead of creating new functions is the added cost of creating a new name. I mean, what can you expect from a company that names their windowing operating system "Windows", their document processor "Word", their web browser "Internet Explorer", their planning program "Project"? They have problems inventing meaningful names, since Bob, PowerPoint and Excel are not really names one could consider remotely related to the functions they perform. (Well, maybe making a power point does make some sense, but that is an exception.)
Monday, July 9, 2007
Predicting how long projects will take...


As you have (I suppose) already imagined, the line is a diagonal forming a triangle against the X and Y axes (when planning it is a diagonal, or at least almost a diagonal; what actually happens in the project, if you could measure points of completion, would look more like a staircase going up and down).
Well, so far it seems deceptively simple.
Scrum masters also draw another line including the new items that appear during the iteration. Those were unplanned items and therefore they have to be written somewhere else.
I was wondering why all those items were drawn below the X axis, since apparently they just pile up waiting for someone to fix them, but I was feeling uncomfortable with it. When the time in the iteration runs out, how much work is left?
Some Scrum books suggest that you project those 2 lines and they will meet at some point, so you will know for sure when the iteration is finished. But they also suggest that several days may go by with no progress, because the originally planned line became flat... So now the lines never meet.
What if people begin solving bugs first? I think this is one of the best strategies for finishing early, yet the WBC encourages working according to the plan and leaving all those pesky bugs for the next iteration...
Alistair Cockburn has a very good example of a burn down chart and its issues.
So I was wondering if we can know for sure how long it will take, without resorting to the "project the lines" misconception. I really think it is a misconception, because all those new issues should be re-estimated and put in their own iteration and burn down chart. At least that was what Alistair was doing in his example, but I still think it is too expensive, because it is an "after the fact" exploratory system. Faced with an a priori estimating method and an a posteriori one, I would certainly prefer the a priori method, because it would render value before the other method does. If the after-the-fact data were important, I could always do another estimation later.
How could we estimate a priori the amount of work left? We can estimate a posteriori each and every item and actually fix them in the next iteration, but that iteration will also have some bugs, and so on.
Let us suppose that we empirically determine that for every iteration we always have half the amount of work left (and for the sake of simplicity, let us suppose it is always half of the previous iteration).
So if we had 256 hours left in the first iteration, after 256 hours of plentiful work we still have 128 hours left just to finish what was left from the first iteration. Agilists will complain that the amount of time in an iteration is fixed while the amount of work is variable during the iteration (we can remove items, not add them; others like to add them), but the problem now is that we finished all the items and new items appeared, because we didn't realize all the border cases in advance, for example. We will call this time to fix bugs the re-iteration, for lack of a better term.
We reschedule and we work those 128 hours, only to find out that there are bugs; we estimate them and it amounts to 64 hours left. So we continue to work, and this goes on and on. This is the 21st-century version of Achilles and the tortoise.
Can we know in advance how long the first iteration and all its re-iterations will take?
sum( n = 0 to infinitum, x^n ) = 1 / (1 - x), valid for |x| < 1
See: http://en.wikipedia.org/wiki/Power_series
So if we always have half the work left: sum( n = 0 to infinitum, (1/2)^n ) = 1 / (1 - 1/2) = 2
That is to say, any iteration will always take double what we estimate. Does it sound familiar?
The same can be calculated if we estimate that we always have a third of the work left: sum( n = 0 to infinitum, (1/3)^n ) = 1 / (1 - 1/3) = 3/2
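A quick sanity check of the closed form in code (a throwaway sketch):

// If each re-iteration leaves a fraction r of the work (0 <= r < 1), the
// total is estimate * sum(r^n) = estimate / (1 - r).
public class ReIterationTotal {
    public static void main(String[] args) {
        double estimate = 256.0, r = 0.5, partial = 0.0;
        for (int n = 0; n < 30; n++) partial += estimate * Math.pow(r, n);
        System.out.println(partial);              // ~511.9999..., converging to
        System.out.println(estimate / (1.0 - r)); // 512.0, double the estimate
    }
}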
I think this is a very important result, because it eliminates the need to make expensive burn down charts, although of course you still need to decompose the project using a work breakdown structure, identify risks, develop prototypes, use iterations, etc. The added advantage is that, given a few iterations, you can gather some statistics and predict accurately how long the project will take.
But remember you need to finish those iterations first in order to gather meaningful statistics, otherwise all that estimation is just wishful thinking.
Friday, July 6, 2007
Rants about patterns
I agree, he is right. The TMP is rather poor, but it is the heart and intent of all object orientation (Encapsulation + Inheritance = Polymorphism). To be fair, Polymorphism comes in 2 flavors in Java:
- Class polymorphism, ie: the template method pattern.
- Interface polymorphism, ie: use interfaces instead of abstract classes.
I have to agree with Alex on this one, but I don't like his proposed solution: Simply use an interface. What about all the code repetition?
It may be a good mechanism, but it is certainly a lot of code for any programmer to write; the TMP is very simple in comparison.
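For comparison, here is a minimal template method pattern, with names invented for the example: the base class fixes the skeleton and subclasses fill in one step.

// The template method render() is final; subclasses only supply body().
abstract class Report {
    public final String render() { return header() + body(); } // the template method
    protected String header() { return "REPORT\n"; }           // overridable default
    protected abstract String body();                          // the hole subclasses fill
}

class SalesReport extends Report {
    protected String body() { return "sales: 42\n"; }
    public static void main(String[] args) { System.out.println(new SalesReport().render()); }
}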
I have another proposed solution: use Traits. Simply stated, a trait is just a class that has methods but no instance variables; some of the methods are abstract and must be redefined somewhere, and the rest of the methods depend on those.
Here is a paper that explains Traits when compared to Interfaces, Mixins and Multiple Inheritance.
Multiple inheritance is something to avoid at all costs. Interfaces are well defined, but they do not share any behavior, as we all know. Mixins are the next best thing, but a mixin just mixes 2 classes creating a new one; it reminds me of templates in C++, also something you want to avoid, because the code looks simple, but what it does under the hood is disgusting.
There are some intentions to create alternative versions of Java that support Traits. I wonder if we really need that. Isn't Java Turing complete? Why should I have to extend Java to implement something so simple as a Trait?
Let us explain a wee little bit. First of all, if you use the TMP, you can share some code in the base class and override the template methods in different subclasses; the only problem, at least in Java, is that you can't extend several classes at once, and therefore there is some code duplication.
For example, let us suppose you have 4 classes: A, B, C and D, and they are defined like this:
class A
{
    public void a() { /* ... */ }
}
class B extends A
{
    public void b() { /* ... */ }
}
class C extends A
{
    public void c() { /* ... */ }
}
class D extends B
{
    public void c() { /* ... */ } // duplicated: the same code as C.c()
    public void d() { /* ... */ }
}
This is the typical diamond problem (as it would appear if defined using multiple inheritance) and its solution in a single-inheritance language. Yes, method c() is repeated in both C and D, but Java doesn't have multiple inheritance, so this is the only solution.
Nevertheless, you hate repeated code, so you read about Mixins and Traits. Mixins keep the source clean, which is a good thing, but the compiled code is a mess, so you study Traits.
Are you still with me?
We need a more realistic example to show how a Trait would work.
class Person
{
    String id;
    String name;
}
class Professor extends Person
{
    List courseList; // List of Course
    void administerTest( Test test, Course course ) { /* ... */ }
}
class Student extends Person
{
    List courseList; // List of Course
    void giveTest( Test test, Course course ) { /* ... */ }
}
class Course
{
    Professor professor;
    List studentList; // List of Student
    List testList;    // List of Test
}
class AssistantProfessor extends Student, Professor // illegal: no multiple inheritance in Java
{}
Since AssistantProfessor can't really extend Student and Professor, we need to either extend Student or Professor and copy and paste the missing methods, ie:
class AssistantProfessor extends Student
{
    List courseList; // List of Courses to teach
    void administerTest( Test test, Course course ) { /* ... */ }
}
or:
class AssistantProfessor extends Professor
{
    List courseList; // List of Courses to study
    void giveTest( Test test, Course course ) { /* ... */ }
}
As you can see, the code is heavily repeated one way or the other.
The same solution using Traits:
abstract class PersonTrait {}
class Person extends PersonTrait
{
    String id;
    String name;
}
abstract class ProfessorTrait extends PersonTrait
{
    abstract List getCourseList(); // List of Course
    void administerTest( Test test, Course course ) { /* ... */ }
}
class Professor extends ProfessorTrait
{
    List courseList; // List of Course
    List getCourseList() { return courseList; }
}
abstract class StudentTrait extends PersonTrait
{
    abstract List getCourseList(); // List of Course
    void giveTest( Test test, Course course ) { /* ... */ }
}
class Student extends StudentTrait
{
    List courseList; // List of Course
    List getCourseList() { return courseList; }
}
You have probably noticed by now that each class exists twice: once as a trait class with no instance variables, and once as a normal class descending from the trait class. The class hierarchy contains only trait classes (which are abstract), and leaf classes descend directly from those trait classes.
Also leaf classes are sometimes identical, like the Professor and the Student classes.
What about the AssistantProfessor?
abstract class AssistantProfessorTrait extends StudentTrait, ProfessorTrait // illegal: can't extend 2 classes
{}
class AssistantProfessor extends AssistantProfessorTrait
{
    List courseList; // List of Course
    List getCourseList() { return courseList; }
}
The main problem with this is that AssistantProfessorTrait can't extend 2 classes. And even if that worked, there would be no way to define AssistantProfessor's getCourseList() so that it satisfies both StudentTrait and ProfessorTrait.
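As an aside: a trait is essentially an interface that carries behavior, something Java could not express in 2007 but later versions can (Java 8 default methods), so the example above becomes the following sketch; it would not have compiled back then.

import java.util.List;

class Course {}
class Test {}

// Traits as interfaces with behavior (legal since Java 8): state stays in
// the implementing class; the shared methods live in the trait itself.
interface ProfessorTrait {
    List<Course> getCourseList();
    default void administerTest(Test test, Course course) { /* shared behavior */ }
}

interface StudentTrait {
    List<Course> getCourseList();
    default void giveTest(Test test, Course course) { /* shared behavior */ }
}

// Multiple interface inheritance is allowed; one getCourseList() satisfies both.
class AssistantProfessor implements StudentTrait, ProfessorTrait {
    private List<Course> courseList;
    public List<Course> getCourseList() { return courseList; }
}

Note that the semantic problem remains: a single courseList now serves both the teaching role and the studying role, which is exactly the conflict described above.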
Wednesday, July 4, 2007
The Manager Role
1. The technical manager.
2. The non-technical manager.
The technical manager is a techie just like you and me, with the same objectives in life: make life easier by automating stuff. He is a technical manager because after so many years he knows the tricks of the trade, and therefore he may direct a small group of developers, teach them, share stories on how to do things, and preserve old technical knowledge that, for better or for worse, would become extinct without him.
The non-technical manager is a different beast. He knows techno-speak, but he doesn't really know the technical details, nor is he interested in them; he just wants to know the consequences of technical decisions, so that he may talk to other technical managers and ask for help or offer help (by offering you), or otherwise talk to technical managers and let them know what he wants. His mission is to know what the market wants and deliver it, if the market is willing to pay the price.
They think differently because their objectives are opposed. While the technical manager wants to deliver more value in exchange for less (those are the forces that drive change in the market), the non-technical manager is after the financial gain of the company. While the technical manager tries to create new frameworks and make them work using fewer resources, the non-technical manager is thinking about ways to force potential clients into promoting their products, for example by forcing them to display banners and the like.
Do we really need non-technical managers? Apparently we do. But each year I see less need for them, since their knowledge seems to be rather sparse and simple, while the technical managers' knowledge seems to get more complex every year. Eventually non-technical managers will be reporting to technical managers, as they do at Google.
Why do companies hire both technical and non-technical managers? Technical managers are the ones who can deliver value. There is a need to dose that value in order for clients to pay. As the technical difficulty increases (each year projects get more complex), no technical leader is able to understand everything, so the role of the non-technical manager becomes less and less necessary: the technical manager is also ignorant when talking to potential clients and has to manage the relationship superficially and with care.
Typically, non-technical managers recommend that technical people, when interviewed by the customer, do not respond to direct questions with yes-or-no, easy-or-hard kinds of answers, but with "delayed-execution" answers like "I will look into it", no matter what. If you think about it a little: suppose some potential customer is asking for a feature, but you just don't know if it can be implemented; you certainly need to make sure that it can be delivered on time and on budget, so you prefer to think about the alternatives and actually try them before saying when and how. That's fine for you, but it puts the customer in a difficult position, because he has already expressed his real requirements and you give him nothing to negotiate with. You simply take that information away for a few days or weeks and come back with the results. He may like the answer or not, but the precious time you take to decide how much it will cost is terrible for him, because you could come up with a price he can't pay.
Now if you look at it from the perspective of the non-technical manager, that is exactly what he wants the customer to think. He wants him to think he has a solution in his hands, but that the product is probably too expensive for him, so he may need to reduce the budget on other things... That's what the non-technical manager wants him to think. Delay, delay, delay, so that you can ask for ridiculously large sums of money. Justifications? Sure, why not: licenses, people hired for the project, hardware, delayed meetings, incorrect specifications, etc.
So in a way the non-technical managers are necessary, for positioning the product in the market, etc.; the only question is who reports to whom.
UML is Brain Damaged
I've been thinking this for 15 years, and now I must let everyone know: I think UML is brain damaged. Sorry, UMLers, but from its inception, even before UML existed as such (in 1995, when it was UML 0.8), I thought it was brain damaged.
For starters, UML is specified... in UML. This means that UML means... whatever UML means, because it is specified in itself. People who know algebra or geometry will be laughing.
UML should be called UMD (Unified Modeling Drawing), really, since it is not a language, neither in the computer language sense nor in the natural language sense.
People who worked with me, who didn't know object orientation, liked UML: the language (or drawing notation) that would allow them to draw diagrams that we C++ coders could then implement as those wonderful designs. Needless to say, their diagrams and ideas had to be redone, usually by just tossing them away, looking at the original problem they were trying to solve, and presenting a really straightforward solution to it. No wonder people who couldn't code liked UML so much. It is the same kind of people who say that failure and success are just opinions.
There are no successful projects delivered using UML, and I guess all those people who can't program are hiding behind huge piles of diagrams stacked on the floor. You can laugh, but UML is an energy drain: there is no way you can prove a UML design right or wrong, and therefore it is waste, because it is open to interpretation. Most successful projects ignore UML completely and avoid it like the plague. This is not coincidence, but a serious decision.
Java was not designed using UML, nor was its class library. Still, there is no compelling UML design for the Java class library, and it has been around for at least 10 years. How come? Is it that nobody can do it, or that it wouldn't be useful?
I have another explanation: Wouldn't it be possible that real engineers know that UML is a scam?
Have you ever seen diagrams of design patterns (Java best practices)? There are at least 24 design patterns with names and example code, but all the UML diagrams made for them look the same. Would you use UML to document the design patterns you used? I bet you wouldn't, because it would be considered waste. But then how can you explain your design if you can't draw it? And if you do draw the patterns, they all look the same anyway, so the diagrams serve no purpose.
UML class diagrams do not show the polymorphism in your programs, and polymorphism is the key to object orientation. No, using separate diagrams for each polymorphic message is not practical. Why is that so? The 3 amigos... no, not those 3, but these 3, who invented UML, were after something else, not after solving real problems. Or maybe they were really incompetent, or a mixture of the two.
UML sequence diagrams do not show polymorphism and break encapsulation. Encapsulation and polymorphism are at the center of object orientation. Why is UML marketed as an object oriented tool if it works against object orientation?
UML use cases are simply scenario-oriented documents, specifying "steps" for the user to use the system, while we know that modern user interfaces (since 1984) are event-oriented, and therefore you can't force the user through any steps, since the user selects the steps he wants to take.
Furthermore, Rational was the company behind UML. Where is Rational now? Why did their people lose so much momentum?
Finally, Rational Rose was the key product marketed by Rational, but the product was obviously bloated (too big), underperforming (too slow), unusable (hard to use) and buggy (a diagram that extended beyond a page was usually one you could not save). Maybe it was because Rational developed Rose using UML.
The fact that most UML architects can't code is a sign that they don't know what they are talking about, yet they have very strong opinions backed by companies that create UML design tools, a whole ecosystem. Architects should be able to code and give recommendations on how to code, pointing to design patterns when necessary. Doing code reviews and writing coding conventions, for example, should take at least 20% of the time spent every day by any serious Java architect.
The UML tools camp disguised itself as the MDA tools camp and obviously joined forces with the dying CASE tools camp of the 80's. For now it seems that their fad has not gained momentum. UML has even been marketed as a BPR tool (Business Process Reengineering); what a nerve!
You know what, it is good if they gain momentum, since it drives all bad programmers into them. Cool ;-)
Disclaimer: UML is still evolving, and maybe in 100 years it will be suitable for capturing requirements and modeling systems. In the meantime, you can read a lot about the things you shouldn't do in case you are forced to use UML.