Wednesday, 5 October 2011

5 Little Things I Want from a Build System

I have been thinking hard lately about what the top things are that I would look for in a build environment.

Granted, I have had ample experience from the early days until now, ranging from very prehistoric makefiles and shell scripts up to the more modern equivalents. This includes ant, ant-ivy, maven-1, maven-2, as well as gradle and rake, and quite recently buildr.

I see a lot of people arguing over which is better, but more often than not I think these blogs overlook the vital factors I would look for to bring some professionalism to the build. To put it simply, it is not about who has the biggest, fastest build, the coolest syntax or the easiest extensibility, but about something a little more ... mature, I would say.

  • Dependency management
  • Build task materialization
  • Include QA heuristics in your build
  • Understand enterprise projects
  • IDE integration

Dependency management


Software today faces increasing reuse. Especially with runtime models such as OSGi, it is vital that we modularize into smaller and more dedicated units. Sooner rather than later this produces a lot of dependency graphs that need to be resolved. I don't really care how dependencies are specified; the point is that some software knows and publishes its dependency requirements, so we can, and should, use automated dependency management. Why do this stuff manually? Maven dependencies were the first. Good or bad, they were there, and with the right metadata in your poms they actually give you full dependency management. However, over the years I have become increasingly disappointed with the quality of the provided software (as in the examples here). Full dependency resolution is also emerging in OSGi applications, although there are quite a few catches there as well.

Build task materialization


One thing I have always really admired Maven for (believe it or not) is the ability to materialize everything on demand. Most people see this for the project dependencies, but I find it particularly useful for the build plugins. As soon as you need some functionality in your build, it is automatically downloaded to your box. I have always hated this part of ant. You need to download all the ant extensions manually, or sometimes the entire tools projects, and get them configured through xmlns extensions etc. before use. Needless to say, this makes it harder to get started, unless you start storing all the libraries in some project folder, and hence in source control... With ant I sometimes use the get task to download things I need to configure, but the build scripts don't look nice and slick, and there is a limit to how good it will get. Another very terrifying example is the set of ant tasks that Eclipse provides for Eclipse RCP ant builds (and their very cumbersome recursive ant scripts). Essentially the only way to use them is through the Eclipse AntRunner. So far I have only met two guys who actually understood how Eclipse RCP builds really work when running headless...

Include QA heuristics in your build

This is kind of an offspring of the previous point. I really like having plugin materialization, because I like to include some stuff that will ALWAYS be executed in the build. In particular I want a build cycle that encompasses what you really need in modern builds: code generation, compilation (plus cross-language compilation) and packaging, but also UNIT TESTING, PMD REPORTS, FINDBUGS, CHECKSTYLE etc., all the modern-day QA heuristics, so that they are always integrated into the build. It is easy to put into the build, and then it IS IN THERE, not something that needs a lot of maintenance, extra downloads etc.

Understand enterprise projects

This is basically the evolutionary understanding that any project slightly more advanced than a hello-world application is multi-modular today, and needs to be able to expand when modularisation becomes necessary (not just into a bigger project, but into multiple smaller, less coupled ones). The build needs to handle this without exploding in complexity. Ant doesn't really scale to multi-project builds. Maven has some mechanism for treating this in an independent fashion using modules and parents, but the rudimentary distance between a project and its parent has actually created more problems in my builds than it has ever solved. Maintaining the set of pom files also unfortunately reaches a point after which you just don't want to refactor flexibly anymore.

A project also needs to be a single independent file tree, mostly so that it can be located anywhere without any configuration and still build successfully the first time. This matters especially on a new developer's machine, on your new laptop, on the spare you bring for the trip, or on the continuous integration server and any of its slaves. This is often automatically the case in open source projects because of their distributed nature, but I'm surprised how bad it can still be in certain enterprise companies. Wake up and smell the dawn of the millennium...

IDE Integration

By this I do not mean that you should be able to run or edit build scripts in an IDE. Clearly all IDEs allow you to edit files and execute some external application. No, it is more the very little thing that seems to escape me the most. Given that the build system has a lot of knowledge about your project nature, your dependencies and your intrinsic QA requirements (pmd ruleset etc.), it should be possible to set up a consistent IDE environment from the build. I don't really care whether it is the IDE or the build system that provides the integration (like the difference between using Eclipse's Maven plugin and Maven's eclipse plugin), but the build needs to be understood in a consistent manner. It has always annoyed me that ant would have all these paths set up, but they needed to be duplicated in the IDE because they were not consistently parseable.

In essence these are my requirements. With these I believe that I can tackle the finer hurdles of enterprise software. I need solid dependency management in a complex, multi-project build with location transparency and preferably good IDE integration from a single build solution. That it would be fast, imperative, extensible and easy to maintain is just an added bonus. What do you think?

My latest surprise in this respect has been my encounter with Apache Buildr. Even though I have not done much Ruby, and am not really up for it, I must admit this encounter was a very positive experience. Could this potentially tip me away from recommending Maven in the enterprise?

Tuesday, 18 May 2010

A word from the Trenches

I’m tired of building the same software over and over. Does anybody know of a good data distribution mechanism for tabular data?

Friday, 13 June 2008

One missing language construct

A while ago, when reading about a JavaScript template engine, I encountered a language construct called

forelse

The construct was there to accommodate the case where a for loop runs zero iterations. This is quite handy in web scenarios, for instance when you need to present a list and the list is empty. You would then do something like this:

for (SearchItem item : items) {
    ... (do something with item)
} else {
    ... (do something when there are no items)
}

It's no accident that I show it as Java code. I sometimes come across this scenario, but usually see it wrapped inside an if:

if (!items.isEmpty()) {
    for (SearchItem item : items) {
        ... (do something with item)
    }
} else {
    ... (do something when there are no items)
}

And I have actually begun to find it too verbose :-)

So, +1 for a forelse construction..
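
Until the language grows such a construct, the verbose if/else above can be hidden in a small helper. The following is only a sketch: the class name ForElse and the method forEachOrElse are made up for illustration, and it assumes lambdas (Java 8 or later), which did not exist when this was written.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.function.Consumer;

final class ForElse {
    // Hypothetical helper: runs body once per item, or orElse once when there are no items.
    static <T> void forEachOrElse(Collection<T> items, Consumer<? super T> body, Runnable orElse) {
        if (items.isEmpty()) {
            orElse.run();
            return;
        }
        for (T item : items) {
            body.accept(item);
        }
    }

    public static void main(String[] args) {
        // Non-empty list: the body runs for each item, the else branch is skipped.
        forEachOrElse(Arrays.asList("first", "second"),
                item -> System.out.println("item: " + item),
                () -> System.out.println("no items"));

        // Empty list: only the else branch runs.
        forEachOrElse(Collections.<String>emptyList(),
                item -> System.out.println("item: " + item),
                () -> System.out.println("no items"));
    }
}
```

It is not as nice as real syntax, but it keeps the "empty" case right next to the loop body.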

Monday, 26 May 2008

3 Vital Things to Know about Synchronized

I coincidentally came across code similar to the section below:

private Boolean connected = false;
...
synchronized (connected) {
    if (!connected)
        return;
    // If connected, change to disconnected and proceed, otherwise do nothing
    connected = false;
}


Which gives rise to 3 vital things to know about synchronization:

1) Synchronize on the object, not the reference

This may be a surprise to some, but it is actually the object at the end of a reference you synchronize on, and NOT the access to an instance variable. If you want the latter, you should synchronize on this instead. If you don't trust me on this, try assigning null to a variable and synchronizing on it. It will throw an NPE because there is no object to lock.
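
The null experiment is easy to try. A minimal sketch (the class and method names are invented for the example):

```java
final class NullLockDemo {
    // Returns true when entering the synchronized block fails with an NPE.
    static boolean lockingNullThrows() {
        Object lock = null; // the reference exists, but there is no object behind it
        try {
            synchronized (lock) {
                return false; // never reached: there is no monitor to acquire
            }
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on null lock: " + lockingNullThrows());
    }
}
```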

2) Don't synchronize on global objects, such as Boolean etc.

In this case, the flag holds an autoboxed value, so it always refers to either Boolean.TRUE or Boolean.FALSE. Synchronizing on either of these instances will block all other code that synchronizes on the same instance, even in other classes (if others make the same mistake). Keep the synchronization scope as small as possible and don't use objects that others can actually see.

3) Don't reassign the variable you synchronize on.
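
To see the sharing in action, here is a small sketch with two invented, unrelated classes (Connection and Cache are hypothetical names). It relies on autoboxing going through Boolean.valueOf, which hands out the shared Boolean.TRUE and Boolean.FALSE instances:

```java
final class SharedBooleanLockDemo {
    // Two hypothetical, unrelated classes that each "guard" state with a Boolean field.
    static final class Connection { Boolean connected = false; }
    static final class Cache { Boolean dirty = false; }

    // Both fields were autoboxed from false, so they refer to the same Boolean.FALSE object.
    static boolean fieldsShareOneObject() {
        return new Connection().connected == new Cache().dirty;
    }

    // While we hold the lock on one field, we also hold "the other" lock: it is the same monitor.
    static boolean lockingOneLocksTheOther() {
        Connection connection = new Connection();
        Cache cache = new Cache();
        synchronized (connection.connected) {
            return Thread.holdsLock(cache.dirty);
        }
    }

    public static void main(String[] args) {
        System.out.println("same object: " + fieldsShareOneObject());
        System.out.println("same monitor: " + lockingOneLocksTheOther());
    }
}
```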

By reassigning the value of the reference, the buck actually stops here. No more synchronization. This is because when others come to the synchronized block, they will find a different object from the one that was locked. You see this occasionally with collections:

synchronized (map) {
    map = new HashMap();
}

This locks using one object, then creates a new instance, so the next thread synchronizes on the new map while the first may still be inside the block. Using collections as the synchronization point is often seen, but may be worse than you anticipate (see point 2 above, and this article as well).
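
One common way around all three problems is a private, final lock object that is never reassigned and never visible to anyone else. A minimal sketch, where the Registry class and its methods are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

final class Registry {
    // Dedicated lock: private, final, and used for nothing else.
    private final Object lock = new Object();
    // The guarded state may be replaced freely, because it is not the lock.
    private Map<String, String> map = new HashMap<>();

    void put(String key, String value) {
        synchronized (lock) {
            map.put(key, value);
        }
    }

    void reset() {
        synchronized (lock) {
            map = new HashMap<>(); // safe: every thread still synchronizes on the same lock object
        }
    }

    int size() {
        synchronized (lock) {
            return map.size();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.put("a", "1");
        registry.reset();
        System.out.println("entries after reset: " + registry.size());
    }
}
```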

Friday, 21 March 2008

Tracking the NPE

For some reason you always end up with an NPE when you need it the least.

The problem with an NPE is usually that you only see it WHEN you use the object, although somebody provided the WRONG information earlier on. An NPE always means tracing backwards through the code to find suspects, and that is not always possible.

Imagine a simple attribute on a class (say name). You will see the NPE when it is USED, not when it is ASSIGNED. Yet the problem is the assignment, not the use.

Among the many JVM languages and language proposals, there is occasionally the brilliant idea of null-safe types, but some of them relate mostly to USAGE, not ASSIGNMENT.

I still work mostly in Java, and have to write (and advocate for) defensive assignment styles to avoid this problem (in attributes and parameters).

Some attributes/parameters have a legal right to contain null (there may actually be fewer of these than you think), and so we try to establish a contract for WHEN these CAN contain null. With the rest we reserve the right to fail fast. The earlier the better.

This viewpoint is not really agile. It's pragmatic. And there is no excuse for not having learned anything since the arrival of the Java language.
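
As a sketch of what such a defensive assignment style can look like: the Person class below is invented for the example, and it uses java.util.Objects.requireNonNull, which only arrived with Java 7; a hand-written null check does the same job on older versions.

```java
import java.util.Objects;

final class Person {
    private final String name;     // contract: never null
    private final String nickname; // contract: may be null (no nickname given)

    Person(String name, String nickname) {
        // Fail fast at ASSIGNMENT, so the stack trace points at whoever passed the null.
        this.name = Objects.requireNonNull(name, "name must not be null");
        this.nickname = nickname;
    }

    String name() {
        return name;
    }

    // The one place where null is legal is handled explicitly.
    String nicknameOrName() {
        return nickname != null ? nickname : name;
    }

    public static void main(String[] args) {
        System.out.println(new Person("Ada", null).nicknameOrName());
        try {
            new Person(null, "ghost");
        } catch (NullPointerException e) {
            System.out.println("rejected at construction: " + e.getMessage());
        }
    }
}
```

The NPE still happens, but it now happens in the constructor, where the wrong value entered the object, instead of somewhere far away where it is first used.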

Tuesday, 27 November 2007

JRuby is under heavy development

I have tried to help out with the JRuby project, after Charles Nutter requested help with the 1.1 release, but it really disappointed me at first.

I remember going straight into Jira, and examining the current blockers and criticals to pick a few problems. The two problems I immediately picked were both concerned with race conditions, and I managed to work quite a while to drill down to the actual problem.

Both were about insufficiently guarded, thread-global data structures (caches), something really nasty to put into production. No wonder they were reported as blockers in Jira.

For the first one I provided a patch on 05.11, and for the second one a patch on 07.11, both for trunk and for the 1.0 branch. Since then nothing has been done. I have tried to mail the JRuby team, write on the dev list and talk on ICQ, but nobody has shown any interest in committing the changes.

All development is currently directed towards the compiler, and not towards threading issues.
Charles explained this very reasonably, and he has my deepest respect; there need to be priorities, but I just know that five-nines systems need concurrency eventually.

Anyway, instead of just b*ing about it, I'll see if there is something else to look at.
However, it is not very clear how everything works from just reading the source code. A design document of some kind would be highly appreciated!

Sunday, 11 November 2007

Who needs documentation anyway?

I remember a tutorial once, which explained how to install rails like this:

gem install rails -y --no-rdoc --no-ri --include-dependencies

How come the user, as a novice, does not need the documentation?

Well, the explanation was simple. If you include the documentation, the installation takes a long time to complete.

Why do they think that a new user, who is struggling to learn a topic, does not need documentation?

Seriously, I know documentation can be pretty bad occasionally, or even missing, but this is absolute nonsense, and shows how little people think outside the box.