One thing I have come across continously over the years is the lack of thought developers have about collections. I actually managed to find it again last week in the JRuby source, but it could basically have come from anywhere.
Take a look at this simple method, which I will use as an example:
class RubyModule {
public Map getMethods() {
return this.methods;
}
}
This is a map of methodname and method for a specific ruby module (or ruby class), and as fully dynamic as the language is, methods are added and removed dynamically.Class Responsibility
If you have the map publicly available, callers will most often iterate over the collection themselves to find what is important to them, or indirectly invoke operations on the map. Examples I found included iteration to find a specific method, or to see if a name exists.module.getMethods().containsKey(name)In the above situations (I won't show some of the more complex iterations), it would have been more appropriate if the class was responsible for answering the questions, having methods called:
class RubyModule {
public boolean containsMethodNamed(String name) {..}
public DynamicMethod getMethodNamed(String name) {..}
}
This takes some of the operations on the collection back into the responsible class, so the caller does not need be quite as verbose. Actually, that the caller manually iterates the collection is quite a common OO anti-pattern. Developers may be able to think abstract, however, once they start programming it has to be all too concrete.You might recognize the problem under the name of Law of Demeter.
The worst possible option when exposing a collection is that callers modify the collection directly by adding or removing elements. This, quite fortunately, only occurs seldom. Developers seem to respect the ownership of the collection in these situations.
Race conditions, critical regions and mutual exclusion
In all threaded scenarios, the only problem to worry about is race conditions. A race condition occur when there is more than one thread operating on a shared data structure, and at least one of the threads is performing a write operation. In a collection this is the dreaded ConcurrentModificationException. Usually one thread is adding or removing from the collection, while another thread is currently iterating over the collection.A race condition can actually be removed by eliminating just one of the three conditions (2+ threads, shared data, one writes) . However, if this is not possible, you must find ways to ensure concurrent access.
The period of time a thread is operating on a shared structure in which a race condition could occur is known as the critical regions. If these are mutually exclusive, then there can be no problem. This can be achieved using synchronization
Synchronization
One method for mutual exclusion is to synchronize on an object. The collection itself is mostly used, but it could be any agreed object, really.When a map is publicly exposed, many operations on the map are distributed across the entire codebase, and most often made by many developers, who does not necessarily think race conditions! all the time.
This is another reason why adhering to the Law of Demeter could be very feasible. It concentrates the critical regions to the originating class..
Developers are not sure which data structures are shared nor if they should perform any measures to identify or rectify critical regions. Especially when there is no help in the javadoc. At a minimum the methods exposing a collection should be clearly labelled if any operations on the collection could be critically affected by other threads or not.
Avoiding race conditions
Synchronization is almost impossible when a collection is spread out in the codebase. Synchronization is also time consuming and can be a pain for performance because of too many unnecessary waits.How then can we solve these problems? I hear you cry.
Basically, by just eliminating one of the three conditions of race condition you're home safe. Either remove the threads, don't share or don't write. The first one is mostly impossible, so we'll work around the last two.
Don't share is pretty simple. Instead of returning the original collection, you create a copy collection to return instead. Then the caller can do whatever they want with their own copy.
public Map getMethods() {
synchronized(this.methods) {
return new HashMap(this.methods);
}
}
During the copy in the constructor, there is a critical region, and so I have added the synchronization. This means that the critical region is within my own class, but can never extend outside of the class.The downside of this solution is ofcourse the many new collections, but they are usually shortlived and does usually not fill up the heap. And besides, you still need to do synchronization. So its not really don't share, but more like don't share outside of this class. But it still serves the purpose.
Don't write is also easy to achieve. Just use immutable objects or copy on write. Fairly good thing we have immutable strings, however, this semantics is not often used in collections.
Read only collections in java is a only achieved using a wrapper from java.util.Collections, which basically throws UnsupportedOperationExceptions if add or remove is called. It is about the worst hack to have around right after java.util.Calendar.
No, what you do instead is a copy on write semantics, such as this:
public void putMethod(String name, DynamicMethod method) {
Map newMap = new HashMap(this.methods);
newMap.put(name, method);
this.methods = newMap;
}
This basically solves the problem. When calling getMethods(), you would either get the previous map or the new map with the added method. A client which held onto the old map will only ever see the old methods, but ConcurrentModificationOperation would never occur.Still...!
Even though the copy-on-write often solves the problem, I would still advice against public exposure of any collection. Two thread can still share the same structure, and writes can occur other places. So while we are waiting for read-only collections there is only few options such as using the wrappers of java.util.Colletions if we really wan't to sleep better at night.Off course we could also write our own aggregating class, e.g. MethodMap which could provide a MethodMapIterator, or perhaps just the aggregating functions as operations. However much this idea is true OO, it will never really see the day.
I was reading code that uses the Java Map collection in every class and pretty much find it unreadable and confusing. When I starting looking for more, I found this blog and clicked on of the links that describes the Law of the Paperboy.
ReplyDeleteI really, really, like this example. I think that all code should implement this kind mentality. It makes writing code much easier to write, read, and improve.
Thank you Neil for including this!