I still have 2 questions about the design of the SGS App API:
I would like to know why the genericity in the class ManagedReference was not designed like a Collection (i.e. ManagedReference<E extends ManagedObject>),
and why there is a need to specify a class in the functions get() and getForUpdate(). Was it just to be able to provide the wanted type as the return value, or does it help to request the object from the database?
On 1, it's a really fine and subtle point of semantics. While it would seem the obvious thing to do (and something we did in the EA stack), it turns out to be subtly, arguably semantically wrong. If you want the whole argument in depth, you’ll need to get it from Seth or Tim.
On 2, it's so the return type is cast to the correct type, yes. At that stage in the game you have more information, as the object has been retrieved, and the cast becomes more semantically correct.
I ended up writing a wrapper around ManagedReference that could track the type it was referencing. I didn’t like the fact that, because of cut-and-paste errors, I might try to get the wrong type out of a managed reference. The reference itself really didn’t seem to have a clue as to what it was referring to. Using this new class, I can specify its type and even bind it to the DataManager within the constructor.
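import java.io.Serializable;
import java.math.BigInteger;

import com.sun.sgs.app.AppContext;
import com.sun.sgs.app.ManagedObject;
import com.sun.sgs.app.ManagedReference;

// Wraps a ManagedReference and remembers the class of the object it was created from.
public class GenericManagedReference<X extends ManagedObject> implements ManagedReference, Serializable {
    private static final long serialVersionUID = 1L;

    private ManagedReference ref;
    private Class<X> clazz;

    // Binds the object to the data manager and records its runtime class.
    @SuppressWarnings("unchecked")
    public GenericManagedReference(final X obj) {
        ref = AppContext.getDataManager().createReference(obj);
        // Unchecked: getClass() returns the runtime class, which may be a subclass of X.
        clazz = (Class<X>) obj.getClass();
    }

    // Untyped accessors required by the ManagedReference interface.
    public <T> T get(Class<T> arg0) {
        return ref.get(arg0);
    }

    public <T> T getForUpdate(Class<T> arg0) {
        return ref.getForUpdate(arg0);
    }

    // Typed accessors: the stored class is passed along, so callers need no Class argument.
    public X get() {
        return ref.get(clazz);
    }

    public X getForUpdate() {
        return ref.getForUpdate(clazz);
    }

    public BigInteger getId() {
        return ref.getId();
    }
}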
Yeah, I understand what you're saying. Although in my code I’d prefer to have references that are tied to a specific class instead of making everything an Object. I’ve used this GenericManagedReference everywhere in my code and it has helped catch a lot of would-be runtime errors.
I think
GenericManagedReference<Foo> fooRef = new GenericManagedReference<Foo>(new Foo());
.
. Do some stuff here
.
Foo foo = fooRef.get();
// Note that Bar bar = fooRef.get() will not even compile.
is a lot safer and easier to read than
ManagedReference unknownReference = AppContext.getDataManager().createReference(new Foo());
.
. Do some stuff here
.
// Providing the class twice seems odd?
// Seems that
// <X> X ManagedReference.get(Class<X> value)
// Could just be
// <X> X ManagedReference.get();
Foo foo = unknownReference.get(Foo.class);
especially considering that I’ve been known to do this a bit . . .
ManagedReference unknownReference = AppContext.getDataManager().createReference(new Foo());
.
. Do some stuff here
.
// Since I don't know what this is a reference to, a quick typo will give me a runtime error
Bar bar = unknownReference.get(Bar.class);
is not particularly helpful. The result is the same, so it is as if there was no genericity at all in the class.
If there is no really useful genericity in the class, I would prefer that there is at least the one I proposed, since it leaves the user the choice of whether to use it or not.
I am really curious and would like to understand where it is semantically wrong to use the design of the collections here.
I had a talk with Tim. It was actually a fairly late decision in the API re-design and I think I understand it now, but it's a pretty subtle point, so I hope I can explain it clearly.
The issue is this. ManagedReferences are truly typeless, as I mentioned above, until the ManagedObject they reference gets instantiated by the system. In other words, at run-time.
Generics are a 100% compile-time construct. They check type at compile time. For this reason, in a sense, a generically typed ManagedReference would be “lying to you.” Although it might pass the compile-time check, there is no assurance it would not actually fail at run-time, when the ManagedObject returned to you from the get() proved to be of some other type.
So it gives no assurances, but causes extra confusion because it looks like it does.
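To make the "lying to you" point concrete, here is a minimal sketch in plain Java. None of this is SGS code: UntypedStore, ErasedRef and ErasureDemo are hypothetical stand-ins, assuming only that the store hands back plain Objects. Because of type erasure, the cast inside get() checks nothing, so a mismatch only shows up where the caller first uses the value as the promised type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an object store that knows ids but not types.
class UntypedStore {
    private final Map<Long, Object> objects = new HashMap<>();

    void put(long id, Object value) { objects.put(id, value); }

    Object load(long id) { return objects.get(id); }
}

// Hypothetical generically typed reference. T exists only at compile time.
class ErasedRef<T> {
    private final UntypedStore store;
    private final long id;

    ErasedRef(UntypedStore store, long id) {
        this.store = store;
        this.id = id;
    }

    @SuppressWarnings("unchecked")
    T get() {
        // After erasure this cast is a no-op, so it cannot fail here.
        return (T) store.load(id);
    }
}

class ErasureDemo {
    // Declares the reference as ErasedRef<String> regardless of what is stored.
    static String useAsString(UntypedStore store, long id) {
        ErasedRef<String> ref = new ErasedRef<>(store, id);
        try {
            String s = ref.get(); // the compiler-inserted cast to String runs here, in the caller
            return "ok:" + s.length();
        } catch (ClassCastException e) {
            return "ClassCastException at the call site";
        }
    }

    // Assigning to Object needs no cast, so the mismatch goes unnoticed.
    static boolean getAloneSucceeds(UntypedStore store, long id) {
        ErasedRef<String> ref = new ErasedRef<>(store, id);
        Object o = ref.get();
        return o != null;
    }
}
```

If the store holds an Integer under that id, getAloneSucceeds() returns true while useAsString() reports the exception, which is the confusing case described above.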
… but … that’s the same with the ArrayList class, and people are still happy to use generics on them.
Isn’t there a way to guarantee (at least for the user of SGS App) that a ManagedReference really references the type of data specified in its generic type? For example, by adding the genericity to the following function in the interface DataManager:
… and it seems that this function is the only one to create the Managed reference.
Now, how could a ManagedReference reference an object whose type is incompatible with its generic type?
I also want to point out that, with or without genericity in the class, the cast exceptions happen in the same places and the result for the user is the same.
That's simple. In the multi-stack, objects can be created on various servers potentially running various versions of your app (or at least at upgrade time)… And thus one set of code could put one type of object in the store, and then another node of the cluster could retrieve that object using, say, your new code, and the type cast or the code would fail… There is no guarantee in a distributed system, and I think they wanted to imply that in their API…
But, as I said, with or without generics on the class, the class cast exception will come at about the same place (inside the get function for the current version, and at the usage, pretty shortly after the get function, for the version that has the generic on the class).
The array list has to be created with a generic type. That then gates what can go into that list. The result is you are assured (barring certain VERY grody and deliberate reflection tricks) of what is in that list at run time.
By contrast, the Object Store always contains a mix of types, and all a ManagedReference contains is an index number. Furthermore, the ObjectStore itself has no knowledge, even at run-time, of the types of the objects within it until they are instantiated. (I realize that's an implementation detail, but that's how it's implemented today.)
As I said, it's a very subtle point.
Well, that's a detail of the current API and not necessarily true going forward. (In fact, there is already a break in this assumption in the 9.1 release, which adds a way to get the ID and recreate a ManagedReference from that.)
In the case of your ArrayList, the proper operation is assured by the syntax of the language. Here you are depending on an undocumented feature of the API (that that is the only way to get a ManagedReference.)
Yes, but the assumptions of the user are not. When it's not typed, the exception for a mis-matched type is expected; when it is, it leaves you scratching your head asking “how did that happen?”
Right, and that’s part of the problem here. For folks following along at home: if createReference() takes an Object, and calling get() on the reference requires a type as a parameter, then we can explicitly cast the object and throw an exception if the type is wrong. If, on the other hand, createReference() takes a T, and calling get() simply returns something of type T, then there is no cast until you actually invoke the referenced object. This means that you may not see the exception where you expect to.
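For anyone who wants to see the explicit-cast half of that in code, here is a rough sketch. TokenRef and TokenRefDemo are hypothetical, simplified stand-ins and not the actual SGS classes; the sketch only assumes that get() receives a Class token and uses the standard Class.cast() to check the object.

```java
// Hypothetical, simplified stand-in for an untyped reference; not the SGS class.
class TokenRef {
    private final Object target;

    TokenRef(Object target) {
        this.target = target;
    }

    // The Class token lets get() perform a real runtime check.
    <T> T get(Class<T> type) {
        // Class.cast throws ClassCastException right here on a mismatch.
        return type.cast(target);
    }
}

class TokenRefDemo {
    // Reports whether get() itself rejected the requested type.
    static boolean failsInsideGet(TokenRef ref, Class<?> requested) {
        try {
            ref.get(requested);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

With a TokenRef wrapping a String, asking for Integer.class fails inside get(), while asking for String.class or CharSequence.class succeeds, so the exception appears exactly at the retrieval.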
There is an alternative. We could make getReference() take both a T and a Class instance, where the class is the same as the type of T. In my opinion, that’s pretty terrible. It also means that the class information needs to be included with each serialized copy of the object, and needs to be meaningful in all classloaders. Given this, then yes, ManagedReference could safely be templated.
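As a rough illustration of that alternative (a T plus a Class when the reference is created), here is a sketch with made-up names. IdStore and CheckedRef are hypothetical and only assume an id-keyed store of plain Objects; the overwrite() method merely simulates another node putting a different type under the same id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical untyped store keyed by id.
class IdStore {
    private final Map<Long, Object> objects = new HashMap<>();
    private long nextId = 1;

    long add(Object o) {
        long id = nextId++;
        objects.put(id, o);
        return id;
    }

    // Simulates different code replacing the stored object.
    void overwrite(long id, Object o) { objects.put(id, o); }

    Object load(long id) { return objects.get(id); }
}

// Hypothetical typed reference that carries its Class token with it.
final class CheckedRef<T> {
    private final IdStore store;
    private final long id;
    private final Class<T> type; // would have to travel with every serialized copy

    private CheckedRef(IdStore store, long id, Class<T> type) {
        this.store = store;
        this.id = id;
        this.type = type;
    }

    // The caller supplies both the object and its class.
    static <T> CheckedRef<T> create(IdStore store, T obj, Class<T> type) {
        return new CheckedRef<>(store, store.add(obj), type);
    }

    // No Class parameter needed: the stored token does the runtime check inside get().
    T get() {
        return type.cast(store.load(id));
    }

    long id() { return id; }
}
```

The mismatch is still caught inside get(), at the price of the extra argument at creation and of keeping the class information with each reference.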
So, there are several options, none of which (in my opinion) is all that attractive. We could use no templating on ManagedReference and provide a Class on get(); template ManagedReference with no Class parameter, such that calling get() may not actually return the right type; or template ManagedReference and also include a Class parameter when the reference is created. For that matter, ManagedReference could be templated and we could require the Class on get(), but that’s just all kinds of unattractive. After banging our heads against this for a long time, we decided to go with the approach that provides (what we think is) the best tradeoff between safety, performance, debuggability, and ease of coding. No doubt that last point is endlessly debatable.
floersh raises exactly the right issues for why we assume that the datatypes might not match. In this kind of distributed system, if you ever want to upgrade your code, core code, etc., then this case can arise. We want to give developers the tools to handle this as clearly as possible, which includes making it easy to track down exactly where type mis-matches happen.