Loading a binary file

rreyelts · March 15, 2004, 10:32pm

[quote]This sucks compared to the super easy and simple c method of simply casting a chunk of memory to a struct.
[/quote]
If you’re doing that to read/write a file, you’re playing with fire. Totally ignoring the fact that your code won’t run on architectures with different endianness, the in-memory layout of a struct is highly dependent on the compiler you used and the options you use with that compiler.
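A minimal sketch of the portable alternative: decode each field with an explicit byte order instead of relying on the in-memory layout. The record layout (an int id followed by three floats, little-endian, no padding) and the `VertexRecord` name are assumptions made up for illustration, not any real file format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical record layout, assumed for illustration:
// int id, then float x, y, z, all little-endian, no padding.
final class VertexRecord {
    static final int SIZE = 4 + 3 * 4;

    final int id;
    final float x, y, z;

    VertexRecord(int id, float x, float y, float z) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Decodes one record starting at the given byte offset.
    static VertexRecord read(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, SIZE)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        return new VertexRecord(buf.getInt(), buf.getFloat(), buf.getFloat(), buf.getFloat());
    }

    // Encodes one record with the same explicit layout.
    static byte[] write(VertexRecord r) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(r.id).putFloat(r.x).putFloat(r.y).putFloat(r.z);
        return buf.array();
    }
}
```

Because the byte order is stated in the code, the same bytes decode identically on any platform, whatever the native endianness.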

God bless,
-Toby Reyelts

princec · March 16, 2004, 6:30am

This isn’t the problem that most of us are trying to solve in Java right now. Most of us are trying to interface directly with a known memory layout… if that memory layout is a non-native endianness then no amount of cleverness is going to make it particularly easy for the JVM engineers without resorting to lots of JVM hacking and performance loss anyway, but I would still expect the JVM to deal with it and produce correct results by generating code that manipulated non-native-endian mapped fields according to the JLS instead of just getting it wrong.

At the end of the day any solution presented must be representable in the first instance in pure Java bytecode and execute correctly. The JVM should be able to use its special knowledge of the semantics to eliminate memory copies and unnecessary bounds checks. Then it’d do everything C can do, except reliably.

Cas

mthornton · March 16, 2004, 6:53am

[quote]Almost but not quite. There’s no performance gain: multiple bounds checks still, multiple gets and sets with all the associated performance loss
[/quote]
A JVM could recognise a structure like the Dautelle ‘struct’ class and eliminate most of the bounds checks. I think there is a much better chance of a JRE for this succeeding than for a language change.

princec · March 16, 2004, 8:13am

The resistance to language change in Java is over and over producing verbose, nasty code and hacks. How long did it take to get “enums” despite the fact that “you can do them in Java anyway just by typing loads and loads of tedious error-prone code”? Enums are in the same league as structs.

Cas

overnhet · March 16, 2004, 9:03am

AFAIK C# has structs and using unsafe code you can read them directly from a memory block. That kind of feature should help game developers moving from C++ to .NET (and I doubt they will even consider Java as an alternative). >:(

To Cas : I have spent the last few days playing with metadata, reflection and generics to get my own “poor man’s struct” impl. There are still a lot of things to do, and I had to restrict the problem to dumb I/O with Streams and Buffers (no direct mapping to ByteBuffer so far).

There is one thing I wanted to ask about your struct proposal. Suppose I have a simple Vertex3f struct, with x,y,z fields being public : if an instance v is mapped on a ByteBuffer, should v.x = aFloat; modify the underlying ByteBuffer ?

princec · March 16, 2004, 9:14am

Yes, that’s exactly the point of structs - directly modifying data in Buffers by mapping normal Java fields over them.

Cas

overnhet · March 16, 2004, 9:26am

Then I think it is not feasible with current Java, even with metadata : AFAIK there is no mechanism that would let you “catch” direct access to a public field and translate it into a ByteBuffer.get() call.

An alternative would be to enforce the use of getter/setter to access struct fields, but who wants to write a bloated v.getX() when a plain v.x is much more readable ?

blahblahblahh · March 16, 2004, 10:03am

Unless I’m missing something about performance here (and, let me be clear: as far as performance is concerned I personally couldn’t care less for structs: I just want clean OO access to raw data), let’s not worry too much about personal preferences for the avoidance of getters/setters…not being able to use direct variable access is not a major barrier to using structs (nb: assuming intelligent compiler I would personally almost always use methods for structs in preference to var access!)

Although (as I said in the getters/setters thread) I’m getting fed up with the extra typing ;), it is with good reason that the method-based approach is “preferred” (in well-designed and well-implemented projects) for non-local variable access.

swpalmer · March 16, 2004, 10:22am

Save some typing: drop the get/set prefix, as many of the ByteBuffer methods do (e.g. position).
Overloading gets you what you want.

v.x() to get
v.x(Float) to set

Then hopefully the 1.5 metadata stuff can be used to automatically generate the boilerplate code that will redirect these calls to the underlying ByteBuffer.

I’m waiting for someone to come up with an implementation… /me taps fingers impatiently…
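One way to get the x() / x(float) style today without hand-writing the boilerplate is a `java.lang.reflect.Proxy`. This is a swapped-in technique, not the 1.5 metadata-driven generation mentioned above, and it is only a rough sketch: the `Vec3` interface, the `Vec3Proxy` helper and the fixed x/y/z layout are all made up for illustration, and the reflective dispatch is certainly far too slow for the performance cases discussed in this thread.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;

// Hypothetical accessor interface in the x() / x(float) style.
interface Vec3 {
    float x();
    float y();
    float z();
    void x(float v);
    void y(float v);
    void z(float v);
}

final class Vec3Proxy {
    // Assumed layout: x, y, z as consecutive 4-byte floats.
    private static int offsetOf(String name) {
        switch (name) {
            case "x": return 0;
            case "y": return 4;
            case "z": return 8;
            // Anything else (toString, hashCode, ...) is not handled in this sketch.
            default: throw new UnsupportedOperationException(name);
        }
    }

    // Returns a Vec3 whose calls are redirected to absolute gets/puts on buf.
    static Vec3 over(ByteBuffer buf) {
        InvocationHandler handler = (proxy, method, args) -> {
            int off = offsetOf(method.getName());
            if (args == null) {                  // no arguments: getter
                return buf.getFloat(off);
            }
            buf.putFloat(off, (Float) args[0]);  // one argument: setter
            return null;
        };
        return (Vec3) Proxy.newProxyInstance(
                Vec3.class.getClassLoader(), new Class<?>[] { Vec3.class }, handler);
    }
}
```

Usage would look like `Vec3 v = Vec3Proxy.over(buf); v.y(2.5f);`, after which `buf.getFloat(4)` holds the new value.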

princec · March 16, 2004, 10:32am

By all means make the fields private and put getters and setters all over them if you need to. The issue is with internal operations on the data, not external access to data, eg. simple vector class:


public struct Vector3f implements Vector3fReader, Vector3fWriter {
    private float x, y, z;
    public float length() {
        return (float) Math.sqrt(x*x + y*y + z*z);
    }
    public float getX() { return x; }
    public float getY() { return y; }
    public float getZ() { return z; }
    public void setX(float x) { this.x = x; }
    public void setY(float y) { this.y = y; }
    public void setZ(float z) { this.z = z; }
}

as opposed to this awful travesty:


import java.nio.ByteBuffer;

public class Vector3f implements Vector3fReader, Vector3fWriter {
    private int offset;
    private final ByteBuffer buf;
    private static final int X_OFFSET = 0;
    private static final int Y_OFFSET = 4;
    private static final int Z_OFFSET = 8;
    private static final int SIZE = 12;
    public Vector3f(ByteBuffer buf) {
        this.buf = buf;
    }
    public void setOffset(int offset) {
        // Absolute gets/puts are checked against the limit, so the last valid offset is limit - SIZE.
        if (offset < 0 || offset > buf.limit() - SIZE) {
            throw new IndexOutOfBoundsException();
        }
        this.offset = offset;
    }
    public float length() {
        float x = getX(), y = getY(), z = getZ();
        return (float) Math.sqrt(x*x + y*y + z*z);
    }
    public float getX() { return buf.getFloat(offset + X_OFFSET); }
    public float getY() { return buf.getFloat(offset + Y_OFFSET); }
    public float getZ() { return buf.getFloat(offset + Z_OFFSET); }
    // ByteBuffer has no setFloat; the absolute write is putFloat(index, value).
    public void setX(float x) { buf.putFloat(offset + X_OFFSET, x); }
    public void setY(float y) { buf.putFloat(offset + Y_OFFSET, y); }
    public void setZ(float z) { buf.putFloat(offset + Z_OFFSET, z); }
}

Cas

overnhet · March 16, 2004, 10:47am

Yep, direct access to struct fields is purely a matter of taste. I think that (some) structs are “lightweight” classes used as a mere means of data storage and won’t benefit much from clean encapsulation.

Now if get/set (or swpalmer’s way) are an option, it should be feasible. Has anyone ever used a bytecode generation lib ?

swpalmer · March 16, 2004, 11:13am

PrinceC it is only an awful travesty because you actually made the mistake of looking at the code :). If this is auto-generated code from metadata info etc., then don’t look at it. Pay no attention to the man behind the curtain. The public API is decent and that’s all you need to care about - so long as the performance is acceptable.

As a first go I would even ditch the offset and start with this:


public class Vector3f implements Vector3fReader, Vector3fWriter {
    private final ByteBuffer buf;
    public Vector3f(ByteBuffer buf) {
        this.buf = buf;
    }
    public float x() { return buf.getFloat(0); }
    public float y() { return buf.getFloat(4); }
    public float z() { return buf.getFloat(8); }
    // Absolute writes take (index, value); ByteBuffer has putFloat, not setFloat.
    public void x(float x) { buf.putFloat(0, x); }
    public void y(float y) { buf.putFloat(4, y); }
    public void z(float z) { buf.putFloat(8, z); }
}

But, you want to be able to read a huge number of Vector3f from a single chunk of memory… that’s why you added the offset, right? But why worry about constructing new Vector3fs to deal with this? They are small and likely short lived in this context.
I would profile first… then add the offset idea if it was required.

princec · March 16, 2004, 11:57am

If the man behind the curtain is constantly doing 3x as much work as the C++ man then sadly certain operations are going to be a lot slower than in C++, and this particular kind of operation is of especial significance to people performing intensive geometry processing operations.

Creating tons of little objects is sadly still a killer in a rendering loop, and the memory is better used for something else.

Consider my age-old BSP conundrum. You have a BSP file, containing data for vertices, triangles, nodes, etc. If you tried to represent this in Java as an actual graph of Vector3fs and so on you’d end up with 50mb of object header bloat before you even got round to storing the data. This is another problem that the sliding struct solves really neatly.
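A minimal sketch of what “sliding” means here, written in plain Java rather than the proposed struct syntax: a single view object is repositioned over packed records instead of allocating one object per vertex. The `SlidingVec3` name and the 12-byte xyz layout are assumptions for illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical sliding view: one object repositioned over packed xyz floats.
final class SlidingVec3 {
    static final int SIZE = 12; // three 4-byte floats, no padding (assumed)

    private final ByteBuffer buf;
    private int offset;

    SlidingVec3(ByteBuffer buf) {
        this.buf = buf;
    }

    // Points the view at the index-th record; returns this for chaining.
    SlidingVec3 slideTo(int index) {
        if (index < 0 || index >= buf.limit() / SIZE) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        offset = index * SIZE;
        return this;
    }

    float x() { return buf.getFloat(offset); }
    float y() { return buf.getFloat(offset + 4); }
    float z() { return buf.getFloat(offset + 8); }

    float length() {
        float x = x(), y = y(), z = z();
        return (float) Math.sqrt(x * x + y * y + z * z);
    }

    // Walks every record in the buffer with a single view object.
    static float totalLength(ByteBuffer vertices) {
        SlidingVec3 v = new SlidingVec3(vertices);
        int count = vertices.limit() / SIZE;
        float total = 0f;
        for (int i = 0; i < count; i++) {
            total += v.slideTo(i).length();
        }
        return total;
    }
}
```

The loop in `totalLength` allocates exactly one object regardless of how many vertices the buffer holds, which is the memory argument being made; each accessor still pays for its own bounds check, which is the performance complaint.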

Cas

blahblahblahh · March 16, 2004, 12:17pm

I think it would help get attention if the different advantages of structs were better categorized, and if it were made clear that they relate to completely separate abstract use-cases.

e.g., Structs:

1. Provide OO access to “raw” data from an external source (usually either a network-protocol or a file-format)
2. Provide higher-speed access to raw data that is large data structures with many small fixed-size data structures inside them

Sliding Structs:
3. Significantly reduce memory requirements and increase speed for apps that have a huge number of very small objects that cannot effectively be represented as arrays
4. Enable portions of the OO universe to be constrained to a sequential portion of native memory so that the application can manually dump and restore them as needed.

NB: I’ve never looked at using mem-mapped BB’s for use-case 4; the last time I was doing that was pre-1.4.x (c.f. below).

I’ve no idea if these are good categorizations/use-cases, but the current descriptions tend to be different depending upon who you ask, with lots of “Oh, and BTW there’s also another good reason”, instead of a clear, easy-to-read overview.

I’ve faced the same problem when dealing with massive parse-trees / AST’s (of the order of 10^6 - 10^7 (or more) nodes), in the days before BB’s existed, and structs would have been a great help (in the end I just borrowed RAM and did partial-evaluations instead). In this example, you also want to “checkpoint” frequently (losing the partial results of calculations that have generated many millions of nodes is not something you want to do!) - and BB-contained sliding-structs (IIRC your definition of the “sliding” struct…) provide a very convenient way of doing this: (temporarily) I don’t care about the fileformat, just let me do a straight-through dump-to-disk at maximum speed ;D. If the system crashes, I at least know I can get the data back…

Indeed: it is “another problem”.

I’m not saying it’s not a worthy problem to solve, but in the current state of things I think the structs issue comes across in a very confused manner to people who don’t already know all the advantages.

Describing it as separate issues may also make it easier for Sun to evaluate in the light of other activities - e.g. if they are separately spending considerable effort elsewhere trying to make objects have a smaller memory footprint then some of the use-cases may already be improved from that different direction.

mthornton · March 16, 2004, 12:31pm

The archetypal case for this is probably arrays of Complex values. In my opinion this case should not be dealt with via structs, but rather by one of the immutable object proposals.
The structs for external communication and the need for efficient classes for things like Complex are two separate issues that do not deserve a common solution.

princec · March 16, 2004, 1:51pm

Quite. Structs aren’t meant to be used as lightweight objects, as they must be backed by a byte buffer.

Cas

blahblahblahh · March 16, 2004, 1:59pm

[quote]Quite. Structs aren’t meant to be used as lightweight objects, as they must be backed by a byte buffer.

Cas
[/quote]

Then your BSP example is pointless.
As I mentioned, in the AST use-case there is a definite advantage in being backed by a BB

mthornton · March 16, 2004, 3:05pm

[quote]I don’t care about the fileformat, just let me do a straight-through dump-to-disk at maximum speed ;D. If the system crashes, I at least know I can get the data back…
[/quote]
Even before the advent of nio, I found dumping data to a file in a binary format was often limited by the disk speed. Depending on the data, piping the stream through a gzip compression would sometimes improve matters further.
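For reference, a minimal sketch of piping a binary dump through gzip with the plain java.io streams. The `GzipDump` class and the length-prefixed float format are assumptions for illustration; whether compression actually helps depends on the data and the disk, so measure before relying on it.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDump {
    // Writes a length-prefixed float array through a gzip stream.
    // Closing the outer stream finishes the gzip trailer and closes the sink.
    static void dump(float[] data, OutputStream sink) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(sink))) {
            out.writeInt(data.length);
            for (float f : data) {
                out.writeFloat(f);
            }
        }
    }

    // Reads back what dump() wrote.
    static float[] restore(InputStream source) throws IOException {
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(source))) {
            float[] data = new float[in.readInt()];
            for (int i = 0; i < data.length; i++) {
                data[i] = in.readFloat();
            }
            return data;
        }
    }
}
```

To write to disk, pass a `FileOutputStream` as the sink. `DataOutputStream` always writes big-endian, so the dump is portable across platforms.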

Although in the past I have used ‘structs’ memory mapped in C++, in most cases I now think this is a mistake. For all but the most trivial objects (and objects which must be guaranteed to remain trivial), the lost capability relative to ‘proper’ objects eventually becomes a problem.

princec · March 16, 2004, 3:06pm

Why is the BSP example pointless? I’d simply map my BSP file directly from disk into memory and then when I needed to walk it I’d just have a sliding Node struct to follow the tree, and sliding Vector3fs to find coordinates in it, etc.
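A rough sketch of the mapping step with what nio already offers. The file layout here (nothing but packed little-endian xyz floats) and the `MappedVertices` name are assumptions for illustration; a real BSP file has headers and several lumps, none of which are modelled.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedVertices {
    static final int VERTEX_SIZE = 12; // assumed: x, y, z floats, little-endian

    // Maps the whole file read-only; the mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            return buf;
        }
    }

    // Reads the x coordinate of the index-th vertex straight from the mapping.
    static float xAt(MappedByteBuffer buf, int index) {
        return buf.getFloat(index * VERTEX_SIZE);
    }
}
```

Nothing is copied into Java objects here: `xAt` reads directly from the mapped region, and a sliding view would simply wrap the returned buffer.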

Cas

rreyelts · March 16, 2004, 3:16pm

public class Vector3f implements Vector3fReader, Vector3fWriter {
private int offset;
private final ByteBuffer buf;
private static final int X_OFFSET = 0;
private static final int Y_OFFSET = 4;
private static final int Z_OFFSET = 8;
private static final int SIZE = 12;
// ...

Holy crap. That code is so close in style and structure to my own that somebody would swear that you copied it from me or vice-versa. (A good argument against these ridiculous software patents). The largest difference is that I’ve got some more complicated indexing to do in some places, like:

public Link linkAt( Link link, int index ) {
  long nodeLinkPtr = getPtr( network.nodes, offset + LINK_OFFSET );
  int nodeLinkStartIndex = ( int ) ( nodeLinkPtr - network.nodeLinksAddress );
  assert( nodeLinkStartIndex >= 0 );
  int nodeLinkIndex = nodeLinkStartIndex + index * SIZEOF_INT;
  int linkIndex = network.nodeLinks.getInt( nodeLinkIndex );
  assert( linkIndex >= 0 );
  return link.setIndex( linkIndex );
}

God bless,
-Toby Reyelts