Impossible race condition between threads?

Riven · November 3, 2013, 12:12am

I’d get rid of the need to synchronize (through volatile fields or locks) for every model, by adding the intersecting models to the Perspective, which does the culling on a single thread. The advantage here is that you don’t have these expensive memory flushes and CPU pipeline stalls.

You’d only need to sync when gathering/merging the culling results of each Perspective.

Riven · November 3, 2013, 12:30am

[quote=“theagentd,post:40,topic:44753”]
Well, I was trying to solve the problem, instead of kind of figuring out why the JVM trips over bytecode that is clearly incorrect. To me it’s not really interesting to understand each step towards the state where the code failed (regarding how/why the JIT optimizes). At this point it is (with all due respect), “garbage in, garbage out”: the totalTasks field was supposed to be volatile, or you’d have to use the concurrency classes to do the heavy lifting for you.

So… I thought about moving on to working at improving the solution, suggesting more efficient ways to reduce the amount of stalling induced latency and contention

theagentd · November 3, 2013, 1:36am

Why would I need volatile fields in this case? When all tasks finishes the changes will be visible to the following tasks that explicitly required that task since that’s what the concurrent collections guarantee.

I can agree that writing to the same field from multiple threads even without synchronization will have bad performance due to cache flushes, but why are you assuming that this is such a problem in the first place? It scales well and this culling code is not a very big performance hog in the first place.

[quote=“Riven,post:42,topic:44753”]

That’s a bit aggressive, isn’t it? What problem are you trying to solve? Because I did not ask for advice on optimizing my frustum culling, I wanted to know why it works but not my totalTask counter. I AM having the concurrency classes doing the heavy lifting. You seem to know exactly what the problem is (garbage bytecode), so why don’t you enlighten me by telling me (from my limited unenlightened perspective) the compiler is breaking the synchronization that my PriorityBlockingQueue is supposed to guarantee according to the specs. And if the problem is indeed what Spasi said, that the totalTask variable is only read once, how come that that does not happen at all anywhere else. Because if it can happen anywhere, then you are right; my threading code is not thread safe, but more importantly (from how my unenlightened mind is seeing this) the concurrent collection synchronization is not to be trusted since the compiler can break it and ALL fields that at any point is read or written more by more than one thread at any time regardless of the synchronization from concurrent collections have to be volatile. Frankly, I have no use of knowing how to fix the code in my first post, because 1. I’ve already fixed it myself, and 2. The goal is to avoid this issue in the future.

And I’m sorry for going slightly off topic myself now. Anyway, on Riven’s not-so-nicely-conveyed tip I decided to us a bytecode viewer for the first time and check out my .class files. By looking at the run(TaskTree) method, I can see that all synchronization happens inside calls to “invokevirtual” commands. Is it possible that the JIT compiler would not know that a command inside a call to invokevirtual puts a restriction on ordering (like ArrayBlockingQueue.take()), so it would optimize the load instruction to outside the loop? I’m sorry, I’m running out of ideas here…

Riven · November 3, 2013, 2:01am

There are two misconceptions: I didn’t mean to be aggresive in conveying what I did, and I didn’t actually mean that the bytecode was garbaged. The bytecode will be functionally idential to your sourcecode. I meant to say that the ‘input’ to the JIT was wrong, be it the java sourcecode or the derived java bytecode. Don’t expect to find any surprises in the bytecode, javac is very straight forward and barely optimizes anything, let alone promoting fields to localvars.

If you make assumptions outside of the guarantees made outside of the specification of the Java Memory Model, then it can very well work, but it might trip up once in a billion cases. That your unit tests or manual test work as expected is sadly no way to deem code thread-safe.

It is indeed true that you simply must not read & write fields from multiple threads, unless they are volatile or are in a critical section, especially when more than 1 thread is writing, because at that point you actually must resort to reading and writing solely inside critical sections (think: ReentrantReadWriteLock). You basically have to inform the JIT about every field that is accessed concurrently using volatile, to disable its full blown, super aggresive optimisations, like promoting fields to variables, which gets even more complex when inlining is involved and you can’t assume that method boundaries are also boundaries of these promotions. So spell it out for the JIT, however clear you think your sourcecode is, that the JIT should take into account access for specific fields, from multiple threads.

theagentd · November 3, 2013, 2:21am

Riven:

theagentd:

and ALL fields that at any point is read or written more by more than one thread at any time regardless of the synchronization from concurrent collections have to be volatile

It is indeed true that you simply must not read & write fields from multiple threads, unless they are volatile or are in a critical section, especially when more than 1 thread is writing, because at that point you actually must resort to reading and writing solely inside critical sections (think: ReentrantReadWriteLock). You basically have to inform the JIT about every field that is accessed concurrently using volatile, to disable its full blown, super aggresive optimisations, like promoting fields to variables, which gets even more complex when inlining is involved and you can’t assume that method boundaries are also boundaries of these promotions. So spell it out for the JIT, however clear you think your sourcecode is, that the JIT should take into account access for specific fields, from multiple threads.

Just to make this clear: The JIT does not care that there is an explicit memory barrier/fence/whatever-that-synchronization-point-from-take()-is-called is called in the loop when it decides whether or not to move the read out of the loop? What’s the point of doing such optimizations if the result is that I have to make everything volatile to counter it?

Riven · November 3, 2013, 2:28am

The JIT simply adheres to the Java Memory Model, and makes the most aggresive optimisations within those rules. Maybe there guarantees are too narrow for your taste, pushing the burden to the developer, but this is the environment we operate in.

Given the JIT team is only advancing their technology, code that used to work (guaranteed with older JITs) on incorrect assumptions, will break in future JREs.

[quote=“theagentd,post:45,topic:44753”]
That’s an easy one: to speed up single threaded code

theagentd · November 3, 2013, 2:33am

So this line is a blatant lie then?

[quote]Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
[/quote]

This optimization will work in all cases whenever there isn’t any kind of take()-style barriers:



//Object counter example.

private int totalTasks;
private AtomicInteger taskCounter;
private int objects;

//Loop in worker thread:
while(true){
    concurrentQueue.take(...);
    
    for(int i = 0; i < objects.size(); i++){
        objects++;
    }
    //there's a memory barrier, so it needs to be read each iteration.
    //however it can obviously be accumulated in a register and written out once at the end since it doesn't violate any synchronization.
    
    if(taskCounter.incrementAndGet() == totalTasks){
        ...
    }
    //The read for totalTasks must be done each iteration for the same reason as objects.
    
    concurrentQueue.add(...);
}

Riven · November 3, 2013, 2:36am

It’s not so much a blatant lie, it’s a quote without enough context

As the quote says, the memory will be synced/updated for the other thread - you just can’t see it, because you’re not reading back that specific memory location, due to potential field->localvar promotion, so better disable that, by flagging it volatile.

theagentd · November 3, 2013, 2:48am

Why is this code not breaking? http://www.java-gaming.org/?action=pastebin&id=747

Riven · November 3, 2013, 2:50am

Because you’re being lucky. Don’t rely on it to work in the future.

theagentd · November 3, 2013, 2:53am

That doesn’t seem like luck to me. It seems like the perfect opportunity to optimize away the read. What I want to do is reproduce the problem in a controlled environment.

Riven · November 3, 2013, 2:54am

Often the JIT optimizes methods when they are executed N times, and then again much later, more aggresively. As you never leave your methods (N=0), it might hold the JIT optimisations back.

Making a test that simply pulls the loop body outside it, and putting it into its own method, might or might not show field->localvar promotion, as the JIT might already have inlined it back into the loop, or some other condition applied that I’m not aware of, disabling the promotion.

Hence using the word ‘lucky’ earlier. You just have to account for it, because what works today, might break tomorrow.

theagentd · November 3, 2013, 3:35am

Thanks for the hints! Does this test deadlock for you as well? http://www.java-gaming.org/?action=pastebin&id=754

Riven · November 3, 2013, 3:38am

Yes. This is the full output:


Starting.
offering value
offered value, waiting for response
Thread started.

theagentd · November 3, 2013, 3:39am

I think I need some sleep… I’ll be back tomorrow… I can’t be thinking straight anymore.

Riven · November 3, 2013, 3:48am

I can’t explain it either, but using ArrayBlockingQueue(1) instead of using SynchronousQueue got rid of the deadlock, and allowed the threads to pump messages back and forth.

I moved the threading outside of the constructor (that’s tricky stuff!) but to no avail.


		SyncTest2 s = new SyncTest2();
		s.new MyThread().start();
		s.testLoop();

nsigma · November 3, 2013, 11:06am

hmm … I fail to see how localvar promotion in that case wouldn’t break the Java Memory Model. Any localvar caching should be invalidated by crossing the memory barrier or that would appear to be a bug?

It doesn’t surprise me that this short example is working. I still believe the bug in the original code is elsewhere (probably in the signalling of task completion) and you’re chasing the rabbit down the wrong hole!

SynchronousQueue would be more accurately equivalent to ArrayBlockingQueue(0)! ie., it has no storage at all.

The issue is the use of offer() vs put() - non-blocking vs blocking. Check the return value of offer() and it should be false. The offer() in Thread 1 does not block and cannot return successfully unless Thread 2 is already in the take() method, and then Thread 1 blocks on a take() that can never be serviced. Thread 2 is doing the reverse, hence deadlock.

The use of an ArrayBlockingQueue with a single storage slot would allow offer() to work.

Roquen · November 3, 2013, 12:15pm

Remember we’re talking about the compiler seeing the value of a field as being constant across the lifetime of a method and not a main-memory value that’s stored in the cache hierarchy. Because the programmer implicitly states that nobody can change its value behind the thread in questions back. No amount of synchronization or memory barriers will address this issue. The compiler is performing a perfectly legal transformation which doesn’t consult main-memory.

(EDIT: by “No amount of synchronization…” I meant wrapping in a mutex)

Spasi · November 3, 2013, 12:30pm

The question then would be, why does the bug go away when counting backwards? A bug elsewhere (e.g. calling run(TaskTree) before the current batch of tasks has completed) would still be problematic in that case. Making totalTasks volatile would also make no difference. Btw, theagentd, have you actually tried that with the original code?

nsigma · November 3, 2013, 12:40pm

This page

and the linked to bug report and thread discussion
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7170145
http://cs.oswego.edu/pipermail/concurrency-interest/2012-May/009440.html

would suggest this is indeed a bug for the JIT compiler to do this, and contravenes JMM?