Thread Safety (for my RTS game)

Hi
Been a while since I posted; I've been busy programming my new RTS game, or rather a TBS/RTS hybrid.

Anyway, since it uses a peer-to-peer system (I will later convert it to a combination of client/server and peer-to-peer with a web server), I am dealing with thread safety. So far I have not had any problems, but I have only tried it with 3 players; if the game has 10-20 players, problems might show up that I have not yet realised. So I want to check with others here whether my method is correct or whether I already need to design something better.

Now, here is how the HOST player works.

It has an orderqueue, basically a class like a Vector, except that it decodes the strings you put into it into something useful for the game, i.e. an action to be performed. “a 129 133”, for example, could be decoded as “attack from hex 129 to hex 133”, or in code something like game.resolvebattle(129, 133, blablabla some parameters needed here). The game uses the orderqueue like a queue: it removes the first element and performs that action (which might, for example, be someone saying "I move my Militia from 30, 30 to 30, 35").
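A minimal sketch of what that decode step might look like (the `Order` class and all names here are hypothetical, not the poster's actual code):

```java
// Hypothetical sketch: decoding an order string like "a 129 133"
// into a typed action the game can execute. Names are illustrative only.
final class Order {
    final char type;    // e.g. 'a' = attack, 'm' = move
    final int from, to; // hex numbers

    Order(char type, int from, int to) {
        this.type = type;
        this.from = from;
        this.to = to;
    }

    static Order decode(String s) {
        String[] p = s.split(" ");
        return new Order(p[0].charAt(0),
                         Integer.parseInt(p[1]),
                         Integer.parseInt(p[2]));
    }
}
```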

Any order requested in the game results in a send to the HOST orderqueue. So there is one queue per game session (with up to, say, 16 players), and it lives in the host player's game. An order means, for example, move a unit, attack, purchase a unit, deploy a unit, declare war, etc.

Now, the threads used for the socket connections (one socket per client, by the way) get the incoming strings and put them into the orderqueue.

I heard that Vector is thread safe, but also that it is not 100% thread safe. Meaning: if I do orderqueue.get(0) when there is only one object in it, and someone else does orderqueue.remove(0) at the same time, then if I am unlucky the remove happens first and the get throws an exception.

BUT, the nice thing is that all incoming traffic results in calls to the add method, and only the one thread in the host game will ever remove anything from the queue. That means several threads add to the queue, but only ONE thread ever removes from it. So, is this combination thread safe?

Each of Vector’s methods is synchronized, so as you describe the situation there should be no problem. Some things to keep in mind, though…
When you’re adding to the queue (assuming you are adding an Order called “order” to a Vector called “queue”), make sure you use

queue.addElement(order);

and not

queue.add(queue.size(), order);

since calling queue.size() and then add() is two separate operations, and another thread could change the size in between.

When you’re removing from the queue, just something like

if(!queue.isEmpty()){
  order = queue.firstElement();
  queue.removeElementAt(0);
}

will be fine, since you only have one thread doing the removing. Also, make sure you never iterate over the Vector, because its size could change part way through, giving an out-of-bounds exception (or a ConcurrentModificationException if you use an Iterator).

You are using the Vector like a queue, so it would be better to use a thread-safe implementation of java.util.Queue or java.util.concurrent.BlockingQueue. Vector is not meant to be used as a FIFO queue, because adding/removing the first element is an O(N) operation (only adding/removing the last element is O(1)).
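For example, a minimal sketch of the host's queue built on `LinkedBlockingQueue` (the class and method names here are just for illustration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One shared queue per game session: every socket thread adds orders,
// only the host game thread removes them.
public class HostOrderQueue {
    private final BlockingQueue<String> orders = new LinkedBlockingQueue<>();

    /** Called from any socket thread when a message arrives. */
    public void submit(String order) {
        orders.offer(order);
    }

    /** Called only by the host game thread; blocks until an order arrives. */
    public String next() throws InterruptedException {
        return orders.take();
    }
}
```

With `take()` the consumer thread simply sleeps while the queue is empty, so there is no polling loop to write.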

If I understand your post, you’re talking about a “single-producer/single-consumer” model (only one thread performs puts and only one thread performs gets). If that is the case, you can use a “wait-free” method (no locks or atomic operations).

Here is some example code using a circular buffer:

public final class CQueue<T>
{
  /** The actual array. */
  private final T[] array;

  /** Bit mask for wrapping indices (size-1). */
  private final int mask;

  /** Current read position */
  private volatile int rPos;

  /** Current write position */
  private volatile int wPos;

  /**
   *  Creates a queue that can hold up to 2<sup>n</sup>-1 elements.
   */
  @SuppressWarnings("unchecked")
  public CQueue(int n)
  {
    int size = (1<<n);

    mask  = (size-1);
    array = (T[])(new Object[size]);
    rPos  = 0;
    wPos  = 0;
  }

  /** Get an element. Returns <i>null</i> if empty. */
  public T get()
  {
    if (rPos != wPos) {
      T t = array[rPos];

      rPos = (rPos + 1) & mask;

      return t;
    }

    return null;
  }

  /**
   * Adds an element to the queue. Returns <i>true</i> if successful.
   * <p>
   * Inserting <code>null</code> would be a very bad idea as the
   * consumer will think the queue is empty.  This however doesn't
   * break the queue.
   */
  public boolean put(T t)
  {
    int next = (wPos + 1) & mask;
    
    if (next != rPos) {
      array[wPos] = t; // place element before index update
      wPos        = next;

      return true;
    }

    return false;
  }
}

Yeah, like Jackal says, using a Queue makes more sense than reinventing the wheel. Moreover, BlockingQueue even gives you the bonus of not having to implement a wait mechanism to avoid wasting CPU cycles (as it blocks). Make sure the queue is visible to the other threads, though.

Hmm, apparently Roquen posted while I was typing:
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

[quote]ConcurrentLinkedQueue:
This implementation employs an efficient “wait-free” algorithm based on one described in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott.
[/quote]
If you wanted a wait-free one.

The code provided by Roquen is not thread-safe at all!

Not even for one thread reading and one thread writing. Both threads can have the fields cached. At least use the volatile keyword, but the CQueue impl has more disadvantages, like overwriting the oldest elements when the queue is full.

Better to use the thoroughly tested classes provided in the JRE.

General notes:

The reason I suggested a circular-buffer based method is that CAS ops can take 100s to 1000s of cycles on modern processors, so it is nice not to use them unless needed. Single-producer/single-consumer can avoid the heavy overhead of atomic operations and locks: no races, deadlocks or any other headaches of concurrent programming. Also, there is nothing faster unless the hardware has specialized support. (Okay, not quite true… but then you need to go cache-oblivious or cache-conscious.)

@Mr_Light:
This paper looks interesting (haven’t tried):

@Riven
I typed that off the top of my head, and yes, I should have stuck volatile on rPos and wPos (bad me). But re-read the code: it does not overwrite any unread element. The only trick is dealing with a full queue (by blocking the producer), but the whole point is to provide a sufficiently large queue so that this never occurs.

Even now, the references in the T[] array can be cached. Adding volatile to the field obviously doesn’t help, as it only applies to the reference to the array itself, not to its elements.

So… either use locks or AtomicReferenceArray; otherwise it’s bound to corrupt your data sooner or later, depending on the number of cores in your CPU and the number of CPUs in your system.
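A sketch of what the `AtomicReferenceArray` route would look like for the element slots (just the per-element accesses, not a full queue):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// AtomicReferenceArray gives volatile read/write semantics per element,
// which a plain T[] behind a volatile field does not.
AtomicReferenceArray<String> slots = new AtomicReferenceArray<>(16);

slots.set(3, "a 129 133"); // producer side: volatile write of the element
String s = slots.get(3);   // consumer side: volatile read
```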

The problem with not having volatile on the two counters is that the runtime can inline the routines and promote the counters to registers, breaking the synchronization. That is not possible with the array accesses.

To get the behaviour you’re describing would require a multi-processor system with an incoherent cache memory system. Those are pretty much only low-cost embedded systems.

I’m sure the JVM uses the assembly instruction, and considering how often that gets called…

The abstract looks interesting enough - I’ll read the rest tonight. I like optimistic approaches in general :wink: Usually, if you align the rest of the system, stuff can run at near-optimal, unsafe speeds.

But to bring it back to the core: unless firepowerjohan’s profiler says that CAS is a bottleneck of any sort, we need not worry. That is why I proposed BlockingQueue, as it gives you additional stuff for free. In retrospect, a simple if(poll()==null){ sleep some } might just as well suffice. I wouldn’t mind a hefty discussion about theoretical speeds of queues to satisfy the geek in me, though I’m not sure we want to behave like a bunch of kernel hackers. The culture and temperateness around these forums are far too pleasant to drag them down like that.
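That fallback would look roughly like this (a sketch, assuming the orders sit in a `ConcurrentLinkedQueue`):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

Queue<String> orders = new ConcurrentLinkedQueue<>();
orders.offer("m 30 35");

// Consumer step: handle an order if there is one, otherwise back off
// briefly instead of spinning (BlockingQueue.take() does this waiting for you).
String order = orders.poll();
if (order == null) {
    Thread.sleep(10); // nothing to do; yield some CPU
} else {
    // decode and execute the order here
}
```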

Thanks for all the replies! I did not expect this many; it seems it is an interesting topic :slight_smile:

I am not using single-producer/single-consumer, since there are many threads adding to the vector but only one removing from it. This belongs in the NETWORKING forum, in fact, which is why I did not bring up the networking details yet. The reason I have many producers is that the host has one socket per player, so one thread per player. A client only has one socket, since all its communication goes to the server. The server broadcasts to the other clients.

So in a 4-player game the HOST has 3 threads, each with one socket, picking up messages from its player, using streams (ObjectOutputStream, ObjectInputStream). Some say I could just use one single socket for all communication and let all players send on it, but whether that would work like a queue, so that no problem occurs if 2 players send a message simultaneously, I am not sure.

The reason I chose Vector was that at one time I was thinking the consumer should be able to search through the orderqueue for a specific instruction, but so far I have not needed such functionality.

In that case I’d suggest using a ConcurrentLinkedQueue. It scales pretty well from low to pretty high thread contention.
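A sketch of that multi-producer/single-consumer setup with `ConcurrentLinkedQueue`:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Lock-free queue: safe for many socket threads calling offer() while
// the single host game thread drains it with poll().
Queue<String> orders = new ConcurrentLinkedQueue<>();

// Any socket thread:
orders.offer("a 129 133");

// Host game thread, once per tick: drain everything that has arrived.
String next;
while ((next = orders.poll()) != null) {
    // decode and execute the order
}
```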

@Mr_Light:
I wasn’t trying to start a debate on the design & usage of concurrent data structures. My point was that they should be avoided like the plague when possible. And if you must use one, and you happen to have a single producer/consumer, then you can still avoid the headaches.

(entering super geek mode, you most likely want to ignore the rest)

The cycle counts I gave are for the hardware instruction itself, which on Intel-alikes locks down the memory controller for the duration of the instruction. So all processors are shut out from reading main memory for that duration, OUCH!