btw, can’t you get that information quite easily on Linux without JNI by just reading and parsing /proc/cpuinfo ? Quick Google found this which includes a getNumberOfCores() method.
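For illustration, here is a rough sketch of what that parsing could look like. This is not the method from the link; `CpuInfo` and `countPhysicalCores` are names made up for this example. It assumes the x86 Linux layout of /proc/cpuinfo, where each logical CPU entry carries `processor`, `physical id` and `core id` fields. Other architectures often omit the last two, in which case it just falls back to the logical count.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any library mentioned in this thread.
final class CpuInfo {

    /**
     * Counts distinct (physical id, core id) pairs in cpuinfo-style lines.
     * If no "core id" field is present, returns the number of "processor"
     * entries (logical CPUs) instead.
     */
    static int countPhysicalCores(List<String> lines) {
        Set<String> cores = new HashSet<>();
        int logical = 0;
        String physicalId = "0"; // assumed socket id when the field is missing
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator line between entries
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (key.equals("processor")) {
                logical++;
                physicalId = "0"; // a new entry starts here
            } else if (key.equals("physical id")) {
                physicalId = value;
            } else if (key.equals("core id")) {
                cores.add(physicalId + "/" + value);
            }
        }
        return cores.isEmpty() ? logical : cores.size();
    }

    /** Reads the real file; only meaningful on Linux. */
    static int countPhysicalCores() throws IOException {
        return countPhysicalCores(Files.readAllLines(Paths.get("/proc/cpuinfo")));
    }
}
```

Comparing the result with `Runtime.getRuntime().availableProcessors()` (which reports logical processors) gives a hint whether HT is in play.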
Well nobody is forcing anyone to program in an OOP style in performance critical code. Whether the extra effort is worthwhile is too big a topic to cover in any generalized case, as well as being programmer specific. HT will do well in covering up some of these hiccups. Also HotSpot does in some situations insert prefetch ops…I’ve never looked at this point however.
“branch prediction failures”: hmm…might be wrong here…it’s been a while since I’ve thought hard about the micro-architectures…but branch misprediction doesn’t free any resources for the HT if I have my brain screwed on correctly. Trying to fit the details of speculative execution, reordering buffers, register renaming, yada-yada-yada into the brain at once for an extended time is hard.
If having HT enabled slows down performance: Multithreading: You’re doing it completely wrong.
Useless to an individual. I personally have no interest in topology queries as I’m not interested in multi-socket hardware in Java…someone else might be very interested in having this information. I’d very much like to know as much as possible about the specific single-socket CPU however…like is population count available. Probably very few people want to know that…I do. The same for a fair number of opcodes which HotSpot has intrinsics for.
Assuming you’re not talking about not having any expectations about 6 execution port hardware: If I know that I have N full cores vs. N/2 full & N/2 virtual, I’d be likely to choose a different number of worker threads. And in the HT case I’d try pinning some threads to a virtual core. Experimentation based on what has worked in the past combined with what I was seeing in some real world use-case at the time in question.
Let me babble a moment about some hypothetical situation. You have a HT machine and we’re modifying the code that is running on a full core…how does that (on average) impact both the associated HT core and the other full/HT cores in the system? Of course this is way oversimplified…such is writing MT code.
Make an optimization that lowers the computational cost. If we eliminate a waiting-for-result stall, the HT has less resources and might slow down. If not and the “mix” of operations is similar (hitting more or less the same execution ports) then the HT is unaffected. Since our full core is completing computations faster it might be accessing memory faster, which might slow down anyone else that also wants access to some shared resource (like memory).
Make an optimization that lowers the number of stalls. The HT has less resources and probably slows down. If it’s a shared resource stall then other cores might speed up if they were in competition.
In some nearly impossible idealized case some optimized code running in a full core leaves virtually no execution units for the associated HT to use and at the same time frees up resources for other cores to allow them to reach peak performance. So my point is HT cores do not behave the same as full cores.
The HT case where I was pinning was: most threads were more-or-less computation bound, a smaller number of threads were more-or-less memory bound (in comparison). By pinning the memory bound threads to HT cores I got better performance because memory stalls were being better hidden on average since they were often waiting for a free execution port to hit. Of course this is totally use-case specific. All MT optimizations are.
Sure, this is possible, thanks! I generally avoid using file I/O in a simple library though, so I would prefer a solution with a system function call.
However, @Spasi’s solution with hwloc is really great, we should wait for that.
Or we can replace 5 lines of Java with a complex JNI interface to read those files instead! Personally I prefer the way Linux presents this information.
Absolutely agree with you, since that provides far more information, and I’m generally in favour of libraries that -
- Work cross-platform
- Work on the basis of a Somebody Else’s Problem field
IMO, if you have no multi-threaded experience and want to multi-thread a game: take your base-line system (a system you actually possess) and say this is the only machine I care about. Don’t try to get fancy and scale stuff to more powerful machines. Then painfully fumble your way to a system that works.
I have written multi-threaded games before, though I was splitting the game by very large chunks. Game play in one thread, initialisation + web interaction etc in the other thread. I was reasonably happy with the results though felt I should have added the sound to another thread as well. My problem is many people who play my sort of games are playing on home built arcade machines with older PC hardware. I suppose if their processor only has one core then I can just accept that it’s too old. If it has HT then I don’t really have any issues as the processor is likely to be fast enough to run the game on one core.
My current aim is to do something like this:
- Thread one: game play
- Thread two: initialisation, web interaction (high scores)
- Thread three: simulate the sea
- Thread four: sound
Thread two is very light and mostly is just waiting for stuff to happen. Should I mix threads 3 and 4 into thread 2? What happens if I have 4 threads running on an old dual core processor? Will it be swapping tasks in and out continually or would it all be cached in the processor? I guess I can just try each possibility but obviously that takes time.
Mike
PS my dev PC (at home) is probably more powerful than most folks so I might need to scale down
Never mix actual sound processing in a thread with anything else. But that depends what you mean by “sound”. If you just mean sending events to a sound library, then that’s probably fine as the library should have a dedicated thread for driving the audio. OTOH, if you’re using a blocking API such as the lower-level bits of JavaSound, then always use a dedicated thread.
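A minimal sketch of the dedicated-thread case, in case it helps. `SoundThread` and the `blockingPlay` callback are invented for this example; the callback stands in for whatever blocking call your sound API actually needs. The game thread only drops a request on a queue and carries on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical wrapper: one thread that does nothing but play queued sounds.
final class SoundThread {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** blockingPlay is a placeholder for the real (blocking) playback call. */
    SoundThread(Consumer<String> blockingPlay) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // take() sleeps until a request arrives, so no busy polling
                    blockingPlay.accept(pending.take());
                }
            } catch (InterruptedException e) {
                // interrupted by shutdown(): let the thread end
            }
        }, "sound");
        worker.setDaemon(true); // don't keep the JVM alive just for sound
        worker.start();
    }

    /** Called from the game thread; returns immediately. */
    void play(String soundId) {
        pending.offer(soundId);
    }

    void shutdown() {
        worker.interrupt();
    }
}
```

Note that a single worker plays requests strictly one after another, so this shape suits streaming or a blocking write loop rather than overlapping effects.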
Threads in my game code tend to be distributed thusly:
1. Main thread
2. Sound streaming
3. Asynchronous communications (hiscores server, etc)
4*. A small fixed pool of threads (1 per virtual core) for transforming and writing sprite data to buffers and animating emitters and particles
That’s as clever as I like to get. You probably don’t realise it but the sound API you’re using already does fancy threading in the background and so indeed does the GPU. With garbage collection and background compilation from Hotspot going on you’re probably making more use of threads than you realise.
4* is really an optional thing that I’ve done to split a very few parallelizable tasks into equal sized chunks to operate on all cores. For example I needed to write out 20,000 sprites’ worth of data per frame, which also involves transforming their vertices on the CPU: it’s easily split into 4 identically sized chunks that will all execute in roughly the same time period. Likewise particles all behave in pretty much the same way and so they’re easily split into equal sized chunks.
After all the chunks have finished executing there is usually a “gathering” phase in the main thread that performs some operation on the resulting dataset. For example, the particle animation system - after all particles are processed, the gathering phase scans all the particles that were processed and compacts them, returning dead particles back to the particle pool.
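Roughly, the chunk-then-gather pattern could look like the sketch below. This is not my actual engine code: `ChunkedUpdate`, the `life` array and the "decrement a counter" update are stand-ins invented to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative only: particles are reduced to a "remaining life" counter.
final class ChunkedUpdate {

    /**
     * Ages the first count particles by one tick using equal sized chunks,
     * then compacts the survivors to the front of the array.
     * chunks must be at least 1. Returns the number of live particles.
     */
    static int update(ExecutorService pool, int chunks, int[] life, int count)
            throws InterruptedException, ExecutionException {
        List<Callable<Void>> tasks = new ArrayList<>();
        int chunkSize = (count + chunks - 1) / chunks; // round up
        for (int start = 0; start < count; start += chunkSize) {
            final int from = start;
            final int to = Math.min(count, start + chunkSize);
            tasks.add(() -> {
                for (int i = from; i < to; i++) {
                    life[i]--; // each chunk touches only its own slice
                }
                return null;
            });
        }
        // invokeAll blocks until every chunk has finished
        for (Future<Void> f : pool.invokeAll(tasks)) {
            f.get(); // rethrows anything a chunk threw
        }
        // Gathering phase, on the calling thread: drop dead particles
        int live = 0;
        for (int i = 0; i < count; i++) {
            if (life[i] > 0) {
                life[live++] = life[i];
            }
        }
        return live;
    }
}
```

The pool would typically be created once at startup, e.g. `Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())`, and reused every frame.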
Just thought I’d throw in 2p of confusing information
Cas
So if I put my sounds into a separate thread, what’s the best way of kicking the sounds off? Do I:
- just call the sound play method from the other thread - I assume this doesn’t offer any advantage in having a separate thread.
- set a variable that says what sound to play and the other thread polls this variable - obviously only plays one sound at a time.
- would I need to set priorities on sounds so I don’t get a little click overriding a large explosion.
- would I need to queue the sounds to be played and then pull them when ready?
- some other way?
Also, if I have a relatively slow processor like a Celeron 2955U laptop processor (2 cores, no hyperthreading), should I still have threads 1,2 & 3 in Cas’s post + another thread to simulate the sea = 4 threads? I should point out I’m not using any fancy libraries for graphics etc.
You don’t put your sounds in a separate thread… (though you might put music in a separate thread, if you’re having to stream and decode it from an OGG or MP3 stream). Sound is otherwise already threaded behind the scenes, and you don’t need to worry about it. Just tell a sound effect to play and it’ll take care of itself, returning from the call instantly.
You can play a realistically unlimited number of simultaneous sound effects on bog standard hardware these days and again, the sound mixer thread takes care of it all for you behind the scenes. Just play whatever you like, it’ll all be done for you. You might find that you come up against a limit for sound channels - like 64 or something - in which case you might want a sound priority system but that’s for you to implement and to be honest I’d be very surprised if you ever needed it.
Now might be the time to use fancy libraries for graphics. Or even not so fancy… you might want to investigate using JavaFX from now on.
You will still need #1 and #3 (#1 obviously, and #3 unless you want your entire game and UI to freeze solid whilst you attempt network communications or any particularly long disk I/O in the middle of the game).
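As a sketch of what #3 can boil down to: hand the slow call to a single background thread and check on the result later. `AsyncComms` and `submit` are hypothetical names for this example, and the supplier stands in for the real network or disk call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper: one background thread for slow network or disk work.
final class AsyncComms {
    private final ExecutorService io = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "comms");
        t.setDaemon(true); // don't block JVM exit on a hung request
        return t;
    });

    /** Queues a slow request and returns immediately with a future. */
    <T> CompletableFuture<T> submit(Supplier<T> slowRequest) {
        return CompletableFuture.supplyAsync(slowRequest, io);
    }

    void shutdown() {
        io.shutdown();
    }
}
```

The main loop can then poll `future.isDone()` once per frame (or attach a callback) instead of freezing while the request is in flight.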
Cas
ish … that depends on what library and what API you’re using. Lower-level bits, including of JavaSound, are blocking (although there’s usually some other threading stuff going on natively). All depends what exactly @mike_bike_kite is actually using? A third-party library?
Do you really need every bit of CPU power for your game? Most hobby projects won’t, and I’m a bit puzzled by the effort you take to really utilize every bit of CPU power …
Also, laptop owners will hate your game for sucking the battery empty in no time, if you really push all cores to their limits.
Aw come on, nobody actually uses laptops like that. They spend 99% of their time tethered to the wall.
FWIW I use every single ounce I can squeeze out of my desktop for some of my latest stuff. The real issue is ensuring that you’re going to scale down to the sorts of machines your users are using. Multithreading is increasingly important for this as we’re seeing more and more machines with a minimum of 2 cores and plenty with more.
Cas
True. With me it’s even 100% :D. In the 5 years I’ve owned my current laptop, I’ve never once used it without AC.
For me, my laptop is a portable workstation replacement, that has AC all the time wherever I take it to.
I don’t even know whether the battery will work.
I run my laptop battery to death every night.
I’d suggest you were a very rare minority…
Cas
My game runs fine on a single core of my main home dev machine (an i5-3570k) but the game is sluggish on a slower processor, like the Celeron 2955U, and the sound also disappears. I could get the game to run on the slower processor by using threads and using more than one core. It’s not just for that laptop, it’s for any PC with a slower processor but more than one core. I suspect there are far more people with slower processors than fast ones so I’d be foolish limiting my game to just those with powerful processors. Many modern laptops have moderately cheap 4 core Bay Trail processors in them so I’d be shooting myself in the foot only using 1/4 of the processing available. Spreading the load between the cores is also likely to be more energy efficient than overtaxing just one core.
Do I need all that power? The game itself is quite basic with aliens and particles flying about. There’s a little bit of AI to make the aliens harder to beat. The sound does appear to need a separate thread, at least the way it’s currently implemented. I also have a 2D sea underneath the player that’s very central to the game. It’s actually quite hard simulating a sea with waves and splashes and a separate thread for this appears necessary. A final thread handling IO to the web will just ensure that the game won’t hang while it writes your high score to the database.
PS I just checked and the game is using 20% of one core on my i5 and it’s obviously using 100% of the Celeron.
Which game is this again?
Cas
OK - I’ve created a work in progress thread where it shows a simple picture and provides a download to play here.
Hmm exactly how have you managed to make that so slow?
Stub + native Method
58.5% 0 + 1464 sun.java2d.loops.MaskBlit.MaskBlit
16.4% 0 + 411 sun.java2d.pipe.ShapeSpanIterator.nextSpan
8.2% 0 + 204 sun.java2d.loops.FillRect.FillRect
6.0% 42 + 108 java.security.AccessController.doPrivileged
1.9% 11 + 36 sun.awt.image.BufImgSurfaceData.initRaster
1.7% 0 + 43 sun.java2d.pipe.ShapeSpanIterator.addSegment
0.8% 0 + 21 sun.java2d.loops.DrawLine.DrawLine
What’s all this then?
Cas