OpenAL & hardware acceleration

So I have read some stuff about OpenAL sometimes not being hardware accelerated; maybe you guys know more about the current situation with OSes, drivers and, specifically, Java (and LWJGL).

When / where is OpenAL accelerated?
Can I check whether the OpenAL implementation I'm initializing and using is actually accelerated?
Does acceleration matter? Is it noticeable?

In short: what concerns are there about this?

OpenAL should be accelerated on Creative cards if you use Creative’s (proprietary) driver. Otherwise, there are software implementations that are equally capable, if not outright superior.
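As for the "can I check?" question: after initialization you can query the implementation strings and see what you actually got. A minimal sketch with LWJGL 2's OpenAL binding; the exact strings reported (e.g. "OpenAL Soft" showing up in AL_RENDERER) depend on which implementation you ended up with:

```java
import org.lwjgl.LWJGLException;
import org.lwjgl.openal.AL;
import org.lwjgl.openal.AL10;

public class ALInfo {
    public static void main(String[] args) throws LWJGLException {
        AL.create(); // default device, default context

        // These strings identify the implementation behind the AL context.
        System.out.println("Vendor:   " + AL10.alGetString(AL10.AL_VENDOR));
        System.out.println("Renderer: " + AL10.alGetString(AL10.AL_RENDERER));
        System.out.println("Version:  " + AL10.alGetString(AL10.AL_VERSION));

        AL.destroy();
    }
}
```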

Modern machines, even mobile devices, can do sound so easily in software that you don't really need to care. I doubt it even affects power consumption much. Remember that I was playing MP3s on a 486 DX 50 a lifetime ago. That's decoding MP3s; just playing sound is much cheaper in CPU terms, and the bandwidth is negligible. Real-time software equalizers are also very easy to do and incur negligible CPU overhead (delay needs to be considered, but is typically minor).
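Just to illustrate how cheap software mixing is: the core of a mixer is one addition (plus a clamp) per voice per sample. A rough sketch in Java; the method name and buffers are mine, not from any library:

```java
// Mix two 16-bit PCM streams into one, clamping to avoid wrap-around
// distortion. This is essentially what a software mixer does per voice:
// one add and one clamp per sample, which is trivial for any modern CPU.
static void mix(short[] a, short[] b, short[] out) {
    for (int i = 0; i < out.length; i++) {
        int sum = a[i] + b[i];
        if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;
        if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;
        out[i] = (short) sum;
    }
}
```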

We’ve been using OpenALSoft for years, and indeed, Windows Vista and 7 don’t actually have hardware sound mixing any more.

Cas :slight_smile:

bienator wrote that OpenALSoft works fine with JOAL on Linux; I will check that myself in a few weeks.

OpenALSoft has multiple backends, so if any of those backends are hardware accelerated then it will be able to take advantage of that hardware. The backends include DirectSound, ALSA, OSS, Solaris, PortAudio, PulseAudio, etc.
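If you want to try a particular device rather than the default, LWJGL 2 lets you pass a device string at creation time and then ask which device was actually opened. A sketch; the device name "OpenAL Soft" here is a guess, and actual names vary by platform and driver:

```java
import org.lwjgl.LWJGLException;
import org.lwjgl.openal.AL;
import org.lwjgl.openal.ALC10;

public class DevicePick {
    public static void main(String[] args) throws LWJGLException {
        // Request a named device; passing null means "system default".
        // "OpenAL Soft" is an assumption -- names differ per platform/driver.
        AL.create("OpenAL Soft", 44100, 60, false);

        // Report which device was actually opened.
        String device = ALC10.alcGetString(AL.getDevice(), ALC10.ALC_DEVICE_SPECIFIER);
        System.out.println("Opened device: " + device);

        AL.destroy();
    }
}
```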

Yeah, since OpenAL uses (only?) DirectSound on Windows, it's so silly - why is DirectSound not hardware accelerated?
They want to bring that back with Windows 8; way to go…

Because hardware acceleration caused so many problems for so little benefit. Sound processing is pretty trivial to do on the CPU and uses very few clock cycles, so from Vista onwards they replaced the hardware implementation of DirectSound with a software-only renderer.

OpenALSoft only uses the hardware to buffer the final output, which limits the chances of something going wrong, which is good. We've found it extraordinarily reliable.

Cas :slight_smile:

Indeed, sound in Java without OpenAL via LWJGL (or JOGL) was pretty much a shot in the dark. But soft OpenAL is fantastic: it just works and the API is straightforward. Hardware acceleration is overrated and totally worthless for sound; you're going to load your system just as much simply sending the data to the sound card.
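For the record, "straightforward" really means a handful of calls. A minimal sketch with LWJGL 2 that synthesizes and plays one second of a 440 Hz tone; the tone and buffer sizes are just for the example:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import org.lwjgl.LWJGLException;
import org.lwjgl.openal.AL;
import org.lwjgl.openal.AL10;

public class Beep {
    public static void main(String[] args) throws LWJGLException, InterruptedException {
        AL.create(); // default device, default context

        // Synthesize one second of a 440 Hz sine wave, 16-bit mono.
        int sampleRate = 44100;
        ByteBuffer pcm = ByteBuffer.allocateDirect(sampleRate * 2).order(ByteOrder.nativeOrder());
        for (int i = 0; i < sampleRate; i++) {
            pcm.putShort((short) (Math.sin(2 * Math.PI * 440 * i / sampleRate) * Short.MAX_VALUE));
        }
        pcm.flip();

        // Upload the samples, attach them to a source, and play.
        int buffer = AL10.alGenBuffers();
        AL10.alBufferData(buffer, AL10.AL_FORMAT_MONO16, pcm, sampleRate);

        int source = AL10.alGenSources();
        AL10.alSourcei(source, AL10.AL_BUFFER, buffer);
        AL10.alSourcePlay(source);

        Thread.sleep(1000); // let the tone finish playing

        AL10.alDeleteSources(source);
        AL10.alDeleteBuffers(buffer);
        AL.destroy();
    }
}
```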

Can't wait until general-purpose processors are so fast that the same thing happens to video ;). Soft OpenGL for the win.

No way! Graphics cards are evolving faster than CPUs (I think? :persecutioncomplex:), so I don't think that will happen for a loooooong time, and you can't even get close to GPU performance with CPUs yet. I'd rather say that because CPUs are getting more and more cores, they are becoming more and more like GPUs, and soon we won't have any "CPUs" in our computers, only GPUs. xD

Well, a bit OT here… don't want to derail anything. But GPUs are only faster for a very narrow range of problems. There have been quite a few cases where well-written fallback code, for use when no CUDA is available, was in fact faster than the core CUDA version, even on a Tesla card. Anything that doesn't map perfectly to the odd memory model and extreme SIMD mode tends to take a huge performance hit. It turns out that a lot of intensive stuff does not fit this model.

And I am thinking future here: 512 cores at 4+ GHz and 100 GB/s transfer rates, that sort of thing. Sooner or later graphics becomes as trivial as sound is today, and avoiding the pain of differing HW implementations will no longer be worth the effort.

I had that 486 back in 1995, just 16 years ago; it was using 98% CPU for 128 kbit MP3s. There is no sign that another 16 years won't give the same performance increase for the same price.

Just as I don't see perfect scaling on a GPU (Radeon cards have so many more stream processors than GeForce cards, yet perform similarly), I don't see anywhere near 4x scaling on a quad core in real-world applications. How is a 512-core CPU different from a 512-core GPU? Sure, the architecture and memory model are different, but they have the same problems, don't they? Synchronization, etc.
My point with GPUs being faster is that most things that take time are threadable, and if they aren't, you're possibly doing it wrong. Therefore the "limited" set of problems you can apply them to is exactly the set of things that are actually time consuming. GPUs are excellent at problems that can be solved in parallel, like graphics: each vertex and fragment can be computed pretty much independently of all the others, so we get good scaling with core count. For other programs there are plenty of things that could be done in parallel, but people don't bother to implement multithreading; it simply isn't worth it for the light computation most programs actually do. Most games today could theoretically use all your CPU cores if they were made for it. I'm not saying 4 cores gives 4 times the performance, but you'd still be able to cram more speed out of them.
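As a toy illustration of "most things that take time are threadable": here's a sketch that splits an array sum across all available cores with a plain ExecutorService. The class name and sizes are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final float[] data = new float[1 << 24];
        for (int i = 0; i < data.length; i++) data[i] = i * 0.001f;

        // One worker per hardware core; each sums a disjoint chunk.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Double>> parts = new ArrayList<Future<Double>>();

        int chunk = data.length / cores;
        for (int c = 0; c < cores; c++) {
            final int from = c * chunk;
            final int to = (c == cores - 1) ? data.length : from + chunk;
            parts.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    double sum = 0;
                    for (int i = from; i < to; i++) sum += data[i];
                    return sum;
                }
            }));
        }

        // Combine the partial results; the only synchronization is the join.
        double total = 0;
        for (Future<Double> f : parts) total += f.get();
        pool.shutdown();
        System.out.println("sum = " + total);
    }
}
```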

What we need to realize is that multi-core processors and multithreading are the future. Nothing can change the fact that replacing a single core with two slightly slower cores is much more energy (and therefore heat) efficient. Doubling the clock rate to match that (theoretical) performance increase is impossible. Just think about the fact that we have 6-core CPUs running at over 3 GHz; can you run a single-core processor at 18 GHz? Heck, we have graphics cards with 3072 stream processors running at 830 MHz. Let's make a 2.5-terahertz stream processor!

PS: These numbers are just for fun; don't bash the actual performance you'd get from 3072 x 830 MHz vs 1 x 2,549,760 MHz… xDDD

Multi-core kinda sucks. It'll probably always be horribly inefficient. What we want is software-configurable parallel architectures and a language that can cope - something like transputers. Imagine a grid of 16x16 ARM cores, each with its own local memory and DMA as well as access to a "global memory" pool, where you could specify in the OS how you wanted to pipe data between them.

Cas :slight_smile:

I was assuming something like transputers with 512 cores, aka something more like the Connection Machine. We are talking about a decade and a half; no one is going to remember what a quad core even is by then.

I’m not saying it’s good or efficient, just that it’s the future. We need new tools/technologies suited for creating programs that can take advantage of multiple CPUs, or we won’t be able to keep up with Moore’s law much longer.

I don't think multicore is going to scale that well in the long term. Past, say, 16 cores, I suspect that maintaining coherence in a unified memory model is going to become the overriding problem with the concept. What we need is NUMA and DMA and the ability to create pipelines.

Cas :slight_smile:

There has been a lot of theory work on this, and things can in fact be pretty distributed with high locality and performance. Mainstream practice is a long way from it right now, but there has never been much need to take notice of the theory, since SMP has worked so far. Soon they will take notice and you will see some impressive stuff, just like we did on the Connection Machine or transputers, for example. Don't assume that x86 in 2027 will look anything like x86 in 2011. Graphics in particular will always parallelize well.

There are also possibilities for much faster single cores in that kind of time frame. But much faster than 4-10 GHz and you really need to go asynchronous. Superconductor switches have switching speeds on the order of 10 THz; at 10 THz a cycle lasts 0.1 ps, in which time light travels just 30 microns. Add better power-consumption scaling and/or diamond wafers and you really will see something indistinguishable from magic.

Note that the first multicore machine I programmed had 16 cores back in 1996-97? (an SGI). I have four 16-core machines here in the office right now. Our cluster is 500 cores. We have no problems keeping them all at 100% and getting pretty close to theoretical performance. We also have a few CUDA-powered machines with Teslas and up. Developing code for them is the bottleneck.

Many-core programming is not new; there are just a lot of new people getting introduced to the idea.

I have officially derailed the thread. Sorry OP. May I be spared from the thread lock hammer of the mods.