I’ve been busy at work, but it’s been a while so I’ve got a collection of odds-n-ends I’ve been meaning to write up:
Bandwidth: 70Gb/sec
A 768-way Azul (7280 box) gets 70Gb/sec of throughput as measured with a multi-threaded pure Java version of the infamous STREAM benchmark. It’s not official (since it’s neither C nor FORTRAN!), so don’t quote me on this, but it does put our gear among the top-20 highest-bandwidth supercomputers.
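For anyone wondering what a multi-threaded pure Java STREAM-style measurement might look like, here is a minimal sketch of the "triad" kernel only. This is not the benchmark code behind the number above; the class name `StreamTriadSketch`, the array size, the trial count, and the bytes-per-second accounting are all assumptions made for illustration, and the timing naively includes thread start-up.

```java
import java.util.Arrays;

class StreamTriadSketch {
    /**
     * Runs one triad pass, a[i] = b[i] + scalar * c[i], split across
     * `threads` threads. Returns elapsed nanoseconds (thread start-up
     * is included, so small arrays will look slower than they are).
     */
    static long triad(double[] a, double[] b, double[] c, double scalar, int threads) {
        int n = a.length;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            // Each thread owns one contiguous slice of the arrays.
            final int lo = (int) ((long) n * t / threads);
            final int hi = (int) ((long) n * (t + 1) / threads);
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) {
                    a[i] = b[i] + scalar * c[i];
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join also makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    /** Triad touches three arrays of 8-byte doubles per element: two reads, one write. */
    static double bytesPerSecond(int n, long nanos) {
        return 3.0 * 8.0 * n / (nanos / 1e9);
    }

    public static void main(String[] args) {
        int n = 1 << 22;   // assumed size; should be well beyond cache capacity
        int threads = Runtime.getRuntime().availableProcessors();
        double[] a = new double[n], b = new double[n], c = new double[n];
        Arrays.fill(b, 1.0);
        Arrays.fill(c, 2.0);

        long best = Long.MAX_VALUE;
        for (int trial = 0; trial < 10; trial++) { // keep the best of a few trials
            best = Math.min(best, triad(a, b, c, 3.0, threads));
        }
        System.out.printf("threads=%d best=%.2f GB/s%n",
                threads, bytesPerSecond(n, best) / 1e9);
    }
}
```

The real STREAM benchmark also runs copy, scale, and add kernels and has rules about array sizing relative to cache, so treat the output of this sketch as a rough sanity check rather than a comparable score.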
NonBlockingHashMap News
NonBlockingHashMap has been up on SourceForge for a while now. I finally got around to fixing a problem where racing writers, racing copy-threads-on-resize, and a racing witness reader thread could combine so that the reader sees a value flip/flop more times than there are writers (the smallest test case takes 5 racing threads, 4 of which are writing!). The problem was found last year via inspection and confirmed with a model checker. I’ve long had a fix figured out – use a simpler State Machine during the table copy – but just now finally got it implemented.
Gene Novark has agreed to model check the old version with SPIN (he’s checking the implementation not the algorithm and found a few implementation bugs; should be fixed already via the rewrite). I’m hoping he’ll also do the new version! In any case, he’s a smart guy and getting closer to graduation.
NonBlockingHashMapLong (a primitive-long-key’d NBHM for space-conscious users) is also getting more use and bug fixes. I need to rewrite it to use the new State Machine; as part of that rewrite I’ll be able to fold in a classic space/time tradeoff flag. The space-optimized version will use less (amortized) space than the normal HashMap version by a fair amount. There’s also a NonBlockingSetInt which is essentially a classic auto-resizing bit-vector that’s non-blocking during the resize.
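To give a flavor of the bit-vector idea, here is a minimal sketch of a fixed-capacity bit set whose updates are done with CAS retry loops. It is not the NonBlockingSetInt source: the class name `CasBitSetSketch` and its methods are made up for illustration, and it deliberately leaves out the auto-resizing (and the non-blocking copy that goes with it), which is the hard part.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class CasBitSetSketch {
    // 64 bits per word; the capacity is fixed at construction in this sketch.
    private final AtomicLongArray words;

    CasBitSetSketch(int capacityBits) {
        words = new AtomicLongArray((capacityBits + 63) >>> 6);
    }

    /** Sets bit i. Returns true if this call flipped it from 0 to 1. */
    boolean add(int i) {
        int w = i >>> 6;
        long mask = 1L << (i & 63);
        while (true) {
            long old = words.get(w);
            if ((old & mask) != 0) return false;                 // already set
            if (words.compareAndSet(w, old, old | mask)) return true;
            // CAS failed: another thread changed this word, so reread and retry.
        }
    }

    /** Clears bit i. Returns true if this call flipped it from 1 to 0. */
    boolean remove(int i) {
        int w = i >>> 6;
        long mask = 1L << (i & 63);
        while (true) {
            long old = words.get(w);
            if ((old & mask) == 0) return false;                 // already clear
            if (words.compareAndSet(w, old, old & ~mask)) return true;
        }
    }

    /** Reads bit i with a single volatile read; never blocks or retries. */
    boolean contains(int i) {
        return (words.get(i >>> 6) & (1L << (i & 63))) != 0;
    }
}
```

A failed CAS here only ever means some other thread made progress on the same word, so no thread can be stalled by another thread being descheduled. Growing the backing array without blocking readers and writers needs extra machinery beyond this sketch.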
JavaOne, Talks, Abstracts
The JavaOne submission deadline has come and gone. So have MSPC, CGO tutorials, DaCapo and goodness knows what else. I gave a talk at an internal Intel Dynamic Execution Environment forum, a Big Picture talk (“Challenges and Directions in the Multi-Core Era”) on why concurrent programming is so hard. It was very well received (it helps that the audience got going early asking questions; I got connected with the audience and it made for a much better presentation). I’ll try to get the slides up on the blog a little later. I submitted these abstracts to JavaOne:
- “Challenges and Directions in the Multi-Core Era” – why concurrent programming is hard.
- “Towards a Coding Style for Non-Blocking Algorithms” – guess what that one’s about!
- “Debugging Data Races” – same as last year, but the material is obviously still timely since I’m still debugging customer apps w/race bugs
- “An Always-On Real Time Profiling and Monitoring Tool” – Azul’s version of an in-production profiling & monitoring tool
And this gem to MSPC:
- “I Wanna Bit!” – a (really!) short paper on a magic bit I’d like from my hardware which will dramatically improve my ability to write non-blocking algorithms. The bit isn’t all that magical and requires really minimal hardware (and yes I DO talk to the Azul hardware engineers!) – it’s basically an “atomic-read-set bit” that allows your L1 cache to be treated as a large atomic-read-set, along with the typical single-word atomic-write-set (e.g., CAS).
Quick audience poll – do any of those pique your curiosity?
Other Stuff
- Playing with Google index bench
- Playing with several customer apps, some scaling to ridiculous levels
- Writing a pure Java program to “Root” a JVM (already filed the bug report, no you can’t have the code yet)
- Compiler optimizations in Azul’s JVM to reduce GC pause-time


