
The Zing Platform is the first elastic software platform designed specifically for virtualization and cloud deployments, with unsurpassed scalability, efficiency, availability and visibility. Managed by an application-aware runtime resource controller integrated with industry standard virtualization management tools, Zing’s elastic architecture automatically scales individual Java application instances up and down based on real-time demands, and dynamically maps and allocates resources based on configurable policies.
Vega Appliance - Delivering a Breakthrough Java™ Computing Experience »
Java is well established as the dominant programming model for the Enterprise, and has a significant footprint in over 90% of large companies. As it continues to mature, Java™ is increasingly used to support very large business critical applications, sometimes referred to as Extreme Transaction Processing (XTP) applications. This paper discusses how Azul Compute Appliances complement existing infrastructure with highly scalable compute and memory resources for Java-based applications and meet the demands of XTP applications, by delivering extreme throughput and scalability with consistent performance, without changing the application code or architecture.
Vega Appliance - Ultra-high Capacity Building Blocks for Scalable Compute Pools »
This paper discusses how Azul uses the power of application virtual machines and multicore chip technologies to deliver massive amounts of operating system agnostic compute resources to Java™ and J2EE™ virtual machine-based applications. Deployed in compute pools, Azul Compute Appliances can host hundreds of applications simultaneously, providing dynamic access to a shared set of compute resources to meet peak application workloads while eliminating capacity planning at individual application level.
Characteristics of the Vega Appliance - Reliability, Availability, and Serviceability (RAS) »
The Azul Compute Appliance is designed with enterprise class reliability, availability, and serviceability features. This paper examines the hardware and software methods employed by the Azul Systems architecture to attain the high level of RAS within the Azul Compute Appliance.
Azul Vega Compute Appliances are high capacity network attached systems used to improve the performance, scalability and TCO of enterprise Java deployments. Applications are offloaded transparently from traditional servers to the appliances where they gain access to massive CPU and memory resources that deliver sustained high throughput and consistently low response times.
Vega 3 Appliance - RealTime Performance Monitor (RTPM) »
Azul Real-Time Performance Monitor (RTPM) is a runtime visibility tool that provides fine-grain application diagnostics down to the individual application thread level. RTPM dramatically reduces application tuning and production-time diagnostics, improving time-to-market and reducing production problem resolution times.
Part 1: The Zing™ Java Virtual Machine: Removing Barriers to Java Application Scalability »
Listen to Gil Tene, Vice President of Technology and CTO, Co-Founder of Azul Systems, as he discusses the current barriers to Java scalability and how to remove them.
Learn how Azul's groundbreaking technology for Java applications, now available as an elastic software platform, has enabled Fortune 500 companies to achieve orders of magnitude increases in throughput and 50% decreases in TCO with no application changes.
The Zing Java Virtual Machine (Zing JVM) is a fully compliant SDK that installs seamlessly in place of your existing JVM. Unlike your current JVM, the Zing JVM allows existing Java applications to scale to dozens of CPU cores and hundreds of gigabytes of memory elastically based on real-time demands.
This webinar is the first in a series on deploying elastic, scalable Java applications.
Java Virtualization: The New Foundation for Scalable, Elastic Java Deployments »
Optimizing Java deployments to ensure smoothly-scalable performance can be challenging for any size application. But with the proliferation of hypervisors and the advent of large commodity servers (up to 64 cores and 1 terabyte of memory), enterprises are struggling to optimize their Java deployments in virtual environments.
By leveraging the proven technology of Java virtualization from Azul Systems, enterprises can now deploy Java applications under KVM (and other hypervisors) that are unconstrained by the rigidities of traditional operating systems and Java Virtual Machines (JVMs). This ability provides orders of magnitude improved scalability and throughput for any size Java application.
In this session, Gil Tene discussed the power of Azul’s Java Virtualization, the elasticity of Azul’s JVM, and the simplicity of Java deployments under such a solution.
Virtualizing JBoss Enterprise Middleware with Azul »
Virtualization technologies, such as Red Hat’s KVM technology, can provide a number of advantages, including better hardware utilization. However, when it comes to optimizing Java middleware deployments under hypervisors, a new approach dramatically improves the application scalability and elasticity of JBoss Enterprise Middleware deployments.
By combining the virtues of JBoss Enterprise Application Platform with the proven technology of Java virtualization from Azul Systems, enterprises can now deploy JBoss Enterprise Middleware applications under KVM and see better scalability and throughput than that offered by traditional physical deployments.
In this session, Shyam Pillallmarri and Steve Hess will discuss the power and agility of virtualization, the elasticity of Azul’s Java virtual machine, and the simplicity of mission critical JBoss Enterprise Middleware deployments under KVM.
How to Stop Worrying & Start Caching in Java » (coming soon!)
Application data caching has come of age as distributed and large cache clusters are now common. The next generation of applications that depend on efficient caching has come into being and data and cache size explosion has set in.
In this session, Azul Systems’ SriSatish Ambati and Red Hat’s Manik Surtani will survey performance characteristics of different cache algorithms, their implementations (e.g., implementing a 200 GB data cache), and how well they work in practical JVM deployments. In each scenario, they will present patterns of architecture that scale, and demonstrate where read and write performance stands in the context of increasing cache sizes and concurrency.
Throughout this discussion, they will recognize several villains, including heap fragmentation, long-lived objects, multi-VM communication, socket handlers, and queue managers. SriSatish and Manik will take a fun-filled “whodunit” approach to portray the roles played by each villain in killing cache performance.
Alternative Languages on the JVM™ Machine »
There are several languages that target bytecodes and the JVM™ machine as their new "assembler," including Scala, Clojure, Jython, JRuby, the JavaScript™ programming language/Rhino, and JPC. This session takes a quick look at how well these languages sit on a JVM machine, what their performance is, where it goes, and why.
This Is Not Your Father's Von Neumann Machine; How Modern Architecture Impacts Your Java™ Apps »
Managing software performance used to be a relatively straightforward process. Uniprocessors were the norm, the number of cycles each instruction took to execute was known, and it was mostly a matter of measuring how many instructions you were executing per unit of work -- and then reducing that number. The world has changed: The cost of individual instructions varies by several orders of magnitude, depending on how close the data is to the CPU, and improvements in throughput depend on effective use of parallelism. But to design and analyze performant programs, we have to understand something about the underlying hardware and how that has changed in recent years. This session provides an overview of the architecture of modern CPUs, how this has changed in recent years, and what the implications are for software development and performance management.
The Art of (Java™ Technology) Benchmarking »
People write toy Java technology benchmarks all the time. Nearly always they "get it wrong" -- wrong in the sense that the code they write doesn't measure what they think it does. Oh, it measures something all right -- just not what they want. This session presents some common Java technology benchmarking pitfalls, demonstrating pieces of real, bad (and usually really bad) benchmarks, such as the following: SpecJVM98 209_db isn't a DB test; it's a bad string-sort test and indirectly a measure of the size of your TLBs and caches. SpecJAppServer2004 is a test of your DB and network speed, not your JVM™ machine. SpecJBB2000 isn't a middleware test; it's a perfect young-gen-only garbage collection test.
The session is for any programmer who has tried to benchmark anything. It provides specific advice on how to benchmark, stumbling blocks to look out for, and real-world examples of how well-known benchmarks fail to actually measure what they intended to measure.
Performance Considerations in Concurrent Garbage-Collected Systems »
The presentation discusses and explains relevant GC terminology and phrases common in concurrent and mostly concurrent GC, focusing on their effects and relationship to metrics such as heap size, real and effective live set size, and object allocation rates. These include concurrent and mostly concurrent marking, live set and card marking, generational operation, and compaction.
The Fragger tool is a heap fragmentation inducer, meant to induce compaction of the heap on a regular basis using a limited amount of CPU and memory resources.
Fragger’s purpose is to aid application testers in inducing inevitable-but-rare garbage collection events, so that they occur on a regular, more frequent, and more reliable basis. Doing so allows the characterization of system behavior, such as response time envelope, within practical test cycle times.
Challenges and Directions in Java Virtual Machines »
Available core counts are going up, up, up! Intel is shipping quad-core chips; Sun’s Rock has (effectively) 64 CPUs and Azul’s hardware nearly a thousand cores. How do we use all those cores effectively? The JVM™ machine proper can directly make use of a small number of cores (JIT compilation, profiling), and garbage collection can use about 20 percent more cores than the application is using to make garbage--but this hardly gets us to four cores. Application servers and transactional (Java™ 2 Platform, Enterprise Edition [J2EE™ platform]/bean) applications scale well with thread pools to about 40 or 60 CPUs, and then internal locking starts to limit scaling. Unless your application (data mining, risk analysis, or, heaven forbid, Fortran-style weather prediction) has embarrassingly parallel data, how can you use more CPUs to get more performance? How do you debug the million-line concurrent program?
“Locking” paradigms (lock ranking, visual inspection) appear to be nearing the limits of program sizes that are understandable and maintainable. “Transactions,” the hot new academic solution to concurrent-programming woes, has its own unsolved issues (open nesting, “wait,” livelock, significant slowdowns without contention). Neither locks nor transactions provide compiler support for keeping the correct variables guarded by the correct synchronization, such as atomic sets. Application-specific programming, such as stream programming or graphics, is, well, application-specific. Tools (debuggers, static analyzers, profilers) and libraries (JDK™ software concurrent utilities) are necessary but not sufficient. Where is the general-purpose concurrent programming model? This session’s speaker claims that we need another revolution in thinking about programs.
Experiences With Debugging Data Races »
With multicore systems becoming the norm, every programmer is being forced to deal with multi-CPU memory atomicity bugs: data races. Data race bugs are some of the hardest bugs to find and fix, sometimes taking weeks on end to deal with, even for experts. There are very few tools to help here (and these are mostly just academic implementations, FindBugs being a rare exception). This session’s speakers are at the forefront of multicore Java™ technology-based systems and have to debug data races daily. They have a lot of hard-won experiences with finding and fixing such bugs, and they share them with you in this session.
Towards a Scalable Non-Blocking Coding Style »
Nonblocking (NB) algorithms are something of a Holy Grail of concurrent programming--typically very fast, even under heavy load, they come with hard guarantees about forward progress. The downside is that they are very hard to get right. This session’s speakers have been working on writing some nonblocking utilities over the last year (open sourced on SourceForge in the high-scale-lib project) and have made some progress toward a coding style that can be used to build a variety of NB data structures: hash tables, sets, work queues, and bit vectors. These data structures scale much better than even the concurrent JDK™ software utilities while providing the same correctness guarantees. They usually have similar overheads at the low end while scaling incredibly well on high-end hardware.
The coding style is still very immature but shows clear promise. It stems from a handful of basic premises: You don’t hide payload during updates; any thread can complete (or ignore) any in-progress update; use flat arrays for quick access and broadest-possible striping; and use parallel, concurrent, incremental array copy. At the core is a simple state-machine description of the update logic.
Lock-Free Wait-Free Hash Table »
This session presents a totally lock-free hashtable with extremely low-cost and near perfect scaling. Readers pay no more than HashMap readers: just the cost of computing the hash, loading and comparing the key, and returning the value. Writers must use AtomicUpdate instead of a simple assignment but otherwise pay the same as readers. In particular, there is no required order between loads and stores; correctness is assured, no matter how the hardware orders memory operations.
A state-based technique demonstrates the correctness of the algorithm. This novel approach is very straightforward and much easier to understand than the usual “happens-before” memory-order-based reasoning.
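As a rough illustration of the reader and writer costs described above, here is a minimal Java sketch of a fixed-capacity, open-addressing table whose slots are claimed and updated with atomic operations. It is not the algorithm presented in the session: it has no resize or delete, it relies on the ordering guarantees of java.util.concurrent.atomic.AtomicReferenceArray rather than working under arbitrary memory ordering, and the class name FixedLockFreeMap and its methods are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative sketch only: fixed capacity, no resize, no removal.
final class FixedLockFreeMap<K, V> {
    private final AtomicReferenceArray<K> keys;
    private final AtomicReferenceArray<V> vals;
    private final int mask;

    FixedLockFreeMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new AtomicReferenceArray<>(capacity);
        vals = new AtomicReferenceArray<>(capacity);
        mask = capacity - 1;
    }

    // Reader path: compute the hash, load and compare the key, return the value.
    V get(K key) {
        int i = key.hashCode() & mask;
        for (int probes = 0; probes <= mask; probes++) {
            K k = keys.get(i);
            if (k == null) return null;          // empty slot ends the probe
            if (k.equals(key)) return vals.get(i);
            i = (i + 1) & mask;                  // linear probing
        }
        return null;
    }

    // Writer path: claim a key slot with compare-and-set, then swap in the value.
    // Returns the previous value, or null if there was none.
    V put(K key, V value) {
        int i = key.hashCode() & mask;
        for (int probes = 0; probes <= mask; probes++) {
            K k = keys.get(i);
            if (k == null) {
                // Keys are never removed, so a failed CAS means the slot now holds a key.
                k = keys.compareAndSet(i, null, key) ? key : keys.get(i);
            }
            if (k.equals(key)) return vals.getAndSet(i, value);
            i = (i + 1) & mask;
        }
        throw new IllegalStateException("table full");
    }
}
```

A reader that observes a claimed key before its value is stored simply sees null, which is indistinguishable from reading just before the put.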
Experiences with Debugging Data Races »
With multicore systems becoming the norm, every programmer is being forced to deal with multi-CPU memory atomicity bugs: data races. Data-race bugs are some of the hardest bugs to find and fix, sometimes taking weeks on end, even for experts. There are very few tools to help here (mostly just academic implementations). The speakers are at the forefront of multicore Java technology-based systems and daily have to debug data races. They have a lot of hard-won experiences with finding and fixing such bugs, and they share them with you in this BOF session.
As Java programs operate, they continually allocate objects of varying sizes in the heap in order to perform useful work. As some objects age and die, empty “dead” spaces appear in the heap at a rate that matches the allocation rate, and these spaces must be reclaimed and reused by newly allocated objects in order to sustain program execution.
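Garbage collection, in all forms, deals with locating, reclaiming, and reusing these empty spaces. This reuse has two main possible forms: in-place reuse, where new objects are allocated into the gaps left by dead ones, and compaction, where live objects are moved together so that the free space becomes contiguous.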
While the use of in-place-reuse can be effective for delaying compaction, the heap gets continually fragmented as objects of varying sizes come and go. Eventually, there will come a time where many small empty spaces are available, but a single object that is larger than each of the individual spaces cannot be allocated without de-fragmenting and compacting the heap.
Fragmentation, and the resulting need for compaction, is inevitable. This is best evidenced by the fact that every commercial garbage collector implementation in the enterprise Java world currently includes significant amounts of code that performs heap compaction.
Compaction and Pauses
Compaction can be a problematic operation for many Java Virtual Machines. Compaction requires live objects to be moved from one location to another in memory, and as a result all references to those objects must be tracked down and safely remapped such that they point to the object’s new location.
If even one object in the heap is moved, many references may need to be remapped. More importantly, each and every reference in memory must be correctly checked for potential remapping, and the remapping need must be handled safely before the program is allowed to continue executing.
With the exception of the Zing VM, virtually all commercial J2SE implementations available today perform this necessary compaction and remapping step with the program paused. Azul’s garbage collector is unique in the enterprise Java market in its ability to relocate objects and safely remap all references to them while the application execution is ongoing.
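The move-then-remap work described in this section can be illustrated with a toy model. The sketch below is in the simple paused style (nothing else runs while it executes) and is not how any particular JVM implements compaction. The heap is modeled as an array of slots where each object holds at most one reference, stored as a slot index; the class name ToyCompactor and this layout are assumptions made for the example.

```java
import java.util.Arrays;

// Toy model: refs[i] is the slot that object i refers to, or -1 for no reference.
// live[i] says whether slot i holds a live object. Live objects are assumed to
// refer only to live objects.
final class ToyCompactor {
    static int[] compact(int[] refs, boolean[] live) {
        int n = refs.length;

        // Step 1: decide where every live object moves (a forwarding table).
        int[] forward = new int[n];
        Arrays.fill(forward, -1);
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (live[i]) forward[i] = next++;
        }

        // Step 2: move each live object and remap the reference it carries.
        // Every reference is checked, even if its target did not move far.
        int[] compacted = new int[next];
        for (int i = 0; i < n; i++) {
            if (!live[i]) continue;
            int target = refs[i];
            compacted[forward[i]] = (target >= 0) ? forward[target] : -1;
        }
        return compacted;
    }
}
```

For example, with live objects in slots 0, 2, and 4 referring to slots 2, 4, and 0 respectively, the compacted heap has three slots whose references become 1, 2, and 0.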
Compaction, Response Time, and Practical Heap Size
When a JVM pauses for compaction, practical Heap Size becomes directly limited.
Since pausing an application for the duration of a compaction step is highly disruptive, to the point of apparent failure, compaction tends to drive the upper-bound on the amount of practical heap that an application can utilize.
The amount of time spent in compaction tends to be linear in the size of the heap. A larger heap means longer compaction times, and on a JVM that pauses for compaction, that means longer pause times. Most applications have some basic worst-case response time requirement, and these requirements, coupled with inevitable compaction pauses, end up dictating a limit on practical heap size.
Azul’s ability to concurrently compact the heap, and to allow the application to continue to execute while remapping is performed, allows applications to completely separate heap size from response time requirements. Applications that leverage the Zing VM can practically make use of tens or hundreds of gigabytes of memory without encountering compaction-related pauses, simultaneously sustaining scale and consistent response times.
Fragger is a heap fragmentation inducer, meant to induce compaction of the heap on a regular basis using a limited amount of CPU and memory resources.
The Fragger tool's purpose is to aid application testers in inducing inevitable-but-rare garbage collection events, so that they occur on a regular, more frequent, and more reliable basis. Doing so allows the characterization of system behavior, such as response time envelope, within practical test cycle times.
Fragger works on the simple basis of repeatedly generating large sets of objects of a given size, pruning each set down to a much smaller remaining live set, and increasing the object size between passes such that it becomes unlikely that new objects will fit in the areas freed up by objects released in a previous pass without some amount of compaction. Fragger ages object sets before pruning them down in order to bypass potential artificial early compaction by young generation collectors.
By the time enough passes are done such that the total allocated space roughly matches the heap size (although a much smaller percentage is actually alive), some level of compaction likely becomes inevitable.
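The pass structure described above can be sketched in Java roughly as follows. This is not Fragger's source code: the class name FragmentationSketch, the keepEvery pruning parameter, and the growth step between passes are assumptions chosen for illustration, and the aging of object sets before pruning is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "allocate a set, prune it, grow the object size, repeat".
final class FragmentationSketch {
    // Survivors stay reachable so the space around them cannot be reused as one block.
    private final List<byte[]> survivors = new ArrayList<>();

    // One pass: allocate `count` arrays of `size` bytes, keep every `keepEvery`-th one.
    // Returns how many objects survived this pass.
    int runPass(int count, int size, int keepEvery) {
        List<byte[]> set = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            set.add(new byte[size]);
        }
        int kept = 0;
        for (int i = 0; i < count; i += keepEvery) {
            survivors.add(set.get(i));
            kept++;
        }
        return kept; // the rest of `set` becomes garbage when this method returns
    }

    // Several passes with a growing object size, so that objects from a later pass
    // are unlikely to fit into the holes left by an earlier one.
    int run(int passes, int count, int initialSize, int keepEvery) {
        int size = initialSize;
        for (int p = 0; p < passes; p++) {
            runPass(count, size, keepEvery);
            size += size / 2 + 1; // hypothetical growth step, roughly 1.5x
        }
        return survivors.size();
    }
}
```

Calling new FragmentationSketch().run(3, 100, 64, 10) keeps 10 objects per pass, 30 in total, while the other 270 become garbage scattered between them.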
Fragger's resource consumption is completely tunable: it throttles itself to a tunable rate of allocation and limits its heap footprint to a configurable level.
When run with default settings, Fragger will occupy ~25% of the total heap space and allocate objects at a rate of 20 MB/sec. At these settings, compaction will usually occur within 2 minutes per GB of heap. Altering the target allocation rate, the heap occupancy ratio, or the number of passes in a compaction-inducing iteration will change the frequency with which compactions occur and the CPU percentage consumed by Fragger.
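To make the default figures concrete, the small Java helper below works through the arithmetic for a given heap size. The three constants are the defaults quoted above; the class and method names are assumptions for this example, and the compaction time is treated as the stated "usually within" bound rather than a guarantee.

```java
// Worked arithmetic for the default settings quoted in the text.
final class FraggerDefaults {
    static final double ALLOC_MB_PER_SEC = 20.0; // default allocation rate
    static final double HEAP_OCCUPANCY = 0.25;   // ~25% of total heap
    static final double MINUTES_PER_GB = 2.0;    // compaction usually within this bound

    // Heap space occupied by Fragger itself, in GB.
    static double footprintGb(double heapGb) {
        return heapGb * HEAP_OCCUPANCY;
    }

    // Approximate upper bound on time until a compaction is induced, in minutes.
    static double minutesToCompaction(double heapGb) {
        return heapGb * MINUTES_PER_GB;
    }

    // Total MB allocated by Fragger over that period at the default rate.
    static double allocatedMb(double heapGb) {
        return minutesToCompaction(heapGb) * 60.0 * ALLOC_MB_PER_SEC;
    }
}
```

For an 8 GB heap this gives a 2 GB footprint, compaction within about 16 minutes, and roughly 19,200 MB allocated by Fragger over that time, that is, 2,400 MB per GB of heap.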