Virtualized systems development is a development methodology in which the actual hardware of a system is augmented by a virtual platform: a simulation model of the hardware running on a workstation or PC. The virtual platform can run the same binary software as the physical hardware, fast enough to be used as an alternative to physical hardware for software development. It also enables new ways of working and adds some useful tricks to the developer’s set of tools.
The hardware shift to multicore processors and multiprocessor systems calls for new software and systems development tools and methodologies. These must help developers transform their code into parallel applications and let systems take full advantage of the performance, power, and packaging advantages offered by multicore processor systems. Thus, the challenge confronting system architects and software developers today is to make software efficiently and safely parallel.
There are three main problems:
- Ensuring existing software keeps working (without taking advantage of multicore)
- Parallelizing existing software to get the performance and power consumption benefits of multicore parallel execution
- Creating new software that is parallel from the beginning
Debugging Parallel Software and Systems
The most obvious benefit of a virtual platform is that it provides superior debug and analysis features compared to physical hardware. Anyone who has ever developed code for an embedded board will appreciate the convenience of a virtual environment. You get a system that is not flaky due to hardware glitches, that offers better control over the target, faster communication, and conveniences like unlimited numbers of breakpoints. If the target freezes completely, you can stop it and check what happened. You can change system parameters like core counts, clock speeds and memory size, as well as network setups, with complete freedom and ease.
Two key benefits of a virtual platform are repeatability and reversibility, as illustrated in Figure 1. The key problem in finding and fixing bugs in parallel software is the lack of determinism in the execution of the software system. Every run of a program can exhibit a different order of events, and even very small changes to the system state or timing can result in a very different program execution. This complicates debugging greatly, as the very act of debugging a parallel program can make timing-sensitive bugs such as race conditions disappear or appear in a different place.
Note that repeatability does not mean that the behavior of a software program is constant. It just means that when running the same software from the same initial state with the exact same sequence of asynchronous inputs, the same execution sequence is seen. Variation can be programmed into a virtual platform to exercise the software, or made to happen by running the same application repeatedly or for a long time on the virtual platform.
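The repeatability property can be illustrated with a toy model. The sketch below is not a virtual platform. It is a hypothetical simulation of two threads performing a non-atomic read-modify-write on a shared counter, where a seeded pseudo-random scheduler picks the interleaving. The seed stands in for "initial state plus the exact sequence of asynchronous inputs", and all names (`worker`, `run`, `seed`) are assumptions made for illustration.

```python
import random

def worker(shared, increments):
    """A thread doing a non-atomic read-modify-write on shared['counter']."""
    for _ in range(increments):
        value = shared["counter"]      # read
        yield                          # a context switch may happen here
        shared["counter"] = value + 1  # write back, possibly losing another update
        yield

def run(seed, threads=2, increments=5):
    """Run the toy system once and return (schedule, final counter value)."""
    rng = random.Random(seed)          # all scheduling decisions derive from the seed
    shared = {"counter": 0}
    runnable = {tid: worker(shared, increments) for tid in range(threads)}
    schedule = []
    while runnable:
        tid = rng.choice(sorted(runnable))  # pick the next thread to run
        schedule.append(tid)
        try:
            next(runnable[tid])             # run it up to its next switch point
        except StopIteration:
            del runnable[tid]               # thread finished
    return schedule, shared["counter"]
```

Calling `run(7)` twice returns the same schedule and the same final counter value, so a lost update seen once can be reproduced at will. Calling `run` with different seeds corresponds to programming variation into the platform: it explores other interleavings, some of which may lose updates and some of which may not.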
Reversibility means that you can debug a program by running it backwards, observing the sequence of events that led up to some error. Even if a program has crashed or deadlocked, it is possible to back out of the error state and see what the system did on its way there. A breakpoint can be set on the most recent point at which some variable or memory location changed its value.
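As a rough illustration of backing up to the last change of a memory location, the following sketch journals every write to a toy memory and undoes writes, newest first, until it has undone the most recent change to a given address. The class and method names are hypothetical, values are assumed to be integers, and a real simulator may implement reverse execution quite differently (for example, with checkpoints and deterministic re-execution). The journal is only a stand-in for the idea.

```python
class RecordingMemory:
    """Toy memory that journals every write so execution can be rewound."""

    def __init__(self):
        self.cells = {}    # address -> current value
        self.journal = []  # entries: (step, address, old_value, new_value)
        self.step = 0      # number of writes executed so far

    def write(self, address, value):
        old = self.cells.get(address)  # None means "never written"
        self.journal.append((self.step, address, old, value))
        self.cells[address] = value
        self.step += 1

    def reverse_to_last_change(self, address):
        """Undo writes, newest first, stopping just before the last change
        to `address`. Returns that journal entry, or None if none exists."""
        while self.journal:
            step, addr, old, new = self.journal.pop()
            if old is None:
                self.cells.pop(addr, None)  # the cell did not exist before
            else:
                self.cells[addr] = old      # restore the previous value
            self.step = step
            if addr == address and old != new:
                return (step, addr, old, new)
        return None
```

After a sequence of writes, `reverse_to_last_change(0x10)` leaves the memory in the state it had just before address `0x10` last changed and reports which step performed that write, which is the information a reverse breakpoint on a variable provides.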
Another benefit of a virtual platform for multicore debugging is that the simulator can stop the execution of the entire system at any point in time. This means that it is possible to single-step code where processors communicate with each other without changing the behavior of the code, and that code running on other processors cannot swamp a stopped processor with data to process.
Virtual hardware also offers insight, in that any part of the system can be observed. Combined with OS awareness, the virtual hardware can show the processes running on the hardware, and a single process can be debugged wherever it runs on the multicore machine. Breakpoints can be set on any kind of activity, including device accesses, processor exceptions, and processor-to-processor communication in hardware.
System Execution Insight
A virtual platform can inspect and trace the execution of a software stack without instrumenting the program code or running a heavy-duty profiler on the target. Just as with debugging on a virtual platform, profiling and tracing therefore introduce no probe effect. Thus, if a software load is behaving strangely, we can apply reverse execution to get back to where the problem started, and profile or trace the exact execution that was causing problems. Another advantage of a virtual platform is that we have perfect synchronization between the cores, and all traces can be time-stamped without worrying about hardware jitter or clocks being out of sync.
Tracing can be applied at a number of levels, from hardware-level tracing of memory operations to determine data accesses and possible races, to profiling at the operating-system thread level, to profiling the execution of a multiple-board multi-processor distributed system to determine load balance.
Figure 2 shows an example of a load balance investigation. We are comparing two different ways to parallelize a packet-processing application running on a quad-core platform. In the program under test, one thread (the lowest-numbered thread, tid 50) starts the worker threads. The workers are either four symmetric threads (each of which processes one packet at a time through three steps) or three pipelined threads (each implementing one of the three processing steps, and handling all packets in sequence). Note that the virtual platform also let us inject the exact same traffic at the exact same time in both cases.
With the pipelined setup, we observed massive packet loss, and this profile clearly shows why. The pipelined setup achieves no parallelism at all, and the OS mostly runs it on a single core for the duration of the execution. The lower total time spent processing also shows that the program fails to handle most of the packets. In contrast, there is ample parallelism in the symmetric setup, and the OS spreads the execution across all four cores.
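A load balance profile of this kind can be derived from a scheduling trace with globally consistent timestamps. The sketch below assumes a hypothetical trace format of `(start_time, end_time, core, tid)` intervals and computes the fraction of the traced interval that each core was busy. The function name, the trace format, and the sample intervals are all made up for illustration. They are not the data behind Figure 2.

```python
from collections import defaultdict

def core_utilization(trace, num_cores):
    """Return the fraction of the traced interval each core spent running threads.

    trace: iterable of (start_time, end_time, core, tid) scheduling intervals,
    all time-stamped against one common clock.
    """
    trace = list(trace)
    if not trace:
        return [0.0] * num_cores
    begin = min(start for start, _, _, _ in trace)
    end = max(stop for _, stop, _, _ in trace)
    span = end - begin
    busy = defaultdict(float)
    for start, stop, core, _tid in trace:
        busy[core] += stop - start      # accumulate busy time per core
    return [busy[core] / span if span else 0.0 for core in range(num_cores)]

# Invented example traces (time units are arbitrary).
pipelined = [(0, 4, 0, 51), (4, 8, 0, 52), (8, 12, 0, 53)]
symmetric = [(0, 12, 0, 51), (0, 12, 1, 52), (0, 12, 2, 53), (0, 12, 3, 54)]
print(core_utilization(pipelined, 4))
print(core_utilization(symmetric, 4))
```

In the invented pipelined trace, the three stages run back to back on core 0, so only that core shows any utilization. In the invented symmetric trace, all four cores are busy for the whole interval. Because the virtual platform time-stamps all cores against one clock, per-core intervals can be compared directly without correcting for clock skew.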
Virtual platforms are well suited to testing the scalability and robustness of software on future and different hardware. On physical hardware, you are limited to the core counts and configurations available today. On a virtual platform, it is easy to do what-if analysis and test software on arbitrarily large systems, for example to check whether and how a software stack created for a dual-core platform scales up (or even keeps working) as the system moves to triple-core, quad-core, dozen-core, hundred-core, and beyond in future hardware generations. Varying the core clock frequencies, or using different frequencies for different cores, tends to provoke timing-related errors and deadlocks so that they can be found well ahead of time.
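Such a what-if analysis amounts to a sweep over platform configurations. The following sketch shows one possible shape for a sweep harness. It assumes a hypothetical callback `run_workload(config)` that boots the given configuration on the virtual platform, runs the software, and returns `True` on success. The configuration dictionary and all function names are illustrative assumptions, not the interface of any particular simulator.

```python
def configurations(core_counts, clocks_mhz):
    """Yield platform configurations: uniform clocks, plus one mixed-clock variant."""
    for cores in core_counts:
        for mhz in clocks_mhz:
            # Every core runs at the same frequency.
            yield {"cores": cores, "clock_mhz": [mhz] * cores}
        if cores > 1 and len(clocks_mhz) > 1:
            # Different frequencies on different cores, to perturb relative timing.
            mixed = [clocks_mhz[i % len(clocks_mhz)] for i in range(cores)]
            yield {"cores": cores, "clock_mhz": mixed}

def sweep(run_workload, core_counts, clocks_mhz):
    """Run the workload on every configuration and return those that failed."""
    failures = []
    for config in configurations(core_counts, clocks_mhz):
        if not run_workload(config):
            failures.append(config)
    return failures
```

A call such as `sweep(run_workload, [2, 3, 4, 12, 100], [400, 800])` would then report every core count and clock setup on which the software stack stops working. Because each run on the virtual platform is repeatable, any failing configuration returned by the sweep can be rerun under the debugger.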
Exploring the Multicore Design Space
The configurability of virtual platforms makes it possible to explore the system design space for applications on multicore hardware. For example, a software workload can be partitioned into one part that runs on general-purpose processor cores, another part that runs on digital signal processors (DSPs), and a third part that is implemented in hardware accelerators. With a virtual platform, such a partitioning can be quickly prototyped by configuring a platform with a certain number of processors of each type and quickly created models of the hardware accelerators.