Simulator Technology
Computer system simulation is a field of study in its own right, and simulation techniques range from high-level, abstract queuing models to extremely detailed, low-level RTL or even SPICE models. Each point in the spectrum trades accuracy for simulation speed and is therefore suited to a different set of goals. For example, queuing models allow a range of design parameters to be evaluated quickly, but only at the level of a rough approximation. In a co-design environment, this allows one to explore a design space quickly and focus on a few alternatives for further study. In contrast, co-verification tools use low-level, timing-accurate models to verify the hardware implementation. Their slow speed but high accuracy is justified because the cost of a mistake in the chip implementation can be prohibitive. In this section we briefly review the spectrum of simulator technologies as they apply to a co-development environment.
Native Execution
The fastest way to run a target application is simply to compile it for the host and run it natively. Of course, this is often of limited use in embedded systems because the application cannot make non-native OS calls or reference memory-mapped peripherals. Native execution also breaks the transparency rule described earlier, unless the host and target environments are very similar. Still, native execution can be used early in the development cycle to test higher-level algorithms (for example, compression algorithms) or to develop platform-independent subsystems such as the user interface.
Native execution can be extended in several dimensions to improve transparency by incorporating more of the target environment's features. For example, an RTOS emulation layer can be provided so that the application can make the system calls supported on the target. For the Cygnus eCos™ RTOS, this functionality has proved useful for device driver development. Native execution can also be extended to support memory-mapped peripherals by trapping on the foreign addresses and simulating the peripheral in the exception handler.
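The trap-and-simulate idea for memory-mapped peripherals can be sketched as follows. This is only an illustration under stated assumptions: a Python IndexError stands in for the host's memory fault, and the names Bus, UartModel, and the register layout are hypothetical, not part of any real tool.

```python
class UartModel:
    """Hypothetical peripheral: data register at offset 0, status at offset 4."""

    def __init__(self):
        self.sent = []

    def read(self, offset):
        # The status register always reports "transmitter ready" in this sketch.
        return 0x1 if offset == 4 else 0

    def write(self, offset, value):
        if offset == 0:
            self.sent.append(value & 0xFF)


class Bus:
    """Native RAM plus device models reached through a simulated trap."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.devices = []  # list of (base, size, model)

    def map_device(self, base, size, model):
        self.devices.append((base, size, model))

    def _handle_trap(self, addr):
        # Plays the role of the exception handler: find the device that
        # owns the faulting address (addresses are assumed non-negative).
        for base, size, model in self.devices:
            if base <= addr < base + size:
                return model, addr - base
        raise MemoryError(f"unmapped address {addr:#x}")

    def load(self, addr):
        try:
            return self.ram[addr]          # ordinary "native" access
        except IndexError:                 # the "trap" on a foreign address
            model, offset = self._handle_trap(addr)
            return model.read(offset)

    def store(self, addr, value):
        try:
            self.ram[addr] = value & 0xFF
        except IndexError:
            model, offset = self._handle_trap(addr)
            model.write(offset, value)


bus = Bus(ram_size=16)
uart = UartModel()
bus.map_device(0x1000, 8, uart)
bus.store(0x1000, ord("A"))   # lands in the peripheral model, not in RAM
```

Accesses inside RAM take the fast path and never reach the handler, so only the references to device addresses pay the cost of simulation.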
A third way to extend native execution is appropriate if the host processor is similar to, or is an architectural subset of, the target processor. This may be the case if both the host and target CPUs are, say, in the PowerPC family. In this case, the target processor's instruction set extensions can be simulated by trapping on the illegal instructions on the host processor and decoding them. Where this is possible, it restores transparency, because the host can now execute the target binaries directly.
Instruction Set Simulators
An instruction set simulator (ISS) performs the fetch-decode-execute cycle for the target CPU's instruction set on the host processor. The ISS typically includes all the target processor's registers and system state, and as such is a complete model of the target processor.
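A minimal sketch of the fetch-decode-execute loop is shown here for a toy 16-bit instruction set. The encoding (4-bit opcode, 4-bit destination register, 8-bit argument) and the five opcodes are invented for illustration and do not correspond to any real processor.

```python
# Hypothetical toy ISA: [15:12] opcode, [11:8] rd, [7:0] immediate or rs.
HALT, LDI, ADD, SUBI, BNZ = range(5)


def encode(op, rd=0, arg=0):
    return (op << 12) | (rd << 8) | (arg & 0xFF)


class ToyISS:
    def __init__(self, program):
        self.mem = list(program)   # instruction memory, one word per entry
        self.regs = [0] * 16       # the target's register file
        self.pc = 0
        self.halted = False
        self.retired = 0           # count of executed instructions

    def step(self):
        word = self.mem[self.pc]                                  # fetch
        op, rd, arg = word >> 12, (word >> 8) & 0xF, word & 0xFF  # decode
        self.pc += 1
        if op == HALT:                                            # execute
            self.halted = True
        elif op == LDI:
            self.regs[rd] = arg
        elif op == ADD:
            self.regs[rd] = (self.regs[rd] + self.regs[arg & 0xF]) & 0xFFFF
        elif op == SUBI:
            self.regs[rd] = (self.regs[rd] - arg) & 0xFFFF
        elif op == BNZ:
            if self.regs[rd] != 0:
                self.pc = arg
        else:
            raise ValueError(f"illegal opcode {op}")
        self.retired += 1

    def run(self):
        while not self.halted:
            self.step()


# Sum 5 + 4 + 3 + 2 + 1 into r0 using a countdown loop in r1.
SUM_PROGRAM = [
    encode(LDI, 0, 0),
    encode(LDI, 1, 5),
    encode(ADD, 0, 1),    # address 2: loop body
    encode(SUBI, 1, 1),
    encode(BNZ, 1, 2),
    encode(HALT),
]

iss = ToyISS(SUM_PROGRAM)
iss.run()
print(iss.regs[0], iss.retired)   # prints "15 18"
```

Because all target state lives in ordinary host data structures, a debugger built on such a loop can inspect or modify registers between any two instructions.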
There are many variants on the degree to which the internal details of the target processor are modeled. For example, the ISS may model the details of the pipeline and dispatch to multiple functional units. As another example, some ISSs will use native floating point instead of simulating the FPU of the target. As a third example, the target processor's caches may or may not be modeled. The degree to which these features are important depends on the application and on the maturity of the product. Early on, or for timing-insensitive applications, a high-level simulator that only mirrors the semantics of the target instruction set may be appropriate. For performance tuning, or for more detailed verification, a pipeline-accurate or cycle-accurate simulator with a cache model may be necessary. This is especially true of DSP and VLIW processors, where visibility into the internals of the architecture can be the most important feature of the co-development tool.
While native execution can yield simulation speeds in excess of 100 MIPS, instruction set simulators typically achieve a much more modest 1-5 MIPS. Since an ISS is a true model of the target processor, the simulator must use extensive instruction caching to achieve even this speed. More exotic techniques include instruction chaining, which avoids repeatedly fetching the instructions in a basic block, and Just-in-Time (JIT) translation, which maps target instructions to one or more host instructions to shorten the execute phase [2]. While these techniques increase speed, they can affect the visibility aspect of the co-development tool, since debugging capabilities may be somewhat reduced.
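One simple form of instruction caching is a cache of decoded instructions keyed by address, so that code inside a loop is decoded only once. The sketch below assumes an invented toy instruction set and no self-modifying code; the class and counter names are hypothetical.

```python
# Hypothetical toy ISA: [15:12] opcode, [11:8] rd, [7:0] immediate or rs.
HALT, LDI, ADD, SUBI, BNZ = range(5)


def encode(op, rd=0, arg=0):
    return (op << 12) | (rd << 8) | (arg & 0xFF)


def decode(word):
    return word >> 12, (word >> 8) & 0xF, word & 0xFF


class CachedISS:
    def __init__(self, program):
        self.mem = list(program)
        self.regs = [0] * 16
        self.pc = 0
        self.decode_cache = {}   # pc -> (op, rd, arg); assumes code is never rewritten
        self.decodes = 0         # how many times the decoder actually ran
        self.executed = 0

    def fetch_decoded(self):
        entry = self.decode_cache.get(self.pc)
        if entry is None:                     # miss: decode once and remember
            entry = decode(self.mem[self.pc])
            self.decode_cache[self.pc] = entry
            self.decodes += 1
        return entry

    def run(self):
        while True:
            op, rd, arg = self.fetch_decoded()
            self.pc += 1
            self.executed += 1
            if op == HALT:
                return
            if op == LDI:
                self.regs[rd] = arg
            elif op == ADD:
                self.regs[rd] = (self.regs[rd] + self.regs[arg & 0xF]) & 0xFFFF
            elif op == SUBI:
                self.regs[rd] = (self.regs[rd] - arg) & 0xFFFF
            elif op == BNZ:
                if self.regs[rd] != 0:
                    self.pc = arg
            else:
                raise ValueError(f"illegal opcode {op}")


# Countdown loop: the three-instruction body runs five times.
LOOP_PROGRAM = [
    encode(LDI, 0, 0),
    encode(LDI, 1, 5),
    encode(ADD, 0, 1),
    encode(SUBI, 1, 1),
    encode(BNZ, 1, 2),
    encode(HALT),
]

sim = CachedISS(LOOP_PROGRAM)
sim.run()
print(sim.executed, sim.decodes)   # prints "18 6"
```

Chaining and JIT translation go further by caching whole basic blocks or host code rather than single decoded instructions, which is also why they can make it harder to stop at an arbitrary target instruction.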
Hardware Simulators
This class of simulators executes HDL models of the hardware using event queues or cycle-based techniques. Even at the highest levels of behavioral abstraction, these simulators achieve speeds of only thousands of instructions per second (on the order of 0.001 MIPS). At the more detailed RTL and gate levels, simulation is much slower, around 1 instruction per second or less. In addition, high-level symbolic debugging is sometimes difficult because the models are expressed at such a low level.
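The event-queue technique can be illustrated with a very small discrete-event kernel. This is a sketch only: the EventSim class, the signal names, and the 2-unit inverter delay are assumptions made for the example, and real HDL simulators add delta cycles, multi-valued logic, and much more.

```python
import heapq


class EventSim:
    def __init__(self):
        self.now = 0
        self.values = {}      # current value of each signal
        self.queue = []       # heap of (time, sequence, signal, value)
        self.seq = 0          # tie-breaker that keeps same-time events in order
        self.listeners = {}   # signal -> callbacks sensitive to it
        self.trace = []       # (time, signal, value) for every real change

    def schedule(self, delay, signal, value):
        heapq.heappush(self.queue, (self.now + delay, self.seq, signal, value))
        self.seq += 1

    def on_change(self, signal, callback):
        self.listeners.setdefault(signal, []).append(callback)

    def run(self, until):
        while self.queue and self.queue[0][0] <= until:
            self.now, _, signal, value = heapq.heappop(self.queue)
            if self.values.get(signal) == value:
                continue      # no change means no downstream activity
            self.values[signal] = value
            self.trace.append((self.now, signal, value))
            for callback in self.listeners.get(signal, []):
                callback(self)


def inverter(sim):
    # Hypothetical gate model: y follows NOT a after 2 time units.
    sim.schedule(2, "y", 1 - sim.values["a"])


sim = EventSim()
sim.on_change("a", inverter)
sim.schedule(0, "a", 0)     # stimulus: a is 0 at t=0 ...
sim.schedule(10, "a", 1)    # ... and rises at t=10
sim.run(until=20)
print(sim.trace)
# prints [(0, 'a', 0), (2, 'y', 1), (10, 'a', 1), (12, 'y', 0)]
```

Work is done only when a signal actually changes. A cycle-based simulator instead evaluates the whole design once per clock, trading timing detail for speed.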
Nevertheless, a hardware simulator is often useful in a co-development environment. If the main processor and memory are modeled using ISS techniques, the HDL simulator can be used for the custom ASICs of the embedded product. Reusing the HDL models avoids having to write a separate high-level model of the hardware. Overall performance will degrade in proportion to the degree of interaction with the custom hardware. In something like a biomedical application with relatively slow sample rates, the simulation may be able to run at nearly full ISS speed. In general, a hardware simulator is the best verification tool, since the models it simulates will be used to generate the physical prototype.
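To get a feel for how interaction with the custom hardware affects throughput, one can use a rough back-of-the-envelope model. The model is our own simplifying assumption, not a measured result: a fraction of target instructions is assumed to proceed at the hardware simulator's rate, and the rest at the ISS rate. The function name and the sample rates, taken from the ranges quoted in this section, are illustrative.

```python
def effective_ips(iss_ips, hdl_ips, hw_fraction):
    """Estimate combined speed in target instructions per second.

    Assumes hw_fraction of the instructions are paced by the HDL
    simulator and the remainder run at full ISS speed.
    """
    if not 0.0 <= hw_fraction <= 1.0:
        raise ValueError("hw_fraction must be between 0 and 1")
    seconds_per_instruction = (1.0 - hw_fraction) / iss_ips + hw_fraction / hdl_ips
    return 1.0 / seconds_per_instruction


# Example: a 2 MIPS ISS coupled to a 1000 instructions/second HDL model.
for fraction in (0.0, 0.0001, 0.001, 0.01):
    print(fraction, round(effective_ips(2e6, 1e3, fraction)))
```

Under this assumption, even one instruction in a thousand touching the HDL model cuts throughput to roughly a third of the ISS speed, which is why applications with infrequent hardware interaction benefit most from the combined approach.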