A System-Level Framework for Energy and Performance Estimation in System-on-Chip Architectures

Abstract: Shifting the design entry point up to the system level is the most important countermeasure adopted to manage the increasing complexity of SoCs, because decisions taken at this level, early in the design cycle, have the greatest impact on the final design in terms of performance, energy efficiency and silicon area. However, taking decisions at this level is very difficult, since the design space is extremely wide, and it has so far been a mostly manual activity. Efficient system-level estimation tools are therefore needed to enable proper design-space exploration and the development of system-level synthesis tools. The main contribution of this thesis is an efficient approach to system-level estimation, organized in three layers. The bottom layer consists of a library of IP energy and performance models, in which each IP functionality is pre-characterized. Characterization is done only once, at the gate level, which gives the approach its high accuracy. The implementation of an energy and performance model for a Leon3 processor is reported as an example. The impact of the IP-to-IP communication infrastructure on individual IP properties is also taken into account, for both bus-based and NoC-based architectures. The intermediate layer is where the actual estimation takes place. At this level, applications are run and profiled on a development host (a common PC). This produces a trace of the executed source code, which is then mapped to the assembly code of the target architecture. In this way a trace of target instructions is built indirectly, which gives the whole methodology its high speed. Once the target trace is inferred, energy and performance figures are extracted using the IP models from the bottom layer. To make this process possible, the GNU GCC compiler is modified. Estimation results are shown for a few common image/video codec applications.
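The estimation step of the intermediate layer can be pictured with a small sketch. It assumes that host profiling yields an execution count per source basic block, that the modified compiler reports the target instructions generated for each block, and that the bottom-layer IP model reduces to a per-instruction table of average energy and cycle cost. The names InstrCost and estimate and all numeric values are hypothetical, and the SPARC mnemonics only illustrate a Leon3-like target; the thesis's actual model and data formats are not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrCost:
    """Hypothetical per-instruction entry of a pre-characterized IP model."""
    energy_nj: float  # average energy per execution, in nanojoules
    cycles: int       # latency in clock cycles (no cache or bus effects)


def estimate(block_counts, block_to_target, cost_table, clock_hz):
    """Return (energy_nj, cycles, seconds) for one profiled run.

    block_counts:    source basic-block id -> execution count from host profiling
    block_to_target: source basic-block id -> target mnemonics compiled for it
    cost_table:      target mnemonic -> InstrCost
    """
    # Infer the target trace as instruction occurrence counts:
    # every execution of a source block implies its target instructions.
    instr_counts = Counter()
    for block, count in block_counts.items():
        for mnemonic in block_to_target[block]:
            instr_counts[mnemonic] += count

    # Weight each instruction by its characterized energy and latency.
    energy = sum(cost_table[m].energy_nj * n for m, n in instr_counts.items())
    cycles = sum(cost_table[m].cycles * n for m, n in instr_counts.items())
    return energy, cycles, cycles / clock_hz


# Illustrative numbers only; not characterization data from the thesis.
costs = {
    "ld": InstrCost(1.2, 2),
    "add": InstrCost(0.5, 1),
    "st": InstrCost(1.0, 2),
    "bne": InstrCost(0.6, 1),
}
counts = {"B0": 1, "B1": 100}  # B1 is a loop body executed 100 times
mapping = {"B0": ["ld", "add"], "B1": ["ld", "add", "st", "bne"]}

energy, cycles, seconds = estimate(counts, mapping, costs, clock_hz=50e6)
print(round(energy, 1), cycles)  # 331.7 603
```

This sketch ignores pipeline interlocks, caches and bus contention; in the approach described here, caches and contention are handled by the top layer.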
The top layer is a refinement layer that accounts for caches and for the fact that multiple applications normally run concurrently, share the same resources and are controlled by an operating system. Statistical models are built to capture the impact of each of these components. An MPSoC hosting up to 15 processors, with both fixed-priority and round-robin bus arbitration, is used to model bus contention. The RTEMS operating system is taken as the reference for modeling the OS impact. Each layer is validated separately. The results show that the approach achieves accuracy within 15% of gate level and an average speed-up of 30x over transaction-level modeling (TLM).
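The form of the refinement layer's statistical models is not given in this abstract. Purely as an assumption, the sketch below uses a simple additive form: each cache miss is taken to reach the shared bus and to pay a miss penalty plus a mean arbitration wait, and OS activity is added as a fixed cycle count. The function names, the linear form and all numbers are hypothetical. round_robin_worst_wait computes the usual worst-case bound for round-robin arbitration with single-transfer grants and serves here only as a sanity check on the assumed mean wait.

```python
def round_robin_worst_wait(n_masters: int, transfer_cycles: int) -> int:
    """Worst-case arbitration wait for one request under round-robin.

    Assumes each grant covers a single transfer of transfer_cycles, so
    every other master can be served at most once before this request.
    """
    return (n_masters - 1) * transfer_cycles


def refine_cycles(base_cycles, mem_accesses, miss_rate, miss_penalty,
                  mean_bus_wait, os_cycles):
    """Hypothetical additive refinement of an uncontended cycle estimate.

    Every cache miss is assumed to reach the shared bus, paying the miss
    penalty plus a statistically estimated mean arbitration wait; OS
    activity (scheduling, context switches) is added as a lump sum.
    """
    misses = mem_accesses * miss_rate
    return base_cycles + misses * (miss_penalty + mean_bus_wait) + os_cycles


# Illustrative numbers: 15 bus masters, 4-cycle transfers.
bound = round_robin_worst_wait(15, 4)  # 56 cycles
mean_wait = 10                         # assumed, e.g. fitted from simulation
assert mean_wait <= bound
print(refine_cycles(603, 200, 0.1, 20, mean_wait, 150))  # 1353.0
```

A fixed-priority arbiter offers no such bound for low-priority masters, so under this assumption its mean wait would have to be fitted per priority level rather than checked against a closed-form limit.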
