Wednesday, September 08, 2010


Optimizing Embedded Designs for the Intel Atom Processor

Designers must understand intricacies involved in hardware, software, and end delivery to select the right Atom-based solution for their embedded devices.

Jamey Dobbins

Page 1 of 3

Compelling low-power, higher-performance embedded systems can now be realized by leveraging the advanced architecture and managed power characteristics of the IA-32-based Intel® Atom™ Processor. Performance computing features traditionally reserved for desktop- and server-class systems are now available to well-optimized embedded systems and, if properly implemented, can be achieved in concert with low-power operation.

These features include multithreading, Intel® Hyper-Threading Technology, virtualization, high-level macro-ops, and dedicated SSE3 math instructions. In addition, the Intel® Atom™ Processor can deliver sophisticated dynamic power management. This paper and presentation outline how these performance capabilities, coupled with an advanced power-managed system architecture, can be productively employed in optimized embedded systems.

Multithreading and Intel Hyper-Threading Technology
What is multithreading, and how can it be used to optimize embedded applications? In computing, multithreading means creating a new thread of execution within an existing process rather than starting a new process to perform a function. Essentially, multithreading is intended to make wiser use of computer resources by letting the resources a process already holds be used simultaneously by several threads of that same process.

How does this differ from multitasking, one might ask? In multitasking, a separate process is spawned or forked from the original process, and the forked process has a completely separate address space and process ID. Threads, on the other hand, share the address space and process ID of the process that created them. Creating a thread requires less overhead, as measured in machine time, than forking an entirely new process. Threads also have the advantage of sharing file handles: a file opened in the main process can be used through the same handle by all of the threads that process creates. Communication between threads is also much easier than inter-process communication.
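The shared address space, process ID, and file handle described above can be observed directly. The following is a minimal sketch in Python, chosen here only for brevity; the function name `run_shared_state_demo` and the use of a temporary file are illustrative assumptions, not part of the original article.

```python
import os
import tempfile
import threading


def run_shared_state_demo(num_threads=3):
    """Start threads that use a list and a file handle owned by the caller."""
    results = []             # ordinary object in the process's one address space
    lock = threading.Lock()  # serializes access to the shared list and file

    with tempfile.TemporaryFile(mode="w+") as shared_file:
        def worker(index):
            # Every thread sees the same list and the same open file handle.
            with lock:
                results.append((index, os.getpid()))
                shared_file.write(f"thread {index}\n")

        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Read back what the threads wrote through the shared handle.
        shared_file.seek(0)
        lines = shared_file.read().splitlines()

    return results, lines


if __name__ == "__main__":
    results, lines = run_shared_state_demo()
    print(results)
    print(lines)
```

Each worker reports the same process ID as the main thread, and no data is copied or sent between them. The lock matters because shared memory makes communication easy but also lets threads interfere with one another.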

Parallelism
The key to optimizing applications, embedded or not, is to promote parallelism, which is the simultaneous processing of different data or tasks. Parallelism is achieved through two models: "data" and "functional" decomposition. As the names imply, these two models represent very different methods of applying multiple threads to achieve higher performance within a single process. In data decomposition, the same independent operation is applied to different sets of data. In functional decomposition, independent pieces of work are performed on asynchronous threads.
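As one instance of data decomposition, a matrix product can be split so that each task computes a different output column. The sketch below assumes small matrices stored as lists of lists; the names `multiply_column` and `threaded_matmul` are hypothetical. In CPython the global interpreter lock prevents pure-Python arithmetic from running truly in parallel, so this shows the shape of the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_column(a, b, col):
    """Compute one output column of a x b, independent of all other columns."""
    return [sum(a[row][k] * b[k][col] for k in range(len(b)))
            for row in range(len(a))]


def threaded_matmul(a, b, num_threads=2):
    """Data decomposition: the same operation applied to different columns."""
    num_cols = len(b[0])
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so columns[c] is output column c.
        columns = list(pool.map(lambda c: multiply_column(a, b, c),
                                range(num_cols)))
    # columns[c][r] holds element (r, c); transpose back to row-major order.
    return [[columns[c][r] for c in range(num_cols)] for r in range(len(a))]


if __name__ == "__main__":
    print(threaded_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because no column depends on another, the tasks need no locking; only the final transpose touches all of the results.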

Examples of data decomposition include mathematical algorithms such as matrix multiplication, which can split the row-times-column computation into N threads, each computing a different column's output. Applications such as client-server communication, advanced 2D/3D graphics rendering with many characters or independent entities in the visible scene, complex HMIs, or video playback lend themselves to functional decomposition. In the client-server model, a new thread is created for each client that communicates with the server. A graphics display application could use a different thread to control the actions of each entity or rendering layer on the screen. During video playback, separate threads handle the audio, read data from the disk, and play the video.
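The video playback case can be sketched as three threads doing different jobs and handing data along through queues. This is a simplified illustration, not a real media pipeline: the `play` function, the `(kind, payload)` packet tuples, and the `None` end-of-stream sentinel are all assumptions made for the example.

```python
import queue
import threading


def play(packets):
    """Functional decomposition: reader, audio, and video run as separate threads."""
    audio_q, video_q = queue.Queue(), queue.Queue()
    played = {"audio": [], "video": []}

    def reader():
        # Stands in for reading from disk: route each packet by its type.
        for kind, payload in packets:
            (audio_q if kind == "audio" else video_q).put(payload)
        audio_q.put(None)  # sentinel: no more audio data
        video_q.put(None)  # sentinel: no more video data

    def consumer(name, q):
        # Stands in for the audio or video output stage.
        while True:
            item = q.get()
            if item is None:
                break
            played[name].append(item)

    threads = [
        threading.Thread(target=reader),
        threading.Thread(target=consumer, args=("audio", audio_q)),
        threading.Thread(target=consumer, args=("video", video_q)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return played


if __name__ == "__main__":
    print(play([("audio", "a0"), ("video", "v0"), ("audio", "a1")]))
```

Each thread performs a different function and blocks only on its own queue, so a slow disk read does not stall a stage that still has buffered data to play.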

Multi-core
Threads can run concurrently on multi-processor or multi-core hardware, yielding higher performance and increased system responsiveness. Multithreading is best suited to this type of system. Threads can, however, also run on single-core, uni-processor machines. This is achieved by a sleight-of-hand trick called timeslicing: each thread is scheduled to run on the processor for a given amount of time called a time-slice. The operating system's scheduler decides when and for how long each thread of execution is allowed to run on the processor. Schedulers use different algorithms that try to give each thread or process its share of processor time while also maintaining a good level of system responsiveness.
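Timeslicing can be illustrated with a toy round-robin scheduler. Round-robin is only one simple policy, used here as an assumption for the sake of the example; it is not a description of any particular operating system's scheduler, and the `round_robin` function and its abstract time units are hypothetical.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate timeslicing on a single processor.

    tasks maps a thread name to the time units of work it needs.
    Returns the execution order as a list of (name, units_run) tuples.
    """
    ready = deque(tasks.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(time_slice, remaining)   # run for one slice at most
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the ready queue.
            ready.append((name, remaining - run))
    return timeline


if __name__ == "__main__":
    for name, units in round_robin({"A": 3, "B": 5}, time_slice=2):
        print(f"{name} runs for {units} unit(s)")
```

With a time-slice of 2, threads A and B alternate on the single processor until each has finished, which is what gives the appearance of simultaneous execution.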

Intel® Hyper-Threading Technology
You have probably heard the term "Hyper-Threading" and wondered what it means. Hyper-Threading is Intel's name for its implementation of simultaneous multithreading. The technology essentially enables the operating system to behave as though it is controlling two processors, allowing two threads to run in parallel on separate "logical" processors within the same physical processor.

The operating system effectively sees two processors through a mix of shared, replicated and partitioned chip resources, such as registers, math units and cache memory. Intel® Hyper-Threading Technology can significantly improve performance on a supported processor by keeping the processor execution pipelines and resources busy and minimizing idle cycles. See diagram below:

Figure 1 - Intel® Hyper-Threading Technology
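Because the operating system sees logical processors, an application can size its thread pool from the count the OS reports. The sketch below uses Python's `os.cpu_count()`, which returns the number of logical CPUs in the system; the helper names are hypothetical, and the fallback to a single worker when the count is unknown is an assumption of this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def logical_processor_count():
    """Number of logical processors the OS reports (1 if it cannot be determined)."""
    return os.cpu_count() or 1


def run_on_all_logical_processors(work, items):
    """Run work(item) on a pool with one worker thread per logical processor."""
    with ThreadPoolExecutor(max_workers=logical_processor_count()) as pool:
        return list(pool.map(work, items))


if __name__ == "__main__":
    print("logical processors:", logical_processor_count())
    print(run_on_all_logical_processors(lambda x: x * x, [1, 2, 3]))
```

On a processor with Hyper-Threading enabled, the reported count typically includes both logical processors of each physical core, so the pool gets one worker per hardware thread.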

In the past, low-power embedded applications typically ran on small RISC-based or low-end x86 systems, with little or no multithreading exploited to improve performance. Newer technology like Intel's Atom™, which is architected for Intel® Hyper-Threading Technology, can run multithreaded operations very efficiently. This performance, coupled with low-power operation of only 2-3 watts for an entire Compute-On-Module, makes embedded multithreading a very viable solution for system optimization.

Introducing parallelism into embedded applications can now yield higher performance when coupled with hardware geared toward parallel computing. Keep an eye on multi-core processors as well, which are expected to move rapidly into the embedded systems market with low-power architectures. Embedded systems and applications designed with parallelism can quickly adopt and exploit the performance of future-generation multi-core processors.

Page 2: Virtualization
Page 3: Compiler Optimizations
