Samsung 1GHz Hummingbird mobile CPU takes on Snapdragon

Jul 27, 2009

Samsung have announced their new 1GHz mobile processor core, the Samsung Hummingbird, an implementation of the ARM Cortex-A8 architecture built on a 45nm low-power process and developed jointly with Intrinsity.  The Hummingbird CPU promises not only strong media and data-crunching performance in mobile devices, but also low power consumption and - thanks to some creative re-use of existing technology - relatively low chip prices.

Hummingbird comes with 32KB each of data and instruction cache, an L2 memory cache whose size can be customized, and the ARM NEON multimedia extension.  With NEON, Hummingbird can promise hardware video encoding and decoding, 2D/3D graphics, audio/voice/speech processing and sound synthesis that's more than twice as powerful as previous ARM-based chips.

Samsung are now working on SoC (System-on-Chip) implementations of the Hummingbird, which will likely be positioned to take on Qualcomm's Snapdragon chipset, which also runs at 1GHz.  There is no word yet on how the two compare on price, however.

Press Release:

SAMSUNG AND INTRINSITY JOINTLY DEVELOP THE WORLD’S FASTEST ARM® CORTEX™-A8 PROCESSOR BASED MOBILE CORE IN 45 NANOMETER LOW POWER PROCESS

SEOUL, KOREA, AUSTIN, TEXAS, July 26, 2009 – Samsung and Intrinsity today jointly announced the industry’s fastest mobile processor core implementation of the dual-issue ARM® Cortex™-A8 processor architecture in 45 nanometer (nm) Low Power (LP), low leakage process technology. This Cortex-A8 implementation, code-named Hummingbird, delivers 2000DMIPS at 1GHz. The Hummingbird comes with 32KB each of data and instruction caches, an L2 Memory cache, the size of which can be customized, and an ARM® NEON™ multi-media extension. Performance and power consumption of the Hummingbird have been validated in silicon. SoC implementations using this core are under development.
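The headline figure can be restated as per-clock efficiency, a common way of comparing cores that run at different speeds. The short sketch below does that arithmetic using only the 2000 DMIPS and 1GHz numbers quoted above; the function name is illustrative and is not part of the announcement.

```python
def dmips_per_mhz(dmips, clock_mhz):
    """Normalize a Dhrystone MIPS score by clock frequency in MHz."""
    return dmips / clock_mhz


# Figures quoted in the press release: 2000 DMIPS at 1GHz (1000 MHz).
hummingbird_efficiency = dmips_per_mhz(2000, 1000)
print(f"Hummingbird: {hummingbird_efficiency:.1f} DMIPS/MHz")
```

In other words, the quoted numbers work out to 2.0 DMIPS per MHz.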

To achieve a 1GHz operating clock speed in 45nm LP process, the Hummingbird utilizes a semi-custom design flow which involves custom designed circuit/memory structures, a set of customized primitive cells, and the enhanced RTL FastCore® and Fast14® high-speed domino logic from Intrinsity in its implementation. A multi-Vdd / multi-frequency design methodology was also used to ensure the Hummingbird can run at a high speed even at the minimum supply voltage of 1.0V. The low power consumption and the high operating clock performance make the Hummingbird an ideal processor core for use in advanced mobile devices.

“The biggest challenge in mobile processor core design and implementation is to achieve high clock speed performance while keeping the power consumption low,” said Dr. Jae Cheol Son, Vice President, SOC Platform Development, System LSI Division, Samsung Electronics. “Collaboration between Samsung and Intrinsity combines the best design and implementation technologies in the industry in successfully meeting the aggressive performance and power consumption targets of the Hummingbird. Samsung’s forthcoming SOC products based on the Hummingbird will enable our customers to add many more advanced processing capabilities to their mobile products without sacrificing battery life.”

A highly effective synthesis flow which creates static logic with optimal timing and power was employed to generate the gate-level view of the Hummingbird. With this flow, standard cell gates are placed to minimize wire delay and maximize speed. Highly automated Vt and cell selection flows choose the best gates for speed while balancing power. Finally, a high performance physical design integration flow which includes automated and optimized bus routing, driving, and re-buffering is used to generate the final design.

According to Will Strauss, president of Forward Concepts, "The market for both stand-alone and embedded advanced cellphone application processors was $2 billion in 2008, and is expected to grow at 25 percent CAGR to the $6.1 billion level in 2013, with some growth even in recessionary 2009. Our forecast is that ARM's Cortex-A8 family could account for about half of the total market for mobile application processors by 2013, since ARM already has more than a dozen Cortex-A processor family licensees."

"A solution that can provide mobile devices with additional performance at lower power consumption is key to winning market share among mobile device consumers. The Hummingbird offers an advanced vehicle to achieve such capability," Strauss concluded.
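As a quick sanity check on the Forward Concepts forecast, the sketch below compounds the quoted $2 billion 2008 market at 25 percent a year through 2013, and also works backwards from the $6.1 billion figure to the growth rate it implies. The helper names are hypothetical, and treating 2008 to 2013 as five compounding periods is an assumption about how the forecast was built.

```python
def project(base, rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return base * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Assumption: 2008 -> 2013 is treated as five compounding periods.
periods = 2013 - 2008
market_2013 = project(2.0, 0.25, periods)   # in billions of dollars
rate = implied_cagr(2.0, 6.1, periods)

print(f"Projected 2013 market: ${market_2013:.1f} billion")
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the two quoted figures agree: 25 percent annual growth from $2 billion lands at roughly $6.1 billion after five years.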

According to Bob Russo, Intrinsity CEO, “Not only is it the fastest available Cortex-A8 processor in an LP technology on the market, but we believe it has the lowest leakage and dynamic power consumption of any high-end mobile processor core out there. Mobile device end-users want smoother video, faster gaming, and a longer battery life. Meeting these conflicting demands typically means building a new processor implementation from scratch. That can take as long as two or more years and hundreds of engineers – a very expensive proposition. Intrinsity’s FastCore solution could be available in as little as four months at a fraction of the cost. Cycle behavior changes are not viable because they require that all software and test suites be largely re-designed and introduce an unacceptably high level of risk. Add another year or more to the development time for that. Intrinsity has solved this problem by applying a semi-custom design flow and Fast14 technology to enhance a great core and potentially double its performance.”

Intrinsity’s Cortex-A8 processor-based FastCore embedded core is cycle-accurate and Boolean-equivalent to the original Cortex-A8 RTL specification. While most ARM processor cores are implemented with synthesized static logic and compiled SRAMs, the Hummingbird achieves the exceptional 1GHz clock rate in Samsung’s 45nm LP process technology through the use of a semi-custom design flow which strategically applies Intrinsity’s proprietary Fast14 one-of-N domino logic (NDL) technology as macros in the timing-critical paths of the Cortex-A8 RTL core. NDL provides low-latency conversion between domino logic and static logic, which allows NDL to be seamlessly applied to a standard cell synthesized design. NDL provides gates which are 25 to 50 percent faster than static logic gates.

