New Power Plays
Achieving low power in the data center with 10GBASE-T
Kamal Dalmia and Ramin Shirani of Aquantia use a fully populated data-center rack as a platform to explore how the choice of interconnect affects total power dissipation in a real-world environment.
The multi-rate capability and ease of use of 10GBASE-T connections are well known. Less well known is that 10GBASE-T is also a low-power technology, building on process advancements and device innovations.
This feature examines the power consumed by a fully populated rack deployed with 10GBASE-T and compares it with an equivalent fully populated rack deployed with a competing technology: SFP+ with direct-attach twin-ax copper cabling.
Certainly, 10GBASE-T did not enter the market as the lowest-power physical-layer (PHY) technology, because implementing 10GBASE-T requires the PHY to perform extensive signal processing. However, the combination of ongoing process shrinks and innovations in implementation provides significant reductions in power. Consequently, the industry is moving from current-generation devices in the 6-W range to new-generation devices at the 40-nm process node that are projected to draw around 3 W.
RJ45 connectors and twisted-pair cabling are commonly deployed at the edge of the network, and PCs and servers today use gigabit LAN-on-motherboard (LOM). 10GBASE-T was defined and developed with this legacy in mind, and the fruits of that effort are now visible as LOM begins its transition to 10GBASE-T.
As used in the data center
The sweet spot for 10GBASE-T is at the edge, at the interconnect between the server and the first-level aggregation switch. Historically, these connections benefited from the flexibility of structured cabling and were used over a wide distribution of cable lengths, from sub-1-meter patch cords up to the full-spec length of 100 m.
Internet data centers introduce a nominally standard configuration for deploying server capacity into this otherwise ad hoc world. The fully populated rack, with a pair of redundant top-of-rack (ToR) switches, is becoming a common increment of capacity. Such a fully populated rack would include the following elements:
- 42RU rack
- 40 1RU servers, each with two 10GE ports, one uplinked to each of the two redundant ToR switches
- 2 ToR 1RU switches, each with 40 10GE downlinks and 8 10GE uplinks
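The rack inventory above can be captured in a few lines of Python to make the port arithmetic explicit. This is a minimal sketch: the `Rack` class and its field names are our own illustration, and the ToR uplinks are left out, as they are in the power comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rack:
    """Fully populated rack from the text; ToR uplinks are left out."""
    servers: int = 40            # 1RU servers
    ports_per_server: int = 2    # one 10GE port to each redundant ToR switch
    tor_switches: int = 2        # redundant 1RU top-of-rack switches
    downlinks_per_tor: int = 40  # 10GE server-facing ports per ToR switch

    def rack_units_used(self) -> int:
        # Servers and switches are all 1RU devices.
        return self.servers + self.tor_switches

    def server_links(self) -> int:
        # Number of server-to-ToR cables.
        return self.servers * self.ports_per_server

    def phy_count(self) -> int:
        # Each link has a PHY at both ends: one in the server, one in the switch.
        return 2 * self.server_links()

    def ports_match(self) -> bool:
        # ToR downlinks should exactly cover the server ports.
        return self.tor_switches * self.downlinks_per_tor == self.server_links()


rack = Rack()
print(rack.rack_units_used(), rack.server_links(), rack.phy_count(), rack.ports_match())
```

This prints `42 80 160 True`: the rack fills all 42RU, and its 80 server-to-ToR links put 160 PHYs in play, which is the count that matters when per-PHY power differences are multiplied out.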

Data Center Mode
Innovative techniques, coupled with the advantages of advanced process nodes such as 40 nm, deliver power consumption near 3 W per port. The emergence of the ToR switching topology makes low power consumption mandatory, and new PHYs coming to market deliver much lower power for links within the rack.
The relationship between link length and power was recognized during development of the 10GBASE-T standard. Two operating modes were envisioned: one for the full-reach 100-m, 4-connector channel, and the other for a shorter link length to support modular switches deployed at the end of the row. The rationale for this short-reach mode was to permit lower-power operation. The short-reach channel was defined by the IEEE as 30 meters of Cat6a twisted-pair cable with two in-line connectors.
Some developers went a step further. In May 2009, Aquantia pioneered the concept of a Data Center Mode™, which reduces power to the lowest possible operating range for interconnects within the rack (up to 10 meters).
During training, the 10GBASE-T PHY determines the quality of the link: cable reach, SNR and so on. A PHY designed for Data Center Mode can then turn off any unnecessary cancellers once it recognizes that the link is within 10 m. Because these cancellers account for the majority of the power consumption, turning them off in this mode lets the part shave off close to half its power. In such a design, the part can also be “forced” into Data Center Mode, allowing the system to be designed with a guaranteed, reduced maximum power consumption.
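The mode decision described above can be sketched as a small function. This is a hypothetical illustration, not Aquantia's firmware: `select_mode`, the `Mode` names and the idea of a single "estimated reach" input are assumptions, and a real PHY weighs SNR and other training measurements rather than length alone. The 10-m, 30-m and 100-m limits come from the text.

```python
from enum import Enum


class Mode(Enum):
    FULL_REACH = "full reach (up to 100 m)"
    SHORT_REACH = "IEEE short reach (up to 30 m)"
    DATA_CENTER = "Data Center Mode (up to 10 m)"


# Reach limits in metres from the text; real thresholds are implementation specific.
DATA_CENTER_MAX_M = 10.0
SHORT_REACH_MAX_M = 30.0


def select_mode(estimated_reach_m: float, supports_dc_mode: bool = True,
                force_dc_mode: bool = False) -> Mode:
    """Hypothetical post-training mode selection based on estimated cable reach."""
    if force_dc_mode:
        # Forced mode guarantees the reduced maximum power; the system designer
        # is responsible for keeping every link within the rack.
        return Mode.DATA_CENTER
    if supports_dc_mode and estimated_reach_m <= DATA_CENTER_MAX_M:
        # Short in-rack link: unneeded cancellers can be switched off.
        return Mode.DATA_CENTER
    if estimated_reach_m <= SHORT_REACH_MAX_M:
        return Mode.SHORT_REACH
    return Mode.FULL_REACH


print(select_mode(3.0).value)
print(select_mode(80.0).value)
```

Keeping the forced path separate from the measured path mirrors the two uses in the text: automatic power reduction when training finds a short link, and a hard power ceiling when the system is built that way by design.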
In this comparison for power in a ToR topology, the Data Center Mode is leveraged for maximum energy efficiency.
Direct-attach SFP+ interconnect
One attribute people have come to expect from the SFP+ form factor is the ability of an open SFP+ cage to accept a variety of SFP+ modules, ranging from a direct-attach twinax cable to an SR or LR optical module. This flexibility is possible when the system employs an electronic dispersion compensation (EDC) chip in series with the SFP+ cage.
From a cabling perspective, in contrast with the direct-attach copper solution, the Cat6a UTP patch cord benefits from the reliability inherent in the lower bandwidth of the cabling and from the simplicity of the connector and its attachment.


Total rack power

The power consumption of the ToR switches is a function of the features and capabilities designed into the switch. This comparison uses 10 W per port for the 40 downlinks, plus the PHY power (which implies 2 W per port for 10GBASE-T and 0.5 W per port for SFP+). The power consumed by the uplinks is excluded from the comparison. Each of our two ToR switches will therefore consume 40 × 12 W = 480 W when deployed with 10GBASE-T, or 40 × 10.5 W = 420 W with SFP+ direct-attach copper.
In an overall power budget of about 13 kW, the difference between the two interconnect choices is less than 2% of the total. This scenario is somewhat simplistic and assumes that the other energy-efficiency features available in today’s PHYs are not used. Next, we look at the real-life usage environment and at how innovative 10GBASE-T features reduce overall power consumption below not only that of SFP+ but also that of gigabit Ethernet for a given amount of bandwidth.
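The switch arithmetic above can be written as a short function. This is a sketch under the article's assumptions; the per-port PHY figures of 2 W and 0.5 W are not quoted directly in the text but are inferred by subtracting the 10 W base from the 12 W and 10.5 W per-port totals.

```python
BASE_W_PER_PORT = 10.0                        # switch power per downlink, excluding PHY
PHY_W = {"10GBASE-T": 2.0, "SFP+ DAC": 0.5}   # inferred from 12 W and 10.5 W per port
DOWNLINKS = 40                                # server-facing ports per ToR switch


def tor_switch_power(interconnect: str, downlinks: int = DOWNLINKS) -> float:
    """Downlink power of one ToR switch in watts; uplinks are excluded."""
    return downlinks * (BASE_W_PER_PORT + PHY_W[interconnect])


for tech in PHY_W:
    print(tech, tor_switch_power(tech), "W per switch")
```

Splitting the per-port figure into a base term and a PHY term makes it easy to see that only the PHY term differs between the two interconnects.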
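The "less than 2%" figure can be checked directly. This sketch assumes that each server-side PHY carries the same 1.5 W difference as a switch-side PHY, which is what reproduces the 240-W rack-level disadvantage used later in the article; the 13 kW budget is the article's approximate figure.

```python
PHY_DELTA_W = 12.0 - 10.5   # extra power per 10GBASE-T port vs SFP+ (article figures)
PHYS_PER_RACK = 160         # 80 links x 2 ends (server and switch)
RACK_BUDGET_W = 13_000      # "about 13 kW" overall rack budget

rack_delta_w = PHY_DELTA_W * PHYS_PER_RACK   # total 10GBASE-T penalty for the rack
share = rack_delta_w / RACK_BUDGET_W         # fraction of the whole rack budget

print(f"{rack_delta_w:.0f} W, {share:.1%} of the rack budget")
```

This prints `240 W, 1.8% of the rack budget`, consistent with the under-2% claim.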
Enter utilization, virtualization and dynamic consolidation
The server power shown above reflects the power consumed when the system is at full load. But servers typically operate at full load for only short periods, otherwise operating at either light loads, or at idle.

Power does not drop to zero for servers at active idle: processors remain clocked, memories are active and disks are spinning. The amount of power consumed at active idle is implementation specific. To estimate actual power consumption, we took data from www.spec.org, which shows active-idle power at about half of full-load power. Our analysis draws on this data and uses 50% of full-load power for active idle: our servers are assumed to draw 300 W at full load and 150 W at idle.
Dynamic consolidation and Wake on LAN
Virtualization offers the ability to consolidate virtual machines in order to minimize the number of physical servers required. Initially the concept of virtualization was applied in a static manner, by combining otherwise independent applications on the same server. Applying virtualization in this manner saved both capital costs and power.
Virtualization can also be used to consolidate active virtual machines onto a minimum number of physical servers, through a concept known as dynamic consolidation. The physical servers idled in this way can then be placed in sleep mode.
Wake on LAN
“Wake on LAN” (WoL) is a technique that lets a network application control networked devices going into and coming out of sleep mode. WoL was originally developed in the context of corporate IT, where client machines on desktops only needed to be active when they were attended by living, breathing humans and could be turned off at night and on weekends. However, IT needed access to the machines off-hours to perform maintenance and install updates. WoL lets a remote IT utility wake a PC from its sleep state so that IT can perform that maintenance.
WoL is ideal for use with dynamic consolidation in data-center applications. WoL wakes networked elements on receipt of a “magic packet”. Several implementation details ease the deployment of WoL for 10GBASE-T and could speed the adoption of dynamic consolidation in the market.
A key to successful deployment of WoL is minimizing the power consumed by the network controller while in the sleep state. Some PHY manufacturers have designed into their products the ability to enter and exit low-power modes that facilitate WoL at the system level. The low power for WoL leverages the auto-negotiation capabilities integral to copper PHY technology, letting the link perform its WoL monitoring function at the lower-power 100-Mb/s data rate. The sequence is as follows:
1. The system software that allocates virtual machines determines that a server will be inactive long enough to be placed into a sleep state, and issues a command to the target server to enter that state.
2. The server breaks the current 10GBASE-T link, reconfigures for 100BASE-TX, relinks as 100BASE-TX, and then enters the sleep state.
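The magic packet itself has a simple, widely documented layout: six bytes of 0xFF followed by the target's 6-byte MAC address repeated 16 times, 102 bytes in all. The sketch below builds that payload and checks whether a received frame contains one. The function names are our own, and real controllers perform this matching in hardware while the host sleeps; the code only illustrates the packet format.

```python
SYNC = b"\xff" * 6   # synchronization stream that starts every magic packet
REPEATS = 16         # the target MAC address appears 16 times


def build_magic_packet(mac: str) -> bytes:
    """Build the 102-byte magic-packet payload for a MAC like '00:11:22:33:44:55'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return SYNC + mac_bytes * REPEATS


def contains_magic_packet(frame: bytes, mac: str) -> bool:
    """True if the frame carries a magic packet for this MAC anywhere in its bytes."""
    return build_magic_packet(mac) in frame


pkt = build_magic_packet("00:11:22:33:44:55")
print(len(pkt))
```

Because the controller only has to find this fixed pattern anywhere in a frame, the payload is commonly carried as a UDP broadcast (ports 7 or 9 are typical), which keeps the monitoring logic simple enough to run from standby power.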
3. Standby supplies provide Vaux, which delivers just over 1 W of power per PCIe slot. The controller monitors the link for a magic packet, with the PHY and controller in the lower-power 100-Mb/s mode.
4. The controller detects a magic packet, and the server is ‘woken up’.
5. The 100BASE-TX link is broken and re-established as 10GBASE-T.
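The five steps above amount to a two-state machine on the server's network port. The sketch below models it under stated assumptions: `ServerPort`, `LinkState` and the log strings are hypothetical names, and link renegotiation is reduced to a log entry rather than a real auto-negotiation exchange.

```python
from enum import Enum, auto


class LinkState(Enum):
    ACTIVE_10G = auto()   # normal operation, 10GBASE-T link up
    SLEEP_100M = auto()   # server asleep, link renegotiated at 100BASE-TX on Vaux


class ServerPort:
    """Hypothetical model of the five-step WoL sequence."""

    def __init__(self, mac: bytes) -> None:
        self.mac = mac
        self.state = LinkState.ACTIVE_10G
        self.log = []

    def enter_sleep(self) -> None:
        # Steps 1-2: the orchestrator requests sleep; the link drops and relinks at 100 Mb/s.
        if self.state is LinkState.ACTIVE_10G:
            self.log.append("relink 100BASE-TX")
            self.state = LinkState.SLEEP_100M

    def on_frame(self, frame: bytes) -> None:
        # Steps 3-5: while asleep, watch for a magic packet and wake on a match.
        magic = b"\xff" * 6 + self.mac * 16
        if self.state is LinkState.SLEEP_100M and magic in frame:
            self.log.append("wake server")
            self.log.append("relink 10GBASE-T")
            self.state = LinkState.ACTIVE_10G


port = ServerPort(bytes.fromhex("001122334455"))
port.enter_sleep()
port.on_frame(b"\xff" * 6 + bytes.fromhex("001122334455") * 16)
print(port.state, port.log)
```

Frames that do not contain the magic packet leave the port asleep, which is what lets the server stay at standby power until the orchestration software actually needs it.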
The power reduction delivered by putting a server into the sleep state is much larger than the power difference between 10GBASE-T and SFP+ PHYs. If we assume that the server consumes 3 W in total while asleep (standby power includes supply inefficiency and ancillary standby consumption), sleeping saves 147 W per server compared with the 150 W otherwise consumed at active idle.
The system-level benefit of letting servers enter the sleep state accumulates quickly. Given that 10GBASE-T started with a 240-W power disadvantage relative to SFP+, 10GBASE-T will deliver lower overall system power once the second server enters a sleep state.
These differences become more apparent as the number of servers in the sleep state increases. Of course, the role of dynamic consolidation and the number of servers in the sleep state will be highly application dependent, but the benefits are large. As dynamic consolidation is deployed, statistics will become available on the weighted average of active versus idled servers.
If the average utilization of server capacity over a typical work week drops below 95%, 10GBASE-T enjoys a power advantage that only grows as average utilization falls further below that threshold.
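The break-even point follows from two of the article's numbers: 147 W saved per sleeping server and a 240-W rack-level penalty for 10GBASE-T. A minimal sketch, using the 3-W sleep-state figure assumed in the text:

```python
import math

IDLE_W = 150.0          # active-idle server power (50% of 300 W full load)
SLEEP_W = 3.0           # assumed total sleep-state power, incl. supply losses
RACK_PENALTY_W = 240.0  # 10GBASE-T PHY power disadvantage vs SFP+ for the rack

saving_per_server = IDLE_W - SLEEP_W                       # watts saved per sleeping server
breakeven = math.ceil(RACK_PENALTY_W / saving_per_server)  # whole servers needed

print(saving_per_server, breakeven)
```

This prints `147.0 2`: one sleeping server recovers 147 W, short of 240 W, while two recover 294 W, so the second server is the crossover point.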
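The utilization threshold can be derived the same way, treating "utilization" as the average fraction of the 40 servers that are awake. The comparison in the text implicitly credits the sleep-state savings only to the 10GBASE-T rack (the SFP+ rack's idle servers are assumed to stay at active idle); the sketch makes that assumption explicit. `net_advantage_w` is a hypothetical helper name.

```python
SERVERS = 40
SAVING_PER_SLEEPING_SERVER_W = 147.0   # 150 W idle minus 3 W sleep
RACK_PENALTY_W = 240.0                 # extra PHY power of the 10GBASE-T rack


def net_advantage_w(avg_active_fraction: float) -> float:
    """Rack power saved by 10GBASE-T over SFP+ (positive means 10GBASE-T is lower).

    Assumes idle servers in the 10GBASE-T rack sleep via WoL, while the
    SFP+ rack's idle servers remain at active idle.
    """
    sleeping = SERVERS * (1.0 - avg_active_fraction)
    return sleeping * SAVING_PER_SLEEPING_SERVER_W - RACK_PENALTY_W


# Active fraction at which the two racks draw equal power.
threshold = 1.0 - RACK_PENALTY_W / (SERVERS * SAVING_PER_SLEEPING_SERVER_W)
print(f"break-even at {threshold:.1%} average utilization")
```

This prints a break-even of about 95.9%, so below the article's rounded 95% threshold the advantage is already positive and grows linearly as utilization falls.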
Energy Efficient Ethernet
Further power savings for 10GBASE-T will be delivered as the Energy Efficient Ethernet (EEE) standard is fully implemented over the next few product generations. IEEE 802.3az is standardizing EEE in order to exploit the bursty nature of Ethernet traffic. While Ethernet traffic is intermittent by nature, copper PHYs operate continuously: 100BASE-TX, 1000BASE-T and 10GBASE-T are continually exchanging information, either data or predefined idle characters. This continuous operation permits the use of sophisticated DSP and analog signal processing to achieve high performance.
EEE lets the link substitute a true idle when there is nothing to transmit. The system can present Low Power Idle (LPI) commands to the PHY, allowing the PHY to run at lower power for the duration of each LPI command.
The power savings will depend on the traffic density and on the device implementation. An example taken from a server with a gigabit uplink in use at LLNL is shown in Figure 5. Peak bandwidth, reached when large files are being exchanged, approaches the full gigabit capacity. However, overall utilization is less than 1%, and there are gaps between these peaks.
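The first-order PHY saving from LPI is a time-weighted average of active and LPI power. The sketch below is a simplification: it ignores the time and energy spent entering and leaving LPI, which a real link must pay, and the figures in the example (3 W active, 0.5 W in LPI, 90% of time in LPI) are hypothetical, chosen only for illustration.

```python
def eee_average_power_w(active_w: float, lpi_w: float, lpi_fraction: float) -> float:
    """Time-weighted PHY power when the link spends lpi_fraction of its time in LPI.

    Ignores wake/sleep transition time and energy.
    """
    if not 0.0 <= lpi_fraction <= 1.0:
        raise ValueError("lpi_fraction must be between 0 and 1")
    return active_w * (1.0 - lpi_fraction) + lpi_w * lpi_fraction


# Hypothetical figures: 3 W active PHY, 0.5 W in LPI, link idle 90% of the time.
print(eee_average_power_w(3.0, 0.5, 0.9))
```

For a link like the one in the LLNL example, with utilization under 1%, the LPI fraction in practice is bounded less by traffic than by how often bursts force the link to wake, which is why the transition cost left out here matters in real designs.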
The power savings offered by EEE will be presented in two forms. The most obvious will be the reduction of PHY power during periods of LPI. Potentially more significant will be the systems level power reduction, as the concept of LPI extends beyond the PHY to the balance of the system.
Fig.7: Actual GE link utilization. Source: LLNL
Examples of these benefits can be envisioned in both the server and the switching platforms. If the application layer is aware of a period of low utilization, it can implement a more aggressive power-saving policy within the server. Similarly, the switching platform may keep resources queued to handle full bandwidth at all times; advance notice of periods of reduced bandwidth requirement could permit more aggressive power-management strategies.
Lower energy usage than gigabit Ethernet
10GBASE-T drives overall system power so low that a comparison with 1000BASE-T is almost unfair. The appropriate analysis, of course, must be an apples-to-apples comparison with the same features in the controller adapter card. Today, dual-port 1000BASE-T converged network adapters (CNAs) for high-performance converged networks can draw 8 to 10 W. A similar dual-port 10GBASE-T adapter card with today’s technology will draw about 15 W. In the form in which an end customer can make a comparison, the per-bit power of 10GBASE-T today will be 40% of that of 1000BASE-T.
Looking forward to the 40-nm node and beyond, the power in the PHY will decrease as a proportion of the overall I/O power in a converged adapter. Today, 10GBASE-T permits a 60% power saving per gigabit of bandwidth over 1000BASE-T in commercially available adapters. This saving will continue to grow as the benefits of 10GBASE-T discussed in this feature come to market.
Cabling infrastructure considerations and conclusions
IEEE 802.3an defines worst-case channels and limit lines for both Cat6 and Cat6a/Cat7 cabling, which correspond to 55 m and 100 m respectively. Because the limit lines are very close to the Shannon capacity of the channel, extra care must go into cabling deployment and field characterization.
Besides the standards published by TIA/EIA on cabling infrastructure specifications in support of 10GBASE-T, a number of resources are available from cable and test-equipment manufacturers. In particular, the reader can find helpful data in the following resources:
* Testing 10Gb/s Performance of Cat6 and Cat6A Cabling System (Panduit)
* Copper and Glass: Securing the Foundation of your 10Gigabit Data Center (Fluke Networks)
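The reach limits cited above translate into a simple planning check. This is a hypothetical helper: it encodes only the 55-m and 100-m figures from the text, and a length that passes is a necessary condition, not a sufficient one, since the installed channel must still be field-tested against the limit lines.

```python
# Maximum 10GBASE-T channel lengths cited in the text for IEEE 802.3an.
MAX_REACH_M = {"Cat6": 55, "Cat6a": 100, "Cat7": 100}


def within_spec_reach(category: str, length_m: float) -> bool:
    """True if the channel length is within the cited 802.3an limit for the category.

    Length alone does not guarantee a working link: the channel must still
    be characterized in the field (for example, alien crosstalk on Cat6).
    """
    try:
        limit = MAX_REACH_M[category]
    except KeyError:
        raise ValueError(f"unknown cable category: {category}") from None
    return 0 < length_m <= limit


print(within_spec_reach("Cat6", 70), within_spec_reach("Cat6a", 70))
```

This prints `False True`: a 70-m run needs Cat6a or Cat7, whereas in-rack links of a few meters fall well inside every category's limit.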
The power of 10GBASE-T
The 10GBASE-T PHY, particularly in the new quad-port designs soon coming to market, is a great illustration of the power of Moore’s Law. Silicon-based implementations will benefit both from process shrinks and from circuit innovations. The advent of 40-nm-based devices is a critical element in lowering power in 10GBASE-T devices. Equally important are innovations such as the Data Center Mode for connections within the rack.
About the Authors:
Ramin Shirani is vice president of engineering and co-founder of Aquantia. He has 23 years of experience in the communications IC industry and is an expert in UTP-based Ethernet PHYs, including low-power 1000BASE-T technologies. He has worked at National, was a co-founder of Enable (acquired by Agere), and has served as GM at Lucent Micro.
Kamal Dalmia is vice president of sales and marketing at Aquantia. He has 20 years of experience in sales and marketing in the semiconductor industry, spanning three generations of Ethernet products as well as Fibre Channel, InfiniBand, SONET and ATM. He has previously served at Cypress, PMC, Marvell and Teranetics.
This site contains articles under license from EETimes Group , a division of United Business Media LLC.


