eeTimes

New Power Plays

Achieving low power in the data center with 10GBASE-T

July 19, 2010 | Kamal Dalmia and Ramin Shirani | 222901223
Kamal Dalmia and Ramin Shirani of Aquantia examine a fully populated data-center rack as a platform from which to explore the impact that the choice of interconnect has on total power dissipation in a real-world environment.

The multi-rate capability and ease of use of 10GBASE-T connections are well known. Less well known is that it is also a low-power technology, building upon process advancements and device innovations.

This feature examines the power consumed by a fully populated rack deployed with 10GBASE-T and compares it to an equivalent fully populated rack deployed with a competing technology, SFP+ with direct-attach twinax copper cabling.

Certainly 10GBASE-T did not enter the market as the lowest-power physical-layer (PHY) technology. Implementing 10GBASE-T requires the PHY to perform extensive processing. However, the combination of ongoing process shrinks and innovations in implementation provides significant reductions in power. Consequently, the industry is moving from current-generation devices in the 6-W range to new-generation devices at the 40-nm process node that are projected to be around 3 W.

RJ45 connectors and twisted-pair cabling are commonly deployed at the edge of the network, and PCs and servers today use gigabit LAN-on-motherboard (LOM). 10GBASE-T has been defined and developed with this legacy in mind, and these efforts are now bearing fruit as LOM begins its transition to 10GBASE-T.

As used in the data center
The sweet spot for 10GBASE-T is at the edge, at the interconnect between the server and the first-level aggregation switch. Historically, these connections benefited from the flexibility of structured cabling and would find use over a wide distribution of cable lengths, from sub-1-meter patch cords, up to the full-spec length of 100 m.

Internet data centers introduce a nominally standard configuration for implementing server capacity into this otherwise ad hoc world. The fully populated rack, each with a pair of redundant top-of-rack (ToR) switches, is becoming a common increment of capacity. Such a fully populated rack would include the following elements:

  • 42RU rack
  • 40 1RU servers, each with two 10GE ports for uplinks to each of the two redundant ToR switches
  • 2 ToR 1RU switches, each with 40 10GE downlinks and 8 10GE uplinks
This fully populated rack is a good platform for exploring the impact that the choice of interconnect has on total power dissipation in a real-world environment. We use it to compare the power consumed when deployed with 10GBASE-T to the power consumed when deployed with a competing technology, SFP+ with direct-attach twinax copper cabling.

Fig 1: HP Extreme scale out


Data Center Mode

Innovative techniques, coupled with the advantages of advanced process nodes like 40 nm, deliver power consumption near 3 W per port. The emergence of the ToR switching topology makes this low power consumption mandatory, and new PHYs coming to market deliver much lower power for use within the rack.

The relationship between link length and power was recognized during development of the 10GBASE-T standard. Two operating modes were envisioned, one for the full-reach 100-m, 4-connector channel, and the other for a shorter link length to support modular switches deployed at the end of row. The rationale for this short-reach mode was to permit lower-power operation. The short-reach channel was defined by IEEE as 30 m of Cat6a twisted-pair cable with two in-line connectors.

Some developers went a step further. Aquantia pioneered the concept of a Data Center Mode™ in May 2009, to permit power to be reduced to its lowest operating range possible for interconnects within the rack (up to 10 meters).

During training, the 10GBASE-T PHY determines the quality of the link, cable reach, SNR, etc. The PHY, if designed for Data Center Mode, then has the ability to turn off any unnecessary set of cancellers, recognizing the environment to be within 10 m. As these cancellers constitute a majority of the power consumption, turning them off in this mode allows the part to shave off close to half the power. In this kind of innovative design, the part can also be “forced” into the Data Center Mode, allowing the system to be designed with a guaranteed reduced maximum power consumption.
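The three reach classes described above (full reach, short reach, and within-rack Data Center Mode) can be sketched as a simple selection rule. This is only an illustration: a real PHY decides from link-quality measurements taken during training rather than from a supplied length, and the names `ReachMode` and `select_reach_mode` are hypothetical, not part of any vendor API.

```python
from enum import Enum


class ReachMode(Enum):
    DATA_CENTER = "data center mode (within rack)"
    SHORT_REACH = "short reach"
    FULL_REACH = "full reach"


# Length limits in meters, taken from the text above.
DATA_CENTER_MAX_M = 10
SHORT_REACH_MAX_M = 30
FULL_REACH_MAX_M = 100


def select_reach_mode(estimated_length_m: float,
                      force_data_center: bool = False) -> ReachMode:
    """Pick the lowest-power mode whose reach covers the estimated length.

    force_data_center models the "forced" option mentioned in the text,
    where the system designer pins the part to its lowest-power mode.
    """
    if estimated_length_m <= 0 or estimated_length_m > FULL_REACH_MAX_M:
        raise ValueError("length outside the 100 m full-reach channel")
    if force_data_center or estimated_length_m <= DATA_CENTER_MAX_M:
        return ReachMode.DATA_CENTER
    if estimated_length_m <= SHORT_REACH_MAX_M:
        return ReachMode.SHORT_REACH
    return ReachMode.FULL_REACH


for length in (3, 25, 80):
    print(length, "m ->", select_reach_mode(length).value)
```

Within a rack, every server-to-ToR link falls in the first class, which is why the comparison that follows can assume the lowest-power mode throughout.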

In this comparison for power in a ToR topology, the Data Center Mode is leveraged for maximum energy efficiency.

Direct-attach SFP+ interconnect
One of the attributes people have come to expect with the SFP+ form factor is the ability of an open SFP+ cage to accept a variety of SFP+ modules, ranging from a direct attach twinax cable, to an SR or LR optical module. This flexibility becomes possible when the system employs an electronic dispersion compensation (EDC) chip in series with the SFP+ cage.

Such EDC chips provide the flexibility some customers desire. This flexibility does come with a cost though, both in terms of some dollar cost per port, as well as in terms of power. The power comparisons we make below will assume EDC devices consume one half watt of power per port.

From a cabling perspective, in contrast with the direct attach copper solution, the Cat6a UTP patch cord benefits from the reliability inherent in the lower bandwidth of the cabling, and the simplicity of the connector and its attachment.

Fig. 2 and Fig.3: Cisco Direct Attach copper cable (top), Cat6a UTP cable (bottom).



Total rack power

The power consumption of our example server will be a function of many things, including the number and speed of the processor cores, as well as the memory, storage and other features. For the purpose of this analysis we have used a nominal power of 300 W per server at full load, including the two 10GBASE-T ports. Our fully populated rack will require 40 x 300 W = 12,000 W to power the servers.

Fig. 4: Comparison of 10GBASE-T versus SFP+ Direct Attach


The power consumption of the ToR switches is a function of the features and capabilities designed into the switch. This comparison uses the value of 10 W per port for the 40 downlinks, plus the PHY power. The power consumed by the uplinks was excluded from the comparison. Each of our two ToR switches will consume 40 x 12 W = 480 W when deployed with 10GBASE-T, or 40 x 10.5 W = 420 W when deployed with SFP+ direct-attach copper.
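The server and switch figures above can be combined into a rack-level budget with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the constant and function names are illustrative.

```python
SERVERS = 40
SERVER_FULL_LOAD_W = 300.0    # per server, including its two 10GE ports
TOR_SWITCHES = 2
DOWNLINKS_PER_SWITCH = 40

# Per-downlink switch power including the PHY, as given in the text.
PORT_W = {"10GBASE-T": 12.0, "SFP+ direct attach": 10.5}


def switch_power_w(interconnect: str) -> float:
    """Downlink power of one ToR switch; uplinks are excluded, as in the text."""
    return DOWNLINKS_PER_SWITCH * PORT_W[interconnect]


def rack_power_w(interconnect: str) -> float:
    """Servers at full load plus both ToR switches."""
    return SERVERS * SERVER_FULL_LOAD_W + TOR_SWITCHES * switch_power_w(interconnect)


for name in PORT_W:
    print(f"{name}: switch {switch_power_w(name):.0f} W, rack {rack_power_w(name):.0f} W")

gap_w = rack_power_w("10GBASE-T") - rack_power_w("SFP+ direct attach")
print(f"gap: {gap_w:.0f} W ({gap_w / rack_power_w('10GBASE-T'):.1%} of the rack)")
```

Counting switch downlinks only, the gap is 120 W. Applying the same 1.5 W per-port difference to the 80 server-side ports as well would bring it to 240 W; treating the server ports that way is an assumption here, since the 300 W server figure is quoted with 10GBASE-T ports included.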

In an overall power budget of about 13 kW, the difference between the two interconnect choices is less than 2% of the total. This scenario is somewhat simplistic and assumes that other energy-efficiency features available in today’s PHYs are not used. Next, we look at the real-life usage environment and how innovative 10GBASE-T features reduce the overall power consumption to below that of not only SFP+ but also gigabit Ethernet for a given amount of bandwidth.

Enter utilization, virtualization and dynamic consolidation
The server power shown above reflects the power consumed when the system is at full load. But servers typically operate at full load for only short periods, otherwise operating at either light loads, or at idle.

Fig.5: Power as a function of server workload, from www.spec.org.



Power does not drop to zero for servers at active idle. Processors remain clocked, memories are active and disks are spinning. The amount of power consumed at active idle is implementation specific. To get an idea of actual power consumption, data was taken from www.spec.org, which shows power at active idle at about half of full-load power. Our analysis draws on this data and assumes active-idle power of 50% of full-load power: our servers are assumed to draw 300 W at full load and 150 W at idle.

Dynamic consolidation and Wake on LAN

Virtualization offers the ability to consolidate virtual machines in order to minimize the number of physical servers required. Initially the concept of virtualization was applied in a static manner, by combining otherwise independent applications on the same server. Applying virtualization in this manner saved both capital costs and power.

Virtualization can also be used to consolidate active virtual machines onto a minimum number of physical servers through a concept known as dynamic consolidation. The physical servers idled by this consolidation can then be placed in sleep mode.

Wake on LAN

“Wake on LAN” (WoL) is a technique that permits a network application to control networked devices going into, and coming out of, sleep mode. WoL was originally developed in the context of corporate IT, where client machines on desktops only needed to be active when they were attended by living, breathing humans but could be turned off at night and on weekends. However, IT needed access to the machines off-hours to perform maintenance and install updates. WoL permits a remote IT utility to wake a PC from sleep state so that IT can perform that maintenance.

WoL is ideal for use with dynamic consolidation in data center applications. WoL is used to wake networked elements based on the receipt of “magic packets”. A number of design details ease the implementation of WoL for 10GBASE-T and can potentially speed the adoption of dynamic consolidation in the market.
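For readers unfamiliar with magic packets, the sketch below builds one in the commonly used format: six 0xFF bytes followed by sixteen repetitions of the target's MAC address, typically sent as a UDP broadcast. The article does not describe the packet format; this reflects the general Wake on LAN convention, and the function names, the default broadcast address and the choice of UDP port 9 are assumptions made for illustration.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte magic packet payload for a MAC like '00:11:22:33:44:55'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Synchronization stream of six 0xFF bytes, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


packet = build_magic_packet("00:11:22:33:44:55")  # placeholder MAC address
print(len(packet), packet[:12].hex())
```

Because the sleeping controller only has to match this fixed pattern, it can do so at a low link rate with most of the system powered down.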

A key to successful deployment of WoL is minimizing the power consumed by the network controller while in sleep state. Some PHY manufacturers have designed into their products the ability to enter and exit low-power modes that facilitate WoL at the system level. The low-power mode for WoL leverages the auto-negotiation capabilities integral to copper PHY technology by permitting the link to perform its WoL monitoring function at the lower-power 100-megabit data rate. The sequence would be as follows:

1. The systems software that performs allocation of virtual machines determines that a server will be inactive long enough to be placed into sleep state, and issues a command to the target server to enter a sleep state.
2. The server breaks the current 10GBASE-T link, reconfigures for 100BASE-TX, and relinks as 100BASE-TX, then enters a sleep state.
3. Standby supplies provide Vaux, which provides just over 1 W of power per PCIe slot. The controller monitors the link for a magic packet with the PHY and controller in lower-power 100-Mb/s mode.
4. The controller detects a magic packet, and the server is woken up.
5. The 100BASE-TX link is broken and re-established as 10GBASE-T.
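The five steps above can be sketched as a small state model. This is a toy illustration of the sequence only; the `Server` class and its methods are hypothetical and do not correspond to any driver or PHY interface.

```python
class Server:
    """Toy model of one server's network link during the sleep/wake sequence."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.link = "10GBASE-T"
        self.asleep = False

    def enter_sleep(self) -> None:
        # Steps 1-2: on command from the VM allocation software, break the
        # 10GBASE-T link, relink at 100BASE-TX, then enter sleep state.
        self.link = "100BASE-TX"
        self.asleep = True

    def receive(self, is_magic_packet: bool) -> None:
        # Step 3: while asleep, the controller only watches for a magic packet.
        if not self.asleep or not is_magic_packet:
            return
        # Steps 4-5: wake the server, then break the 100BASE-TX link and
        # re-establish it as 10GBASE-T.
        self.asleep = False
        self.link = "10GBASE-T"


server = Server("node-07")              # hypothetical server name
server.enter_sleep()
server.receive(is_magic_packet=False)   # ordinary traffic does not wake it
server.receive(is_magic_packet=True)
print(server.asleep, server.link)
```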

The power reduction delivered by placing a server in sleep state is much larger than the power difference between 10GBASE-T and SFP+ PHYs. If we assume that the total power consumed by the server in sleep state is 3 W (standby power will include supply inefficiency and ancillary standby power consumption), this is a saving of 147 W per server over the active-idle power of 150 W that is otherwise consumed.

Fig. 6: Savings of 10GBASE-T and WoL versus number of servers in sleep state. 


The systems benefit of permitting servers to enter sleep state begins to accumulate quickly. Given that 10GBASE-T started with a 240-W power disadvantage over SFP+, 10GBASE-T will demonstrate lower overall systems power once the second server enters a sleep state.

These differences become more apparent as the number of servers entering sleep state increases. Of course, the role of dynamic consolidation and the number of servers in sleep state will be highly application dependent, but the benefits are large. As dynamic consolidation is deployed, statistics will become available on the weighted average of active versus idled servers.

If the average utilization of server capacity over the typical work week drops below 95%, 10GBASE-T enjoys a power advantage that will only increase as the average utilization decreases from the 95% threshold.
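The break-even point and the 95% threshold follow directly from the figures quoted above. The sketch below reproduces them, assuming (as the comparison does) that servers in the SFP+ rack remain at active idle; the variable names are illustrative.

```python
import math

SERVERS = 40
ACTIVE_IDLE_W = 150.0           # per server at active idle
SLEEP_W = 3.0                   # per server in sleep state
INTERCONNECT_PENALTY_W = 240.0  # initial 10GBASE-T disadvantage from the text

SAVINGS_PER_SLEEPING_SERVER_W = ACTIVE_IDLE_W - SLEEP_W


def net_savings_w(sleeping_servers: int) -> float:
    """Rack-level savings of 10GBASE-T with WoL once the penalty is subtracted."""
    return sleeping_servers * SAVINGS_PER_SLEEPING_SERVER_W - INTERCONNECT_PENALTY_W


# Smallest number of sleeping servers for which savings exceed the penalty.
break_even = math.floor(INTERCONNECT_PENALTY_W / SAVINGS_PER_SLEEPING_SERVER_W) + 1

# Share of servers still active at that point.
utilization_threshold = (SERVERS - break_even) / SERVERS

print(break_even, f"{utilization_threshold:.0%}")
for n in (1, 2, 10):
    print(n, "sleeping:", net_savings_w(n), "W")
```

With one server asleep the 10GBASE-T rack is still 93 W behind; with two it is 54 W ahead, which corresponds to 38 of 40 servers, or 95%, remaining active.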

Energy Efficient Ethernet
Further power savings for 10GBASE-T will be delivered as the Energy Efficient Ethernet (EEE) standard is fully implemented over the next few product generations. IEEE 802.3az is standardizing EEE in order to exploit the bursty nature of Ethernet traffic. While Ethernet traffic is intermittent by nature, copper PHYs operate in a continuous manner. 100BASE-TX, 1000BASE-T and 10GBASE-T are continually exchanging information, either data or pre-defined idle characters. This continuous operation permits the use of sophisticated DSP and analog signal processing to achieve high performance.

EEE creates the ability for the link to substitute a true idle when there is nothing to transmit. The system can present Low Power Idle (LPI) commands to the PHY, permitting the PHY to enjoy lower power for the duration of that LPI command.

The power savings will depend on the traffic density and on the device implementation. An example taken from a server with a gigabit uplink in use at LLNL is shown in Figure 7. Peak bandwidth approaches the full gigabit capacity. However, overall utilization is less than 1%, and there are gaps between the peaks where large files are being exchanged.

The power savings offered by EEE will be presented in two forms. The most obvious will be the reduction of PHY power during periods of LPI. Potentially more significant will be the systems level power reduction, as the concept of LPI extends beyond the PHY to the balance of the system.
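A rough way to reason about the PHY-level saving is a duty-cycle model: the PHY draws its active power while traffic is flowing and a lower power during LPI. The sketch below is a simplification that ignores the time and energy spent entering and leaving LPI. The 3 W active figure is the projected 40-nm PHY power quoted earlier, while the 0.3 W LPI figure is a purely hypothetical placeholder, not a measured or specified value.

```python
def average_phy_power_w(utilization: float, active_w: float, lpi_w: float) -> float:
    """Time-weighted PHY power for a link that is busy a fraction `utilization` of the time."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * active_w + (1.0 - utilization) * lpi_w


ACTIVE_W = 3.0   # projected 40-nm PHY power from the text
LPI_W = 0.3      # hypothetical LPI power, for illustration only

for utilization in (1.0, 0.5, 0.01):
    avg = average_phy_power_w(utilization, ACTIVE_W, LPI_W)
    print(f"utilization {utilization:.0%}: {avg:.2f} W average")
```

At utilization levels like the sub-1% example above, the average is dominated by the LPI term, so the achievable saving depends almost entirely on how low the implementation can push power during LPI.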


Fig.7: Actual GE link utilization. Source: LLNL

Examples of these benefits can be envisioned in both the server and the switching platforms. If the application layer is cognizant of a period of low utilization, it can implement a more aggressive power-savings policy within the server. Similarly, the switching platform may have resources queued to handle full bandwidth at all times. Advance notice of periods of reduced bandwidth requirement could permit more aggressive power-management strategies there as well.

Lower energy usage than gigabit Ethernet

10GBASE-T drives such low overall systems power that a comparison to 1000BASE-T is almost unfair. The appropriate analysis, of course, needs to be an apples-to-apples comparison with the same features within the controller adapter card. Today, a dual-port 1000BASE-T converged network adapter (CNA) for high-performance converged networks can consume 8 to 10 W. A similar dual-port 10GBASE-T adapter card with today’s technology will consume about 15 W. Compared in the form an end customer would actually buy, 10GBASE-T today has 40% of the per-bit power of 1000BASE-T.

Looking forward to the 40-nm node and beyond, the power in the PHY will decrease as a proportion of the overall I/O power in a converged adapter. Today, 10GBASE-T permits a 60% power savings, per gigabit of bandwidth, over 1000BASE-T in commercially available adapters. This power savings will continue to grow as the benefits of 10GBASE-T discussed in this paper come to market.

Cabling infrastructure considerations and conclusions
IEEE 802.3an defines worst case channels and certain limit lines on both Cat6 and Cat6a/Cat7 cables, which correspond to 55 m and 100 m respectively. As the limit lines are very close to the Shannon capacity of the channel, extra care has to go into the cabling deployment and field characterization.

Besides the standards published by TIA/EIA on cabling infrastructure specifications for support of 10GBASE-T, a number of resources are available from a variety of cable and test equipment manufacturers. In particular, the reader can find helpful data in the following:

* Testing 10Gb/s Performance of Cat6 and Cat6A Cabling System (Panduit)

* Copper and Glass: Securing the Foundation of your 10Gigabit Data Center (Fluke Networks)

 

The power of 10GBASE-T

The 10GBASE-T PHY, particularly in the new quad-port designs soon coming to market, is a great illustration of the power of Moore’s Law. Silicon-based implementations will benefit both from the effects of process shrinks, as well as from the benefits of circuit innovations. The advent of 40-nm-based devices is a critical element of seeing lower power in 10GBASE-T devices. Equally important are the innovations such as the Data Center Mode for connections within the rack.


About the Authors:

Ramin Shirani is vice president of engineering and co-founder of Aquantia. He has 23 years of experience in the communications IC industry and is an expert in UTP-based Ethernet PHYs, including low-power 1000BASE-T technologies. He has worked at National, was co-founder of Enable (acquired by Agere), and has served as GM at Lucent Micro.

Kamal Dalmia is vice president of sales and marketing at Aquantia. He has 20 years of experience in sales and marketing in the semiconductor industry, spanning three generations of Ethernet products, as well as Fibre Channel, InfiniBand, SONET and ATM. In the past, he has served at Cypress, PMC, Marvell and Teranetics.








All material on this site Copyright © 2009 - 2010 European Business Press SA. All rights reserved.
This site contains articles under license from EETimes Group , a division of United Business Media LLC.