
Amazon AWS Servers’ secret weapon: Its custom-made hardware and network

Posted on October 26, 2022 by

Categories: AWS


Internet and software behemoths frequently develop and manufacture their own gear to boost productivity and gain a competitive edge.

Google builds its own servers from scratch and equips them with millions of Intel processors. In May, the company revealed that it had created an application-specific integrated circuit (ASIC) for use with neural networks. Facebook uses its own switches in its data centers.

However, the market leader in public clouds, Amazon Web Services, may have gone the furthest by creating its own high-speed network in addition to its own routers, processors, storage servers, and compute servers.

During a keynote address at the AWS re:Invent conference in November, James Hamilton, an AWS VP and Distinguished Engineer, spoke to software developers and hardware designers who work on NICs (network interface cards). “When you own the horizontal and vertical, we can move at our accustomed pace, make adjustments at our accustomed pace, and react to client needs at our accustomed pace. We consider this to be quite significant,” he said.

During the 1.5-hour geekfest, complete with close-up photographs of servers, racks, and cables, Hamilton claimed that the networking industry is being “chopped up into vertical stacks,” with various companies innovating and competing on those stacks. He compared this to the fate of the mainframe industry. According to him, this makes networking equipment more affordable for AWS to produce on its own, saving the company money.

“We have a staff dedicated to developing our own protocols, and we operate our own bespoke routers built to our requirements,” Hamilton said. “Expense was what led us down our own road, and while there is a significant cost (improvement), the major benefit is in reliability.” This specially developed equipment “has one requirement, from us, and we use judgment and keep it straightforward. As much fun as it would be to have a bunch of complex features, we just don’t do that, because we want it to be stable.”

He claimed that if AWS used standard commercial routers and a flaw appeared, even the most dedicated, serious vendor would take six months to fix it. “This is the wrong place to be. That is why we value where we are now.”

Even though “it looks like a stupid decision,” Hamilton added, AWS has decided to standardize on 25-gigabit Ethernet (25 GbE) as a fiber networking transfer speed. “I’ll defend this choice because I had a significant role in it.”

He pointed out that the industry standards are 10 GbE and 40 GbE. A 10 GbE link uses a single optical wave, while a 40 GbE link uses four waves, so its optics cost nearly four times as much. “Well, 25 gigs is practically as expensive as 10 gigs, therefore we can do 50 gigs for a lot less money (than 40 gigs). I think this is where the industry will wind up, and it’s undoubtedly the right answer from an optics perspective.”
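The optics argument can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that optics cost grows with the number of optical waves and that a 25 Gbit/s wave costs about the same as a 10 Gbit/s wave, as Hamilton suggests. The unit cost of 1.0 and the function name are hypothetical, not AWS figures.

```python
# Rough comparison of relative optics cost per link.
# Assumption (taken from Hamilton's remarks, not measured data): a 25 Gbit/s
# optical wave costs about the same as a 10 Gbit/s wave, and the optics cost
# of a link grows with the number of waves it uses.

COST_PER_WAVE = 1.0  # hypothetical unit cost, identical for 10G and 25G waves


def link_summary(waves: int, gbit_per_wave: int) -> tuple[int, float, float]:
    """Return (total Gbit/s, relative optics cost, cost per Gbit/s)."""
    total_gbit = waves * gbit_per_wave
    cost = waves * COST_PER_WAVE
    return total_gbit, cost, cost / total_gbit


links = {
    "10 GbE": link_summary(1, 10),
    "40 GbE": link_summary(4, 10),
    "25 GbE": link_summary(1, 25),
    "50 GbE": link_summary(2, 25),
}

for name, (gbit, cost, per_gbit) in links.items():
    print(f"{name}: {gbit} Gbit/s, relative cost {cost:.1f}, {per_gbit:.3f} per Gbit/s")
```

Under these assumptions, a two-wave 50 Gbit/s link needs half the optics of a four-wave 40 Gbit/s link while carrying more traffic.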

With 128 ports of 25 GbE and 7 billion transistors, the Broadcom Tomahawk ASIC used by AWS routers has a flow-through capacity of 3.2 terabits per second (Tbit/s). Hamilton held one up and declared, “These are genuine monsters.” He claimed that similar chips with 6 Tbit/s and 13 Tbit/s capacities are on the way, and that they will cost around the same.
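The capacity figure follows directly from the port count and the per-port speed. A minimal check, assuming decimal units (1 Tbit = 1,000 Gbit); the variable names are illustrative only:

```python
# Aggregate switching capacity from port count and per-port speed.
PORTS = 128          # ports on the ASIC, as quoted in the article
GBIT_PER_PORT = 25   # 25 GbE per port

total_gbit = PORTS * GBIT_PER_PORT
total_tbit = total_gbit / 1000  # decimal units: 1 Tbit = 1000 Gbit

print(f"{PORTS} ports x {GBIT_PER_PORT} Gbit/s = {total_tbit} Tbit/s")
```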

Software-defined networking, which enables network managers to modify and control network behavior through interfaces, is another critical component of AWS’s networking approach. It is part of moving processes as much as possible from software into hardware.

“Whenever you have a task that is really repetitious, you’re better off moving part of that down into hardware,” he added. “We had an obvious but essential insight somewhere about 2011.”

“People often remark, ‘Hey, the reason (AWS) had to switch to custom networking gear is that if you didn’t, you could never have the bandwidth we have in our data centers.’ That isn’t the case. I could use anyone’s equipment to offer you any bandwidth you desire. It’s pretty simple to accomplish. Do you know what’s challenging? Latency. That is how physics works. I tell software developers that the things they measure are in milliseconds (one-thousandth of a second). In hardware, they measure nanoseconds (one-billionth of a second) and microseconds (one-millionth of a second). Therefore, this is where we should go,” he said.

Additionally, AWS creates its own chipsets, branded “Annapurna Labs,” for use in “every server we deploy,” according to Hamilton. Amazon acquired the Israeli chipmaker Annapurna in January for a reported $350 million. This was the first time AWS explicitly stated that the company’s chips were being used.

“Do you think we work in the semiconductor industry?” Hamilton asked. He showed the chipset and said, “Not only are we producing hardware, but we developed this. This is a significant event. If the trends I mentioned before on hardware implementation and latency hold true, and I am relatively sure they will, we will be able to implement it in silicon.”

For its data centers, AWS utilizes power-switching equipment with customized firmware to ensure that the load is not interrupted and the facility continues to function even in the event of an internal malfunction. That, according to Hamilton, prevents issues like the 34-minute delay in Super Bowl coverage in 2013, or the airline that suffered a $100 million loss when backup generators were locked out during a switchover.

AWS’s more current bespoke storage servers carry 11 petabytes of data (a petabyte is one million GB) on 1,100 disks housed in a single standard-size 42U rack, up from the 8.8 PB in the 880-disk setup that Hamilton presented. He said it was against corporate policy to display the most recent model.
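The two rack figures can be compared by working out the capacity per disk each one implies. This is a back-of-the-envelope sketch using only the numbers quoted above and assuming decimal units (1 PB = 1,000 TB); the dictionary names are illustrative.

```python
# Per-disk capacity implied by the two rack configurations in the article.
TB_PER_PB = 1000  # decimal units

# name -> (rack capacity in PB, number of disks)
racks = {
    "older rack": (8.8, 880),
    "newer rack": (11.0, 1100),
}

# Round to two decimals to hide floating-point noise.
per_disk_tb = {
    name: round(pb * TB_PER_PB / disks, 2) for name, (pb, disks) in racks.items()
}

for name, tb in per_disk_tb.items():
    print(f"{name}: about {tb} TB per disk")
```

Both configurations work out to roughly 10 TB per disk, which suggests the extra capacity comes from fitting more disks into the rack, not from larger disks.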

Hamilton acknowledged that the company’s specialized compute servers, which each take up one slot in a rack, are sparsely populated. It turns out that this is done to increase thermal and power efficiency. What OEMs offer to customers is probably three, four, or five times denser than this, he said, and it is also less efficient. However, the OEMs make up for it in price.

The voltage regulators and power supplies on the AWS compute servers are more than 90% efficient. Because AWS spends millions of dollars on electricity, “if this power supply is 1 percent better, it starts to be a really intriguing amount.”

AWS’s data centers, which are spread across 16 regions globally, are connected by a private 100-Gb network with no third-party interconnection sites. “There are several parallel 100-Gb lines. Because we can withstand a connection failure, no one in this room will ever be impacted by the failure of a single link. That is how we design it. We’d be insane not to.”