Section 1: What Is a Datacenter, Really?

A datacenter is a physical facility that centralizes an organization's shared IT operations and equipment - servers, storage, networking gear - along with all the infrastructure needed to keep that equipment running continuously: power delivery, cooling, physical security, and network connectivity.

Key to this concept is centralization. Before datacenters, every company ran its own servers in a server room, a closet, or under someone's desk. Centralizing created economies of scale: one power feed, one cooling plant, one security perimeter, one operations team, all serving thousands of machines.

A Brief History of Datacenters

1945-1960s: The first "computer rooms" were built for mainframes. The ENIAC filled an 1,800-square-foot room and needed dedicated power and cooling. IBM's System/360 later popularized the raised-floor computer room, a design still used today.

1970s-1980s: Minicomputers shrank the footprint but the concept was still the same: a controlled environment for sensitive machines. "Data processing centers" became standard in banks, airlines, and the government.

1990s: The internet changed everything. The dot-com boom drove demand for server colocation - companies needed to house web servers somewhere reliably connected to the internet. Thus, the modern commercial datacenter was born.

2000s-2010s: Hyperscalers emerged. Google opened its first purpose-built datacenter in The Dalles, Oregon, in 2006. Amazon launched AWS the same year, and Azure followed in 2010. These companies stopped buying commodity datacenters and started designing their own from the ground up.

2020s: AI compute demands are reshaping the entire industry. A single H100 GPU draws 700W, and a rack of them draws 50-80 kW. Datacenters that were designed for 10 kW per rack cannot host these racks without major power and cooling retrofits.

Section 2: Power Infrastructure

Power is the single most important resource in a datacenter. Everything else is downstream of power, which means that a datacenter that loses power loses everything.

The power chain goes like this:

Utility grid -> Substation -> Main Switchgear -> UPS -> PDU -> Rack -> Server PSU -> CPU/GPU

If some of those acronyms don't make sense, don't worry. We'll cover everything over the course of this guide.

Step 1: Utility Feed

Large datacenters typically negotiate with utility providers for their own dedicated high-voltage feeds (13.8 kV for smaller datacenters to hundreds of kilovolts for hyperscalers). A hyperscale facility might draw anywhere from 100 to 500 MW, which is comparable to a small city. That's a pretty insane number.

Utility feeds usually come from different substations. If one feed fails (due to a downed transmission line, a substation fire, etc.), the second takes over via automatic transfer switches, with the UPS covering the brief switchover.

The key metric to think about for the utility feed is utility availability. In most regions, it's 99.9%-99.99%, which translates to roughly 53 minutes to 8.8 hours of potential downtime per year from the grid alone.
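The conversion from an availability percentage to yearly downtime is simple arithmetic, and it's worth seeing once. Here is a minimal sketch; the function name `annual_downtime_hours` is just an illustrative choice, and the year is assumed to be 8,760 hours long.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours per year a feed is unavailable at a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available -> {hours:.2f} h/year ({hours * 60:.0f} min)")
```

Each extra "nine" cuts the expected downtime by a factor of ten, which is why the gap between 99.9% and 99.99% is the gap between a working day and under an hour.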

Step 2: Uninterruptible Power Supplies (UPS)

The UPS serves two purposes: it bridges the time gap during a utility failure until the generator starts (typically 10-30 seconds) and it conditions the power (filtering spikes, sags, and harmonics that damage equipment).

UPS Topologies

Double-conversion (online): All power flows through the UPS continuously (AC to DC, then back to AC), and the battery is always in the circuit. This provides almost perfect power conditioning, and it's used in the vast majority of critical applications. The two conversion stages cost some efficiency (typically around 95%).

Line-interactive: UPS only activates on failure. More efficient, but less protective. This topology is typically used in edge and smaller deployments.

Static transfer switch (STS): Transfers between two power feeds in < 4ms. It's not a true UPS, but it's used in 2N architectures (N, N+1, 2N, and 2N+1 are covered under Redundancy Models below).

Battery Technology

Traditional VRLA (valve-regulated lead-acid) batteries are heavy, bulky, and degrade in heat.

Lithium-ion UPS systems are increasingly common: smaller footprint, longer life, better performance in warm environments, but 2-3x the upfront cost.

Some hyperscalers (notably Google and Microsoft) are experimenting with flywheel UPS systems - spinning steel wheels that store kinetic energy, with near-zero maintenance and 20+ year lifespans.

Redundancy Models

N: exactly the capacity you need. Any failure causes an outage.

N+1: one extra unit, so any single failure is covered.

2N: two complete, independent systems. Either can carry the full load alone.

2N+1: two complete systems plus one extra. The most resilient, but also the most expensive.
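To make the four models concrete, here is a small sketch that counts how many units each one needs. The helper `units_required` and the example figures (a 1,000 kW load served by 250 kW UPS modules) are assumptions for illustration, not numbers from a real facility.

```python
import math


def units_required(load_kw: float, unit_kw: float, model: str) -> int:
    """Number of units to install under a given redundancy model."""
    n = math.ceil(load_kw / unit_kw)  # N: just enough units to carry the load
    counts = {
        "N": n,
        "N+1": n + 1,        # one spare unit
        "2N": 2 * n,         # two complete, independent systems
        "2N+1": 2 * n + 1,   # two complete systems plus one spare
    }
    return counts[model]


for model in ("N", "N+1", "2N", "2N+1"):
    print(model, units_required(1000, 250, model))
```

The jump from N+1 to 2N roughly doubles the equipment you buy, which is where most of the cost difference between the models comes from.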

Step 3: Generators

Diesel generators are the primary backup for extended outages. They take 10-30 seconds to start and reach stable output, which is why the UPS batteries exist (to bridge that gap).

Sizing: Generators are typically sized at 125% of the critical load to handle startup surges.

Runtime: Most facilities stock 24-72 hours of diesel on-site. During major disasters, like Hurricane Sandy in 2012, some facilities in New York ran for 7+ days on diesel while waiting for restoration.

Fuel Logistics: Hyperscalers maintain long-term fuel contracts with priority delivery.

Alternative/Emerging Approaches: Microsoft is piloting hydrogen fuel cells as generator replacements. On-site solar and battery systems are also being deployed, although at scale they can offset (but not replace) grid dependency.

Step 4: Power Distribution Units (PDUs)

PDUs step voltage down from the generator/UPS output to rack-level voltage and distribute it to individual racks via breaker panels.

A few different types of PDUs:

Floor-mount PDUs: serve multiple racks in a row.

Rack-mount PDUs: sit inside individual racks, measure per-outlet power draw in real time, and can remotely cycle power to individual servers.

Dual-corded power: Critical servers have two power supply units (PSUs), each connected to a separate PDU on a separate circuit, fed from separate UPS strings. This is the "dual-corded" architecture: the server stays on as long as at least one PSU has power.

The key insight here: The dual-corded design means a PDU failure only degrades a server (it runs on one PSU) rather than killing it. Only a simultaneous failure of both PDUs takes a dual-corded server down. This is why the N+1 vs. 2N distinction matters so much at the PDU level.
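The dual-corded rule boils down to a tiny truth table. The sketch below encodes it; the function name `server_state` and the three state labels are made up for this example.

```python
def server_state(psu_a_powered: bool, psu_b_powered: bool) -> str:
    """State of a dual-corded server given which of its two PSUs has power."""
    if psu_a_powered and psu_b_powered:
        return "redundant"  # both feeds healthy
    if psu_a_powered or psu_b_powered:
        return "degraded"   # still running, but one more failure is an outage
    return "down"           # both feeds lost at the same time


print(server_state(True, False))
```

A monitoring system would typically alert on "degraded" rather than wait for "down", because that is the last point at which a repair avoids an outage.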

Power Efficiency: PUE

The most important metric here is power usage effectiveness (PUE), which is the following:

PUE = Total facility power / IT equipment power

A PUE of 1.0 is theoretical perfection - every watt goes to compute. In practice, the industry average is around 1.5, a good colocation facility has 1.3-1.4, hyperscalers run at 1.1-1.15, and Google's best facility runs at 1.06.

PUE optimization is really important in datacenters. Take the following business example:

If you're running 10 MW of IT load at PUE 1.5, you're paying for 15 MW. A 1.1 PUE facility would draw 11 MW for the same IT load, a difference of 4 MW. At $0.07/kWh over 8,760 hours, that's roughly $2.45 million/year in pure waste. PUE optimization is one of the highest-ROI activities in datacenter management.
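Working the example through in code makes it easy to plug in your own numbers. This is a minimal sketch: `annual_energy_cost` is an illustrative helper, and it assumes a constant load all year at a flat electricity price.

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost(it_load_mw: float, pue: float, price_per_kwh: float) -> float:
    """Yearly electricity bill for a facility, assuming constant load and flat pricing."""
    facility_kw = it_load_mw * 1000 * pue  # total facility power = IT power * PUE
    return facility_kw * HOURS_PER_YEAR * price_per_kwh


cost_at_1_5 = annual_energy_cost(10, 1.5, 0.07)
cost_at_1_1 = annual_energy_cost(10, 1.1, 0.07)
print(f"PUE 1.5: ${cost_at_1_5:,.0f}/year")
print(f"PUE 1.1: ${cost_at_1_1:,.0f}/year")
print(f"Difference: ${cost_at_1_5 - cost_at_1_1:,.0f}/year")
```

The difference comes from 4 MW of extra overhead (15 MW minus 11 MW) running for 8,760 hours at $0.07/kWh, which is about $2.45 million per year.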

Section 3: Cooling Infrastructure

Managing heat is just as critical as delivering power. A CPU typically throttles at around 95C and risks permanent damage above roughly 105C. A datacenter's cooling infrastructure exists to continuously remove the heat generated by every chip in every server before it can accumulate.

Where does heat come from? Well, every watt of electrical power consumed by a server is eventually converted to heat. A 10 kW rack produces 10 kW of heat continuously (24/7/365). For perspective, that's equivalent to 100 incandescent 100-watt light bulbs burning inside a wardrobe-sized cabinet. Modern GPU servers push this to 50-80 kW per rack, which is why cooling is the defining constraint of datacenter infrastructure.

Airflow: the basic approach is to push cold air into servers and pull hot air out. Every server has fans that draw air from front to back (or bottom to top).
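How much air does that take? A rough estimate follows from the heat capacity of air: airflow = heat / (density x specific heat x temperature rise). The sketch below uses textbook values for air near room temperature and assumes a 12 K rise from server inlet to exhaust; that temperature rise and the function name are assumptions for illustration.

```python
AIR_DENSITY_KG_M3 = 1.2            # approximate, near sea level and room temperature
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate specific heat of dry air
M3S_TO_CFM = 2118.88               # cubic metres per second -> cubic feet per minute


def required_airflow_m3s(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry away heat_w watts with a delta_t_k rise."""
    return heat_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


for rack_kw in (10, 80):
    flow = required_airflow_m3s(rack_kw * 1000, delta_t_k=12)
    print(f"{rack_kw} kW rack: {flow:.2f} m^3/s ({flow * M3S_TO_CFM:,.0f} CFM)")
```

Under these assumptions a 10 kW rack needs roughly 0.7 cubic metres of air per second, and the requirement scales linearly with power, so an 80 kW rack needs eight times as much through the same cabinet.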

Hot Aisle/Cold Aisle Containment

Racks are arranged in alternating rows:

Cold aisles: server fronts face each other. Cold air is supplied here from the raised floor (via perforated tiles) or from overhead.

Hot aisles: server backs face each other, so hot exhaust air collects here and is returned to the cooling units.

Cold aisle containment: A physical barrier (curtains or hard panels) seals the top and ends of the cold aisle, which prevents hot and cold air from mixing.

Hot aisle containment: A physical barrier seals the hot aisle, and hot air is captured to be returned directly to the CRAC (Computer Room Air Conditioning) units.

Cooling Equipment

CRAC units are the workhorses of datacenter cooling. A CRAC unit is a specialized air conditioner that:

Draws hot air from the hot aisle.

Passes it over a chilled-water coil or direct-expansion refrigerant coil.

Blows cooled air back into the cold aisle or under the raised floor.

Some other definitions on chilling:

CRAH (Computer Room Air Handler): Like a CRAC, but uses chilled water from a central chiller plant rather than its own refrigerant circuit. More efficient at scale, but requires a chiller plant to be present.

Chiller plants: A chiller is an industrial refrigeration machine that produces chilled water (typically 6-12C). That chilled water circulates to CRAC/CRAH units throughout the facility. Chillers are large, expensive, and typically located outside or on the roof.

Cooling towers: Chillers reject heat via cooling towers, which use large fans and evaporating water to dissipate heat into the outside air. This is where water consumption becomes significant. A large datacenter can consume millions of gallons of water per day through evaporative cooling, which is why Water Usage Effectiveness (WUE) has become an important sustainability metric alongside PUE.

Advanced Cooling for High-Density Racks

Standard air cooling maxes out at roughly 20-25 kW per rack. Above that, air simply can't remove heat fast enough. Modern AI/GPU deployments require alternatives.

Rear-door heat exchangers (RDHx): A chilled-water heat exchanger replaces the standard rear door of a rack. Hot exhaust air from servers passes through it before entering the room, so the rack appears "cool" to the surrounding environment. Can handle 30-50 kW per rack.

Direct Liquid Cooling (DLC): Cold plates attach directly to CPUs, GPUs, and memory modules. Liquid flows through the cold plate, absorbs heat, and carries it away. Can handle 100+ kW per rack. Intel, AMD, and NVIDIA now all offer server SKUs with native DLC support.

Immersion cooling: Servers are submerged in tanks of non-conductive dielectric fluid. The fluid absorbs heat directly from every component simultaneously. Extremely efficient, and can achieve PUE of 1.02-1.05. There are two variants:

Single-phase: Fluid stays liquid and is pumped to a heat exchanger.

Two-phase: Fluid boils at around 50C, vapor rises and condenses on a coil above the tank, then drips back down. No pump is needed to circulate the fluid inside the tank.

NVIDIA's GB200 NVL72 rack (72 Blackwell GPUs) is a good example: it requires liquid cooling as a physical necessity and cannot be deployed with air cooling at all.

Free Cooling and Economization

In cold climates, datacenters can use outdoor air or water directly to cool the facility without running chillers at all. This is called economization or free cooling.

Air-side economization: Outdoor air is filtered and circulated directly through the datacenter. Used by Meta's Prineville, Oregon facility.

Water-side economization: Cooling towers cool water without the chiller compressor running at all. Used widely in temperate climates.

This is why location matters so much. Microsoft, Google, and Meta build in Scandinavia, Oregon, Iowa, and Finland because a cold climate means cheap or free cooling, which means dramatically better PUE and lower operating costs. Google's Finland datacenter uses seawater from the Gulf of Finland. Meta's Luleå, Sweden datacenter runs almost entirely on free cooling and achieves a PUE of 1.07.

Section 4: Network Infrastructure

If power is the circulatory system and cooling is the immune system, networking is the nervous system: the medium through which all information actually flows.

Let's start with the physical layer: the cables and the modules that plug into them.

Copper (CAT6/CAT6A): Used for short runs within or between adjacent racks. Maxes out at 10 Gbps over short distances.

Fiber optic: The dominant medium for any run longer than a few meters. Two main types:

Single-mode fiber (SMF): Thin core, laser light, used for long distances from 100m to 80km. Used for datacenter-to-datacenter interconnects.

Multi-mode fiber (MMF): Wider core, cheaper, used for runs up to around 300m within a facility.

DAC (Direct Attach Copper): Twinaxial copper cable with transceiver-style connectors permanently attached to both ends. Cheap and low-latency for runs under 5 meters. Ubiquitous for server-to-switch connections inside a rack.

Transceivers (SFP, QSFP, OSFP): Hot-pluggable modules that convert between electrical and optical signals. Each form factor targets a speed class: SFP+ handles 10G, QSFP28 handles 100G, and QSFP-DD handles 400G.

The Network Hierarchy

Top-of-Rack (ToR) switches: Sit at the top of each rack. Every server in the rack connects here, and these switches then uplink to the spine layer at higher speeds.

Leaf-spine fabric: The dominant architecture inside modern datacenters, and important enough to deserve its own explanation.

The old three-tier network (access, aggregation, core) was designed for north-south traffic, meaning data flowing in and out of the datacenter to users on the internet. This worked fine for web applications. But modern workloads like distributed databases, AI training jobs, and microservices generate enormous east-west traffic, meaning server talking to server, rack talking to rack, all inside the datacenter. The three-tier model choked on this because every packet had to travel up through aggregation and back down, creating bottlenecks at the core.

Leaf-spine solves this by having every leaf switch connect directly to every spine switch. Any two servers on different leaves are exactly two hops apart (leaf to spine to leaf), regardless of physical location. Latency is predictable and uniform. Bandwidth scales by adding spine switches rather than replacing the entire core.

The network uses ECMP (Equal-Cost Multi-Path routing) to load-balance traffic across all spine switches simultaneously. There is no Spanning Tree Protocol blocking half your links. All links are active at all times.
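The core idea behind ECMP is that a leaf switch hashes each flow's header fields and uses the result to pick one of the equal-cost uplinks. The sketch below illustrates that idea only: real switches compute the hash in hardware with vendor-specific functions, and the use of CRC32, the field separator, and the name `pick_spine` are assumptions for this example.

```python
import zlib


def pick_spine(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               protocol: str, num_spines: int) -> int:
    """Choose a spine index for a flow by hashing its 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    # Same 5-tuple -> same hash -> same spine, so a flow's packets stay in order.
    return zlib.crc32(key) % num_spines


# Two flows between the same pair of servers may land on different spines.
print(pick_spine("10.0.1.5", "10.0.9.7", 40001, 443, "tcp", num_spines=4))
print(pick_spine("10.0.1.5", "10.0.9.7", 40002, 443, "tcp", num_spines=4))
```

Hashing per flow rather than per packet is a deliberate trade-off: it avoids reordering within a connection, at the cost that a few very large flows can still collide on the same spine.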

Border leaf / edge routers: The point where the internal fabric connects to the outside world, whether that is the internet, WAN circuits, or dedicated interconnects to other datacenters.

Out-of-band (OOB) management network: A completely separate network used only for console access and device management. If the production network fails entirely, operators can still access every device via the OOB network. Non-negotiable in any critical facility.

Why This Matters for AI

Training a large language model on 10,000 GPUs means every GPU must combine its gradient data with that of every other GPU at every training step. This collective communication pattern, called all-reduce, demands extremely high bisection bandwidth across the entire network fabric. It is why many AI clusters use InfiniBand, a high-performance interconnect well suited to this pattern, rather than standard Ethernet. NVIDIA's acquisition of Mellanox, completed in 2020 for roughly $6.9B, was largely about owning this technology.
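To get a feel for what all-reduce actually does, here is a toy simulation of one common way to implement it, the ring algorithm. This is a sketch in plain Python with lists standing in for GPUs, not how any particular library implements it; it assumes each worker's gradient vector has exactly one element per worker so that each element acts as one chunk.

```python
def ring_all_reduce(vectors: list[list[float]]) -> list[list[float]]:
    """Sum the workers' vectors so that every worker ends up with the full sum."""
    n = len(vectors)
    data = [list(v) for v in vectors]  # data[i] is worker i's local buffer

    # Phase 1, reduce-scatter: each worker passes one chunk to its right-hand
    # neighbour, which adds it to its own copy. After n - 1 steps, worker i
    # holds the fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] += value

    # Phase 2, all-gather: the finished chunks travel around the ring,
    # overwriting the partial sums, until every worker has every chunk.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] = value

    return data


print(ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

Each worker only ever talks to its neighbour, but all links are busy at every step, which is why the pattern punishes any slow link in the fabric.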

Section 5: Compute Infrastructure

Server Anatomy

A modern datacenter server is designed from the ground up for three things: continuous operation, remote management with no keyboard or monitor ever needed, and high density at 40+ servers per rack.

Key components:

CPUs: Modern datacenter CPUs (Intel Xeon, AMD EPYC) have 32-192 cores and support NUMA, which stands for Non-Uniform Memory Access. In a multi-socket server, each CPU socket has its own local memory pool, and a high-speed interconnect bridges the sockets. Software has to be NUMA-aware to get maximum performance because accessing memory on the "remote" socket is slower than accessing local memory.

ECC Memory: Servers use Error-Correcting Code (ECC) memory that detects and corrects single-bit memory errors automatically. In a regular desktop, a bit flip might cause a crash. In a production database, it could corrupt a transaction. ECC is non-negotiable in critical systems.

NVMe SSDs: Storage inside servers is almost entirely NVMe now, connected directly over PCIe rather than through the old SATA bus. PCIe Gen 4 NVMe drives deliver 7+ GB/s sequential read, which is roughly 14x faster than a SATA SSD.

Dual PSUs: Every server has two hot-swap power supplies, each capable of running the full server load independently. This maps directly to the dual-corded PDU architecture from Section 2.

GPUs and Accelerators

GPUs have become the primary compute engine for AI workloads. A CPU has 32-192 cores optimized for complex sequential tasks. A GPU has 10,000-18,000 simpler cores optimized for parallel floating-point math. Neural network training is fundamentally a massive parallel matrix multiplication problem, which makes GPUs architecturally ideal for it.

NVIDIA's current datacenter GPU lineup:

H100 (2022): 700W TDP, 80 GB HBM3 memory, 3.35 TB/s memory bandwidth. The dominant GPU for GPT-4 era training.

H200 (2024): Same die as the H100 but with 141 GB HBM3e memory and 4.8 TB/s bandwidth, which is critical for running larger models.

B200 (Blackwell, 2025): 1,000W+, 192 GB HBM3e. Dense configurations such as the GB200 NVL72 rack require liquid cooling.

Other Relevant Technology:

NVLink: NVIDIA's proprietary high-speed interconnect between GPUs in the same server. NVLink 4.0 on the H100 provides 900 GB/s bidirectional bandwidth between GPUs, compared to around 128 GB/s over PCIe. Within a server, NVSwitch allows all-to-all communication between all 8 GPUs at full NVLink speed.

HBM (High Bandwidth Memory): Unlike conventional memory, HBM is stacked vertically and placed directly adjacent to the GPU die using a silicon interposer. The H200's HBM3e delivers 4.8 TB/s of bandwidth versus around 100 GB/s for typical system RAM. This bandwidth is what makes running large model inference feasible at all.
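A quick way to see why the bandwidth figure matters: estimate how long it takes to read a model's weights once from memory. The sketch below uses the two bandwidth numbers quoted above and assumes, as a simplification, that generating one token requires streaming every weight once and that the model fills the H200's 141 GB; the function name is illustrative.

```python
def sweep_time_ms(data_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds needed to read data_gb once at a given memory bandwidth."""
    return data_gb / bandwidth_gb_s * 1000


weights_gb = 141  # assumed: a model that fills an H200's memory
print(f"HBM3e (4,800 GB/s): {sweep_time_ms(weights_gb, 4800):.1f} ms per pass")
print(f"System RAM (100 GB/s): {sweep_time_ms(weights_gb, 100):.0f} ms per pass")
```

Under these assumptions one pass takes about 29 ms from HBM versus about 1.4 seconds from system RAM, which is the difference between an interactive response and an unusable one.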

Storage Infrastructure

Direct-attached storage (DAS): SSDs inside the server itself. Lowest latency, but not shared between servers.

SAN (Storage Area Network): A dedicated high-speed network connecting servers to shared storage arrays. Accessed as block storage, meaning the server sees it as if it were a local drive. Used heavily in virtualization environments.

Object storage: Files stored as objects with unique IDs and accessed via HTTP APIs. Amazon S3 is the canonical example. Massively scalable into the exabyte range, cheap, but higher latency than block storage. Used for backups, ML training datasets, logs, and media.

Section 6: Software, Operations, and Management

How Operators Actually See the Datacenter

DCIM (Datacenter Infrastructure Management): Software that provides a unified view of physical assets, power draw, cooling status, and capacity across the entire facility. Think of it as the datacenter's ERP system. It integrates power meters on PDUs, temperature sensors on racks, camera systems, and asset tracking into one interface. Major vendors include Schneider Electric EcoStruxure and Vertiv.

BMC (Baseboard Management Controller): A dedicated microcontroller embedded on every server motherboard. Even if the server is completely powered off or its OS has crashed, the BMC is still running on standby power. It provides remote console access, power cycling, hardware health monitoring (temperatures, fan speeds, memory error counts), and firmware updates, all without touching the production network. The modern standard for BMC communication is Redfish, a REST API that is replacing the older IPMI protocol.
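Because Redfish returns plain JSON, health checks are easy to script. The sketch below parses a thermal reading the way a monitoring script might. It assumes the BMC exposes the legacy Thermal resource (commonly found at a path like /redfish/v1/Chassis/<id>/Thermal) with `Temperatures`, `ReadingCelsius`, and `UpperThresholdCritical` properties; the sample payload, sensor names, and the `hot_sensors` helper are made up for illustration, and no network call is made.

```python
def hot_sensors(thermal_payload: dict, margin_c: float = 10.0) -> list[str]:
    """Names of sensors reading within margin_c degrees of their critical threshold."""
    hot = []
    for sensor in thermal_payload.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        critical = sensor.get("UpperThresholdCritical")
        if reading is None or critical is None:
            continue  # sensor absent or threshold not reported
        if reading >= critical - margin_c:
            hot.append(sensor.get("Name", "unknown"))
    return hot


# Hypothetical payload, shaped like a Redfish Thermal response.
sample = {
    "Temperatures": [
        {"Name": "CPU1 Temp", "ReadingCelsius": 88, "UpperThresholdCritical": 95},
        {"Name": "Inlet Temp", "ReadingCelsius": 24, "UpperThresholdCritical": 45},
        {"Name": "GPU3 Temp", "ReadingCelsius": None, "UpperThresholdCritical": 90},
    ]
}
print(hot_sensors(sample))
```

In practice the payload would be fetched over HTTPS from the BMC on the out-of-band network, and property names can vary between firmware versions, so check what your hardware actually returns.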

Virtualization

Rather than running one application per physical server, which is wasteful because most servers sit at 10-15% CPU utilization, virtualization lets you run many virtual machines on a single physical host.

A hypervisor is the software layer that makes this possible. Type 1 (bare-metal) hypervisors run directly on hardware and are used in production datacenters. The main ones are VMware ESXi, Microsoft Hyper-V, and KVM (Linux Kernel-based Virtual Machine), which powers most cloud VMs. Type 2 hypervisors run on top of a host OS and are mainly used for development and desktop work.

Key virtualization capabilities that matter in production:

vCPU oversubscription: Assigning more virtual CPUs than physical cores exist. Safe at roughly 4:1 for typical workloads, but never do this with GPU workloads.

Live migration: Moving a running VM from one physical host to another with only a brief pause, typically well under a second. This is how hardware maintenance happens without service interruption.

Containers and Kubernetes

Containers (Docker) package an application and all its dependencies into a portable image that shares the host OS kernel. They typically start in well under a second and are ephemeral by design. Kubernetes is the orchestration system that manages containers at scale, scheduling them onto servers, restarting them when they crash, managing networking between them, and handling rolling deployments.

Two concepts matter most from an infrastructure point of view:

Resource requests and limits: Every container declares how much CPU and memory it needs. Kubernetes uses this to make placement decisions. It is essentially a bin-packing problem.

GPU scheduling: NVIDIA's Kubernetes device plugin exposes GPUs as schedulable resources. MIG (Multi-Instance GPU) allows a single H100 to be partitioned into up to 7 independent GPU instances, each serving a different workload.
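To illustrate the bin-packing framing, here is a first-fit-decreasing sketch that places CPU requests onto identical nodes. This is a classic textbook heuristic, not the Kubernetes scheduler's actual algorithm (which filters and scores nodes across several resources); the workload names and the 8-core node size are made up.

```python
def first_fit_decreasing(requests: dict[str, int], node_capacity: int) -> list[list[str]]:
    """Pack CPU requests onto as few identical nodes as the heuristic can manage."""
    nodes = []  # each node: {"free": remaining cores, "pods": names placed on it}
    # Place the largest requests first; small ones fill the leftover gaps.
    for name, cpu in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"{name} requests more than a whole node")
        for node in nodes:
            if node["free"] >= cpu:
                node["free"] -= cpu
                node["pods"].append(name)
                break
        else:
            # No existing node has room, so bring up a new one.
            nodes.append({"free": node_capacity - cpu, "pods": [name]})
    return [node["pods"] for node in nodes]


workloads = {"api": 4, "db": 6, "cache": 2, "worker": 5, "batch": 3}
print(first_fit_decreasing(workloads, node_capacity=8))
```

Here 20 requested cores fit onto three 8-core nodes. Accurate requests matter because every core a container over-requests is capacity the scheduler cannot give to anything else.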

Monitoring and Observability

The three pillars of modern observability:

Metrics: Numerical time-series data including CPU utilization, memory usage, request rate, and error rate. Tools include Prometheus for collection and Grafana for visualization.

Logs: Timestamped text records of events including application logs, syslog, and security audit logs. Tools include the ELK stack (Elasticsearch, Logstash, and Kibana) and Splunk.

Traces: Records of a request's journey through distributed services, used to diagnose where latency is coming from in a microservices architecture. Tools include Jaeger and Datadog APM.

SLA vs SLO vs Error Budget:

SLA is a contractual promise to customers: "99.9% uptime or we credit your bill."

SLO is an internal target: "We aim for 99.95% uptime."

Error budget is the allowable downtime within the SLO. At 99.9% you have about 8.76 hours per year. When you have burned through that budget, you stop shipping new changes until it refills.
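Error budgets are usually tracked over a rolling window rather than a full year. The sketch below shows the bookkeeping; the 30-day window, the function names, and the simple "freeze when the budget is gone" rule are assumptions for illustration.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Total allowed downtime, in minutes, for the window."""
    return (1 - slo_percent / 100) * window_days * 24 * 60


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Minutes of budget left after the downtime already incurred."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


def can_ship(slo_percent: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """Simple policy: keep shipping only while some budget remains."""
    return budget_remaining(slo_percent, downtime_minutes, window_days) > 0


print(f"Budget at 99.9% over 30 days: {error_budget_minutes(99.9):.1f} min")
print("Ship after 30 min of downtime?", can_ship(99.9, 30))
print("Ship after 50 min of downtime?", can_ship(99.9, 50))
```

At 99.9% over 30 days the budget is about 43 minutes, so a single bad incident can consume most of a month's allowance.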

Section 7: The AI Era

Everything in the previous sections was designed around CPU servers drawing 500W to 1kW per server. AI training clusters have changed virtually every assumption.

The Scale of Modern AI Training looks something like this:

GPT-3 (2020): roughly 10,000 V100 GPUs

GPT-4 (estimated): roughly 25,000 A100 GPUs

Meta LLaMA 3 (2024): 16,000 H100 GPUs

Microsoft committed $80B to datacenter CapEx in FY2025 alone

These clusters require dedicated power circuits, purpose-built liquid cooling, InfiniBand networks operating at 400G, and storage systems capable of streaming terabytes per second of training data to keep the GPUs fed.

So what actually changes at GPU scale?

Power density: Standard CPU racks run at 5-10 kW. GPU racks run at 30-80 kW. NVIDIA's GB200 NVL72 rack, which contains 72 Blackwell GPUs in a single cabinet, draws approximately 120 kW. Air cooling is physically insufficient at this density. Liquid cooling is not optional.

Reliability: At 10,000 GPUs, hardware failures become a daily occurrence rather than a monthly one. AI training frameworks like PyTorch and JAX must implement checkpoint-and-resume to survive this. If a GPU fails 20 hours into a 7-day training run, you recover from the last checkpoint rather than starting over.

Network: GPU training uses collective communication patterns in which every GPU's gradient tensors are combined with those of every other GPU at every training step. At 10,000 GPUs this is a cluster-wide communication problem, and the network rather than the compute becomes the bottleneck. This is why InfiniBand is so common in AI clusters despite Ethernet being cheaper.
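The checkpoint-and-resume idea from the Reliability point can be sketched in a few lines. This toy loop stands in for a training job: the "weight" update, the JSON checkpoint file, and the `fail_at` parameter used to simulate a GPU failure are all assumptions for illustration. Real frameworks save model and optimizer state, usually to shared storage.

```python
import json
import os
import tempfile


def train(total_steps: int, ckpt_path: str, ckpt_every: int = 10, fail_at=None) -> dict:
    """Run a toy training loop that resumes from the last checkpoint if one exists."""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume instead of starting over
    else:
        state = {"step": 0, "weight": 0.0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated GPU failure")
        state["weight"] += 0.5  # stand-in for one optimizer update
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, ckpt_path)  # atomic swap: never a half-written file
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(100, ckpt, fail_at=25)  # crashes at step 25; last checkpoint is step 20
except RuntimeError as err:
    print("crashed:", err)
print(train(100, ckpt))  # second launch picks up from step 20
```

The work lost to a failure is bounded by the checkpoint interval (5 steps here), so the interval is a trade-off between time spent writing checkpoints and time spent redoing lost steps.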

The Infrastructure Arms Race

The AI compute buildout is the largest capital investment cycle in the history of the technology industry.

Microsoft: roughly $80B in 2025

Amazon: roughly $75B in 2024

Google: roughly $75B in 2025

Meta: roughly $60-65B in 2025

Which emerging technologies are worth watching?

Optical switching: Silicon photonics that route light signals without converting to electrical. Near-zero latency and massive bandwidth. Google demonstrated petabit-scale optical circuit switching in 2024.

Co-packaged optics (CPO): Integrating optical transceivers directly onto the switch chip package, eliminating pluggable transceivers entirely. Reduces power and improves signal integrity at 800G+ speeds.

Chiplets: GPUs and CPUs assembled from multiple smaller dies connected by high-bandwidth die-to-die interconnects. The NVIDIA GB200 and AMD MI300X are early production examples. This approach allows chip designers to mix process nodes and combine specialized dies (memory, compute, I/O) into one package.

Ending

I hope you enjoyed learning about datacenters as much as I did. As much as we talk about them, my fear is that much of the discourse treats the datacenter as an investment rather than as a technology. However, there is a whole group of people thinking very deeply about these technologies, and the work these companies are doing is the work of the future!
