Wholesale Colocation for Private Cloud and GPU Infrastructure: When It Makes Financial Sense

The quarterly cloud invoice arrives, and the number is no longer a surprise. It is a problem. For a VP of Infrastructure managing a GPU-intensive AI training environment, the bill reflects something public cloud was never designed to efficiently serve: sustained, high-utilization workloads running around the clock, at densities that keep climbing. The per-GPU-hour rate looked manageable at first. At scale and at 85 percent utilization, it compounds into a figure that is difficult to defend to a CFO.

Global enterprise spending on AI infrastructure is projected to surpass $200 billion annually by 2025 (Source: IDC, Worldwide AI and Generative AI Spending Guide, 2024, idc.com). For infrastructure and procurement leaders, that scale underscores a broader shift: once GPU utilization is consistently high, public cloud economics often become less attractive than wholesale colocation for private cloud and AI deployments.

What Is Wholesale Colocation and Why Do GPU and Private Cloud Workloads Outgrow Retail?

How Wholesale Colocation Works and How It Differs from Retail

Wholesale colocation is a data center leasing model in which a tenant secures dedicated power, typically one megawatt or above, along with a private suite, hall, or building, under a long-term contract running five to fifteen years. Unlike retail colocation, where the provider manages shared facility infrastructure to a defined SLA, wholesale tenants operate their own power distribution, cooling, and physical environment within their space. The provider delivers the shell: power to the suite, cooling capacity, physical security, and building connectivity.

Retail vs. Wholesale Colocation: Which Model Fits Which Workload and Why

Retail and wholesale colocation serve fundamentally different buyer profiles, and the decision turns on four variables: power footprint size, workload density, duration of commitment, and operational maturity. Retail suits organizations needing a few hundred kilowatts or less who want managed services and provider-operated facility infrastructure. Wholesale suits organizations with a stable requirement of one megawatt or more that are comfortable operating their own infrastructure and willing to commit to a longer term for significantly lower cost per kilowatt.

What Private Cloud Infrastructure Requires from a Wholesale Facility at Baseline

Private cloud deployments in a wholesale colocation environment require four baseline facility capabilities: N+1 or 2N power redundancy with UPS and generator backup, mechanical cooling sized for the planned power density, physical access controls covered at minimum by a SOC 2 Type II report, and a carrier-neutral interconnect environment. These are not differentiators among wholesale providers; they are table stakes. The evaluation begins after confirming all four are present.

How GPU Clusters Push Those Baseline Requirements Further

GPU infrastructure adds three requirements on top of the private cloud baseline that not every wholesale facility can meet: power density above 30 kilowatts per rack, cooling capable of handling that density at sustained load, and a low-latency interconnect fabric for GPU-to-GPU communication across the cluster. This applies equally to training large language models and running production AI models. Air-cooled facilities designed for 8 to 12 kilowatts per rack cannot support modern GPU deployments without significant retrofit. These requirements narrow the field of viable wholesale providers considerably and make facility qualification a critical early step.

The 60–70% Utilization Heuristic: A Quick Gut-Check Before You Build the Full Model

A practical rule of thumb used by infrastructure buyers before commissioning a full TCO analysis: if GPU workloads run at sustained utilization above 60 to 70 percent, wholesale colocation is likely to undercut public cloud on a per-GPU-hour basis and a full model is worth building. This is a heuristic, not a financial model, and it does not account for egress costs, hardware amortization, or operational staffing. It answers one question: is this worth analyzing further?

The Financial Case for Wholesale Colocation Over Public Cloud at Scale

The Utilization Rate Where Wholesale Colo Beats Public Cloud

Wholesale colocation begins to undercut public cloud on per-GPU-hour cost when sustained cluster utilization exceeds approximately 60 to 75 percent. Consider an illustrative 8-rack GPU cluster at 30 kilowatts per rack running at 90 percent utilization. An H100-equivalent instance on a major cloud provider runs approximately $1.20 to $2.50 per GPU-hour on one-year reserved pricing (illustrative; verify current rates directly with provider). If the reservation is billed for all 8,760 hours of the year, a 64-GPU cluster costs roughly $0.67 million to $1.4 million annually, or near $1.1 million at about $1.96 per GPU-hour. In a wholesale colocation environment with owned hardware, the same cluster's annual cost covering lease, power, and five-year hardware amortization typically falls in the $600,000 to $850,000 range (illustrative; dependent on market, hardware, and power rate).
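The cloud side of that comparison is simple arithmetic, sketched below in Python. The function names are illustrative, and the sketch assumes reserved capacity is billed for every hour of the year regardless of how busy the GPUs are; it is not taken from any provider's price sheet.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_reserved_cloud_cost(gpu_count: int, rate_per_gpu_hour: float) -> float:
    """Annual spend when every reserved GPU is billed for every hour of the year."""
    return gpu_count * HOURS_PER_YEAR * rate_per_gpu_hour


def implied_rate(annual_cost: float, gpu_count: int) -> float:
    """Per-GPU-hour rate implied by a given annual bill."""
    return annual_cost / (gpu_count * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Illustrative low and high ends of the reserved-pricing range quoted above.
    for rate in (1.20, 2.50):
        cost = annual_reserved_cloud_cost(64, rate)
        print(f"${rate:.2f} per GPU-hour -> ${cost:,.0f} per year")
    print(f"$1.1M per year implies ${implied_rate(1_100_000, 64):.2f} per GPU-hour")
```

Swapping in a quoted rate from a provider shows quickly where a specific cluster lands inside that range.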

The Hidden Costs That Make Public Cloud More Expensive Than It Looks

Public cloud costs for large-scale GPU workloads are often understated by the compute line item alone because egress fees, software licensing, and capacity overprovisioning can all add materially to total spend (Source: Cloudflare, How cloud egress fees will challenge the future of AI, 2023, cloudflare.com). At AI training data volumes, egress charges can become a meaningful budget line, and the combined impact of these costs can substantially raise the real cost of public cloud depending on the workload profile.

CapEx vs. OpEx: How Wholesale Colo Changes the Infrastructure Balance Sheet

Wholesale colocation creates a split budget treatment: the lease is an operating expense, while owned hardware is a capital expenditure depreciated over three to five years. This gives finance teams flexibility in how the total investment is presented and planned. For organizations that moved fully to public cloud to eliminate CapEx, re-introducing hardware ownership is a real consideration that should involve treasury and accounting from the start.

How to Build a Total Cost of Ownership Model for Wholesale GPU Infrastructure


[Infographic: the wholesale colocation TCO formula for GPU infrastructure, comparing cost per GPU-hour at 60%, 75%, and 90% utilization with cloud reserved pricing; crossover point at 65–75% utilization. Published by Inflect.]

A complete TCO model for wholesale GPU infrastructure uses the following formula: Cost per GPU-Hour = (Annual Lease + Annual Power + Annual Hardware Amortization + Annual Network + Annual Operations) / Effective GPU Hours, where Effective GPU Hours = Installed Capacity (in GPUs) x Hours in Period x Utilization Rate x Availability Rate. Organizations evaluating hardware financing should add annual cost of capital to the numerator. Build the model at three utilization scenarios, 60 percent, 75 percent, and 90 percent, to define the risk envelope. The availability rate, typically 0.98 to 0.995 for a well-operated facility, accounts for planned and unplanned downtime and directly affects the high-availability commitments made to internal stakeholders.
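The formula above can be sketched in a few lines of Python. The class and function names are illustrative, and every dollar figure in the example run is a hypothetical placeholder rather than a quoted price; replace them with real lease, power, and hardware numbers before drawing conclusions.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class AnnualCosts:
    lease: float
    power: float
    hardware_amortization: float
    network: float
    operations: float
    cost_of_capital: float = 0.0  # add only when hardware is financed

    def total(self) -> float:
        return (self.lease + self.power + self.hardware_amortization
                + self.network + self.operations + self.cost_of_capital)


def effective_gpu_hours(installed_gpus: int, utilization: float,
                        availability: float, hours: int = HOURS_PER_YEAR) -> float:
    """Installed capacity x hours in period x utilization x availability."""
    return installed_gpus * hours * utilization * availability


def cost_per_gpu_hour(costs: AnnualCosts, installed_gpus: int,
                      utilization: float, availability: float) -> float:
    return costs.total() / effective_gpu_hours(installed_gpus, utilization, availability)


if __name__ == "__main__":
    # Hypothetical inputs for a 64-GPU cluster; not real quotes.
    costs = AnnualCosts(lease=250_000, power=150_000, hardware_amortization=300_000,
                        network=30_000, operations=70_000)
    for utilization in (0.60, 0.75, 0.90):
        rate = cost_per_gpu_hour(costs, 64, utilization, availability=0.99)
        print(f"{utilization:.0%} utilization: ${rate:.2f} per effective GPU-hour")
```

Because the numerator is largely fixed, the three scenarios show how quickly the per-hour figure rises as utilization falls, which is the risk envelope the model is meant to expose.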

What Wholesale GPU Deployments Actually Look Like

From 2 MW to 20 MW: Common Deployment Profiles

Wholesale GPU and private cloud deployments span three common profiles: a 2 megawatt private cloud consolidation supporting 150 to 200 standard compute racks at 8 to 12 kilowatts each, the typical entry point for smaller deployments; a 10 megawatt AI training cluster implying approximately 300 to 400 high-density GPU racks at 25 to 35 kilowatts each; and a 20 megawatt hyperscale-adjacent deployment requiring dedicated halls or buildings, where the facility negotiation resembles a real estate transaction. Each step up in scale introduces operational complexity that informs the operational readiness assessment later in this article.
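The rack counts in these profiles follow from dividing contracted power by per-rack density. A minimal sketch, assuming all contracted power is available to IT load (the function name is illustrative):

```python
def rack_count_range(power_mw: int, kw_per_rack_low: int, kw_per_rack_high: int) -> tuple[int, int]:
    """Return (fewest racks, most racks) a power envelope can feed.

    Assumption: the full contracted power reaches the racks, with nothing
    reserved for headroom or non-rack loads.
    """
    power_kw = power_mw * 1000
    fewest = power_kw // kw_per_rack_high  # denser racks, fewer of them
    most = power_kw // kw_per_rack_low     # lighter racks, more of them
    return fewest, most


if __name__ == "__main__":
    print("2 MW at 8-12 kW per rack:", rack_count_range(2, 8, 12))
    print("10 MW at 25-35 kW per rack:", rack_count_range(10, 25, 35))
```

Simple division gives theoretical bounds. The profile figures above are rounder and, for the 2 megawatt case, more conservative, which is consistent with holding some power back as headroom.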

How to Design a Private Cloud Network Inside a Wholesale Colo Facility

Private cloud network design in a wholesale colocation environment follows a spine-leaf topology, with spine switches providing high-bandwidth east-west fabric between leaf switches and leaf switches connecting directly to compute racks. The tenant controls the entire fabric and can set oversubscription ratios to match its own traffic, without the multi-tenant constraints of retail colocation. For AI and machine learning workloads, the fabric must support non-blocking bandwidth at full line rate between GPU racks, which drives switch selection and cabling requirements from the planning stage.
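Oversubscription at a leaf switch is commonly expressed as server-facing bandwidth divided by spine-facing bandwidth, with a ratio of 1.0 or below meaning the leaf tier is non-blocking. The sketch below uses hypothetical port counts and speeds chosen only for illustration.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: int,
                           uplinks: int, uplink_gbps: int) -> float:
    """Server-facing bandwidth divided by spine-facing bandwidth on one leaf."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def is_non_blocking(ratio: float) -> bool:
    """A leaf is non-blocking when uplink capacity covers all downlink capacity."""
    return ratio <= 1.0


if __name__ == "__main__":
    # Hypothetical general-purpose leaf: 48 x 100G down, 8 x 400G up.
    general = oversubscription_ratio(48, 100, 8, 400)
    # Hypothetical GPU leaf: 32 x 400G down, 32 x 400G up.
    gpu = oversubscription_ratio(32, 400, 32, 400)
    print(f"general-purpose leaf: {general}:1, non-blocking={is_non_blocking(general)}")
    print(f"GPU leaf: {gpu}:1, non-blocking={is_non_blocking(gpu)}")
```

A general-purpose private cloud fabric can often tolerate a ratio above 1.0, while the GPU fabric described above has to hold it at 1.0, which is why uplink port count drives switch selection.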

Bare Metal Density and Rack Power Planning for Private Cloud Workloads

Bare metal server density for private cloud workloads in wholesale colocation typically ranges from 8 to 15 kilowatts per rack for standard enterprise compute and 15 to 20 kilowatts for memory-optimized workloads, with 80 percent of rated capacity as the operational ceiling and the remaining 20 percent held as headroom. Power is delivered via redundant A and B feeds at the rated amperage per rack. Over-provisioning power at the planning stage is significantly easier than renegotiating capacity mid-lease.
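The 80 percent ceiling translates directly into a server count per rack. A minimal sketch, using integer watts to avoid rounding surprises; the per-server draw in the example is a hypothetical figure, not a measured one.

```python
def usable_watts(rated_watts: int, ceiling_percent: int = 80) -> int:
    """Power that can be planned against, leaving the rest as headroom."""
    return rated_watts * ceiling_percent // 100


def servers_per_rack(rated_watts: int, server_watts: int, ceiling_percent: int = 80) -> int:
    """Whole servers that fit under the operational ceiling."""
    return usable_watts(rated_watts, ceiling_percent) // server_watts


if __name__ == "__main__":
    rated = 15_000   # a 15 kW rack, the top of the standard compute range above
    server = 750     # hypothetical sustained draw per server, in watts
    print("usable watts:", usable_watts(rated))
    print("servers per rack:", servers_per_rack(rated, server))
```

Planning against sustained draw rather than nameplate power is a judgment call; whichever figure is used, apply it consistently across racks.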

How GPU Clusters Are Physically Deployed in Wholesale Colocation

GPU cluster deployment involves three physical infrastructure layers beyond standard compute: a high-speed interconnect fabric for GPU-to-GPU communication, a power distribution system delivering 25 to 40 kilowatts per rack at sustained load, and cooling matched to that density. NVLink connects GPUs within a server; InfiniBand or 400G Ethernet connects servers within the cluster for RDMA traffic during distributed training. Rack layout follows cluster topology, placing frequently communicating GPUs in adjacent racks, which drives aisle and floor loading requirements to confirm with the facility before deployment.

Cooling Architecture Options for High-Density GPU Workloads

High-density GPU workloads in wholesale colocation are served by three cooling architectures: air cooling with hot-aisle and cold-aisle containment supporting up to 15 to 20 kilowatts per rack; rear-door heat exchangers supporting 20 to 30 kilowatts per rack by transferring heat to facility chilled water; and direct liquid cooling, which supports 40 kilowatts per rack and above via cold plates at the chip level and is required for the highest-density AI accelerator deployments. Hot spots are a common failure point when air-cooled GPU environments exceed design density limits; direct liquid cooling largely removes this risk at the rack level by taking heat off at the chip. Buyers should confirm the cooling architecture in their specific suite, not the building-wide average.
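The density bands above can be turned into a first-pass lookup. This is a sketch under two assumptions that are not in the text: it uses 20 kilowatts (the optimistic end of the air-cooling range) as the air limit, and it routes the unaddressed 30 to 40 kilowatt band to direct liquid cooling.

```python
def minimum_cooling_architecture(kw_per_rack: float, air_limit_kw: float = 20.0) -> str:
    """Map a planned rack density to the lightest cooling option in the bands above."""
    if kw_per_rack <= 0:
        raise ValueError("kw_per_rack must be positive")
    if kw_per_rack <= air_limit_kw:
        return "air cooling with hot-aisle and cold-aisle containment"
    if kw_per_rack <= 30:
        return "rear-door heat exchanger"
    # Assumption: densities between 30 and 40 kW are treated as liquid-cooled,
    # since the rear-door band stops at 30 kW.
    return "direct liquid cooling"


if __name__ == "__main__":
    for density in (12, 25, 45):
        print(f"{density} kW per rack -> {minimum_cooling_architecture(density)}")
```

Passing `air_limit_kw=15` gives the conservative reading of the air-cooling band. Either way, the result is a screening aid and not a substitute for the suite-specific confirmation recommended above.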

How to Evaluate and Contract Wholesale Colocation

Five Facility Criteria for GPU-Capable Wholesale Colocation

The five verifiable facility criteria for GPU-capable wholesale colocation are: the facility supports 25 kilowatts per rack or above in the specific suite; the provider offers liquid cooling capability or supports tenant-installed DLC retrofits; the building is carrier-neutral with at least two independent fiber paths and a meet-me room; physical security is covered by a SOC 2 Type II report at minimum; and the facility holds the compliance certifications the workload requires, including ISO 27001, PCI DSS, or HIPAA physical safeguards for sensitive data environments. Ask for suite-specific documentation on power density and cooling capacity, not building averages.
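The five criteria lend themselves to a simple pass/fail screen during provider qualification. The sketch below is one hypothetical way to record a suite and list what it fails; the field names are illustrative and the values would come from the provider's suite-specific documentation.

```python
from dataclasses import dataclass, field


@dataclass
class SuiteProfile:
    kw_per_rack: float                    # documented for the suite, not the building
    liquid_cooling_or_dlc_retrofit: bool
    carrier_neutral: bool
    independent_fiber_paths: int
    has_meet_me_room: bool
    soc2_type2_report: bool
    certifications: set = field(default_factory=set)


def unmet_criteria(suite: SuiteProfile, required_certifications: set) -> list:
    """Return the criteria a suite fails; an empty list means it qualifies."""
    unmet = []
    if suite.kw_per_rack < 25:
        unmet.append("power density")
    if not suite.liquid_cooling_or_dlc_retrofit:
        unmet.append("liquid cooling")
    if not (suite.carrier_neutral and suite.independent_fiber_paths >= 2
            and suite.has_meet_me_room):
        unmet.append("interconnect")
    if not suite.soc2_type2_report:
        unmet.append("physical security")
    missing = set(required_certifications) - suite.certifications
    if missing:
        unmet.append("compliance: " + ", ".join(sorted(missing)))
    return unmet


if __name__ == "__main__":
    suite = SuiteProfile(30, True, True, 1, True, True, {"ISO 27001"})
    print(unmet_criteria(suite, {"ISO 27001", "PCI DSS"}))
```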

Interconnect and Network Access: What to Require and What to Avoid

Wholesale colocation contracts should require carrier-neutral access with at least two independent fiber providers, documented path diversity between the meet-me room and the suite, and the right to connect to any carrier or IX in the building without a cross-connect exclusivity clause. Avoid contracts that require all connectivity to be purchased through the provider at provider-set rates, buildings with a single fiber provider, and agreements that do not specify the fiber path to the suite. For AI workloads with significant data ingestion requirements, a restrictive interconnect environment can impose ongoing costs that were not visible during initial pricing.

Lease Terms, Expansion Rights, and Exit Provisions: Where Deals Go Wrong

The commercial terms that most frequently create problems in wholesale agreements are overcommitment of power without expansion optionality, no exit mechanism before year five, and renewal terms that allow repricing to market without a cap. Ask for a right of first offer on adjacent space or power, and a termination for convenience clause with a modeled penalty structure. The overcommitment risk is concrete: a 10 megawatt commitment at 60 percent utilization leaves 4 megawatts of stranded power, and its cost appears directly in the TCO model. Build the lease commitment around a conservative utilization forecast and grow into capacity via expansion rights.
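The stranded-power example is easy to extend into a dollar figure. The sketch assumes the lease is billed per committed kilowatt per month, and the price used in the example run is a hypothetical placeholder, not a market rate.

```python
def stranded_power_mw(committed_mw: float, power_utilization: float) -> float:
    """Committed power that is paid for but not drawn."""
    if not 0.0 <= power_utilization <= 1.0:
        raise ValueError("power_utilization must be between 0 and 1")
    return committed_mw * (1.0 - power_utilization)


def annual_stranded_cost(committed_mw: float, power_utilization: float,
                         price_per_kw_month: float) -> float:
    """Yearly lease spend attributable to stranded power.

    Assumption: the lease bills on committed kW at a flat monthly price.
    """
    stranded_kw = stranded_power_mw(committed_mw, power_utilization) * 1000
    return stranded_kw * price_per_kw_month * 12


if __name__ == "__main__":
    print(f"stranded: {stranded_power_mw(10, 0.60):.1f} MW")
    # 100 per kW-month is a placeholder, not a quoted rate.
    print(f"annual cost: ${annual_stranded_cost(10, 0.60, 100):,.0f}")
```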

Geographic Market Selection: Power Cost, Latency, Regulation, and Talent

Wholesale colocation market selection usually turns on four variables: power cost per kilowatt-hour by region, network latency to end users or interconnected systems, regulatory requirements for data residency, and the availability of talent for on-site operations. Electricity rates vary significantly by state, and lower-cost power markets can materially improve the economics of high-density GPU and private cloud deployments (Source: U.S. Energy Information Administration, Electric Power Monthly, 2024, eia.gov). AI training workloads that do not require sub-10ms latency typically have more geographic flexibility than latency-sensitive production applications.
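The effect of regional electricity rates on a steady load is straightforward to estimate. In this sketch the two rates are hypothetical placeholders, the load is IT load only, and facility overhead is ignored.

```python
HOURS_PER_YEAR = 8760


def annual_power_cost(it_load_kw: float, rate_per_kwh: float,
                      load_factor: float = 1.0) -> float:
    """Yearly energy cost for a steady IT load.

    Assumption: covers IT load only; cooling and other facility overhead
    are not included.
    """
    return it_load_kw * load_factor * HOURS_PER_YEAR * rate_per_kwh


if __name__ == "__main__":
    # Hypothetical rates for two markets, in dollars per kWh.
    for market, rate in (("lower-cost market", 0.05), ("higher-cost market", 0.10)):
        print(f"{market}: ${annual_power_cost(1000, rate):,.0f} per year for 1 MW")
```

For a fixed load the cost scales linearly with the rate, so a market with half the rate halves this line of the TCO model.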

When Wholesale Colocation Is Not the Right Answer

Workloads Still Better Served by Public Cloud or Retail Colocation

Public cloud remains the better answer for three workload categories: variable or bursty GPU workloads where reserved capacity would sit idle at off-peak; development and experimentation environments where workload profiles change frequently; and workloads with a defined short duration, such as a single training run planned over three to six months, where a wholesale lease extends well beyond the workload lifecycle. Disaster recovery workloads requiring standby capacity at low utilization are also better served by retail colocation or cloud than by a wholesale commitment. Retail colocation serves organizations with GPU requirements in the 50 to 500 kilowatt range that want provider-managed facility operations without wholesale commitment.

When a Hybrid Wholesale Colo and Public Cloud Model Makes More Sense

A hybrid model, where a stable base load runs in wholesale colocation and variable burst capacity runs on public cloud, makes sense for organizations with a defined minimum GPU utilization floor alongside periodic peak demand exceeding that floor by 30 percent or more. The base load anchors the TCO model in wholesale economics while the burst component uses on-demand or spot pricing where the flexibility premium is justified. This hybrid approach also provides scalability for teams with unpredictable GPU demand growth. The architecture requires a private network connection via direct connect or private peering to reduce egress costs on data movement between environments.
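The 30 percent rule can be checked against demand data in one comparison. A minimal sketch, with illustrative names and demand expressed as whole GPU counts:

```python
def is_hybrid_candidate(floor_gpus: int, peak_gpus: int, threshold_percent: int = 30) -> bool:
    """True when peak demand exceeds the steady floor by the threshold or more."""
    if floor_gpus <= 0:
        return False  # no defined base load to anchor in wholesale colocation
    # Integer form of (peak - floor) / floor >= threshold_percent / 100.
    return peak_gpus * 100 >= floor_gpus * (100 + threshold_percent)


def burst_gpus(floor_gpus: int, peak_gpus: int) -> int:
    """GPUs that would need to come from public cloud at peak."""
    return max(peak_gpus - floor_gpus, 0)


if __name__ == "__main__":
    floor, peak = 512, 700  # hypothetical demand figures
    print("hybrid candidate:", is_hybrid_candidate(floor, peak))
    print("burst GPUs at peak:", burst_gpus(floor, peak))
```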

How to Know If Your Organization Is Operationally Ready for Wholesale Scale

Operational readiness for wholesale colocation requires three capabilities that are consistently underestimated during the financial evaluation: a facilities or data center operations function capable of managing power, cooling, and physical infrastructure; a network team capable of designing and operating the internal fabric and external connectivity; and a procurement and legal function with experience negotiating long-term real estate-style contracts. Organizations that overstate readiness, commit to a large lease, and struggle to deploy workloads at the committed pace end up paying for stranded capacity. A conservative initial commitment with expansion rights is a lower-risk entry point.

The Decision Framework: When to Start Evaluating Wholesale Now

Organizations evaluating wholesale colocation fall into three profiles that each point to a clear next action. If GPU or private cloud workloads run at sustained utilization above 65 percent, the power requirement exceeds one megawatt, and the need has a duration of three years or more: build the TCO model, qualify providers against the five criteria, and run a parallel pricing comparison before entering any exclusive negotiation. If utilization sits between 40 and 65 percent: build the model and set a trigger to re-evaluate when utilization crosses 65 percent on a rolling 90-day basis. If workloads are variable or the requirement is below 500 kilowatts: stay in retail colocation or public cloud and revisit when scale and utilization conditions are met.
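The three profiles can be written as a small decision function. This is one reading of the framework rather than a definitive rule set: it treats "exceeds one megawatt" as 1,000 kilowatts or more, and it returns a separate value for combinations the three profiles do not address.

```python
def next_action(utilization: float, power_kw: float, duration_years: float,
                variable_workload: bool = False) -> str:
    """Map an organization's profile to the next step described above."""
    if variable_workload or power_kw < 500:
        return "stay in retail colocation or public cloud"
    if utilization > 0.65 and power_kw >= 1000 and duration_years >= 3:
        return "build TCO model and qualify providers"
    if 0.40 <= utilization <= 0.65:
        return "build model and set a 65 percent trigger"
    # Assumption: anything else falls outside the three profiles.
    return "not covered by the framework"


if __name__ == "__main__":
    print(next_action(0.85, 2000, 5))
    print(next_action(0.55, 1500, 4))
    print(next_action(0.90, 300, 5))
    print(next_action(0.80, 700, 5))
```

The last example, high utilization at 700 kilowatts, shows a gap between the 500 kilowatt and one megawatt thresholds that calls for judgment.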

Is Wholesale Colocation Right for Your GPU Infrastructure?

Wholesale colocation makes financial sense for GPU and private cloud workloads when three conditions align: sustained utilization above roughly 65 percent, a power requirement of one megawatt or more, and an infrastructure need lasting three years or longer. The TCO model consistently shows a cost-per-GPU-hour advantage over public cloud above that utilization threshold. Facility qualification requires confirming five suite-specific criteria: per-rack power density, cooling capability, carrier-neutral interconnect, physical security standard, and compliance certifications. For organizations that meet those conditions, Inflect provides instant pricing across 6,000+ facilities and expert advisory at no cost to buyers.

Find Wholesale Colocation Options Across 6,000+ Facilities

Inflect is a digital infrastructure marketplace where wholesale colocation buyers search, compare, and receive instant pricing from providers across more than 6,000 data centers in over 100 countries, without a sales call or RFQ submission. Providers on the platform include Equinix, Digital Realty, QTS, CyrusOne, NTT, Iron Mountain, and Flexential, among hundreds of others. Buyers can search available colocation options by market and power requirement, compare providers against the five facility criteria described above, and access expert advisory on lease structure and market selection at no charge. The instant pricing capability closes the information gap that typically forces buyers into negotiation before they have a complete picture of the market.


Start comparing wholesale colocation options on Inflect:

  • Search by power requirement, market, and facility capability across 6,000+ data centers in 100+ countries

  • Receive instant pricing from Equinix, Digital Realty, QTS, CyrusOne, NTT, Iron Mountain, and more, with no sales call required

  • Access expert advisory on facility qualification, lease structure, and market selection at no charge to buyers

  • Compare wholesale options across multiple geographies simultaneously before entering any negotiation

About the Author

Haley Rogers

Content & Social Media Specialist

Haley Rogers is the Content & Social Media Specialist at Inflect, bringing over two years of experience in social media, marketing, and content strategy — including time at a fast-paced tech company before joining the Inflect team. She specializes in translating complex digital infrastructure topics into clear, engaging content, with a particular focus on blog writing and brand storytelling across channels.

