Infrastructure that understands AI workloads
General-purpose infrastructure was not designed for AI. GPU clusters require different scheduling primitives. Training jobs have different I/O patterns than databases. Inference serving has different latency requirements than batch processing.
Fabric is an infrastructure platform built from first principles for AI — where every component is designed around the performance characteristics, failure modes, and operational patterns specific to AI workloads.
It sits below the model and above the hardware — abstracting physical resources into a programmable, observable, and resilient infrastructure substrate that AI systems can depend on.
The infrastructure gap in the AI stack
Seven integrated subsystems
AI Compute
Orchestrates heterogeneous compute resources — GPU, CPU, and accelerator pools — with workload-aware scheduling. Abstracts hardware topology to present a unified compute surface to AI workloads regardless of underlying infrastructure.
- Multi-vendor GPU orchestration
- NUMA-aware placement
- Preemptive scheduling with priority tiers
- Hardware topology discovery
Storage Fabric
A distributed storage layer purpose-built for AI data flows. Eliminates I/O bottlenecks during training by co-locating data with compute, supporting streaming ingestion, and providing deterministic read latency at scale.
- Distributed object storage
- Training-optimized I/O pipeline
- Tiered caching (DRAM → NVMe → remote)
- Checkpointing and snapshot management
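As a rough illustration of the tiered caching idea above, here is a minimal read-through sketch. It assumes tiers are consulted fastest-first (DRAM, then NVMe, then remote object storage) and that a hit in a slower tier promotes the data upward. The class name `TieredCache`, its methods, and the use of plain dictionaries as tiers are all hypothetical stand-ins, not Fabric's API. Capacity limits and eviction are deliberately left out.

```python
class TieredCache:
    """Read-through cache: check tiers fastest-first, promote on hit.

    Hypothetical sketch: dicts stand in for real DRAM/NVMe/remote tiers,
    and there is no capacity limit or eviction policy.
    """

    def __init__(self, remote):
        self.dram = {}        # fastest, smallest tier
        self.nvme = {}        # stand-in for a local NVMe tier
        self.remote = remote  # stand-in for remote object storage

    def read(self, key):
        """Return (value, tier_that_served_the_read)."""
        if key in self.dram:
            return self.dram[key], "dram"
        if key in self.nvme:
            value = self.nvme[key]
            self.dram[key] = value      # promote to the faster tier
            return value, "nvme"
        value = self.remote[key]        # slowest path; KeyError if absent
        self.nvme[key] = value          # populate both local tiers
        self.dram[key] = value
        return value, "remote"


cache = TieredCache(remote={"shard-0": b"tokens"})
first = cache.read("shard-0")    # served from the remote tier
second = cache.read("shard-0")   # now served from DRAM
```

The first read of a shard pays the remote cost, and later reads are served locally. That is the behavior a training data loader would rely on across epochs.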
Network Fabric
High-throughput, low-latency networking infrastructure for distributed AI. Manages collective communication patterns (AllReduce, AllGather) and provides congestion control tuned for gradient synchronization workloads.
- RDMA-aware routing
- Collective communication primitives
- Bandwidth-aware job placement
- Network topology modeling
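To make the AllReduce pattern mentioned above concrete, the following sketch simulates one common algorithm, ring AllReduce, in plain Python. It is an illustration only and not a statement about how Fabric implements collectives. The function name `ring_allreduce` is hypothetical, and each worker's vector is assumed to have exactly one element per worker (one "chunk" each) to keep the code short.

```python
def ring_allreduce(data):
    """Simulate ring AllReduce (sum) over n workers.

    data[i] is worker i's vector and must have length n (one number per
    chunk). Returns every worker's vector after the reduction.
    """
    n = len(data)
    data = [list(v) for v in data]  # copy so the caller's input is untouched

    # Reduce-scatter: after n-1 steps, worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all sends first so the step behaves as if simultaneous.
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] += value

    # All-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n])
                 for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] = value
    return data


grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
result = ring_allreduce(grads)
# Every worker ends with [111, 222, 333]
```

In this scheme each worker sends 2(n-1) chunks in total, roughly twice its vector size regardless of worker count. That is why link bandwidth and congestion, rather than worker count alone, tend to dominate gradient synchronization time.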
AI Scheduler
A workload scheduler designed for the unique characteristics of AI jobs — long-running, gang-scheduled, and sensitive to resource fragmentation. Implements backfill scheduling, gang admission control, and preemption policies.
- Gang scheduling with backfill
- Priority queues and fairshare
- Spot/preemptible workload support
- Multi-tenant isolation
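The interaction between gang admission and backfill can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the `Job` type and `schedule` function are hypothetical names, GPUs are treated as a single undifferentiated pool, and backfill is done naively without reservations. A production scheduler would typically reserve capacity so that a large job is not starved indefinitely.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int      # gang size: all GPUs must be granted together
    priority: int  # higher value is considered first


def schedule(queue, free_gpus):
    """Return (admitted job names, GPUs still free) for one scheduling cycle.

    Jobs are considered in priority order. A job is admitted only if its
    whole gang fits (all-or-nothing). Jobs that do not fit are skipped, so
    smaller lower-priority jobs can backfill the remaining GPUs.
    """
    admitted = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.gpus <= free_gpus:
            free_gpus -= job.gpus
            admitted.append(job.name)
    return admitted, free_gpus


queue = [Job("pretrain", 8, 10), Job("finetune", 2, 5), Job("eval", 1, 1)]
admitted, left = schedule(queue, free_gpus=4)
# admitted == ["finetune", "eval"], left == 1
```

Here the 8-GPU job cannot be partially started on 4 GPUs, so the two smaller jobs use the capacity instead of leaving it idle.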
Inference Gateway
A high-performance serving layer for AI model inference. Handles request routing, batching, model versioning, and autoscaling. Designed for sub-10ms p99 latency at sustained throughput across model sizes from 7B to 700B parameters.
- Dynamic batching and continuous batching
- KV-cache memory management
- Multi-model serving
- Autoscaling with cold-start mitigation
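Dynamic batching trades a small amount of queueing delay for higher accelerator utilization. The sketch below shows one simple policy, assuming a batch closes either when it is full or when a waiting-time budget has elapsed since its first request. The function name `form_batches` and the parameters `max_batch` and `max_wait_ms` are hypothetical and chosen for illustration. It operates on recorded arrival times rather than live traffic.

```python
def form_batches(arrivals_ms, max_batch=4, max_wait_ms=5.0):
    """Group sorted request arrival times (in ms) into batches.

    A batch closes when it holds max_batch requests, or when the next
    request arrives more than max_wait_ms after the batch's first request.
    """
    batches, current = [], []
    for t in arrivals_ms:
        batch_full = len(current) == max_batch
        waited_too_long = bool(current) and t - current[0] > max_wait_ms
        if batch_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


batches = form_batches([0, 1, 2, 3, 4, 20, 21], max_batch=4, max_wait_ms=5.0)
# batches == [[0, 1, 2, 3], [4], [20, 21]]
```

Continuous batching goes further by admitting new requests into an in-flight batch between decoding steps. That is not shown here.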
Observability
Full-stack observability for AI infrastructure — GPU utilization, memory pressure, training throughput, and job lifecycle events. Provides the telemetry required to diagnose performance regressions and infrastructure anomalies.
- GPU/CPU/memory telemetry
- Training metrics pipeline
- Distributed tracing for inference
- Alerting and anomaly detection
Security Layer
Security primitives for multi-tenant AI infrastructure — workload isolation, secrets management, network policy enforcement, and audit logging. Designed for enterprise compliance requirements without compromising performance.
- Workload identity and isolation
- Secrets and credential management
- Network policy enforcement
- Audit logging and compliance
What Fabric enables
Large-scale model training
Fabric coordinates compute, storage, and network resources for distributed training runs — from single-node fine-tuning to multi-cluster pre-training across thousands of GPUs. The scheduler ensures optimal placement, the storage layer eliminates data starvation, and the network fabric minimizes gradient synchronization overhead.
High-throughput inference serving
The Inference Gateway handles model serving at production scale — routing requests, managing KV-cache memory, batching dynamically, and scaling replicas based on load. Designed for organizations running multiple models across diverse hardware configurations.
ML research infrastructure
Research teams need infrastructure that supports rapid iteration — fast dataset access, reproducible environments, experiment tracking, and efficient use of shared compute. Fabric provides the substrate for research infrastructure without requiring each team to build their own.
Multi-tenant AI platforms
Enterprises building internal AI platforms need isolation, fairshare scheduling, cost attribution, and policy enforcement across teams. Fabric provides the infrastructure primitives to build multi-tenant AI compute platforms that scale from tens to thousands of users.
Design philosophy
Performance first
Every component is designed around the throughput and latency requirements of AI workloads, rather than adapted from general-purpose compute.
Composable architecture
Platform components are independently deployable and composable. Adopt what you need without taking on the full stack.
Hardware agnostic
Fabric abstracts away underlying hardware vendors, supporting NVIDIA, AMD, Intel, and custom accelerators through a unified API.
Research-validated
Architecture decisions are grounded in systems research. Every design choice is validated against published work and empirical benchmarks.
Observable by default
Observability is not an afterthought. Every component emits structured telemetry from day one.
Operational simplicity
Infrastructure that requires constant human intervention doesn't scale. Fabric aims for autonomous steady-state operation.
Building in phases
Active research areas
Fabric is not a product announcement. It is an active research and architecture project. The following areas are under active investigation, with published literature informing every design decision.
CXL-based disaggregated memory for AI inference, enabling independent memory and compute scaling across GPU clusters.
Software-defined memory management for LLM inference — paged attention, cache eviction policies, and memory tiering.
Fast, consistent checkpoint protocols for large model training — minimizing recovery time from hardware failures.
Placement algorithms that model network topology — minimizing collective communication overhead in distributed training.
Data pipeline acceleration for ML training — streaming, prefetching, and caching strategies to eliminate GPU starvation.
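As one illustration of the paged-attention style of memory management listed above, here is a toy block allocator for KV-cache memory. It is a sketch under assumptions, not a description of Fabric's design: the class name `PagedKVCache` and its methods are hypothetical, blocks are tracked only as integer ids, and no eviction or tiering policy is implemented.

```python
class PagedKVCache:
    """Toy block allocator in the spirit of paged attention.

    Each sequence gets fixed-size blocks on demand instead of one large
    contiguous reservation, which limits fragmentation.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # free physical block ids
        self.tables = {}                     # sequence id -> list of block ids
        self.lengths = {}                    # sequence id -> tokens stored

    def append_token(self, seq_id):
        """Reserve space for one more token, allocating a block if needed."""
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:    # no block yet, or last one is full
            if not self.free:
                raise MemoryError("no free KV blocks; evict or preempt")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = length + 1

    def release(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("req-a")   # 3 tokens occupy 2 blocks
cache.release("req-a")            # both blocks return to the pool
```

The `MemoryError` branch is where an eviction policy or a spill to a slower memory tier would plug in.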
Shape the future of AI infrastructure
We are actively looking for infrastructure engineers, AI systems researchers, and enterprise partners who want to build the next generation of AI infrastructure together.