remote-lab.net learn by doing

Enabling GPUDirect RDMA on GeForce GPUs

In the previous post I built a software congestion point using a Linux bridge with TC RED to exercise the full DCQCN feedback loop on my point-to-point lab. That experiment successfully triggered ECN marking and CNP generation but ended with a gap: no PFC. Priority Flow Control is the other half of lossless RoCEv2, the L2 backstop mechanism that prevents packet drops when DCQCN can’t reduce rates fast enough. Without PFC the congestion control picture was incomplete.

The Poor Man's Congestion Point: DCQCN with TC RED and a Linux Bridge

In the RDMA lab I built a two-node setup using dual-port ConnectX-4 cards with PF passthrough, connected point-to-point over fiber. That setup is great for simple RDMA performance testing but I wanted to go further and explore how congestion control works in RoCEv2 networks.

Building a Multi-Node RDMA Lab on a Single Machine

After building my EPYC workstation I wanted to go further: experiment with multi-node GPU networking and learn RDMA, RoCE and the networking technologies behind today’s AI datacenters. My initial plan was ambitious, but so were the prices. Even a small, already outdated setup was prohibitively expensive. Here’s how I ended up achieving similar learning goals for much less.

Building a Multi-GPU Workstation for My Home Lab

After running a power-efficient home lab setup for quite some time, I recently decided it was time for an upgrade. My existing machines were chosen primarily for their small form factors and energy efficiency, but new requirements, particularly the need for GPU acceleration, meant I had to think bigger. Here’s how I approached building a more powerful workstation while maintaining a practical balance of performance, cost and expandability.

Installing Single Node OKD on a KVM Virtual Machine

Spring has finally arrived, and with it the perfect opportunity for some home lab maintenance. After running my home lab OKD instance reliably for over two years, I recently reinstalled it and decided to document the entire process. Since the previous setup served me well, I wanted to capture all the detailed steps involved in getting a fresh OKD environment up and running on a KVM virtual machine.