25 Gbit/s at home, part 1

Vladimir Smirnov
19 min read · Sep 7, 2023


Background

I live in a small rural town in Switzerland, and I've been a client of init7 (an ISP) for a while now. That ISP is well known as the only one that started advertising a 25 Gbit/s connection for private customers a few years ago. However, it takes time for them to upgrade equipment on their POPs, and in my particular case they actually needed to open a POP nearby (until then they rely on a sharing agreement with another ISP, so capacity and other features are limited: it is still FTTH and not GPON, but speed is capped at 1 Gbps for home users). In 2023, they announced that there would soon be a POP close by and that they would move all users to their main product (fiber7); since it will be a new POP, 25 Gbps will be available immediately.

Photo of mountains that I took this summer. This is a spot on a hiking trail from Rigi Kulm towards the Kräbel gondola station.
Have I mentioned that photography is my hobby?

I work as an SRE at Google Switzerland and have some networking experience. Still, I have little hands-on experience beyond relatively dull things at home, so I decided this was my chance to get some, even though 25 Gbps at home is absolute overkill. Here, I'll describe my journey and thought process for this project. There will be at least one cat in this series of posts, some photos I took (including photos of the hardware), and, of course, the story.

Planning

Learning from other people's experience

After some searching, I found a series of blog posts by Michael Stapelberg about building a router from a PC and about his experience with the MikroTik CCR2004 25 Gbit/s router. The latter, however, doesn't include any performance tests showing how it copes with 25 Gbps networks. On the MikroTik forums, people write all kinds of posts, including ones about performance; let me quote some information from the thread:

When I first got my 2004’s, routing performance seemed limited to 2–3Gbps, a far cry from the potential 20–40Gbps mentioned on the website

But that was only at first; the same user later wrote:

At around 19Gbps both 2004’s hit over the 90% mark, regardless of UDP or TCP traffic. Using the built-in speed tests on the 2116’s, the TCP tests ranged from 4Gbps one-way to 8.8Gbps, with a very rare 9.2Gbps on occasion. There were zero firewall rules, queues, or anything else in all four routers, simply IP addresses and a couple of static routes.

That sounds good, right? 19 Gbps is okay; maybe I don't need to bother building my own router? I can live with the problems that Michael described.

But wait…

On 7.6 and 7.7beta9, my throughput tests were much lower, like 6–7Gbps instead of 8–9Gbps per port. So while 7.7 claims to have fixed support for the various SFP+ rates, there’s a performance hit somewhere else.

And another user in the same thread says:

Locally I get max 7 Gbps, and against the init7 iperf3 server (25G connection!), about 5.5 Gbps.

And that is a router that costs 500 CHF in Switzerland (~570$ as of the time of writing). People on the internet discuss other options, and the conclusion is that if you want reliable 20+ Gbps, you need to go for the CCR2216, which has an MSRP of 2795$ and, not surprisingly, costs around 2420 CHF (2760$) in Switzerland. While the specs are impressive (a 16-core ARM CPU with 2 GHz clocks, for example), the price is not very convincing.

And so I've decided to build my own router.

Gathering requirements

Thinking about what I want my router to do is an essential first step, as it tells me what I need to buy. After a bit of thinking, here is the list I've gathered:

  1. It should be able to route ≥ 20 Gbps
  2. It should have at least 1 SFP28 for input and at least 2 SFP28 for output, but ideally, that should be 1 SFP28 in and 4 SFP28 out (a total of 5 in an ideal case)
  3. I should be able to have one port that has decent 10 Gbps performance, SFP+ or RJ45 — it doesn't matter that much.
  4. It should be quiet as it will be in a living room (the place where I have fibre coming out of my wall)
  5. Ideally, it should have low power consumption.
  6. Ideally, I should be able to reuse leftover hardware I have.

In terms of leftover hardware: since the last PC upgrade, I have an entire Ryzen 3000 system, minus a case and disks, sitting on a shelf (a Ryzen 3900X, 32 GB of RAM, an MSI X570 Ace motherboard and a relatively old AIO water cooler; I also have a bunch of old graphics cards around).

Choosing network cards

Back panel of the router, close up: an SFP+ RJ45 module with a Cat6e cable attached and an SFP28 module with fibre. Below is the PiKVM v4 add-on panel without any cables attached, and below that, barely visible, another network card with a DAC cable attached (the second SFP28 port is free).
SFP+ RJ45 and SFP28 with fibre attached.

A few words about PCIe

As we are talking about bandwidth — let's do some math to check what we'll need in terms of PCIe bandwidth.

If you want to know more about PCIe (explained in simple terms), I recommend a video by Gynvael Coldwind called PCI Express To Hell. The vital thing to remember is that the per-lane PCIe bandwidth is determined by the highest version that both the motherboard and the device support. You can then multiply the single-lane bandwidth by the number of lanes the device or slot provides (the lower of the two). As price matters, I won't consider very new cards, so we can forget about PCIe 5.0 here (also, only Ryzen 7000 and 12th-gen Core or newer support it at all). Most of the network cards I'll consider are relatively old, so the prime candidate here is PCIe 3.0, but there is a chance I'll get something a bit newer, so I need to keep the specs of PCIe 4.0 in mind.

For PCIe 3.0, single lane bandwidth is 8 GT/s (you can roughly think that it is 8 Gbps per lane, but there is overhead, so, in reality, it is a bit smaller, closer to 7.5 Gbps), and for PCIe 4.0 — that is 16 GT/s (the overhead is still there, so I will approximate that by 15 Gbps per lane).

Another thing to keep in mind is that desktop motherboards and CPUs provide roughly 20–24 PCIe lanes in total, so it is theoretically impossible to get more than ~360 Gbps. There is also a caveat: a slot can run (I'm talking only about bandwidth here, not physical size) in x16, x8, x4, x2 or x1 mode, and in reality x2 is extremely rare. So even if a card can't use all 8 lanes, it will still cost us all 8 if it sits in a slot that works in x8 mode.

A single 25 Gbps network port will need about 3.3 PCIe 3.0 lanes (rounding that up to 4), and in the case of PCIe 4.0, half of that (rounding up to 2). So, with PCIe 3.0, it is possible to have 2 cards with 2x 25 Gbps ports (SFP28) each in x8 slots, or 1 card with 4 ports in an x16 slot. With PCIe 4.0 (if I find an inexpensive PCIe 4.0 card with these parameters), it is theoretically possible to have 8 ports in an x16 slot or 2x 4-port cards in x8 slots. There are 2xQSFP28 cards for PCIe 4.0; however, you will more realistically be limited to 4xSFP28 cards at best.
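
If you want to re-run the lane math above with different assumptions, here is a tiny sketch (the ~7.5 and ~15 Gbps usable-per-lane figures are the rough approximations from above, not exact PCIe numbers):

# Rough per-port lane requirement; per-lane figures are approximations
awk 'function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
BEGIN {
  port = 25          # one SFP28 port, Gbps
  gen3 = 7.5         # ~usable Gbps per PCIe 3.0 lane
  gen4 = 15          # ~usable Gbps per PCIe 4.0 lane
  printf "PCIe 3.0: %.2f lanes -> round up to %d\n", port / gen3, ceil(port / gen3)
  printf "PCIe 4.0: %.2f lanes -> round up to %d\n", port / gen4, ceil(port / gen4)
}'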

For cards, it is essential to verify what exactly they can do. Some cards support both x8 PCIe 4.0 and x16 PCIe 3.0, but that is relatively rare and usually means newer cards (newer, in this case, meaning more expensive).
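
A quick way to check what a card actually negotiated once it is in a slot is to compare LnkCap (what the card can do) with LnkSta (what it got) in lspci. A sketch; the PCI address 01:00.0 is just an example, use whatever your card shows up as:

# Find the card's PCI address (adjust the grep to your card)
lspci | grep -i ethernet
# Compare link capability vs what was actually negotiated in this slot
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'
# LnkCap shows the maximum (e.g. "Speed 8GT/s, Width x8"), LnkSta the negotiated link.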

About network cards

After some research, here is what I found online. (Note: I was also looking at decent-quality used hardware, but I only mention it when the price difference is more than 2x compared to new; at that point I can buy extra spare cards in case one stops working and still save a lot of money.) Here is the list (I'll provide prices in both CHF and USD for simplicity):

  1. Broadcom P425G — you don't see it used on eBay that often, but there are some listings, like this one for 195$ + 29.33$ shipping + duties (241.60 USD total = 213.63 CHF, excl. duties).
    The card has 4 SFP28 ports, supports either PCIe 4.0 x8 or PCIe 3.0 x16 (same card), has passive cooling and about 19W total power consumption, and supports fancy stuff like RoCE and NVMe-oF offloading. It is, however, designed for classic enterprise use, with some settings available only in the network card's BIOS (which you need to enter during the boot process).
  2. Intel XXV710 — 99.99$ + 32.46$ shipping + duties (142.65$ total = 126.07 CHF).
    It has 2 SFP28 ports, PCIe 3.0 x8, and passive cooling. 14.1W power consumption, almost no offloads, and people say the firmware on those cards is funky and you might need to juggle it to make things work properly (e.g. the card might hang if its ports are added to a bridge together with other cards).
  3. Intel E810 — 339.26 CHF (incl. VAT) = 384.04$ NEW at Distrelec (there are offers on eBay, but the price difference is insignificant). It is a bit newer than the XXV710 and has more offloads (RoCE v2, NVMe-oF and such), but power consumption is reported to be 16–20W (a bit higher than the XXV710), and people complain about the same problem as with the XXV710: finding the correct firmware.
  4. Mellanox ConnectX-4 LX. The cheapest I could find was on eBay, costing 49.48 GBP (62.7 USD = 55.18 CHF). It is advertised as "brand new, made in China", but I am unsure how true that is.
    The card has 2 SFP28 ports, PCIe 3.0 x8, and passive cooling. Power consumption is tricky as different sources tell different stories, somewhere between 7.8 and 11.1 W.
    It has some essential offloading support, nothing fancy, though.
  5. Mellanox ConnectX-5. The cheapest one I've found was 158$ (=139.88 CHF). That version also has 2xSFP28, and the main difference between this one and ConnectX-4 is support for more offloading, including Open V-Switch. It seems overkill for home use.
  6. Mellanox ConnectX-5 MCX516A-CCAT. That one has 2xQSFP28, which means a potential total bandwidth of 200 Gbps. Its price is 269.99$ + 30.05$ shipping + duties (333.63$ total = 295.36 CHF). There are caveats there…
    First: it is PCIe 3.0 x16, so the total available bandwidth (on the bus) is about 126 Gbps, noticeably less than the 200 Gbps the ports could deliver. However, some forum threads mention that it is possible to force-flash it into an MCX516A-CDAT, which gives it PCIe 4.0 support, as it seems to use the same chip and an identical PCB. It is unknown whether that works, and stability after such a flash is not guaranteed.
    Second: it is QSFP28. You must use breakout cables unless you want 100 Gbps (4xSFP28 aggregated). That means either a DAC (direct-attached copper) cable or fibre (with SFP28 ends or just plain fibre). Usually, those cables are designed so that all 4 of your destination devices are in the same place (and if it is a breakout to SFP28, you'll have 4 identical SFP28 ends, which limits compatibility: no SFP+ for you).
  7. Netronome Agilio CX 2x25 GbE. That one costs about 118$ + 27.21$ shipping + duties (156.39$ = 138.42 CHF). It has 2xSFP28 and supports quite a lot of offloads, and what makes it interesting is that it has an SDK that allows you to do eBPF offloading or write processing in P4.

Ultimately, I decided to go for a mix of ConnectX-4 LX and Netronome, because the ability to play with P4 sounded so cool that I thought it would be fun to try. I bought 6 Mellanoxes (2 for the router, 1 for the PC, 1 for a Thunderbolt -> PCIe enclosure for the laptop and 2 spares) and 2 Netronomes.

Note about ConnectX

I bought mine from several different sellers (including the one I've mentioned here; two others were a bit more expensive) and got a mix of "Made in China" and "Made in Israel" cards, and they came with different firmware. One "Made in China" card came with ancient but original firmware (it seems to be from 2018, around its manufacturing date according to the sticker), while another was an OEM Mellanox for Huawei. The difference is that with an original card, you download mlxup from the Nvidia site, and if you have internet access, it updates the firmware (beware: the firmware from 2018 didn't work in all motherboards I tried; e.g. a desktop with an X670E didn't recognise the card until I updated it, and I suspect that in 2018 they didn't ship UEFI-native firmware by default, as the UEFI version field was empty). That doesn't work with the Huawei card, as its product ID is different. There is a trick to force-flash it with original firmware, described in two posts on a servethehome thread.

TLDR: I assume you have something Debian-like and have downloaded the new firmware from the website. It is essential to use mstflint from your distro or GitHub instead of Nvidia's build, because Nvidia's doesn't seem to work with the allow_psid_change flag. The script below is for educational purposes only, and I'm not responsible if you brick your card.

echo "You are about to flash your Mellanox card to a different firmware without any validation. That can cause irreversible damage to your network card or PC, please STOP IF YOU ARE NOT SURE YOU KNOW WHAT YOU ARE DOING AND YOU TAKE FULL RESPONSIBILITY FOR WHAT IS ABOUT TO HAPPEN"
sleep 60
sudo apt install mstflint gawk
NEW_FIRMWARE_BIN="<set this to the path to your new unzipped firmware, bin file>"
# This will get the PCI ID of the first ConnectX card (function .0); modify it if you need another one
PCI_ID=$(sudo lspci | gawk '($0 ~ /ConnectX/ && $1 ~ /\.0$/){print $1}' | head -n 1)
mkdir -p "mellanox_${PCI_ID}_backup"
sudo mstflint -d "${PCI_ID}" query full > "mellanox_${PCI_ID}_backup"/full_query.txt
sudo mstflint -d "${PCI_ID}" hw query > "mellanox_${PCI_ID}_backup"/hw_query.txt
sudo mstflint -d "${PCI_ID}" ri "mellanox_${PCI_ID}_backup"/orig_firmware.bin
sudo mstflint -d "${PCI_ID}" dc "mellanox_${PCI_ID}_backup"/orig_firmware.ini
sudo mstflint -d "${PCI_ID}" -i "${NEW_FIRMWARE_BIN}" -allow_psid_change burn
sudo reboot # alternatively you can try running:
# mstfwreset -d "${PCI_ID}" reset

# Restore old GUID and MAC (can be found in query full)
# PCI_ID should be the same as above (re-derive it after the reboot)
PCI_ID=$(sudo lspci | gawk '($0 ~ /ConnectX/ && $1 ~ /\.0$/){print $1}' | head -n 1)
GUID=$(gawk '($1 == "Base" && $2 == "GUID:"){print $3}' "mellanox_${PCI_ID}_backup"/full_query.txt)
MAC=$(gawk '($1 == "Base" && $2 == "MAC:"){print $3}' "mellanox_${PCI_ID}_backup"/full_query.txt)
sudo mstflint -d "${PCI_ID}" -guid ${GUID} -mac ${MAC} -ocr sg

And if everything goes well — you'll have a Mellanox card that identifies itself as the original one, and even mlxup will start working.
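
To sanity-check the result, you can query the card again; it should now report the stock PSID and the new firmware version (a sketch, reusing the PCI_ID variable from the script above):

# After the reboot, the card should report the stock Mellanox PSID and the new firmware
sudo mstflint -d "${PCI_ID}" query | grep -E 'FW Version|PSID|MAC'
# If the PSID matches the stock Mellanox one, mlxup should now recognise the card as well.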

About SFP connectors

What I've learned can be summarised as "Try to use fibre connectors wherever possible".

With SFP+ RJ45, compatibility is… tricky. I've tried 3 different modules: fs.com's, Ubiquiti's and MikroTik's (v2). The last one didn't want to work in the Mellanox card at all: it just doesn't power up, with no debug messages (even in the driver's debug mode) or anything. Ubiquiti's worked well, but it is detected as 10G Ethernet, transceiver type: multimode, 50um, shortwave laser(!) with a wavelength of 850nm(!!) and a 100m(max) copper cable length(!!!). While that seems to work fine with the Mellanox, I can easily imagine it not coming up in other cards or devices. The output for such a module (if you are interested) looks like this:

# ethtool -m enp16s0f1np1
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x22 (RJ45)
Transceiver codes : 0x10 0x00 0x00 0x00 0x20 0x40 0x04 0x80 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Transceiver type : FC: intermediate distance (I)
Transceiver type : FC: Shortwave laser w/o OFC (SN)
Transceiver type : FC: Multimode, 50um (M5)
Transceiver type : FC: 1200 MBytes/sec
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 0m
Length (62.5um) : 0m
Length (Copper) : 100m
Length (OM3) : 0m
Laser wavelength : 850nm
Vendor name : Ubiquiti Inc.
Vendor OUI : 24:5a:4c
Vendor PN : UACC-CM-RJ45-MG
Vendor rev : U07
Option values : 0x00 0x00
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : AK22117511563
Date code : 221119

And the second thing is power consumption. For some reason, ethtool doesn't report the temperature of SFP+ RJ45 modules. Still, they are known to dissipate roughly 2–3W, which is quite a lot for a module, and without active cooling of your card it will likely overheat under certain conditions.
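
For comparison, fibre modules that implement digital diagnostics (DDM) do expose temperature and optical power through the same command; a sketch, with the interface name being just an example:

# DDM-capable fibre modules report temperature, voltage and optical power levels
sudo ethtool -m enp16s0f0np0 | grep -iE 'temperature|voltage|power'
# The RJ45 module above exposes none of these, so its heat output has to be checked externally.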

Other than that, I bought a few SFP28 single-mode fibre modules from fs.com, and they are working fine, as are the SFP28 DACs. However, if you want to support European manufacturers, you might want to consider buying from Flexoptix. Remember that you need to check the wavelength for both RX and TX. Whatever is needed on the receiving side will be specified by your ISP; however, when connecting your own devices, you will need to swap the RX and TX wavelengths on the other end. For example, if you went for TX 1331 / RX 1271 nm on the router, then on the NAS you will need TX 1271 / RX 1331 modules. The shops' websites will usually advise you on what to get.
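
A quick way to confirm that a module's link actually came up at the expected rate (again, the interface name is just an example):

# Check the negotiated speed and link state for a given interface
sudo ethtool enp16s0f0np0 | grep -E 'Speed|Link detected'
# For a 25G link this should show "Speed: 25000Mb/s" and "Link detected: yes".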

Singlemode vs multimode vs DAC

The maximum length of DAC cables is only about 10 meters (33 feet), and as I needed to run cables across my apartment, I went for a DAC only for the connection between the router and the NAS, as there is no reason to move them away from each other.

I went for single-mode everywhere for the sole reason of unification, as I already have single-mode fibre as the input. That way I can also swap cables or modules around in case of failures. Single-mode cables are cheaper, but the SFP modules are a bit more expensive; single-mode (simplex) cable is also thinner (one strand, while multimode is usually two strands joined together).

Correction: after I finished this post, people told me that I had likely looked at the wrong type of modules/cables and that duplex would be cheaper overall (~60$ per module vs ~90$ per module, and roughly 25$ per 100m of fibre vs 15$ per 100m), and size-wise there are thin non-splittable duplex cables that are the same thickness (they split only closer to the ends).

Cable types

For multimode cables, you will encounter different OM specs (OM3, OM4, etc.); speed and distance depend on which one you get. OM3 is usually used for 10G at up to 300 meters; it can work for 25 Gbps or 40 Gbps, but the distance is significantly reduced (to 70–100 meters). Many places mention that SFP28 is usually combined with OM4 cables. I haven't tried it, though, as I went for single-mode cables.

For single-mode cables, I went for BiDi SFPs (I am not even sure if you can find non-BiDi SFP28 ones) and simplex cables. Connector type is essential; when connecting 2 devices, you want to get LC UPC / LC UPC ones.

Remember that cutting fibre and adding a new connector is more complex than with copper. You will need a Kevlar cutter to cut the fibre, an optical stripper to remove the coatings from the cable, a cleaver to get a clean end face, and pre-polished field-assembly connectors to put back on. You can mend a cable this way, but a machine for splicing (welding) fibre costs about 2,000.

Here, I'll refer you to Michael Stapelberg's blog post written in 2020 about his choice of equipment and instructions on how to use it.

Other hardware and final configuration

I've replaced my AIO water cooling with a simple air cooler. I went for the Be Quiet! Pure Rock Slim 2, which claims to handle CPUs of up to 135W TDP; that should be more than enough for my use case.

As for the chassis, I initially thought about the Corsair 4000D, copying Michael's decision, but after some experiments, I decided to go with the Fractal North Mesh instead. There were a few reasons behind that:

  1. The Fractal North has more fans installed out of the box.
  2. Fractal's chassis is a bit bigger, and its mesh version has a fan mount that can blow directly over the PCIe cards.
  3. Ergonomically, the Fractal seems a bit better: the SSD mount is easier to access, there is more space for proper cable management, and it has a few more small benefits over the 4000D.

One thing you should be aware of about the Fractal North: while it is a rather big case that can handle a large CPU cooler, a large cooler would interfere with the extra fan mounting plate, and something like a Noctua NH-D15 will prevent you from having additional fans blowing onto your PCIe cards.

For a PSU, I went with the Be Quiet! Pure Power 12 M; it was the cheapest PSU with cable management (modular cables) not from a no-name brand.

The final configuration (v1) is:

  1. CPU: Ryzen 3900X (because I had one)
  2. MB: MSI X570 Ace (because I had one)
  3. Cooler: Be Quiet! Pure Rock Slim 2 (reasons stated above), 32.2 CHF (36.08 USD)
  4. RAM: 2x16GB DDR4 3200 CL14 (because I had one)
  5. Mellanox ConnectX-4 LX from eBay (x1), 55.18 CHF (62.7 USD)
  6. Netronome Agilio CX from eBay (x1, because I want to try something programmable), 138.42 CHF (156.39 USD)
  7. Graphics Card: Radeon R5 230 — because it's slim and X570 Ace doesn't have HDMI or VGA, 46.70 CHF (52.32 USD)
  8. PiKVM v4 for remote management (because I had one)
  9. PSU: Be Quiet! Pure Power 12M 550W, 87.50 CHF (98.02 USD)
  10. Disks: 2x Samsung 870 Evo 512GB in SW RAID 1 (no particular reason), 2x44 CHF = 88 CHF (98.58 USD)
  11. Extra fans: 1x Noctua NF-A14, 39.40 CHF (44.14 USD)
  12. Chassis: Fractal North (Mesh version), 119 CHF (133.33 USD)
The back panel and the insides of the router. Note: in these photos 2 Mellanoxes are installed, instead of Netronome + Mellanox.

So far I’ve spent 606.40 CHF (679.30 USD).

"V1" hints that it was not the final version :) But more about that later (in the following parts).

What I'd change if I built from scratch

I'd go for a B550 motherboard, as even if I go for a Broadcom network card, I won't need more than one PCIe 4.0 slot, and B550 consumes less power. An example of such a motherboard is the ASRock B550 Taichi.

I would swap the CPU for either a Ryzen 5700X or a Ryzen Pro 5750G, though it is debatable which is better, as Ryzen 5000 parts with integrated graphics are limited to PCIe 3.0. If I needed more ports, I'd have to go for the P425G, which requires PCIe 4.0 x8 or PCIe 3.0 x16; Ryzen doesn't have enough PCIe lanes to support x16 + x8 (PCIe 3.0), and I'd be limited to x16 + x4, which is not enough. In that case the 5700X would be better, as I'd get x8 + x8 at PCIe 4.0.

I would also go for 3200 MHz ECC DDR4 memory instead of typical desktop memory, and I wouldn't need more than 16GB in total (even that is more than I need).

The Taichi, CPU and RAM would cost an extra 490.5 CHF (549.44 USD): the motherboard would be about 267 CHF (299.11 USD), the RAM about 43.50 CHF (48.72 USD) for 2x8GB, and the CPU another 180 CHF (201.61 USD) for the 5700X, which would bring the total spent to 1096.9 CHF (1228.74 USD). Some of this hardware can now be found used, which would save a bit on the overall setup.

It is also possible to build around Intel, and as of now, I'd start with a 12th-gen Core i3 (i3-12300T), keeping in mind that the router still benefits from great 2–4 core performance.

I might've considered EPYC or Xeon builds to work around PCIe slot limitations as an alternative.

Results

Dark Ginger Cat (Somali) sits on top of a PC
My cats like to sit on warm PCs. Btw, her nickname is Patch.

After the build, while waiting for a 25 Gbps connection to be available, I ran some basic tests using a directly attached machine (Netronome <-> Mellanox ConnectX 4 LX, Netronome is on the server).

In this case, 172.16.0.1 is the router, and 172.16.0.2 is the client. For simple benchmarking, I will use iperf3, by default in single-stream TCP mode (which is a bit unrealistic, but should also be harder on the system):

# iperf3 -c 172.16.0.2 -t 120
Connecting to host 172.16.0.2, port 5201
[ 5] local 172.16.0.1 port 41114 connected to 172.16.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.67 MBytes
[ 5] 1.00-2.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.80 MBytes
[ 5] 2.00-3.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.80 MBytes
[ 5] 3.00-4.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.80 MBytes
[ 5] 4.00-5.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.80 MBytes
[ 5] 5.00-6.00 sec 2.74 GBytes 23.5 Gbits/sec 0 2.80 MBytes

Looks good and promising (keep in mind that the reported bitrate is useful payload, while the raw line rate in this case is around 25 Gbps). So I've started iperf from the client towards the router:

# iperf3 -c 172.16.0.1
Connecting to host 172.16.0.1, port 5201
[ 5] local 172.16.0.2 port 38690 connected to 172.16.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.91 GBytes 16.4 Gbits/sec 0 3.15 MBytes
[ 5] 1.00-2.00 sec 1.95 GBytes 16.7 Gbits/sec 0 3.15 MBytes
[ 5] 2.00-3.00 sec 1.96 GBytes 16.8 Gbits/sec 0 3.15 MBytes
[ 5] 3.00-4.00 sec 1.95 GBytes 16.8 Gbits/sec 0 3.15 MBytes
[ 5] 4.00-5.00 sec 1.96 GBytes 16.8 Gbits/sec 0 3.15 MBytes
[ 5] 5.00-6.00 sec 1.97 GBytes 16.9 Gbits/sec 0 3.15 MBytes
[ 5] 6.00-7.00 sec 1.97 GBytes 16.9 Gbits/sec 0 3.15 MBytes
[ 5] 7.00-8.00 sec 1.96 GBytes 16.8 Gbits/sec 0 3.15 MBytes
[ 5] 8.00-9.00 sec 1.96 GBytes 16.9 Gbits/sec 0 3.15 MBytes
[ 5] 9.00-10.00 sec 1.97 GBytes 16.9 Gbits/sec 0 3.15 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 19.6 GBytes 16.8 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 19.6 GBytes 16.8 Gbits/sec receiver

top shows that single-core performance is the bottleneck: one CPU is entirely consumed by ksoftirqd. Touching the knobs provided by the driver didn't change the situation significantly. If I swap the Netronome for a Mellanox, however, everything runs just fine (and the CPU doesn't even ramp up to its full base frequency, consuming roughly 50% of a single core at ~3 GHz, whereas the Netronome pushed it to the highest achievable boost frequency, which for my 3900X is ~4.4 GHz).
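
For reference, this is roughly how I look at where the receive work lands; nothing here is Netronome-specific, just generic Linux tooling (the interface name is an example, and interrupt names in /proc/interrupts depend on the driver):

# How many combined queues does the NIC expose / currently use?
sudo ethtool -l enp1s0f0np0
# Which CPUs service the NIC's interrupts? Adjust the pattern: some drivers name IRQs
# after the driver (e.g. mlx5, nfp) rather than the interface
grep -i enp1s0f0 /proc/interrupts
# Watch per-CPU softirq load (the %soft column) while iperf3 is running; mpstat comes from sysstat
mpstat -P ALL 1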

So I realised it wouldn't be easy to keep Netronome in place and get decent performance.

What is next?

As of the time of writing, I'm still waiting for the ISP to connect me to 25 Gbps. That is about a week or two away, and I've been talking to Netronome support about some other things. So, in the next part, I'll continue the journey: what came out of all the experiments around the Netronome and whether I've managed to fix the performance.

Meanwhile, I've changed a few things in my router build (mostly in attempts to get a 25 Gbps line rate out of the Netronome), so I will tell more about those as well.


Vladimir Smirnov

SRE @ Google Switzerland. Everything that I publish is my own opinion and does not reflect Google's opinion or position.