Jay Taylor's notes


Capturing Packets in Linux at a Speed of Millions of Packets per Second | Hacker News

Original source (news.ycombinator.com)
Tags: news.ycombinator.com
Clipped on: 2016-10-24

Capturing Packets in Linux at a Speed of Millions of Packets per Second (kukuruku.co)
79 points by andreygrehov 5 hours ago | 15 comments


In case you're wondering about the different linux kernel bypass mechanisms, here's the relevant slide from a recent talk: https://lh3.googleusercontent.com/TO1UdUicn1wuF4jIAhskikO6ML...

Sorry I haven't found the actual slides yet, that's why it's a photo from someone who took it while attending the talk.

That's great, but how do you then get that many packets to disk so that you can do something with them?

Presumably you need flash drives, and probably an append-only filesystem?

> but how do you then get that many packets to disk

it may be possible to do disk i/o at that rate, e.g. with pci-e storage or a dedicated appliance for dumping the entire stream, but you would run out of storage pretty fast.

for example, a quick back-of-the-envelope calculation, where you dump the packet stream from 4x10 Gbps cards at the minimal 84-byte frame size (on ethernet), shows that you would exhaust the storage in approx. 4.5 minutes :)

> where you dump the packet stream from 4x10 Gbps cards at the minimal 84-byte frame size (on ethernet), shows that you would exhaust the storage in approx. 4.5 minutes

I don't follow your arithmetic here, or the size of the storage you're considering. I did this:

3.5 million packets per second * 84 bytes per packet * 4 interfaces == 101.606 terabytes/day.

Or, at the raw interface rate:

10 Gb/s * 4 interfaces == 432 terabytes/day.

40 Gigabits per second is 5 Gigabytes per second.

5 Gigabytes per second times 86,400 seconds per day is 432,000 Gigabytes per day.

Roughly: 432 Terabytes per day.
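For anyone who wants to redo the arithmetic, here is a quick sketch (decimal units, 1 TB = 10^12 bytes; the 3.5 Mpps and 84-byte on-wire frame size are taken from the thread above):

```python
# Back-of-the-envelope storage math for capturing 4x10GbE to disk.
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte

# Path 1: packet-rate estimate (~3.5 Mpps per port).
# 84 bytes is the minimal on-wire size: 64-byte frame + preamble + inter-frame gap.
pps_per_port = 3.5e6
frame_bytes = 84
ports = 4
per_packet_tb_day = pps_per_port * frame_bytes * ports * SECONDS_PER_DAY / TB

# Path 2: raw line-rate estimate (4 x 10 Gbit/s, 8 bits per byte).
line_rate_bits_per_sec = 4 * 10e9
line_rate_tb_day = line_rate_bits_per_sec / 8 * SECONDS_PER_DAY / TB

print(round(per_packet_tb_day, 1))  # 101.6 TB/day
print(round(line_rate_tb_day))      # 432 TB/day
```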

Large, but not stupidly so.

There's still not a lot of oomph left over to do anything with the traffic...or is that not the point of the exercise? You're not going to be comparing it, or writing it to disk at these levels.

This traffic is a bit above the levels I've dealt with, but I've seen Cloud Datacenter levels of traffic that, as far as I know, you can't practically log/monitor/IPS/SIEM...or am I misinformed?

> you can't practically log/monitor/IPS/SIEM...or am I misinformed?

It depends on the hardware you're using (specifically the router), but with netflow / sflow / ipfix [0] you can get pretty high visibility even on high-bandwidth networks. This only gets you "metadata" rather than a full packet capture, but for monitoring and the like the metadata can be far more useful.

I'm not entirely sure what level of traffic you're talking about, but I know it's possible, with the right hardware, to use netflow on 100GbE links without having to sample (i.e. recording flows for every packet, not 1 in every n packets).

[0]: Good sflow vs. netflow breakdown: http://networkengineering.stackexchange.com/a/1335
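To make the metadata-vs-full-capture distinction concrete, here is a toy sketch of what flow export boils down to (not real netflow/sflow/ipfix encoding, just the idea of per-flow aggregation keyed by the 5-tuple):

```python
from collections import defaultdict

def aggregate(packets):
    """Collapse a packet stream into per-flow packet/byte counters.

    Each packet is (src_ip, dst_ip, src_port, dst_port, proto, length).
    This is the essence of unsampled flow export: one small record per
    flow instead of every packet's payload.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, length in packets:
        rec = flows[(src, dst, sport, dport, proto)]
        rec["packets"] += 1
        rec["bytes"] += length
    return dict(flows)

pkts = [
    ("10.0.0.1", "10.0.0.2", 12345, 443, "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", 12345, 443, "tcp", 1500),
    ("10.0.0.3", "10.0.0.2", 53124, 53, "udp", 80),
]
print(len(aggregate(pkts)))  # 2 flow records instead of 3 stored packets
```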

This must be the 10th blog post to land on HN on the same topic, and they all walk through the same steps and all use the same hardware (ixgbe), which is by the way a hard prerequisite to make much of these strategies effective.

In any case, stop reinventing the wheel; just use a purpose-made library.


I am a Snabb hacker and I see things differently. Ethernet I/O is fundamentally a simple problem, DPDK is taking the industry in the wrong direction, and application developers should fight back.

Ethernet I/O is simple at heart. You have an array of pointer+length packets that you want to send, an array of pointer+length buffers where you want to receive, and some configuration like "hash across these 10 rings" or "pick a ring based on VLAN-ID." This should not be more work than, say, a JSON parser. (However, if you aren't vigilant you could easily make it as complex as a C++ parser.)
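That "arrays of pointer+length plus a little configuration" model fits in a few lines. The toy ring below is my own illustration (not Snabb's actual API): a power-of-two array of (buffer, length) descriptors with head/tail indices, which is essentially the structure the NIC hardware consumes:

```python
class TxRing:
    """Toy transmit descriptor ring: an array of (buffer, length) slots.

    Real NICs work the same way: the driver fills descriptors at `tail`,
    the hardware consumes them and advances `head`.
    """
    def __init__(self, size=8):
        assert size & (size - 1) == 0, "ring size must be a power of two"
        self.size = size
        self.slots = [None] * size   # each slot holds (buffer, length)
        self.head = 0                # next descriptor hardware will consume
        self.tail = 0                # next descriptor software will fill

    def transmit(self, buf):
        nxt = (self.tail + 1) & (self.size - 1)
        if nxt == self.head:
            return False             # ring full; caller retries later
        self.slots[self.tail] = (buf, len(buf))
        self.tail = nxt
        return True

    def reclaim(self):
        """Pretend the NIC has sent everything up to `tail`."""
        sent = (self.tail - self.head) & (self.size - 1)
        self.head = self.tail
        return sent

ring = TxRing(8)
for i in range(7):
    assert ring.transmit(b"pkt%d" % i)
assert not ring.transmit(b"overflow")  # only size-1 slots are usable
print(ring.reclaim())  # 7
```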

DPDK has created a direct vector for hardware vendors to ship code into applications. Hardware vendors have specific interests: they want to differentiate themselves with complicated features, they want to get their product out the door quickly even if that means throwing bodies at a complicated implementation, and they want to optimize for the narrow cases that will look good on their marketing literature. They are happy for their complicated proprietary interfaces to propagate throughout the software ecosystem. They also focus their support on their big customers via account teams and aren't really bothered about independent developers or people on non-mainstream platforms.

Case in point: We want to run Snabb on Mellanox NICs. If we adopt the vendor ecosystem then we are buying into four (!) large software ecosystems: Linux kernel (mlx5 driver), Mellanox OFED (control plane), DPDK (data plane built on OFED+kernel), and Mellanox firmware tools (mostly non-open-source, strangely licensed, distributed as binaries that only work on a few distros). In practice it will be our problem to make sure these all play nice together and that will be a challenge e.g. in a container environment where we don't have control over which kernel is used. We also have to accept the engineering trade-offs that the vendor engineering team has made which in this case seems to include special optimizations to game benchmarks [1].

I say forget that for a joke.

Instead we have done a bunch more work up front to first successfully lobby the vendor to release their driver API [2] and then to write a stand-alone driver of our own [3] that does not depend on anything else (kernel, ofed, dpdk, etc). This is around 1 KLOC of Lua code when all is said and done.

I would love to hear from other people who want to join the ranks of self-sufficient application developers. Honestly our ConnectX driver has been a lot of work but it should be much easier for the next guy/gal to build on our experience. If you needed a JSON parser you would not look for a 100 KLOC implementation full of weird vendor extensions, so why do that for an ethernet driver?

[1] http://dpdk.org/ml/archives/dev/2016-September/046705.html [2] http://www.mellanox.com/related-docs/user_manuals/Ethernet_A... [3] https://github.com/snabbco/snabb/blob/mellanox/src/apps/mell...

Plus it also supports even lower level drivers for a bunch of cards (some are VM virtualised, such as the Intel em), as well as AF_PACKET, oh, and pcap.
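As an aside, the AF_PACKET path mentioned here is reachable even from stock Python on Linux. A minimal sketch (the interface name and header parser are illustrative only; opening the socket needs CAP_NET_RAW/root, so this is the slow cousin of the techniques in the article):

```python
import socket
import struct

ETH_P_ALL = 0x0003  # capture frames of every protocol

def parse_ethernet(frame):
    """Split a raw Ethernet frame into (dst_mac, src_mac, ethertype, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]

def capture(interface="eth0", count=10):
    # AF_PACKET hands you raw frames straight from the driver
    # (Linux-only, requires CAP_NET_RAW).
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((interface, 0))
    for _ in range(count):
        frame, _addr = sock.recvfrom(65535)
        _, _, ethertype, payload = parse_ethernet(frame)
        print(hex(ethertype), len(payload))
    sock.close()
```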

Depending on what you are doing, with latencies (or throughput) in that range, sticking a black-box library in there right away might not always be the best idea. Doing what the author did is also a way to learn how things work. Eventually the library might be the answer, but if I had to do what they did, I would do it by hand first as well.

I doubt there's a single real use case out there for which DPDK isn't fast enough but custom hardware isn't warranted.

Can you elaborate on what you mean by the Intel NICs/drivers being a hard prerequisite here?

Lots of the low-latency options require driver and hardware cooperation: busy polling, BQL, essentially all of the ethtool options, even IRQ affinity.
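Busy polling, for example, can be requested per socket via SO_BUSY_POLL. A hedged sketch (Linux >= 3.11; the constant may not be exposed by your Python build, so the kernel value 46 is assumed as a fallback, and setting it can require CAP_NET_ADMIN):

```python
import socket

# SO_BUSY_POLL: microseconds to busy-wait in the driver instead of
# sleeping for an interrupt. Falls back to the raw value 46 when the
# constant is not exposed by this Python build (an assumption).
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)

def make_busy_poll_socket(usec=50):
    """UDP socket that busy-polls the NIC for up to `usec` microseconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usec)
    except OSError:
        # Setting may need CAP_NET_ADMIN; fall back to normal blocking I/O.
        pass
    return sock

sock = make_busy_poll_socket()
sock.close()
```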

Intel has been a driving force behind many kernel networking improvements, but they naturally don't care about other manufacturers, so they implement a little bit of kernel infrastructure and put the rest into their drivers.

Understood, agreed. Thanks for the clarification.
