That Time I Banned Hetzner From Discord

11/19/2024 3:34:05 AM

I recently read a news post from the Tor Project and a related blog post from Pierre Bourdon, which reminded me of an interesting anecdote from my time at Discord. Both discuss SYN spoofing on the modern internet, which can be used as a social engineering attack to get cheap hosts to null-route or otherwise block a customer's services. While it's not a widely known or exploited attack vector, it should have been solved many years ago, and it's annoying that community-based services are the ones most affected by it.

DDoS Protection As A Service

In the early days, one of Discord's main selling points for adoption was its ability to mask and protect your IP. At the time, DDoS (and, in most cases, just simple DoS) attacks against individuals (usually via Skype) or dedicated TS3/mumble servers were very common. To implement this, we had to ensure that 100% of user traffic was proxied through infrastructure managed by Discord. This included anything an end-user would interact with, from voice and video data to the images shared in chat.

A Lot Of Bytes For Not Very Much

The largest amount of end-user traffic that needed to be proxied and masked was via our real-time infrastructure. These servers managed all the data transmitted during a voice or video call. Data sent by users is encrypted (today, this is E2EE via DAVE, but at the time, it was purely via symmetric encryption), and thus, these servers had to decrypt and re-encrypt traffic as they proxied it.

This workload has a high demand for bandwidth and compute, both of which tend to be very expensive. To solve this problem without breaking the startup bank, we relied on commodity dedicated hardware. This hardware was ideal for a few reasons:

Servers usually came with a large bandwidth allocation, and it tended to be very cheap to purchase more on top
Easy to obtain performant CPUs if you could handle being a gen or two behind
Providers usually have some form of pooled DDoS protection, which would handle 99% of the attacks we saw
Cheaper dedicated hardware usually has less disk/RAM, which would have gone to waste for our use case

Knock Knock, Who's There?

Despite this filtering, we would often see a plethora of minor attacks being attempted. One of the more common ones was SYN flooding and spoofing, which almost always slips through DDoS filters (SYNs are hard to filter). SYN flooding was easily mitigated on servers via SYN cookies, and it rarely impacted end users anyway, since it only prevented new connections. Sometimes these SYN attacks were targeted at specific Discord channels, and other times we were hit simply because we hosted public HTTP(S) services.
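For readers unfamiliar with SYN cookies, here is a heavily simplified sketch of the idea in Python. It is not the Linux implementation (which also encodes a timestamp and MSS into the sequence number), and the secret, addresses, and function names are all hypothetical. The point is only that the server derives its initial sequence number from the connection details instead of storing state for every SYN, so a flood of SYNs that never complete the handshake costs it no memory.

```python
import hashlib
import hmac

# Hypothetical secret; a real stack would use a random, rotating key.
SECRET = b"example-secret"


def syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter):
    """Derive a 32-bit initial sequence number from the connection details."""
    message = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{client_isn}|{counter}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, counter, ack_number):
    """Recompute the cookie; a genuine client acknowledges cookie + 1."""
    cookie = syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter)
    return ack_number == (cookie + 1) % 2**32


# Addresses come from the RFC 5737 documentation ranges, purely for illustration.
conn = ("198.51.100.7", 51000, "192.0.2.10", 80, 12345)
cookie = syn_cookie(*conn, counter=1)
```

A client that actually received the SYN-ACK replies with `cookie + 1` and passes `ack_is_valid`. A spoofed sender never sees the SYN-ACK, so it cannot produce a valid acknowledgement, and the server has kept nothing in memory for it.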

As in the Tor blog post, when we got SYN floods with spoofed source addresses, those addresses often fell within IP ranges owned by hosting providers that generated automated abuse reports. We would either handle these case by case with our providers or, in some cases, have providers "null-route" the abuse reports for us (from a chat with one of our providers):

2017-06-27 15:07:36 - b1nzy: Thats a hetzner report no? They keep reporting port scans because people SYN flood our servers on port 80 with their response range set.

2017-06-27 15:08:06 - b1nzy: At this point we've requested most of our other providers drop hetzner reports unless they provide actual information and its not just their shitty automated reporting

Because the volume was so low and since our providers usually handled things for us, we generally ignored these reports.
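The mechanism behind these reports can be illustrated with a toy simulation. This is only a sketch: the addresses come from the RFC 5737 documentation ranges, the packet dictionaries and function names are hypothetical, and no real traffic is involved. An attacker sends SYNs whose source addresses are forged to sit inside a third party's range; the server answers each one with a SYN-ACK to the forged address; and the third party, looking only at what arrives on its network, sees a single host spraying unsolicited packets across its range.

```python
import ipaddress
from collections import Counter

# Hypothetical addresses from documentation ranges, not real hosts.
SERVER_IP = "192.0.2.10"
SPOOFED_RANGE = ipaddress.ip_network("198.51.100.0/24")


def spoofed_syns(count):
    """SYNs sent by the attacker with forged sources inside SPOOFED_RANGE."""
    hosts = list(SPOOFED_RANGE.hosts())
    return [
        {"src": str(hosts[i % len(hosts)]), "dst": SERVER_IP, "dport": 80, "flags": "SYN"}
        for i in range(count)
    ]


def server_replies(packets):
    """A listening server answers every SYN with a SYN-ACK to the claimed source."""
    return [
        {"src": p["dst"], "dst": p["src"], "sport": p["dport"], "flags": "SYN-ACK"}
        for p in packets
        if p["flags"] == "SYN"
    ]


def unsolicited_by_sender(replies, monitored_range):
    """What the owner of the spoofed range observes: packets grouped by sender."""
    counts = Counter()
    for reply in replies:
        if ipaddress.ip_address(reply["dst"]) in monitored_range:
            counts[reply["src"]] += 1
    return counts


replies = server_replies(spoofed_syns(500))
report = unsolicited_by_sender(replies, SPOOFED_RANGE)
```

From the monitored range's point of view, `report` attributes every packet to the server, even though the server only did what any public TCP service does when it receives a SYN.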

Hetzner, Can You Like, Stop?

At some point, the problem grew to the point that I wanted to get things sorted out. I can't give an accurate number anymore, but I think almost 90% of the reports we received came from Hetzner. They had developed an automated system that misattributed these SYN-spoofing attacks to us and shipped out a plethora of abuse reports.

These reports became cumbersome to deal with, and for our cheaper providers, who outsourced their support staff, they would sometimes result in servers getting null-routed, requiring further intervention (and actually affecting our end users' calls).

On Thursday, September 14th, 2017, I sent the following email to the listed network contact for Hetzner's IP space:

Hello all,

I'm a server engineer from Discord https://discordapp.com. As part of our product, we host our voice services on a wide variety and range of commodity hardware across many regions and providers. This email is intended for the network engineers at Hetzner.

As of recently, we've received a few completely invalid abuse complaints from Hetzners automated abuse detection (although this has been going on for months). Namely, malicious actors tend to SYN flood our open HTTP servers (port 80 / 443) which respond to a percentage of requests with SYN-ACKs. Hetzner then chooses to file an abuse report with the provider, which provides more issues for us as we have to individually respond to these.

Because automating abuse reports in this fashion is a completely invalid behavior for a provider of your nature, we are left with two choices here:

We block all Hetzner traffic to our various servers, which will prevent SYN floods from triggering the automated abuse reports. Based on my quick digging of our logging information, this will effect about 300 (see attached image) Hetzner customers which host services that require/utilize Discord's API negatively.

Hetzner stops sending these automated invalid abuse reports to our providers. Our IP ranges are extensive and ever changing, so whitelisting here is not an option.

[image: Inline image 1]

Thanks, Andrei

So Anyway, I Started Blocking

After we didn't receive a response in over 24 hours, I decided to block Hetzner. This started out with a block on just our RTC servers, meaning clients wouldn't be able to connect to voice, but the vast majority of other features would work as normal via the primary Discord API servers.

2017-09-15 18:59:40 - b1nzy: FYI CX, we're going to be blacklisting a german server provider (only for voice servers atm) because they are causing us trouble. Based on the numbers I've pulled this may effect some portion of ~800 bots. If you get any emails reporting problems connecting to voice for a bot, please inform them of this and that they should reach out to Hetzner to resolve.

We already had a custom firewall script we used across all our exposed infrastructure, so I just had to collect the list of IP ranges and add a block like so:

# Ignore (drop) all traffic from hetzner
{% for hetzner_range in pillar.hetzner_ranges %}
iptables -A EXTERNAL -p tcp -s {{ hetzner_range }} -j LOGDROP
{% endfor %}
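To show what that template loop expands to, here is a small Python sketch that renders the same rule per range. The `EXTERNAL` chain and `LOGDROP` target come from the snippet above; the ranges are placeholders from the RFC 5737 documentation space, not Hetzner's actual ranges, and `render_drop_rules` is a hypothetical helper rather than part of our firewall script.

```python
import ipaddress

# Placeholder ranges for illustration only; not Hetzner's real IP space.
hetzner_ranges = ["192.0.2.0/24", "198.51.100.0/24"]


def render_drop_rules(ranges, chain="EXTERNAL", target="LOGDROP"):
    """Render one iptables rule per CIDR range, validating each range first."""
    rules = []
    for cidr in ranges:
        # ip_network raises ValueError for malformed ranges or ranges with host bits set.
        network = ipaddress.ip_network(cidr)
        rules.append(f"iptables -A {chain} -p tcp -s {network} -j {target}")
    return rules


rules = render_drop_rules(hetzner_ranges)
for rule in rules:
    print(rule)
```

Validating each range before rendering is a cheap guard: a typo in the range list fails loudly instead of producing a rule that iptables rejects or, worse, one that matches the wrong addresses.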

When, hours later, we still hadn't received a response, I fully blocked Hetzner from all of Discord's services. This meant clients on Hetzner's network would be unable to connect to or interact with our API in any capacity. Any users reaching out would be redirected to contact Hetzner support and ask them to contact us.

2017-09-15 20:51:52 - b1nzy: We're going to be blacklisting the same provider from all of our services at 16:00 PST today, I expect that may generate some ticket traffic

2017-09-15 22:52:03 - b1nzy: Pulling the trigger on Hetzner IP blocks

Finally, on Monday of the next week, Hetzner network engineers reached out and opened a dialog on the original email I had sent, allowing us to start working towards a solution and unblock Hetzner from our API servers (but not our RTC infrastructure):

Date: Mon, 18 Sep 2017 10:21:13 +0200

Hello Andrei,

now we can communicate over this ticket.

The first step is that we now don't send any abuse to abuse@REDACTED.com, till we have fixed the main reason.

I have looked at the abuse tickets, and it looks like that your ips are attacked via ddos, and your server answers on this attack. Because the IPs are spoofed, we receive the answer packets and our system detects this as an attack from you.

Kind regards

Tobias Frenz

Hetzner Online GmbH Industriestr. 25 91710 Gunzenhausen / Germany Tel: +49 9831 505-0 Fax: +49 9831 505-3 www.hetzner.de

In the end, we didn't resolve the issue, as Hetzner was unable to admit fault with their automated systems, and we couldn't waste further resources on a small hosting company. I think my final email in the chain does a good job of summarizing our position at the time:

Hey Sebastian,

Your statement doesn't make sense. We're not "reflecting" traffic, we're simply responding to what looks like normal internet traffic. We host a public internet service, like almost everyone else on the web. IP white-listing does not make sense and is not plausible, we have millions of clients connected across hundreds of servers. Furthermore, DNS reflection is an entirely different issue, as that is a valid reflection attack. If your system is receiving malicious traffic (SYN floods, DNS reflection, UDP flood, etc) then I could understand the automated system triggering. It is however triggering on traffic that can be expected on the internet, in various cases.

Thanks, Andrei