Replacing DiscordSRV with ELK and a py-cord bot

Let's be real: DiscordSRV is one of the most popular Java Minecraft server plugins (top 25 according to bStats). It builds a great bridge between Discord and Minecraft, and not only for chat: it even lets you manage your Minecraft server from a Discord channel! (or rather 😱)

If you are running a small server, those are all great features, but for our use case, and especially since we switched to Folia, we would like to:

  1. Keep our attack surface small: with other plugins having caused wide-ranging hostile server takeovers, how often does DiscordSRV actually get audited?
  2. Improve performance: at nearly 10 MB, DiscordSRV was not only the largest plugin on our server, it also needed quite a lot of resources to keep the chat bridge running well. When Discord had API failures, it was not unusual for our otherwise healthy server to crash.

So how were we able to get rid of it while still maintaining a functional chat bridge?

With the help of ELK

Let's break it down a bit. First, we (mostly) run two servers: one for the actual Minecraft Java server software and another for all other tooling and monitoring. This separation keeps the Minecraft server itself minimal, clean and secure, and lowers the risk of other software hogging CPU or other resources we would rather dedicate to Minecraft.

On the Minecraft server, Filebeat tails latest.log and ships the lines right away to Redis as a "list". Because the list holds entries until they are consumed, Logstash can fail without us losing any logs.
From there, Logstash picks up the logs and transforms them: we tag logs, parse timestamps, categorize them and much more.
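Logstash does this with its own filters (grok, date and friends). Purely as an illustration of the idea, here is a Python sketch of parsing, timestamping and tagging one line. It assumes the `[HH:MM:SS] [thread/LEVEL]: message` layout visible in the /tps excerpt further down and that chat lines look like `<name> message`; the field names (`@timestamp`, `thread`, `level`, `tags`, `player`, `chat_text`) are hypothetical choices, not our actual Logstash schema.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# Assumed latest.log layout (as in the /tps excerpt): [HH:MM:SS] [thread/LEVEL]: message
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] \[(?P<thread>[^\]]+)/(?P<level>[A-Z]+)\]: (?P<message>.*)$"
)
# Assumption: chat lines look like "<name> text" (3-16 character Minecraft names).
CHAT_RE = re.compile(r"^<(?P<player>[A-Za-z0-9_]{3,16})> (?P<text>.*)$")


def parse_line(line: str, log_date: date) -> dict | None:
    """Turn one log line into a structured, tagged event (None if it doesn't match)."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # e.g. indented continuation lines of a multi-line report
    # latest.log only carries the time of day, so the date has to come from elsewhere.
    clock = datetime.strptime(m["time"], "%H:%M:%S").time()
    event = {
        "@timestamp": datetime.combine(log_date, clock).isoformat(),
        "thread": m["thread"],
        "level": m["level"],
        "message": m["message"],
        "tags": [],
    }
    chat = CHAT_RE.match(m["message"])
    if chat:
        event["tags"].append("chat")
        event["player"] = chat["player"]
        event["chat_text"] = chat["text"]
    return event
```

Plugins that reformat chat will need their own pattern; the point is only that each line becomes a document with typed fields and tags that later stages can route on.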

One of the more useful features is "log enrichment": any time a plugin outputs a Minecraft username, we extend the log with an additional "uuid" field. That allows us to filter logs by UUID, even though the original log line never contained that information.
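One way to sketch that lookup, assuming a name-to-UUID map built from the server's usercache.json (a JSON list of entries with `name`, `uuid` and `expiresOn` keys) and an event whose username was already extracted into a hypothetical `player` field. This is a Python illustration rather than our actual Logstash filter; note that usercache.json only knows players who have joined, and names can change over time.

```python
from __future__ import annotations

import json


def load_usercache(raw_json: str) -> dict[str, str]:
    """Build a case-insensitive name -> UUID map from usercache.json-style data."""
    return {entry["name"].lower(): entry["uuid"] for entry in json.loads(raw_json)}


def enrich(event: dict, name_to_uuid: dict[str, str], field: str = "player") -> dict:
    """Add a 'uuid' field when the event carries a known player name in `field`."""
    name = event.get(field)
    if name:
        uuid = name_to_uuid.get(name.lower())
        if uuid is not None:
            event["uuid"] = uuid
    return event
```

Events for unknown names are passed through unchanged, so a stale cache degrades to "no uuid field" rather than a wrong one.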

At the very end, we push all logs to Elasticsearch for later consumption in Kibana and Grafana, and additionally publish logs tagged "chat" to Redis via Pub/Sub.
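The two Redis outputs behave very differently, which is why the pipeline uses both. A list keeps entries until a consumer pops them (so Filebeat can keep pushing while Logstash is down), whereas Pub/Sub delivers a message only to clients subscribed at that moment and then forgets it (fine for live chat). The classes below are hypothetical in-memory stand-ins that model just those semantics; they are not the redis-py API.

```python
from __future__ import annotations

import json
from collections import deque
from typing import Callable


class ListQueue:
    """Models a Redis list used as a queue: entries wait until popped."""

    def __init__(self) -> None:
        self._items: deque[str] = deque()

    def rpush(self, item: str) -> int:
        self._items.append(item)
        return len(self._items)  # RPUSH also returns the new list length

    def lpop(self) -> str | None:
        return self._items.popleft() if self._items else None


class PubSubChannel:
    """Models a Redis Pub/Sub channel: only current subscribers get a message."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> int:
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)  # PUBLISH also returns the receiver count


def route(event: dict, archive: list[dict], chat: PubSubChannel) -> None:
    """Archive every event (Elasticsearch in the real setup); also publish chat."""
    archive.append(event)
    if "chat" in event.get("tags", []):
        chat.publish(json.dumps(event))
```

The trade-off: if the Discord bot is offline, chat published in the meantime is simply not relayed, while the archived copy in Elasticsearch still exists.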

A small py-cord-based Discord bot subscribes to those Pub/Sub messages and posts them into our Discord channel. In the other direction, using RCON and tellraw, the bot pushes messages from the Discord channel into the Minecraft server, giving us bidirectional communication.
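For the Discord-to-Minecraft direction, here is a stdlib-only sketch of the two pieces involved: building a `tellraw` command whose JSON text component comes from `json.dumps` (so quotes in Discord messages cannot break the command), and speaking the RCON wire format Minecraft uses (little-endian int32 length, request id and type, then the body and two NUL bytes; type 3 logs in, type 2 runs a command). The `[Discord]` prefix, colors and all helper names are hypothetical, not our bot's actual code.

```python
import json
import socket
import struct

SERVERDATA_AUTH = 3          # login request
SERVERDATA_EXECCOMMAND = 2   # command request


def build_tellraw(author: str, content: str) -> str:
    """Build a tellraw command that shows a Discord message to every player."""
    text = content.replace("\n", " ")  # keep multi-line Discord messages on one chat line
    component = [
        "",
        {"text": "[Discord] ", "color": "blue"},
        {"text": f"<{author}> ", "color": "white"},
        {"text": text, "color": "white"},
    ]
    # json.dumps escapes quotes and backslashes, and ensure_ascii (the default)
    # turns non-ASCII characters into \uXXXX escapes, so the command is pure ASCII.
    return "tellraw @a " + json.dumps(component)


def encode_rcon_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Length, request id and type as little-endian int32, then body + two NUL bytes."""
    payload = body.encode("ascii") + b"\x00\x00"  # this sketch requires ASCII bodies/passwords
    return struct.pack("<iii", 8 + len(payload), request_id, packet_type) + payload


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("RCON connection closed")
        buf += chunk
    return buf


def _read_packet(sock: socket.socket) -> tuple:
    (length,) = struct.unpack("<i", _recv_exact(sock, 4))
    data = _recv_exact(sock, length)
    request_id, packet_type = struct.unpack("<ii", data[:8])
    return request_id, packet_type, data[8:-2].decode("utf-8", "replace")


def rcon_command(host: str, port: int, password: str, command: str) -> str:
    """Log in, run one command and return the server's reply (blocking)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(encode_rcon_packet(1, SERVERDATA_AUTH, password))
        if _read_packet(sock)[0] == -1:  # the server answers request id -1 on a bad password
            raise PermissionError("RCON authentication failed")
        sock.sendall(encode_rcon_packet(2, SERVERDATA_EXECCOMMAND, command))
        return _read_packet(sock)[2]
```

RCON has to be enabled in server.properties (`enable-rcon`, `rcon.port`, which defaults to 25575, and `rcon.password`). In a py-cord bot, `rcon_command(...)` would be called from an `on_message` handler; since it uses blocking sockets, wrapping it in `asyncio.to_thread` keeps the bot's event loop responsive.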

The result is seamless compared to DiscordSRV.

So why bother with ELK in the first place?

Running a Minecraft server with staff means giving them access to server logs, as logs help greatly in understanding player movement and actions as well as server health. With ELK I can provide that access through a functional web interface without giving anyone direct access to the server.

Going a step further, the real value lies in enriching and "documentizing" the logs, i.e. turning raw lines into structured documents. One common workflow for a server owner is identifying lag. Using Folia's built-in /tps command, I can easily find the most heavily utilized regions:

[18:30:53] [Region Scheduler Thread #0/INFO]: Server Health Report
 - Online Players: 75
 - Total regions: 54
 - Utilisation: 347.8% / 600.0%
 - Load rate: 226.20, Gen rate: 52.27
 - Lowest Region TPS: 11.66
 - Median Region TPS: 20.00
 - Highest Region TPS: 20.00
Highest 3 utilisation regions
 - Region around block [w:'world',coords-x,coords-y,coords-z]:
    98.9% util at 84.53 MSPT at 11.66 TPS
    Chunks: 634 Players: 1 Entities: 9,457
 - Region around block [w:'world',]:
    27.4% util at 13.73 MSPT at 20.00 TPS
    Chunks: 3,763 Players: 22 Entities: 3,513
 - Region around block [w:'world',]:
    18.1% util at 9.06 MSPT at 20.00 TPS
    Chunks: 776 Players: 3 Entities: 1,514
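To make a report like this queryable, the indented region lines can be parsed into numeric fields. The sketch below is a Python illustration, not our Logstash configuration; it assumes the report's lines have already been grouped into one event (both Filebeat and Logstash offer multiline handling for that), and the output field names are hypothetical. Redacted coordinates, as in the excerpt above, simply yield `coords = None`.

```python
from __future__ import annotations

import re

REGION_RE = re.compile(r"Region around block \[w:'(?P<world>[^']*)',(?P<coords>[^\]]*)\]:")
UTIL_RE = re.compile(r"(?P<util>[\d.]+)% util at (?P<mspt>[\d.]+) MSPT at (?P<tps>[\d.]+) TPS")
COUNTS_RE = re.compile(
    r"Chunks: (?P<chunks>[\d,]+) Players: (?P<players>[\d,]+) Entities: (?P<entities>[\d,]+)"
)


def _int(value: str) -> int:
    return int(value.replace(",", ""))  # "9,457" -> 9457


def _coords(raw: str) -> tuple | None:
    parts = raw.split(",")
    try:
        return tuple(int(p) for p in parts) if len(parts) == 3 else None
    except ValueError:
        return None  # placeholder or redacted coordinates


def parse_regions(report: str) -> list[dict]:
    """Extract one dict per 'Region around block' entry of a Folia /tps report."""
    regions: list[dict] = []
    for line in report.splitlines():
        if m := REGION_RE.search(line):
            regions.append({"world": m["world"], "coords": _coords(m["coords"])})
        elif regions and (m := UTIL_RE.search(line)):
            regions[-1].update(util_pct=float(m["util"]), mspt=float(m["mspt"]), tps=float(m["tps"]))
        elif regions and (m := COUNTS_RE.search(line)):
            regions[-1].update(
                chunks=_int(m["chunks"]), players=_int(m["players"]), entities=_int(m["entities"])
            )
    return regions
```

With the fields indexed, `max(parse_regions(report), key=lambda r: r["mspt"])` picks the busiest region, and the same fields can be charted over time in Kibana or Grafana.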

But since these are regions, it is not easy to find the chunk causing the lag right away. With ELK, it becomes very easy to list the players who have logged in within that area in the past, using simple >, < or = operators.

With this I was able to teleport to a player right away and open up a conversation to figure out what was causing the lag.
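The area lookup described above could be sketched like this, assuming login events were parsed into hypothetical `tags`, `world`, `x`, `z` and `player` fields (vanilla and Paper login lines include the player's coordinates). The helper builds a Kibana Query Language (KQL) string for a square around the block reported by /tps, plus an equivalent in-memory filter. The radius is an arbitrary parameter; a region's real footprint depends on its chunks, so the square is only an approximation.

```python
from __future__ import annotations


def area_kql(world: str, x: int, z: int, radius: int) -> str:
    """Build a KQL query for login events inside a square around block (x, z)."""
    return (
        f'tags:"login" and world:"{world}" '
        f"and x >= {x - radius} and x <= {x + radius} "
        f"and z >= {z - radius} and z <= {z + radius}"
    )


def logins_in_area(events: list[dict], world: str, x: int, z: int, radius: int) -> list[str]:
    """Same filter in memory: distinct players who logged in inside the square."""
    players: list[str] = []
    for e in events:
        inside = (
            "login" in e.get("tags", [])
            and e.get("world") == world
            and abs(e["x"] - x) <= radius
            and abs(e["z"] - z) <= radius
        )
        if inside and e["player"] not in players:
            players.append(e["player"])
    return players
```

For example, `area_kql("world", 100, -200, 64)` yields `tags:"login" and world:"world" and x >= 36 and x <= 164 and z >= -264 and z <= -136`, which can be pasted into Kibana's search bar.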

There is much more we do with logs today, and even more that could still be done. I hope this was an interesting read and inspires you to do more with your server beyond what plugins can offer :)

Shameless plug: if you are looking for a Semi-Anarchy Survival Server with NO hacking allowed and no /home, /tpa or similar, join us on Simply Vanilla!