Lemmy.world was down between 02:00 UTC and 05:45 UTC. This was caused by the database spiking to 100% CPU (all 32 cores/64 threads!) due to inefficient queries being fired at the DB very often.
I’ve collected the logs and we’ll be checking how to prevent this. (And what caused this)
It seems to start at roughly the same time every day, at around 01:20 UTC.
Yup. We might be on to something now
lemmy.world is a test environment for Lemmy developers… 😅 Jokes aside, an issue is an issue.
Well, as pretty much the biggest instance, it provides the best data for load-testing. 🫣
All Lemmy instances are test environments right now. It’s just that lemmy.world is being tested the hardest.
Every Lemmy update:
“We fixed some performance issues by optimising some queries.”
Also: “To balance it out, we added some new even more inefficient queries.”
Next time just god damn upload it directly. Thx
(And what caused this)
Prediction: bad database programming. ;)
Are we extracting enough value out of our volunteer developers and DBAs?!
The beatings will continue until morale improves!
Another DDOS attack?
People should move to smaller instances so that an outage like this doesn’t affect everyone. Use the fediverse the way it’s supposed to be used.
What’s the name of the server you are running?
A large instance today will be a small instance in the future. There are hardly any users on Lemmy compared to other, more established platforms. So if Lemmy is ever to handle a lot more users, stress testing the code makes a lot of sense.
What’s going to happen in the future, do you expect there to be 50,000 servers? That’s unrealistic.
Instances should be divided more into groups of communities, so that in theory they don’t grow indefinitely, only as large as their “group” of communities grows. Ex. An NBA or Sports instance containing /c/NBA /c/NFL /c/NHL and all the related teams. Or something like the programming.dev instance, which is all programming and development. These would still grow, but at a much slower rate than everything in one instance, and they would be much more maintainable.
Of course this is somewhat of a social construct so everyone has to be in agreement with how to handle this and move accordingly, which won’t happen.
In the very long term, federation needs a distributed computing solution that allows users to contribute to hosting.
Ex. An NBA or Sports instance containing /c/NBA /c/NFL /c/NHL and all the related teams.
You’re not taking into account that some people are dumb as fuck. They will sit on one instance and when the instance goes down, they’ll start whining.
They will sit on one instance and when the instance goes down, they’ll start whining
It’s true. Especially so, since it’s my instance, and it being broken means I need to fix it. :'(
I can’t claim to know what the designers intended, but having users spread across a large number of servers is terribly inefficient given how Lemmy works: each server maintains a copy of each community that its users are subscribed to, and changes to those communities need to be communicated to each of those instances.
Given this architecture, it is much more efficient and robust to have users concentrate on what are effectively high-performance caching servers, and communities spread out on smaller, interest-focused instances.
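To put rough numbers on that, here is a minimal sketch of the fan-out argument. It is a toy model, not Lemmy's actual federation code: it assumes one delivery per remote instance that has at least one subscriber, and the function name `remote_copies` and all the `.example` hostnames are made up for illustration.

```python
def remote_copies(subscriber_instances, home_instance):
    """Count the remote instances that hold a copy of one community.

    Toy assumption: every distinct instance with at least one subscriber
    keeps its own copy, so each change to the community has to be
    delivered once per such instance (the home instance excluded).
    """
    return len(set(subscriber_instances) - {home_instance})


# 1,000 subscribers, each on their own small instance (hypothetical hosts)
spread = [f"small-{i}.example" for i in range(1000)]

# The same 1,000 subscribers concentrated on 10 large instances
concentrated = [f"big-{i % 10}.example" for i in range(1000)]

spread_deliveries = remote_copies(spread, "topic.example")
concentrated_deliveries = remote_copies(concentrated, "topic.example")

print(spread_deliveries, concentrated_deliveries)  # 1000 10
```

Under those assumptions, the same community with the same number of subscribers costs 100 times more deliveries per change when everyone sits on their own instance. Real traffic depends on batching, retries and so on, so treat the ratio as an illustration only.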
Bro, I was talking about you. Read my previous comment.