Hey folks!

Unfortunately, roughly 2 hours ago, lemm.ee went offline. This was caused by our load balancer deciding that all of our servers had become unhealthy, even though every health check responded successfully when I queried it directly. I am still not sure what exactly caused the issue, but I will try to investigate further over the weekend.
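
For anyone wanting to run the same kind of comparison on their own setup, here is a minimal sketch (not the script used for lemm.ee) that probes each backend's health endpoint directly and lists the backends that answer fine but that the load balancer reports as down. It assumes plain HTTP health endpoints where any 2xx reply means healthy; the `lb_view` mapping is a hypothetical input you would fill in from your load balancer's own status output.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List


def default_fetch(url: str, timeout: float = 5.0) -> int:
    """Send a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe_backends(
    urls: Iterable[str], fetch: Callable[[str], int] = default_fetch
) -> Dict[str, bool]:
    """Probe each backend's health endpoint directly; True means a 2xx reply."""
    results: Dict[str, bool] = {}
    for url in urls:
        try:
            results[url] = 200 <= fetch(url) < 300
        except Exception:
            # Timeouts, refused connections and HTTPError (4xx/5xx) all count as unhealthy.
            results[url] = False
    return results


def disagreements(direct: Dict[str, bool], lb_view: Dict[str, bool]) -> List[str]:
    """Backends that answer fine when probed directly but the load balancer marks as down."""
    return sorted(b for b, ok in direct.items() if ok and lb_view.get(b) is False)
```

A non-empty result from `disagreements` points at the load balancer's own vantage point (its network path or its probe settings) rather than at the backends themselves. The `fetch` parameter is injectable so the logic can be exercised without real servers.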

For now, we have partially recovered, and I am continuing to work on remaining issues. Hopefully we will be back to 100% very soon. Sorry for the inconvenience!

  • tacosanonymous@lemm.ee · 2 months ago

    Is there another instance where you could report issues?

    If we logged into another account, we’d be able to see the reports before lemm.ee comes back up.

    • sunaurus@lemm.ee (OP, mod) · 2 months ago

      There are two useful sections on https://status.lemm.ee for this. First, at the bottom of the page there is an automated check of federation with all other instances; if everything there is red, that is a definite sign that something is wrong with lemm.ee itself. Second, near the top of the page, I always post a status message manually when I discover and start working on an issue. This part can lag a bit, as it requires manual input from me, but I have updated it every time we have had an issue so far.
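
The "everything red means lemm.ee itself" rule of thumb can be written down as a small function. This is an illustrative sketch, not the status page's actual code: it assumes you already have a pass/fail result per remote instance (the `federation_ok` mapping and the returned labels are hypothetical). If every remote check fails, the common factor is the local instance; if only some fail, those remote instances are the more likely culprits.

```python
from typing import Mapping


def diagnose(federation_ok: Mapping[str, bool]) -> str:
    """Classify federation check results using the all-red heuristic."""
    if not federation_ok:
        return "no data"
    failing = sorted(name for name, ok in federation_ok.items() if not ok)
    if len(failing) == len(federation_ok):
        # Every remote check is red: the shared factor is our own instance.
        return "local problem"
    if failing:
        # Only some checks are red: the problem is more likely on those instances.
        return "remote problem: " + ", ".join(failing)
    return "all ok"
```

The heuristic gets stronger the more remote instances are checked; with only one or two targets, an all-red result is much less conclusive.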

  • fossphi@lemm.ee · 2 months ago

    Thanks for the quick fix! What did you have to do to get the load balancer working again?

    • sunaurus@lemm.ee (OP, mod) · 2 months ago

      For now, I just redeployed all of our servers completely, but as I don’t know the actual root cause of the issue yet, I’m still investigating to figure out if anything more is needed.

  • JimmyBigSausage@lemm.ee · 2 months ago

    Thank goodness! Hopefully discovering these vulnerabilities and protecting against them will help keep Lemmy alive when the big dogs come in to sweep us away! (Worst fears)

    • sunaurus@lemm.ee (OP, mod) · 2 months ago

      Sorry for the delay in updating the status page - I actually had gone out for lunch just a few minutes before the downtime started, so I didn’t even realize anything was up until I was back at my computer about 45 minutes later 💀

      • ToxicWaste@lemm.ee · 2 months ago

        no need to apologise. still a better response time than some of the professionals I work with ;-)

  • ramble81@lemm.ee · 2 months ago

    Nginx? I had an nginx LB shit itself yesterday. Luckily it auto-recovered and I had HA, but it's just weird that it happened.

  • edric@lemm.ee · 2 months ago

    I thought the entire Lemmy network was down because status.lemm.ee was saying our instance was fine but federation wasn’t working with any other instance. lol

  • db0@lemmy.dbzer0.com · 2 months ago

    Typically when this happens, the issue is on the LB itself. Maybe its own network had issues?

  • Clot@lemm.ee · 2 months ago

    Sometimes, downtimes are awesome. Get off your machine and spend time with your family, folks!