• vortexsurfer@lemmy.world
    4 months ago

    Had a similar thing at work not long ago.

    A newly deployed version of a component in our system was only partially working, and the failures seemed to be random. It’s a distributed system, so the error could be in many places. After reading the logs for a while I realized that only some of the messages sent via a message queue were reaching this component, which made no sense: the old version (on a different server) had been stopped, and I had verified that myself days earlier.

    Turns out that the server with the old version had been rebooted in the meantime, so the old component had started running again and was listening on the same message queue! So it was fairly random which of the two actually received each message in the queue 😂

    Problem solved by stopping the old container again and removing it completely so it wouldn’t start again at the next boot.
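
For anyone who wants to see the effect in isolation, here is a minimal sketch of two consumers reading from one shared queue. It uses Python's standard `queue` and `threading` modules as a stand-in for a real message broker, and the names `consume`, `run_demo`, `"new"` and `"old"` are made up for illustration; this is not the actual system from the story.

```python
import queue
import threading


def consume(name, q, received):
    """Pull messages off the shared queue until a None sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:
            # Each consumer takes exactly one sentinel and then stops.
            break
        received[name].append(msg)


def run_demo(num_messages=20):
    """Start a hypothetical 'new' and 'old' consumer on the same queue."""
    q = queue.Queue()
    received = {"new": [], "old": []}

    workers = [
        threading.Thread(target=consume, args=(name, q, received))
        for name in received
    ]
    for w in workers:
        w.start()

    # Publish the messages, then one stop sentinel per consumer.
    for i in range(num_messages):
        q.put(i)
    for _ in workers:
        q.put(None)

    for w in workers:
        w.join()
    return received


if __name__ == "__main__":
    result = run_demo()
    print("new component got:", result["new"])
    print("old component got:", result["old"])
```

Every message is delivered exactly once, but which consumer gets it depends on thread scheduling, so the split usually changes from run to run. From the point of view of the "new" component alone, that looks like messages going missing at random.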