The pre-Christmas period is the most crucial time of the year for e-shops. Every hour of sales can be worth hundreds of thousands of dollars, and a single day of downtime can cost a company millions. That's exactly what happened to a company we'd prefer not to name.
A single day of downtime can cost a company millions
It all started very subtly. The web-based e-commerce system (EShop Platform) relies on data from the ERP system to verify stock availability; this data is updated through an ETL process and stored in the data warehouse (DWH). However, because of a bug in the ETL pipeline, the ERP and the WMS (Warehouse Management System) were no longer being synchronized regularly. The website was therefore working with out-of-date data. It could not verify stock availability, and as a result, orders did not go through.
At first, only individual customers noticed. As time went on, complaints piled up. The customer support call center was flooded with calls from customers who couldn't buy goods, each of them a different product. The operators had no idea what the problem was, so they simply promised to pass the matter on. But after a series of similar calls, it became clear that this was not an isolated problem. A ticket was created in JIRA.
The company was losing hundreds of thousands of dollars every hour
Unfortunately, the L1 support technician didn't flag the ticket as critical and didn't escalate it with the urgency it required. Minutes passed, the phone lines were jammed, and the company was losing hundreds of thousands of dollars every hour.
IT only learned about the problem when the warehouse called. During peak season, the warehouse staff normally start the morning with piles of pending orders from the previous day and night, and they don't get to the current day's orders until shortly before noon. But today there were no orders at all. What was going on? The warehouse manager took matters into his own hands: he went straight to the IT team and demanded answers.
A frantic investigation began. Where was the problem? In the logs? In the notifications? IT received thousands of system messages every day, far too many to read them all. The tension in the room grew. The developers jumped from one system message to the next; one checked the network, another the API calls. Every minute meant more losses. How long had the website been unable to take orders? Hours were passing, and revenue was vanishing.
After an exhausting search, the issue was finally identified: a failure in communication between the inventory system and the website. The website couldn't verify stock levels, so it blocked all purchases. The fix was ultimately simple, but the damage had already been done. The company lost millions, its reputation suffered, and disappointed customers had already moved on to competitors.
This was foreseeable
The meeting that followed was heated. Everyone was looking for someone to blame. The sales director blamed IT. The CEO was angry at everyone. The head of IT defended his team: constantly hotfixing problems was not their choice, but the result of underfunding and of pressure to deliver new features instead of investing in prevention. But one thing was clear: this was foreseeable.
We've left out names, but if you recognize your own company in this story, you probably have the same problem. IT outages are not a matter of if, but when. What makes the difference is how you respond to them. Get in touch: Qeedio will help you prepare before it's too late.