Bug Found in VSTS
Last week on Friday, I spotted an issue with VSTS and raised with Microsoft. It was a minor bug, but bad enough to break all the nuget restore tasks of the CI pipeline.
Anyway, we all know that there is no bug free software in the world. Most importantly it is the way how the company or team respond/react to it and time to fix it.
Bug Fixed within 5 hours
In this incident, from the time I confirmed it was a bug from Microsoft side, and submitted the support ticket to the time it was fixed. All up it was 5 hours.
Thinking about this incident, a support ticket created by some random guy (i.e. me) in New Zealand with a random hotmail account. The ticket was received by a technical support person in Shanghai China. Then we had a couple of email exchanges for further questions/answers. Then, a message was put up on Microsoft’s official VSTS Blog showing the team is currently investigating the issue. A couple of hours later, it was confirmed as a bug in VSTS and the DevOps team (Microsoft’s language, not sure if that correct term to use) started action on the bug, also provided an interim solution. After another two hours, the bug was fixed, an hotfix was deployed to Production to an application used globally without any downtime.
I hope you as a reader, right now please take a minute or two thinking about the companies or applications you ever worked on. If the scenario happened to your production system, what is your reaction, your team’s reaction. How long will it take to resolve the issue? I am not talking about a small company with a website only has 50 visits a day. I am talking about global scale application.
DevOps is about the culture
This is a real example. It is another good reason for you moving towards to DevOps culture. It is not about whatif you have CI/CD, it is about the how. Traditionally, if same thing like this happens, the bug might be fixed very quickly by the Development team. Then the team will produce an artifact, either manually or automatically. Hand the artifact over to the Operations team to deploy it. Operations team then comes back telling the Development team, acknowledge that the application was deployed successfully to either UAT or Staging or PreProd, please find a tester to test it. Once the bug is signed off by a tester, another form has to be filled out to go through CAB before going into Production. This whole process is already very challenging by itself, it becomes even more challenging when tracking down people’s availability.