It was day 1 of vacation. Teachers still ran Mac Quattros or Windows 3.1x systems. Many people were still computer illiterate, and some even went as far as having their pre-teen children talk to us because of how clueless they were. The engineer rolled out of bed and checked his email. Anyways, my boss assured me that two of our partners were going to contact me to walk me through the process, and that I had nothing to worry about... right. Instead, someone just left the cabinet open and informed their supervisor that the problem was solved. After some conversations, I agreed to fly out to this new site to mount some switches, firewalls, and servers to get the network up and running. There was one cabinet full of industrial control systems and network switching gear in that rarely visited area. "Those don't even really matter." He went ahead with the deploy, hastily jamming them into an already overloaded function, eager to get the change out before the 4-day weekend. "Hrm, that can't be good," he thought. The counts all matched up, and so it was time to get started. I also want to point out that this would be my first time ever mounting this stuff on a rack. That thing had been there since the 1980s and had seen many modifications (usually with more stuff being added in). Time was running out; there were only two weeks before the old crash reporting service would stop. It was mid-day in the office, the sun shining brightly in the city. I was confused. He created the diagram, and from what I can see everything matches up perfectly, so it should be straightforward. "This is easy, right?" he thought, worriedly. The default termination policy stated to kill the servers that had been with the autoscaling group the longest. Guess what coats the wires and cables in the cabinet? If they failed, it took another 40 minutes to try an alternative fix. "Probably not a big deal though."
He debated ignoring it and going on with his weekend, but something compelled him to check. The young engineer trod through the ominous documentation and applied the necessary changes to the codebase. "As noted in the hangout, signups are probably broken right now." "Oh…oh no," he thought as a scream lodged in his throat. The insulation on the wires in the cabinets was melting or flaking off, allowing bare copper wires to come into contact with each other. "Wait a minute," I thought to myself, "Friday, late at night, I was snoring in a sleeping bag in the middle of a national forest, miles away from any cell reception!" A shadow cast across my face. His manager tried to calm him down, encouraging him to fix it. At first, I was very reluctant to go because my significant other was going to undergo surgery during my stay (after this experience I'll never make a sacrifice like this again), but in the end I agreed to go and got to packing. The second issue was that Firebase's new service required the "dead" API key, otherwise known as the secret password assigned to Reddit from Fabric's former service! With panic rising throughout his body and mind, he jumped into the company's outage-tracking Slack channel. With the diagram pulled up in front of me, I get on the call with him and he starts guiding me through the process. He decided to double-check the code. She was further tasked with replacing it with a new crash reporting service called Firebase. But not all was lost! Over the years that my site has existed tucked between two different airports, we have seen manifested many splendored evidences of intelligent life behaving flawlessly brilliantly on … He just didn't know what to do. After ritualistic testing, she marked the changes ready to ship. "Start tracking more fields in the backend events," the Jira ticket read. I was confused. It was around 4 PM at this time, since the devices took quite a while to arrive.
In the build, I saw code running that programmatically committed back to the repo… as my user. There were also 480V cables and breakers inside that cabinet. He just didn't know what to do. That's the flow this touches and the only possible user impact. It was using test configurations, configurations that would only work on a developer's machine. This was too fast to complete a context switch onto another task, yet too slow to watch. With haste, he went to kill all the new servers. A malicious hacker had not impersonated me; our own code had! Although the work week was short, it had felt long and his bleary eyes needed rest. "That's right!" "400k exceptions per day in the bug tracker!" Oh no, no, this was bad. He looks at his old house and tells his family that time has passed in the blink of an eye. He tried again: *another* 500 error. Our quick-witted engineer prepared a plan to remove the new servers with the ill-conceived configuration and so restore the glory of the Front Page of the Internet to the world. Time was running out; there were only two weeks before the old crash reporting service would stop working completely. His event errors, while very troubling, weren't blocking sign-ups. "It can't break anything either, because it's just events." Little did it matter, though. As he frantically scrolled the Slack message history, the clouds began to part: this was unrelated. He looks at the Loop blaring in the distance. It was mid-day in the office, the sun shining brightly in the city. Surprisingly, the Firebase docs were not quite up to date and this was one of its unpredictable kinks. "Hrm, that can't be good," he thought. "Probably not a big deal though." He debated ignoring it and going on with his weekend, but something compelled him to check. He tried on mobile: *more* 500 errors. What made this entire process FAR more frustrating was that each crash test took 40 minutes to complete.
Somehow the 480V cables hadn't lost their insulation yet, which would have been a problem if they had metal-to-metal contact somewhere. With such attention, the meticulous programmer prepared. One subject caught his eye: . Between each test, the engineer would make a small change and wait for the test results to return. During a team meeting on Monday, she checked in on the shiny new crash reporter, Firebase. In a rush and with gusto, the engineer made a change to the termination policy in the Amazon web console that dictated: "Kill the newest instances first," hoping to rid the world of the diseased servers. Isolated in her apartment, she meticulously tested each configuration, forcing crashes on all versions of the app. Any cables in the exposure zone are armored or run in metal tubing. I don't have the best grammar, and this is my first posting. A week later I fly out to the site and start unboxing the devices once I arrive (everything was sent directly to the site so I didn't have to bring anything with me on the plane apart from some tools). "It can't break anything either, because it's just events." I didn't remember merging his pull request, but then again, it was Monday morning and last week felt draped in a haze. 500 error. But somewhere in this same city was a man, sitting in front of a glowing screen in an office dark as night. "This is easy, right?" he thought, worriedly. Me: Thank you so much for the help, Tom! That meant that any and all crashes were undetected and unresolved. Another Monday rolled in, except this time, the engineer jumped with glee as 40k crashes came rolling in. More than a year down the road, the station started behaving erratically and would randomly alarm out. It was time for The Ritual of the Configuration Change Deploy with a set of new Amazon autoscaling group servers, which were forged for this very special occasion of testing the new configuration.
He emailed me a diagram of the cabling and the order in which the devices are supposed to be racked. But somewhere in this same city was a man, sitting in front of a glowing screen in an office dark as night. Unfortunately, the news was much worse: the Firebase update had gone completely wrong and crash reports were no longer being sent. On this particular day the call center was experiencing high call volume, so my department joined the regular queue, and the supervisors fulfilled my initial role to boost bodies taking these calls. Some time ago, it started having problems and one of the technicians used a thermal imaging camera to confirm that it was an overheating problem. After a decent amount of physical labor, I managed to get everything set up according to the diagram. Indeed, at 11:33pm PST on Friday I had merged his pull request. Some of the coolant ends up being aerosolized from the spray, and while it is below OSHA's safety limit, it's still enough to slowly degrade the paint inside the facility over time. In a panic, he reached out to his manager, apologizing profusely for the interruption on a day off.