On May 11, 2021, Salesforce instances around the world began denying users access to their systems. The shutdown was rapid and came with no early warning signs. Users who were already logged in could continue to use services for a time, but new logins were rejected, and the splash page told users that their instance was down for maintenance.
If you have been a Salesforce Admin for any length of time, you know that your immediate next step is to go to the company’s Trust site (status.salesforce.com), which is built and maintained by the Trust team at Salesforce. Trust is a core value that the company places at the forefront of everything it does.
Upon visiting the site during the outage, you were greeted with an equally ominous page that gave little indication of what was happening. This was very unusual, even for a system-wide outage.
The outage came during the same week that Colonial Pipeline was dealing with a ransomware attack, which led many to wonder immediately whether this was a similar incident.
As a Salesforce consulting partner, we saw our support lines light up across the globe. In the moment, we were as much in the dark as our clients about what was happening. The immediate questions ranged from the obvious “What is happening?” to “When will service return?” and, most importantly, “Is my data safe?” The first two questions were difficult to answer because we had no direction from Salesforce (yet). The third was relatively easy: our Support clients participate in a regular backup service, so we could tell them the exact date and time of their last offsite backup. (Lesson learned: we are modifying our recommendations to include near-real-time backups.)
Teamwork
As the afternoon progressed, Salesforce told us that the culprit was a faulty update to internal DNS tables. Translated from the geek-speak, this was very comforting news, because it meant that none of our clients’ data had been compromised.
- The internal tables that route your HTTPS requests had lost the ability to resolve to Salesforce’s internal IP addresses. No aberrant requests were received to view your Salesforce data.
- In fact, the security platform (the layer that authenticates who may see the data) remained fully active.
- We watched as the Salesforce Rapid Response team jumped into action to identify, isolate, and remediate the problem. The team is a thing of beauty to watch, the very best our industry has to offer. Thankfully, we don’t see them in action very often.
The team began manually repairing the tables across the entire Salesforce ecosystem and completed the arduous task within hours. Trust was restored.
How Salesforce Partners Stepped Up Their Game
Most registered Salesforce partners have a game plan for these types of events. At Thanawalla Digital, we initiated our client communication protocol, assuring clients that their data was backed up as of the date and time their SLA dictated.
- We also translated what Salesforce’s ‘internal DNS’ message really meant. Our real work began when the orgs started coming back online.
- We ran data-compare routines that confirmed that all expected data remained intact.
- Next, we ran metadata comparisons against our configuration backups and ran all Apex unit tests to confirm they were 100% functional. These checks were completed quickly in all production systems, and we reported the results to our clients by 10:00 PM Central.
- Once the production checks were complete, our Global Delivery Center took on the task of testing Full sandboxes as well.
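For readers curious what a data-compare routine can look like, here is a minimal Python sketch; it is an illustration, not our production tooling. It assumes you have a pre-outage backup and a post-recovery export of the same object, both as CSV text keyed by the Salesforce record `Id`. The function names `fingerprint_rows` and `compare_exports` and the sample records are hypothetical. Each record is hashed, and the routine reports records that are missing, changed, or new relative to the backup.

```python
import csv
import hashlib
import io


def fingerprint_rows(csv_text, key_field="Id"):
    """Map each record's key (e.g. a record Id) to a SHA-256 hash of its fields."""
    prints = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sort by column name so a different column order in the export
        # does not change the fingerprint.
        payload = "|".join(f"{name}={row[name]}" for name in sorted(row))
        prints[row[key_field]] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return prints


def compare_exports(backup_csv, live_csv, key_field="Id"):
    """Report records missing, changed, or added in the live export vs. the backup."""
    backup = fingerprint_rows(backup_csv, key_field)
    live = fingerprint_rows(live_csv, key_field)
    return {
        "missing": sorted(backup.keys() - live.keys()),
        "changed": sorted(k for k in backup.keys() & live.keys() if backup[k] != live[k]),
        "added": sorted(live.keys() - backup.keys()),
    }


# Hypothetical sample data: one record edited and one created after the backup.
backup_csv = "Id,Name\n001A,Acme\n001B,Globex\n"
live_csv = "Id,Name\n001A,Acme\n001B,Globex Corp\n001C,Initech\n"
print(compare_exports(backup_csv, live_csv))
# {'missing': [], 'changed': ['001B'], 'added': ['001C']}
```

An empty `missing` list is the signal that matters most after an outage. Records edited or created after the backup timestamp will legitimately show up as changed or added, so in practice those would be reconciled against each record's last-modified time rather than treated as errors.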
With the help of the Salesforce Rapid Response team, we were able to deliver clear, robust messaging to our clients during the outage and to independently verify that data integrity had been maintained.
At its core, Salesforce holds Trust as its number one goal, and it demonstrated that preparation is key to that commitment. Partners across the world were able to rely on Salesforce’s remediation strategies to ensure that trust was not compromised during the outage.
Reach out to us directly to better understand how we harden our clients’ Enterprise orgs against catastrophic data events.