Cyberattacks on business systems, including hybrid identity systems, continue to make headlines, with recent breaches targeting healthcare company Henry Schein and hospitality conglomerate MGM Resorts. Apart from these well-publicized attacks, the Semperis Breach Preparedness & Response Services team has seen a spike in requests from our customers (and our partners’ customers) to help them recover from identity-related attacks.
During these incident response (IR) engagements, I’ve observed stark differences in recovery time and in its impact on the organizations involved, which got me thinking about how resilient these companies are. What is the difference between a company that bounces back relatively quickly after an identity-related attack and one whose recovery drags on for days or weeks, incurring huge costs? From my firsthand experience, I’ve concluded that the biggest difference is the organization’s ability to orchestrate, automate, and test the recovery process.
Backups are only the beginning
Having backups is of course an essential starting point for business recovery. Although this shouldn’t be the case today, some companies still struggle with offline/offsite backup policies and procedures. In the case of a cyberattack that takes down the Active Directory environment, we’ve often seen that the first encryption target is the backup and recovery server on the company network. Once the attackers successfully encrypt the backup system, they move on to encrypt the rest of the organization.
Recommendation: Make sure you have offline/offsite backups that cannot be accessed by using the same credentials as the rest of your production network.
The recovery process can be a stumbling block
A convoluted recovery procedure also can delay the return to normal business operations. The best approach for recovery is “practice makes progress.” In many IR cases we’re pulled into, the biggest portion of the recovery timeline is spent on getting the right people to approve the recovery process. But other aspects of the recovery that seem obvious—such as maintaining an offline list of key contacts and organizing shift schedules for expert responders—also are overlooked, which adds complication to an inherently chaotic situation.
Gartner analysts Wayne Hankins and Craig Porter recently pointed out the importance of speed in recovering from a ransomware attack: “CISOs responsible for preparing for ransomware attacks must build resilience by developing a containment strategy they can execute during a ransomware attack. Failure to do so will increase the risk of uncoordinated and ineffective response, prolonging the recovery time.”
Although Gartner doesn’t provide specific instructions on how to build the recovery playbook, I can attest that the more details an organization captures, the faster the recovery will go. Uncovering those details requires practicing every step of the recovery journey.
Recommendation: Make sure you have a well-documented IR procedure that provides details about all aspects of the recovery process—and verify that this information can be accessed even if the network is down. Basic information (we’ll publish a more comprehensive list in the future and link to it here) includes:
- Who needs to approve various steps in the process—for example, who can authorize initiating the Active Directory recovery procedure?
- Who are your key vendors (this list might be different for an IR case as compared to vendors for normal IT operations), and where is their contact information?
- What SLAs are in place with your vendors?
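As a minimal sketch of how the questions above could be captured and checked during a practice run, the following Python models a runbook record and reports what is still missing. The class names, field names, and the `find_gaps` helper are all hypothetical, chosen for illustration only; they are not part of any Semperis product or published template.

```python
from dataclasses import dataclass, field


@dataclass
class VendorContact:
    """One IR vendor entry (hypothetical structure)."""
    name: str
    phone: str
    sla_summary: str = ""  # e.g. agreed response time; empty if not yet recorded


@dataclass
class RecoveryRunbook:
    """Offline-accessible record of approvers and vendors (hypothetical structure)."""
    # Maps a recovery step to the person authorized to approve it.
    approvers: dict[str, str] = field(default_factory=dict)
    vendors: list[VendorContact] = field(default_factory=list)


def find_gaps(runbook: RecoveryRunbook, required_steps: list[str]) -> list[str]:
    """Return a human-readable list of details the runbook is still missing."""
    gaps = []
    # Every required recovery step needs a named approver.
    for step in required_steps:
        if not runbook.approvers.get(step):
            gaps.append(f"no approver recorded for step: {step}")
    # At least one IR vendor must be listed, each with a phone number and an SLA.
    if not runbook.vendors:
        gaps.append("no IR vendors recorded")
    for vendor in runbook.vendors:
        if not vendor.phone:
            gaps.append(f"no phone number for vendor: {vendor.name}")
        if not vendor.sla_summary:
            gaps.append(f"no SLA recorded for vendor: {vendor.name}")
    return gaps


if __name__ == "__main__":
    # Placeholder data; replace with your organization's real entries.
    runbook = RecoveryRunbook(
        approvers={"initiate AD recovery": "CISO"},
        vendors=[VendorContact(name="Example IR Vendor", phone="555-0100")],
    )
    for gap in find_gaps(runbook, ["initiate AD recovery", "restore from offline backup"]):
        print(gap)
```

A check like this is only useful if its output, and the runbook itself, are also kept somewhere that remains reachable when the network is down, such as a printed copy.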
Time is the critical factor in recovery success
After we’ve retrieved the backups and the recovery procedure has been approved, the next challenge is time, the one factor that is always working against you. I would argue that given enough time and access to valid backups, any environment can be recovered. But a lengthy recovery can cause irreparable harm, because the cost of an attack rises directly with operational downtime. While it’s not common for recovery efforts to fail completely, the costs associated with excessive downtime can be devastating.
Accurately assessing the all-in costs of downtime is notoriously difficult, and the final tally will vary based on the size of the organization and the industry. Gartner’s 2014 estimate that IT downtime costs organizations, on average, $5,600 a minute is still used as a benchmark. Gartner recently reported that costs for recovering from ransomware attacks in particular rose 20% in 2023 compared to 2022.
But again, downtime costs can vary wildly: According to Forbes, the average automotive manufacturer loses $22,000 per minute when the production line stops. The relevant question is not what the industry average downtime cost is. The important consideration is how much downtime will cost your organization—not only in hard costs but also in reputational, legal, and regulatory damages.
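To make the arithmetic concrete, here is a rough sketch that multiplies a per-minute rate by an outage duration. The two rates are the figures quoted above; the outage durations and the function names are hypothetical, picked only to show the scale. The result covers hard costs only and says nothing about reputational, legal, or regulatory damages, so treat it as a starting point for your own estimate rather than a prediction.

```python
# Per-minute downtime rates quoted in the article (USD).
GARTNER_2014_AVERAGE_PER_MINUTE = 5_600
FORBES_AUTOMOTIVE_PER_MINUTE = 22_000

# Hypothetical outage durations in minutes, chosen only to illustrate the scale.
SCENARIOS_MINUTES = {
    "30 minutes": 30,
    "8 hours": 8 * 60,
    "3 days": 3 * 24 * 60,
}


def downtime_cost(minutes: float, rate_per_minute: float) -> float:
    """Return the hard cost of an outage: duration times per-minute rate."""
    if minutes < 0 or rate_per_minute < 0:
        raise ValueError("minutes and rate_per_minute must be non-negative")
    return minutes * rate_per_minute


def build_cost_table(rate_per_minute: float) -> dict[str, float]:
    """Return the estimated cost of each hypothetical scenario at the given rate."""
    return {
        label: downtime_cost(minutes, rate_per_minute)
        for label, minutes in SCENARIOS_MINUTES.items()
    }


if __name__ == "__main__":
    # Substitute your organization's own per-minute rate here.
    for label, cost in build_cost_table(GARTNER_2014_AVERAGE_PER_MINUTE).items():
        print(f"{label:>10}: ${cost:,.0f}")
```

At the $5,600 average, the three scenarios come to $168,000, $2,688,000, and $24,192,000 respectively, which shows how quickly the total grows once recovery stretches from minutes into days.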
Recommendation: Make sure you orchestrate and automate as much of the recovery process as possible. For complex systems, orchestrated recovery can mean the difference between a recovery measured in days or weeks and one measured in minutes or hours. For instance, it took Maersk nine days just to recover their Active Directory following the NotPetya attack. In the same environment, but with an orchestrated solution, the time to recover can be reduced to 30 minutes.
Post-breach security prevents follow-on attacks
Once the recovery has been completed, ensuring the environment is secured and can be trusted is the last major effort in the recovery process. The last thing that you want is to expose the recovered environment too soon, which could open the door for attackers to immediately return and take it down again. You need to identify and eradicate lingering malware before returning systems to production.
Recommendation: Make sure your IR process includes a well-defined procedure for securing the environment post-recovery. The procedure should include scanning all systems for indicators of exposure (IOEs), indicators of compromise (IOCs), and potential indicators of attack (IOAs).
Focus on reducing downtime to improve cyber resiliency
Preparing for the worst-case scenario is the first step in ensuring your organization can survive a cyber disaster. The number and severity of attacks on organizations of every size, in every vertical market and geographic location, rise with each passing month.
Based on my experience in helping organizations recover from devastating attacks, I urge every business leader to prioritize developing a fully tested, cyber-first disaster recovery plan. Although you can’t prevent every cyberattack, you can dramatically reduce the time to recover. That reduced downtime could be the factor that determines whether your business survives.