Last month I was able to catch up with my long-time friend Guido Grillenmeier, who is currently Chief Technologist at DXC Technology. In 2007-2008, Guido and I worked together, developing and delivering the “Active Directory Masters of Disaster” disaster recovery workshops at the Directory Experts Conference. It was, at the time, the only place IT Pros could get hands-on experience recovering an entire Active Directory forest from backup. Guido and I talked about how modern ransomware has drastically increased the risk of a total Active Directory meltdown. Yet, the actual process of recovering Active Directory hasn’t changed at all in 15 years.
At that time, discussions about recovering Active Directory (AD) focused on the possibility of outages caused by localized events. The most common culprits in these discussions were fires, hurricanes, and electrical problems in the data center. At worst, it was a failed schema update that corrupted domain controllers, something that seldom happened.
Today, however, the threat landscape has changed. The prevalence of ransomware means that companies must deal with the real possibility of a threat actor crippling their entire IT environment, AD included. The idea of having to recover Active Directory from scratch is no longer theoretical. It now must be a critical part of incident response planning.
Do—Develop a Detailed AD Recovery Plan
When it comes to recovery planning, today’s organizations need to be prepared for disruptions caused by a multitude of scenarios: accidental deletion of AD objects, broken domain controllers, accidental change of AD attributes, and the recovery of a partition or an entire forest.
Recovering from the first few of these scenarios is relatively easy. For example, enabling the Active Directory Recycle Bin feature offers a practical way to recover from accidental object deletion. Accidental changes to attributes can be undone by identifying the backups that contain the original values and using them to restore the attributes. Third-party tools can also allow users to roll back changes.
The most challenging scenarios involve the recovery of a partition or forest. Dealing with this situation requires detailed planning that extends to knowing what individual commands will be run on what machine. Because of the complexity of the entire process, this level of planning detail is necessary. In the aftermath of an attack, tension will be high, and the less thinking on the fly that needs to be done, the better.
Implementing recovery plans can be broken down into five phases:
- Pre-recovery phase
- Initial forest
- Clean up and verify
- Scale-out
- Wrap up
The pre-recovery phase is when you assemble the recovery team, implement the corporate communication plan, begin collaborating with the other teams you need to work with, and verify the backups you will use to recover. In particular, you need to ensure that the backups themselves are not infected. The pre-recovery phase ends with shutting down all the domain controllers in the environment.
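One part of verifying backups can be automated: checking that the backup files have not changed since they were taken. The sketch below is only an illustration, assuming the team recorded a SHA-256 hash for each backup file at backup time and stored that manifest somewhere the attacker could not reach. The function names and the manifest format are hypothetical. A matching hash shows the file is unmodified; it does not prove the backup was clean when it was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(manifest: dict[str, str], backup_dir: Path) -> dict[str, str]:
    """Compare each file against its recorded hash.

    `manifest` maps a file name to the hex SHA-256 recorded at backup time
    (a hypothetical format chosen for this sketch).
    Returns a status per file: 'ok', 'missing', or 'mismatch'.
    """
    results = {}
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Any file reported as missing or mismatched should be treated as unusable until someone has investigated it.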
The next phase of the recovery is to build the initial forest from backup. This involves:
- Restoring the first domain controller (DC) in the domain from backup.
- Removing the Global Catalog partition
- Adjusting the RID pool
- Seizing the FSMO roles
- Cleaning up the metadata (references to domain controllers that no longer exist)
- Resetting credentials and trust information
- Repeating these steps for the other domains in the forest
This phase ends when you recreate the global catalog partition on the appropriate DCs and replication, authentication, and Group Policy are working properly across the entire forest.
Once there are enough DCs, the other recovery teams can start bringing applications back online.
For an AD recovery plan to be effective, organizations need to understand the dependencies of the different systems and services leveraging AD in their environment. Knowing this information is crucial for determining how to sequence the recovery of various applications. Every computer or application that is joined to an AD domain must be accounted for during this process. Any mistake can cause services to fail and slow the pace of the recovery.
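Once the dependencies are written down, the recovery sequence can be derived from them instead of being argued out during the incident. As one possible illustration, the sketch below uses Python's standard `graphlib` module to group services into waves, where everything in a wave can be restored once the previous waves are back. The function name is hypothetical, and the service names in the usage note are invented placeholders rather than a recommended ordering.

```python
from graphlib import TopologicalSorter


def recovery_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves that can be restored together.

    `depends_on` maps each service to the set of services it needs first.
    Raises graphlib.CycleError if the dependencies are circular, which is
    worth discovering during planning rather than during an incident.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # validates the graph and detects cycles
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable plan
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a hypothetical map such as `{"dns": {"active-directory"}, "sql-server": {"active-directory"}, "file-server": {"active-directory", "dns"}, "erp-app": {"active-directory", "sql-server"}}`, the first wave contains only `active-directory`, the second `dns` and `sql-server`, and the third `erp-app` and `file-server`.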
Like any other incident response plan, this needs to be practiced periodically. IT leaders should set up a test environment on virtual hosts and attempt to recover AD forests from backups. Getting comfortable with the recovery plan increases the odds that implementing it will go smoothly in the event of an actual incident.
Don’t—Forget to Handle the Basics
Stopping attacks is much better than recovering from them after your environment has been hit. Adopting security best practices makes it harder for malware to exploit vulnerable systems and migrate to another machine. Chief among these practices is good patch management. Falling behind on patching undermines even the most intricate security planning. Many ransomware variants leverage vulnerabilities to worm their way through the enterprise. In the fight against ransomware, keeping systems up-to-date with security patches is a must. When organizations detected WannaCry in May of 2017, they discovered it exploited a vulnerability Microsoft had already patched earlier in the year. Organizations that are slow to apply patches or use unsupported systems risk leaving an open door for attackers. Once they are inside, AD will almost certainly become a target.
Next on the best practice list should be applying the principle of least privilege. Giving users administrator access on their devices helps criminal hackers clear a critical hurdle. Elevating privileges is a crucial step toward expanding the foothold attackers have in the compromised environment, which makes AD an attractive target. Following best practices for protecting AD, such as continuous monitoring, implementing an administrative tier model, and running DCs on Server Core, reduces the attack surface and lowers risk.
Do—Keep Some of Your Backups Offline
If a ransomware attack is successful, any system connected to the network is susceptible to being encrypted. In the case of the attack on Maersk, they reportedly were able to recover their AD essentially due to luck—a power outage had taken a domain controller offline at the time of the attack, leaving a single domain controller available for the company to recover from. About two weeks passed before the company was able to reissue personal computers to most of the staff.
Backups saved on a server that is not joined to the domain offer a safe starting point for AD recovery. Companies should carve off a piece of their Storage Area Network (SAN) and keep copies of their backups there, securely and offline. Another strategy is to use third-party tools to copy backup images to cloud object storage such as Azure Blob Storage or Amazon S3. Keeping backups offline removes luck from the equation by ensuring an unaffected copy of Active Directory is there to help the business reduce recovery time.
Don’t—Assume Cyber-insurance Will Save You
Time, after all, is money, and every moment the business is down has a price. The costs of missed customer opportunities and incident remediation are not decreasing, making cyber-insurance a must-have for enterprises. However, while the cost of cyber-insurance premiums increases, the possibility that insurance companies will not cover losses is real. For example, Mondelez International sued Zurich Insurance after Zurich refused to pay a $100 million claim filed by Mondelez after it was infected with NotPetya. Zurich justified the decision by citing a “war exclusion” clause, and contending the attack was the result of cyberwarfare.
Make sure that management is not relying on the prospect of an insurance policy saving them. While cyber-insurance policies provide a level of financial protection, real protection comes from sound security practices and proper planning.
But Wait, There’s More
Guido and I enjoyed this discussion so much that we decided to put together a webinar that goes into detail on these and many other Dos and Don’ts of Active Directory disaster recovery. You can find it here. Watch it and tell me what you think.
Times have changed. Enterprise approaches to AD recovery have to change as well.