Active Directory Domain Services (AD DS) has grown to be a marvelously reliable, highly scalable, and fault-tolerant core component of your company’s IT infrastructure. It generally works quite well without requiring a lot of attention. But the AD DS admin must put in extra work to take the service from a “working well on a day-to-day basis” level to a “bulletproof reliable when some kind of unusual situation arises” level. When this unusual situation happens (and notice I said “when”, not “if”), there are a number of bad configurations and bad practices that will put your AD DS domain or forest at risk for data loss, domain controller (DC) failure, or even domain or forest failure. Some of these items may seem elementary, but you’d be astonished to know how common they really are.
You still have Windows Server 2003 DCs. Windows Server 2003 was an excellent OS in its day…but its day has passed. More importantly, it’s a vulnerable, unpatchable operating system. Why put your company at risk? Move on!
All your DCs and backups are in one physical location. Many bad things can happen in a single location. It’s good that you have more than one DC, but if they’re all in the same data center, your forest is still vulnerable to whatever can take out that data center. And that obviously goes for backups of your forest as well. There’s no excuse for being unprepared for a slow-moving natural disaster like Hurricane Sandy on the East Coast of the US. But I’ve also known fast-moving external fires to make a data center unavailable, if not heavily damaged. The moral: Look for every single point of vulnerability in your AD DS environment.
Recycle Bin is not enabled. We admins all breathed a sigh of relief when the Active Directory Recycle Bin first became available in Windows Server 2008 R2. It took a while before this much-needed object restore utility was widely used, however, because it requires a minimum forest functional level of Windows Server 2008 R2. And to achieve that requires that you upgrade all your DCs in the forest from older versions, notably Windows Server 2003. Because an embarrassingly large number of organizations have only just recently put their Windows Server 2003 DCs to a well-deserved rest, many of these companies have also not enabled Recycle Bin. You know who you are; get to it!
You aren’t following virtualization best practices. If you have DCs that are older than Windows Server 2012, they don’t comprehend virtualized environments. In particular, they recover poorly from image-based restores where the AD database has been rolled back to an earlier point in time without the DC being aware of it (a condition known as USN rollback). Make sure you (and especially your virtualization admins) follow Microsoft’s best practices for virtualizing AD DS.
You aren’t monitoring DC / DNS health on a regular basis. One of the most common ways Active Directory gets into trouble is that no one pays any attention to it. If you’re a small organization, I understand you may not have the funding for an extensive monitoring solution (though there are excellent free solutions for small and medium businesses such as SpiceWorks). All it takes to do some basic monitoring is a scheduled script that runs DCDIAG /s:<DomainController> /e and writes the output to a file. Look it over every morning. If you encounter replication errors, for example, there’s the free ADREPLSTATUS tool available to help you out.
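The scheduled check described above might be sketched in Python along these lines. Treat it as a minimal sketch under stated assumptions, not a definitive implementation: the function names, the log directory, and the server name you pass in are all hypothetical, and the parser assumes dcdiag's English-language "passed test" / "failed test" summary lines.

```python
import re
import subprocess
from datetime import datetime
from pathlib import Path

# Assumed shape of a dcdiag summary line (English output), for example:
#   ......................... DC1 failed test Replications
RESULT_LINE = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(dcdiag_output):
    """Return (target, test) pairs for every test reported as failed."""
    return [
        (match.group(1), match.group(3))
        for match in RESULT_LINE.finditer(dcdiag_output)
        if match.group(2) == "failed"
    ]


def run_dcdiag(server, log_dir="dcdiag-logs"):
    """Run dcdiag against `server`, save the raw output, return failures.

    `server` and `log_dir` are hypothetical values supplied by the caller.
    """
    result = subprocess.run(
        ["dcdiag", f"/s:{server}", "/e"],
        capture_output=True,
        text=True,
        check=False,  # keep the output for review even if dcdiag reports errors
    )
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    (Path(log_dir) / f"dcdiag-{stamp}.txt").write_text(result.stdout)
    return failed_tests(result.stdout)
```

Calling run_dcdiag with one of your DC names from a daily scheduled task gives you both the full log for the morning read-through and a short list of failures you could mail to yourself. The full log is kept deliberately, since the summary only tells you where to look.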
You’re running multiple roles on a DC. Running multiple roles on a DC compromises both security and recoverability. In an era of easily-created VMs, there’s no reason to not have servers dedicated to this task.
You never change your service account passwords. This is a dirty little secret most IT departments don’t want to admit: the passwords on the service accounts for many of their major infrastructure services haven’t been changed in years. Before Windows Server 2008 R2, changing a service account password meant incurring downtime in the service (for example, a SQL Server database in a multitier line-of-business application). So, it never got changed. In one large organization I was acquainted with, the SMS service account password was known by at least three generations of sysadmins! This vulnerability has been rectified, first by managed service accounts (MSAs) in Windows Server 2008 R2 and then by group MSAs in Windows Server 2012. Have you done anything about it?
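To see how bad the problem is, you can look at each service account's pwdLastSet attribute, which AD DS stores as a Windows FILETIME (100-nanosecond intervals since January 1, 1601 UTC). The following is a minimal Python sketch of that age check, assuming you have already exported account names and raw pwdLastSet values by some other means. The function names, the one-year threshold, and the account names in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond intervals from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a raw FILETIME integer (as stored in pwdLastSet) to UTC."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def stale_accounts(accounts, max_age_days=365, now=None):
    """Return the names whose password is older than `max_age_days`.

    `accounts` maps an account name to its raw pwdLastSet value.
    A value of 0 carries no date, so those accounts are skipped here.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for name, pwd_last_set in accounts.items():
        if pwd_last_set == 0:
            continue
        if filetime_to_datetime(pwd_last_set) < cutoff:
            stale.append(name)
    return sorted(stale)
```

For example, stale_accounts({"svc_sql": 116444736000000000}) flags svc_sql, because that value corresponds to January 1, 1970. Any account this turns up is a candidate for conversion to an MSA or group MSA.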
Too many administrators. This should keep you up at night. Many, many organizations have far too many administrative accounts because whoever built their AD DS never took the extra time to put together a delegated administration model. As a result, granting broad rights is the fastest way to close a service ticket. This is about the fastest way I know of to get your company a headline on the SC Magazine security newsfeed or worse.
Don’t kid yourself. If you have any of these conditions in your AD DS environment, it’s your responsibility to show management why correcting them must be a priority for your company’s safety.