Friday, August 28, 2009

Introduction

It’s been quite a while since I wrote anything about Windows. The reason is simple: I haven’t had any real technical challenges until now.

Situation: a medium-sized Windows domain with a time synchronization issue. It had gotten so bad that clients were experiencing Kerberos issues and some applications were behaving erratically.

A quick investigation revealed that, due to a change a few weeks earlier, the domain had no reliable time source. In addition, the PDC emulator had been moved to VMware ESX, something my VMware expert tells me is not a good idea.

Further investigation told me that the problems started much earlier than this and that ESX was most likely not the culprit here.

Now, how does time synchronization work? It’s quite simple, really: a client or a member server asks the domain controller that serves its logon request. The domain controllers get their time from the PDC emulator. The PDC emulator gets its time from its internal clock, a source on the internet, or an external device such as a GPS clock. There’s a very good description of the whole process in KB 884776.

The problems start if your PDC emulator isn’t keeping good time or, as happened to me, if it isn’t advertising itself on the network as a reliable time source.

Symptoms

Usually you’ll see things like these in the event log:

Date: 27-5-2009
Time: 20:46:32
User: N/A
Computer: DC02
Description:
The time provider NtpClient cannot reach or is currently receiving invalid time data from <ip address> (ntp.m|0x0|<ip address>:123-><ip address>:123).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Warning
Event Source: W32Time
Event Category: None
Event ID: 47
Date: 27-5-2009
Time: 20:48:08
User: N/A
Computer: DC02
Description:
Time Provider NtpClient: No valid response has been received from manually configured peer <ip address> after 8 attempts to contact it. This peer will be discarded as a time source and NtpClient will attempt to discover a new peer with this DNS name.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Error
Event Source: W32Time
Event Category: None
Event ID: 29
Date: 27-5-2009
Time: 20:48:08
User: N/A
Computer: DC02
Description:
The time provider NtpClient is configured to acquire time from one or more time sources, however none of the sources are currently accessible. No attempt to contact a source will be made for 15 minutes. NtpClient has no source of accurate time.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Warning
Event Source: W32Time
Event Category: None
Event ID: 14
Date: 28-5-2009
Time: 1:46:47
User: N/A
Computer: DC02
Description:
The time provider NtpClient was unable to find a domain controller to use as a time source. NtpClient will try again in 15 minutes.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Quick fixes

Google for any of these event IDs and you’ll find thousands of discussions. Common causes of this problem:

Windows time service (w32time) isn’t running.
Your system isn’t allowed to connect to your time source on UDP port 123. So, check your service and firewall settings and restart the service (at a command prompt: net stop w32time && net start w32time).
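To check whether UDP port 123 is actually reachable, independent of the Windows Time service, you can send a bare SNTP request yourself. The following is a minimal sketch in Python, not part of the original troubleshooting; the function names are my own, and it assumes the time source answers plain SNTP client requests.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Build a 48-byte SNTP client request (LI=0, VN=3, Mode=3 -> 0x1B)."""
    return b"\x1b" + 47 * b"\x00"


def parse_response(data: bytes) -> dict:
    """Extract the fields that matter for a quick health check."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first, stratum = data[0], data[1]
    # The transmit timestamp lives in bytes 40-47: seconds, then fraction.
    seconds, fraction = struct.unpack("!II", data[40:48])
    return {
        "leap": first >> 6,      # 3 means the server clock is unsynchronized
        "mode": first & 0x07,    # 4 means "server"
        "stratum": stratum,
        "unix_time": seconds - NTP_EPOCH_OFFSET + fraction / 2**32,
    }


def query(server: str, timeout: float = 3.0) -> dict:
    """Send one request to UDP port 123; raises socket.timeout if blocked."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_response(data)
```

Calling query() with the address of your time source either returns the parsed reply or times out, which points to a firewall or a server that is not answering.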

If that doesn’t help…

Check if your domain controllers know and agree which server the PDC emulator is. On a command prompt:

1. Type ntdsutil, and then press ENTER.
2. Type domain management, and then press ENTER.
3. Type connections, and then press ENTER.
4. Type "connect to server ServerName", where ServerName is the name of the domain controller you would like to view, and then press ENTER.
5. Type quit, and then press ENTER.
6. Type "select operation target", and then press ENTER.
7. Type "list roles for connected server", and then press ENTER.

Check this on at least two domain controllers. If they don’t agree, you’ve got a real problem. If any of the roles are on a server that is offline, you will need to seize the roles to a working server.

In my case the roles were properly distributed, but on the PDC emulator I still had the problems I mentioned earlier. Within an hour I would see several Event ID 37 (NtpClient is currently receiving valid time data) and Event ID 38 (NtpClient cannot reach or is receiving invalid time data) messages. The other DCs showed pretty much the same pattern.

So, the next step was to make sure the other DCs and member servers were actually set to take their time from the domain. This is set under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters in the string value Type, which can have these values:

# NoSync. The time service does not synchronize with other sources.
# NTP. The time service synchronizes from the servers specified in the NtpServer registry entry.
# NT5DS. The time service synchronizes from the domain hierarchy.
# AllSync. The time service uses all the available synchronization mechanisms.
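If you have more than a handful of servers to check, the Type value can be read and judged by a small script. This is a rough sketch only: the function names are made up for illustration, read_sync_type() works on Windows only (it uses Python's winreg module), and the rule "anything but NT5DS is suspicious on a non-PDC machine" is my reading of the setup described here.

```python
SYNC_TYPES = {
    "NoSync": "does not synchronize with other sources",
    "NTP": "synchronizes from the servers in the NtpServer registry entry",
    "NT5DS": "synchronizes from the domain hierarchy",
    "AllSync": "uses all the available synchronization mechanisms",
}
# Compare case-insensitively, since the stored string may differ in case.
_CANONICAL = {name.lower(): name for name in SYNC_TYPES}

PARAMETERS_KEY = r"SYSTEM\CurrentControlSet\Services\W32Time\Parameters"


def check_sync_type(value: str, is_pdc_emulator: bool = False) -> str:
    """Describe a Type value and flag it if a domain member is not on NT5DS."""
    name = _CANONICAL.get(value.strip().lower())
    if name is None:
        return f"unknown Type value {value!r}"
    note = f"{name}: {SYNC_TYPES[name]}"
    if not is_pdc_emulator and name != "NT5DS":
        note += " (expected NT5DS on a domain member)"
    return note


def read_sync_type() -> str:
    """Read the Type value from the local registry (Windows only)."""
    import winreg  # imported here so the rest of the module loads anywhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMETERS_KEY) as key:
        value, _ = winreg.QueryValueEx(key, "Type")
    return value
```

On a member server, check_sync_type(read_sync_type()) should report NT5DS without a warning.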

Now, all systems except my PDC Emulator were in fact set to NT5DS. And yet none were properly synchronizing time.

Hmmm….

Was my PDC emulator doing the proper advertising on the network? Let’s investigate.

C:\WINDOWS\system32>dcdiag

Doing primary tests

Testing server: Default-First-Site-Name\DC01
Starting test: Replications
......................... DC01 passed test Replications
Starting test: NCSecDesc
......................... DC01 passed test NCSecDesc
Starting test: NetLogons
......................... DC01 passed test NetLogons
Starting test: Advertising
Warning: DC01 is not advertising as a time server.
......................... DC01 failed test Advertising

<truncated>

Starting test: FsmoCheck
Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
A Time Server could not be located.
The server holding the PDC role is down.
Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
A Good Time Server could not be located.
......................... domainname.com failed test FsmoCheck

Now, that sentence about the PDC role holder being down had me puzzled and sent me chasing ghosts. Google it and you’ll find numerous discussions where this was true. But in my case it simply wasn’t. I double-checked with ntdsutil and it simply was not down!
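When you run dcdiag against several domain controllers, it helps to pull out just the failed tests and the warnings. Here is a small sketch that does this for output shaped like the dcdiag text above; the function name is hypothetical and the patterns are based only on the lines shown in this post, so other dcdiag versions may need different ones.

```python
import re

# Matches result lines such as "......... DC01 failed test Advertising".
FAILED = re.compile(r"\.+\s+(\S+) failed test (\S+)")
# Matches lines such as "Warning: DC01 is not advertising as a time server."
WARNING = re.compile(r"^\s*Warning:\s*(.+)$")


def summarize_dcdiag(output: str) -> dict:
    """Collect (target, test) pairs that failed and all warning messages."""
    failed, warnings = [], []
    for line in output.splitlines():
        match = FAILED.search(line)
        if match:
            failed.append((match.group(1), match.group(2)))
            continue
        match = WARNING.match(line)
        if match:
            warnings.append(match.group(1).strip())
    return {"failed": failed, "warnings": warnings}
```

Feeding it the captured text of a dcdiag run gives you a short list to compare between servers instead of pages of output.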

So… what was going wrong here? At this point I realized that the messages in the event log weren’t very helpful. They are too generic. I wanted to know what was really happening so I turned on debug logging on the Windows Time service. And that gives you information!

149258 09:35:51.3586829s - DomainHierarchy: we are now the domain root. Should be advertised as reliable
149258 09:35:51.3586829s - ClockDispln: we're a reliable time service with no time source: LS: 0, TN: 864000000000, WAIT: 86400000

These lines were quickly followed by two successful synchronizations and a lot of failed ones after that.

At this point I was formulating the hypothesis that there’s some sort of successful/failed ratio that might influence whether or not Windows considers itself a reliable source. I was also highly suspicious of my NTP source by now. I asked around and found I had a network connection to another domain that didn’t have this problem. So I decided to set the domain controller of that domain as my time source to see what would happen. I also decided to check my time settings against the values outlined in KB 816042.

I changed:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters\NtpServer
I set to the proper IP address (don’t forget to append ,0x1 if you enter a DNS name instead of an IP address; the 0x1 flag makes the peer use SpecialPollInterval).

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config\AnnounceFlags
I set to 5.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient\SpecialPollInterval
I changed from 3600 to 900.
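AnnounceFlags is a bit mask, so a value like 5 is easier to reason about once it is decoded. The sketch below decodes the value and builds the NtpServer string with the ,0x1 suffix. The function names and the example host names are made up for illustration; the bit meanings are the ones documented for the Windows Time service.

```python
from typing import Iterable, List

# Documented AnnounceFlags bits for the Windows Time service.
ANNOUNCE_FLAG_BITS = {
    0x01: "always time server",
    0x02: "automatic time server",
    0x04: "always reliable time server",
    0x08: "automatic reliable time server",
}


def decode_announce_flags(value: int) -> List[str]:
    """Turn an AnnounceFlags value into the list of flags that are set."""
    if value == 0:
        return ["not a time server"]
    return [name for bit, name in ANNOUNCE_FLAG_BITS.items() if value & bit]


def format_ntp_peers(peers: Iterable[str], special_interval: bool = True) -> str:
    """Build the space-delimited NtpServer value, optionally adding ,0x1."""
    suffix = ",0x1" if special_interval else ""
    return " ".join(f"{peer}{suffix}" for peer in peers)
```

So decode_announce_flags(5) shows that the server always announces itself as a time server and as a reliable one, while 0xA would leave both decisions to the service.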

Then I restarted the time service again, went to grab a cup of coffee, and ran dcdiag /v /test:advertising:

C:\WINDOWS\system32>dcdiag /v /test:advertising

Domain Controller Diagnosis

Performing initial setup:
* Verifying that the local machine DC01, is a DC.
* Connecting to directory service on server DC01.
* Collecting site info.
* Identifying all servers.
* Identifying all NC cross-refs.
* Found 4 DC(s). Testing 1 of them.
Done gathering initial info.

Doing initial required tests

Testing server: Default-First-Site-Name\DC01
Starting test: Connectivity
* Active Directory LDAP Services Check
* Active Directory RPC Services Check
......................... DC01 passed test Connectivity

Doing primary tests

Testing server: Default-First-Site-Name\DC01
Test omitted by user request: Replications
Test omitted by user request: Topology
Test omitted by user request: CutoffServers
Test omitted by user request: NCSecDesc
Test omitted by user request: NetLogons
Starting test: Advertising
The DC DC01 is advertising itself as a DC and having a DS.
The DC DC01 is advertising as an LDAP server
The DC DC01 is advertising as having a writeable directory
The DC DC01 is advertising as a Key Distribution Center
The DC DC01 is advertising as a time server
The DS DC01 is advertising as a GC.
......................... DC01 passed test Advertising

Yesss!!!

I restarted the time service on the second domain controller. It immediately confirmed it was synchronizing by logging an Event ID 35.

Event Type: Information
Event Source: W32Time
Event Category: None
Event ID: 35
Date: 28-8-2009
Time: 12:33:06
User: N/A
Computer: DC02
Description:
The time service is now synchronizing the system time with the time source dc01.domain.com (ntp.d|<ip address>:123-><ip address>:123).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Lessons learned

  • A lot of people experience time problems that are related to either the service not running or a firewall blocking UDP traffic on port 123.
  • Don’t use net time anymore, use w32tm instead.
  • Check if your servers are getting time from the domain (the Type value should be NT5DS).
  • Troubleshooting starts at your PDC emulator.
  • If your PDC emulator isn’t advertising properly, your domain will not synchronize.
  • Turn on debug logging on the Windows Time service if you run into serious issues.
  • You need a reliable time source that’s well connected to your network.
  • You should monitor your event logs for warnings and errors generated by W32Time.
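For the monitoring point, even a crude tally of W32Time event IDs tells you whether a server is mostly synchronizing or mostly failing. Below is a sketch under stated assumptions: the event IDs and their meanings are the ones quoted in this post, the function names are hypothetical, and the 50% threshold is an arbitrary choice of mine, not a Windows rule. How you collect the event IDs (export, agent, script) is left open.

```python
from collections import Counter

# W32Time event IDs seen in this post, grouped by what they indicate.
W32TIME_EVENTS = {
    14: ("problem", "no domain controller found to use as a time source"),
    29: ("problem", "none of the configured time sources are accessible"),
    35: ("ok", "now synchronizing with a time source"),
    37: ("ok", "receiving valid time data"),
    38: ("problem", "cannot reach source or receiving invalid time data"),
    47: ("problem", "no valid response from manually configured peer"),
}


def summarize_events(event_ids) -> dict:
    """Count events per status; IDs not listed above count as 'unknown'."""
    counts = Counter()
    for event_id in event_ids:
        status, _ = W32TIME_EVENTS.get(event_id, ("unknown", ""))
        counts[status] += 1
    return dict(counts)


def needs_attention(event_ids, max_problem_ratio: float = 0.5) -> bool:
    """True when problem events exceed the given share of known events."""
    summary = summarize_events(event_ids)
    problems = summary.get("problem", 0)
    known = problems + summary.get("ok", 0)
    return known > 0 and problems / known > max_problem_ratio
```

A pattern like the one I saw on my PDC emulator, a few 37s drowned out by 38s and 47s, would trip needs_attention() right away.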

Comments:

Benjamin said...

Thank you for writing this page. I had a problem with one of my DCs not advertising as a time server too. Nothing was affected, just dcdiag. Eventually I found that AnnounceFlags had to be set to 0xa instead of 0x5 (default). I wrote about it here: http://ben.goodacre.name/tech/Domain_Controller_is_not_advertising_as_a_time_server_Error_in_dcdiag_(Windows)
