Sql Server and SSPI handshake failed error hell
The infamous SSPI Failed error strikes again!
One of our SQL servers was generating these errors for “some” Windows logins but not all.
Error: 17806, Severity: 20, State: 2.
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: 192.168.1.1]
Error: 18452, Severity: 14, State: 1.
Login failed for user ”. The user is not associated with a trusted SQL Server connection. [CLIENT: 192.168.1.1]
After exhausting all of the normal troubleshooting for this error (accounts locked, disabled, Sql Service accts, bad connection strings, SPN’s, etc.) I spent the next few hours learning more about the way SQL handles authentication requests than I had ever wanted to know.
The Scenario –
A couple of separate individual Windows ID’s started generating these errors while attempting connections, all other windows logins were working properly. The connections were initially happening through applications, but also occurred through sqlcmd. When logged in to the server locally with the offending ID’s the connections to SQL would succeed.
The Troubleshooting process –
Check all the regular SSPI issues, I wont bore you with the details as they are easily searchable
- A relatively easy way of checking the “easy” authentication issues If possible/appropriate is to log into the SQL Server locally with the offending ID and fire up sqlcmd and connect to the server via sqlcmd –Sservername,port –E (by specifying the port you force TCP/IP instead of LPC, thereby forcing the network into the equation)
Verify whether the login is trying to use NTLM or Kerberos (many ways to do this but simplest is to see if there are any other KERBEROS connections on the machine)
- SELECT DISTINCT auth_scheme FROM sys.dm_exec_connections
- If Kerberos is in use, there are a few additional things to verify related to SPN’s, since only NTLM was in use on this server I skipped that
Determine if the accounts were excluded from connecting to the machine through the network through a group policy or some other AD setting
After all of these checked out OK, I began to try and figure out what the error code 0x8009030c meant, turns out, its fairly obvious what the description is : sec_e_logon_denied. This description was so helpful I thought about making this server into a boat anchor but, luckily for my employer the server room is located many miles away and has armed guards.
Since I knew we could logon locally to the SQL Server with the ID that SQL was rejecting with logon denied something else was trying to make my life miserable.
We didn’t have logon failure security auditing turned on so, I had no way of getting a better error description, As luck would have it though this would prove instrumental in finding the root cause. To get a better error message, I found this handy KB article detailing steps needed to put net logon into debug mode.
Say hello to my new best friend! — nltest.exe
After downloading nltest & using it to enable netlogon debugging on the SQL Server, I got this slightly better message in the netlogon.log file
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Entered
06/15 14:15:39 [CRITICAL] NlPrintRpcDebug: Couldn’t get EEInfo for I_NetLogonSamLogonEx: 1761 (may be legitimate for 0xc0000064)
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064
The error code 0XC0000064 maps to “NO_SUCH_USER”
Since I was currently logged in to the server with the ID that was returning no such user, something else was obviously wrong, and luckily at this point I knew it wasn’t SQL.
Running “set log” on the server revealed that a local DC (call it DC1) was servicing the local logon request.
After asking our AD guys about DC1 and its synchronization status, as well as whether the user actually existed there, everything still looked OK.
After looking around a bit more I discovered this gem of a command for nltest to determine which DC will handle a logon request
C:\>nltest /whowill:Domain Account
[16:32:45] Mail message 0 sent successfully (\MAILSLOT\NET\GETDC579)
[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)
The command completed successfully
Even though this command returned “act found” it was returning from DC2. (I dont exactly understand why the same account would authenticate against 2 different DC’s based on a local desktop login or a SQL login but it apparently can)
After asking the AD guys about DC2 the light bulbs apparently went off for them as that server actually exists behind a different set of firewalls, in a totally different location. While DC2 would return a ping, the console wouldn’t allow logons for some reason. After a quick reboot of DC2, and some magic AD pixie dust (I am not an AD admin, if it wasn’t totally obvious from my newfound friend nltest) the windows Id’s that were having trouble started authenticating against DC3 and our SSPI errors went away.
Interesting tidbit — During troubleshooting, I found that this particular SQL Server was authenticating accounts against at least 5 different DC’s. Some of this might be expected since there are different domains at play but, I haven’t heard a final answer from the AD guys about whether it should work that way.
Reboot the misbehaving DC, of course there may be other ways to fix this by redirecting requests to a different DC without a reboot but, since it was misbehaving anyway, and the AD experts wanted to reboot so we went with that. A reboot of SQL would have likely solved this problem too but, I hate reboot fixes of issues, they always seem to come back!
|Print article||This entry was posted by Allen Kinsel on June 17, 2010 at 10:18 am, and is filed under Connections, Ramblings, Security, SQL Server, Syndicated. Follow any responses to this post through RSS 2.0. You can leave a response or trackback from your own site.|
No trackbacks yet.
about 4 months ago - No comments
Came across a strange permission issue today that I thought I’d blog about since it is apparently somewhat common judging by the number of unanswered forum posts about it.
Adding/editing or Testing SSRS Datasources in Report Manager generates a permission error
A developer was trying to add a datasource and use “test connection” in the report manager web interface instead of BIDS which, incidentally
about 10 months ago - No comments
Earlier this week I was working through an issue that I’ve seen one too many times lately and thought Id write about it. For various reasons *cough* Legacy *cough* I often seem to run across columns in tables that are defined with what would appear to be the wrong data type or worse, multiple data
about 1 year ago - 40 comments
Its the first week of June and for those of us living along the Gulf and Atlantic coasts of the US, that brings the beginning of hurricane season. It also means its time for this months installment of T-SQL Tuesday.
This Months Topic
Disaster Recovery. This topic is very near and dear to me based on the fact that
about 2 years ago - 11 comments
After last years Summit we launched a feedback site http://feedback.sqlpass.org in hopes of gathering all of the feedback about the event in one place. The number one thing people have asked for on that site is for there to be a track of sessions in the 400-500 level range. The problems with the community desire
about 2 years ago - 2 comments
Occasionally I’m asked what the most important thing for a DBA to know is. It seems some people want to know what 1 thing to focus on in order to be successful long term as a DBA. I’ve pondered this question for quite a long time and I’ve given various answers over the years.
about 2 years ago - 5 comments
This months TSQL Tuesday is hosted by a good friend Pat right over at SQL Asylum
For this months entry I decided to keep it short and sweet, following in my Bits N Bytes theme.
The Meta Script
In the true sense of the word automation, this really doesn’t fit but, in the terms of quickly getting something
about 2 years ago - 4 comments
This corruption story begins like many. Somebody in a server room far far away decided to make a change to a VMware guest machine and that little change rippled through our poor server like a lady Gaga Meat Dress through the VMA’s. Needless to say, it wasnt pretty. The full set of events may never
about 2 years ago - 1 comment
The other day we were going through some old scripts and had a huge discussion about changing object ownership in SQL. Forever the only answers were to recreate the objects or use the stored procedures: sp_changeobjectowner or sp_changedbowner. Alternatively at times there were those of us who used some other undisclosed hack of system tables
about 2 years ago - No comments
For the new year I have decided to try something different with a series of small regular posts.
Dealing with our large team of developers and our DBA team as well, I often run across interesting discussions about small things that are easily overlooked. Often times these discussions end in a exclamation of “I knew that!!” Of
about 2 years ago - 1 comment
And no, I dont mean in my mothers eyes, thats a given.
I won the Ipad/SQL Monitor Contest that Redgate was sponsoring over on Grant Fritcheys Blog
Im thrilled to have won!! Ive been eyeing Ipads for quite some time but, the novelty & cost has kept me at arms length.
Im always in the market for DBA tools that can make