We had a strange error on a SQL Server 2008 cluster the other day.
The OS was Windows Server 2008 R2.
We kept getting messages that the cluster node was offline because the Quorum was unavailable. This made little sense as both nodes in this cluster were online and the Quorum disk was available. We could ping across the heartbeat, everything looked fine except for these errors.
After a little research, we determined that a new version of Symantec Endpoint Security had been pushed to these servers. Even with the new version installed, we could establish communication across all networks between the two nodes, so we were a little stumped. Eventually we ran across a policy that was being enforced from the Symantec central management server/policy/whatever it's called!
As it turns out, Symantec Endpoint Security blocks all IPv6 traffic by default. If you're like me, you didn't even realize that a Windows Server 2008 cluster would use IPv6 for its heartbeat communication. After we disabled the rules that were blocking IPv6 traffic, everything returned to normal.
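Before trusting a blanket "block IPv6" rule, it helps to know whether your cluster nodes actually carry IPv6 addresses at all. Windows Server 2008 clustering can route node-to-node traffic over IPv6, including link-local (fe80::) addresses, even when you only configured IPv4. Here is a minimal Python sketch, assuming you have copied each node's addresses out of ipconfig /all by hand; classify_node_addresses is a hypothetical helper name, and the only real-world constant is the cluster service's port 3343.

```python
import ipaddress

CLUSTER_SERVICE_PORT = 3343  # port the Windows cluster service uses for node-to-node heartbeats


def classify_node_addresses(addresses):
    """Split address strings into IPv4, global IPv6 and link-local IPv6 buckets."""
    buckets = {"ipv4": [], "ipv6": [], "ipv6_link_local": []}
    for text in addresses:
        # Drop a zone index such as "%11" that ipconfig prints after link-local addresses.
        ip = ipaddress.ip_address(text.split("%", 1)[0])
        if ip.version == 4:
            buckets["ipv4"].append(str(ip))
        elif ip.is_link_local:
            buckets["ipv6_link_local"].append(str(ip))
        else:
            buckets["ipv6"].append(str(ip))
    return buckets


# Hypothetical addresses for one node, as copied from ipconfig /all.
node_addresses = ["10.10.10.1", "fe80::1%11", "2001:db8::10"]
buckets = classify_node_addresses(node_addresses)
if buckets["ipv6"] or buckets["ipv6_link_local"]:
    print(f"IPv6 in use; a blanket IPv6 block may cut off port {CLUSTER_SERVICE_PORT} traffic:", buckets)
```

If any IPv6 bucket is non-empty on both nodes, an endpoint policy that drops all IPv6 is a suspect worth checking first.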
So, the moral of all this is nothing new: NEVER trust anything new that gets pushed to your servers.
The infamous SSPI Failed error strikes again!
One of our SQL servers was generating these errors for “some” Windows logins but not all.
Error: 17806, Severity: 20, State: 2.
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: 192.168.1.1]
Error: 18452, Severity: 14, State: 1.
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: 192.168.1.1]
After exhausting all of the normal troubleshooting for this error (locked or disabled accounts, SQL Server service accounts, bad connection strings, SPNs, etc.), I spent the next few hours learning more about the way SQL Server handles authentication requests than I had ever wanted to know.
The Scenario –
A couple of individual Windows IDs started generating these errors while attempting to connect; all other Windows logins were working properly. The connections were initially made through applications, but the failures also occurred through sqlcmd. When logged in to the server locally with the offending IDs, the connections to SQL Server would succeed.
The Troubleshooting process –
Check all the regular SSPI issues; I won't bore you with the details, as they are easily searchable
- A relatively easy way of checking the “easy” authentication issues, if possible and appropriate, is to log on to the SQL Server locally with the offending ID, fire up sqlcmd, and connect with sqlcmd -S servername,port -E (specifying the port forces TCP/IP instead of LPC/shared memory, which brings the network into the equation)
Verify whether the login is trying to use NTLM or Kerberos (there are many ways to do this, but the simplest is to check whether there are any other KERBEROS connections on the server)
- SELECT DISTINCT auth_scheme FROM sys.dm_exec_connections
- If Kerberos is in use, there are a few additional things to verify related to SPNs; since only NTLM was in use on this server, I skipped that
Determine whether the accounts were blocked from connecting to the machine over the network by a Group Policy setting (such as the "Deny access to this computer from the network" user right) or some other AD setting
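The sqlcmd test and the NTLM-or-Kerberos check above can be rolled into one run. This is a minimal Python sketch, assuming sqlcmd is on the PATH of the machine you run it from while logged on as the affected ID; build_sqlcmd_check and run_check are hypothetical helper names. The tcp: prefix is sqlcmd's explicit way to force TCP/IP, and filtering on @@SPID reports the scheme for the test connection itself rather than for every connection on the server.

```python
import subprocess

# Report the authentication scheme (NTLM, KERBEROS or SQL) of *this* connection only.
AUTH_QUERY = (
    "SET NOCOUNT ON; "
    "SELECT auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@SPID;"
)


def build_sqlcmd_check(server: str, port: int) -> list:
    """Hypothetical helper: a sqlcmd call that forces TCP/IP and Windows authentication."""
    return [
        "sqlcmd",
        "-S", f"tcp:{server},{port}",  # tcp: prefix plus an explicit port rules out shared memory
        "-E",                          # integrated (Windows) authentication, as the failing apps use
        "-Q", AUTH_QUERY,              # run the query, then exit
        "-h-1",                        # suppress column headers so only the scheme is printed
    ]


def run_check(server: str, port: int) -> str:
    """Run the check; returns sqlcmd's output (the scheme, or the login error text)."""
    result = subprocess.run(build_sqlcmd_check(server, port),
                            capture_output=True, text=True, check=False)
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    # Prints the command line; call run_check("servername", 1433) to actually connect.
    print(" ".join(build_sqlcmd_check("servername", 1433)))
```

If the connection succeeds, the output is NTLM or KERBEROS; if it fails with the same 18452 error, you have reproduced the problem outside the application.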
After all of these checked out OK, I began trying to figure out what error code 0x8009030c meant. It turns out the description is fairly obvious: SEC_E_LOGON_DENIED. This description was so helpful that I thought about turning this server into a boat anchor but, luckily for my employer, the server room is located many miles away and has armed guards.
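Even before looking it up, 0x8009030c tells you where it came from, because it is an HRESULT: bit 31 is the failure flag, bits 16 through 26 are the facility, and the low 16 bits are the code. Facility 9 is FACILITY_SECURITY, i.e. the SSPI/security subsystem rather than SQL Server itself. A minimal Python sketch of that split; decode_hresult is a hypothetical name and the lookup table only holds the one code from this post.

```python
FACILITY_SECURITY = 9  # facility number used by SSPI / security subsystem HRESULTs

KNOWN_SSPI_CODES = {
    0x8009030C: "SEC_E_LOGON_DENIED",
}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and facility-specific code."""
    return {
        "failure": bool((hr >> 31) & 1),   # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x7FF,    # 11-bit facility field
        "code": hr & 0xFFFF,               # code within that facility
        "name": KNOWN_SSPI_CODES.get(hr, "unknown"),
    }


info = decode_hresult(0x8009030C)
print(info)  # facility 9 points at the security subsystem, not SQL Server
```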
Since I knew we could log on locally to the SQL Server with the same ID that SQL Server was rejecting with "logon denied," something else was trying to make my life miserable.
We didn't have logon failure security auditing turned on, so I had no way of getting a better error description. As luck would have it, though, this would prove instrumental in finding the root cause. To get a better error message, I found a handy KB article detailing the steps needed to put Netlogon into debug mode.
Say hello to my new best friend: nltest.exe
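After downloading nltest and using it to enable Netlogon debugging on the SQL Server (nltest /dbflag:0x2080ffff turns debug logging on, and nltest /dbflag:0x0 turns it back off), I got this slightly better message in the netlogon.log file (%windir%\debug\netlogon.log):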
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Entered
06/15 14:15:39 [CRITICAL] NlPrintRpcDebug: Couldn’t get EEInfo for I_NetLogonSamLogonEx: 1761 (may be legitimate for 0xc0000064)
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064
The error code 0xC0000064 maps to STATUS_NO_SUCH_USER.
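On a busy server, netlogon.log fills up quickly, so it helps to pull out only the failed SamLogon results and name their NTSTATUS codes. A minimal Python sketch, assuming the return lines look exactly like the excerpt above (real logs may vary, so treat the regular expression as a starting point); failed_logons is a hypothetical name, and the status table lists only a few common logon-failure codes, not every one.

```python
import re

# Return lines shaped like the excerpt above:
# "... [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064"
RETURN_LINE = re.compile(
    r"\[LOGON\] SamLogon: (?P<kind>\w+) logon of (?P<account>\S+) "
    r"from (?P<workstation>\S+) Returns (?P<status>0x[0-9A-Fa-f]{8})"
)

# A few common NTSTATUS values seen in logon failures (not an exhaustive list).
NTSTATUS_NAMES = {
    0xC0000064: "STATUS_NO_SUCH_USER",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC0000072: "STATUS_ACCOUNT_DISABLED",
    0xC0000234: "STATUS_ACCOUNT_LOCKED_OUT",
}


def failed_logons(lines):
    """Yield (account, workstation, status name) for each SamLogon that returned non-zero."""
    for line in lines:
        match = RETURN_LINE.search(line)
        if match is None:
            continue  # "Entered" lines and other entries carry no result
        status = int(match.group("status"), 16)
        if status == 0:
            continue  # 0x00000000 is STATUS_SUCCESS
        name = NTSTATUS_NAMES.get(status, f"0x{status:08X}")
        yield match.group("account"), match.group("workstation"), name


EXCERPT = [
    r"06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Entered",
    r"06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064",
]

for account, workstation, status in failed_logons(EXCERPT):
    print(account, workstation, status)
```

Against the live file you would pass an open file handle instead of EXCERPT, opened with errors="replace" in case the log contains odd bytes.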
Since I was currently logged in to the server with the ID that was returning no such user, something else was obviously wrong, and luckily at this point I knew it wasn’t SQL.
Running “set log” on the server (which displays the LOGONSERVER environment variable) revealed that a local DC (call it DC1) was servicing the local logon request.
After asking our AD guys about DC1 and its synchronization status, as well as whether the user actually existed there, everything still looked OK.
After looking around a bit more, I discovered this gem of an nltest command that determines which DC will handle a logon request:
C:\>nltest /whowill:Domain Account
[16:32:45] Mail message 0 sent successfully (\MAILSLOT\NET\GETDC579)
[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)
The command completed successfully
Even though this command returned "Act found," the response came from DC2. (I don't exactly understand why the same account would authenticate against two different DCs for a local desktop logon versus a SQL Server login, but apparently it can.)
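When you have several servers or accounts to check, comparing the nltest /whowill responder with the LOGONSERVER value from "set log" can be scripted. A minimal Python sketch, assuming the response-line layout matches the single sample output above; whowill_responders and responders_other_than are hypothetical names, and you would paste or capture the nltest output yourself.

```python
import re

# Matches lines like "[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)".
# Layout taken from the one sample above; adjust if your nltest output differs.
RESPONSE_LINE = re.compile(
    r"Response \d+: (?P<dc>\S+) D:(?P<domain>\S+) A:(?P<account>\S+) \((?P<result>[^)]*)\)"
)


def whowill_responders(output: str) -> list:
    """Return (DC name, result text) for every response in nltest /whowill output."""
    return [(m.group("dc"), m.group("result")) for m in RESPONSE_LINE.finditer(output)]


def responders_other_than(output: str, logon_server: str) -> list:
    """List responding DCs that differ from LOGONSERVER (its leading backslashes are ignored)."""
    expected = logon_server.lstrip("\\").upper()
    return [dc for dc, _ in whowill_responders(output) if dc.upper() != expected]


SAMPLE_OUTPUT = r"""C:\>nltest /whowill:Domain Account
[16:32:45] Mail message 0 sent successfully (\MAILSLOT\NET\GETDC579)
[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)
The command completed successfully"""

print(whowill_responders(SAMPLE_OUTPUT))               # which DC answered, and what it said
print(responders_other_than(SAMPLE_OUTPUT, r"\\DC1"))  # DCs other than the local logon server
```

A non-empty second list is exactly the situation in this post: the account is "found," but by a DC other than the one handling local logons.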
After we asked the AD guys about DC2, the light bulbs went on for them, as that server actually sits behind a different set of firewalls in a totally different location. While DC2 would respond to a ping, its console wouldn't allow logons for some reason. After a quick reboot of DC2 and some magic AD pixie dust (I am not an AD admin, if that wasn't totally obvious from my newfound friendship with nltest), the Windows IDs that were having trouble started authenticating against DC3, and our SSPI errors went away.
Interesting tidbit: during troubleshooting, I found that this particular SQL Server was authenticating accounts against at least five different DCs. Some of this might be expected, since there are different domains at play, but I haven't heard a final answer from the AD guys about whether it should work that way.
The fix: reboot the misbehaving DC. Of course, there may be other ways to fix this, such as redirecting requests to a different DC without a reboot, but since the DC was misbehaving anyway and the AD experts wanted to reboot it, we went with that. A reboot of SQL Server would likely have solved this problem too, but I hate reboot fixes; the issues always seem to come back!
When creating a new application, after going through the entire business analysis and requirements gathering process, you normally wind up with a data model that includes many tables and relationships. By this time, depending on the size of the data model/system, a considerable amount of time has been invested on all sides. We need a way of preserving this investment while still allowing developers to do their thing!
Most shops have policies in place for what level of access developers can have in each environment. In many places I've seen, developers are allowed DBO access in development and some lesser access (usually read-only) in the higher environments.
After you've deployed the data model to the physical database in a development environment, and before you grant the developer group DBO access, consider all of the time and effort that has been spent making the data model what it is. To allow the developers to do their jobs without letting them modify the actual table/schema layout, you can grant a combination of privileges:
Grant ALTER on the schemas where the developers will need to modify database objects (for instance, stored procedures and functions)
Add the developers to db_datareader to allow read access
Add the developers to db_datawriter to allow write access
Grant CREATE PROCEDURE, CREATE FUNCTION, CREATE DEFAULT, etc. to allow developers to do whatever you are comfortable with
Deny CREATE TABLE in the database; this restricts all table-based DDL
Optional: Deny CREATE VIEW, CREATE FUNCTION, CREATE DEFAULT, etc. in the database to restrict any create/alter permissions as needed
Important: ALTER permission on a schema allows ALTER on ANY object type in that schema that you haven't explicitly denied
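If you apply this lockdown to more than one database, generating the statements beats retyping them. Here is a minimal Python sketch that only builds the T-SQL for you to review and run; the principal name DevRole and the function names are hypothetical. It uses sp_addrolemember because this post targets SQL Server 2005/2008 (newer versions also offer ALTER ROLE ... ADD MEMBER).

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Quote a Unicode string literal, doubling any single quote."""
    return "N'" + value.replace("'", "''") + "'"


def developer_lockdown_script(principal, schemas,
                              grant_create=("PROCEDURE", "FUNCTION", "VIEW"),
                              deny_create=("TABLE",)):
    """Build the GRANT/DENY statements from the list above for one principal."""
    overlap = set(grant_create) & set(deny_create)
    if overlap:
        raise ValueError(f"granted and denied at once: {sorted(overlap)}")
    p = quote_ident(principal)
    stmts = []
    for schema in schemas:
        # ALTER on a schema covers every object type in it that is not explicitly denied.
        stmts.append(f"GRANT ALTER ON SCHEMA::{quote_ident(schema)} TO {p};")
    for role in ("db_datareader", "db_datawriter"):
        stmts.append(f"EXEC sp_addrolemember {quote_literal(role)}, {quote_literal(principal)};")
    for obj in grant_create:
        stmts.append(f"GRANT CREATE {obj} TO {p};")
    for obj in deny_create:
        # DENY wins over GRANT, which is what keeps table DDL out of reach.
        stmts.append(f"DENY CREATE {obj} TO {p};")
    return stmts


print("\n".join(developer_lockdown_script("DevRole", ["dbo"])))
```

Keeping grant_create and deny_create as parameters mirrors the "mix and match" idea: tighten or loosen the lists per database rather than editing the statements by hand.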
Principle of least privilege
This method has proven effective at allowing developers to write stored procedures, functions, and views while still keeping the actual data model (usually tables and relationships) in pristine shape. You can also mix and match your own GRANTs and DENYs on specific object types for fine-grained control without granting the almighty DBO. Yes, you might say that I'm a paranoid DBA who restricts permissions even in DEV! Of course, my great developers would never change a modeled database and thereby force my hand into figuring out this lockdown of privileges.
What is the DAC?
The Dedicated Admin Connection, commonly called the DAC, is used to manage SQL Server when a regular connection won't succeed. Here's what SQL Server Books Online (BOL) has to say about the DAC: “This diagnostic connection allows an administrator to access SQL Server to execute diagnostic queries and troubleshoot problems even when SQL Server is not responding to standard connection requests.”
Occasionally, while troubleshooting SQL Servers in a large environment, especially one that's managed from many different geographic locations, you could run into the following error if more than one person tries to use the DAC at the same time. It should also be noted that this only happens if you have remote DAC enabled in your environment.
Could not connect because the maximum number of '1' dedicated administrator connections already exists. Before a new connection can be made, the existing dedicated administrator connection must be dropped, either by logging off or ending the process. [CLIENT: 127.0.0.1]
Since I could still connect with a regular connection, I set out looking for a query to determine who was using the DAC. I whipped this up, and since I couldn't find anything in my searches, I thought I'd blog it:
select conn.session_id,
       sess.login_name,
       sess.nt_domain,
       sess.nt_user_name,
       conn.connect_time,
       conn.last_read,
       conn.last_write,
       sess.host_name,
       conn.client_net_address
from sys.dm_exec_connections conn
join sys.endpoints edp
    on conn.endpoint_id = edp.endpoint_id
join sys.dm_exec_sessions sess
    on sess.session_id = conn.session_id
where edp.is_admin_endpoint = 1;
This should return everything you need to know about who is using your DAC connection so you can ask them to disconnect, or KILL their connection.
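If you decide to KILL rather than ask, it is worth generating the statement from the query's rows so you don't accidentally target your own session. A minimal Python sketch, assuming you have copied the rows out of the query above; DacSession and dac_kill_script are hypothetical names, and the output is meant to be reviewed before anyone runs it.

```python
from typing import NamedTuple, Optional


class DacSession(NamedTuple):
    """One row from the DAC query above (only the columns used here)."""
    session_id: int
    login_name: str
    host_name: str
    client_net_address: str


def dac_kill_script(rows, my_session_id: Optional[int] = None) -> list:
    """Build a commented KILL statement per DAC session, skipping your own session."""
    script = []
    for row in rows:
        if row.session_id == my_session_id:
            continue  # never kill the session you are running from
        script.append(f"-- {row.login_name} on {row.host_name} ({row.client_net_address})")
        script.append(f"KILL {int(row.session_id)};")  # int() keeps the statement to a plain number
    return script


rows = [DacSession(57, r"DOMAIN\someone", "WKS01", "10.0.0.5")]
print("\n".join(dac_kill_script(rows)))
```

The comment line carries the login and host, which is also the information you need if you would rather contact the person first.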
Sometimes login mapping issues exist where you least expect them
This is not the traditional SQL login SID mismapping issue that is frequently encountered and discussed here. This mismapping was a new one on me, so I thought I'd document it.
When trying to add a new Windows login in SQL Server 2005 or 2008 (probably earlier versions too, but I haven't tested them), you may wind up with this error message:
Msg 15025, Level 16, State 2, Line 1
The server principal 'DXXXX\UXXX' already exists.
To prove that the account doesn't exist, contrary to the error message, the following query should suffice:
select name from sys.server_principals where name = 'DXXXX\UXXX'
SQL Server won't let you create the login because it thinks it already exists, yet the query above returns no rows, so the login clearly doesn't exist under that name.
How did this happen?
If a login was previously created for a Windows account that has since been renamed in Active Directory, you cannot grant the new user ID access to the SQL Server, because SQL Server already has a login with that SID and cannot store a duplicate. This happens because renaming an account in Active Directory keeps the existing SID rather than creating a new one.
Find the pre-existing SID
The SID from Active Directory can be obtained in many ways; the one I use is PsGetSid, part of the Sysinternals PsTools suite. The syntax you'd use is:
PsGetSid DOMAIN\USER (or DXXXX\UXXX from earlier)
This will return the SID in SDDL format, S-x-x-x-x.
Unfortunately, SQL Server stores the binary representation of the SID, not the SDDL format. You have a couple of choices for doing the mapping: you can work out the SDDL values by hand from the sid column of sys.server_principals using this page, or, the better way, use the function Todd Engen designed when he needed to convert binary SIDs to SDDL. I'm only listing it here for completeness; the original is found here.
CREATE FUNCTION dbo.fn_SIDToString
(
    @BinSID AS VARBINARY(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
    -- A binary SID is 8 header bytes plus 4 bytes per sub-authority
    IF DATALENGTH(@BinSID) % 4 <> 0 RETURN (NULL)
    DECLARE @StringSID VARCHAR(100)
    DECLARE @i AS INT
    DECLARE @j AS INT
    DECLARE @val BINARY(4)
    -- Byte 1 is the revision
    SELECT @StringSID = 'S-'
        + CONVERT(VARCHAR, CONVERT(INT, CONVERT(VARBINARY, SUBSTRING(@BinSID, 1, 1))))
    -- Bytes 3-8 are the big-endian identifier authority
    SELECT @StringSID = @StringSID + '-'
        + CONVERT(VARCHAR, CONVERT(INT, CONVERT(VARBINARY, SUBSTRING(@BinSID, 3, 6))))
    -- From byte 9 on, each 4-byte sub-authority is little-endian, so reverse it before converting
    SET @j = 9
    SET @i = DATALENGTH(@BinSID)
    WHILE @j < @i
    BEGIN
        SELECT @val = SUBSTRING(@BinSID, @j, 4)
        SELECT @StringSID = @StringSID + '-'
            + CONVERT(VARCHAR, CONVERT(BIGINT, CONVERT(VARBINARY, REVERSE(CONVERT(VARBINARY, @val)))))
        SET @j = @j + 4
    END
    RETURN (@StringSID)
END
After creating this function, use it like so:
select name from sys.server_principals where dbo.fn_SIDToString(sid) = 'S-X-X-X-X'
where 'S-X-X-X-X' is the SID obtained earlier from PsGetSid.
This should return the name of the user that has the offending “duplicate” SID
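The conversion also works in the other direction: instead of calling a function on every row of sys.server_principals, you can encode the PsGetSid output as a binary literal and compare the sid column directly. A minimal Python sketch of both directions, assuming the standard SID layout (1-byte revision, 1-byte sub-authority count, 6-byte big-endian identifier authority, then 4-byte little-endian sub-authorities); sid_to_string and string_to_sid_hex are hypothetical names. Unlike the T-SQL function, the decoder checks the length against the sub-authority count byte.

```python
from typing import Optional


def sid_to_string(bin_sid: bytes) -> Optional[str]:
    """Decode a binary SID (as stored in sys.server_principals.sid) to S-R-I-S-... form."""
    if len(bin_sid) < 8:
        return None
    revision, sub_count = bin_sid[0], bin_sid[1]
    if len(bin_sid) != 8 + 4 * sub_count:
        return None  # length disagrees with the sub-authority count byte
    authority = int.from_bytes(bin_sid[2:8], "big")  # 48-bit identifier authority
    subs = [int.from_bytes(bin_sid[8 + 4 * i:12 + 4 * i], "little") for i in range(sub_count)]
    return "-".join(["S", str(revision), str(authority)] + [str(s) for s in subs])


def string_to_sid_hex(sddl: str) -> str:
    """Encode an S-R-I-S-... SID string as a T-SQL binary literal (0x...)."""
    parts = sddl.strip().split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a SID string: {sddl!r}")
    revision, authority = int(parts[1]), int(parts[2])
    subs = [int(p) for p in parts[3:]]
    raw = (
        bytes([revision, len(subs)])
        + authority.to_bytes(6, "big")
        + b"".join(s.to_bytes(4, "little") for s in subs)
    )
    return "0x" + raw.hex().upper()


sid_hex = string_to_sid_hex("S-1-5-32-544")  # well-known SID of BUILTIN\Administrators
print(f"select name from sys.server_principals where sid = {sid_hex}")
print(sid_to_string(bytes.fromhex(sid_hex[2:])))
```

Comparing against a literal lets SQL Server match the stored value as-is, and the decoder is handy for sanity-checking a sid value copied out of a query result.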
Once you know which login was renamed, you can script out its permissions (hopefully it doesn't own any objects), drop it, recreate it with the appropriate name, and grant any additional permissions.