Earlier this week I was working through an issue that I’ve seen one too many times lately and thought Id write about it. For various reasons *cough* Legacy *cough* I often seem to run across columns in tables that are defined with what would appear to be the wrong data type or worse, multiple data elements stuffed into one column!
In this case the table held various account codes in a char(06) column. These codes were things like ‘001’, ‘002’,’A’, ‘B’ etc
Each account code was for use in a different system. so, on system X it was an integer and system Y the codes were character based, etc
This data was used to join many different tables. Initially these joins were small and straightforward with predicates restricting the code table results to only include the proper data for the system being used.
pseudo code looks roughly like this
SET DATADESC = CODE.MORE_DATADESC
ON DATA.CODE_KEY = CODE.CODE_KEY
ON META_KEY = CODE.CODE_KEY
AND DATESTAMP > = ’9999-12-31′
WHERE DATA IS NULL
AND CODE.DATATYPE = ‘ACCT’
This is the execution plan from a typical update of 150Million rows
Notice the Metadata table and Codes table are joined before the data table so the resulting codes data is only going to include the 16 rows of smallint data which passes the implicit convert to smallint
As the rowcounts increased and joins changed the implicit data type conversions continued to work until one day they didn’t. Instead they started returning this error message
Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value ‘B ‘ to data type smallint”
After some digging it turns out the optimizer had started using a different plan where it joined the tables together in a different order.
Heres that execution plan of the same update
Notice the Metadata table join is reordered and happens after the codes table is joined to the data table. When the data table is joined to the code table the error above occurs as character data in codes cant be converted to smallint during the join
To overcome this problem there are a couple of choices (admittedly there are more than a couple so this list isnt inclusive)
Best –> Change the source table to account for having different source systems account code types in different columns with the correct datatypes. Benefits are straightforward, the BIG drawback is the extra work for system redesign
Better –> Change the queries to use an explicit conversion on the joined columns changing the datatype on the data table from smallint to char(06). The benefit is the small amount of change needed, the drawback is the extra bytes involved per row joined
Good OK –> Add a query hint to force the join order. The benefit is the small amount of change needed, the drawback is in my mind we’re now forcing the optimizer to take a different path than it would if left to its own. I try to avoid hints unless there arent other options and in this case there are.
Other things that could be considered as solutions include temp tables, CTE’s, etc. all of which were more change then we wanted to pursue testing.
Occasionally I’m asked what the most important thing for a DBA to know is. It seems some people want to know what 1 thing to focus on in order to be successful long term as a DBA. I’ve pondered this question for quite a long time and I’ve given various answers over the years.
Ive come to the conclusion that there really is no one special thing or “skill” that its important for a DBA to know. Sure, there are lots of qualities that tend to make a successful DBA. Things like attention to detail, thirst for knowledge, and an uncanny love of BACON… The problem is as huge as the SQL Server product is becoming it is also becoming increasingly difficult to be an expert in everything. I am actually of the opinion that it is impossible to be an expert in everything SQL Server.
Knowing what you dont know
The most important thing for a DBA to know can be summed up in 3 simple words ” I Dont Know” Why is that important? Because, knowing what you don’t know, and being able to admit it to those around you is by far the most valuable skill that you can poses as a DBA at any skill level.
When in doubt its always good to remember that its ok to say “I don’t know” The Corollary to this statement is of course “Ill find out, or find someone who does know”. Ive seen many small issues blow up into large problems over the years because someone didn’t really understand what was going on and they were afraid to let their coworkers (or boss) know that they don’t understand. In many cases if we’ll just learn to say “I don’t know” some further pain can easily be avoided.