Have you checked your datatypes lately?
Earlier this week I was working through an issue that I’ve seen one too many times lately and thought Id write about it. For various reasons *cough* Legacy *cough* I often seem to run across columns in tables that are defined with what would appear to be the wrong data type or worse, multiple data elements stuffed into one column!
In this case the table held various account codes in a char(06) column. These codes were things like ‘001’, ‘002’,’A’, ‘B’ etc
Each account code was for use in a different system. so, on system X it was an integer and system Y the codes were character based, etc
This data was used to join many different tables. Initially these joins were small and straightforward with predicates restricting the code table results to only include the proper data for the system being used.
pseudo code looks roughly like this
SET DATADESC = CODE.MORE_DATADESC
ON DATA.CODE_KEY = CODE.CODE_KEY
ON META_KEY = CODE.CODE_KEY
AND DATESTAMP > = ‘9999-12-31’
WHERE DATA IS NULL
AND CODE.DATATYPE = ‘ACCT’
This is the execution plan from a typical update of 150Million rows
Notice the Metadata table and Codes table are joined before the data table so the resulting codes data is only going to include the 16 rows of smallint data which passes the implicit convert to smallint
As the rowcounts increased and joins changed the implicit data type conversions continued to work until one day they didn’t. Instead they started returning this error message
Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value ‘B ‘ to data type smallint”
After some digging it turns out the optimizer had started using a different plan where it joined the tables together in a different order.
Heres that execution plan of the same update
Notice the Metadata table join is reordered and happens after the codes table is joined to the data table. When the data table is joined to the code table the error above occurs as character data in codes cant be converted to smallint during the join
To overcome this problem there are a couple of choices (admittedly there are more than a couple so this list isnt inclusive)
Best –> Change the source table to account for having different source systems account code types in different columns with the correct datatypes. Benefits are straightforward, the BIG drawback is the extra work for system redesign
Better –> Change the queries to use an explicit conversion on the joined columns changing the datatype on the data table from smallint to char(06). The benefit is the small amount of change needed, the drawback is the extra bytes involved per row joined
Good OK –> Add a query hint to force the join order. The benefit is the small amount of change needed, the drawback is in my mind we’re now forcing the optimizer to take a different path than it would if left to its own. I try to avoid hints unless there arent other options and in this case there are.
Other things that could be considered as solutions include temp tables, CTE’s, etc. all of which were more change then we wanted to pursue testing.
Comments are closed.