Through the magic of the system tables in SQL Server: sysobjects, systables, and syscolumns, I’ve found a new favorite query.
ETL packages must account for trimming extra pads placed on data fields from the source. This is usually a problem from data sourced from flat files, especially fixed-width files. Some may argue that not addressing these issues results in data quality issues. I just prefer that things are perfect for future users. (and yes, data quality is on my mind too)
The following query will dynamically generate a query for every column in a table. The each query will confirm the max length of a field versus the trimmed max length. This results in a huge time savings from manually typing everything. Enjoy!
— SQL Server: generate column trimmings SQL
‘select ”’+so.name+‘.’+sc.name+”’,sum(case when datalength(‘+sc.name+’) > datalength(ltrim(rtrim(‘+sc.name+‘))) then 1 else 0 end) as err_count from ‘+so.name+‘;’
from sysobjects so, syscolumns sc, systypes st
sc.id = so.id
and sc.type = st.type
and so.name in (‘individual’)
order by sc.name
** Note: The query above refers to ‘individual’ as a specific table name. You’ll want to change that based on your validation.
Update: The SQL Server function DATALENGTH() should be used instead of LEN(). LEN() will automatically rtrim a column, thus giving you a false positive. This behavior may be limited to VARCHAR columns, but be warned.