Sunday, January 14, 2024

SQL remove invisible characters

You will need a combination of 3 functions 

1. PATINDEX: PATINDEX is used to find a position in a string where a specified character or pattern exists. In our chase we have specified all characters from ascii value 0 to 31, 127 and 255. 

2. SUBSTRING: Once the PATINDEX function return the location where one of the invisible characters is found, substring function extracts that 1 length string to isolate the invisible character. 

3. REPLACE: This function then replaces all occurrences of the invisible character found and replaces it with a zero length string.

select REPLACE([Text], SUBSTRING([Text], PATINDEX('%[' + CHAR(0)+ '-' +CHAR(31) + CHAR(127)+ '-' +CHAR(255)+']%'  , [Text] COLLATE Latin1_General_100_BIN2 ), 1 ), ' ') [Text] 
from DataExtract

Please note that casting the column to be replaced with binary collation is required else some of the characters are not parsed.  


As the above statement only replaces all occurences of the the first invisible character found. It might be more meaningful to extract this data into a table and update the text column replacement until there are no more rows found with invisible characters 


update DataExtract
set Text = REPLACE([Text], SUBSTRING([Text], PATINDEX('%[' + CHAR(0)+ '-' +CHAR(31) + CHAR(127)+ '-' +CHAR(255)+']%'  , [Text] COLLATE Latin1_General_100_BIN2 ), 1 ), ' ')
where  PATINDEX('%[' + CHAR(0)+ '-' +CHAR(31) + CHAR(127)+ '-' +CHAR(255)+']%'  , [Text] COLLATE Latin1_General_100_BIN2 ) > 0


Sometimes a value starting with a double quote '"' might cause an issue when one tries to paste the data in Excel, as excel tries to interpret a start and end of text column using double quote as a delimiter. So if you are copying and pasting values in excel make sure that the column does not start with a double quote else best is to replace such occurrences:

Update DataExtract
set  [Text] = SUBSTRING( [Text], 2, 10000 )
where left( [Text], 1 ) = '"'


No comments: