web-dev-qa-db-fra.com

Comment puis-je enlever étendu ASCII caractères d'une chaîne dans T-SQL?

Je dois filtrer (supprimer) les caractères étendus ASCII d'une instruction SELECT dans T-SQL.

J'utilise une procédure stockée pour le faire.

Entrée attendue:

ËËËËeeeeËËËË

Production attendue:

eeee

Tout ce que j'ai trouvé est pour MySQL .

J'utilise :

Microsoft SQL Server Management Studio  11.0.2100.60
Microsoft .NET Framework    4.0.30319.17929
10
Codeman

OK, essayez ceci. Il semble que le même problème existe. Quoi qu'il en soit, vous devez le modifier en fonction de vos besoins.

CREATE FUNCTION RemoveNonASCII 
(
    @nstring nvarchar(255)
)
RETURNS varchar(255)
AS
BEGIN

    DECLARE @Result varchar(255)
    SET @Result = ''

    DECLARE @nchar nvarchar(1)
    DECLARE @position int

    SET @position = 1
    WHILE @position <= LEN(@nstring)
    BEGIN
        SET @nchar = SUBSTRING(@nstring, @position, 1)
        --Unicode & ASCII are the same from 1 to 255.
        --Only Unicode goes beyond 255
        --0 to 31 are non-printable characters
        IF UNICODE(@nchar) between 32 and 255
            SET @Result = @Result + @nchar
        SET @position = @position + 1
    END

    RETURN @Result

END
GO

Découvrez-le sur SqlServerCentral

30
amit kohan

La réponse acceptée est d'utiliser une boucle qui devrait être évitée ...

Ma solution est complètement inclinable, il est facile de créer un fichier UDF (ou peut-être même mieux: un fichier TVF en ligne) à partir de ceci.

L'idée: créer un ensemble de nombres en cours d'exécution (ici, le nombre d'objets dans sys.objects est limité, mais il existe de nombreux exemples montrant comment créer un décompte des nombres à la volée). Dans le second CTE, les chaînes sont scindées en caractères uniques. La sélection finale revient avec la chaîne nettoyée.

DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');

WITH RunningNumbers AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
,SingleChars AS
(
    SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
    FROM @tbl AS tbl
    CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn 
)
SELECT ID,EvilString
      ,(
        SELECT '' + Chr 
        FROM SingleChars AS sc
        WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
        ORDER BY sc.Nmbr
        FOR XML PATH('')
      ) AS GoodString
FROM @tbl As tbl

Le résultat

1   ËËËËeeeeËËËË    eeee
2   ËaËËbËeeeeËËËcË abeeeec

Voici une autre réponse de moi où cette approche est utilisée pour remplacer tous les caractères spéciaux par secure pour obtenir plain latin 

4
Shnugo

Merci de partager votre code.

J'avais besoin de quelque chose de similaire, pas seulement pour nettoyer les adresses électroniques, mais aussi pour filtrer les entrées du site Web des utilisateurs avant qu'elles n'atteignent SAP ERP via des modules d'intégration.

Courant au Brésil, il doit obéir aux accents linguistiques ...

Voici le code résultant. 

Peut-être que cela peut aider quelqu'un un jour, comme cela m'a été fait.

     IF EXISTS
     (
         SELECT * 
           FROM sys.objects 
          WHERE object_id = OBJECT_ID(N'[dbo].[fnRemoveNonASCII]') 
            AND type IN (N'FN')
     )
         DROP FUNCTION dbo.fnRemoveNonASCII
     GO

     CREATE FUNCTION [dbo].[fnRemoveNonASCII] 
     (
         @nstring nvarchar(MAX) 
     )
     RETURNS nvarchar(MAX)
     AS
     BEGIN

         DECLARE @nchar nvarchar(1)      -- individual char in string
         DECLARE @nUnicode nvarchar(3)   -- ASCII for individual char in string
         DECLARE @position int           -- subscript to control loop in the string
         DECLARE @Result nvarchar(MAX)   -- return valus

         SET @Result = ''
         SET @position = 1

         WHILE @position <= LEN(@nstring)

            BEGIN

                --Unicode & ASCII are the same from 1 to 255.
                --Only Unicode goes beyond 255
                --0 to 31 are non-printable characters

                SET @nchar = SUBSTRING(@nstring, @position, 1)

                SET @nUnicode = UNICODE(@nChar) 

                IF @nUnicode = 10 
                   OR @nUnicode = 13
                   OR @nUnicode BETWEEN 32  AND 126
                   OR @nUnicode = 160
                   OR @nUnicode BETWEEN 192 AND 207
                   OR @nUnicode BETWEEN 210 AND 213
                   OR @nUnicode BETWEEN 217 AND 219
                   OR @nUnicode BETWEEN 224 AND 227
                   OR @nUnicode BETWEEN 231 AND 234
                   OR @nUnicode = 236
                   OR @nUnicode = 237
                   OR @nUnicode BETWEEN 242 AND 245
                   OR @nUnicode = 247
                   OR @nUnicode = 249
                   OR @nUnicode = 250        

                   SET @Result = @Result + @nchar

                ELSE IF @nUnicode = 9 -- TAB
                   SET @Result =  @Result + '  '

                ELSE
                   SET @Result =  @Result + ' '

                SET @position = @position + 1
         END

         RETURN @Result
     END

     /*
     ---------------------------------------------------------------------------------------------------------------

     -- Tabela dos caracteres Unicode/ASCII exportáveis

     &#9;   |   &#x9;      |    %9     =    TAB
     &#10;     |    &#xa;      |    %a     =    0A   Line Feed
     &#13;     |    &#xd;      |    %d     =    0D   Carriage Return
     &#32;      |   &#x20;  |   %20 =   <space>  
     &#33;     |    &#x21;  |   %21 =   !            
     &#34;     |    &#x22;  |   %22 =   "            
     &#35;     |    &#x23;  |   %23 =   #            
     &#36;     |    &#x24;  |   %24 =   $            
     &#37;     |    &#x25;  |   %25 =   %            
     &#38;     |    &#x26;  |   %26 =   &            
     &#39;     |    &#x27;  |   %27 =   '            
     &#40;     |    &#x28;  |   %28 =   (            
     &#41;     |    &#x29;  |   %29 =   )            
     &#42;     |    &#x2a;  |   %2a =   *            
     &#43;     |    &#x2b;  |   %2b =   +            
     &#44;     |    &#x2c;  |   %2c =   ,            
     &#45;     |    &#x2d;  |   %2d =   -            
     &#46;     |    &#x2e;  |   %2e =   .            
     &#47;     |    &#x2f;  |   %2f =   /            
     &#48;     |    &#x30;  |   %30 =   0            
     &#49;     |    &#x31;  |   %31 =   1            
     &#50;     |    &#x32;  |   %32 =   2            
     &#51;     |    &#x33;  |   %33 =   3            
     &#52;     |    &#x34;  |   %34 =   4            
     &#53;     |    &#x35;  |   %35 =   5            
     &#54;     |    &#x36;  |   %36 =   6            
     &#55;     |    &#x37;  |   %37 =   7            
     &#56;     |    &#x38;  |   %38 =   8            
     &#57;     |    &#x39;  |   %39 =   9            
     &#58;     |    &#x3a;  |   %3a =   :            
     &#59;     |    &#x3b;  |   %3b =   ;            
     &#60;     |    &#x3c;  |   %3c =   <            
     &#61;     |    &#x3d;  |   %3d =   =            
     &#62;     |    &#x3e;  |   %3e =   >            
     &#63;     |    &#x3f;  |   %3f =   ?            
     &#64;     |    &#x40;  |   %40 =   @   
     &#65;     |    &#x41;  |   %41 =   A
     &#66;     |    &#x42;  |   %42 =   B
     &#67;     |    &#x43;  |   %43 =   C
     &#68;     |    &#x44;  |   %44 =   D
     &#69;     |    &#x45;  |   %45 =   E
     &#70;     |    &#x46;  |   %46 =   F
     &#71;     |    &#x47;  |   %47 =   G
     &#72;     |    &#x48;  |   %48 =   H
     &#73;     |    &#x49;  |   %49 =   I
     &#74;     |    &#x4a;  |   %4a =   J
     &#75;     |    &#x4b;  |   %4b =   K
     &#76;     |    &#x4c;  |   %4c =   L
     &#77;     |    &#x4d;  |   %4d =   M
     &#78;     |    &#x4e;  |   %4e =   N
     &#79;     |    &#x4f;  |   %4f =   O 
     &#80;     |    &#x50;  |   %50 =   P
     &#81;     |    &#x51;  |   %51 =   Q
     &#82;     |    &#x52;  |   %52 =   R
     &#83;     |    &#x53;  |   %53 =   S
     &#84;     |    &#x54;  |   %54 =   T
     &#85;     |    &#x55;  |   %55 =   U
     &#86;     |    &#x56;  |   %56 =   V
     &#87;     |    &#x57;  |   %57 =   W
     &#88;     |    &#x58;  |   %58 =   X
     &#89;     |    &#x59;  |   %59 =   Y
     &#90;     |    &#x5a;  |   %5a =   Z
     &#91;     |    &#x5b;  |   %5b =   [
     &#92;     |    &#x5c;  |   %5c =   \
     &#93;     |    &#x5d;  |   %5d =   ]
     &#94;     |    &#x5e;  |   %5e =   ^
     &#95;     |    &#x5f;  |   %5f =   _
     &#96;     |    &#x60;  |   %60 =   `
     &#97;     |    &#x61;  |   %61 =   a
     &#98;     |    &#x62;  |   %62 =   b
     &#99;     |    &#x63;  |   %63 =   c
     &#100; |   &#x64;  |   %64 =   d
     &#101; |   &#x65;  |   %65 =   e
     &#102; |   &#x66;  |   %66 =   f
     &#103; |   &#x67;  |   %67 =   g
     &#104; |   &#x68;  |   %68 =   h
     &#105; |   &#x69;  |   %69 =   i
     &#106; |   &#x6a;  |   %6a =   j
     &#107; |   &#x6b;  |   %6b =   k
     &#108; |   &#x6c;  |   %6c =   l
     &#109; |   &#x6d;  |   %6d =   m
     &#110; |   &#x6e;  |   %6e =   n
     &#111; |   &#x6f;  |   %6f =   o
     &#112; |   &#x70;  |   %70 =   p
     &#113; |   &#x71;  |   %71 =   q
     &#114; |   &#x72;  |   %72 =   r
     &#115; |   &#x73;  |   %73 =   s
     &#116; |   &#x74;  |   %74 =   t
     &#117; |   &#x75;  |   %75 =   u
     &#118; |   &#x76;  |   %76 =   v
     &#119; |   &#x77;  |   %77 =   w
     &#120; |   &#x78;  |   %78 =   x
     &#121; |   &#x79;  |   %79 =   y
     &#122; |   &#x7a;  |   %7a =   z
     &#123; |   &#x7b;  |   %7b =   {
     &#124; |   &#x7c;  |   %7c =   |
     &#125; |   &#x7d;  |   %7d =   }
     &#126; |   &#x7e;  |   %7e =   ~
     &#160; |   &#xa0;  |   %a0 =   <nbsp>      
     &#192; |   &#xc0;  |   %c0 =   À
     &#193; |   &#xc1;  |   %c1 =   Á
     &#194; |   &#xc2;  |   %c2 =   Â
     &#195; |   &#xc3;  |   %c3 =   Ã
     &#196; |   &#xc4;  |   %c4 =   Ä
     &#197; |   &#xc5;  |   %c5 =   Å
     &#198; |   &#xc6;  |   %c6 =   Æ
     &#199; |   &#xc7;  |   %c7 =   Ç
     &#200; |   &#xc8;  |   %c8 =   È
     &#201; |   &#xc9;  |   %c9 =   É
     &#202; |   &#xca;  |   %ca =   Ê
     &#203; |   &#xcb;  |   %cb =   Ë
     &#204; |   &#xcc;  |   %cc =   Ì
     &#205; |   &#xcd;  |   %cd =   Í
     &#206; |   &#xce;  |   %ce =   Î
     &#207; |   &#xcf;  |   %cf =   Ï
     &#210; |   &#xd2;  |   %d2 =   Ò
     &#211; |   &#xd3;  |   %d3 =   Ó
     &#212; |   &#xd4;  |   %d4 =   Ô
     &#213; |   &#xd5;  |   %d5 =   Õ
     &#217; |   &#xd9;  |   %d9 =   Ù
     &#218; |   &#xda;  |   %da =   Ú
     &#219; |   &#xdb;  |   %db =   Û
     &#224; |   &#xe0;  |   %e0 =   à
     &#225; |   &#xe1;  |   %e1 =   á
     &#226; |   &#xe2;  |   %e2 =   â
     &#227; |   &#xe3;  |   %e3 =   ã
     &#231; |   &#xe7;  |   %e7 =   ç
     &#232; |   &#xe8;  |   %e8 =   è
     &#233; |   &#xe9;  |   %e9 =   é
     &#234; |   &#xea;  |   %ea =   ê
     &#236; |   &#xec;  |   %ec =   ì
     &#237; |   &#xed;  |   %ed =   í
     &#242; |   &#xf2;  |   %f2 =   ò
     &#243; |   &#xf3;  |   %f3 =   ó
     &#244; |   &#xf4;  |   %f4 =   ô
     &#245; |   &#xf5;  |   %f5 =   õ
     &#247; |   &#xf7;  |   %f7 =   ÷
     &#249; |   &#xf9;  |   %f9 =   ù
     &#250; |   &#xfa;  |   %fa =   ú
     */

     GO     
1
AlxSts

Juste en corrigeant le code ci-dessus (ça coupait les DOT)

CREATE FUNCTION [dbo].[RemoveNonASCII] 
(
    @nstring nvarchar(255)
)
RETURNS nvarchar(255)
AS
BEGIN

    DECLARE @Result nvarchar(255)
    SET @Result = ''

    DECLARE @nchar nvarchar(1)
    DECLARE @position int

    SET @position = 1
    WHILE @position <= LEN(@nstring)
    BEGIN
        SET @nchar = SUBSTRING(@nstring, @position, 1)
        --Unicode & ASCII are the same from 1 to 255.
        --Only Unicode goes beyond 255
        --0 to 31 are non-printable characters
        IF (UNICODE(@nchar) between 192 and 198) or (UNICODE(@nchar) between 225 and 230) -- letter A or a with accents
            SET @nchar = 'a'
        IF (UNICODE(@nchar) between 200 and 203) or (UNICODE(@nchar) between 232 and 235) -- letter E or e with accents
            SET @nchar = 'e'
        IF (UNICODE(@nchar) between 204 and 207) or (UNICODE(@nchar) between 236 and 239) -- letter I or i with accents
            SET @nchar = 'i'
        IF (UNICODE(@nchar) between 210 and 214) or (UNICODE(@nchar) between 242 and 246) or (UNICODE(@nchar)=240) -- letter O or o with accents
            SET @nchar = 'o'
        IF (UNICODE(@nchar) between 217 and 220) or (UNICODE(@nchar) between 249 and 252)  -- letter U or u with accents
            SET @nchar = 'u'
        IF (UNICODE(@nchar)=199)  or (UNICODE(@nchar)=231)  -- letter Ç or ç 
            SET @nchar = 'c'
        IF (UNICODE(@nchar)=209)  or (UNICODE(@nchar)=241)  -- letter Ñ or ñ 
            SET @nchar = 'n'
        IF (UNICODE(@nchar) between 45 and 46) or (UNICODE(@nchar) between 48 and 57) or (UNICODE(@nchar)  between 64 and 90) or (UNICODE(@nchar) = 95) or (UNICODE(@nchar)  between 97 and 122)
            SET @Result = @Result + @nchar
        SET @position = @position + 1
    END
    set @Result = lower(@Result) -- e-mails in lower case
    RETURN @Result

END
0
Clicks to Bricks

La solution fournie par Amit Kohan fonctionne, mais était de ralentir pour moi. Avec de grandes chaînes de données, l'itération de chaque caractère n'est pas optimale. J'ai créé la fonction suivante, avec n'est pas très compact mais rapide.

CREATE FUNCTIONdbo.F_StripLowAscii
(
   @Name nvarchar(max)
)
RETURN nvarchar(max) as
BEGIN
DECLARE @Result nvarchar(max) 

    If @Name IS NULL 
      RETURN @Name 

    DECLARE @BlankRange VARCHAR(15) 
    DECLARE @FoundAt INTEGER 

    -- ASCII CHAR #0 needs a special treatment
    SET @BlankRange = '%[' + CHAR(0) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = left(@name, @FoundAt-1 ) + SUBSTRING(@name, @FoundAt+1, LEN(@Name))
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 


    SET @BlankRange = '%[' + CHAR(1)+'-'+CHAR(8) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    SET @BlankRange = '%[' + CHAR(11)+'-'+CHAR(12) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    SET @BlankRange = '%[' + CHAR(14)+'-'+CHAR(31) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    RETURN @Name 

END
GO
0
brenkdar