Remove weird characters (A with hat) from SQL Server varchar column


Problem description

    Some weird characters are getting stored in one of my tables. They seem to come from .csv feeds, so I don't have much control over that.

    Hello Kitty EssentialÃ‚Â AccessoryÃ‚Â Kit
    

    How can I clean the data and remove these characters? I am OK doing it at the database level or in C#.

    EDIT

    Following the suggestions received in the comments, I am also looking into what I can do to correct this at the feed level. Here is more information:

    1. The feeds come from a third party.
    2. I opened the feed in Notepad++ and checked the Encoding menu. There is a dot next to 'Encode in ANSI', so I believe that is the encoding of the file.
    3. This is how it appears in Notepad++: "Hello Kitty EssentialÃ‚Â AccessoryÃ‚Â Kit"
    4. One strange thing, though: when I search for that row in the .csv file from PowerShell, the row comes up, but I don't see these weird characters there.

    Solution

    You can use the .NET regular expression functions. For example, Regex.Replace with a pattern that matches every character outside the ASCII range:

    Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
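
As a quick check outside the database, the call above can be wrapped in a small console program. This is only a sketch: `AsciiCleaner` and `StripNonAscii` are hypothetical names, and the sample string is an assumption about what the garbled value looks like. It also shows the main drawback of this pattern, which is that legitimate accented letters are removed too.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AsciiCleaner
{
    // Matches any character outside the 7-bit ASCII range U+0000..U+007F.
    private const string NonAsciiPattern = @"[^\u0000-\u007F]";

    public static string StripNonAscii(string s)
    {
        // Treat null as an empty string so Regex.Replace never receives null.
        return Regex.Replace(s ?? string.Empty, NonAsciiPattern, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed sample value, written with escapes so the encoding of this
        // source file cannot alter it (U+00C3, U+201A, U+00C2).
        string dirty = "Hello Kitty Essential\u00C3\u201A\u00C2 Accessory\u00C3\u201A\u00C2 Kit";
        Console.WriteLine(AsciiCleaner.StripNonAscii(dirty)); // Hello Kitty Essential Accessory Kit

        // Side effect: the accented letter is dropped as well.
        Console.WriteLine(AsciiCleaner.StripNonAscii("Caf\u00E9")); // Caf
    }
}
```

If the feed can contain real accented text, a narrower pattern that targets only the unwanted characters is probably safer than stripping everything non-ASCII.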
    

    As there is no built-in support for regular expressions in SQL Server, you need to create a SQL CLR function. More information about .NET integration in SQL Server can be found in the SQL Server CLR integration documentation.


    In your case:

    1. Open Visual Studio and create a Class Library project.

    2. Then rename the class to StackOverflow and paste the following code into its file:

      using Microsoft.SqlServer.Server;
      using System;
      using System.Collections.Generic;
      using System.Data.SqlTypes;
      using System.Linq;
      using System.Text;
      using System.Text.RegularExpressions;
      using System.Threading.Tasks;
      
      public class StackOverflow
      {
          [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "RegexReplace")]
          public static SqlString Replace(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement)
          {
              string input = (sqlInput.IsNull) ? string.Empty : sqlInput.Value;
              string pattern = (sqlPattern.IsNull) ? string.Empty : sqlPattern.Value;
              string replacement = (sqlReplacement.IsNull) ? string.Empty : sqlReplacement.Value;
              return new SqlString(Regex.Replace(input, pattern, replacement));
          }
      }
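
Note that the function above turns a NULL input into an empty string. If NULL should stay NULL and only the fixed non-ASCII pattern is needed, a narrower variant could look like the sketch below. The class name `StackOverflowNullSafe` and the function name `StripNonAscii` are hypothetical, and the function would still need its own CREATE FUNCTION binding in the same way as the original one.

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

// Hypothetical variant of the class above; not from the original answer.
public class StackOverflowNullSafe
{
    // Built once and reused for every call instead of re-parsing the pattern.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "StripNonAscii")]
    public static SqlString StripNonAscii(SqlString sqlInput)
    {
        // Preserve NULL rather than converting it to an empty string.
        if (sqlInput.IsNull)
        {
            return SqlString.Null;
        }

        // Remove every character outside the ASCII range.
        return new SqlString(NonAscii.Replace(sqlInput.Value, string.Empty));
    }
}
```

Keeping the pattern inside the assembly also avoids passing it as a string literal through T-SQL on every call.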
      

    3. Now, build the project. Open SQL Server Management Studio, select your database, and change the path in the following FROM clause so that it points to your StackOverflow.dll:

      CREATE ASSEMBLY [StackOverflow] FROM 'C:\Users\gotqn\Desktop\StackOverflow\StackOverflow\bin\Debug\StackOverflow.dll';
      

    4. Finally, create the SQL CLR function:

      CREATE FUNCTION [dbo].[StackOverflowRegexReplace] (@input NVARCHAR(MAX),@pattern NVARCHAR(MAX), @replacement NVARCHAR(MAX))
      RETURNS NVARCHAR(4000)
      AS EXTERNAL NAME [StackOverflow].[StackOverflow].[Replace]
      GO
      

    You are now ready to use the RegexReplace .NET function directly in your T-SQL statements:

        SELECT [dbo].[StackOverflowRegexReplace] ('Hello Kitty EssentialÃ‚Â AccessoryÃ‚Â Kit', '[^\u0000-\u007F]', '')
    
        -- Hello Kitty Essential Accessory Kit
    
