Remove weird characters (A with hat) from SQL Server varchar column
Some weird characters are getting stored in one of the tables. They seem to be coming from .csv feeds, so I don't have much control over that.
Hello Kitty EssentialÃ‚Â AccessoryÃ‚Â Kit
How can I clean the data and remove these characters? I am OK with doing it at the db level or in C#.
EDIT
As per the suggestions received in the comments, I am also looking into what I can do to correct this at the feed level. Here is more information:
- The feeds come from a third party.
- I opened the feed in Notepad++ and checked the Encoding menu. There is a dot in front of 'Encode in ANSI', so I believe that is the encoding of the file.
- This is how it appears in Notepad++: "Hello Kitty Essential Accessory Kit"
- One strange thing, though: when I search for that row in the .csv file from PowerShell, it returns the row, but I don't see these weird characters there.
You can use the .NET regular expression functions, for example Regex.Replace:
Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
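As a minimal sketch of that call in isolation, the console program below strips every character outside the ASCII range from a sample string. The names `AsciiCleaner` and `StripNonAscii` are hypothetical and chosen only for illustration; the sample value is the product name from the question, with the stray characters written as Unicode escapes so the source file's own encoding does not matter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiCleaner
{
    // Matches any single character outside U+0000..U+007F, i.e. anything non-ASCII.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    // Hypothetical helper: returns the input with all non-ASCII characters removed.
    public static string StripNonAscii(string value)
    {
        return value == null ? null : NonAscii.Replace(value, string.Empty);
    }

    public static void Main()
    {
        // \u00C3 = Ã, \u201A = ‚, \u00C2 = Â (the stray characters from the question).
        string dirty = "Hello Kitty Essential\u00C3\u201A\u00C2 Accessory\u00C3\u201A\u00C2 Kit";

        // Prints: Hello Kitty Essential Accessory Kit
        Console.WriteLine(StripNonAscii(dirty));
    }
}
```

Keep in mind that this pattern removes every non-ASCII character, so legitimate accented letters in the data would be deleted as well.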
As there is no support for regular expressions in SQL Server, you need to create a SQL CLR function. More information about .NET integration in SQL Server can be found here:
- String Utility Functions Sample - full working examples
- Stairway to SQLCLR - still in progress
- Introduction to SQL Server CLR Integration - official documentation
In your case:
Open Visual Studio and create a Class Library project. Then rename the class to StackOverflow and paste the following code into its file:

    using Microsoft.SqlServer.Server;
    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;

    public class StackOverflow
    {
        // Exposed to SQL Server as a scalar function. NULL arguments are treated as empty strings.
        [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "RegexReplace")]
        public static SqlString Replace(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement)
        {
            string input = (sqlInput.IsNull) ? string.Empty : sqlInput.Value;
            string pattern = (sqlPattern.IsNull) ? string.Empty : sqlPattern.Value;
            string replacement = (sqlReplacement.IsNull) ? string.Empty : sqlReplacement.Value;

            return new SqlString(Regex.Replace(input, pattern, replacement));
        }
    }
Now build the project. Open SQL Server Management Studio, select your database, and change the path in the following FROM clause to match the location of your StackOverflow.dll:

    CREATE ASSEMBLY [StackOverflow]
    FROM 'C:\Users\gotqn\Desktop\StackOverflow\StackOverflow\bin\Debug\StackOverflow.dll';
Finally, create the SQL CLR function:

    CREATE FUNCTION [dbo].[StackOverflowRegexReplace] (@input NVARCHAR(MAX), @pattern NVARCHAR(MAX), @replacement NVARCHAR(MAX))
    RETURNS NVARCHAR(4000)
    AS EXTERNAL NAME [StackOverflow].[StackOverflow].[Replace]
    GO
You are now ready to use the RegexReplace .NET function directly in your T-SQL statements:

    SELECT [dbo].[StackOverflowRegexReplace] ('Hello Kitty EssentialÃ‚Â AccessoryÃ‚Â Kit', '[^\u0000-\u007F]', '')
    -- Hello Kitty Essential Accessory Kit
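Since the question also allows cleaning in C#, the same pattern could be applied to the feed before it is loaded into the database. The following is only a sketch under assumptions: the class `FeedCleaner`, its methods, and the idea of writing a cleaned copy of the file are hypothetical, and the caller has to supply the file paths and the encoding the feed actually uses.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class FeedCleaner
{
    // Same pattern as in the answer: any character outside the ASCII range.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]");

    // Hypothetical helper: lazily removes non-ASCII characters from each line.
    public static IEnumerable<string> CleanLines(IEnumerable<string> lines)
    {
        return lines.Select(line => NonAscii.Replace(line, string.Empty));
    }

    // Hypothetical helper: reads the feed line by line and writes a cleaned copy.
    // inputPath and outputPath must be different files, because the input is
    // still being read while the output is written.
    public static void CleanFile(string inputPath, string outputPath, Encoding encoding)
    {
        File.WriteAllLines(outputPath, CleanLines(File.ReadLines(inputPath, encoding)), encoding);
    }
}
```

Cleaning the file before import keeps the bad characters out of the table entirely, while the SQL CLR function remains useful for rows that are already stored.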