Fastest way to remove non-numeric characters from a VARCHAR in SQL Server


Question

I'm writing an import utility that uses phone numbers as a unique key within the import.

I need to check that the phone number does not already exist in my DB. The problem is that phone numbers in the DB could contain dashes, parentheses, and possibly other characters. I wrote a function to remove these, but it is slow, and with thousands of records in my DB and thousands of records to import at once, the process can be unacceptably slow. I've already put an index on the phone number column.

I tried using the script from this post:
T-SQL trim &nbsp (and other non-alphanumeric characters)

But that did not speed things up.

Is there a faster way to remove non-numeric characters? I need something that performs well when 10,000 to 100,000 records have to be compared.

Whatever the approach, it needs to be fast.

Update
Given the responses, I think I'm going to have to clean the fields before I run the import utility.

To answer the question of what I'm writing the import utility in: it is a C# app. I'm now comparing BIGINT to BIGINT, with no need to alter DB data, and I'm still taking a performance hit with a very small set of data (about 2000 records).

Could comparing BIGINT to BIGINT be slowing things down?

I've optimized the code side of my app as much as I can (removed regexes, removed unnecessary DB calls). Although I can no longer isolate SQL as the source of the problem, I still feel like it is.

Recommended answer

I may misunderstand, but you have two sets of data to strip characters from: the current data in the database, and then a new set each time you import.

For updating the existing records, I would just use SQL, since that only has to happen once.

However, SQL isn't optimized for this sort of string manipulation. Since you said you are writing an import utility, I would do the cleanup of the incoming data in the import utility itself, not in SQL. That should perform much better. What are you writing the utility in?
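Since the question's update says the utility is a C# app with regexes already removed, the cleanup on the import side could look roughly like the sketch below. The class name `PhoneNormalizer` and the choice to treat only ASCII `0`-`9` as digits are assumptions for illustration, not something the original poster showed.

```csharp
using System;
using System.Text;

public static class PhoneNormalizer
{
    // Hypothetical helper: keeps only ASCII digits 0-9 and drops everything else.
    public static string StripNonNumeric(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // Pre-size the builder; the result can never be longer than the input.
        var digits = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Explicit range check: char.IsDigit would also accept non-ASCII decimal digits.
            if (c >= '0' && c <= '9')
            {
                digits.Append(c);
            }
        }
        return digits.ToString();
    }
}
```

For example, `PhoneNormalizer.StripNonNumeric("(555) 123-4567")` returns `"5551234567"`. A single pass over the characters avoids regex overhead, which matters mostly when the call sits in a tight loop over the whole import file.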

Also, I may be completely misunderstanding the process, so I apologize if this is off-base.


For the initial update, if you are using SQL Server 2005 or later, you could try a CLR function. Here's a quick one using a regex. I'm not sure how the performance would compare; I've never used this myself beyond a quick test just now.

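using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Built once and reused for every row instead of being recreated per call.
    // Note: in .NET, \D treats any Unicode decimal digit as a digit, not just 0-9.
    private static readonly Regex NonDigit = new Regex(@"\D");

    // Returns the input with every non-digit character removed; NULL stays NULL.
    [SqlFunction]
    public static SqlString StripNonNumeric(SqlString input)
    {
        if (input.IsNull)
        {
            return SqlString.Null; // input.Value would throw on a NULL column value
        }
        return new SqlString(NonDigit.Replace(input.Value, ""));
    }
}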

After this is deployed, you could run the update with:

UPDATE [table] SET phoneNumber = dbo.StripNonNumeric(phoneNumber)
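Once the stored numbers are digits only, the existence check itself can also move into the C# utility. The sketch below is one possible approach, not something from the thread: it assumes the existing numbers have been read from the database once and parsed to `long`, and the names `ImportDeduplicator` and `FindNewNumbers` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ImportDeduplicator
{
    // Hypothetical helper: returns the incoming numbers not already present in existingNumbers.
    // Both inputs are assumed to be digits-only phone numbers already parsed to long.
    public static List<long> FindNewNumbers(
        IEnumerable<long> existingNumbers,
        IEnumerable<long> incomingNumbers)
    {
        // One hash set built up front replaces a database lookup per imported row.
        var seen = new HashSet<long>(existingNumbers);
        var newNumbers = new List<long>();

        foreach (long number in incomingNumbers)
        {
            // Add returns false when the value is already present, which also
            // filters duplicates that occur within the import file itself.
            if (seen.Add(number))
            {
                newNumbers.Add(number);
            }
        }
        return newNumbers;
    }
}
```

With a few thousand to a hundred thousand numbers, the set fits comfortably in memory, and each lookup is a constant-time hash probe rather than a round trip to the server.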
