How to replace substrings in a column using a lookup table in SQL Server

Question

I have a table with two columns whose data contains code points such as <U+9876>. These code points need to be replaced with Japanese characters. I have a lookup table that maps each code point to its Japanese character. The problem is that, in both columns, a single row can contain multiple code points.

Main table:

Id    body                                      subject
1    <U+9876> Hi <U+1234>No <U+6543>           <U+9876> Hi <U+1234>No <U+6543>
2    <U+9826> <U+5678><U+FA32> data            <U+9006> <U+6502>

Lookup table:

char     value
<U+9876>  だ
<U+9826>  づ

I tried an UPDATE query that uses the LIKE operator in an inner join, but it takes a long time because the main table has 14K rows and the lookup table has 6K values.

Recommended answer

If performance really matters, you need to materialize the data in advance. This can be done by creating a separate table and either using a trigger or modifying the routine that populates the original table. If your records are not inserted or updated in batches, this will not harm CRUD execution time.

You can also easily write a short T-SQL statement that builds dynamic code to perform the 6K updates, so give that a try as well. Don't use LIKE or complex conditions, just a simple UPDATE-REPLACE statement for each lookup value.
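A minimal sketch of that dynamic approach follows. It assumes the data lives in permanent tables, hypothetically named dbo.Main and dbo.Lookup with the columns from the question (table variables are not visible inside dynamic SQL), and it assumes SQL Server 2017 or later for STRING_AGG. The CHARINDEX filter is an optional addition that keeps each UPDATE from rewriting rows that do not contain the code point.

```sql
-- Assumption: dbo.Main(id, body, subject) and dbo.Lookup([char], [value]) are
-- hypothetical permanent tables that mirror the question's layout.
DECLARE @Sql NVARCHAR(MAX);

SELECT @Sql = STRING_AGG(CAST(s.stmt AS NVARCHAR(MAX)), NCHAR(10))
FROM dbo.Lookup AS l
-- Double any single quotes so the values are safe inside string literals.
CROSS APPLY (SELECT REPLACE(l.[char],  N'''', N'''''') AS c,
                    REPLACE(l.[value], N'''', N'''''') AS v) AS e
-- One plain UPDATE-REPLACE statement per lookup row.
CROSS APPLY (SELECT N'UPDATE dbo.Main SET '
                  + N'[body] = REPLACE([body], N''' + e.c + N''', N''' + e.v + N'''), '
                  + N'[subject] = REPLACE([subject], N''' + e.c + N''', N''' + e.v + N''') '
                  + N'WHERE CHARINDEX(N''' + e.c + N''', [body]) > 0 '
                  + N'OR CHARINDEX(N''' + e.c + N''', [subject]) > 0;' AS stmt) AS s;

-- Run all generated statements in one batch.
EXEC sys.sp_executesql @Sql;
```

Printing or selecting @Sql before executing it is a cheap way to check the generated statements. On older versions, the same string can be built with FOR XML PATH instead of STRING_AGG.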

In some cases, I use SQL CLR functions for such replacements. For example:

DECLARE @Main TABLE
(
    [id] TINYINT
   ,[body] NVARCHAR(MAX)
   ,[subject] NVARCHAR(MAX)
);

DECLARE @Lookup TABLE
(
    [id] TINYINT -- you can use row_number to order
   ,[char] NVARCHAR(32)
   ,[value] NVARCHAR(32)
);

INSERT INTO @Main ([id], [body], [subject])
VALUES (1, '<U+9876> Hi <U+1234>No <U+6543>', '<U+9876> Hi <U+1234>No <U+6543>')
      ,(2, '<U+9826> <U+5678><U+FA32> data', '<U+9006> <U+6502>');

INSERT INTO @Lookup ([id], [char], [value])
VALUES (1, '<U+9876>', N'だ')
      ,(2, '<U+9826>', N'づ');

DECLARE @Pattern NVARCHAR(MAX)
       ,@Replacement NVARCHAR(MAX);

SELECT @Pattern = [dbo].[ConcatenateWithOrderAndDelimiter] ([id], [char], '|')
      ,@Replacement = [dbo].[ConcatenateWithOrderAndDelimiter] ([id], [value], '|')
FROM @Lookup;


UPDATE @Main
SET [body] = [dbo].[fn_Utils_ReplaceStrings] ([body], @Pattern, @Replacement, '|')
   ,[subject] = [dbo].[fn_Utils_ReplaceStrings] ([subject], @Pattern, @Replacement, '|');

SELECT [id]
      ,[body]
      ,[subject]
FROM @Main;

The code behind these functions is shown below, but it is only an idea. You are free to implement something of your own that satisfies your performance requirements.

SQL CLR functions have to be compiled into an assembly and registered in the database before they can be called. Here is a variant of an aggregate function that concatenates values in a given order:

using System;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using System.Linq;

/// <summary>
/// Concatenates text values using the given delimiter, ordered by the supplied position number.
/// </summary>
[Serializable]
[
    Microsoft.SqlServer.Server.SqlUserDefinedAggregate
    (
        Microsoft.SqlServer.Server.Format.UserDefined,
        IsInvariantToNulls = true,
        IsInvariantToDuplicates = false,
        IsInvariantToOrder = false,
        IsNullIfEmpty = false,
        MaxByteSize = -1
    )
]
public class ConcatenateWithOrderAndDelimiter : Microsoft.SqlServer.Server.IBinarySerialize
{
    private List<Tuple<int, string>> intermediateResult;
    private string delimiter;
    private bool isDelimiterNotDefined;

    public void Init()
    {
        this.delimiter = ",";
        this.isDelimiterNotDefined = true;
        this.intermediateResult = new List<Tuple<int, string>>();
    }

    public void Accumulate(SqlInt32 position, SqlString text, SqlString delimiter)
    {
        if (this.isDelimiterNotDefined)
        {
            this.delimiter = delimiter.IsNull ? "," : delimiter.Value;
            this.isDelimiterNotDefined = false;
        }

        if (!(position.IsNull || text.IsNull))
        {
            this.intermediateResult.Add(new Tuple<int, string>(position.Value, text.Value));
        }
    }

    public void Merge(ConcatenateWithOrderAndDelimiter other)
    {
        this.intermediateResult.AddRange(other.intermediateResult);
    }

    public SqlString Terminate()
    {
        this.intermediateResult.Sort();
        return new SqlString(String.Join(this.delimiter, this.intermediateResult.Select(tuple => tuple.Item2)));
    }

    public void Read(BinaryReader r)
    {
        if (r == null) throw new ArgumentNullException("r");

        int count = r.ReadInt32();
        this.intermediateResult = new List<Tuple<int, string>>(count);

        for (int i = 0; i < count; i++)
        {
            this.intermediateResult.Add(new Tuple<int, string>(r.ReadInt32(), r.ReadString()));
        }

        this.delimiter = r.ReadString();
    }

    public void Write(BinaryWriter w)
    {
        if (w == null) throw new ArgumentNullException("w");

        w.Write(this.intermediateResult.Count);
        foreach (Tuple<int, string> record in this.intermediateResult)
        {
            w.Write(record.Item1);
            w.Write(record.Item2);
        }
        w.Write(this.delimiter);
    }
}

Here is one variant of the function that performs the replacement:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlString ReplaceStrings( SqlString input, SqlString pattern, SqlString replacement, SqlString separator ){
    string output = null;
    if(
        input.IsNull == false
        && pattern.IsNull == false
        && replacement.IsNull == false
    ){
        StringBuilder tempBuilder = new StringBuilder( input.Value );

        if( separator.IsNull || String.IsNullOrEmpty( separator.Value ) ){
            tempBuilder.Replace( pattern.Value, replacement.Value );
        }
        else{
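            // Both lists must have exactly the same number of elements.
            string[] vals = pattern.Value.Split( new[]{separator.Value}, StringSplitOptions.None ),
                newVals = replacement.Value.Split( new[]{separator.Value}, StringSplitOptions.None );
            if( vals.Length != newVals.Length ){
                throw new ArgumentException( "pattern and replacement must contain the same number of elements" );
            }

            for( int index = 0, count = vals.Length; index < count; index++ ){
                // StringBuilder.Replace throws on an empty search string, so skip empty patterns.
                if( vals[ index ].Length > 0 ){
                    tempBuilder.Replace( vals[ index ], newVals[ index ] );
                }
            }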
        }

        output = tempBuilder.ToString();
    }

    return output;
}
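The usage example earlier calls these routines as [dbo].[ConcatenateWithOrderAndDelimiter] and [dbo].[fn_Utils_ReplaceStrings], so they have to be registered under those names. A rough sketch of that registration is below. The DLL path, the assembly name StringUtils, and the containing class name UserDefinedFunctions are placeholders, not taken from the answer. On SQL Server 2017 and later, the "clr strict security" setting may additionally require the assembly to be signed or explicitly trusted.

```sql
-- Assumption: the C# code was compiled to C:\clr\StringUtils.dll (placeholder path)
-- and ReplaceStrings lives in a class named UserDefinedFunctions (placeholder name).
EXEC sys.sp_configure N'clr enabled', 1;
RECONFIGURE;
GO

CREATE ASSEMBLY [StringUtils]
FROM N'C:\clr\StringUtils.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Parameters follow the Accumulate(position, text, delimiter) signature.
CREATE AGGREGATE [dbo].[ConcatenateWithOrderAndDelimiter]
(
    @position INT
   ,@text NVARCHAR(MAX)
   ,@delimiter NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
EXTERNAL NAME [StringUtils].[ConcatenateWithOrderAndDelimiter];
GO

-- The T-SQL name differs from the C# method name; EXTERNAL NAME ties them together.
CREATE FUNCTION [dbo].[fn_Utils_ReplaceStrings]
(
    @input NVARCHAR(MAX)
   ,@pattern NVARCHAR(MAX)
   ,@replacement NVARCHAR(MAX)
   ,@separator NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME [StringUtils].[UserDefinedFunctions].[ReplaceStrings];
GO
```

GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement, so the script is meant to be run from one of those tools.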

Or this one, which uses a regular expression instead:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "RegexReplaceStrings")]
public static SqlString ReplaceStrings(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement, SqlString sqlSeparator)
{
    // A NULL input stays NULL; reading .Value on a NULL SqlString would throw.
    if (sqlInput.IsNull)
    {
        return SqlString.Null;
    }

    // If any other parameter is NULL, no replacement is performed at all.
    if (sqlPattern.IsNull || sqlReplacement.IsNull || sqlSeparator.IsNull)
    {
        return sqlInput;
    }

    string[] patterns = sqlPattern.Value.Split(new string[] { sqlSeparator.Value }, StringSplitOptions.None);
    string[] replacements = sqlReplacement.Value.Split(new string[] { sqlSeparator.Value }, StringSplitOptions.None);

    var map = new Dictionary<string, string>();

    // Every pattern gets an entry in the map. If there is no corresponding value in the "replacements"
    // array, the pattern maps to itself, so a match on it leaves the input unchanged.
    for (int index = 0; index < patterns.Length; index++)
    {
        map[patterns[index]] = index < replacements.Length ? replacements[index] : patterns[index];
    }

    // Build one alternation from all non-empty patterns, longest first so that a longer pattern wins
    // over its own prefix. Regex.Escape makes metacharacters such as the "+" in "<U+9876>" literal.
    string alternation = String.Join("|", patterns
        .Where(pattern => pattern.Length > 0)
        .OrderByDescending(pattern => pattern.Length)
        .Select(pattern => Regex.Escape(pattern))
        .ToArray());

    if (alternation.Length == 0)
    {
        return sqlInput;
    }

    string returnValue = Regex.Replace(sqlInput.Value, alternation, match =>
    {
        string currentValue;

        return map.TryGetValue(match.Value, out currentValue) ? currentValue : match.Value;
    });

    return new SqlString(returnValue);
}
