Microsoft SQL 2005中的自然(人类字母数字)排序 [英] Natural (human alpha-numeric) sort in Microsoft SQL 2005

查看:73
本文介绍了Microsoft SQL 2005中的自然(人类字母数字)排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个大型数据库,在数据库上可以进行DB端分页.这很快,只需几秒钟即可返回数百万条记录中的50行的页面.

We have a large database on which we have DB side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.

用户可以定义自己的排序方式,基本上选择要作为排序依据的列.列是动态的-一些具有数字值,一些日期和一些文本.

Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.

大多数按预期方式排序的文本均以愚蠢的方式排序.好吧,我说这很愚蠢,对计算机来说很有意义,但会让用户感到沮丧.

While most sort as expected text sorts in a dumb way. Well, I say dumb, it makes sense to computers, but frustrates users.

例如,按字符串记录ID排序将显示类似以下内容:

For instance, sorting by a string record id gives something like:

rec1
rec10
rec14
rec2
rec20
rec3
rec4

...等等.

我希望它考虑到这个数字,所以:

I want this to take account of the number, so:

rec1
rec2
rec3
rec4
rec10
rec14
rec20

我无法控制输入(否则我只能以前导000的格式进行格式化),并且我不能依赖单一格式-有些内容例如"{alpha code}-{dept code}-{rec id }".

I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".

我知道在C#中执行此操作的几种方法,但是不能拉下所有记录来对它们进行排序,因为那样会很慢.

I know a few ways to do this in C#, but can't pull down all the records to sort them, as that would be to slow.

有人知道一种在SQL Server中快速应用自然排序的方法吗?

Does anyone know a way to quickly apply a natural sort in Sql server?

我们正在使用:

ROW_NUMBER() over (order by {field name} asc)

然后我们按此进行分页.

And then we're paging by that.

我们可以添加触发器,尽管不能.他们所有的输入都是经过参数设置的,但我无法更改格式-如果将它们分别放入"rec2"和"rec10",则它们会以自然顺序返回.

We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.

我们有有效的用户输入,这些输入针对不同的客户端采用不同的格式.

We have valid user input that follows different formats for different clients.

一个人可能会去rec1,rec2,rec3,... rec100,rec101

One might go rec1, rec2, rec3, ... rec100, rec101

另一个可能会出现:grp1rec1,grp1rec2,... grp20rec300,grp20rec301

While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301

当我说我们无法控制输入时,我的意思是我们不能强迫用户更改这些标准-它们具有类似grp1rec1的值,并且我无法将其重新格式化为grp01rec001,因为那将改变所使用的内容用于查找并链接到外部系统.

When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.

这些格式差异很大,但通常是字母和数字的混合.

These formats vary a lot, but are often mixtures of letters and numbers.

在C#中对它们进行排序很容易-只需将其分解为{ "grp", 20, "rec", 301 },然后依次比较序列值即可.

Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.

但是,可能有数百万条记录并且分页了数据,我需要在SQL Server上进行排序.

However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.

SQL服务器按值排序,而不是比较-在C#中,我可以将值拆分出来进行比较,但是在SQL中,我需要一些逻辑(非常快速地)获得一个始终排序的单个值的逻辑.

SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.

@moebius-您的答案可能有用,但是为所有这些文本值添加一个排序键确实让人感到不妥.

@moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.

推荐答案

我看到的大多数基于SQL的解决方案都在数据变得足够复杂(例如其中一个或两个以上的数字)时中断.最初,我尝试在T-SQL中实现满足我的要求的NaturalSort函数(除了其他功能,还可以处理字符串中任意数量的数字),但是性能 way 太慢了.

Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.

最终,我用C#写了一个标量CLR函数,以实现自然排序,即使使用未经优化的代码,从SQL Server调用它的性能也令人眼花fast乱.具有以下特点:

Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:

  • 将正确排序前1,000个左右的字符(可以轻松地在代码中修改或设置为参数)
  • 正确地对小数进行排序,因此123.333排在123.45之前
  • 由于上述原因,可能无法正确排序IP地址之类的内容;如果您希望其他行为,请修改代码
  • 支持对其中包含任意数字的字符串进行排序
  • 将正确地对不超过25位数字的数字进行排序(可以轻松地在代码中修改或设置为参数)

代码在这里:

using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public class UDF
{
    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
    public static SqlString Naturalize(string val)
    {
        if (String.IsNullOrEmpty(val))
            return val;

        while(val.Contains("  "))
            val = val.Replace("  ", " ");

        const int maxLength = 1000;
        const int padLength = 25;

        bool inNumber = false;
        bool isDecimal = false;
        int numStart = 0;
        int numLength = 0;
        int length = val.Length < maxLength ? val.Length : maxLength;

        //TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
        var sb = new StringBuilder();
        for (var i = 0; i < length; i++)
        {
            int charCode = (int)val[i];
            if (charCode >= 48 && charCode <= 57)
            {
                if (!inNumber)
                {
                    numStart = i;
                    numLength = 1;
                    inNumber = true;
                    continue;
                }
                numLength++;
                continue;
            }
            if (inNumber)
            {
                sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
                inNumber = false;
            }
            isDecimal = (charCode == 46);
            sb.Append(val[i]);
        }
        if (inNumber)
            sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));

        var ret = sb.ToString();
        if (ret.Length > maxLength)
            return ret.Substring(0, maxLength);

        return ret;
    }

    static string PadNumber(string num, bool isDecimal, int padLength)
    {
        return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
    }
}

要注册它以便可以从SQL Server调用它,请在查询分析器中运行以下命令:

To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:

CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(@val as nvarchar(max)) RETURNS nvarchar(1000) 
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go

然后,您可以像这样使用它:

Then, you can use it like so:

select *
from MyTable
order by dbo.Naturalize(MyTextField)

注意:如果您在SQL Server中遇到错误,请执行 .NET Framework中的用户代码执行被禁用.启用启用了clr"配置选项.,请按照此处以启用它.确保这样做之前先考虑安全隐患.如果您不是数据库管理员,请确保在与服务器配置进行任何更改之前与管理员进行讨论.

Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.

注意2 :此代码不正确支持国际化(例如,假定小数点标记为.",未针对速度进行优化等).欢迎提出改进建议!

Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!

编辑:由于该函数未进行任何实际排序,因此将该函数重命名为 Naturalize 而不是 NaturalSort .

Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.

这篇关于Microsoft SQL 2005中的自然(人类字母数字)排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆