返回一个在两个给定字符串之间排序的新字符串 [英] Return a new string that sorts between two given strings

查看:89
本文介绍了返回一个在两个给定字符串之间排序的新字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出两个字符串a和b,其中a在字典上是< b,我想返回字符串c,使得< c < b。用例是在按此类键排序的数据库中插入节点。您可以根据需要指定a,b和c的格式,只要可以在插入时生成初始值和新值即可。

Given two strings a and b, where a is lexicographically < b, I'd like to return a string c such that a < c < b. The use case is inserting a node in a database sorted by such keys. You can specify the format for a, b, and c if you like, as long as it is possible to generate initial values as well as new values on insert.

推荐答案

最小化字符串长度

如果要使字符串长度最小,可以创建一个按字典顺序位于左右字符串中间的字符串,这样就有空间插入其他字符串,而只能创建更长的字符串如果绝对必要,则为字符串。

If you want to keep the string lengths to a minimum, you could create a string that is lexicographically halfway between the left and right strings, so that there is room to insert additional strings, and only create a longer string if absolutely necessary.

我将假设使用字母[a-z]和字典顺序,其中在'a'之前有一个空格,例如 先于 abc。

I will assume an alphabet [a-z], and a lexicographical ordering where an empty space comes before 'a', so that e.g. "ab" comes before "abc".

基本情况

从字符串的开头复制字符开始,直到遇到第一个差异,可能是两个不同的字符,或者是左字符串的结尾:

You start by copying the characters from the beginning of the strings, until you encounter the first difference, which could be either two different characters, or the end of the left string:

abcde ~ abchi  ->  abc  +  d ~ h  
abc   ~ abchi  ->  abc  +  _ ~ h  

然后通过添加字母中间的字符来创建新字符串在左字符(或字母的开头)和右字符之间:

The new string is then created by appending the character that is halfway in the alphabet between the left character (or the beginning of the alphabet) and the right character:

abcde ~ abchi  ->  abc  +  d ~ h  ->  abcf  
abc   ~ abchi  ->  abc  +  _ ~ h  ->  abcd  

连续字符

如果两个不同的字符在字典上是连续的,请首先复制左侧的字符,然后在左侧字符串中的下一个字符与字母结尾之间的中间添加该字符:

If the two different characters are lexicographically consecutive, first copy the left character, and then append the character halfway between the next character from the left string and the end of the alphabet:

abhs ~ abit  ->  ab  +  h ~ i  ->  abh  +  s ~ _  ->  abhw
abh  ~ abit  ->  ab  +  h ~ i  ->  abh  +  _ ~ _  ->  abhn

如果左字符串中的下一个字符是一个或多个z,则将其复制并在第一个非z字符和字母结尾之间添加字符:

If the next character(s) in the left string are one or more z's, then copy them and append the character halfway between the first non-z character and the end of the alphabet:

abhz   ~ abit  ->  ab  +  h ~ i  ->  abh  +  z ~ _  ->  abhz  +  _ ~ _  ->  abhzn  
abhzs  ~ abit  ->  ab  +  h ~ i  ->  abh  +  z ~ _  ->  abhz  +  s ~ _  ->  abhzw  
abhzz  ~ abit  ->  ab  +  h ~ i  ->  abh  +  z ~ _  ->  ... ->  abhzz  +  _ ~ _  ->  abhzzn

右字符是a或b

永远不要通过在左边的字符串后面添加'a'来创建字符串,因为那样会创建两个按字典顺序连续的字符串,在这两个字符串之间不能再添加任何字符串。解决方案是始终在字母开头和右字符串中的下一个字符之间添加一个附加字符:

You should never create a string by appending an 'a' to the left string, because that would create two lexicographically consecutive strings, inbetween which no further strings could be added. The solution is to always append an additional character, halfway inbetween the beginning of the alphabet and the next character from the right string:

abc  ~ abcah   ->  abc  +  _ ~ a  ->  abca  +  _ ~ h  ->  abcad  
abc  ~ abcab   ->  abc  +  _ ~ a  ->  abca  +  _ ~ b  ->  abcaa  +  _ ~ _  ->  abcaan  
abc  ~ abcaah  ->  abc  +  _ ~ a  ->  abca  +  _ ~ a  ->  abcaa  +  _ ~ h  ->  abcaad  
abc  ~ abcb    ->  abc  +  _ ~ b  ->  abca  +  _ ~ _  ->  abcan

代码示例

下面的代码片段演示了该方法。这有点奇怪,因为JavaScript,但实际上并不复杂。要生成第一个字符串,请使用两个空字符串调用该函数。这将生成字符串 n。要在最左边的字符串之前或之后插入一个字符串,请使用该字符串和一个空字符串调用该函数。

Below is a code snippet which demonstrates the method. It's a bit fiddly because JavaScript, but not actually complicated. To generate a first string, call the function with two empty strings; this will generate the string "n". To insert a string before the leftmost or after the rightmost string, call the function with that string and an empty string.

function midString(prev, next) {
    var p, n, pos, str;
    for (pos = 0; p == n; pos++) {               // find leftmost non-matching character
        p = pos < prev.length ? prev.charCodeAt(pos) : 96;
        n = pos < next.length ? next.charCodeAt(pos) : 123;
    }
    str = prev.slice(0, pos - 1);                // copy identical part of string
    if (p == 96) {                               // prev string equals beginning of next
        while (n == 97) {                        // next character is 'a'
            n = pos < next.length ? next.charCodeAt(pos++) : 123;  // get char from next
            str += 'a';                          // insert an 'a' to match the 'a'
        }
        if (n == 98) {                           // next character is 'b'
            str += 'a';                          // insert an 'a' to match the 'b'
            n = 123;                             // set to end of alphabet
        }
    }
    else if (p + 1 == n) {                       // found consecutive characters
        str += String.fromCharCode(p);           // insert character from prev
        n = 123;                                 // set to end of alphabet
        while ((p = pos < prev.length ? prev.charCodeAt(pos++) : 96) == 122) {  // p='z'
            str += 'z';                          // insert 'z' to match 'z'
        }
    }
    return str + String.fromCharCode(Math.ceil((p + n) / 2)); // append middle character
}

var strings = ["", ""];
while (strings.length < 100) {
    var rnd = Math.floor(Math.random() * (strings.length - 1));
    strings.splice(rnd + 1, 0, midString(strings[rnd], strings[rnd + 1]));
    document.write(strings + "<br>");
}

下面是对C的直接转换。用空的以null终止的字符串调用该函数以生成第一个字符串,或者在最左边或最右边的字符串之后插入。字符串缓冲区 buf 应该足够大以容纳一个额外的字符。

Below is a straightforward translation into C. Call the function with empty null-terminated strings to generate the first string, or insert before the leftmost or after the rightmost string. The string buffer buf should be large enough to accomodate one extra character.

int midstring(const char *prev, const char *next, char *buf) {
    char p = 0, n = 0;
    int len = 0;
    while (p == n) {                                           // copy identical part
        p = prev[len] ? prev[len] : 'a' - 1;
        n = next[len] ? next[len] : 'z' + 1;
        if (p == n) buf[len++] = p;
    }
    if (p == 'a' - 1) {                                        // end of left string
        while (n == 'a') {                                     // handle a's
            buf[len++] = 'a';
            n = next[len] ? next[len] : 'z' + 1;
        }
        if (n == 'b') {                                        // handle b
            buf[len++] = 'a';
            n = 'z' + 1;
        }
    }
    else if (p + 1 == n) {                                     // consecutive characters
        n = 'z' + 1;
        buf[len++] = p;
        while ((p = prev[len] ? prev[len] : 'a' - 1) == 'z') { // handle z's
            buf[len++] = 'z';
        }
    }
    buf[len++] = n - (n - p) / 2;                              // append middle character
    buf[len] = '\0';
    return len;
}

平均字符串长度

最好的情况是按随机顺序插入元素。实际上,以伪随机顺序生成65,536个字符串时,平均字符串长度约为4.74个字符(理论上的最小值(在移至更长​​的字符串之前使用所有组合,该最小值为3.71))。

The best case is when the elements are inserted in random order. In practice, when generating 65,536 strings in pseudo-random order, the average string length is around 4.74 characters (the theoretical minimum, using every combination before moving to longer strings, would be 3.71).

最坏的情况是按顺序插入元素,并始终生成新的最右边或最左边的字符串;

The worst case is when inserting the elements in order, and always generating a new rightmost or leftmost string; this will lead to a recurring pattern:

n, u, x, z, zn, zu, zx, zz, zzn, zzu, zzx, zzz, zzzn, zzzu, zzzx, zzzz...  
n, g, d, b, an, ag, ad, ab, aan, aag, aad, aab, aaan, aaag, aaad, aaab...  

在每第四个字符串后添加一个额外的字符。

with an extra character being added after every fourth string.

如果您有要为其生成密钥的现有有序列表,请使用算法生成按字典顺序排列的等距密钥类似于下面的代码,然后在插入新元素时使用上述算法生成新密钥。

If you have an existing ordered list for which you want to generate keys, generate lexicographically equally-spaced keys with an algorithm like the one below, and then use the algorithm described above to generate a new key when inserting a new element.

该代码检查需要多少个字符,最低有效数字需要多少个不同的字符,然后在字母的两个选择之间切换以获取正确的数字键。例如。具有两个字符的键可以具有676个不同的值,因此,如果要求提供1600个键,则每个两个字符的组合需要增加1.37个额外的键,因此在每个两个字符的键之后再增加一个('n')或两个('j' ,'r')字符后缀,即: aan ab abj abr ac acn ad ad ae aej aer af afn ... (跳过首字母'aa')。

The code checks how many charactes are needed, how many different characters are needed for the least significant digit, and then switches between two selections from the alphabet to get the right number of keys. E.g. keys with two character can have 676 different values, so if you ask for 1600 keys, that is 1.37 extra keys per two-character combination, so after each two-character key an additional one ('n') or two ('j','r') characters are appended, i.e.: aan ab abj abr ac acn ad adn ae aej aer af afn ... (skipping the initial 'aa').

function seqString(num) {
    var chars = Math.floor(Math.log(num) / Math.log(26)) + 1;
    var prev = Math.pow(26, chars - 1);
    var ratio = chars > 1 ? (num + 1 - prev) / prev : num;
    var part = Math.floor(ratio);
    var alpha = [partialAlphabet(part), partialAlphabet(part + 1)];
    var leap_step = ratio % 1, leap_total = 0.5;
    var first = true;
    var strings = [];
    generateStrings(chars - 1, "");
    return strings;

    function generateStrings(full, str) {
        if (full) {
            for (var i = 0; i < 26; i++) {
                generateStrings(full - 1, str + String.fromCharCode(97 + i));
            }
        }
        else {
            if (!first) strings.push(stripTrailingAs(str));
            else first = false;
            var leap = Math.floor(leap_total += leap_step);
            leap_total %= 1;
            for (var i = 0; i < part + leap; i++) {
                strings.push(str + alpha[leap][i]);
            }
        }
    }
    function stripTrailingAs(str) {
        var last = str.length - 1;
        while (str.charAt(last) == 'a') --last;
        return str.slice(0, last + 1);
    }
    function partialAlphabet(num) {
        var magic = [0, 4096, 65792, 528416, 1081872, 2167048, 2376776, 4756004,
                     4794660, 5411476, 9775442, 11097386, 11184810, 22369621];
        var bits = num < 13 ? magic[num] : 33554431 - magic[25 - num];
        var chars = [];
        for (var i = 1; i < 26; i++, bits >>= 1) {
            if (bits & 1) chars.push(String.fromCharCode(97 + i));
        }
        return chars;
    }

}
document.write(seqString(1600).join(' '));

这篇关于返回一个在两个给定字符串之间排序的新字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆