基数排序对字符串数组? [英] Radix Sort on an Array of Strings?

查看:168
本文介绍了基数排序对字符串数组?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在研究四周,虽然我已经想通了,使用基数排序按字母顺序排列的字符串数组的总体思路,我知道我走错方向。

I've been researching around, and while I've figured out the general idea of using Radix Sort to alphabetize an array of strings, I know I'm going the wrong direction.

这是我迄今为止:

void radixSort(string* sortMe, int l)
{
    queue<string>* sections = new queue<string>[27];    //Have a-z, and also one for strings that are null terminated.
    for(int i = 0; i < numElements; i++)
    {
        if(!(sortMe[i][l] == 32))
            sections[sortMe[i][l]-96].push(sortMe[i]);      //-96 because the ascii code for a is 97. If, for example a is the character, it will be placed at 1. 0 is left for null characters
    }

    for(int i =0; i < 26; i++)
    {
        while(!sections[i].empty())
        {
            temp.push_back(sections[i].front());
            sections[i].pop();
        }
    }
}

我迄今所进行排序的第一个字符的所有字符串,并且我知道我然后要经过,使剩余的字符子阵列和排序他们,但我怎么能高效地实现它?该字符串是可变大小的,可以包含空格,例如:

What I have so far sorts all the strings by the first character, and I know that I then have to go through and make subarrays of the remaining characters and sort them, but how can I implement it efficiently? The strings are of variable size and can include spaces, for example:


  • 细分

  • 主要街道

  • 裤子

  • 刺穿非殖民化

  • 泥质

  • 轴向satisfactoriness

  • 气质

  • 过敏


  • hairbreadths

  • 乳霜潮

  • unlaboured

  • 印第安纳州

  • buggiest

  • 毛里塔尼亚

  • emanators

  • 赞歌

  • zouaves迪什潘

  • 拖曳

  • solarisms

  • remunerativeness




  • ooziness

  • toastier

  • 波特

  • 后缀

  • 无能为力堤顶

  • disassimilated

  • 喘气

  • flirtier


这是我发现,好像这将是使用:
http://algs4.cs.princeton.edu/lectures/51DemoKeyIndexedCounting.pdf

This is something I found that seems like it will be of use: http://algs4.cs.princeton.edu/lectures/51DemoKeyIndexedCounting.pdf

推荐答案

您已经找到了幻灯片是伟大的!但是,在没有这些队列来自于你的code?

The slides you've found are great! But where did those queues come from in your code?

反正你在这里(活生生的例子):

template <typename E>
size_t bin(const E& elem, size_t digit)
{
    return elem.size() > digit ? size_t(elem[digit]) + 1 : 0;
}

template <size_t R, typename C, typename P>
void radix_sort(P& pos, const C& data, size_t digit)
{
    using A = std::array<size_t, R + 1>;
    A count = {};
    P prev(pos);

    for (auto i : prev)
        ++count[bin(data[i], digit)];

    A done = {}, offset = {{0}};
    std::partial_sum(count.begin(), count.end() - 1, offset.begin() + 1);

    for (auto i : prev)
    {
        size_t b = bin(data[i], digit);
        pos[offset[b] + done[b]++] = i;
    }
}

struct shorter
{
    template <typename A>
    bool operator()(const A& a, const A& b) { return a.size() < b.size(); }
};

template <size_t R, typename C>
std::vector<size_t> radix_sort(const C& data)
{
    std::vector<size_t> pos(data.size());
    std::iota(pos.begin(), pos.end(), 0);

    size_t width = std::max_element(data.begin(), data.end(), shorter())->size();

    for (long digit = long(width) - 1; digit >= 0; --digit)
        radix_sort<R>(pos, data, size_t(digit));

    return pos;
}

您可以使用像

int main()
{
    std::vector<std::string> data = generate();
    std::vector<size_t> pos = radix_sort<128>(data);
    for (auto i : pos)
        std::cout << data[i] << std::endl;
}

其中,生成()包含在活生生的例子,并产生在你的问题中给出的字符串。

where generate() is included in the live example and generates the strings given in your question.

我并不是试图解释这是如何工作在这里,我想你可以找出因为你对这个问题的工作。但也有少数评论是为了。

I am not trying to explain how this works here, I assume you can figure out since you are working on the problem. But a few comments are in order.


  • 我们既不排序就地输入序列,也没有返回一个排序的复印件;我们只是返回排序序列中输入元素的位置的序列。

我们正在从右到左处理字符串。

We are processing strings from right to left.

的复杂性是 O(LW),其中是输入长度(输入串号)和是W 是最大输入宽度(最大所有输入字符串的长度)。所以这个算法是有道理的,如果该字符串的宽度不会变化太多。

The complexity is O(lw) where l is the input length (number of input strings) and w is the maximum input width (max. length of all input strings). So this algorithm makes sense if the string width does not vary too much.

第一个模板参数研究 radix_sort()的可能值的每一位数(字母)输入。例如。 R = 128 意味着可能的值是 0..127 。这应该是罚款您的输入。我还没有尝试做任何巧妙相对于ASCII codeS,但你可以自定义功能斌()

The first template parameter R of radix_sort() is the number of possible values for each digit (letter) in the input. E.g. R = 128 means that possible values are 0..127. This should be fine for your input. I haven't tried to do anything clever with respect to ASCII codes, but you can customize function bin() for that.

的出纸槽(),值 0 被保留的意思是我们过去这串的结束。这样的字符串放在其他人仍然再继续。

In the output of bin(), value 0 is reserved to mean "we are past the end of this string". Such strings are placed before others that are still continuing.

我试图给不言自明的名称变量和函数,并使用常见任务的标准库调用在可能的情况。

I have tried to give self-explanatory names to variables and functions, and use standard library calls for common tasks where possible.

在code是通用的,如它可以任意排序随机访问容器包含随机存取容器,不只是字符串的向量。

The code is generic, e.g. it can sort any random access container containing random access containers, not just vectors of strings.

我使用C ++ 11的特性在这里和那里的便利,但没有什么是真的有必要:一是可以很容易地做同样的只是C ++ 03

I am using C++11 features here and there for convenience, but nothing is really necessary: one could easily do the same just with C++03.

这篇关于基数排序对字符串数组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆