模仿 C# 的 List<List<int>> 的 C 数据结构? [英] C data structure to mimic C#&#39;s List&lt;List&lt;int&gt;&gt;?

查看:23
本文介绍了模仿 C# 的 List<List<int>> 的 C 数据结构?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望将 c# 方法重构为 c 函数以尝试获得一些速度,然后在 c# 中调用 c dll 以允许我的程序使用该功能.

I am looking to refactor a c# method into a c function in an attempt to gain some speed, and then call the c dll in c# to allow my program to use the functionality.

目前,c# 方法接受一个整数列表并返回一个整数列表列表.该方法计算整数的幂集,因此 3 个整数的输入将产生以下输出(在此阶段,整数的值并不重要,因为它用作内部权重值)

Currently the c# method takes a list of integers and returns a list of lists of integers. The method calculated the power set of the integers so an input of 3 ints would produce the following output (at this stage the values of the ints is not important as it is used as an internal weighting value)

1
2
3
1,2
1,3
2,3
1,2,3

其中每一行代表一个整数列表.输出指示第一个列表的索引(偏移量为 1),而不是值.所以 1,2 表示索引 0 和 1 处的元素是幂集的元素.

Where each line represents a list of integers. The output indicates the index (with an offset of 1) of the first list, not the value. So 1,2 indicates that the element at index 0 and 1 are an element of the power set.

我不熟悉 c,那么对于允许 c# 访问返回数据的数据结构,我的最佳选择是什么?

I am unfamiliar with c, so what are my best options for data structures that will allow the c# to access the returned data?

提前致谢

更新

感谢大家到目前为止的评论.这是问题本质的一些背景.

Thank you all for your comments so far. Here is a bit of a background to the nature of the problem.

计算集合幂集的迭代方法相当简单.两个循环和一些位操作就是它真正的全部.它只是被调用...很多次(如果集合的大小足够大,实际上会调用数十亿次).

The iterative method for calculating the power set of a set is fairly straight forward. Two loops and a bit of bit manipulation is all there is to it really. It just get called..a lot (in fact billions of times if the size of the set is big enough).

我对使用 c(正如人们所指出的 c++)的看法是它为性能调优提供了更多空间.直接端口可能不会提供任何增加,但它为更复杂的方法开辟了道路,以提高速度.即使每次迭代的小幅增加也等同于可测量的增加.

My thoughs around using c (c++ as people have pointed out) are that it gives more scope for performance tuning. A direct port may not offer any increase, but it opens the way for more involved methods to get a bit more speed out of it. Even a small increase per iteration would equate to a measurable increase.

我的想法是移植一个直接版本,然后努力增加它.然后随着时间的推移重构它(在 SO 的每个人的帮助下).

My idea was to port a direct version and then work to increase it. And then refactor it over time (with help from everyone here at SO).

更新 2

jalf 的另一个公平点,我不必使用列表或等效项.如果有更好的方法,那么我愿意接受建议.列表的唯一原因是每组结果的大小不一样.

Another fair point from jalf, I dont have to use list or equivelent. If there is a better way then I am open to suggestions. The only reason for the list was that each set of results is not the same size.

到目前为止的代码...

public List<List<int>> powerset(List<int> currentGroupList)
{
    _currentGroupList = currentGroupList;
    int max;
    int count;

    //Count the objects in the group
    count = _currentGroupList.Count;
    max = (int)Math.Pow(2, count);

    //outer loop
    for (int i = 0; i < max; i++)
    {
        _currentSet = new List<int>();

        //inner loop
        for (int j = 0; j < count; j++)
        {              
            if ((i & (1 << j)) == 0)
            {
                _currentSetList.Add(_currentGroupList.ElementAt(j));                          
            }
        }
        outputList.Add(_currentSetList);
    }   
    return outputList;
}

如您所见,内容不多.它只是绕来绕去很多!

As you can see, not a lot to it. It just goes round and round a lot!

我承认创建和构建列表可能不是最有效的方式,但我需要某种方式以可管理的方式返回结果.

I accept that the creating and building of lists may not be the most efficient way, but I need some way of providing the results back in a manageable way.

更新 2

感谢所有投入和实施工作.只是为了澄清提出的几点:我不需要输出处于自然顺序",而且我对返回的空集也不感兴趣.

Thanks for all the input and implementation work. Just to clarify a couple of points raised: I dont need the output to be in 'natural order', and also I am not that interested in the empty set being returned.

hughdbrown 的实现正在测试中,但我认为我需要在某个时候存储结果(或至少是其中的一个子集).听起来内存限制将在运行时间成为真正问题之前很久就适用.部分是因为这个,我想我可以避免使用字节而不是整数,从而提供更多的潜在存储空间.

hughdbrown's implementation is intesting but i think that i will need to store the results (or at least a subset of them) at some point. It sounds like memory limitiations will apply long before running time becomes a real issue. Partly because of this, I think I can get away with using bytes instead of integers, giving more potential storage.

真正的问题是:我们在 C# 中是否达到了此计算的最大速度?非托管代码选项是否提供更多范围.我知道在很多方面答案都是徒劳的,因为即使我们有足够的时间运行,它也只会允许原始集合中的额外值.

The question really is then: Have we reached the maximum speed for this calcualtion in C#? Does the option of unmanaged code provide any more scope. I know in many respects the answer is futile, as even if we havled the time to run, it would only allow an extra values in the original set.

推荐答案

这一次返回一组 powerset.它基于此处的python代码.它适用于超过 32 个元素的 powerset.如果您需要少于 32 个,您可以将 long 更改为 int.它非常快——比我以前的算法快,比(我修改为使用收益返回版本的)P Daddy 的代码快.

This returns one set of a powerset at a time. It is based on python code here. It works for powersets of over 32 elements. If you need fewer than 32, you can change long to int. It is pretty fast -- faster than my previous algorithm and faster than (my modified to use yield return version of) P Daddy's code.

static class PowerSet4<T>
{
    static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
    {
        int count = currentGroupList.Length;
        Dictionary<long, T> powerToIndex = new Dictionary<long, T>();
        long mask = 1L;
        for (int i = 0; i < count; i++)
        {
            powerToIndex[mask] = currentGroupList[i];
            mask <<= 1;
        }

        Dictionary<long, T> result = new Dictionary<long, T>();
        yield return result.Values.ToArray();

        long max = 1L << count;
        for (long i = 1L; i < max; i++)
        {
            long key = i & -i;
            if (result.ContainsKey(key))
                result.Remove(key);
            else
                result[key] = powerToIndex[key];
            yield return result.Values.ToArray();
        }
    }
}

您可以在此处下载我测试过的所有最快版本.

You can download all the fastest versions I have tested here.

我真的认为使用收益回报是使计算大型幂集成为可能的变化.预先分配大量内存会显着增加运行时间,并导致算法很早就因内存不足而失败.原始海报应该弄清楚他一次需要多少套powerset.拥有 >24 个元素并不是真正的选择.

I really think that using yield return is the change that makes calculating large powersets possible. Allocating large amounts of memory upfront increases runtime dramatically and causes algorithms to fail for lack of memory very early on. Original Poster should figure out how many sets of a powerset he needs at once. Holding all of them is not really an option with >24 elements.

这篇关于模仿 C# 的 List<List<int>> 的 C 数据结构?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆