数组速度或... [英] Array for speed or...

查看:79
本文介绍了数组速度或...的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个实质性的(数字,长整数)数据随机流(来自几个SCADA来源的
)需要通过一个函数传递

tallies每个数字中出现几种不同的位模式

(实际上是32位行,其中一些(最左边)是伪造的)。


速度执行在优先级列表上很高。


我想到了这些方面的解决方案,避免相对慢的

开关/案例或几个if / elseif组合:


如果数据很短(或整数)我就能做到这一点:


#include< stdio.h>

#define BOGUS_BIT_MASK SHORT_MAX& (SHORTMAX> 3)

//假设我们使用短的3个伪位


#define BATCH_SIZE 1024


typedef unsigned long long ull;

typedef FILE * fileptr;


ull frequency [SHORT_MAX];

int batch [BATCH_SIZE];

fileptr stream;


void tally(const ull batch [],const int batch_size,int freq []);

void zero_freq(ull freq);

int getdata(int batch [],const int BATCH_SIZE,const fileptr fp);


int main (void)

{

//将频率数组中的所有单元格归零(省略)

while(getdata(batch,BATCH_SIZE,stream)) {

//这里省略了函数定义

计数(批处理,BATCH_SIZE,频率);

}

process_results (频率);

//这里省略了函数定义

返回0;

}


void tally(const ull batch [],const int batch_size,int freq []);

{

int i;

for(i = 0; i< batch_size; i ++){

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_01] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_02] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_03] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_04] ++;

//等等。实际上bitpatterns将存储在一个数组中

//并且每次运行循环时都会应用伪造掩码。

}

}


不幸的是,int32数据使用了大部分空的所需数组

大小(最大位模式值非常接近2 ** 29)

挑战,如果不是不可能在给定的机器/操作系统上。


总计,它可能会使用几百个图案,但是我不能保证它不会在以后扩展。


我很快放弃了if / then方法,因为很明显它会导致一个凌乱,缓慢,难以调试/调整的意大利面。


有没有人对巨大阵列有更好的想法?挤出阵列中所有未使用过的单元格是否容易(以b / b $ b $维护)方式?有些

可能是哪种映射?我试图找出一个列表方法,但是

或多或少搁浅了。此外,我看不到一种简单的方法来以类似于数组方法的方式定位列表中的

右链。如果

确实我的方法值得进一步发展,你是否期望通过在特定于操作系统的ASM中编写taly函数来获得更大的速度


版?我知道这是特定于平台的,就像感受一下你对现代编译器成功优化的看法是什么。


我是完全的意识到我可能完全偏离了这个错误的脚,在

这种情况​​下,我很高兴被引导向正确的方向发展。请不要这么多b $ b让上面的半码可能输入错误让你失望,这不是真正的b
因为它还处于试验阶段,相当复杂,

部分由其他人传递给我,并且如同地狱一样凌乱而难以辨认。


感谢您的帮助,非常感谢。

Sh。

PS如果您有兴趣,最终的程序将作为

数据中心用于(在 - 房屋开发)物流规划包和

反馈给SCADA设备和会计数据库。所有的努力

sofar已经陷入困境,现在我被要求烧掉我的手指......有趣的是

还有挑战性。

I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed thru a function that
tallies the occurrences of several different bit patterns in each number
(or in fact 32 bit row, of which a few (leftmost) are bogus).

Speediness of execution is high on the priority list.

I thought of a solution along these lines, avoiding relatively slow
switch/case or several if/elseif combinations:

If the data would have been short (or int) I''d be able to do this:

#include <stdio.h>
#define BOGUS_BIT_MASK SHORT_MAX & (SHORTMAX >3)
//assume 3 bogus bits in case we use short

#define BATCH_SIZE 1024

typedef unsigned long long ull;
typedef FILE* fileptr;

ull frequency[SHORT_MAX];
int batch[BATCH_SIZE];
fileptr stream;

void tally (const ull batch[], const int batch_size, int freq[]);
void zero_freq (ull freq);
int getdata (int batch[], const int BATCH_SIZE, const fileptr fp);

int main (void)
{
// zero all cells in frequency array (omitted)
while (getdata(batch,BATCH_SIZE,stream)){
// function definition omitted here
tally (batch,BATCH_SIZE,frequency);
}
process_results (frequency);
// function definition omitted here
return 0;
}

void tally (const ull batch[], const int batch_size, int freq[]);
{
int i;
for(i=0;i<batch_size;i++) {
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
// etc. etc. in reality bitpatterns will be stored in an array
// and bogus mask is applied once each run of the loop.
}
}

Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.

In total, it''s likely up to a few hundred bitpatterns may be used, but I
can''t guarantee yet it won''t expand later.

I quickly abandoned the if/then approach, as it became obvious it would
result in a messy, slow, hard to debug/adapt spaghetti.

Has anyone got a better idea for the huge array ? Is there an easy (to
maintain) way of squeezing out all the non-used cells in the array? Some
kind of mapping perhaps ? I tried figuring out a list approach, but that
more or less stranded. Besides I could not see an easy way to target the
right chain in the list in a way similar to the array approach. If
indeed my approach deserves further developing, do you expect a big
further speed gain from coding the taly function in a OS-specific ASM
version? I know this is platform specific, just like to get a feel of
what your opninions are on how succesful modern compilers are in optimizing.

I am fully aware I may be completely off on the wrong foot here, in
which case I am glad to be steered in the right direction. Please don''t
let a possible typo in above semi-code put you off, it''s not the real
thing, as it still is in experimental stage, rather more complicated,
partly handed to my by others, and messy and illegible as hell.

Thanks for your help in advance, much appreciated.
Sh.
PS in case you''re interested, the final program will be put to use as
data hub for an (in-house developed) logistics planning package and
feedback to both SCADA devices and an accounting database. All efforts
sofar have stranded, now I am asked to burn my fingers... Interesting
yet challenging.

推荐答案




Schraalhans Keukenmeester在08/23/06 17:44写道:


Schraalhans Keukenmeester wrote On 08/23/06 17:44,:

[......很多...]

void tally(const ull batch [],const int batch_size,int freq []);

{

int i;

for(i = 0; i< batch_size; i ++){

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_01] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_02] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_03] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_04] ++;

//等等。实际上bitpatterns将存储在一个数组中

//并且每次运行循环时都会应用伪造掩码。

}

}


不幸的是,int32数据使用了大部分空的所需数组

大小(最大位模式值非常接近2 ** 29)

挑战,如果不是不可能在给定的机器/操作系统上。


总计,它可能会使用几百个图案,但是我不能保证它不会在以后扩展。
[... much snipped ...]
void tally (const ull batch[], const int batch_size, int freq[]);
{
int i;
for(i=0;i<batch_size;i++) {
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
// etc. etc. in reality bitpatterns will be stored in an array
// and bogus mask is applied once each run of the loop.
}
}

Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.

In total, it''s likely up to a few hundred bitpatterns may be used, but I
can''t guarantee yet it won''t expand later.



你到底想要做什么?你说你试图计算事件发生的时间。比特模式,但是

并不是代码所做的。例如,假设

只是一位模式,BIT_PATTERN_01 = 0x1。然后freq [0]

将计算所有偶数数据值,freq [1]将计算所有奇数值的
。 freq [1]计数不仅包括数据值0x1(匹配BIT_PATTERN_01),还包括所有

值0x3,0x51,0xffff,...


如果一个BIT_PATTERN_xx是一个

子集,事情会变得更加困惑。另一个:BIT_PATTERN_01 = 0x1,BIT_PATTERN_02 = 0x3,

例如。


请说明什么样的理货你想要计算。


-
Er *** ****** @ sun.com

Exactly what are you trying to do? You said you were
trying to "tally the occurrences" of bit patterns, but that
is not quite what the code does. For example, suppose there
is just one bit pattern, BIT_PATTERN_01 = 0x1. Then freq[0]
will count all the even data values and freq[1] will count
all the odd values. The freq[1] count will include not only
the data value 0x1 (matching BIT_PATTERN_01), but also all the
values 0x3, 0x51, 0xffff, ...

Things get even more confused if one BIT_PATTERN_xx is a
"subset" of another: BIT_PATTERN_01 = 0x1, BIT_PATTERN_02 = 0x3,
for example.

Please clarify what kind of "tally" you want to compute.

--
Er*********@sun.com


文章< 44 ************ **********@news.xs4all.nl>,

Schraalhans Keukenmeester< fi ******************* *@xsfourall.ennelwrote:
In article <44**********************@news.xs4all.nl>,
Schraalhans Keukenmeester <fi********************@xsfourall.ennelwrote:

>我有一个(数字,长整数)数据的大量随机流(来自几个SCADA来源),需要通过一个函数传递,该函数可以计算每个数字中几种不同位模式的出现次数
>I have a substantial random stream of (numerical, long int) data (coming
from several SCADA sources) that needs to be passed thru a function that
tallies the occurrences of several different bit patterns in each number


>执行的速度很快在优先级列表上。
>Speediness of execution is high on the priority list.


>不幸的是,int32数据使用了大多数所需的

大小的空数组(最大的bitpattern值非常漂亮)接近2 ** 29)对于给定的机器/操作系统即使不是不可能也具有挑战性。
>Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.


>总的来说,它可能会使用几百个模式,但我不能保证它以后不会扩展。
>In total, it''s likely up to a few hundred bitpatterns may be used, but I
can''t guarantee yet it won''t expand later.



是否有必要使位模式易于改变?


我想到了两种方法:


1)如果在模式

改变时旋转新的可执行文件是可以接受的,那么编写一个读取位模式的程序,

和吐出适当匹配的代码。当

模式文件发生变化时,重建那一段代码并重新链接。


2)如果旋转新的可执行文件是不可接受的,那么你可能

能够从数据文件中找到适合加载

a匹配函数的数据表表示。这可能比上面的结果慢得多b $ b,因为你可能最终解释了一点。

Is it necessary that the bit patterns be easily changable?

Two approaches come to mind:

1) If it is acceptable to spin a new executable when the patterns
change, then write a program that reads the bit patterns,
and spits out code that matches appropriately. When the
pattern file changes, rebuild that one bit of code and relink.

2) If it is not acceptable to spin a new executable, then you might
be able to find a data table representation suitable for loading
a match function from a data file. This might end up
slower than the above, as you might end up "interpreting" a bit.


freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_01] ++;

freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_02] ++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;



这看起来不对 - 例如,全0位模式

将增加freq [0]对于每个BIT_PATTERN_ *。

如果你有几百有点模式,你肯定会以这种冲突结束




你不需要更接近的东西:


freq [1] + = batch [i]& BIT_MASK_01 == BIT_PATTERN_01;

freq [2] + = batch [i]& BIT_MASK_02 == BIT_PATTERN_02;

-

好​​的,只有流行语。两个音节,上衣。 - Laurie Anderson

That doesn''t seem right -- for example, the all-0 bit pattern
is going to increment freq[0] for every BIT_PATTERN_* .
If you have "a few hundred" bit patterns, you are definitely going
to end up with these kind of clashes.

Don''t you need something closer to:

freq[1] += batch[i] & BIT_MASK_01 == BIT_PATTERN_01;
freq[2] += batch[i] & BIT_MASK_02 == BIT_PATTERN_02;
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson


Eric Sosman写道:
Eric Sosman wrote:

>

Schraalhans Keukenmeester在08/23/06 17:44写道:
>
Schraalhans Keukenmeester wrote On 08/23/06 17:44,:

> [...多次剪断...]
void tally(const ull batch [],const int batch_size,int freq []);
{i / 0; i< batch_size; i ++){
freq [batch] [i]& BOGUS_BIT_MASK& BIT_PATTERN_01] ++;
freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_02] ++;
freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_03] ++;
freq [batch [i]& BOGUS_BIT_MASK& BIT_PATTERN_04] ++;
//等等。实际上bitpatterns将存储在一个数组中
//每次运行循环时都会应用伪造的掩码。
} }

不幸的是,对于int32数据,使用大多数所需
大小的空数组(最大位模式值非常接近2 ** 29)是否具有挑战性,如果不是在给定的机器/操作系统上是不可能的。

总的来说,它可能会使用几百个模式,但我不能保证它会赢得'' t稍后扩展。
>[... much snipped ...]
void tally (const ull batch[], const int batch_size, int freq[]);
{
int i;
for(i=0;i<batch_size;i++) {
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_01]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_02]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_03]++;
freq[batch[i] & BOGUS_BIT_MASK & BIT_PATTERN_04]++;
// etc. etc. in reality bitpatterns will be stored in an array
// and bogus mask is applied once each run of the loop.
}
}

Unfortunately, with int32 data using a mostly empty array of the desired
size (largest bitpattern value comes pretty close to 2**29) is
challenging, if not impossible on a given machine/os.

In total, it''s likely up to a few hundred bitpatterns may be used, but I
can''t guarantee yet it won''t expand later.



你到底想要做什么?你说你试图计算事件发生的时间。比特模式,但是

并不是代码所做的。例如,假设

只是一位模式,BIT_PATTERN_01 = 0x1。然后freq [0]

将计算所有偶数数据值,freq [1]将计算所有奇数值的
。 freq [1]计数不仅包括数据值0x1(匹配BIT_PATTERN_01),还包括所有

值0x3,0x51,0xffff,...


如果一个BIT_PATTERN_xx是一个

子集,事情会变得更加困惑。另一个:BIT_PATTERN_01 = 0x1,BIT_PATTERN_02 = 0x3,

例如。


请说明什么样的理货你想要计算。


Exactly what are you trying to do? You said you were
trying to "tally the occurrences" of bit patterns, but that
is not quite what the code does. For example, suppose there
is just one bit pattern, BIT_PATTERN_01 = 0x1. Then freq[0]
will count all the even data values and freq[1] will count
all the odd values. The freq[1] count will include not only
the data value 0x1 (matching BIT_PATTERN_01), but also all the
values 0x3, 0x51, 0xffff, ...

Things get even more confused if one BIT_PATTERN_xx is a
"subset" of another: BIT_PATTERN_01 = 0x1, BIT_PATTERN_02 = 0x3,
for example.

Please clarify what kind of "tally" you want to compute.



你是对的,但在实践中,实际的位掩码是相互的/ b $ b独家或数据不符合(对某些人来说)额外要求)将

重新制作成另一种形式。正如我所说的那样,实际上它更复杂。

在处理阶段,只有使用位掩码索引的''单元'是

,其余的是丢弃。我完全知道额外的单元格会增加
。我承认,我的声明数据是随机的并不完全是

true。


如果上述要求无法满足所有数据/掩码和一些

组合导致其他有效模式计数递增,

流将必须在子流中分割。实际上,数据已经基于原始数据流中的标识符在子流中排序(

来自source_identifier,packet_id,packetsize,data,data,

[...,]数据,路线,crc)


现在我主要对可用性感兴趣 - 一般 -

建议考虑速度的算法。我希望你能帮助我!


感谢您的智慧和投入的时间!

Rgds

Sh 。

You are right, yet in practice either the actual bitmasks are mutually
exclusive or data not conforming (to some extra requirements) will be
reworked into another form. As I said, it''s more complicated in reality.
In the processing stage only the ''cells'' indexed by the bitmasks are
used, the rest is discarded. I am fully aware additional cells will be
incremented. I admit, my statement the data was random isn''t completely
true.

If the above requirements cannot be met for all data/masks and some
combinations lead to other valid pattern counts being incremented, the
stream will have to be split in substreams. Actually the data is already
sorted in substreams based on identifiers in the raw data stream (which
comes in the form source_identifier, packet_id, packetsize, data, data,
[...,] data, alignment, crc)

Right now I am mainly interested in the usability -in general- of the
suggested algoritm with speed in mind. I hope you can help me with that!

Thanks for your words of wisdom and your time invested!
Rgds
Sh.


这篇关于数组速度或...的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆