Row-wise vs. column-wise image processing


Question


Hello all,

I am currently implementing a fairly simple algorithm. It scans a
grayscale image and computes each pixel's new value as a function of its
original value. Two passes are made, first horizontally and then
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal one, although the code is _exactly_ the
same in both cases?!

The code itself is very simple. There are 2 loops to scan through rows
and columns (horizontal pass), or columns and rows (vertical pass),
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each row are
contiguous in memory. Here is a simplified version of the code:

####################
// HORIZONTAL
// loop for every row
....
    // then for every column
    for (col = firstCol + 1; col <= lastCol - 1; ++col)
    {
        *pixel = (*pixel) * 2 / 3;
        pixel++;
    }

// VERTICAL
// loop for every column
....
    // then for every row
    for (row = firstRow + 1; row <= lastRow - 1; ++row)
    {
        *pixel = (*pixel) * 2 / 3;
        pixel += imgWidth;
    }
##################

For this small amount of code, the timings are as follows:
- horizontal = 0.035 sec.
- vertical = 0.135 sec.

Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidth;"), the timings then
become equal at 0.081. This simple instruction is responsible for the
massive loss of time, and I have no idea why.

My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically adjacent in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be held in the processor
cache or a register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.

I would really appreciate it if anybody could enlighten me on this topic.

Thanks in advance,
Enrique

Solution


Without a complete, minimal, compilable example, our guess is as good
as yours, which is probably a good guess. Use the time program to
observe the time spent in your code and the time spent in system code.
That'll give an indication of whether your guess is right.

In any case, the C standard says nothing about the efficiency of the code
generated. If the small delay is too much for your application, then
you'll have to profile your code to find out where it is
stalling and look at what to do about it.


On 25 Jan, 11:00, Enrique Cruiz <jni6l03mdo6n...@jetable.org> wrote:

I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!

[Snip]

This is not really a question about the C language. A general forum
such as comp.programming may be more appropriate than a C language
forum.

My only guess relates to memory management issues.

<Off-topic>
You are almost certainly right, and there's probably not a lot you can
do to change it with this processing style.
In your horizontal pass you can process a cache's worth of data at a
time before pulling in the next cacheful. In the worst case, your
vertical pass could have a cache miss for each pixel accessed.
</Off-topic>


Enrique Cruiz wrote:

Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidth;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.

I'm pretty sure it's because the processor for which you are compiling
has an INC command, which increments a given number quite fast. Your
"pixel++" is compiled into such an INC, whereas "pixel+=imgWidth" can
only be translated into an ADD command, which takes much longer.

I don't think you can do much that is C-specific here. Maybe, if imgWidth
is a power of two, there is a chance of using some strange &s and |s, but I
wonder...

Regards
Steffen

