Sed: Decreasing speed of data processing
Problem description
I have large files (10-20 GB) which I preprocess with Sed before plotting the data with Gnuplot. The plots are saved as .png images. The data file consists of images matrices of size matrix_size x matrix_size, where images is the number of matrices. The data file for three (images=3) matrices of size matrix_size=2 looks like:
1 2
3 2
1 5
3 4
5 2
2 3
I use Sed to extract each matrix from the data file. At the beginning this is really fast and my script produces one image per second. But after a while the time increases to as much as 25 seconds per image. Why is this the case? Here is my code:
unset border
unset key
unset xtics
unset ytics
unset ztics
unset colorbox
set autoscale fix
set size ratio -1
file = 'data'
matrix_size = 1000
images = 1000
sizeX = matrix_size
sizeY = matrix_size
set xrange [1:matrix_size]
set yrange [1:matrix_size]
set terminal png size sizeX, sizeY
getMatrix(fileName, n, i) = sprintf("<sed -n '%d,%dp;%dq' '%s'", (i-1)*n + 1, i*n, i*n+1, fileName)
do for [i=1:images] {
    t0 = strftime('%s', time(0))
    set output sprintf('%05d_%s.png', i, file)
    plot getMatrix(file, matrix_size, i) matrix with image
    t1 = strftime('%s', time(0))
    print(sprintf('%d %d', t1-t0, i))
}
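For reference, the command that getMatrix builds can be run by hand on the small sample above. This is a minimal shell sketch, assuming the six sample lines are written to a temporary file (the variable names sample, first, last and stop are only illustrative). For matrix i=2 with n=2 the command expands to sed -n '3,4p;5q'.

```shell
#!/bin/sh
set -eu

# Write the six sample lines (three 2x2 matrices) to a temporary file.
sample=$(mktemp)
printf '%s\n' '1 2' '3 2' '1 5' '3 4' '5 2' '2 3' > "$sample"

n=2   # matrix_size: rows per matrix
i=2   # 1-based index of the matrix to extract

# Same arithmetic as getMatrix in the gnuplot script.
first=$(( (i - 1) * n + 1 ))   # first line of matrix i
last=$(( i * n ))              # last line of matrix i
stop=$(( i * n + 1 ))          # line at which sed quits

# Print lines first..last, then quit when line stop is reached.
second=$(sed -n "${first},${last}p;${stop}q" "$sample")
printf '%s\n' "$second"

rm -f "$sample"
```

Note that sed still has to read lines 1 to stop from the start of the file on every call, even though it prints only the lines of matrix i.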
The time each image takes to plot, in seconds, is very low at the beginning and then grows steadily.
Recommended answer
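Each sed call has to read the file from its first line up to the end of matrix i before it can quit, so the time per image grows with i. I would suggest you use split to extract all your matrices to individual files, up front, in a single pass: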
split -a 4 -d -l matrix_size data matrix-
That will put each matrix in a separate file called matrix-0000, matrix-0001, and so on, if I understood your file format. Replace matrix_size with the number of lines per matrix (1000 in your script).
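The split step can be tried on the small sample from the question before running it on a 10-20 GB file. This is a minimal sketch, assuming a split that supports the -d option (as GNU split does) and a scratch directory created with mktemp. The workdir variable and the two-row matrix size are only for illustration.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the generated files are easy to clean up.
workdir=$(mktemp -d)
cd "$workdir"

# Sample data from the question: three 2x2 matrices, one row per line.
printf '%s\n' '1 2' '3 2' '1 5' '3 4' '5 2' '2 3' > data

matrix_size=2   # rows per matrix; 1000 in the question's script

# Single pass over the file: every matrix_size lines go to a new file
# with a 4-digit numeric suffix (matrix-0000, matrix-0001, ...).
split -a 4 -d -l "$matrix_size" data matrix-

ls matrix-*       # lists the three generated files
cat matrix-0001   # second matrix: rows "1 5" and "3 4"
```

Because split numbers its output from 0, image i in the gnuplot loop would correspond to the file matrix-(i-1). The loop could then plot something like sprintf('matrix-%04d', i-1) directly instead of calling getMatrix, which removes the repeated reads from the start of the big file.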