Multiple data blocks in a single file and a single plot + Markers for each block


Problem description

I have a data file that looks like this:

"curve 0"
0 0.7800
10 0.333
12 0.5136
24 0.2096
26 -0.066
40 -0.674
42 -1.123


"curve 1"
0 0.876
2 0.73
4 0.693
6 0.672
10 0.70
12 0.88
16 0.95
148 -0.75


"curve 2"
8 2.2305
10 2.144
12 2.13
76 1.26
78 0.39
98 -0.97

I would like to plot each block of data independently of the others using gnuplot. Here's the code I'm using for this purpose:

plot 'file' i 0 u 1:2 w lines title columnheader(1),\
'file' i 1 u 1:2 w lines title columnheader(1),\
'file' i 2 u 1:2 w lines title columnheader(1),\
'file' i 3 u 1:2 w lines title columnheader(1)

It works fine.

Now, I would like to determine in each data block the point (x,y) that has the maximum y-value, and plot it with a marker which has the same color as the curve corresponding to this data block. I tried to use

max_y = GPVAL_DATA_Y_MAX
replot 'file' u ($2 == max_y ? $2 : 1/0):1

after the previous code, but it seems that this finds the maximum over the whole second column including all blocks.

The second thing I would like to do: for each data block, plot the first point of that block with a marker that has the same color as the curve but a different shape from the marker used for the maximum.

Are these two tasks possible with gnuplot and with the way I'm plotting the curves (columnheader)?

Solution

This can be done. It will use the stats command extensively, and a temporary file. In gnuplot 5, the temporary file can be created in memory using a named data block (see help datablocks).

Additionally, as your plot command is largely repetitive, you can use the plot for syntax

plot for[in=0:2] 'file' i in u 1:2 w lines t columnheader(1)

which will repeat the plot command using the values 0 through 2 for the variable in (your provided command uses four data blocks, but your provided data file only has 3).

The following script will accomplish what you want:

stats 'file' u 1:2 nooutput
blocks = STATS_blocks

set print 'tempfile'

first_y = ""
first_x = ""
do for[i=0:blocks-1] {
    stats 'file' index i u (first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1):2 nooutput
    print sprintf("%f %f",STATS_pos_max_y,STATS_max_y) 
}

print ""
print ""
do for[i=1:blocks] {
    print sprintf("%s %s",word(first_x,i),word(first_y,i))
}
set print

plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
     for[i=0:1] 'tempfile' i i u 1:2:($0+1) w points pt (i==0?7:9) lc variable not

With your provided data file, this produces a plot in which, for curves 0 and 2, the first point and the maximum point coincide, so one symbol hides the other.

Replotting this, but altering the specification to move the first point markers up by 0.1, we can see that they show up where they should.
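
The exact change used for that second plot is not shown. One possible way to get the same effect, assuming the full script above has already been run so that blocks and 'tempfile' exist, is to add the offset to the y column only while the second block of the temporary file (the first points) is being plotted:

```gnuplot
# Assumes `blocks` and 'tempfile' were created by the script above.
# The 0.1 offset is applied only for index 1 (the first-point markers).
plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
     for[i=0:1] 'tempfile' i i u 1:($2+(i==1?0.1:0)):($0+1) w points pt (i==0?7:9) lc variable not
```

The offset is only a visual aid for checking the markers; the shifted points no longer sit on the curves.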


This section is going to be long, but I will break down the code and explain it in detail, as close to line by line as possible, because there are a few subtle things in here.

The first two lines

stats 'file' u 1:2 nooutput
blocks = STATS_blocks

run the stats command over the file. Because of the named column headers, stats will fail if we don't specify a using spec, so we give it u 1:2. The nooutput option tells stats to compute and store the results without printing them. Here we only care about the number of blocks, which we copy into the variable blocks because later stats commands will overwrite STATS_blocks. We could have given a name prefix instead, but that would have kept every stats variable around, and there is no reason for that. With exactly 3 blocks we could simply have substituted the value 3 for every occurrence of blocks below, but this way the number of blocks is not hard-coded.
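
For comparison, the prefix alternative mentioned above might look like the following sketch. The prefix "F" is an arbitrary name chosen for this example; with it, the results are stored as F_* variables rather than STATS_* ones, so later unprefixed stats calls would leave them alone.

```gnuplot
# Sketch of the prefix alternative; "F" is an arbitrary name chosen here.
stats 'file' u 1:2 name "F" nooutput
print F_blocks      # number of data blocks, kept under the F_ prefix
```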

Next, we use set print 'tempfile' to redirect print commands to a temporary file. We will build up a new datafile that contains the maximum points and the first points.
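
As noted at the start of this answer, gnuplot 5 can keep this temporary data in memory instead. A minimal sketch of the idea, using a hypothetical datablock name $markers and two hand-written lines in place of the computed values:

```gnuplot
# Sketch: redirect print output into an in-memory datablock (gnuplot 5).
# $markers is a name chosen for this example, not part of the original script.
set print $markers
print "0 0.78"
print "16 0.95"
set print            # restore printing to the console
plot $markers u 1:2 w points pt 7
```

In the full script, this would mean replacing 'tempfile' with the datablock name in both the set print command and the final plot command.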

The next section of code

first_y = ""
first_x = ""
do for[i=0:blocks-1] {
    stats 'file' index i u (first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1):2 nooutput
    print sprintf("%f %f",STATS_pos_max_y,STATS_max_y) 
}

is the most difficult and is where most of the magic happens. The temporary file will contain two datablocks: the first holds the maximum points and the second holds the first points. The maximum points are printed as we go; the first points are collected in memory and added after that first datablock is complete. Their x coordinates and y coordinates are stored in two space-separated string variables, first_x and first_y.

We iterate over all the data blocks and run a stats command on each one. The expression

(first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1)

reassigns the two string variables for each point read in. It first checks whether the point is the first one in the block (the value of $0 will be 1, since the value 0 corresponds to the header line). If it is, it rebuilds the string variable by appending the value of the first column to it (and similarly for the y coordinates). Otherwise, it just reassigns the variable its current value. Finally, it returns the value in the first column. When expressions are put in parentheses and separated by commas like this, each expression is evaluated in turn, and the value of the last one is returned.
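
The comma-separated evaluation can be tried on its own, outside a using spec. In this small sketch the variables a and b are made up for the illustration:

```gnuplot
# Serial evaluation: sub-expressions run left to right inside the parentheses,
# and the whole expression takes the value of the last one.
a = 0
b = 0
print (a = 3, b = a * 2, a + b)   # value of a + b, after both assignments
print a, b                        # both assignments have taken effect
```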

Thus the stats command behaves as if it were

stats 'file' index i u 1:2 nooutput

but this little trick allows us to read the first line values and store them when they come in. Finally the point with the maximum y value is printed out. This will go into the temporary file.
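
To see the per-block maximum in isolation, the plain form of the command can be run on a single block. This sketch uses index 1 (curve 1) of the same 'file':

```gnuplot
# Per-block maximum: restrict stats to one block with `index`.
stats 'file' index 1 u 1:2 nooutput
# STATS_pos_max_y is the x value at which the maximum y occurs.
print sprintf("%f %f", STATS_pos_max_y, STATS_max_y)
```

For the sample data this should give the curve 1 entry of the temporary file listed in this answer, 16.000000 0.950000.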

Now we need to add the first points to the temporary file as a new datablock. So first we print two blank lines and then we again iterate over the number of blocks running

print sprintf("%s %s",word(first_x,i),word(first_y,i))

for each block (where i is the number of the block, counted from 1 because word numbers its words from 1). The word function treats a string variable as a space-separated list of words and returns the requested word. At this point our string variables look like

 0.000000 0.000000 8.000000 # first_x
 0.780000 0.876000 2.230500 # first_y
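
The lookup can be checked by hand with those two strings. This sketch simply assigns them directly rather than collecting them with stats:

```gnuplot
# The two lists as they stand after the stats loop (copied from above).
first_x = " 0.000000 0.000000 8.000000"
first_y = " 0.780000 0.876000 2.230500"
# words() reports how many space-separated entries a string holds.
print words(first_x)
# Entry for the third block: "8.000000 2.230500"
print sprintf("%s %s", word(first_x, 3), word(first_y, 3))
```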

Finally, we issue set print which restores the print command to print to the console. We have now built a temporary file which looks like

0.000000 0.780000
16.000000 0.950000
8.000000 2.230500


0.000000 0.780000
0.000000 0.876000
8.000000 2.230500

where the first datablock contains the points with the maximum y-value and the second datablock contains the first points.

Finally, we plot with

plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
     for[i=0:1] 'tempfile' i i u 1:2:($0+1) w points pt (i==0?7:9) lc variable not

The first part of this is identical to before, just with the blocks variable used instead of hard-coding the number of blocks.

Next we plot the temporary file twice, once with index 0 and once with index 1. With lc variable, the color is taken from the third using column, which here is based on the line number (0 through 2 in this case). We add one so that the normally 0-based line number runs from 1 through 3, which matches the line types used for the curves before. We plot with points and select the point type based on the datablock we are plotting: a filled circle (for the maximums) or a filled triangle (for the first points) in this case.
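
The color selection can be tried without the data file. The following sketch assumes gnuplot 5 for the inline datablock, and $pts is a name chosen for this example; it holds the three maximum points from the temporary file:

```gnuplot
# Requires gnuplot 5 for the inline datablock; $pts is a name chosen here.
$pts << EOD
0 0.78
16 0.95
8 2.2305
EOD
# Third using column ($0+1 = 1, 2, 3) selects the linetype whose color is used.
plot $pts u 1:2:($0+1) w points pt 7 lc variable notitle
```

Each point should come out in the color of linetype 1, 2 and 3 respectively, the same line types that plot for assigns to the three curves by default.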
