How can I extract numeric ranges from 2 columns and print the range from both columns as tuples?
Problem description
I'm quite new to bash scripting and Python programming. At the moment I have 2 columns which contain numeric sequences as follows:
Col 1:
1
2
3
5
7
8
Col 2:
101
102
103
105
107
108
I need to extract the numeric ranges from both columns and print them, starting a new range wherever a sequence break occurs in either of the 2 columns. The result should be as follows:
1,3,101,103
5,5,105,105
7,8,107,108
I already received useful information on how to extract numeric ranges from one column using awk: $ awk 'NR==1||sqrt(($0-p)*($0-p))>1{print p; printf "%s", $0 ", "} {p=$0} END{print $0}' file. But now the problem has become more complex: I have to include a second column with another numeric sequence, and the result must give the ranges from both columns wherever a sequence break occurs in either of the 2 columns.
To add a bit more complexity, the sequences can be ascending and/or descending.
I am trying to find a solution using the pandas (data frames) and numpy libraries for Python.
Thanks in advance.
Hello MaxU, thanks for your reply. Unfortunately I'm hitting an issue in the following case:
Col 1:
7
8
9
10
11
Col 2:
52
51
47
46
45
Here the numeric sequence in the second column is descending from the beginning; it generates as a result:
7,11,45,52
7,11,45,52
Instead of:
7,8,51,52
7,8,51,52
8,11,45,47
8,11,45,47
Cheers.
Recommended answer
UPDATE:
In [103]: df
Out[103]:
   Col1  Col2
0     7    52
1     8    51
2     9    47
3    10    46
4    11    45
In [104]: (df.groupby((df.diff().abs() != 1).any(axis=1).cumsum()).agg(['min','max']))
Out[104]:
  Col1     Col2
   min max  min max
1    7   8   51  52
2    9  11   45  47
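For anyone who wants to run the updated approach outside an IPython session, here is a minimal self-contained sketch. It assumes the two columns are already in a DataFrame (built inline here from the question's second example); the helper name extract_ranges is only illustrative and is not part of pandas. The axis is passed as a keyword (axis=1), which should behave the same across pandas versions.

```python
import pandas as pd


def extract_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Return min/max per column for each run of consecutive rows.

    A new run starts whenever ANY column changes by something other
    than 1 in absolute value, so ascending and descending runs both count.
    """
    # diff() gives NaN on the first row; NaN != 1 is True, so row 0 starts a run.
    starts_run = (df.diff().abs() != 1).any(axis=1)
    # The cumulative sum of the boolean flags labels each run 1, 2, 3, ...
    run_id = starts_run.cumsum()
    return df.groupby(run_id).agg(['min', 'max'])


df = pd.DataFrame({'Col1': [7, 8, 9, 10, 11],
                   'Col2': [52, 51, 47, 46, 45]})
ranges = extract_ranges(df)
print(ranges.to_csv(index=False, header=False))
```

With this data the break comes from Col2 alone (51 to 47), which is why the rows are grouped with any rather than all.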
Old answer:
Here is one way (among many) to do it in Pandas:
Data:
In [314]: df
Out[314]:
   Col1  Col2
0     1   101
1     2   102
2     3   103
3     5   105
4     8   108
5     7   107
6     6   106
7     9   109
NOTE: pay attention - rows with indexes (4, 5, 6) form a descending sequence
Solution:
In [350]: rslt = (df.groupby((df.diff().abs() != 1).all(axis=1).cumsum())
     ...:           .agg(['min','max']))
     ...:
In [351]: rslt
Out[351]:
  Col1     Col2
   min max  min max
1    1   3  101 103
2    5   5  105 105
3    6   8  106 108
4    9   9  109 109
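To see why the grouping key works, and how all differs from any, the one-liner can be unpacked into its intermediate steps. This is only an illustrative sketch: the helper name run_ids and its require_all_columns flag are made up for this example, and the two frames are the data sets shown in this thread.

```python
import pandas as pd


def run_ids(df, require_all_columns):
    """Label each row with the number of the consecutive run it belongs to."""
    step = df.diff().abs()      # absolute change from the previous row (NaN on row 0)
    is_break = step != 1        # True where a column does not continue its sequence
    if require_all_columns:
        starts = is_break.all(axis=1)   # new run only if every column breaks
    else:
        starts = is_break.any(axis=1)   # new run if at least one column breaks
    return starts.cumsum().tolist()


both_break = pd.DataFrame({'Col1': [1, 2, 3, 5, 8, 7, 6, 9],
                           'Col2': [101, 102, 103, 105, 108, 107, 106, 109]})
one_breaks = pd.DataFrame({'Col1': [7, 8, 9, 10, 11],
                           'Col2': [52, 51, 47, 46, 45]})

print(run_ids(both_break, True))    # [1, 1, 1, 2, 3, 3, 3, 4]
print(run_ids(both_break, False))   # [1, 1, 1, 2, 3, 3, 3, 4]
print(run_ids(one_breaks, True))    # [1, 1, 1, 1, 1]
print(run_ids(one_breaks, False))   # [1, 1, 2, 2, 2]
```

When both columns break on the same rows the two variants agree. When only one column breaks, all merges everything into a single run, which is the 7,11,45,52 result reported in the question, while any splits it.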
Now you can easily save it to a CSV file:
rslt.to_csv(r'/path/to/file_name.csv', index=False, header=None)
or just print it:
In [333]: print(rslt.to_csv(index=False, header=None))
1,3,101,103
5,5,105,105
6,8,106,108
9,9,109,109
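Since the question asks for tuples, the grouped result can also be turned into a list of plain Python tuples instead of CSV text. A minimal sketch, assuming the same data as above; the variable names are illustrative, and the explicit int() conversion is only there so the tuples hold built-in integers rather than NumPy scalars.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 5, 8, 7, 6, 9],
                   'Col2': [101, 102, 103, 105, 108, 107, 106, 109]})

# Same grouping idea as above; any(axis=1) splits on a break in either column.
run_id = (df.diff().abs() != 1).any(axis=1).cumsum()
rslt = df.groupby(run_id).agg(['min', 'max'])

# One tuple per run: (Col1 min, Col1 max, Col2 min, Col2 max)
range_tuples = [tuple(int(v) for v in row) for row in rslt.to_numpy()]

for t in range_tuples:
    print(t)
```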