How can I extract numeric ranges from 2 columns and print the range from both columns as tuples?


Question

I'm quite new to bash scripting and Python programming. At the moment I have 2 columns containing numeric sequences, as follows:

Col 1:
1
2
3
5
7
8

Col 2:

101
102
103
105
107
108

I need to extract the numeric ranges from both columns and print them, starting a new range wherever the sequence breaks in either of the 2 columns. The result should be as follows:

1,3,101,103

5,5,105,105

7,8,107,108

I already received useful information on how to extract numeric ranges from one column using awk:

$ awk 'NR==1||sqrt(($0-p)*($0-p))>1{print p; printf "%s", $0 ", "} {p=$0} END{print $0}' file

Now the problem has become a bit more complex: I have to include a second column with another numeric sequence, and the result must give the ranges from both columns, split wherever a sequence break occurs in either of the 2 columns.

To add a bit more complexity, the sequences can be ascending and/or descending.

I'm trying to find a solution using the pandas (data frames) and numpy libraries for Python.

Thanks in advance.

Hello MaxU, thanks for your reply. Unfortunately I'm hitting an issue with the following case:

Col 1:

 7
 8
 9
10
11


Col 2:

52
51
47
46
45

Here the numeric sequence in the second column is descending from the beginning. It generates as a result:

7,11,45,52

7,11,45,52

Instead of:

7,8,51,52

7,8,51,52

8,11,45,47

8,11,45,47

Cheers.

Recommended answer

UPDATE:

In [103]: df
Out[103]:
   Col1  Col2
0     7    52
1     8    51
2     9    47
3    10    46
4    11    45

In [104]: (df.groupby((df.diff().abs() != 1).any(axis=1).cumsum()).agg(['min','max']))
Out[104]:
  Col1     Col2
   min max  min max
1    7   8   51  52
2    9  11   45  47
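
The one-liner above can be unpacked into intermediate steps to see where the group boundaries come from. The following is a minimal sketch that rebuilds the same frame; the names `steps`, `is_break`, and `group_id` are introduced here for illustration only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [7, 8, 9, 10, 11],
                   "Col2": [52, 51, 47, 46, 45]})

# Absolute row-to-row difference; the first row is NaN in both columns.
steps = df.diff().abs()

# A row opens a new range when either column moves by something other than 1.
# NaN != 1 evaluates to True, so the first row always opens a group.
is_break = (steps != 1).any(axis=1)

# A running count of the breaks gives every row a group label.
group_id = is_break.cumsum()

rslt = df.groupby(group_id).agg(["min", "max"])
print(group_id.tolist())
print(rslt)
```

Taking the absolute value of the difference is what lets the same test cover both ascending and descending runs.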

OLD answer:

Here is one way (among many) to do it in Pandas:

Data:

In [314]: df
Out[314]:
   Col1  Col2
0     1   101
1     2   102
2     3   103
3     5   105
4     8   108
5     7   107
6     6   106
7     9   109

NOTE: pay attention, the rows with indexes (4,5,6) form a descending sequence.

Solution:

In [350]: rslt = (df.groupby((df.diff().abs() != 1).all(axis=1).cumsum())
     ...:           .agg(['min','max']))
     ...:

In [351]: rslt
Out[351]:
  Col1     Col2
   min max  min  max
1    1   3  101  103
2    5   5  105  105
3    6   8  106  108
4    9   9  109  109
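
Note that `.all(axis=1)` starts a new group only when both columns break on the same row, which is why a break in a single column goes unnoticed. A minimal sketch of the difference, using the data from the follow-up case in the question; the helper name `ranges` and its `how` parameter are hypothetical and exist only for this illustration.

```python
import pandas as pd

def ranges(df, how):
    """Group rows into runs and return min/max per column.

    how="all": break only when every column breaks on that row.
    how="any": break when at least one column breaks on that row.
    """
    broken = df.diff().abs() != 1
    mask = broken.all(axis=1) if how == "all" else broken.any(axis=1)
    return df.groupby(mask.cumsum()).agg(["min", "max"])

df = pd.DataFrame({"Col1": [7, 8, 9, 10, 11],
                   "Col2": [52, 51, 47, 46, 45]})

# "all" misses the jump from 51 to 47 because Col1 keeps counting up by 1.
print(ranges(df, "all").to_csv(index=False, header=False))
# "any" splits at that row.
print(ranges(df, "any").to_csv(index=False, header=False))
```

With this data the `"all"` variant yields the single range reported in the question, while the `"any"` variant splits it in two.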

Now you can easily save it to a CSV file:

rslt.to_csv(r'/path/to/file_name.csv', index=False, header=False)

Or just print it:

In [333]: print(rslt.to_csv(index=False, header=False))
1,3,101,103
5,5,105,105
6,8,106,108
9,9,109,109
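
Since the question asks for tuples, the rows of the result can also be converted into plain Python tuples instead of CSV text. This is a sketch under the assumption that the data frame is the one from the old answer; the variable name `tuples` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, 5, 8, 7, 6, 9],
                   "Col2": [101, 102, 103, 105, 108, 107, 106, 109]})

rslt = (df.groupby((df.diff().abs() != 1).all(axis=1).cumsum())
          .agg(["min", "max"]))

# One (col1_min, col1_max, col2_min, col2_max) tuple per range;
# int() turns the NumPy integers into ordinary Python ints.
tuples = [tuple(int(v) for v in row) for row in rslt.to_numpy()]
print(tuples)
```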
