How to read data only between two lines in python using numpy or pandas?

Question

I have a data file like this:

# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.300000  0.01     0.999920
      14.2000     0.301000  0.02     0.999960
      14.2000     0.302000  0.03     0.999980
      14.2000     0.303000  0.04     0.999980
      14.2000     0.304000  0.06     0.999980
      14.2000     0.305000  0.08     0.999970
      14.2000     0.306000  0.2     0.999950
      14.2000     0.307000  0.4     0.999910
      14.2000     0.308000  0.8     0.999860
      14.2000     0.309000  0.9     0.999960
      14.2000     0.310000  0.8     0.999990
      14.2000     0.311000  0.4     0.999980
      14.2000     0.312000  0.2     0.999960
      14.2000     0.313000  0.06     0.999940
      14.2000     0.314000  0.03     0.999930
      14.2000     0.315000  0.02     1.00000
      14.2000     0.316000  0.01     1.00000

The required output file, output.csv, should look like this:

# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.304000  0.06     0.999980
      14.2000     0.305000  0.08     0.999970
      14.2000     0.306000  0.2     0.999950
      14.2000     0.307000  0.4     0.999910
      14.2000     0.308000  0.8     0.999860
      14.2000     0.309000  0.9     0.999960
      14.2000     0.310000  0.8     0.999990
      14.2000     0.311000  0.4     0.999980
      14.2000     0.312000  0.2     0.999960
      14.2000     0.313000  0.06     0.999940
      14.2000     0.314000  0.03     0.999930


      # Conditions:
      # the first output row is the first row where column 3 >= 0.05 (here 0.06)
      # the last output row is the row where column 3 drops back below 0.05 (here 0.03)

      # For the second condition, maybe we need the index of the second 0.06
      #     and take the value at the next index.
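Both conditions reduce to index arithmetic on column 3. Find the positions where the transmission is >= 0.05, start at the first of them, and stop at the position just after the last one. The minimal NumPy sketch below covers only that step. It assumes the column-3 values are already loaded (here they are copied from the sample file) and that at least one value meets the threshold.

```python
import numpy as np

# Column 3 (transmission) copied from the sample file above.
trans = np.array([0.01, 0.02, 0.03, 0.04, 0.06, 0.08, 0.2, 0.4, 0.8,
                  0.9, 0.8, 0.4, 0.2, 0.06, 0.03, 0.02, 0.01])

threshold = 0.05
hits = np.flatnonzero(trans >= threshold)  # positions where trans >= 0.05

first = hits[0]        # first row to output: the first 0.06
last = hits[-1] + 1    # row right after the second 0.06: the 0.03
# If the last hit were the final row, the slice would simply stop at the end.
selected = trans[first:last + 1]
print(first, last, selected)
```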

How can I do this in Python with pandas or NumPy?

My initial attempt:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 
# Author    : Bhishan Poudel 
# Date      : June 16, 2016 


# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#==============================================================================
# read in a file
infile = 'filter_2.txt'
colnames = ['angle', 'wave','trans', 'refl']
print('\nreading file :', infile)
df = pd.read_csv(infile, sep=r'\s+', header=None, comment='#',
                 names=colnames, usecols=(0, 1, 2, 3))

print(df)

# keep only the rows whose transmission is >= 0.05
print("\n")
df = df[df['trans'] >= 0.05]
print(df)
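The attempt above keeps only the rows with trans >= 0.05, so the trailing 0.03 row is lost. In pandas, one way to keep it is to convert the boolean mask into positions. You can then slice with `iloc` from the first match to one past the last match. This is a minimal sketch that assumes the same column layout as above. `slice_between` is a hypothetical helper name, and the sample data is read from a string so the example runs without `filter_2.txt`.

```python
import io

import numpy as np
import pandas as pd

# Sample file from the question, inlined so the sketch runs on its own.
DATA = """\
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.300000  0.01     0.999920
      14.2000     0.301000  0.02     0.999960
      14.2000     0.302000  0.03     0.999980
      14.2000     0.303000  0.04     0.999980
      14.2000     0.304000  0.06     0.999980
      14.2000     0.305000  0.08     0.999970
      14.2000     0.306000  0.2     0.999950
      14.2000     0.307000  0.4     0.999910
      14.2000     0.308000  0.8     0.999860
      14.2000     0.309000  0.9     0.999960
      14.2000     0.310000  0.8     0.999990
      14.2000     0.311000  0.4     0.999980
      14.2000     0.312000  0.2     0.999960
      14.2000     0.313000  0.06     0.999940
      14.2000     0.314000  0.03     0.999930
      14.2000     0.315000  0.02     1.00000
      14.2000     0.316000  0.01     1.00000
"""


def slice_between(df, col='trans', threshold=0.05):
    """Hypothetical helper: rows from the first row with df[col] >= threshold
    through the row immediately after the last such row."""
    pos = np.flatnonzero((df[col] >= threshold).to_numpy())
    if pos.size == 0:
        return df.iloc[0:0]              # nothing meets the threshold
    return df.iloc[pos[0]:pos[-1] + 2]   # +2: include the last match and the next row


colnames = ['angle', 'wave', 'trans', 'refl']
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+', header=None,
                 comment='#', names=colnames)
out = slice_between(df)
print(out)
```

To produce output.csv, the header comments would have to be written separately, because `comment='#'` discards them. After that, something like `out.to_csv(fo, sep=' ', header=False, index=False)` would write the rows. Note that pandas reformats the numbers (for example 14.2000 is written as 14.2) unless `float_format` is set.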

A similar question:
How to read between 2 specific lines in python

Recommended answer

I'd skip pandas and numpy altogether:

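threshold = 0.027

# Copy header comments and every data row whose transmission
# (column 3) is >= threshold; written for Python 3.
with open('filter_2.txt') as fi, open('filter_3.txt', 'w') as fo:
    for line in fi:
        split = line.split()
        if not split:  # skip blank lines
            continue
        if split[0].startswith('#') or float(split[2]) >= threshold:
            print(line, end='')
            fo.write(line)

Output (from a test file whose values differ from the sample in the question):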

# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.302000  0.028     0.999980
      14.2000     0.303000  0.030     0.999980
      14.2000     0.304000  0.032     0.999980
      14.2000     0.305000  0.030     0.999970
      14.2000     0.306000  0.028     0.999950

Updated code that also outputs the line following a match (the 0.03 row here):

threshold = 0.05
prev_above = False  # was the previous data row >= threshold?

# Copy header comments, every data row whose transmission (column 3) is
# >= threshold, and the one row immediately after each such row (Python 3).
with open('filter_2.txt') as fi, open('filter_3.txt', 'w') as fo:
    for line in fi:
        split = line.split()
        if not split:  # skip blank lines
            continue
        if split[0].startswith('#'):
            keep = True  # header comments are always kept
        else:
            above = float(split[2]) >= threshold
            keep = above or prev_above
            prev_above = above
        if keep:
            print(line, end='')
            fo.write(line)

# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.304000  0.06     0.999980
      14.2000     0.305000  0.08     0.999970
      14.2000     0.306000  0.2     0.999950
      14.2000     0.307000  0.4     0.999910
      14.2000     0.308000  0.8     0.999860
      14.2000     0.309000  0.9     0.999960
      14.2000     0.310000  0.8     0.999990
      14.2000     0.311000  0.4     0.999980
      14.2000     0.312000  0.2     0.999960
      14.2000     0.313000  0.06     0.999940
      14.2000     0.314000  0.03     0.999930
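The updated code keeps every row at or above 0.05, plus the single row after each such row. The question asks for something stricter: the contiguous block from the first row >= 0.05 through the row after the last one. That block can also be sliced directly from the raw lines, which leaves the header comments and the original number formatting untouched. The sketch below works under those assumptions. `extract_block` is a hypothetical helper, and the sample is inlined so it runs without the input file.

```python
def extract_block(lines, col=2, threshold=0.05):
    """Hypothetical helper: keep '#' header lines, then the data lines from the
    first one whose column `col` (0-based) is >= threshold through the line
    right after the last such one. Lines are returned unchanged."""
    header = [ln for ln in lines if ln.lstrip().startswith('#')]
    data = [ln for ln in lines
            if ln.strip() and not ln.lstrip().startswith('#')]
    hits = [i for i, ln in enumerate(data)
            if float(ln.split()[col]) >= threshold]
    if not hits:
        return header
    return header + data[hits[0]:hits[-1] + 2]


# Sample file from the question, inlined for the example.
SAMPLE = """\
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.300000  0.01     0.999920
      14.2000     0.301000  0.02     0.999960
      14.2000     0.302000  0.03     0.999980
      14.2000     0.303000  0.04     0.999980
      14.2000     0.304000  0.06     0.999980
      14.2000     0.305000  0.08     0.999970
      14.2000     0.306000  0.2     0.999950
      14.2000     0.307000  0.4     0.999910
      14.2000     0.308000  0.8     0.999860
      14.2000     0.309000  0.9     0.999960
      14.2000     0.310000  0.8     0.999990
      14.2000     0.311000  0.4     0.999980
      14.2000     0.312000  0.2     0.999960
      14.2000     0.313000  0.06     0.999940
      14.2000     0.314000  0.03     0.999930
      14.2000     0.315000  0.02     1.00000
      14.2000     0.316000  0.01     1.00000
"""

result = extract_block(SAMPLE.splitlines(keepends=True))
print(''.join(result), end='')
```

With real files, the call would be along the lines of `fo.writelines(extract_block(fi.readlines()))` inside `with open('filter_2.txt') as fi, open('output.csv', 'w') as fo:`. The two approaches behave differently when the data has two separate peaks above 0.05. This slice would also include the below-threshold rows between the peaks, whereas the answer's rule would skip them.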
