基于特定的索引字符串值选择记录,然后通过python删除后续字段 [英] Select records based on the specific index string value and then remove subsequent fields by python

查看:176
本文介绍了基于特定的索引字符串值选择记录,然后通过python删除后续字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个名为 file01.csv 的.csv文件,其中包含许多记录。有些记录是必需的,有些则不是。我发现所需的记录有一个字符串变量Mi,但它不存在于不必要的记录中。所以,我想根据字符串值Mi在每个记录的字段中选择所需的记录。

I have a .csv file named file01.csv that contains many records. Some records are required and some are not. I find that the required records has a string variable "Mi", but it is not exist into the unnecessary records. So, I want to select the required records based on string value "Mi" in the field for every records.

最后,我要从包含值Mi的字段中删除每个记录的后续字段。

Finally I want to delete the subsequent fields of each record from the field that contains value "Mi". Any suggestion and advice is appreciated.

可选:


  1. 另外,我要删除第一列。

  1. In addition, I want to delete the first column.

将列BB拆分为两列,名为a_id和c_id。用_(下划线)分隔值,左侧将转到a_id,右侧转到c_id。

Split column BB into two column named as a_id, and c_id. Separate the value by _ (underscore) and left side will go to a_id, and right side will go to c_id.

我的 fileO.csv 如下:

   AA      BB       CC       DD     EE      FF    GG
   1       1_1.csv  (=0      =10"   27"     =57   "Mi"
   0.97    0.9      0.8      NaN    0.9     od    0.2
   2       1_3.csv  (=0      =10"   27"     "Mi"  0.5
   0.97    0.5      0.8      NaN    0.9     od    0.4
   3       1_6.csv  (=0      =10"   "Mi"     =53  cnt
   0.97    0.9      0.8      NaN    0.9     od    0.6
   4       2_6.csv  No Bi    000    000     000   000
   5       2_8.csv  No Bi    000    000     000   000
   6       6_9.csv  less     000    000     000   000
   7       7_9.csv  s(=0     =26"   =46"    "Mi"  121     

我的预期结果文件(outFile.csv) / p>

My Expected results files (outFile.csv):

a_id    b_id    CC    DD    EE    FF    GG             
1       1       0     10    27    57              
1       3       0     10    27
1       6       0     10 
7       9       0     26    46  


推荐答案

以下方法应该可以使用Python csv 模块正常工作:

import csv
import re
import string

output_header = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']

sanitise_table = string.maketrans("","")
nodigits_table = sanitise_table.translate(sanitise_table, string.digits)

def find_mi(row):
    for index, col in enumerate(row):
        if col.find('Mi') != -1:
            return index
    return -1

def sanitise_cell(cell):
    return cell.translate(sanitise_table, nodigits_table)       # Keep digits

f_input = open('fileO.csv', 'rb')
f_output = open('outFile.csv', 'wb')

csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)

input_header = next(f_input)
csv_output.writerow(output_header)

for row in csv_input:
    #print '%2d  %s' % (len(row), row)  
    if len(row) >= 2:
        bb = re.match(r'(\d+)__(\d+).0\.csv', row[1])
        mi = find_mi(row)

        if bb and mi != -1:
            row[:] = row[:mi] + [''] * (len(row) - mi)
            row[:] = [sanitise_cell(col) for col in row]
            row[0] = bb.group(1)
            row[1] = bb.group(2)
            csv_output.writerow(row)

f_input.close()
f_output.close()

outFile.csv 将包含以下内容:

a_id,b_id,CC,DD,EE,FF,GG
1,1,0,10,27,57,
1,3,0,10,27,,
1,6,0,10,,,
7,9,0,26,46,,

使用Python 2.6.6测试

Tested using Python 2.6.6

这篇关于基于特定的索引字符串值选择记录,然后通过python删除后续字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆