Use list comprehension to create a list of tuples for two different conditionals

Question

Is there a way to use a list comprehension to create a list of tuples based on two different conditions?

I am working with a pandas DataFrame and want to return an entire row as a tuple if it matches either of two conditions. The first is that the row has a NaN value in any column. The second is that the column called ODFS_FILE_CREATE_DATETIME does not match the regex pattern for the date. The date column is supposed to hold a value that looks like 2005242132, that is, exactly 10 digits. So if the DataFrame contains something like 2004dg, it should be picked up as an error and the row should be added to my list of tuples.

My sad attempt:

[tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]

Full function that contains the two separate lists of tuples:

def process_csv_formatting(csv):
    odfscsv_df = pd.read_csv(csv, header=None,names=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
    odfscsv_df['CSV_FILENAME'] = csv.name
    odfscdate_re = re.compile(r"\d{10}")
    #print(odfscsv_df)
    #odfscsv_df = odfscsv_df.replace('', np.nan)
    errortup = [(odfsname, "Bad_ODFS_FILE_CREATE_DATETIME= " + str(cdatetime), csv.name) for odfsname,cdatetime in zip(odfscsv_df['ODFS_LOG_FILENAME'], odfscsv_df['ODFS_FILE_CREATE_DATETIME']) if not odfscdate_re.search(str(cdatetime))]
    emptypdf = pd.DataFrame(columns=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
 
    print([tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values])

    [tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]
    #print(odfscsv_df[(odfscsv_df[column_name].notnull()) & (odfscsv_df[column_name] != u'')].index)
    for index, row in odfscsv_df.iterrows():
        #print((row['WAFER_SCRIBE']))
        print((row['ODFS_FILE_CREATE_DATETIME']))
    #errortup = [x for x in odfscsv_df['ODFS_FILE_CREATE_DATETIME']]
    if len(errortup) != 0:
        #print(errortup)  #put this in log file statement somehow
        #print(errortup[0][2])
        return emptypdf
    else:

        return odfscsv_df

Sample CSV data. The commas delimit the cells:

2005091432_943SK1J.00J.SK1J-23.FPD.FMGN520.Jx6D36ny5EO53qAtX4.log,,W943SK10,MGN520,0Z0RK072TCD2
2005230137_014SF1J.00J.SF1J-23.WCPC.FMGN520.XlwHcgyP5eFCpZm5cf.log,,W014SF10,MGN520,DM4MU129SEC1
2005240909_001914J.E0J.914J-15.WRO3PC.FMGN520.nZKn7OvjGKw1i4pxiu.log,,K001914E,MGN520,DM3FZ226SEE3
2005242132_001914J.E0J.914J-15.WRO4PC.FMGN520.V8dcLhEgygRj2rP2Df.log,2005242132,K001914E,MGN520,DM3FZ226SEE3
2005251037_001914J.E0J.914J-15.WRO4PC.FMGN520.dyixmQ5r4SvbDFkivY.log,2005251037,K001914E,MGN520,DM3FZ226SEE3
2005251215_949949J.E0J.949J-21.WRO2PP.FMGN520.yp1i4e7a7D1ighkdB7.log,2005251215,K949949E,MGN520,DG2KV122SEF6
2005251231_949949J.E0J.949J-25.WRO2PP.FMGN520.oLQGhc2whAlhC3dSuR.log,2005251231,K949949E,MGN520,DG2KV333SEF3
2005260105_001914J.E0J.914J-15.WRO4PC.FMGN520.wOQMUOfZgkQK9iHJS5.log,2005260105,K001914E,MGN520,DM3FZ226SEE3
2006111130_950909J.00J.909J-22.FPC.FMGN520.UuqeGtw9xP6lLDUW9N.log,2006111130,K9509090,MGN520,DG7LW031SEE7
2006111612_950909J.00J.909J-22.FPC.FMGN520.hoDl3QSNPKhcs4oA2N.log,2006111612,K9509090,MGN520,DG7LW031SEE7
2006120638_006914J.E0J.914J-15.CZPC.FMGN520.qCgFUH2H21ieT641i9.log,2006120638,K006914E,MGN520,DM8KJ568SEC3
2006122226_006914J.E0J.914J-15.CZPC.FMGN520.nSHSp7klxjrQlVTcCu.log,2006122226,K006914E,MGN520,DM8KJ568SEC3
2006130919_006914J.E0J.914J-15.CZPC.FMGN520.Zd6DrMUsCjuEVBFwvn.log,2006130919,K006914E,MGN520,DM8KJ568SEC3
2006140457_007911J.E0J.911J-25.RDR2PC.FMGN520.QPX9r59TnXObXyfibv.log,2006140457,K007911E,MGN520,DN4AU351SED1
2006141722_007911J.E0J.911J-25.WCPC.FMGN520.dNQLkvQlPTplEjJspB.log,2006141722,K007911E,MGN520,DN4AU351SED1
2006160332_007911J.E0J.911J-25.WCPC.FMGN520.DQiH82Ze9fCoaLVbDE.log,2006160332,K007911E,MGN520,DN4AU351SED1
2006170539_007911J.E0J.911J-25.WCPC.FMGN520.TjakhXkmhmlGhfLheo.log,2006170539,K007911E,MGN520,DN4AU351SED1

Recommended answer

Add the dtype parameter when you call read_csv so that 'ODFS_FILE_CREATE_DATETIME' is imported as a string, then build one boolean mask per check and combine the masks with |:

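odfscsv_df = pd.read_csv(csv, header=None,
                         names=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'],
                         dtype={'ODFS_FILE_CREATE_DATETIME': str})

# Rows with a missing value in any column
m1 = odfscsv_df.isna().any(axis=1)
s = odfscsv_df['ODFS_FILE_CREATE_DATETIME']
# Rows whose datetime value is not purely numeric
m2 = ~s.astype(str).str.isnumeric()
# Rows whose datetime value is not exactly 10 characters long
m3 = s.astype(str).str.len().ne(10)

# Keep the rows that fail any check and convert each one to a tuple
[tuple(x) for x in odfscsv_df[m1 | m2 | m3].values]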
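
For anyone who wants to try the idea without the original files, here is a minimal, self-contained sketch. It is not the answerer's exact code: the four-row CSV, the `COLUMNS` constant, and the `find_bad_rows` helper are hypothetical names made up for illustration, and the numeric and length checks are swapped for a single `Series.str.fullmatch` call with the question's `\d{10}` pattern. That assumes a pandas version that provides `str.fullmatch`.

```python
import io

import pandas as pd

# Hypothetical sample data: shortened rows in the same shape as the question's CSV.
SAMPLE_CSV = """\
a.log,,W943SK10,MGN520,0Z0RK072TCD2
b.log,2005242132,K001914E,MGN520,DM3FZ226SEE3
c.log,2004dg,K001914E,MGN520,DM3FZ226SEE3
d.log,2005251037,K949949E,MGN520,
"""

COLUMNS = ['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE']


def find_bad_rows(csv_source):
    """Return rows that have a NaN anywhere or a malformed create datetime."""
    df = pd.read_csv(csv_source, header=None, names=COLUMNS,
                     dtype={'ODFS_FILE_CREATE_DATETIME': str})

    # Condition 1: any column in the row is missing.
    has_nan = df.isna().any(axis=1)

    # Condition 2: the datetime is not exactly 10 digits.
    # na=False makes missing values count as "does not match".
    good_date = df['ODFS_FILE_CREATE_DATETIME'].str.fullmatch(r"\d{10}", na=False)

    bad = df[has_nan | ~good_date]
    # itertuples(name=None) yields plain tuples, one per row.
    return list(bad.itertuples(index=False, name=None))


bad_rows = find_bad_rows(io.StringIO(SAMPLE_CSV))
for row in bad_rows:
    print(row)
```

One design note: `fullmatch` requires the whole string to match, whereas `re.search(r"\d{10}", ...)` as used in the question would also accept longer strings that merely contain ten consecutive digits.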
