How to merge three different dataframes in pandas/Python with a mask condition?


Problem Description


I am trying to merge three different large dataframes (about 1,400,000 rows). Two of them are ordinary dataframes. The third comes from this mask, mask = (df['a'].lt(25) & df['a'].gt(10)) | df['b'].gt(0.2) | df['c'].gt(500), keeping the rows of df[mask] that also satisfy df['e'].eq(0). Here is my sample data:

a      b        c       dt                   e   h i j k
35   0.1      234   2020/6/15 14:27:00       0   ........
1    0.1      554   2020/6/15 15:28:00       1   ........
2    0.2      654   2020/6/15 16:29:00       0   ........
23   0.4      2345  2020/6/15 17:26:00       0   ........
34   0.8      245   2020/6/15 18:25:00       0   ........
8    0.9      123   2020/6/15 18:26:00       0
7    0.1      22    2020/6/15 18:27:00       0
2    0.3      99    2020/6/15 18:28:00       0
219  0.2      17    2020/6/15 19:26:00       0
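The sketch below checks the filter described above. It rebuilds the sample rows in memory, leaving out the elided columns h, i, j and k, and applies the mask together with e == 0. It assumes the intended condition is "the mask holds and e equals 0". The in-memory frame exists only for illustration.

```python
import pandas as pd

# Sample rows from the question; the elided columns h, i, j, k are omitted.
df = pd.DataFrame({
    "a": [35, 1, 2, 23, 34, 8, 7, 2, 219],
    "b": [0.1, 0.1, 0.2, 0.4, 0.8, 0.9, 0.1, 0.3, 0.2],
    "c": [234, 554, 654, 2345, 245, 123, 22, 99, 17],
    "dt": pd.to_datetime(
        ["2020/6/15 14:27:00", "2020/6/15 15:28:00", "2020/6/15 16:29:00",
         "2020/6/15 17:26:00", "2020/6/15 18:25:00", "2020/6/15 18:26:00",
         "2020/6/15 18:27:00", "2020/6/15 18:28:00", "2020/6/15 19:26:00"],
        format="%Y/%m/%d %H:%M:%S",
    ),
    "e": [0, 1, 0, 0, 0, 0, 0, 0, 0],
})

# 10 < a < 25, or b > 0.2, or c > 500 (lt/gt are strict comparisons).
mask = (df["a"].lt(25) & df["a"].gt(10)) | df["b"].gt(0.2) | df["c"].gt(500)

# Keep only the rows that match the mask AND have e == 0.
filtered = df[mask & df["e"].eq(0)]
print(filtered.index.tolist())  # [2, 3, 4, 5, 7]
```

Row 1 matches the mask (c = 554) but is dropped because e = 1. Row 8 (b = 0.2) is excluded because gt is strict.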


The code below produces many useless and duplicated columns. Is there a way to merge three different large dataframes?

import pandas as pd
from functools import reduce
df1 = pd.read_csv('test1.csv')
df2 = pd.read_csv('test2.csv')
# Only these five columns are read; note that 'id' is not among them.
df = pd.read_csv('test.csv', usecols = ['a', 'b', 'c', 'dt', 'e'])
# Flag rows where 10 < a < 25, or b > 0.2, or c > 500.
mask = (df['a'].lt(25) & df['a'].gt(10)) | df['b'].gt(0.2) | df['c'].gt(500)
df['x'] = mask.astype(int)
dfs = [df1, df2, df]
# Merge all three frames pairwise on the 'id' column.
df_full = reduce(lambda left,right: pd.merge(left,right, on=['id']), dfs)
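The reduce call in the code above folds the list from left to right. With three frames it performs two inner merges: first df1 with df2, then that result with df. The sketch below checks this on tiny hypothetical frames; frame_a, frame_b and frame_c are made up for illustration. Each frame has to contain the key column 'id', otherwise the merge fails.

```python
from functools import reduce

import pandas as pd

# Tiny hypothetical frames that all share the key column "id".
frame_a = pd.DataFrame({"id": [1, 2, 3], "p": [10, 20, 30]})
frame_b = pd.DataFrame({"id": [2, 3, 4], "q": ["x", "y", "z"]})
frame_c = pd.DataFrame({"id": [3, 2, 5], "r": [0.5, 0.25, 0.0]})

frames = [frame_a, frame_b, frame_c]

# reduce applies the lambda pairwise: merge(merge(frame_a, frame_b), frame_c).
folded = reduce(lambda left, right: pd.merge(left, right, on="id"), frames)

# The same result written out as explicit chained merges.
chained = frame_a.merge(frame_b, on="id").merge(frame_c, on="id")

print(folded)
```

Because merge defaults to an inner join, only the ids present in every frame (here 2 and 3) survive.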


Recommended Answer


Could you add some information about the dataframes? Do they have the same columns, or does each frame have different columns and a different number of rows? I am not asking about the data itself but about its structure.


Also, I'm not entirely sure about this, but try including 'id' in the columns read from the third dataframe, because it looks like you are trying to merge all three dataframes on that column.

usecols = ['id', 'a', 'b', 'c', 'dt', 'e']
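The sketch below shows why 'id' matters. It reads a small in-memory CSV twice with usecols, once without 'id' and once with it. The CSV text and the frame named other are hypothetical stand-ins for the real files. When the column named in on= is missing from one side, pandas raises a KeyError during the merge.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for test.csv.
CSV_TEXT = (
    "id,a,b,c,dt,e,h\n"
    "1,35,0.1,234,2020/6/15 14:27:00,0,9\n"
    "2,23,0.4,2345,2020/6/15 17:26:00,0,9\n"
)

other = pd.DataFrame({"id": [1, 2], "d": ["u", "v"]})

# Reading without 'id': the key column is simply not loaded.
without_id = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["a", "b", "c", "dt", "e"])
try:
    other.merge(without_id, on="id")
    failed = False
except KeyError:
    failed = True  # 'id' is absent from the right-hand frame

# Reading with 'id': the merge succeeds.
with_id = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["id", "a", "b", "c", "dt", "e"])
merged = other.merge(with_id, on="id")
print(failed, merged.columns.tolist())
```

usecols also keeps memory down on a 1,400,000-row file, because columns that are not listed are never loaded.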


(Edit: I tried merging without 'id' and it didn't work for me.)

import pandas as pd

# The answerer's test files use ';' as the delimiter.
df1 = pd.read_csv('test1.csv', delimiter=";")
df2 = pd.read_csv('test2.csv', delimiter=";")
# Include 'id' so that the third frame can be merged on it.
df = pd.read_csv('test3.csv', delimiter=";", usecols = ['id', 'a', 'b', 'c', 'dt', 'e'])

# 1 if the row matches the mask, else 0.
mask = (df['a'].gt(10) & df['a'].lt(25)) | df['b'].gt(0.2) | df['c'].gt(500)
df['x'] = mask.astype(int)

# Merge the three frames one pair at a time on 'id'.
df_merged = df1.merge(df2, on=['id'])
df_merged = df_merged.merge(df, on=['id'])


I tried with my own CSV files:


  • test1.csv has some columns (id, a, b, c)

  • test2.csv has different columns (id, d, e, f)

  • test3.csv has the same columns as in your question (id, a, b, c, dt, e)
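Consider these column layouts: test1.csv has id, a, b, c; test2.csv has id, d, e, f; test3.csv has id, a, b, c, dt, e. The first and third files share a, b and c, and the second and third share e. The second merge therefore has to tell those columns apart, and pandas appends the default suffixes _x and _y. This is one likely source of the "useless and duplicated columns" in the question.

The sketch below uses tiny hypothetical frames with those column names. It shows one option: drop the overlapping non-key columns from the right-hand frame before merging. This assumes those columns hold the same data on both sides, so check that before discarding anything.

```python
import pandas as pd

# Hypothetical frames with the column layout of test1.csv, test2.csv, test3.csv.
df1 = pd.DataFrame({"id": [1, 2], "a": [35, 23], "b": [0.1, 0.4], "c": [234, 2345]})
df2 = pd.DataFrame({"id": [1, 2], "d": [5, 6], "e": [0, 0], "f": [7, 8]})
df3 = pd.DataFrame({
    "id": [1, 2], "a": [35, 23], "b": [0.1, 0.4], "c": [234, 2345],
    "dt": ["2020/6/15 14:27:00", "2020/6/15 17:26:00"], "e": [0, 0],
})

# Naive chained merge: overlapping columns get the suffixes _x and _y.
naive = df1.merge(df2, on="id").merge(df3, on="id")
print(naive.columns.tolist())

# Drop non-key columns that already exist on the left, then merge.
left = df1.merge(df2, on="id")
overlap = [col for col in df3.columns if col in left.columns and col != "id"]
clean = left.merge(df3.drop(columns=overlap), on="id")
print(clean.columns.tolist())  # ['id', 'a', 'b', 'c', 'd', 'e', 'f', 'dt']
```

If the overlapping columns genuinely differ between files, keep them instead and pass descriptive suffixes to merge, for example suffixes=("_left", "_right").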
