How to merge three different DataFrames in pandas/Python with a mask condition?
Question
I am trying to merge three large DataFrames (1400,000 rows). Two of them are ordinary DataFrames, and the third is derived from a mask:
mask = (df['a'].lt(25) & df['a'].gt(10)) | df['b'].gt(0.2) | df['c'].gt(500)
df[mask] & df['e'].eq(0)
My sample data looks like this:
a    b    c     dt                  e  h i j k
35   0.1  234   2020/6/15 14:27:00  0  ........
1    0.1  554   2020/6/15 15:28:00  1  ........
2    0.2  654   2020/6/15 16:29:00  0  ........
23   0.4  2345  2020/6/15 17:26:00  0  ........
34   0.8  245   2020/6/15 18:25:00  0  ........
8    0.9  123   2020/6/15 18:26:00  0
7    0.1  22    2020/6/15 18:27:00  0
2    0.3  99    2020/6/15 18:28:00  0
219  0.2  17    2020/6/15 19:26:00  0
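For reference, combining a filtered DataFrame with a boolean Series via & is unlikely to be what was intended. A common pattern is to combine the boolean conditions first and index once. A minimal sketch using the sample rows above (only columns a, b, c and e are reproduced; the name `filtered` is introduced here for illustration and is not in the original post):

```python
import pandas as pd

# Sample rows from the question (columns dt, h, i, j, k omitted)
df = pd.DataFrame({
    "a": [35, 1, 2, 23, 34, 8, 7, 2, 219],
    "b": [0.1, 0.1, 0.2, 0.4, 0.8, 0.9, 0.1, 0.3, 0.2],
    "c": [234, 554, 654, 2345, 245, 123, 22, 99, 17],
    "e": [0, 1, 0, 0, 0, 0, 0, 0, 0],
})

# Same mask as in the question: 10 < a < 25, or b > 0.2, or c > 500
mask = (df["a"].lt(25) & df["a"].gt(10)) | df["b"].gt(0.2) | df["c"].gt(500)

# Combine both conditions into one boolean Series, then filter once
filtered = df[mask & df["e"].eq(0)]
print(filtered)
```

Both operands of & here are boolean Series with the same index, so the result can be used directly as a row selector.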
The code below produces too many useless and duplicated columns. Is there a way to merge three large DataFrames without them?
import pandas as pd
from functools import reduce
df1 = pd.read_csv('test1.csv')
df2 = pd.read_csv('test2.csv')
df = pd.read_csv('test.csv', usecols = ['a', 'b', 'c', 'dt', 'e'])
mask = (df['a'].lt(25) & df['a'].gt(10)) | df['b'].gt(0.2) | df['c'].gt(500)
df['x'] = mask.astype(int)
dfs = [df1, df2, df]
df_full = reduce(lambda left,right: pd.merge(left,right, on=['id']), dfs)
Recommended answer
Could you add some information about the DataFrames? Do they have the same columns, or does each frame have different columns and a different number of rows? I am not asking about the data itself, only its structure.
Also, I'm not entirely sure about this, but try including 'id' in the columns read from the third DataFrame, because it looks like you are trying to merge all three DataFrames on that column:
usecols = ['id', 'a', 'b', 'c', 'dt', 'e']
(Edit: I tried merging without 'id' and it didn't work for me.)
import pandas as pd

# Read the three semicolon-delimited CSV files
df1 = pd.read_csv('test1.csv', delimiter=";")
df2 = pd.read_csv('test2.csv', delimiter=";")
# Include 'id' so the third frame can be merged on it
df = pd.read_csv('test3.csv', delimiter=";", usecols=['id', 'a', 'b', 'c', 'dt', 'e'])

# Flag rows that match the mask condition: 1 if matched, 0 otherwise
mask = (df['a'].gt(10) & df['a'].lt(25)) | df['b'].gt(0.2) | df['c'].gt(500)
df['x'] = mask.astype(int)

# Merge the three frames on 'id', one pair at a time
df_merged = df1.merge(df2, on=['id'])
df_merged = df_merged.merge(df, on=['id'])
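The reduce-based merge from the question should behave the same way once every frame carries the key column. The sketch below uses small in-memory frames instead of CSV files; the column names p and q and all values are made up for illustration. It also shows the failure when 'id' is missing from one frame.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three CSV files
df1 = pd.DataFrame({"id": [1, 2, 3], "p": [10, 20, 30]})
df2 = pd.DataFrame({"id": [1, 2, 3], "q": ["x", "y", "z"]})
df3 = pd.DataFrame({"id": [1, 2, 3], "a": [35, 23, 7]})

# Fold the list into one frame by merging pairwise on 'id'
merged = reduce(lambda left, right: pd.merge(left, right, on="id"), [df1, df2, df3])
print(merged)

# Without the key column on one side, the merge raises KeyError
try:
    pd.merge(df1, df3.drop(columns="id"), on="id")
    missing_key_error = False
except KeyError:
    missing_key_error = True
print(missing_key_error)
```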
I tried this with my own CSV files:
- test1.csv has some columns (id, a, b, c)
- test2.csv has different columns (id, d, e, f)
- test3.csv has the same columns as in your question (id, a, b, c, dt, e)
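With a layout like the one listed above, the frames share non-key column names (a, b, c, e), and pandas disambiguates them with _x and _y suffixes, which may be the source of the duplicated columns mentioned in the question. One possible way around it, sketched here with made-up values and assuming only the flag column x is needed from the third frame, is to select the wanted columns before merging:

```python
import pandas as pd

# In-memory stand-ins with the column layout listed above (values are invented)
test1 = pd.DataFrame({"id": [1, 2], "a": [1, 2], "b": [0.1, 0.2], "c": [5, 6]})
test2 = pd.DataFrame({"id": [1, 2], "d": [7, 8], "e": [0, 1], "f": [9, 10]})
test3 = pd.DataFrame({
    "id": [1, 2],
    "a": [23, 7],
    "b": [0.4, 0.1],
    "c": [2345, 22],
    "dt": ["2020/6/15 17:26:00", "2020/6/15 18:27:00"],
    "e": [0, 0],
})

# Naive merge: overlapping names come back as a_x, a_y, b_x, b_y, ...
naive = test1.merge(test2, on="id").merge(test3, on="id")
print(list(naive.columns))

# Compute the flag on the third frame, then merge only 'id' and 'x'
mask = (test3["a"].gt(10) & test3["a"].lt(25)) | test3["b"].gt(0.2) | test3["c"].gt(500)
flagged = test3.assign(x=mask.astype(int))
slim = test1.merge(test2, on="id").merge(flagged[["id", "x"]], on="id")
print(list(slim.columns))
```

Dropping unneeded columns before the merge also keeps the intermediate frames smaller, which can matter at the row counts mentioned in the question. If both versions of an overlapping column are needed, the suffixes parameter of merge lets you give them more descriptive names.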