How to append selected columns to a pandas DataFrame from DataFrames with different columns
Problem description
I want to append df1, df2, and df3 into a single df_All, but each DataFrame has different columns. How can I do this in a for loop (I have other things to do in the loop as well)?
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
list = ['df1','df2','df3']
df_All = pd.DataFrame()
for i in list:
    # doing something else as well ---
    df_All = df_All.append(i)
I want df_All to contain only columns A and B. Is there a way to do this in the loop above, something like appending only these two columns?
Recommended answer
If I understand what you want, you need to select just columns 'A' and 'B' from df3 and then use pd.concat:
In [35]:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1,df2,df3[['A','B']]]
pd.concat(df_list, ignore_index=True)
Out[35]:
    A  B
0   1  4
1   2  5
2   3  6
3   8  5
4   9  6
5  10  7
6   1  4
7   2  5
8   3  7
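Since the question asks for a loop, the same idea can be sketched as a loop that collects the selected columns in a list and concatenates once at the end. This is only a minimal sketch, not part of the original answer: the names wanted_cols and pieces are made up for illustration, and the frames are built with the plain dict constructor.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

wanted_cols = ['A', 'B']  # hypothetical name: the columns to keep
pieces = []               # hypothetical name: one selection per frame

for df in [df1, df2, df3]:
    # ... any other per-frame work would go here ...
    pieces.append(df[wanted_cols])  # keep only A and B, in that order

# Concatenate once after the loop instead of growing a frame inside it.
df_All = pd.concat(pieces, ignore_index=True)
```

Collecting the pieces and calling pd.concat once avoids copying the accumulated frame on every iteration.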
Note that this line in your original code is poor practice:
list = ['df1','df2','df3']
This shadows the built-in type list. In addition, even if you had used a valid variable name such as df_list, you have created a list of strings, not a list of DataFrames.
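If the names are needed inside the loop (for logging, say), one option is to keep the frames in a dict keyed by name rather than in a list of strings. A minimal sketch with two stand-in frames; the names frames and shapes are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})

# Map each name to the actual DataFrame object, not to a string.
frames = {'df1': df1, 'df2': df2}

shapes = {}
for name, df in frames.items():
    # Here df is a real DataFrame, so DataFrame attributes are available.
    shapes[name] = df.shape
```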
If you want to determine the common columns, you can use the Index.intersection method on the columns:
In [39]:
common_cols = df1.columns.intersection(df2.columns).intersection(df3.columns)
common_cols
Out[39]:
Index(['A', 'B'], dtype='object')
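For an arbitrary number of frames, the chained intersection could be folded with functools.reduce and the result used to select columns before concatenating. This is a sketch under the same example data, not the answer's own code; the name frames is hypothetical:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

frames = [df1, df2, df3]

# Fold Index.intersection over the column indexes of all frames.
common_cols = reduce(
    lambda left, right: left.intersection(right),
    (df.columns for df in frames),
)

# Keep only the shared columns from each frame, then stack the rows.
df_All = pd.concat([df[common_cols] for df in frames], ignore_index=True)
```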