Pandas: control new column names when merging two dataframes?
Question
I would like to merge two Pandas dataframes and control the names of the new columns.
I originally created the dataframes from CSV files. The original CSV files looked like this:
# presents.csv
org,name,items,spend...
12A,Clerkenwell,151,435,...
12B,Liverpool Street,37,212,...
...
# trees.csv
org,name,items,spend...
12A,Clerkenwell,0,0,...
12B,Liverpool Street,2,92,...
...
Now I have two dataframes:
df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))
I want to merge them to get a final dataframe, joining on the org and name values, and then adding an appropriate prefix to all other columns:
org,name,presents_items,presents_spend,trees_items,trees_spend...
12A,Clerkenwell,151,435,0,0,...
12B,Liverpool Street,37,212,2,92,...
I've been reading the documentation on merging and joining. This seems to merge correctly and produce the right number of columns:
aggregate_data = pd.merge(df_presents, df_trees,
                          on=['org', 'name'],
                          how='outer')
But then doing print list(aggregate_data.columns.values) shows me the following columns:
['org', u'name', u'spend_x', u'spend_y', u'items_x', u'items_y'...]
How can I rename spend_x to presents_spend, and so on?
Recommended answer
The suffixes option of the merge function does this. The default is suffixes=('_x', '_y').
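A minimal sketch of the suffixes option follows. The two small frames are built inline as stand-ins for the CSV files in the question (the values come from the sample rows; the inline construction is an assumption). Note that suffixes are appended to the column name, so this yields spend_presents rather than presents_spend.

```python
import pandas as pd

# Stand-in frames built from the sample rows in the question (assumption:
# the real data comes from presents.csv and trees.csv).
df_presents = pd.DataFrame({
    "org": ["12A", "12B"],
    "name": ["Clerkenwell", "Liverpool Street"],
    "items": [151, 37],
    "spend": [435, 212],
})
df_trees = pd.DataFrame({
    "org": ["12A", "12B"],
    "name": ["Clerkenwell", "Liverpool Street"],
    "items": [0, 2],
    "spend": [0, 92],
})

# Overlapping non-key columns get the given suffixes instead of _x / _y.
merged = pd.merge(
    df_presents,
    df_trees,
    on=["org", "name"],
    how="outer",
    suffixes=("_presents", "_trees"),
)
print(list(merged.columns))
```

The join keys org and name keep their names; only the columns that appear in both frames receive a suffix.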
In general, columns can be renamed with the rename method.
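If a prefix is needed rather than a suffix, one possible approach is to pass a function to rename after merging. The sketch below assumes the default _x and _y suffixes and uses inline stand-in frames built from the question's sample rows; the helper name prefix_from_suffix and the mapping dictionary are hypothetical and not part of pandas.

```python
import pandas as pd

# Stand-in frames built from the sample rows in the question (assumption).
df_presents = pd.DataFrame({
    "org": ["12A", "12B"],
    "name": ["Clerkenwell", "Liverpool Street"],
    "items": [151, 37],
    "spend": [435, 212],
})
df_trees = pd.DataFrame({
    "org": ["12A", "12B"],
    "name": ["Clerkenwell", "Liverpool Street"],
    "items": [0, 2],
    "spend": [0, 92],
})

# Merge with the default suffixes, giving items_x, spend_x, items_y, spend_y.
merged = pd.merge(df_presents, df_trees, on=["org", "name"], how="outer")


def prefix_from_suffix(column, suffix_to_prefix):
    """Hypothetical helper: turn a trailing suffix into a leading prefix."""
    for suffix, prefix in suffix_to_prefix.items():
        if column.endswith(suffix):
            return prefix + column[: -len(suffix)]
    return column  # key columns such as org and name are left unchanged


mapping = {"_x": "presents_", "_y": "trees_"}
renamed = merged.rename(columns=lambda c: prefix_from_suffix(c, mapping))
print(list(renamed.columns))
```

rename also accepts a plain dictionary, for example {'spend_x': 'presents_spend'}, which may be simpler when only a few columns are involved.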