Merge multiple pandas data frames in a dictionary if keys are strings/integers

Question

The data that I'm using looks like this:

csv1 = pd.DataFrame({'D': ['1-10', '2-10', '3-10', '4-10', ...],  # dates
                     'C': [#, #, #, #, ...]})                     # values

csv2 = pd.DataFrame({'D': ['3-10', '4-10', '5-10', '6-10', ...],  # dates
                     'C': [#, #, #, #, ...]})                     # values

csv3 = pd.DataFrame({'D': ['5-10', '6-10', '7-10', '8-10', ...],  # dates
                     'C': [#, #, #, #, ...]})                     # values
.
.
.
csv100 = pd.DataFrame({'D': ['5-10', '6-10', '7-10', '8-10', ...],  # dates
                       'C': [#, #, #, #, ...]})                     # values

I want a data frame like this:

df_merged = pd.DataFrame({'D': ['1-10', '2-10', '3-10', '4-10', '5-10', '6-10', ...],  # dates
                          'C1': [#, #, #, #, #, #, ...],    # values
                          'C2': [#, #, #, #, #, #, ...],    # values
                          'C3': [#, #, #, #, #, #, ...],    # values
                          .
                          .
                          .
                          'C100': [#, #, #, #, #, #, ...]})  # values

I have been trying to merge around 100 data frames that have the same columns but different rows (the rows are not in the same order). I would like to merge them on the date column ('D'), so that every row with the same date ends up on one row. Because the number of data frames is large and changes over time (today I could have 110, tomorrow I could have 90...), merging them one by one in a loop is too slow. While researching a solution, I found that the consensus is to use dictionaries. I applied this to my code, but I get an error that I don't know how to solve. The code is the following:

import pandas as pd
import subprocess
import os
from functools import reduce

path=r'C:\Users\ra\Desktop\Px\a' #Folder 'a' path

df = {} #Dictionary of data frames from csv files in Folder 'a'
x = [#vector that contains the name of the csv file as string]
i = 0
for j in range(len(x)):
    df['df%s' %j] = (pd.read_csv(os.path.join(path,r'%s.csv' % x[i]))) #Assigns a key to the data frame Ex.:'df1' (the key is a string and I think this is the problem)
    df['df%s' %j].rename(columns={'C': '%s' % x[i]}, inplace=True) #Renames the column 'C' of every data frame to the name of the file
    i += 1

df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['D'],how='outer'),df) #Merges every data frame to a single data frame 'df_merged' by column 'D' that represents the date.

The problem is in the last line, the output is the following:

---> df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['D'],how='outer'),df)
.
.
.
ValueError: can not merge DataFrame with instance of type <class 'str'>

If I change the key from string to integer (by changing the vector x to simple numbers 'j') I get the following output:

---> df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['D'],how='outer'),df)
.
.
.
ValueError: can not merge DataFrame with instance of type <class 'int'>

To make the code work, I tried to find a way to convert the string keys to variable names. But, apparently, that is a sin. Also, according to @AnkitMalik, the 'reduce' method can't be used with dictionaries. How can I merge all these data frames on the column 'D' in a pythonic way if the keys in the dictionary are strings/integers? Or, how can I build a dynamic list of data frames if their number changes over time depending on the number of csv files in folder 'a'?

Recommended answer

First of all, I want to thank everyone who helped me find a solution. This is my first time posting a question on Stack Overflow, and the experience has been very nice. I also want to thank @AnkitMalik and @NoticeMeSenpai, because their effort helped me find a very good solution.

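My question was about merging data frames stored in a dictionary {} by using functools.reduce(). But, as @AnkitMalik pointed out, this does not work on the dictionary itself: iterating over a dictionary yields its keys, which is why pd.merge received a str (or int) instead of a data frame. It does work on a list [] of data frames. @NoticeMeSenpai recommended using pandas.concat() instead. The code below is the one that works for me: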

import os

import pandas as pd

path = r'C:\Users\ra\Desktop\Px\a'  # folder 'a'; raw string so the backslashes are not read as escape sequences

df = []  # list of data frames
x = [...]  # placeholder: names of the csv files (without extension) as strings
for j in x:
    # Read each csv file in folder 'a', set the column 'D' as the index and
    # rename the column 'C' to the name of the csv file.
    df.append(pd.read_csv(os.path.join(path, '%s.csv' % j))
                .set_index('D')
                .rename(columns={'C': '%s' % j}))

df_concat = pd.concat(df, axis=1)  # concatenates the data frames column-wise, aligning rows on the index 'D'
df_concat.to_csv(os.path.join(path, 'xxx.csv'))  # saves the concatenated data frame as 'xxx.csv' in folder 'a'
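
For readers who would rather keep the dictionary, here is a minimal sketch of both routes on made-up data. It is not part of the original answer. The sample frames, the temporary folder and the use of glob to discover the files are assumptions made for illustration; in the real case the folder would be 'a'. It also assumes that each file lists every date at most once, since pd.concat(axis=1) aligns on the index.

```python
import glob
import os
import tempfile
from functools import reduce

import pandas as pd

# Hypothetical sample data standing in for the csv files in folder 'a'.
samples = {
    'csv1': pd.DataFrame({'D': ['1-10', '2-10', '3-10'], 'C': [1.0, 2.0, 3.0]}),
    'csv2': pd.DataFrame({'D': ['3-10', '4-10'], 'C': [30.0, 40.0]}),
    'csv3': pd.DataFrame({'D': ['4-10', '2-10'], 'C': [400.0, 200.0]}),
}

with tempfile.TemporaryDirectory() as folder:
    # Write the samples to disk so the loading step below mirrors the real case.
    for name, frame in samples.items():
        frame.to_csv(os.path.join(folder, name + '.csv'), index=False)

    # Build the dictionary from whatever csv files exist right now, so the
    # number of data frames can change from day to day without code changes.
    frames = {}
    for csv_path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        name = os.path.splitext(os.path.basename(csv_path))[0]
        frames[name] = pd.read_csv(csv_path).rename(columns={'C': name})

# Route 1: reduce over the dictionary's values (the data frames).
# Passing the dictionary itself would iterate over its keys instead.
merged = reduce(
    lambda left, right: pd.merge(left, right, on='D', how='outer'),
    frames.values(),
)

# Route 2: index every frame by 'D' and concatenate column-wise.
concatenated = pd.concat(
    [frame.set_index('D') for frame in frames.values()],
    axis=1,
)

print(merged)
print(concatenated)
```

Both routes keep every date that appears in any file and leave NaN where a file has no value for that date. The concat route needs a single call however many files there are, whereas the reduce route performs one merge per file.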
