Trying to merge 2 dataframes but getting a ValueError

Problem description

These are my two dataframes saved in two variables:

    print(df.head())

          club_name  tr_jan  tr_dec  year
    0  ADO Den Haag    1368    1422  2010
    1  ADO Den Haag    1455    1477  2011
    2  ADO Den Haag    1461    1443  2012
    3  ADO Den Haag    1437    1383  2013
    4  ADO Den Haag    1386    1422  2014

    print(ranking_df.head())

          club_name  ranking  year
    0  ADO Den Haag       12  2010
    1  ADO Den Haag       13  2011
    2  ADO Den Haag       11  2012
    3  ADO Den Haag       14  2013
    4  ADO Den Haag       17  2014

I'm trying to merge these two using this code:

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I added how='left' because my ranking_df has fewer data points than my standard df.
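For readers unsure what how='left' does with the missing data points, here is a minimal sketch on hypothetical toy data (the 2012 row exists only in df; none of these values come from the question). Every row of df is kept, and rows without a matching (club_name, year) pair get NaN in ranking.

```python
import pandas as pd

# Hypothetical toy data: df has a 2012 row that ranking_df lacks.
df = pd.DataFrame({
    "club_name": ["ADO Den Haag"] * 3,
    "tr_jan": [1368, 1455, 1461],
    "tr_dec": [1422, 1477, 1443],
    "year": [2010, 2011, 2012],
})
ranking_df = pd.DataFrame({
    "club_name": ["ADO Den Haag"] * 2,
    "ranking": [12, 13],
    "year": [2010, 2011],
})

# how="left" keeps every row of df, in df's order. Rows whose
# (club_name, year) pair has no match in ranking_df get NaN in
# "ranking", which also turns that column into float64.
new_df = df.merge(ranking_df, on=["club_name", "year"], how="left")
print(new_df)
```

Note that the NaN forces ranking to a float column; if you need integers back, a nullable type such as "Int64" is one option.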

The expected behaviour is this:

    print(new_df.head())

          club_name  tr_jan  tr_dec  year  ranking
    0  ADO Den Haag    1368    1422  2010       12
    1  ADO Den Haag    1455    1477  2011       13
    2  ADO Den Haag    1461    1443  2012       11
    3  ADO Den Haag    1437    1383  2013       14
    4  ADO Den Haag    1386    1422  2014       17

But I get this error:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
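The error names the dtypes of the key columns, so comparing them is the quickest way to find the culprit. Below is a minimal reproduction on hypothetical one-row frames. It assumes that year arrived as strings in df, which is what the error message suggests; the asker's actual data_points are not shown.

```python
import pandas as pd

# Hypothetical reproduction: "year" holds strings in df but
# integers (int64) in ranking_df.
df = pd.DataFrame({"club_name": ["ADO Den Haag"], "tr_jan": [1368],
                   "tr_dec": [1422], "year": ["2010"]})
ranking_df = pd.DataFrame({"club_name": ["ADO Den Haag"], "ranking": [12],
                           "year": [2010]})

# Comparing the dtypes of the key columns is the quickest diagnosis.
print(df["year"].dtype, ranking_df["year"].dtype)

error = None
try:
    df.merge(ranking_df, on=["club_name", "year"], how="left")
except ValueError as exc:
    error = exc  # pandas refuses to match string keys against int64 keys
print("merge failed:", error)
```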

But I do not want to use concat, since I want to merge the two dataframes, not just append one to the other.

Another behaviour I find strange: the code works if I save the first df to a .csv file and then load that .csv back into a dataframe.

The code:

import pandas as pd

# data_points and rankings are the asker's own lists (not shown here)
df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
df.to_csv('preliminary.csv')

# Reloading the CSV is what makes the merge below succeed
df = pd.read_csv('preliminary.csv', index_col=0)

ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')
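A likely reason the CSV round trip "fixes" the merge is type inference, not index_col=0. Here is a sketch under the assumption that year held strings in the original df. It uses an in-memory io.StringIO buffer (an illustrative substitute for preliminary.csv) to show what to_csv/read_csv do to that column.

```python
import io

import pandas as pd

# Hypothetical frame whose "year" column holds strings, not integers.
df = pd.DataFrame({"club_name": ["ADO Den Haag", "ADO Den Haag"],
                   "year": ["2010", "2011"]})

# Round-trip through an in-memory buffer instead of preliminary.csv.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
reloaded = pd.read_csv(buf, index_col=0)

# read_csv parses plain text and infers column types, so "year" comes back
# as an integer column. index_col=0 only turns the first (unnamed) column,
# the old index, back into the index.
print(df["year"].dtype, "->", reloaded["year"].dtype)
```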

I think it has to do with the index_col=0 parameter, but I have no idea how to fix it without saving the file first. It doesn't matter much, but it is an annoyance to have to do that.

Recommended answer

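In one of your dataframes year is a string (dtype object), and in the other it is an int64. Convert it first and then join, e.g. df['year'] = df['year'].astype(int), or the attribute form RafaelC suggested, df.year.astype(int), assigned back to the column. This most likely also explains the CSV workaround: read_csv parses the text and infers an integer type for year, so after the round trip both key columns match. index_col=0 only restores the saved index.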
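A minimal sketch of that fix on hypothetical two-row frames, assuming df's year column holds strings such as "2010". The conversion happens in memory, so the CSV round trip is no longer needed.

```python
import pandas as pd

# Hypothetical data: "year" holds strings in df and int64 in ranking_df.
df = pd.DataFrame({"club_name": ["ADO Den Haag"] * 2,
                   "tr_jan": [1368, 1455], "tr_dec": [1422, 1477],
                   "year": ["2010", "2011"]})
ranking_df = pd.DataFrame({"club_name": ["ADO Den Haag"] * 2,
                           "ranking": [12, 13], "year": [2010, 2011]})

# Align the key dtypes before merging: cast the string years to int.
df["year"] = df["year"].astype(int)

new_df = df.merge(ranking_df, on=["club_name", "year"], how="left")
print(new_df)
```

If some years might not parse cleanly, pd.to_numeric(df["year"], errors="coerce") is an alternative: it turns unparseable values into NaN instead of raising.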
