Pandas DataFrame Comprehensions

Question

Sample code:

I create a DataFrame called df with some pupil information:

import pandas as pd

# Pupil records; the row labels are passed as the index
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz',
                               'Maricopa', 'Yuma'])

Then a second DataFrame called df_extra, which holds a string representation of each year:

extra_data = {'year': [2012, 2013, 2014],
       'yr_string': ['twenty twelve','twenty thirteen','twenty fourteen']}
df_extra = pd.DataFrame(extra_data)

How can I add the yr_string values to df as a new column, matched on the numerical year, in one line of code?

I can easily do this with a couple of for loops, but I would really like to know whether it can be done in one line, similar to a list comprehension.

I have already searched the questions on here, but found nothing that discusses adding a new column to an existing DataFrame from another DataFrame, based on a condition, in one line.

Recommended answer

You can merge the two DataFrames on the year column.

df.merge(df_extra, how='left', on=['year'])
#     name  reports  year        yr_string
# 0  Jason        4  2012    twenty twelve
# 1  Molly       24  2012    twenty twelve
# 2   Tina       31  2013  twenty thirteen
# 3   Jake        2  2014  twenty fourteen
# 4    Amy        3  2014  twenty fourteen

In effect, this says: "pull the data from df_extra into df wherever the year column matches". Note that merge returns a new DataFrame rather than modifying df in place, so assign the result to a variable if you want to keep it. The result also carries a fresh integer index, as the output above shows, rather than the original row labels of df.
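If the goal is to add the column to df itself and keep its row labels, one possible alternative is to look each year up with Series.map. This is a minimal sketch rather than part of the original answer; it assumes every year appears only once in df_extra, and year_lookup is a name introduced here purely for illustration.

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz',
                               'Maricopa', 'Yuma'])

extra_data = {'year': [2012, 2013, 2014],
              'yr_string': ['twenty twelve', 'twenty thirteen', 'twenty fourteen']}
df_extra = pd.DataFrame(extra_data)

# Hypothetical helper: a Series of yr_string values indexed by year.
# Assumes the years in df_extra are unique.
year_lookup = df_extra.set_index('year')['yr_string']

# Look up each row's year and store the result as a new column on df.
df['yr_string'] = df['year'].map(year_lookup)
```

Unlike the merge, this leaves the index of df untouched and modifies df directly. Any year missing from df_extra would come out as NaN.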

List comprehensions are still Python-level loops (that may not be a fully precise description). The pandas.merge() method instead runs on the vectorized, optimized backend code that pandas uses to operate on its DataFrames, so it should be faster.
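For comparison, the list-comprehension version the question has in mind might look like the sketch below. The lookup dict and the two result names are hypothetical, introduced only for this illustration. Both routes give the same values; the difference is that the comprehension visits each row in a Python loop.

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz',
                               'Maricopa', 'Yuma'])

extra_data = {'year': [2012, 2013, 2014],
              'yr_string': ['twenty twelve', 'twenty thirteen', 'twenty fourteen']}
df_extra = pd.DataFrame(extra_data)

# Hypothetical helper: a plain dict mapping each year to its string form.
lookup = dict(zip(df_extra['year'], df_extra['yr_string']))

# One-line list comprehension: each year is looked up in a Python-level loop.
via_comprehension = [lookup[year] for year in df['year']]

# The merge from the answer, reduced to the new column for comparison.
via_merge = df.merge(df_extra, how='left', on='year')['yr_string'].tolist()
```

On a frame this small any speed difference is negligible; the gap would only be expected to matter on large inputs.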
