Pandas DataFrame Comprehensions
Question
Example code:
I create a DataFrame called df with some pupil information:
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz',
                               'Maricopa', 'Yuma'])
Then I create a second DataFrame called df_extra, which has a string representation of the year:
extra_data = {'year': [2012, 2013, 2014],
              'yr_string': ['twenty twelve', 'twenty thirteen', 'twenty fourteen']}
df_extra = pd.DataFrame(extra_data)
Now, how can I add the yr_string values as a new column to df, wherever the numerical years match, in one line of code?
I can easily do this with a couple of for loops, but I would really like to know whether it is possible in one line, similar to a list comprehension.
I have already searched the questions on here, but found nothing that discusses adding a new column to an existing DataFrame from another DataFrame, based on a condition, in one line.
Recommended answer
You can merge the DataFrames on the year column:
df.merge(df_extra, how='left', on=['year'])
#     name  reports  year        yr_string
# 0  Jason        4  2012    twenty twelve
# 1  Molly       24  2012    twenty twelve
# 2   Tina       31  2013  twenty thirteen
# 3   Jake        2  2014  twenty fourteen
# 4    Amy        3  2014  twenty fourteen
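As a runnable sketch of the call above: the merge hands back a new DataFrame with a fresh 0 to 4 integer index, so the result has to be assigned to a name to be kept. The second half is an alternative that is not part of the original answer: it uses Series.map with a lookup Series to add the column to df directly while keeping df's original index. The names merged and year_to_string are illustrative only.

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz',
                               'Maricopa', 'Yuma'])

extra_data = {'year': [2012, 2013, 2014],
              'yr_string': ['twenty twelve', 'twenty thirteen', 'twenty fourteen']}
df_extra = pd.DataFrame(extra_data)

# merge returns a new DataFrame; df itself is left unchanged.
merged = df.merge(df_extra, how='left', on='year')
print('yr_string' in df.columns)   # df has not gained the column
print(merged.index.tolist())       # merging on a column resets the index

# Alternative (an assumption, not the answer's method): build a
# year -> yr_string lookup Series and map the year column through it.
# This assigns the new column on df directly and keeps df's index labels.
year_to_string = df_extra.set_index('year')['yr_string']
df['yr_string'] = df['year'].map(year_to_string)
print(df)
```

Years that have no match in df_extra would come out as NaN in both variants, since the merge is a left join and map returns NaN for missing keys.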
Basically, this says "pull the data from df_extra into df anywhere the year column matches in df". Note that this returns a new DataFrame; it does not modify df in place.
List comprehensions are still Python loops (that may not be entirely accurate, technically). With merge (DataFrame.merge() as used here, or the equivalent pandas.merge() function), you get to take advantage of the vectorized, optimized backend code that pandas uses to operate on its DataFrames, so it should be faster.
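One way to check the speed claim on your own machine is a small timing sketch like the one below. The synthetic data, the row count n, and the helper names with_comprehension and with_merge are all assumptions made for illustration. The timings depend on data size and pandas version, so no particular result is implied.

```python
import timeit

import pandas as pd

# Synthetic data; the size is an arbitrary choice for this sketch.
n = 100_000
years = [2012, 2013, 2014]
df = pd.DataFrame({'year': [years[i % 3] for i in range(n)]})
df_extra = pd.DataFrame({'year': years,
                         'yr_string': ['twenty twelve', 'twenty thirteen',
                                       'twenty fourteen']})


def with_comprehension():
    # Python-level loop: one dictionary lookup per row of df.
    lookup = dict(zip(df_extra['year'], df_extra['yr_string']))
    return [lookup[y] for y in df['year']]


def with_merge():
    # Left join on year; a left merge keeps the row order of df.
    return df.merge(df_extra, how='left', on='year')['yr_string'].tolist()


# Both approaches should produce the same values in the same order.
assert with_comprehension() == with_merge()

print('comprehension:', timeit.timeit(with_comprehension, number=5))
print('merge:        ', timeit.timeit(with_merge, number=5))
```

The comprehension here uses a dictionary lookup, which is about the fastest pure-Python variant. A version that filters df_extra once per row would be considerably slower.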