How to create a new data frame based on conditions from another data frame
Question
Just getting into Python, so hopefully I'm not asking a stupid question here...
So I have a pandas dataframe named "df_complete" with, let's say, 100 rows, containing columns named "type", "writer", "status", "col a", and "col c". I want to create/update a new dataframe named "temp_df", built from "df_complete" values that meet certain conditions.
temp_df = pandas.DataFrame()
if ((df_complete['type'] == 'NDD') & (df_complete['writer'] == 'Mary') & (df_complete['status'] != '7')):
    temp_df['col A'] = df_complete['col a']
    temp_df['col B'] = 'good'
    temp_df['col C'] = df_complete['col c']
However, when I do this, I get the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I read this thread and changed my "and" to "&": Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I also read this thread and put everything in parentheses: comparing dtyped [float64] array with a scalar of type [bool] in Pandas DataFrame
But the error is still present. What is causing this, and how can I fix it?
** Follow-up question ** Also, how can I obtain the index values of the rows that met the condition?
Recommended answer
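I think you need boolean indexing. An if statement needs a single True/False value, but the combined condition is a boolean Series with one value per row, which is why pandas raises the ValueError. Use the condition to select rows instead.
With a sample: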
import pandas as pd

df_complete = pd.DataFrame({'type': ['NDD','NDD','NT'],
                            'writer':['Mary','Mary','John'],
                            'status':['4','5','6'],
                            'col a': [1,3,5],
                            'col b': [5,3,6],
                            'col c': [7,4,3]}, index=[3,4,5])
print (df_complete)
   col a  col b  col c status type writer
3      1      5      7      4  NDD   Mary
4      3      3      4      5  NDD   Mary
5      5      6      3      6   NT   John
# .ix was deprecated and later removed from pandas; .loc does the same label-based selection here
temp_df = df_complete.loc[(df_complete['type'] == 'NDD') &
                          (df_complete['writer'] == 'Mary') &
                          (df_complete['status'] != '7'), ['col a','col c']]
print (temp_df)
   col a  col c
3      1      7
4      3      4
temp_df = temp_df.rename(columns={'col a':'col A','col c':'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A','col B','col C']]
print (temp_df)
   col A col B  col C
3      1  good      7
4      3  good      4
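For the follow-up question about index values, one possible approach is to store the condition in a variable and use it to select from the index itself. The sketch below reuses the sample data from this answer; the names mask and matching_index are illustrative and do not come from the original post.

```python
import pandas as pd

# Same sample data as in the answer above.
df_complete = pd.DataFrame({'type': ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col b': [5, 3, 6],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

# Build the boolean mask once so it can be reused
# (the name `mask` is an assumption made for this example).
mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))

# Index labels of the rows where the condition holds.
matching_index = df_complete.index[mask]
print(matching_index.tolist())

# The filtered frame keeps the same labels, so its index gives the same answer.
temp_df = df_complete.loc[mask, ['col a', 'col c']]
print(temp_df.index.tolist())
```

Because boolean indexing preserves the original row labels, temp_df.index is usually enough; the separate matching_index is mainly useful when the labels are needed without building the new frame.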