Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error
Question
There are a good number of questions about this error, but after looking around I still can't work out a solution. I'm trying to pivot a data frame of strings so that some row data becomes columns, but it isn't working so far.
Shape of my df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 515932 entries, 0 to 515931
Data columns (total 5 columns):
id 515932 non-null object
cc_contact_id 515932 non-null object
Network_Name 515932 non-null object
question 515932 non-null object
response_answer 515932 non-null object
dtypes: object(5)
memory usage: 23.6+ MB
Sample format
    id      contact_id  question           response_answer
16  137519  2206        State              Ca
17  137520  2206        State              Ca
18  137521  2206        State              Ca
19  137522  2206        State              Ca
20  137523  2208        City               Lancaster
21  137524  2208        City               Lancaster
22  137525  2208        City               Lancaster
23  137526  2208        City               Lancaster
24  137527  2208        Trip_End Location  Home
25  137528  2208        Trip_End Location  Home
26  137529  2208        Trip_End Location  Home
27  137530  2208        Trip_End Location  Home
What I would like to pivot to
id contact_id State City Trip_End Location
16 137519 2206 Ca None None None
20 137523 2208 None Lancaster None None
24 137527 2208 None None None Home
etc. etc.
The question values become the columns, with each response_answer in its corresponding column, and the ids are retained.
What I have tried
unified_df = pd.DataFrame(unified_data, columns=target_table_headers, dtype=object)
pivot_table = unified_df.pivot_table('response_answer',['id','cc_contact_id'],'question')
# OR
pivot_table = unified_df.pivot_table('response_answer','question')
DataError: No numeric types to aggregate
What is the way to pivot a data frame with string values?
Recommended answer
The default aggfunc in pivot_table is the mean (np.mean), which doesn't know what to do with strings, and you haven't properly indicated what the index should be. Try something like:
pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns='question',
                                     values='response_answer',
                                     aggfunc=lambda x: ' '.join(x))
This explicitly sets one row per id, contact_id pair and pivots the set of response_answer values on question. The aggfunc just ensures that if the raw data has multiple answers to the same question, they are concatenated with spaces. The syntax of pivot_table might vary depending on your pandas version.
Here is a simple example:
In [24]: import pandas as pd
In [25]: import random
In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})
In [27]: df.head()
Out[27]:
id question response
0 3100 1 116
1 4500 2 113
2 5000 1 120
3 3900 2 103
4 4300 0 117
In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id 100 non-null int64
question 100 non-null object
response 100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB
In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question 0 1 2 3
id
1000 110 120 NaN 100 NaN
1100 NaN 106 108 104 NaN
1200 104 113 119 NaN 101
1300 102 NaN 116 108 120
1400 NaN NaN 116 NaN
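Because the session above uses random data, its output changes on every run. A minimal deterministic sketch of the same approach follows. It assumes a small hand-written frame modeled on the question's sample; the second 'City' row with the answer 'Palmdale' is invented purely to show the join, and the final reset_index step is an optional addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical rows modeled on the question's sample. The duplicate
# (137523, 2208, 'City') row is invented to exercise the join.
df = pd.DataFrame({
    'id': ['137519', '137523', '137523', '137527'],
    'contact_id': ['2206', '2208', '2208', '2208'],
    'question': ['State', 'City', 'City', 'Trip_End Location'],
    'response_answer': ['Ca', 'Lancaster', 'Palmdale', 'Home'],
})

# One row per (id, contact_id) pair, one column per distinct question.
# The lambda joins multiple string answers for the same cell with spaces.
wide = df.pivot_table(index=['id', 'contact_id'],
                      columns='question',
                      values='response_answer',
                      aggfunc=lambda x: ' '.join(x))

# Optional: turn the index levels back into ordinary columns.
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

Cells with no answer for a given question come out as NaN rather than None. If only one answer per cell is ever expected, passing aggfunc='first' instead of the lambda is a possible alternative that keeps the first value without joining.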