pandas | Read json file with list/array-like fields to Boolean columns
Question
Here is a JSON string containing a list of objects, each of which embeds another list.
[
{
"name": "Alice",
"hobbies": [
"volleyball",
"shopping",
"movies"
]
},
{
"name": "Bob",
"hobbies": [
"fishing",
"movies"
]
}
]
Using pandas.read_json(), this turns into a DataFrame like this:
   name    hobbies
--------------------------------------
1  Alice   [volleyball, shopping, movies]
2  Bob     [fishing, movies]
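For reference, the loading step described above can be sketched as follows. This is a minimal example that assumes the JSON is held in a string (the variable name raw is made up for illustration); wrapping it in io.StringIO avoids passing a literal JSON string directly to read_json, which newer pandas versions warn about.

```python
import io
import pandas as pd

# Hypothetical variable holding the JSON text from the question.
raw = """
[
  {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
  {"name": "Bob", "hobbies": ["fishing", "movies"]}
]
"""

# read_json accepts a file-like object; each "hobbies" cell stays a Python list.
df = pd.read_json(io.StringIO(raw))
print(df)
```

Note that pandas numbers the rows from 0 by default, not from 1 as in the table above.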
However, I would like to flatten the lists into Boolean columns like this:
   name    volleyball  shopping  movies  fishing
----------------------------------------------------
1  Alice   True        True      True    False
2  Bob     False       False     True    True
I.e. when the list contains a value, the field in the corresponding column is filled with the Boolean True, otherwise with False.
I have also looked into pandas.io.json.json_normalize(), but that does not seem to support this idea either. Is there any built-in way (either Python3 or pandas) to do this?
(PS. I realize that you could write your own code to "normalize" the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel, and probably in a very inefficient way.)
Recommended answer
You can use crosstab, then cast the counts to bool with astype:
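import pandas as pd
# data is the parsed JSON list from the question; pd.json_normalize needs pandas >= 1.0
df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0: 'hobby'})
print(df)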
        hobby   name
0  volleyball  Alice
1    shopping  Alice
2      movies  Alice
3     fishing    Bob
4      movies    Bob
print(pd.crosstab(df['name'], df['hobby']).astype(bool))
hobby  fishing  movies  shopping  volleyball
name
Alice    False    True      True        True
Bob       True    True     False       False
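Put together, the approach can be sketched end to end as below. This is a self-contained illustration, not the answerer's exact code: it assumes pandas 1.0 or later (where json_normalize is available as pd.json_normalize), and the names raw, long_df and flags are made up for the example.

```python
import json
import pandas as pd

# Hypothetical variable holding the JSON text from the question.
raw = """
[
  {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
  {"name": "Bob", "hobbies": ["fishing", "movies"]}
]
"""
data = json.loads(raw)

# One row per (hobby, name) pair; the list values land in a column labelled 0.
long_df = pd.json_normalize(data, record_path="hobbies", meta=["name"])
long_df = long_df.rename(columns={0: "hobby"})

# crosstab counts each (name, hobby) pair; astype(bool) turns counts into flags.
flags = pd.crosstab(long_df["name"], long_df["hobby"]).astype(bool)
print(flags)
```

The hobby columns come out in alphabetical order. One thing to watch for: an object whose hobbies list is empty produces no rows in long_df, so it would be missing from flags entirely.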