How to denormalize YAML for a Pandas DataFrame?
Problem description
I am trying to get data from a YAML file into a Pandas DataFrame. Take the following example, data.yml:
---
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
...
The desired DataFrame would look like this:
     doc  reviews.reviewer  reviews.stars
0  Book1              Paul              5
1  Book1               Sam              2
2  Book2              John              4
3  Book2               Sam              3
4  Book2              Pete              2
I've tried feeding the YAML data to Pandas in different ways (for example, with open('data.yml') as f: data = pd.DataFrame(yaml.load(f))), but the cells always contain the nested dicts. This solution works for general JSON data, but it's quite a bit of code, and it seems like a simpler solution for YAML might exist.
Is there a built-in or Pythonic way to denormalize YAML for conversion to a Pandas DataFrame in this way?
Recommended answer
You should use json_normalize to flatten the nested records after loading the YAML:
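# f is the open data.yml file; pd.json_normalize needs pandas >= 1.0 (older versions: pd.io.json.json_normalize)
pd.json_normalize(yaml.safe_load(f), 'reviews', 'doc')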
  reviewer stars    doc
0     Paul     5  Book1
1      Sam     2  Book1
2     John     4  Book2
3      Sam     3  Book2
4     Pete     2  Book2
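
For completeness, here is a minimal, self-contained sketch of the same approach that also reproduces the column names and order asked for in the question. It assumes pandas 1.0 or later and PyYAML are installed. The names YAML_TEXT, records, and df are illustrative only, and the YAML is embedded as a string purely so the snippet runs without a data.yml file on disk. The final cast to int is optional: the stars values are quoted in the YAML, so they load as strings.

```python
import pandas as pd
import yaml

# Inline copy of the example data.yml (assumption: embedded here only so
# the sketch is runnable without a file on disk).
YAML_TEXT = """
---
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
...
"""

# safe_load parses plain data only and needs no explicit Loader argument.
records = yaml.safe_load(YAML_TEXT)

# One output row per entry in each "reviews" list; "doc" is repeated as
# metadata. record_prefix adds "reviews." to the per-review column names.
df = pd.json_normalize(
    records,
    record_path="reviews",
    meta="doc",
    record_prefix="reviews.",
)

# Put "doc" first to match the layout requested in the question.
df = df[["doc", "reviews.reviewer", "reviews.stars"]]

# The stars are quoted in the YAML, so they arrive as strings; cast if
# numeric values are wanted.
df["reviews.stars"] = df["reviews.stars"].astype(int)

print(df)
```

When reading from disk instead, replacing yaml.safe_load(YAML_TEXT) with a with open('data.yml') as f: block that calls yaml.safe_load(f) should give the same records.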