Generate a series of plots with a pandas DataFrame
Problem description
I have to generate a series of scatter plots (roughly 100 in total).
I have created an example to illustrate the problem.
First, the imports.
import pandas as pd
Create a pandas DataFrame:
# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)
Output:
coverage_id name report_value
0 m1 Jason 4
1 m2 Jason 24
2 m3 Tina 31
3 m4 Tina 2
4 m5 Tina 3
5 m6 Jason 5
6 m7 Tina 10
The goal is to generate two scatter plots without using a for loop. The person's name, Jason or Tina, should be displayed in the title. report_value should be on the y-axis in both plots, and coverage_id (which is a string) on the x-axis.
I think I should start with:
df.groupby('name')
Then I need to apply the operation to every group.
This gives me the DataFrame grouped by name, but I don't know how to proceed and get Python to make the two plots for me.
Thanks a lot for your help.
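To see what df.groupby('name') hands back before any plotting happens, here is a minimal sketch on the same example data. It only uses standard pandas GroupBy features (size, get_group, and iteration over (name, sub-DataFrame) pairs); the variable names sizes and tina are illustrative.

```python
import pandas as pd

# Same example data as in the question
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

groups = df.groupby('name')

# Number of rows in each group, indexed by name
sizes = groups.size()
print(sizes.to_dict())

# Pull one group out as an ordinary DataFrame
tina = groups.get_group('Tina')
print(tina[['coverage_id', 'report_value']])

# Iterating yields (name, sub-DataFrame) pairs; each pair holds the data for one person's plot
for name, group in groups:
    print(name, group['coverage_id'].tolist(), group['report_value'].tolist())
```

Each sub-DataFrame already contains exactly the coverage_id and report_value values needed for that person's scatter plot, so the remaining work is only the plotting call per group.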
Recommended answer
I think you can use this solution, which draws both names on one set of axes and tells them apart with a legend. First convert the string column to numeric codes, then plot, and finally set the x tick labels:
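import numpy as np
import matplotlib.pyplot as plt

# Map each coverage_id string to an integer code; u holds the sorted unique labels
u, i = np.unique(df['coverage_id'], return_inverse=True)
df['coverage_id'] = i
groups = df.groupby('name')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05)  # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    # linestyle='' draws markers only, which gives a scatter plot
    ax.plot(group['coverage_id'],
            group['report_value'],
            marker='o',
            linestyle='',
            ms=12,
            label=name)
# One tick per unique label: use len(u), not len(i), which counts rows
ax.set(xticks=range(len(u)), xticklabels=u)
ax.legend()
plt.show()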
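The conversion step relies on np.unique(..., return_inverse=True), which returns the sorted unique labels u and, for every row, the index of that row's label in u. A small sketch with repeated labels (hypothetical values, not the question's data) makes the mapping visible and shows why the tick positions should be range(len(u)):

```python
import numpy as np

# Hypothetical labels with a repeat, so the inverse indices are not just 0..n-1
coverage_id = np.array(['m2', 'm1', 'm2', 'm3'])

u, i = np.unique(coverage_id, return_inverse=True)
print(u)  # ['m1' 'm2' 'm3']
print(i)  # [1 0 1 2]

# u[i] rebuilds the original column, so i is a valid numeric stand-in for plotting
assert (u[i] == coverage_id).all()

# One tick per distinct label: positions 0..len(u)-1, labelled with u
tick_positions = list(range(len(u)))
print(tick_positions)  # [0, 1, 2]
```

Here there are four rows but only three distinct labels, so range(len(i)) would produce four tick positions for three labels. In the question's data every coverage_id is unique, so the two lengths happen to coincide.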
Another solution uses seaborn, with seaborn.pairplot:
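import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

u, i = np.unique(df['coverage_id'], return_inverse=True)
df['coverage_id'] = i
# seaborn 0.9 renamed the old 'size' argument to 'height'
g = sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
# Place one tick at each integer code and label it with the original string
g.set(xticks=range(len(u)), xticklabels=u)
plt.show()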
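Both answers draw the two names on shared axes. If you need one separate figure per person with the name in the title, as the question asks (and which scales to roughly 100 plots), a minimal matplotlib sketch could look like the following. Note that it still iterates over the groups with a for loop. The helper plot_per_name is hypothetical, and the sketch assumes matplotlib 2.1 or later, which places string x values on a categorical axis, so no numeric conversion of coverage_id is needed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to open windows with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Same example data as in the question
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)


def plot_per_name(frame, key='name', x='coverage_id', y='report_value'):
    """Hypothetical helper: build one scatter-plot Figure per group, keyed by group name."""
    figures = {}
    for name, group in frame.groupby(key):
        fig, ax = plt.subplots()
        # String x values end up on a categorical axis, one position per label
        ax.scatter(group[x].tolist(), group[y].tolist())
        ax.set_title(name)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        figures[name] = fig
    return figures


figures = plot_per_name(df)
```

Each figure can then be written out with fig.savefig(...) and released with plt.close(fig). Closing figures matters when there are around 100 of them. The file names are up to you.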