How to label line chart with column from pandas dataframe (from 3rd column values)?
Problem description
I have a data set that I filtered down to the following (sample data):
```
Name  Time   l
1     1.129  1G-d
1     0.113  1G-a
1     3.374  1B-b
1     3.367  1B-c
1     3.374  1B-d
2     3.355  1B-e
2     3.361  1B-a
3     1.129  1G-a
```
I got this data after filtering the data frame and converting it to a CSV file:
```python
import pandas as pd

# Assigns the new data frame to "df" with the data from only three columns
# (df_2 is the previously filtered data, not shown here)
header = ['Names', 'Time', 'l']
df = pd.DataFrame(df_2, columns=header)
# Sorts the data frame by column "Names" as integers
df['Names'] = df['Names'].astype(int)
df = df.sort_values(by=['Names'])
# Converts Time to int, then divides by 1000 to match the desired format
df['Time'] = df['Time'].astype(int) / 1000
# With no path argument, to_csv returns the CSV text as a string
csv_file = df.to_csv(index=False, columns=header, sep=' ')
```
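The `astype(int)` cast before `sort_values` matters if `Names` arrives as text (an assumption here, since `df_2` is not shown): strings sort character by character, so `'10'` lands before `'2'`. A small sketch with made-up values illustrates the difference:

```python
import pandas as pd

# Hypothetical raw column: numbers stored as strings
raw = pd.DataFrame({'Names': ['10', '2', '1', '3']})

# Sorting strings is lexicographic, so '10' sorts before '2'
as_text = raw.sort_values(by=['Names'])['Names'].tolist()

# Casting to int first gives the numeric order the x-axis needs
raw['Names'] = raw['Names'].astype(int)
as_int = raw.sort_values(by=['Names'])['Names'].tolist()

print(as_text)  # ['1', '10', '2', '3']
print(as_int)   # [1, 2, 3, 10]
```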
Now I am trying to plot one line, with markers, for each item in the label column. I want column `l` to give the line names (labels), with each value as a separate line, `Time` as the y-axis values and `Names` as the x-axis values.
So, in this case, I would have 7 different lines in the graph with these labels: 1G-d, 1G-a, 1B-b, 1B-c, 1B-d, 1B-e, 1B-a.
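To confirm the expected count, here is a sketch that loads the sample rows shown above into a DataFrame (via `io.StringIO`, purely so the example is self-contained) and lists the distinct labels in order of first appearance:

```python
import io

import pandas as pd

# The sample data from the question, parsed from whitespace-separated text
sample = """Name Time l
1 1.129 1G-d
1 0.113 1G-a
1 3.374 1B-b
1 3.367 1B-c
1 3.374 1B-d
2 3.355 1B-e
2 3.361 1B-a
3 1.129 1G-a
"""
df = pd.read_csv(io.StringIO(sample), sep=r'\s+')

# One line per distinct value of 'l', kept in order of first appearance
labels = df['l'].drop_duplicates().tolist()
print(len(labels), labels)
# 7 ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
```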
So far I have only added the following settings, but I am not sure how to plot the lines themselves:
```python
import matplotlib.pyplot as plt

plt.xlim(0, 60)
plt.ylim(0, 18)
plt.legend(loc='best')
plt.show()
```
I used `sns.lineplot` with `hue`, but it puts a name (title) on the legend box, which I do not want. Also, in that case I cannot get markers without adding a new column for `style`.
I also tried `plt.plot`, but then I am not sure how to get more than one line: I can only pass a single set of x and y values, which creates only one line.
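For the plain `plt.plot` route, the usual pattern is to call `plot` once per group rather than once overall. This is a minimal sketch (not from the original thread) using the question's sample rows: `groupby('l', sort=False)` keeps the labels in order of first appearance, each `ax.plot` call adds one labelled line with markers, and leaving out `title` in `ax.legend` keeps the legend box untitled.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# The question's sample rows, parsed from whitespace-separated text
sample = """Name Time l
1 1.129 1G-d
1 0.113 1G-a
1 3.374 1B-b
1 3.367 1B-c
1 3.374 1B-d
2 3.355 1B-e
2 3.361 1B-a
3 1.129 1G-a
"""
df = pd.read_csv(io.StringIO(sample), sep=r'\s+')

fig, ax = plt.subplots()
# One ax.plot call per label: each call adds one line with its own label and markers
for label, group in df.groupby('l', sort=False):
    group = group.sort_values('Name')
    ax.plot(group['Name'], group['Time'], marker='o', label=label)

ax.set_xlabel('Name')
ax.set_ylabel('Time')
legend = ax.legend(loc='best')  # no title argument, so the legend box stays untitled
plt.show()
```

The question's `plt.xlim(0, 60)` and `plt.ylim(0, 18)` can be added before `plt.show()` once the full data set is plotted.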
If there is any other resource that could help, please let me know below.
Thanks
The final graph I want looks like the following, but with markers:
Answer
You can apply a few tweaks to seaborn's `lineplot`. Here they are demonstrated on some generated data, since your sample isn't really long enough:
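```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Create data
np.random.seed(2019)
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = pd.DataFrame({'Name': np.repeat(range(1, 11), 10),
                   'Time': np.random.randn(100).cumsum(),
                   'l': np.random.choice(categories, 100)
                   })

# Plot one line per value of 'l'; style='l' is what turns the markers on.
# Note: seaborn >= 0.12 deprecates ci=None in favour of errorbar=None.
sns.lineplot(data=df, x='Name', y='Time', hue='l', style='l', dashes=False,
             markers=True, ci=None, err_style=None)

# Temporarily removing limits based on sample data
#plt.xlim(0, 60)
#plt.ylim(0, 18)

# Remove seaborn legend title & set new title (if desired).
# Older seaborn versions add the title as a fake first legend entry labelled
# with the column name ('l'); newer ones set it as a real legend title instead.
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
if labels and labels[0] == 'l':
    handles, labels = handles[1:], labels[1:]
ax.legend(handles=handles, labels=labels, title='New Title', loc='best')
plt.show()
```
- To apply markers, you have to specify a `style` variable. This can be the same as `hue`.
- You likely want to turn off `dashes`, `ci`, and `err_style`: `dashes=False` draws every line solid instead of varying the dash pattern per level, and `ci=None` with `err_style=None` drops the confidence-interval bands.
- To remove the seaborn legend title, get the `handles` and `labels`, then re-add the legend without the title entry (in older seaborn versions, the first handle and label). You can also specify the location here and set a new title if desired (or just remove `title=...` for no title).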
Edits based on comments:
Filtering your data to only a subset of the categories in `l` can be done fairly easily via:
```python
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = df.loc[df['l'].isin(categories)]
```
`markers=True` will fail if there are too many levels. If you are only interested in marking points for aesthetic purposes, you can simply multiply a single marker by the number of categories you are interested in (the `categories` list you already created to filter your data): `markers='o'*len(categories)`.
Alternatively, you can pass a custom dictionary to the `markers` argument:
```python
points = ['o', '*', 'v', '^']
# How many times points must be repeated to cover every category (ceiling division)
mult = len(categories) // len(points) + (len(categories) % len(points) > 0)
markers = {key: value for (key, value)
           in zip(categories, points * mult)}
```
This returns a dictionary of category-point combinations, cycling over the specified marker points until each item in `categories` has a point style.
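The same cycling can also be written with `itertools.cycle`, which repeats `points` indefinitely while `zip` stops once `categories` runs out, so no repeat count is needed. A short sketch comparing the two, using the seven categories from the question:

```python
from itertools import cycle

categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
points = ['o', '*', 'v', '^']

# Approach from the answer: repeat the list enough times (ceiling division), then zip
mult = len(categories) // len(points) + (len(categories) % len(points) > 0)
markers_repeat = {key: value for (key, value) in zip(categories, points * mult)}

# Equivalent mapping with itertools.cycle; zip stops when categories is exhausted
markers_cycle = dict(zip(categories, cycle(points)))

print(markers_cycle)
# {'1G-d': 'o', '1G-a': '*', '1B-b': 'v', '1B-c': '^', '1B-d': 'o', '1B-e': '*', '1B-a': 'v'}
```

Either dictionary can be passed as `markers=` to `sns.lineplot`; the `cycle` form just avoids computing `mult`.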