xlabel and ylabel values are not sorted in matplotlib scatterplot
Problem description
I have done tedious amounts of searching on the internet and it seems that I have not been able to figure out how to ask the right question to get the answer for what I want to do.
I am trying to create a scatterplot with P/E ratio on the y-axis and Dividend Yield on the x-axis. I put the data into a CSV file and then imported each column into Python as individual lists.
My scatterplot is shown below. I am confused about why the x- and y-axes are not sorted numerically. I think I have to turn the elements within the lists into floats and then sort them in some way before turning them into a scatterplot.
The other option I can think of is being able to sort the values in the process of creating the scatterplot.
Neither of these has worked out and I have reached a dead end. Any help or a pointer in the right direction would be much appreciated, as I can describe my problem but can't seem to ask the right question in my searches.
import csv
import matplotlib.pyplot as plt

etf_data = csv.reader(open('xlv_xlu_combined_td.csv', 'r'))

for i, row in etf_data.iterrows():
    symbol.append(row[0])
    index.append(row[1])
    dividend.append(row[2])
    pe.append(row[3])

symbol.pop(0)
index.pop(0)
dividend.pop(0)
pe.pop(0)

indexes = [i.split('%', 1)[0] for i in index]
dividend_yield = [d.split('%', 1)[0] for d in dividend]
pe_ratio = [p.split('X', 1)[0] for p in pe]

x = dividend_yield[:5]
y = pe_ratio[:5]

plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
plt.xlabel('Dividend yield')
plt.ylabel('Pe ratio')
plt.legend()
plt.show()
xlv_xlu_combined_td.csv
symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
MRK,4.96%,3.06%,104.92X
ABBV,4.43%,4.01%,23.86X
AMGN,3.86%,2.72%,60.93X
MDT,3.50%,2.27%,38.10X
ABT,3.26%,1.78%,231.74X
GILD,2.95%,2.93%,28.69X
BMY,2.72%,2.81%,97.81X
TMO,2.55%,0.32%,36.98X
LLY,2.49%,2.53%,81.83X
Solution
- The issue is that the values are string type, so they are plotted in the order given in the list, not in numeric order.
- The values must have the symbols removed from the end, and then be converted to a numeric type.
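To make the two points above concrete, here is a minimal sketch using the first five P/E values from the CSV. The helper name to_number is hypothetical (it is not part of the original code), and it assumes every value ends in either '%' or 'X'. matplotlib treats a list of strings as categories and places them along the axis in order of first appearance; the sorted() calls only illustrate how differently strings and floats compare.

```python
# First five P/E values as they appear in the CSV file.
raw_pe = ['263.00X', '21.93X', '10.19X', '104.92X', '23.86X']


def to_number(text):
    """Strip a trailing '%' or 'X' and return the value as a float.

    Hypothetical helper; assumes the only non-numeric characters are
    trailing '%' or 'X' symbols.
    """
    return float(text.rstrip('%X'))


# What the question's code produces: symbols removed, but still strings.
as_strings = [p.split('X', 1)[0] for p in raw_pe]

# What the plot needs: real numbers.
as_floats = [to_number(p) for p in raw_pe]

# Strings compare character by character, so '104.92' sorts before '21.93'.
print(sorted(as_strings))

# Floats compare by value.
print(sorted(as_floats))
```

Passing as_floats (rather than as_strings) to plt.scatter gives the axis a numeric scale, so no explicit sorting step is needed.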
Add-on to existing code using the csv module
- Given the existing code, it would be easy to map() the values in the lists to a float type.
indexes = [i.split('%', 1)[0] for i in index]
dividend_yield = [d.split('%', 1)[0] for d in dividend]
pe_ratio = [p.split('X', 1)[0] for p in pe]

# add mapping values to floats after removing the symbols from the values
indexes = list(map(float, indexes))
dividend_yield = list(map(float, dividend_yield))
pe_ratio = list(map(float, pe_ratio))

# plot
x = dividend_yield[:5]
y = pe_ratio[:5]

plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
plt.xlabel('Dividend yield')
plt.ylabel('Pe ratio')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
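One further note on the reading step: the loop in the question calls iterrows() on a csv.reader object, which has no such method (iterrows() belongs to pandas DataFrames), and the lists it appends to are never created. The following is one possible sketch of the reading and conversion steps using only the csv module. The function name read_columns and the in-memory sample are assumptions made for illustration; the sample stands in for the first five data rows of xlv_xlu_combined_td.csv.

```python
import csv
import io

# In-memory stand-in for xlv_xlu_combined_td.csv (header plus five rows).
sample = io.StringIO(
    "symbol,index,dividend,pe\n"
    "JNJ,10.11%,2.81%,263.00X\n"
    "UNH,7.27%,1.40%,21.93X\n"
    "PFE,6.48%,3.62%,10.19X\n"
    "MRK,4.96%,3.06%,104.92X\n"
    "ABBV,4.43%,4.01%,23.86X\n"
)


def read_columns(handle):
    """Return (dividend_yield, pe_ratio) as two lists of floats.

    Hypothetical helper; assumes the columns are named 'dividend' and 'pe'
    and that values end in '%' and 'X' respectively.
    """
    dividend_yield, pe_ratio = [], []
    # DictReader consumes the header row, so no pop(0) is needed afterwards.
    for row in csv.DictReader(handle):
        dividend_yield.append(float(row['dividend'].rstrip('%')))
        pe_ratio.append(float(row['pe'].rstrip('X')))
    return dividend_yield, pe_ratio


x, y = read_columns(sample)
print(x)
print(y)
```

With the real file, the same function can be given a handle opened with open('xlv_xlu_combined_td.csv', newline=''), and the resulting x and y lists can go straight into plt.scatter.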
Using pandas
- Remove the symbol from the end of the strings in the columns with col.str[:-1]
- Convert the columns to float type with .astype(float)
- Using pandas v1.2.4 and matplotlib v3.3.4
- This option reduces the required code from 23 lines to 4 lines.
import pandas as pd

# read the file
df = pd.read_csv('xlv_xlu_combined_td.csv')

# remove the symbols from the end of the number and set the columns to float type
# (assigning by column label replaces the columns, so they take the float dtype)
df[df.columns[1:]] = df.iloc[:, 1:].apply(lambda col: col.str[:-1]).astype(float)

# plot the first five rows of the two columns
ax = df.iloc[:5, 2:].plot(x='dividend', y='pe', kind='scatter', alpha=0.5,
                          xlabel='Dividend yield', ylabel='Pe ratio', label='Healthcare P/E & Dividend')
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
Plot output of both implementations
- Note the numbers are now ordered correctly.