xlabel and ylabel values are not sorted in matplotlib scatterplot

Problem description

    I have done tedious amounts of searching on the internet and it seems that I have not been able to figure out how to ask the right question to get the answer for what I want to do.

    I am trying to create a scatterplot with P/E ratio on the y-axis and Dividend Yield on the x-axis. I put the data into a CSV file and then imported each column into Python as individual lists.

    Here is how my scatterplot turns out below. I am confused why the x- and y- axes are not sorted numerically. I think I have to turn the elements within the list into floats and then do some sort of sort before turning it into a scatterplot.

    The other option I can think of is being able to sort the values in the process of creating the scatterplot.

    Neither of these has worked out and I have reached a dead end. Any help or pointing in the right direction would be much appreciated, as I can only describe my problem but can't seem to ask the right questions in my search.

    import csv
    import matplotlib.pyplot as plt
    
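    # one list per CSV column
    symbol, index, dividend, pe = [], [], [], []
    
    with open('xlv_xlu_combined_td.csv', 'r', newline='') as f:
        # csv.reader yields each row as a list of strings (it has no iterrows())
        for row in csv.reader(f):
            symbol.append(row[0])
            index.append(row[1])
            dividend.append(row[2])
            pe.append(row[3])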
    
    symbol.pop(0)
    index.pop(0)
    dividend.pop(0)
    pe.pop(0)
    
    indexes = [i.split('%', 1)[0] for i in index]
    dividend_yield = [d.split('%', 1)[0] for d in dividend]
    pe_ratio = [p.split('X', 1)[0] for p in pe]
    
    x = dividend_yield[:5]
    y = pe_ratio[:5]
    
    plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
    plt.xlabel('Dividend yield')
    plt.ylabel('Pe ratio')
    plt.legend()
    plt.show()
    

    xlv_xlu_combined_td.csv

    symbol,index,dividend,pe
    JNJ,10.11%,2.81%,263.00X
    UNH,7.27%,1.40%,21.93X
    PFE,6.48%,3.62%,10.19X
    MRK,4.96%,3.06%,104.92X
    ABBV,4.43%,4.01%,23.86X
    AMGN,3.86%,2.72%,60.93X
    MDT,3.50%,2.27%,38.10X
    ABT,3.26%,1.78%,231.74X
    GILD,2.95%,2.93%,28.69X
    BMY,2.72%,2.81%,97.81X
    TMO,2.55%,0.32%,36.98X
    LLY,2.49%,2.53%,81.83X
    

    Solution

    • The issue is that the values are string type. matplotlib treats strings as categorical labels, so they are plotted in the order given in the list, not in numeric order.
    • The trailing symbols (% and X) must be removed from the values, which are then converted to a numeric type.
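
To make the string-versus-number difference concrete, here is a minimal sketch using the first five `pe` values from the CSV. The helper name `to_number` is hypothetical (it is not part of the original post), and it assumes the only trailing symbols are `%` and `X`.

```python
# Hypothetical helper (not from the original post): strip a trailing unit
# symbol such as '%' or 'X' and convert what remains to a float.
def to_number(text, symbols='%X'):
    return float(text.strip().rstrip(symbols))

# first five pe values from xlv_xlu_combined_td.csv
raw_pe = ['263.00X', '21.93X', '10.19X', '104.92X', '23.86X']

as_strings = [p.split('X', 1)[0] for p in raw_pe]  # symbol removed, but still str
as_floats = [to_number(p) for p in raw_pe]         # symbol removed and converted

# Strings compare character by character, so '104.92' sorts before '21.93'
print(sorted(as_strings))
# Floats compare by value
print(sorted(as_floats))
```

Passing `as_strings` to `plt.scatter` gives one evenly spaced tick per distinct string in list order, while `as_floats` gives a normal numeric axis.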

    Add-on to existing code using csv module

    • Given the existing code, it would be easy to map() the values in the lists to a float type.

    indexes = [i.split('%', 1)[0] for i in index]
    dividend_yield = [d.split('%', 1)[0] for d in dividend]
    pe_ratio = [p.split('X', 1)[0] for p in pe]
    
    # add mapping values to floats after removing the symbols from the values
    indexes = list(map(float, indexes))
    dividend_yield = list(map(float, dividend_yield))
    pe_ratio = list(map(float, pe_ratio))
    
    # plot
    x = dividend_yield[:5]
    y = pe_ratio[:5]
    
    plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
    plt.xlabel('Dividend yield')
    plt.ylabel('Pe ratio')
    plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
    plt.show()
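
As an alternative sketch of the same idea, the conversion can happen while the file is read, so the lists never hold strings. This assumes the column names from the CSV header shown in the question; `load_columns` and `SAMPLE` are illustrative names, and the sample rows are copied from the file so the snippet runs without it.

```python
import csv
import io

# A few rows copied from xlv_xlu_combined_td.csv so the sketch runs without the file
SAMPLE = """symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
"""

def load_columns(fileobj):
    """Return (symbols, dividend_yield, pe_ratio); the numeric columns are floats."""
    symbols, dividend_yield, pe_ratio = [], [], []
    # DictReader consumes the header row, so no pop(0) is needed afterwards
    for row in csv.DictReader(fileobj):
        symbols.append(row['symbol'])
        dividend_yield.append(float(row['dividend'].rstrip('%')))
        pe_ratio.append(float(row['pe'].rstrip('X')))
    return symbols, dividend_yield, pe_ratio

symbols, dividend_yield, pe_ratio = load_columns(io.StringIO(SAMPLE))
print(dividend_yield)
print(pe_ratio)
```

With the real file, the same function can be called as `load_columns(f)` inside `with open('xlv_xlu_combined_td.csv', newline='') as f:`, and the returned lists passed straight to `plt.scatter`.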
    

    Using pandas

    • Remove the symbol from the end of the strings in the columns with col.str[:-1]
    • Convert the columns to float type with .astype(float)
    • Using pandas v1.2.4 and matplotlib v3.3.4
    • This option reduces the required code from 23 lines to 4 lines.

    import pandas as pd
    
    # read the file
    df = pd.read_csv('xlv_xlu_combined_td.csv')
    
    # remove the symbols from the end of the number and set the columns to float type
    # (assigning by column label replaces the columns, so they take the float dtype)
    df[df.columns[1:]] = df.iloc[:, 1:].apply(lambda col: col.str[:-1]).astype(float)
    
    # plot the first five rows of the two columns
    ax = df.iloc[:5, 2:].plot(x='dividend', y='pe', kind='scatter', alpha=0.5,
                              xlabel='Dividend yield', ylabel='Pe ratio',
                              label='Healthcare P/E & Dividend')
    ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
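
A quick way to confirm the conversion worked is to check the column dtypes before plotting. The sketch below is a variation on the code above, not the answer's exact code: it uses `str.rstrip` with `pd.to_numeric`, which should raise if anything non-numeric is left after stripping. The inline `SAMPLE` data is copied from the CSV so the snippet does not need the file.

```python
import io

import pandas as pd

# A few rows copied from xlv_xlu_combined_td.csv so the sketch runs without the file
SAMPLE = """symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
"""

df = pd.read_csv(io.StringIO(SAMPLE))

for col in ['index', 'dividend', 'pe']:
    # rstrip drops any trailing '%' or 'X'; to_numeric converts the rest
    df[col] = pd.to_numeric(df[col].str.rstrip('%X'))

# The three converted columns should now report float64
print(df.dtypes)
```

If a column still shows a non-numeric dtype at this point, matplotlib will treat its values as categories again and the axis order will follow the row order.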
    

    Plot output of both implementations

    • Note the numbers are now ordered correctly.
