Can I mimic a log scale of an axis in matplotlib without transforming the associated data?


Question

I am trying to display a Zipf plot, which is typically displayed on a log-log scale.

I'm using a library which gives rank in linear scale and frequencies in log scale. I have the following code which plots my data fairly correctly:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")
sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()

[Figure: the resulting plot, with a log-scaled x-axis and a linear y-axis]

The plot looks good, but the y-axis tick labels look odd: I'd like them to be displayed in log increments as well. My current workaround is to raise 10 to the power of each element in the freqs list; i.e.,

freqs = [10**freq for freq in freqs]
# ...

and change the yscale in ax.set to log; i.e.,

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ...

This gives me the expected plot (below), but it requires a transform of the data which is a) relatively expensive, b) redundant, c) lossy.

Is there a way to mimic the log scale of the axes in a matplotlib plot without transforming the data?

Recommended answer

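First a comment: personally, I would prefer the approach of rescaling the data, since it makes everything much easier at the cost of somewhat more memory and CPU time, and the loss of accuracy should not matter here.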
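To put a rough number on the accuracy concern, here is a minimal sketch (not part of the original answer) that converts the question's values to linear scale and back. It assumes the frequencies are base-10 logarithms, which is what the question's own 10**freq workaround implies.

```python
import numpy as np

# Log-scale frequencies from the question (assumed to be log10 values).
freqs = np.array([
    -10.932271003723145, -15.213129043579102, -17.091760635375977,
    -16.27560806274414, -19.482173919677734, -19.502029418945312,
    -19.502029418945312, -19.502029418945312,
])

# Vectorised equivalent of [10**freq for freq in freqs].
linear = np.power(10.0, freqs)

# Map back to log scale to see how much the round trip changed the values.
recovered = np.log10(linear)
max_abs_error = np.max(np.abs(recovered - freqs))

print("largest round-trip error:", max_abs_error)
```

For values of this size the round trip stays at the level of floating-point rounding. It would become genuinely lossy only for much more negative logs, since float64 underflows somewhere below 10**-300.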

Now to the actual question: how to simulate a log scale on a linear axis.

This is not easy. Setting the axes to log scale changes a lot in the background and one needs to mimic all of that.
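To get a feel for what changes in the background, a small sketch (not part of the original answer) can print the scale name and the tick locator and formatter classes that matplotlib installs on a linear and on a log axis. The variable names are only illustrative, and the exact class names printed may differ between matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt

fig, (lin_ax, log_ax) = plt.subplots(1, 2)
log_ax.set_yscale("log")

# Compare the machinery behind the y-axis of each Axes.
for name, axes in (("linear", lin_ax), ("log", log_ax)):
    yaxis = axes.yaxis
    print(
        name,
        yaxis.get_scale(),
        type(yaxis.get_major_locator()).__name__,
        type(yaxis.get_minor_locator()).__name__,
        type(yaxis.get_major_formatter()).__name__,
    )
```

Mimicking the log scale on a linear axis means reproducing these pieces by hand: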

  • The easy part is to set the major tick spacing to 1 by using matplotlib.ticker.MultipleLocator().
  • Creating the minor tick marks at positions which look logarithmic is harder. The best solution I could come up with is to set them manually using matplotlib.ticker.FixedLocator().
  • Lastly, we need to change the tick labels to represent the actual numbers, meaning that they should look like 10^(-x) instead of -x. I am aware of two options here:
    • Using a FuncFormatter that shows the values 10**x in scientific format.
    • Using a FuncFormatter that shows the values 10^x in LaTeX format. This looks much nicer but contrasts with the rest of the plot.

I do not know of any better solution for that last point, but maybe someone else does.

Here is the code and what it looks like.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from matplotlib.ticker import MultipleLocator, FixedLocator, FuncFormatter

###### Locators for Y-axis
# set major tick marks at multiples of 1.
majorLocator = MultipleLocator(1.)
# create custom minor tick marks at logarithmic positions:
# log10(i) - (n + 1) for i = 9..2 within each decade n = 10..19
ra = -np.array([n + (1. - np.log10(i)) for n in range(10, 20) for i in range(9, 1, -1)])
minorLocator = FixedLocator(ra)
###### Formatter for Y-axis (choose one of the following two)
# show labels as powers of 10 in scientific notation (looks ugly)
# majorFormatter = FuncFormatter(lambda x, p: "{:.1e}".format(10**x))
# or use MathText (looks nice, but does not match the rest of the layout)
majorFormatter = FuncFormatter(lambda x, p: r"$10^{" + "{x:d}".format(x=int(round(x))) + r"}$")

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks,
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(6, 6))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")

# x and y must be passed by keyword in recent seaborn versions
sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)

# Set the locators
ax.yaxis.set_major_locator(majorLocator)
ax.yaxis.set_minor_locator(minorLocator)
# Set the formatter if you would like the tick labels consistently in power notation
ax.yaxis.set_major_formatter(majorFormatter)

ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()
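The listing above hard-codes the decades 10 to 19 because they happen to cover this dataset. As a hedged sketch, the minor-tick positions could instead be derived from whatever y-range is in use: on an axis holding log10 values, the minor ticks of each decade n sit at n + log10(k) for k = 2..9. The helper names log_minor_ticks and power_label are hypothetical, chosen only for this illustration, and base-10 logs are assumed.

```python
import numpy as np

def log_minor_ticks(y_min, y_max):
    """Minor-tick positions for a linear axis that holds log10 values."""
    # Lower edge of every decade that overlaps [y_min, y_max].
    decades = np.arange(np.floor(y_min), np.ceil(y_max))
    # Offsets of 2x, 3x, ..., 9x within one decade.
    offsets = np.log10(np.arange(2, 10))
    ticks = (decades[:, None] + offsets[None, :]).ravel()
    # Keep only the positions that fall inside the requested range.
    return ticks[(ticks >= y_min) & (ticks <= y_max)]

def power_label(y, pos=None):
    """Format a major tick at exponent y as a MathText power of ten."""
    return r"$10^{%d}$" % round(y)

ticks = log_minor_ticks(-20, -10)
print(len(ticks), ticks[:3])
print(power_label(-11.0))
```

These would plug into the same calls as before, for example ax.yaxis.set_minor_locator(FixedLocator(log_minor_ticks(*ax.get_ylim()))) and ax.yaxis.set_major_formatter(FuncFormatter(power_label)).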
      

A different solution, which I did not think of at first, would be to use two different axes: one with a log-log scale, which looks nice and produces the correct labels and ticks, and another one to plot the data on.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks,
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

fig, ax = plt.subplots(figsize=(6, 6))
# use 2 axes
# ax has the log-log scale, which produces nice labels and ticks
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ax2 is the axes the values are plotted on (shared x, linear y)
ax2 = ax.twinx()

# plot values on ax2 (x and y must be passed by keyword in recent seaborn versions)
sns.regplot(x="ranks", y="freqs", data=df, ax=ax2, fit_reg=False)
# set the y-limits of the log-log axes to 10 to the power of the y-limits of ax2
ax.set_ylim(10**np.array(ax2.get_ylim()))


ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
# remove tick labels and axis label from ax2
ax2.set_yticklabels([])
ax2.set_ylabel("")
ax.grid(True, which="both")
plt.show()
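One caveat with the two-axes approach is that the limits are copied only once, so the log axis goes stale if the y-limits of ax2 change later (more data, zooming). A possible way around this, sketched here under the assumption that the "ylim_changed" signal of Axes.callbacks is acceptable for your use case, is to re-sync on every limit change. The helper name sync_log_axis is hypothetical, and the data points are just a few values from the question, rounded for brevity.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt

def sync_log_axis(log_ax, lin_ax):
    """Keep log_ax's y-limits equal to 10**(y-limits of lin_ax)."""
    def _update(changed_ax):
        lo, hi = changed_ax.get_ylim()
        log_ax.set_ylim(10.0 ** lo, 10.0 ** hi)
    # Fired whenever the y-limits of lin_ax are changed.
    lin_ax.callbacks.connect("ylim_changed", _update)
    _update(lin_ax)  # initial sync

fig, ax = plt.subplots(figsize=(6, 6))
ax.set(xscale="log", yscale="log")
ax2 = ax.twinx()  # shares the log x-axis, has its own linear y-axis
sync_log_axis(ax, ax2)

# A few (rank, log10 frequency) points from the question, rounded.
ax2.scatter([3541, 60219, 172644], [-10.93, -15.21, -17.09])
ax2.set_ylim(-20, -10)  # triggers the callback

print(ax.get_ylim())
```

The callback only propagates from ax2 to ax, which matches the answer's setup where the data lives on ax2 and ax exists purely for its ticks and labels.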
      
