Matplotlib logarithmic scale with zero value

Question

I have a very large and sparse dataset of spam Twitter accounts, and I need to scale the x axis in order to visualize the distribution (histogram, KDE, etc.) and the CDF of various variables (tweets_count, number of followers/following, etc.).

> describe(spammers_class1$tweets_count)
  var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

In this dataset, the value 0 is hugely important (in fact, 0 should have the highest density). However, on a logarithmic scale these values are dropped. I thought of changing the value to 0.1, for example, but it makes no sense to have spam accounts with 10^-1 followers.

So, what would be a workaround in Python and matplotlib?

Recommended answer

Add 1 to each x value, then plot on a log scale:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
x = [0, 10, 100, 1000]
y = [100, 20, 10, 50]
# Shift x by 1 so that x=0 becomes 1, which a log axis can display
x = np.asarray(x) + 1
y = np.asarray(y)
ax.plot(x, y)
ax.set_xscale('log')
ax.set_xlim(x.min(), x.max())
# Label each tick with the original value (tick position minus 1)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
# Place ticks exactly at the shifted data values
ax.xaxis.set_major_locator(ticker.FixedLocator(x))
plt.show()

Use

ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
ax.xaxis.set_major_locator(ticker.FixedLocator(x))

to relabel the tick marks with the original (unshifted) values of x.
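To make the relabeling step easier to follow, here is a minimal sketch of the same formatter written as a named function instead of a lambda. The name unshift_label and the list shifted_ticks are hypothetical, chosen only for illustration; the formatting expression is the one from the answer.

```python
def unshift_label(value, pos=None):
    """Return the label for a tick at `value`: the original x, i.e. value - 1."""
    return "{0:g}".format(value - 1)

# Tick positions on the shifted axis (the original x values plus 1).
shifted_ticks = [1, 11, 101, 1001]
labels = [unshift_label(t) for t in shifted_ticks]
print(labels)

# On a real axes this would be attached the same way as the lambda:
# ax.xaxis.set_major_formatter(ticker.FuncFormatter(unshift_label))
```

The tick drawn at position 1 is labeled 0, so the reader sees the original values even though the data was shifted.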

(My original suggestion was to use plt.xticks(x, x-1), but plt functions act on whichever axes happens to be current. To confine the changes to one particular axes, I changed all the calls to methods of ax rather than plt functions.)

matplotlib removes points which contain a NaN, inf or -inf value. Since log(0) is -inf, the point corresponding to x=0 would be removed from a log plot.
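The effect can be checked numerically without plotting anything. The following is a small sketch using NumPy only; the variable names raw_logs and shifted_logs are illustrative, and the x values are the ones from the answer.

```python
import numpy as np

x = np.array([0, 10, 100, 1000])

# log10(0) evaluates to -inf; silence NumPy's divide-by-zero warning for this demo.
with np.errstate(divide="ignore"):
    raw_logs = np.log10(x)

# After shifting by 1, the smallest value is log10(1) = 0, which is finite.
shifted_logs = np.log10(x + 1)

print(raw_logs)                   # the first entry is -inf
print(np.isfinite(raw_logs))      # False only for the x = 0 entry
print(np.isfinite(shifted_logs))  # True for every entry
```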

If you increase all the x-values by 1, then since log(1) = 0, the point corresponding to x=0 will now be plotted at x=log(1)=0 on the log plot.
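Since the question is about histograms, the same shift can be applied before binning. The sketch below is not part of the original answer: it assumes a small made-up sample (counts) standing in for tweets_count and uses np.logspace to build bin edges at powers of ten, so the zeros land in the first bin instead of being dropped.

```python
import numpy as np

# Hypothetical sample: many zeros plus a long tail, loosely like tweets_count.
counts = np.array([0, 0, 0, 0, 1, 2, 5, 35, 120, 4000, 669873])

shifted = counts + 1  # 0 becomes 1, so every value has a finite log

# Bin edges at 10**0, 10**1, ..., up to the first power of ten above the maximum.
top = int(np.ceil(np.log10(shifted.max())))
edges = np.logspace(0, top, num=top + 1)

hist, edges = np.histogram(shifted, bins=edges)
print(hist)   # counts per decade; the zeros are in the first bin
print(edges)  # 1, 10, 100, ... on the shifted scale
```

The resulting edges can be passed as bins to ax.hist(shifted, bins=edges) together with ax.set_xscale('log'), and the ticks relabeled with value - 1 as in the answer.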

The remaining x-values are also shifted by one, but the difference is not visible to the eye, since log(x+1) is very close to log(x) for large values of x.
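How quickly the shift becomes negligible can be seen by computing log10(x+1) - log10(x) directly. This is a short illustrative check using only the standard library; the list xs is an arbitrary choice of sample points.

```python
import math

xs = [1, 10, 100, 1000]
# Horizontal displacement of each point on a log10 axis caused by adding 1.
gaps = [math.log10(x + 1) - math.log10(x) for x in xs]

for x, gap in zip(xs, gaps):
    print(f"x = {x:>4}: shifted by {gap:.4f} decades")
```

The displacement is about 0.3 decades at x = 1 and falls below 0.001 decades by x = 1000, so only the smallest values move noticeably.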
