Explain ggplot2 warning: "Removed k rows containing missing values"

Question

I get this warning when I try to generate a plot with ggplot.

After searching online for a while, I found many suggestions that my data must contain null or missing values, but that was not the case.

In this question, the accepted answer says the following:

The warning means that some elements are removed because they fall out of the specified range.

What exactly does this range refer to, and how can I manually increase it to avoid all such warnings?

Recommended answer

The behavior you're seeing is due to how ggplot2 handles data that fall outside the axis ranges of the plot. The result differs depending on whether you set the axis range with scale_y_continuous (or, equivalently, ylim) or with coord_cartesian, as explained below.

library(ggplot2)

# All points are visible in the plot
ggplot(mtcars, aes(mpg, hp)) + 
  geom_point()
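As a rough sketch of what "the range" refers to here: with no limits set, the y-axis range is taken from the data itself, so nothing can fall outside it. The snippet below inspects the range of hp and builds limits that are guaranteed to cover every point. The variable names hp_range, y_limits, and p are illustrative only and not part of the original answer.

```r
library(ggplot2)

# Range of the plotted variable; the default y scale covers at least this.
hp_range <- range(mtcars$hp, na.rm = TRUE)
hp_range

# Limits that start at 0 and extend to the largest observed value,
# so no row is out of range.
y_limits <- c(0, hp_range[2])

p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  scale_y_continuous(limits = y_limits)
p
```

Deriving the limits from the data rather than hard-coding them is one way to keep the plot free of this warning when the data change.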

In the code below, one point with hp = 335 is outside the y-range of the plot. Because the y-axis range is set with scale_y_continuous, out-of-range values are replaced with NA before any statistics are computed, which is why the warning mentions missing values even though the data contain none. As a result, this point is also excluded from any statistics or summary measures that ggplot calculates, such as the linear regression line.

ggplot(mtcars, aes(mpg, hp)) + 
  geom_point() +
  scale_y_continuous(limits=c(0,300)) +  # Change this to limits=c(0,335) and the warning disappears
  geom_smooth(method="lm")

Warning messages:
1: Removed 1 rows containing missing values (stat_smooth). 
2: Removed 1 rows containing missing values (geom_point).
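To see which rows the warning is counting, one option is to test the data against the same limits by hand. This is a minimal sketch in base R, assuming the limits c(0, 300) used above; the names y_limits and outside are illustrative.

```r
# The same limits passed to scale_y_continuous() above.
y_limits <- c(0, 300)

# TRUE for every row whose hp falls outside the limits.
outside <- mtcars$hp < y_limits[1] | mtcars$hp > y_limits[2]

# Number of rows that fall outside the limits.
sum(outside)

# The offending rows themselves.
mtcars[outside, c("mpg", "hp")]
```

The count returned by sum(outside) should match the k reported in the warning for each layer.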

In the code below, the point with hp = 335 is still outside the y-range of the plot, but it is nevertheless included in any statistics or summary measures that ggplot calculates, such as the linear regression line. This is because the y-axis range is set with coord_cartesian, which only zooms the view and does not exclude points outside the plot range from calculations on the data. No warning is issued in this case.

If you compare this plot with the previous one, you can see that the regression line in this plot (the one using coord_cartesian) has a slightly steeper slope, because the point with hp = 335 is included when the line is fitted, even though it is not visible in the plot.

ggplot(mtcars, aes(mpg, hp)) + 
  geom_point() +
  coord_cartesian(ylim=c(0,300)) +
  geom_smooth(method="lm")
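The difference in slope can also be checked numerically. The sketch below assumes that geom_smooth(method="lm") fits hp against mpg with its default formula, so the two plots correspond to lm() fits with and without the out-of-range point; the names fit_all, fit_censored, slope_all, and slope_censored are illustrative.

```r
# Fit on all rows: what the coord_cartesian() plot shows.
fit_all <- lm(hp ~ mpg, data = mtcars)

# Fit without rows above the limit: what the scale_y_continuous() plot shows.
fit_censored <- lm(hp ~ mpg, data = subset(mtcars, hp <= 300))

slope_all <- coef(fit_all)[["mpg"]]
slope_censored <- coef(fit_censored)[["mpg"]]

# Both slopes are negative; the first is larger in magnitude.
c(coord_cartesian = slope_all, scale_limits = slope_censored)
```

Comparing the two coefficients makes the practical consequence explicit: limits set on a scale change the fitted model, while limits set on the coordinate system change only the view.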
