Explain ggplot2 warning: "Removed k rows containing missing values"


Question

I get this warning when I try to generate a plot with ggplot.

After researching online for a while, I found many suggestions that my data must contain null or missing values, which was not the case.

The accepted answer to a related question says:

"The warning means that some elements are removed because they fall out of the specified range"

What exactly does this range refer to, and how can I manually increase it to avoid the warnings?

Recommended answer

The behavior you're seeing is due to how ggplot2 deals with data that fall outside the axis ranges of the plot. The outcome depends on whether you set the axis range with scale_y_continuous (or, equivalently, ylim) or with coord_cartesian, as explained below.

library(ggplot2)

# All points are visible in the plot
ggplot(mtcars, aes(mpg, hp)) + 
  geom_point()

In the code below, one point with hp = 335 is outside the y-range of the plot. Because scale_y_continuous sets the y-axis range, this point is also excluded from any statistics or summary measures that ggplot calculates, such as the linear regression line.

ggplot(mtcars, aes(mpg, hp)) + 
  geom_point() +
  scale_y_continuous(limits=c(0,300)) +  # Change this to limits=c(0,335) and the warning disappears
  geom_smooth(method="lm")

Warning messages:
1: Removed 1 rows containing missing values (stat_smooth). 
2: Removed 1 rows containing missing values (geom_point).
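
The count in the warning can be checked directly against the data. With its default out-of-bounds handling, a continuous scale turns values outside its limits into NA, which is why the message talks about missing values even when the data contain none. The following is a minimal sketch in base R, not part of the original answer; the variable names (y_limits, outside, n_outside, hp_range) are made up for illustration.

```r
# Hypothetical limits, matching scale_y_continuous(limits=c(0,300)) above
y_limits <- c(0, 300)

# Rows whose hp lies outside the limits; these are the rows the warning counts
outside <- mtcars$hp < y_limits[1] | mtcars$hp > y_limits[2]
n_outside <- sum(outside)
print(n_outside)
print(mtcars[outside, c("mpg", "hp")])

# The full range of hp; limits that span this range leave nothing to remove
hp_range <- range(mtcars$hp)
print(hp_range)
```

Setting the limits to cover hp_range (or leaving them unset) is one way to "increase the range" so that no rows are removed.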

In the code below, the point with hp = 335 is still outside the y-range of the plot, but it is nevertheless included in any statistics or summary measures that ggplot calculates, such as the linear regression line. This is because coord_cartesian sets the y-axis range by zooming the view only: it does not exclude points outside the plot range from calculations on the data.

If you compare this plot with the previous one, you can see that the regression line in the coord_cartesian version has a slightly steeper slope, because the point with hp = 335 is included when calculating the regression line, even though it is not visible in the plot.

ggplot(mtcars, aes(mpg, hp)) + 
  geom_point() +
  coord_cartesian(ylim=c(0,300)) +
  geom_smooth(method="lm")
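
The slope difference can also be checked numerically outside ggplot2. The sketch below assumes that geom_smooth(method="lm") fits hp on mpg with an ordinary lm(), and it stands in for the scale_y_continuous case by dropping rows with hp above 300 by hand; the names fit_all, fit_trimmed, slope_all and slope_trimmed are illustrative only.

```r
# Fit on all rows, as with coord_cartesian (no data removed)
fit_all <- lm(hp ~ mpg, data = mtcars)

# Fit without rows above the limit, as with scale_y_continuous(limits=c(0,300))
fit_trimmed <- lm(hp ~ mpg, data = subset(mtcars, hp <= 300))

# Extract the mpg coefficients (the slopes of the two regression lines)
slope_all <- unname(coef(fit_all)["mpg"])
slope_trimmed <- unname(coef(fit_trimmed)["mpg"])
print(c(all = slope_all, trimmed = slope_trimmed))
```

Both slopes are negative; the fit that keeps the hp = 335 point is the steeper of the two.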
