Visual structure of a data.frame: locations of NAs and much more

Problem description

I want to represent the structure of a data frame (or a matrix, a data.table, or similar) on a single plot with color-coding. I think this could be useful for many people who handle various types of data, because it lets them take in the whole table at a single glance.

Perhaps someone has already developed a package to do this, but I couldn't find one (just this). So here is a rough mockup of my "vision", a kind of heatmap, showing in color codes:

  • the NA locations,
  • the class of variables (factors (how many levels?), numeric (with color gradient, zeros, outliers...), strings)
  • dimensions
  • etc.....

So far I have only written a function to plot the NA locations. It goes like this:

ggSTR <- function(data, alpha = 0.5) {
  library(ggplot2)
  DF <- data
  if (!is.matrix(data)) DF <- as.matrix(DF)

  # Keep only the NA cells: x is the column index, y is the row index
  na.cells <- is.na(DF)
  to.plot <- data.frame(x = col(DF)[na.cells], y = row(DF)[na.cells])
  size <- 20 / log(prod(dim(DF)))  # point size shrinks as the table grows
  g <- ggplot(data = to.plot) + aes(x, y) +
        geom_point(size = size, color = "red", alpha = alpha) +
        # fix both axes to the table dimensions, with row 1 at the top
        scale_y_reverse(limits = c(nrow(DF), 1)) + xlim(1, ncol(DF)) +
        ggtitle("location of NAs in the data frame")

  pc <- round(sum(na.cells) / prod(dim(DF)) * 100, 2)  # % NA
  print(paste("percentage of NA data: ", pc))

  return(g)
}

It takes any data.frame as input and returns this image:

It's too big a challenge for me to achieve the first image.

Solution

Have you encountered the CSV fingerprint service? It creates a similar image, although without all the details you have outlined above, and it's not based on R. There is an R version of a similar idea at R-ohjelmointi.org, but the text is in Finnish. The main function is csvSormenjalki(). Maybe that could be adapted further to fulfill your whole vision?
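As a starting point for such an adaptation, here is a minimal sketch of the "one colored tile per cell" idea in R. It is not the code behind CSV fingerprint or csvSormenjalki(); the helper names cell_types() and plot_structure() are hypothetical and introduced only for illustration. It assumes ggplot2 is installed, and it colors each cell by the class of its column, with NA cells marked separately. Factor levels, numeric gradients and outliers from the mockup are not covered.

```r
# Hypothetical helper (not from the original post): label every cell of a
# data frame as "NA" or by the first class of its column.
cell_types <- function(df) {
  df <- as.data.frame(df)
  # First class of each column, e.g. "numeric", "factor", "character"
  col_class <- vapply(df, function(column) class(column)[1], character(1))
  # Repeat each column's class down its rows, then overwrite the NA cells
  types <- matrix(rep(col_class, each = nrow(df)),
                  nrow = nrow(df), dimnames = list(NULL, names(df)))
  types[is.na(df)] <- "NA"
  types
}

# Hypothetical helper: draw the label matrix as one colored tile per cell,
# with row 1 at the top and the table dimensions in the title.
plot_structure <- function(df) {
  types <- cell_types(df)
  long <- data.frame(
    row_id = as.vector(row(types)),
    col_id = as.vector(col(types)),
    type   = as.vector(types)
  )
  ggplot2::ggplot(long, ggplot2::aes(x = col_id, y = row_id, fill = type)) +
    ggplot2::geom_tile() +
    ggplot2::scale_y_reverse() +
    ggplot2::ggtitle(sprintf("%d rows x %d columns", nrow(types), ncol(types)))
}
```

For example, plot_structure(airquality) should draw the built-in airquality dataset, which has missing values in its Ozone and Solar.R columns. Keeping the classification in cell_types() separate from the plotting means further cell categories (zeros, outliers, factor levels) could be added there without touching the plot code.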
