Visual structure of a data.frame: locations of NAs and much more
Problem description
I want to represent the structure of a data frame (or matrix, or data.table, whatever) on a single plot with color-coding. I guess that could be very useful for many people handling various types of data, to visualize it at a single glance.
Perhaps someone has already developed a package to do it, but I couldn't find one (just this). So here is a rough mockup of my "vision", kind of a heatmap, showing in color codes:
- the NA locations,
- the class of variables (factors (how many levels?), numeric (with color gradient, zeros, outliers...), strings)
- dimensions
- etc.
So far I have only written a function to plot the NA locations. It goes like this:
ggSTR <- function(data, alpha = 0.5) {
  library(ggplot2)
  DF <- data
  if (!is.matrix(data)) DF <- as.matrix(DF)
  # one row per cell, in row-major order: x is the column index for NA cells, 0 otherwise
  to.plot <- cbind.data.frame('y' = rep(1:nrow(DF), each = ncol(DF)),
                              'x' = as.logical(t(is.na(DF))) * rep(1:ncol(DF), nrow(DF)))
  size <- 20 / log(prod(dim(DF)))  # size of points depends on the size of the table
  g <- ggplot(data = to.plot) + aes(x, y) +
    geom_point(size = size, color = "red", alpha = alpha) +
    scale_y_reverse() + xlim(1, ncol(DF)) +  # xlim drops the x = 0 (non-NA) cells, with a warning
    ggtitle("location of NAs in the data frame")
  pc <- round(sum(is.na(DF)) / prod(dim(DF)) * 100, 2)  # % NA
  print(paste("percentage of NA data: ", pc))
  return(g)
}
It takes any data.frame as input and returns this image:
It's too big a challenge for me to achieve the first image.
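As a side note, the NA coordinates that ggSTR plots could probably also be obtained with base R's which(..., arr.ind = TRUE), which avoids encoding the non-NA cells as x = 0 and then cutting them off with xlim. A minimal sketch, where the helper name na_positions is hypothetical and not part of any package:

```r
# Hypothetical helper (not from any package): row and column index of every NA cell.
na_positions <- function(data) {
  # is.na() on a matrix returns a logical matrix; arr.ind = TRUE returns
  # a two-column integer matrix with columns "row" and "col"
  idx <- which(is.na(as.matrix(data)), arr.ind = TRUE)
  data.frame(y = unname(idx[, "row"]),
             x = unname(idx[, "col"]),
             row.names = NULL)
}

# Small made-up example: one NA in each column
df <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
pos <- na_positions(df)
```

The result has one row per NA cell, so it could be passed to ggplot in place of to.plot, for example ggplot(pos, aes(x, y)) + geom_point() + scale_y_reverse().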
Have you encountered the CSV fingerprint service? It creates a similar image, although not with all the details you have outlined above, and it is not based on R. There is an R version of a similar idea at R-ohjelmointi.org, but the text is in Finnish. The main function is csvSormenjalki(). Maybe that could be adapted further to fulfill your whole vision?
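One possible starting point in R, sketched under assumptions: label every cell either as "NA" or with the class of its column, then draw the labels as colored tiles. This is not how csvSormenjalki() or the CSV fingerprint service work internally (that is not shown here). The helper name cell_types and the example data are hypothetical.

```r
# Hypothetical helper (not from any package): one row per cell with a coarse
# label, "NA" for missing cells and otherwise the first class of the column.
cell_types <- function(data) {
  data <- as.data.frame(data)
  col_class <- unname(vapply(data, function(col) class(col)[1], character(1)))
  # y varies fastest, so rows are in column-major order, like as.vector(matrix)
  out <- expand.grid(y = seq_len(nrow(data)), x = seq_len(ncol(data)))
  out$type <- col_class[out$x]
  out$type[as.vector(is.na(data))] <- "NA"
  out
}

# Small made-up example with a numeric, a factor and a character column
df <- data.frame(a = c(1, NA, 3),
                 b = factor(c("u", "v", NA)),
                 c = c("p", NA, "r"),
                 stringsAsFactors = FALSE)
cells <- cell_types(df)

# Optional plot, built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cells, aes(x, y, fill = type)) +
    geom_tile() +
    scale_y_reverse()
}
```

Calling print(p) would draw the tiles. Finer distinctions from the mockup, such as the number of factor levels, zeros or outliers, would need extra labels in cell_types.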