How to locate a structured region of data inside an unstructured data frame in R?

Views: 160 Published: 2017/3/26 4:50:52 Tags: r dataframe subset data-cleaning

Problem description

I have a set of data frames that each contain a subset of interest. The problem is that this subset is not consistent between the different data frames. Nonetheless, at a more abstract level, it follows a general structure: a rectangular region inside the data frame.

    example1 <- data.frame(
      x = c("name", "129-2", NA, NA, "acc", 2, 3, 4, NA, NA),
      y = c(NA, NA, NA, NA, "deb", 3, 2, 5, NA, NA),
      z = c(NA, NA, NA, NA, "asset", 1, 1, 2, NA, NA)
    )
    print(example1)

           x    y     z
    1   name <NA>  <NA>
    2  129-2 <NA>  <NA>
    3   <NA> <NA>  <NA>
    4   <NA> <NA>  <NA>
    5    acc  deb asset
    6      2    3     1
    7      3    2     1
    8      4    5     2
    9   <NA> <NA>  <NA>
    10  <NA> <NA>  <NA>

example1 contains a clear rectangular region with structured information:

    5 acc deb asset
    6   2   3     1
    7   3   2     1
    8   4   5     2

As mentioned before, the region is not always consistent:

the position of the columns is not always the same

the names of the variables inside the subset of interest are not always the same

Here is another example, example2:

    example2 <- data.frame(
      x = c("name", "129-2", "wallabe #23", NA, NA, "acc", 2, 3, 4, NA),
      y = c(NA, NA, NA, NA, "balance", "deb", 3, 2, 5, NA),
      z = c(NA, NA, NA, NA, NA, "asset", 1, 1, 2, NA),
      u = c(NA, NA, NA, "currency:", NA, NA, NA, NA, NA, NA),
      i = c(NA, NA, NA, "USD", "result", "win", 2, 3, 1, NA),
      o = c(NA, NA, NA, NA, NA, "lose", 2, 2, 1, NA)
    )
    print(example2)

                 x       y     z         u      i    o
    1         name    <NA>  <NA>      <NA>   <NA> <NA>
    2        129-2    <NA>  <NA>      <NA>   <NA> <NA>
    3  wallabe #23    <NA>  <NA>      <NA>   <NA> <NA>
    4         <NA>    <NA>  <NA> currency:    USD <NA>
    5         <NA> balance  <NA>      <NA> result <NA>
    6          acc     deb asset      <NA>    win lose
    7            2       3     1      <NA>      2    2
    8            3       2     1      <NA>      3    2
    9            4       5     2      <NA>      1    1
    10        <NA>    <NA>  <NA>      <NA>   <NA> <NA>

example2 contains a less clear-cut rectangular region:

    6 acc deb asset <NA> win lose
    7   2   3     1 <NA>   2    2
    8   3   2     1 <NA>   3    2
    9   4   5     2 <NA>   1    1

Is there a method to scan a data frame and locate this kind of region inside it?

Any ideas are appreciated.
Solution
You might want to look for the longest run of consecutive rows that have the same number of NAs:

    findTable <- function(df) {
      naSeq <- rowSums(is.na(df))            # How many NAs per row
      myRle <- rle(naSeq)$lengths            # Lengths of runs of equal NA counts
      df[rep(myRle == max(myRle), myRle), ]  # Keep the rows of the longest run
    }

    findTable(example1)
        x   y     z
    5 acc deb asset
    6   2   3     1
    7   3   2     1
    8   4   5     2

    findTable(example2)
        x   y     z    u   i    o
    6 acc deb asset <NA> win lose
    7   2   3     1 <NA>   2    2
    8   3   2     1 <NA>   3    2
    9   4   5     2 <NA>   1    1
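
The located block still has its header stored as a data row, and it can contain columns that are entirely NA (such as u in example2). As a possible follow-up, here is a minimal sketch of one way to tidy it. The name extractTable is hypothetical and not part of the original answer. The sketch assumes that the first row of the located region holds the column names, and that when several runs tie for the longest, the earliest one is the one wanted.

```r
# Hypothetical helper (not from the original answer): locate the longest run
# of rows with the same NA count, then turn it into a tidy data frame.
extractTable <- function(df) {
  naPerRow <- rowSums(is.na(df))           # NA count for each row
  runs <- rle(naPerRow)                    # runs of identical NA counts
  best <- which.max(runs$lengths)          # earliest longest run (assumption on ties)
  last <- cumsum(runs$lengths)[best]       # last row index of that run
  first <- last - runs$lengths[best] + 1   # first row index of that run
  block <- df[first:last, , drop = FALSE]

  # Drop columns that are entirely NA inside the block
  block <- block[, colSums(!is.na(block)) > 0, drop = FALSE]

  # Assumption: the first row of the block holds the column names
  header <- unname(vapply(block[1, ], as.character, character(1)))
  body <- block[-1, , drop = FALSE]
  names(body) <- header
  rownames(body) <- NULL

  # Re-guess column types now that the header text is no longer mixed in
  body[] <- lapply(body, function(col) type.convert(as.character(col), as.is = TRUE))
  body
}

# Same data as example2 in the question
example2 <- data.frame(
  x = c("name", "129-2", "wallabe #23", NA, NA, "acc", 2, 3, 4, NA),
  y = c(NA, NA, NA, NA, "balance", "deb", 3, 2, 5, NA),
  z = c(NA, NA, NA, NA, NA, "asset", 1, 1, 2, NA),
  u = c(NA, NA, NA, "currency:", NA, NA, NA, NA, NA, NA),
  i = c(NA, NA, NA, "USD", "result", "win", 2, 3, 1, NA),
  o = c(NA, NA, NA, NA, NA, "lose", 2, 2, 1, NA),
  stringsAsFactors = FALSE
)

result <- extractTable(example2)
print(result)
```

Like findTable, this relies only on the per-row NA count, so it can pick the wrong rows when an unrelated stretch of the data frame happens to form a longer run with a constant NA count.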
