Removing incomplete cases from output of tidyr - gather() - r


Problem description





I have untidy data in a dataframe that looks like this.

Here 'team' holds the names of some soccer teams. name1 to name3 are variables listing the different names used to refer to the teams in the first column.

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town                        

My aim is to get the data into 2 columns with team-name1, team-name2, team-name3 pairings. I only want to keep those pairings where there is data in name1, name2 or name3.

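To do this, I am trying tidyr's gather():

library(tidyr)  # provides gather() and re-exports %>%

temp <- dat %>% gather(key, value, 2:4)  # stack columns 2 to 4 into key/value pairs
temp$key <- NULL                         # drop the column that records the source variable
temp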

This gives the following output:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town                  

I tried to remove incomplete cases (e.g. rows 20, 21, 23 and 24, but not 22), using:

temp[complete.cases(temp),]

This didn't work because the seemingly empty observations in value contain the empty string "". I guess this is how gather() returns missing data? I tried converting temp$value to a factor, but this didn't work either.
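
For reference, here is a minimal base R sketch of why complete.cases() keeps those rows. It assumes a small hypothetical stand-in for the gathered output (three rows, character columns) rather than the full data. The empty string is an ordinary value, not NA, so nothing counts as missing until it is recoded.

```r
# Hypothetical stand-in for the gathered output; not the full sample data.
temp_small <- data.frame(
  team  = c("Luton Town", "Luton Town", "Macclesfield"),
  value = c("Luton Town", "Luton", ""),
  stringsAsFactors = FALSE
)

# "" is a regular string, so complete.cases() treats every row as complete.
n_complete_before <- sum(complete.cases(temp_small))

# Recode empty strings as NA; complete.cases() can then drop those rows.
temp_small$value[temp_small$value == ""] <- NA
cleaned <- temp_small[complete.cases(temp_small), ]
```

After the recoding step, cleaned holds only the rows with a non-empty value.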

I'd love to hear how to get rid of the incomplete cases.

Sample data...

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

Solution

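You can do everything in one go by adding filter() (to remove the blank values) and select() (to remove the key column) from the dplyr package:

library(tidyr)  # gather()
library(dplyr)  # filter(), select(), %>%

temp <- dat %>% 
  gather(key, value, 2:4) %>%   # stack name1 to name3 into key/value pairs
  filter(value != "") %>%       # keep only rows with a non-empty name
  select(-key)                  # drop the helper column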

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United
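
As an aside, newer tidyr versions offer pivot_longer() as the replacement for gather(). The sketch below shows the same idea under some assumptions: it uses a hypothetical three-team subset of the sample data stored as character columns (dat_small is not part of the original question), and the rows come out grouped by team rather than by source column.

```r
library(tidyr)
library(dplyr)

# Hypothetical three-team subset of the sample data, as character columns.
dat_small <- data.frame(
  team  = c("Loughborough", "Luton Town", "Manchester United"),
  name1 = c("Loughborough", "Luton Town", "Manchester United"),
  name2 = c("", "Luton", "Newton Heath"),
  name3 = c("", "", "Man United"),
  stringsAsFactors = FALSE
)

long <- dat_small %>%
  # Stack name1 to name3 into a key column and a value column.
  pivot_longer(cols = name1:name3, names_to = "key", values_to = "value") %>%
  # Drop the blank names, then the helper column.
  filter(value != "") %>%
  select(-key)
```

Here long keeps one row per non-empty team/name pairing, which is the same filtering logic as the gather() pipeline above.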
