How to separate values in a column and convert to numeric values?

Question

I have a dataset where the values are collapsed, so each row holds several comma-separated values in a single column.

For example:

Gene   Score1                      
Gene1  NA, NA, NA, 0.03, -0.3 
Gene2  NA, 0.2, 0.1   

I am trying to unpack this and then select the maximum absolute value per row for the Score1 column. I also want to keep track of whether that maximum absolute value was originally negative by creating a new column.

So the output for the example is:

Gene   Score1    Negatives1
Gene1   0.3          1
Gene2   0.2          0
# Score1 is now the maximum absolute value; Negatives1 is 1 if that value was originally negative

My code is:

dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))

However, for some reason the above code gives me this error:

Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

I thought using convert = TRUE would make the values numeric, but the error suggests the column is still non-numeric after I run separate_rows(). Why?

Example input data:

structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3", 
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))

Solution

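If we look at the separate_rows() output, I think the issue becomes clear: your separated column isn't numeric, it is still <chr>! Apparently convert = TRUE didn't convert it. Note also that every value after the first in each cell keeps a leading space (" NA", " 0.03"), because the separator is "," rather than ", ". We can force the conversion with as.numeric() and ignore the resulting warnings: we want strings like " NA" to become NA.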
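As a small base-R illustration of that conversion step, the vector below is typed in by hand to mirror the Gene1 pieces shown in the separate_rows() output (it is not produced by tidyr here). as.numeric() accepts leading whitespace before a number, and anything that is not a number ends up as NA.

```r
# Gene1 pieces as they come out of splitting on "," (note the leading spaces)
pieces <- c("NA", " NA", " NA", " 0.03", " -0.3")

# as.numeric() skips leading whitespace in numbers; non-numeric pieces
# become NA. suppressWarnings() hides any "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.numeric(pieces))
print(converted)
```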

There are also some problems in the summarise() call: min() and the first max() need na.rm = TRUE as well, and the parentheses are mismatched (min() should close before the ==).
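To see why na.rm = TRUE matters, here is the Negatives1 test applied by hand to one group's already-converted scores (Gene1 from the question). The intermediate variable names are only for illustration.

```r
# Gene1's scores after conversion to numeric
x <- c(NA, NA, NA, 0.03, -0.3)

# Without na.rm, a single NA makes the summary itself NA
min_plain <- min(x)                     # NA
# With na.rm = TRUE the NAs are dropped before summarising
min_clean <- min(x, na.rm = TRUE)       # -0.3
max_abs   <- max(abs(x), na.rm = TRUE)  # 0.3

# Same test as in summarise(): is the smallest value the negated max |value|?
# Unary + turns the logical into an integer (TRUE -> 1L)
negatives1 <- +(min_clean == -max_abs)  # 1L
```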

library(dplyr)

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
#   Gene  Score1 
#   <chr> <chr>  
# 1 Gene1  NA    
# 2 Gene1 " NA"  
# 3 Gene1 " NA"  
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2  NA    
# 7 Gene2 " 0.2" 
# 8 Gene2 " 0.1" 

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>% 
  mutate(Score1 = as.numeric(Score1)) %>% 
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
#   Gene  Negatives1 Score1
#   <chr>      <int>  <dbl>
# 1 Gene1          1    0.3
# 2 Gene2          0    0.2
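For comparison, here is a minimal base-R sketch of the same split, convert and summarise steps, without tidyr or dplyr. It assumes the two-column input from the question as a plain data.frame, and that every row contains at least one non-NA score; the helper pick_max_abs() is hypothetical. One difference from the summarise() version: if a row contains both 0.3 and -0.3, this sketch keeps whichever comes first, whereas the min() == -max(abs()) test flags the row as negative.

```r
# Example input from the question, as a plain data.frame
dat <- data.frame(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

# Split each cell on "," and convert; leading spaces are tolerated by
# as.numeric(), and non-numeric pieces become NA (warnings suppressed)
scores <- lapply(strsplit(dat$Score1, ","),
                 function(p) suppressWarnings(as.numeric(p)))

# Hypothetical helper: the value with the largest absolute value, ignoring NAs
# (assumes at least one non-NA value; ties keep the first occurrence)
pick_max_abs <- function(x) {
  x <- x[!is.na(x)]
  x[which.max(abs(x))]
}

picked <- vapply(scores, pick_max_abs, numeric(1))

result <- data.frame(
  Gene       = dat$Gene,
  Negatives1 = as.integer(picked < 0),  # 1 if the chosen value was negative
  Score1     = abs(picked),             # maximum absolute value per row
  stringsAsFactors = FALSE
)
print(result)
```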