How to separate values in a column and convert to numeric values?
Problem description
I have a dataset in which the values are collapsed, so each row holds multiple entries in a single column.
For example:
Gene Score1
Gene1 NA, NA, NA, 0.03, -0.3
Gene2 NA, 0.2, 0.1
I am trying to unpack this and then select the maximum absolute value per row for the Score1 column, while also recording in a new column whether that maximum absolute value was originally negative.
So output of the example is:
Gene Score1 Negatives1
Gene1 0.3 1
Gene2 0.2 0
# Score1 is now the maximum absolute value, and Negatives1 tracks whether it used to be negative
I code this with:
dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))
However, for some reason the above code gives me this error:
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
I thought that using convert = TRUE would make the values numeric, but the error suggests the code is still getting non-numeric values after I run separate_rows()?
Example input data:
structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3",
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
Recommended answer
If we look at the separate_rows output, I think the issue becomes clear: your separated column isn't numeric! Note the leading spaces in values such as " NA"; I guess convert didn't pick those up. We can force the conversion with as.numeric() (and ignore any warnings, since we want things like " NA" to become NA).
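As a small illustration of that conversion step, here is a base R sketch that does not depend on tidyr. The variable names pieces and scores are made up for this example, and trimming with trimws() before converting is an extra precaution rather than something the answer above requires.

```r
# Split one collapsed value from the question on "," only,
# which is what sep = "," does in separate_rows().
pieces <- strsplit("NA, NA, NA, 0.03, -0.3", ",", fixed = TRUE)[[1]]

# Every piece after the first keeps its leading space, e.g. " NA".
print(pieces)

# trimws() strips the padding; as.numeric() then yields real numbers and NA.
# suppressWarnings() only silences a coercion warning, should one be raised.
scores <- suppressWarnings(as.numeric(trimws(pieces)))
print(scores)
```

The result is a numeric vector in which the three "NA" pieces have become missing values, so abs(), min() and max() can be applied to it.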
You also have some issues in the summarise call: it needs more na.rm = TRUE, the parentheses are mismatched, etc.
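To see why the extra na.rm = TRUE matters, the following base R sketch uses the Gene2 values from the question. The variable names are chosen only for this illustration.

```r
# Gene2 scores after conversion to numeric.
scores <- c(NA, 0.2, 0.1)

# Without na.rm, a single NA makes the whole result NA.
without_na_rm <- max(abs(scores))

# With na.rm = TRUE the missing value is dropped first.
with_na_rm <- max(abs(scores), na.rm = TRUE)

# The same applies to the comparison that builds the flag:
# comparing NA with NA gives NA, and unary + keeps it NA.
flag_without <- +(min(scores) == -max(abs(scores)))
flag_with <- +(min(scores, na.rm = TRUE) == -max(abs(scores), na.rm = TRUE))

print(c(without_na_rm, with_na_rm, flag_without, flag_with))
```

Both min() and max() need the argument, otherwise the flag column ends up NA for any gene that has at least one missing score.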
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
# Gene Score1
# <chr> <chr>
# 1 Gene1 NA
# 2 Gene1 " NA"
# 3 Gene1 " NA"
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2 NA
# 7 Gene2 " 0.2"
# 8 Gene2 " 0.1"
dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  mutate(Score1 = as.numeric(Score1)) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
# Gene Negatives1 Score1
# <chr> <int> <dbl>
# 1 Gene1 1 0.3
# 2 Gene2 0 0.2
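For comparison, here is a minimal base R sketch of the same computation that avoids tidyr and dplyr. It is not the method from the answer above: the helper summarise_scores is hypothetical, and it swaps in a different technique, picking the element with the largest absolute value via which.max() and reading its sign directly instead of comparing min() with -max().

```r
# Example input copied from the question, as a plain data.frame.
dat <- data.frame(
  Gene = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one collapsed string of scores.
summarise_scores <- function(collapsed) {
  # Split on commas, trim the padding, convert; "NA" becomes NA.
  values <- as.numeric(trimws(strsplit(collapsed, ",", fixed = TRUE)[[1]]))
  values <- values[!is.na(values)]
  if (length(values) == 0) {
    # Assumption: a gene with no usable scores gets NA in both columns.
    return(c(Score1 = NA_real_, Negatives1 = NA_real_))
  }
  # Element with the largest absolute value (first one wins on ties).
  extreme <- values[which.max(abs(values))]
  c(Score1 = abs(extreme), Negatives1 = as.numeric(extreme < 0))
}

# One row per gene: Gene, Score1, Negatives1.
result <- cbind(
  dat["Gene"],
  t(vapply(dat$Score1, summarise_scores, numeric(2), USE.NAMES = FALSE))
)
print(result)
```

One difference to keep in mind: if a gene contains both 0.3 and -0.3, this sketch flags whichever comes first, whereas the min() == -max() comparison flags the gene as negative regardless of order.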