将列添加到我的数据框,列出具有最高行值的列 [英] add column to my data frame listing columns with the highest row value
问题描述
尝试告诉r读取我数据框的行,并将该行中具有最高值的列添加到数据框中名为"MOST_COMMON_CANCER"的新列中.
trying tell r to read through the rows of my dataframe and add the column with the highest value in the row to a new column in the dataframe called "MOST_COMMON_CANCER"
我尝试了以下代码,但出现了错误.
I tried the following code but got an error.
BASE_DF2 <- BASE_DF2%>%mutate(MOST_COMMON_CANCER=colnames(BASE_DF2[8:26])[max.col(BASE_DF2[8:26],ties.method="first")],.keep="all",.after=c_INCS_RATE)
Error: Problem with `mutate()` input `MOST_COMMON_CANCER`.
x Input `MOST_COMMON_CANCER` can't be recycled to size 1.
i Input `MOST_COMMON_CANCER` is `colnames(BASE_DF2[8:26])[max.col(BASE_DF2[8:26], ties.method = "first")]`.
i Input `MOST_COMMON_CANCER` must be size 1, not 490.
i The error occurred in group 1: YEAR_OF_DIAGNOSIS = "2015", STATE_ABBR = "CA", COUNTY_NAME = "ALAMEDA".
这是我数据框的dput,尽管我已将其从原来的80列中缩小了
dput(head(BASE_DF2[1:31]))
structure(list(YEAR_OF_DIAGNOSIS = structure(c(1L, 2L, 1L, 2L,
1L, 2L), .Label = c("2015", "2016"), class = "factor"), STATE_ABBR = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("CA", "KY", "MA", "NM", "NY"), class = "factor"),
COUNTY_NAME = c("ALAMEDA", "ALAMEDA", "AMADOR", "AMADOR",
"BUTTE", "BUTTE"), AGE_AT_DIAGNOSIS = c(64.0595588235294,
64.4077743902439, 65.5079365079365, 66, 66.5040322580645,
66.4507575757576), `%_<_HIGH_SCHOOL_EDUCATION` = c(12.46,
12.46, 10.29, 10.29, 11.25, 11.25), `%_PERSONS_<150%_OF_POVERTY` = c(17.82,
17.82, 18.68, 18.68, 31.63, 31.63), `MEDIAN_FAMILY_INCOME_(IN_TENS)_ACS_2013-2017` = c(10360,
10360, 7415, 7415, 6105, 6105), Leukemia = c(59, 72, 0, 3,
13, 6), Miscellaneous = c(33, 36, 2, 3, 3, 4), Colorectal = c(124,
124, 6, 7, 25, 24), Musculoskeletal = c(10, 15, 1, 0, 3,
2), Brain_Nervous_System = c(26, 20, 1, 1, 2, 2), Breast = c(208,
214, 8, 10, 37, 42), Cervical_Uterine = c(54, 73, 2, 1, 7,
10), UGI_Tract = c(52, 51, 5, 1, 17, 9), Head = c(91, 65,
3, 1, 15, 15), Pancreatic_Biliary = c(104, 80, 5, 4, 10,
13), Lymphoma = c(56, 77, 1, 4, 15, 22), Throat = c(17, 19,
0, 0, 2, 2), Kidney_Ureter = c(48, 45, 5, 1, 5, 7), Lung = c(154,
128, 8, 6, 33, 37), Skin_Melanoma = c(80, 52, 9, 5, 17, 25
), Female_reproductive = c(28, 32, 0, 2, 6, 2), Male_reproductive = c(6,
9, 1, 0, 1, 3), Bladder = c(54, 53, 2, 2, 10, 7), Prostate = c(156,
147, 4, 2, 27, 32), TOTAL_CANCER = c(1360, 1312, 63, 53,
248, 264), c_INCS_RATE = c(0.000832039389723579, 0.000794693964081287,
0.00170127730820124, 0.00141601432044671, 0.00110403283607338,
0.0011669488266418), population = c(1634538, 1650950, 37031,
37429, 224631, 226231), AIR_1990 = c(3889287, 3889287, 222121,
222121, 252194, 252194), OnSite_LAND_1990 = c(231928, 231928,
1460, 1460, 515, 515)), row.names = c(NA, -6L), groups = structure(list(
YEAR_OF_DIAGNOSIS = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("2015",
"2016"), class = "factor"), STATE_ABBR = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("CA", "KY", "MA", "NM", "NY"
), class = "factor"), COUNTY_NAME = c("ALAMEDA", "AMADOR",
"BUTTE", "ALAMEDA", "AMADOR", "BUTTE"), .rows = structure(list(
1L, 3L, 5L, 2L, 4L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 6L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Run `rlang::last_error()` to see where the error occurred.
我能够得到下面的输出,我想我可以将其分配给一个向量,然后添加到数据帧中,但我想保持整洁和简化.
I was able to get this output(below), which I imagine I could assign to a vector then add to the dataframe but I would like to keep things neat and streamlined.
colnames(BASE_DF2[8:26])[max.col(BASE_DF2[8:26],ties.method="first")]
[1] "Breast" "Breast" "Skin_Melanoma" "Breast" "Breast"
[6] "Breast"
我的问题被标记为是因为类似的问题.我的问题与使用该问题作为代码基础类似,但是我还有其他一些参数卡住了.
My question was flagged because of a similar question. My question is similar as it used that question as a basis for my code however I have additional parameters that have me stuck.
推荐答案
请注意,您的数据已分组,您也可以使用.
来引用数据框.
Notice that your data is grouped, also you can use .
to refer to dataframe here.
library(dplyr)
BASE_DF2%>%
ungroup %>%
mutate(MOST_COMMON_CANCER = colnames(.[8:26])[max.col(.[8:26],
ties.method="first")], .after=c_INCS_RATE)
这篇关于将列添加到我的数据框,列出具有最高行值的列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!