Using sort and rank in R on multiple columns
Question
I'm trying to rank hospitals by lowest rate within each state. When several hospitals have the same rate, the tie should be broken alphabetically by hospital name. So far I have managed to rank by rate within each state, with the rows sorted by hospital name, but I can't figure out how to break the ties so that the ranks don't skip numbers.
This is what I have so far, using the following code:
outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE)  # read the CSV file
myData <- outcome_data[, c(2, 7, 11)]  # keep only hospital name, state and heart attack rate
names(myData) <- c("Hospital.Name", "State", "rate")  # assumed rename (not shown in the original post) so the names match the output
arr1 <- myData[complete.cases(myData[, 3]), ]  # remove rows with a missing rate
arr2 <- arr1[order(arr1[[2]], arr1[[3]], arr1[[1]]), ]  # sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min")))  # rank by rate within each state
The output I currently get is:
Hospital.Name                         State  rate  rank
SOUTH PENINSULA HOSPITAL              AK     10.8     1
YUKON KUSKOKWIM DELTA REG HOSPITAL    AK     11.2     2
MAT-SU REGIONAL MEDICAL CENTER        AK     11.4     3
PEACEHEALTH KETCHIKAN MEDICAL CENTER  AK     11.4     3
ALASKA NATIVE MEDICAL CENTER          AK     11.6     5
BARTLETT REGIONAL HOSPITAL            AK     11.6     5
CENTRAL PENINSULA GENERAL HOSPITAL    AK     11.6     5
PROVIDENCE ALASKA MEDICAL CENTER      AK     12.4     8
ALASKA REGIONAL HOSPITAL              AK     13.4     9
FAIRBANKS MEMORIAL HOSPITAL           AK     15.6    10
GEORGE H. LANIER MEMORIAL HOSPITAL    AL      8.8     1
EVERGREEN MEDICAL CENTER              AL      9.1     2
BAPTIST MEDICAL CENTER EAST           AL      9.6     3
LAWRENCE MEDICAL CENTER               AL      9.9     4
ANDALUSIA REGIONAL HOSPITAL           AL     10.1     5
JACKSON HOSPITAL & CLINIC INC         AL     10.2     6
BIRMINGHAM VA MEDICAL CENTER          AL     10.4     7
FLORALA MEMORIAL HOSPITAL             AL     10.4     7
GROVE HILL MEMORIAL HOSPITAL          AL     10.4     7
SPRINGHILL MEDICAL CENTER             AL     10.4     7
WEDOWEE HOSPITAL                      AL     10.4     7
PARKWAY MEDICAL CENTER                AL     10.5    12
ST VINCENT'S BIRMINGHAM               AL     10.6    13
WIREGRASS MEDICAL CENTER              AL     10.6    13
GADSDEN REGIONAL MEDICAL CENTER       AL     10.7    15
HALE COUNTY HOSPITAL                  AL     10.7    15
MOBILE INFIRMARY                      AL     10.7    15
But what I would like to get is:
Hospital.Name                         State  rate  rank
SOUTH PENINSULA HOSPITAL              AK     10.8     1
YUKON KUSKOKWIM DELTA REG HOSPITAL    AK     11.2     2
MAT-SU REGIONAL MEDICAL CENTER        AK     11.4     3
PEACEHEALTH KETCHIKAN MEDICAL CENTER  AK     11.4     4
ALASKA NATIVE MEDICAL CENTER          AK     11.6     5
BARTLETT REGIONAL HOSPITAL            AK     11.6     6
CENTRAL PENINSULA GENERAL HOSPITAL    AK     11.6     7
PROVIDENCE ALASKA MEDICAL CENTER      AK     12.4     8
ALASKA REGIONAL HOSPITAL              AK     13.4     9
FAIRBANKS MEMORIAL HOSPITAL           AK     15.6    10
GEORGE H. LANIER MEMORIAL HOSPITAL    AL      8.8     1
EVERGREEN MEDICAL CENTER              AL      9.1     2
BAPTIST MEDICAL CENTER EAST           AL      9.6     3
LAWRENCE MEDICAL CENTER               AL      9.9     4
ANDALUSIA REGIONAL HOSPITAL           AL     10.1     5
JACKSON HOSPITAL & CLINIC INC         AL     10.2     6
BIRMINGHAM VA MEDICAL CENTER          AL     10.4     7
FLORALA MEMORIAL HOSPITAL             AL     10.4     8
GROVE HILL MEMORIAL HOSPITAL          AL     10.4     9
SPRINGHILL MEDICAL CENTER             AL     10.4    10
WEDOWEE HOSPITAL                      AL     10.4    11
PARKWAY MEDICAL CENTER                AL     10.5    12
ST VINCENT'S BIRMINGHAM               AL     10.6    13
WIREGRASS MEDICAL CENTER              AL     10.6    14
GADSDEN REGIONAL MEDICAL CENTER       AL     10.7    15
HALE COUNTY HOSPITAL                  AL     10.7    16
MOBILE INFIRMARY                      AL     10.7    17
Any ideas?
Recommended answer
This is relatively simple using data.table:
library(data.table)
# Read only the relevant columns from the CSV file using data.table::fread
# (this assumes the file has columns named Hospital.Name, State and rate)
outcome_data <- fread("outcome-of-care-measures.csv",
                      na.strings = "Not Available",
                      select = c("Hospital.Name", "State", "rate"))
# Drop rows containing NA values (data.table provides a na.omit method)
outcome_data <- na.omit(outcome_data)
# Use data.table::setkey to sort/index by State, then rate, then Hospital.Name
setkey(outcome_data, State, rate, Hospital.Name)
# Add a rank column by State; the order within each group follows the key order above
# (.N is the number of rows in each State group)
outcome_data[, rank := seq_len(.N), by = .(State)]
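If you would rather stay in base R, the same idea can be sketched as below. This is a minimal illustration, not the answer's own method: the data frame `toy` and its hospital names and rates are made up for the example, and the column names are assumed to match those in the question. It shows why `ties.method = "min"` skips numbers, and two ways to get consecutive ranks once the rows are sorted by state, rate and hospital name.

```r
# Toy data with hypothetical hospital names and rates (not from the real CSV).
toy <- data.frame(
  Hospital.Name = c("D HOSPITAL", "B HOSPITAL", "A HOSPITAL",
                    "C HOSPITAL", "E HOSPITAL"),
  State = c("AK", "AK", "AK", "AL", "AL"),
  rate = c(11.4, 11.4, 10.8, 9.1, 9.1),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then hospital name.
toy <- toy[order(toy$State, toy$rate, toy$Hospital.Name), ]

# ties.method = "min": tied rates share the lowest rank, so later ranks skip.
toy$rank_min <- ave(toy$rate, toy$State,
                    FUN = function(x) rank(x, ties.method = "min"))

# ties.method = "first": ties are broken by position, so on data already
# sorted by name the alphabetically earlier hospital gets the lower rank.
toy$rank_first <- ave(toy$rate, toy$State,
                      FUN = function(x) rank(x, ties.method = "first"))

# Equivalent on sorted data: number the rows 1..n within each state.
toy$rank_seq <- ave(seq_along(toy$rate), toy$State, FUN = seq_along)

print(toy)
```

Both `rank_first` and `rank_seq` depend on the rows already being in the desired order, which is the same reason the data.table version sorts with `setkey` before calling `seq_len(.N)`.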