Using sort and rank in R on multiple columns


Question

I'm trying to rank hospitals by lowest rate within each state. When multiple hospitals have the same rate, the tie should be broken by hospital name, sorted alphabetically. So far I've managed to rank by rate within each state, with the rows sorted by hospital name, but I can't figure out how to break the ties so that the ranks don't skip numbers.

This is what I've got so far, using the following code:

outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE) # Read csv file
myData <- outcome_data[, c(2, 7, 11)] # Retrieve only hospital name, state and heart attack rate
names(myData) <- c("Hospital.Name", "State", "rate") # Short column names, as used below and in the output
arr1 <- myData[complete.cases(myData[, 3]), ] # Remove NAs
arr2 <- arr1[order(arr1[[2]], arr1[[3]], arr1[[1]]), ] # Sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min"))) # Rank by rate within each state

The output I currently get is:

Hospital.Name                           State  rate  rank
SOUTH PENINSULA HOSPITAL                AK     10.8  1
YUKON KUSKOKWIM DELTA REG HOSPITAL      AK     11.2  2
MAT-SU REGIONAL MEDICAL CENTER          AK     11.4  3
PEACEHEALTH KETCHIKAN MEDICAL CENTER    AK     11.4  3
ALASKA NATIVE MEDICAL CENTER            AK     11.6  5
BARTLETT REGIONAL HOSPITAL              AK     11.6  5
CENTRAL PENINSULA GENERAL HOSPITAL      AK     11.6  5
PROVIDENCE ALASKA MEDICAL CENTER        AK     12.4  8
ALASKA REGIONAL HOSPITAL                AK     13.4  9
FAIRBANKS MEMORIAL HOSPITAL             AK     15.6  10
GEORGE H. LANIER MEMORIAL HOSPITAL      AL     8.8   1
EVERGREEN MEDICAL CENTER                AL     9.1   2
BAPTIST MEDICAL CENTER EAST             AL     9.6   3
LAWRENCE MEDICAL CENTER                 AL     9.9   4
ANDALUSIA REGIONAL HOSPITAL             AL     10.1  5
JACKSON HOSPITAL & CLINIC INC           AL     10.2  6
BIRMINGHAM VA MEDICAL CENTER            AL     10.4  7
FLORALA MEMORIAL HOSPITAL               AL     10.4  7
GROVE HILL MEMORIAL HOSPITAL            AL     10.4  7
SPRINGHILL MEDICAL CENTER               AL     10.4  7
WEDOWEE HOSPITAL                        AL     10.4  7
PARKWAY MEDICAL CENTER                  AL     10.5  12
ST VINCENT'S BIRMINGHAM                 AL     10.6  13
WIREGRASS MEDICAL CENTER                AL     10.6  13
GADSDEN REGIONAL MEDICAL CENTER         AL     10.7  15
HALE COUNTY HOSPITAL                    AL     10.7  15
MOBILE INFIRMARY                        AL     10.7  15

But what I would like to get is:

Hospital.Name                           State  rate  rank
SOUTH PENINSULA HOSPITAL                AK     10.8  1
YUKON KUSKOKWIM DELTA REG HOSPITAL      AK     11.2  2
MAT-SU REGIONAL MEDICAL CENTER          AK     11.4  3
PEACEHEALTH KETCHIKAN MEDICAL CENTER    AK     11.4  4
ALASKA NATIVE MEDICAL CENTER            AK     11.6  5
BARTLETT REGIONAL HOSPITAL              AK     11.6  6
CENTRAL PENINSULA GENERAL HOSPITAL      AK     11.6  7
PROVIDENCE ALASKA MEDICAL CENTER        AK     12.4  8
ALASKA REGIONAL HOSPITAL                AK     13.4  9
FAIRBANKS MEMORIAL HOSPITAL             AK     15.6  10
GEORGE H. LANIER MEMORIAL HOSPITAL      AL     8.8   1
EVERGREEN MEDICAL CENTER                AL     9.1   2
BAPTIST MEDICAL CENTER EAST             AL     9.6   3
LAWRENCE MEDICAL CENTER                 AL     9.9   4
ANDALUSIA REGIONAL HOSPITAL             AL     10.1  5
JACKSON HOSPITAL & CLINIC INC           AL     10.2  6
BIRMINGHAM VA MEDICAL CENTER            AL     10.4  7
FLORALA MEMORIAL HOSPITAL               AL     10.4  8
GROVE HILL MEMORIAL HOSPITAL            AL     10.4  9
SPRINGHILL MEDICAL CENTER               AL     10.4  10
WEDOWEE HOSPITAL                        AL     10.4  11
PARKWAY MEDICAL CENTER                  AL     10.5  12
ST VINCENT'S BIRMINGHAM                 AL     10.6  13
WIREGRASS MEDICAL CENTER                AL     10.6  14
GADSDEN REGIONAL MEDICAL CENTER         AL     10.7  15
HALE COUNTY HOSPITAL                    AL     10.7  16
MOBILE INFIRMARY                        AL     10.7  17

Any ideas?

Recommended answer

This is relatively simple with data.table:

library(data.table)

# Read only the relevant columns from the CSV file with data.table::fread
# (column names as shown in the question's output)
outcome_data <- fread("outcome-of-care-measures.csv",
                      na.strings = "Not Available",
                      select = c("Hospital.Name", "State", "rate"))

# Drop rows with NA values (data.table supplies its own na.omit method)
outcome_data <- na.omit(outcome_data)

# Use data.table::setkey to sort/index by State, then rate, then hospital name
setkey(outcome_data, State, rate, Hospital.Name)

# Add a rank column by state; the order within each group follows the key order above
# (.N is the number of rows in each State group)
outcome_data[, rank := seq_len(.N), by = .(State)]
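
For readers who would rather stay in base R, the same idea (sort first, then number the rows within each state) can be sketched without data.table. This is only an illustration, not part of the original answer: the data frame `hosp` is a hypothetical toy sample built from a few rows of the question's output, and `rank2` is a made-up column name used to show an equivalent alternative.

```r
# Toy data standing in for the real CSV (a few rows from the question's output)
hosp <- data.frame(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER",
                    "SOUTH PENINSULA HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER",
                    "YUKON KUSKOKWIM DELTA REG HOSPITAL",
                    "EVERGREEN MEDICAL CENTER",
                    "GEORGE H. LANIER MEMORIAL HOSPITAL"),
  State = c("AK", "AK", "AK", "AK", "AL", "AL"),
  rate  = c(11.4, 10.8, 11.4, 11.2, 9.1, 8.8),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then hospital name (the name breaks ties)
hosp <- hosp[order(hosp$State, hosp$rate, hosp$Hospital.Name), ]

# Number the rows 1..n within each state; because the rows are already
# sorted, the position within the group is the rank, with no skipped numbers
hosp$rank <- ave(hosp$rate, hosp$State, FUN = seq_along)

# Alternative on the sorted data: ties.method = "first" ranks tied values
# in order of appearance instead of giving them all the minimum rank
hosp$rank2 <- ave(hosp$rate, hosp$State,
                  FUN = function(x) rank(x, ties.method = "first"))

print(hosp, row.names = FALSE)
```

The second variant is the smallest change to the code in the question: replacing `ties.method = "min"` with `ties.method = "first"` should be enough, provided the data is sorted by state, rate and hospital name beforehand.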
