如何按行和列查找>栏名? [英] How to lookup by row and column > column names?
问题描述
我正在考虑如何按大学名称查找时间数据(第一行: A
,..., F
),字段名称(第一列: Acute
,..., En
)和/或毕业时间(时间
)在以下文件 DS.csv
中。
我正在考虑 dplyr
方法,但无法扩展数字ID查找(线程答案如何重载R中的函数参数)到三个变量的查找。
挑战
- 如何通过第一行进行查找?也许,类似于
$ 1 ==A
。 - 如何将大学查询扩展为两列?伪码
$ 1 ==A
是关于第二和第三列,...,$ 1 ==F
大约两个最后一列。 -
按3查找标准进行查找:第一行(无标题),标题为
字段
的第一列,时间
。伪代码times< - getTimes($ 1 ==A,Field ==Ane,by =desc时间))
数据 DS.csv
具有数据。第一列表示实验。以下数据是交叉表格式,使得
,A ,, B ,, C ,, D ,, E ,, F ,
字段,时间,T,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6, 2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1, 0,0,0,0,14,3,1,14.6,4
Fo,9.2,2,0,0,5,4,1,0,0,0,0,7.9,3
并以直板格式使得
code>字段,时间,T,实验
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2 ,2,A
急性,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
急性,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
急性,8.6, 2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
急性,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
急性,8.3,4,F
An,8.2,326 ,F
En,14.6,4,F
Fo,7.9,3,F
Pseudocode
library('dplyr')
o w< - options(warn)$ b $ b DF< - read.csv(/ home / masi / CSV / DS.csv,header = T)
#查找第一行,按字段查找,按字段的第一列进行查找?
次< - getTimes($ 1 ==A,Field ==Ane,by =desc(time))
预期输出: 9
预期输出一般化: a
, b
, c
,...
##数据中标有小字母a,b,c,...的值需要
#uni1 uni2 ...
#time T time ...
#Field1 ac
#Field2 b ...
#... ...
R:3.3.3(2017-03-06)
操作系统:Debian 8.7
硬件:华硕Zenbook UX303UA
以您的初始原始数据为起点:
#读取数据&跳过1&只包含标题信息的第二行
/ pre>
DF< - read.csv(text =,A ,, B ,, C ,, D ,, E ,, F,
字段,时间,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
恩,15.6,2,12.9,1,0,0,0,0,14.3,1 ,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3,header = FALSE,stringsAsFactors = FALSE,skip = 2)
#读取包含头信息的前两行
headers< - read.csv(text =,A ,, B ,, C ,, D ,, E ,, F,
字段,时间,T,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6,2, 0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0, 0,0,0,14,3,1,14.6,4
Fo,9.2,2,0,0,5,4,1,0,0,0,0,7.9,3,header = FALSE,stringsAsFactors = FALSE,nrow = 2)
#提取头数据的大学名称。框
大学< - unlist(headers [1,])
university< - 大学[university!='']
#从中创建列名标题'data.frame
vec< - headers [2,] [headers [2,] =='T']
headers [2,] [headers [2,] =='T' ]< - paste0(vec,seq_along(vec))
名称(DF)< - paste0(headers [2,],headers [1,])
您的数据框现在看起来如下:
> DF
现场时间A T1时间B T2时间C T3时间D时间T时间T5时间T6
1:急性0.0 0 8.3 1 7.5 1 8.6 2 0.0 0 8.3 4
2:Ane 9.0 120 7.7 26 7.9 43 7.8 77 7.9 60 8.2 326
3:En 15.6 2 12.9 1 0.0 0 0.0 0 14.3 1 14.6 4
4:Fo 9.2 2 0.0 0 5.4 1 0.0 0 0.0 0 7.9 3
最好将数据转换成长格式:
library(data.table)
DT< - melt(setDT(DF),id = 1,
measure.vars = patterns('^ time' ^ T'),
variable.name ='university',
value.name = c('time','t')
)[,大学:=大学] []
现在您的数据看起来像:
> DT
现场大学时间t
1:急性A 0.0 0
2:Ane A 9.0 120
3:En A 15.6 2
4:Fo A 9.2 2
5:急性B 8.3 1
6:Ane B 7.7 26
7:En B 12.9 1
8:Fo B 0.0 0
9:急性C 7.5 1
10:Ane C 7.9 43
11:En C 0.0 0
12:Fo C 5.4 1
13:急性D 8.6 2
14:Ane D 7.8 77
15:En D 0.0 0
16:Fo D 0.0 0
17:急性E 0.0 0
18:Ane E 7.9 60
19:En E 14.3 1
20:Fo E 0.0 0
21:急性F 8.3 4
22:Ane F 8.2 326
23:En F 14.6 4
24:Fo F 7.9 3
现在您可以选择所需的信息:
DT [大学=='A &安培;字段=='Ane']
其中:
现场大学时间t
1:Ane A 9 120
几个
dplyr
过滤数据的示例:
$($)
DT%>%
过滤器(Field ==En& t> 1)
给出:
字段大学时间t
1 En A 15.6 2
2 En F 14.6 4
或者:
DT%>%
arrange(desc(time))%>%
过滤器(时间< 14> 3)
给出:
现场大学时间t
1 Ane A 9.0 120
2急性F 8.3 4
3 Ane F 8.2 326
4 Ane C 7.9 43
5 Ane E 7.9 60
6 Ane D 7.8 77
7 Ane B 7.7 26
I am thinking how to lookup time data by University name (first row: A
,...,F
), Field name (first column: Acute
,...,En
) and/or graduation time (time
) in the following file DS.csv
.
I am thinking dplyr
approach but could not expand numerical ID lookup (thread answer How to overload function parameters in R?) to the lookup by three variables.
Challenges
- How to lookup by the first row? Maybe, something similar to
$1 == "A"
. - How to Expand university lookup to two columns? Pseudocode
$1 == "A"
is about the second and third column, ...,$1 == "F"
about two last columns. Do lookup by 3 lookup criterias: first row (no header), first column with header
Field
and for the headertime
. Pseudocodetimes <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
Data DS.csv
has the data. The first column denotes experiment. The data below is in crosstab format such that
,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3
and in the straight table format such that
Field,time,T,Experiment
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F
Pseudocode
library('dplyr')
ow <- options("warn")
DF <- read.csv("/home/masi/CSV/DS.csv", header = T)
# Lookup by first row, Lookup by Field, lookup by Field's first column?
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
Expected output: 9
Expected output generalised: a
, b
, c
, ...
## Data where values marked by small letters a,b,c, ... are wanted
# uni1 uni2 ...
# time T time T ...
#Field1 a c
#Field2 b ...
#... ...
R: 3.3.3 (2017-03-06)
OS: Debian 8.7
Hardware: Asus Zenbook UX303UA
Taking your initial raw data as starting point:
# read the data & skip 1st & 2nd line which contain only header information
DF <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, skip=2)
# read the first two lines which contain the header information
headers <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, nrow=2)
# extract the university names for the 'headers' data.frame
universities <- unlist(headers[1,])
universities <- universities[universities != '']
# create column names from the 'headers' data.frame
vec <- headers[2,][headers[2,] == 'T']
headers[2,][headers[2,] == 'T'] <- paste0(vec, seq_along(vec))
names(DF) <- paste0(headers[2,],headers[1,])
You dataframe now looks as follows:
> DF
Field timeA T1 timeB T2 timeC T3 timeD T4 timeE T5 timeF T6
1: Acute 0.0 0 8.3 1 7.5 1 8.6 2 0.0 0 8.3 4
2: Ane 9.0 120 7.7 26 7.9 43 7.8 77 7.9 60 8.2 326
3: En 15.6 2 12.9 1 0.0 0 0.0 0 14.3 1 14.6 4
4: Fo 9.2 2 0.0 0 5.4 1 0.0 0 0.0 0 7.9 3
As it is better to transform you data into long format:
library(data.table)
DT <- melt(setDT(DF), id = 1,
measure.vars = patterns('^time','^T'),
variable.name = 'university',
value.name = c('time','t')
)[, university := universities[university]][]
Now your data looks like:
> DT
Field university time t
1: Acute A 0.0 0
2: Ane A 9.0 120
3: En A 15.6 2
4: Fo A 9.2 2
5: Acute B 8.3 1
6: Ane B 7.7 26
7: En B 12.9 1
8: Fo B 0.0 0
9: Acute C 7.5 1
10: Ane C 7.9 43
11: En C 0.0 0
12: Fo C 5.4 1
13: Acute D 8.6 2
14: Ane D 7.8 77
15: En D 0.0 0
16: Fo D 0.0 0
17: Acute E 0.0 0
18: Ane E 7.9 60
19: En E 14.3 1
20: Fo E 0.0 0
21: Acute F 8.3 4
22: Ane F 8.2 326
23: En F 14.6 4
24: Fo F 7.9 3
Now you can select the required info:
DT[university == 'A' & Field == 'Ane']
which gives:
Field university time t
1: Ane A 9 120
Several dplyr
examples to filter the data:
library(dplyr)
DT %>%
filter(Field=="En" & t > 1)
gives:
Field university time t
1 En A 15.6 2
2 En F 14.6 4
Or:
DT %>%
arrange(desc(time)) %>%
filter(time < 14 & t > 3)
gives:
Field university time t
1 Ane A 9.0 120
2 Acute F 8.3 4
3 Ane F 8.2 326
4 Ane C 7.9 43
5 Ane E 7.9 60
6 Ane D 7.8 77
7 Ane B 7.7 26
这篇关于如何按行和列查找>栏名?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!