如何按行和列查找>栏名? [英] How to lookup by row and column > column names?

查看:106
本文介绍了如何按行和列查找>栏名?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在考虑如何按大学名称查找时间数据(第一行: A ,..., F ),字段名称(第一列: Acute ,..., En )和/或毕业时间(时间)在以下文件 DS.csv 中。
我正在考虑 dplyr 方法,但无法扩展数字ID查找(线程答案如何重载R中的函数参数)到三个变量的查找。
挑战


  1. 如何通过第一行进行查找?也许,类似于 $ 1 ==A

  2. 如何将大学查询扩展为两列?伪码 $ 1 ==A是关于第二和第三列,..., $ 1 ==F大约两个最后一列。

  3. 按3查找标准进行查找:第一行(无标题),标题为字段的第一列, 时间。伪代码

      times<  -  getTimes($ 1 ==A,Field ==Ane,by =desc时间))


数据 DS.csv 具有数据。第一列表示实验。以下数据是交叉表格式,使得

 ,A ,, B ,, C ,, D ,, E ,, F ,
字段,时间,T,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6, 2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1, 0,0,0,0,14,3,1,14.6,4
Fo,9.2,2,0,0,5,4,1,0,0,0,0,7.9,3

并以直板格式使得

 code>字段,时间,T,实验
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2 ,2,A
急性,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
急性,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
急性,8.6, 2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
急性,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
急性,8.3,4,F
An,8.2,326 ,F
En,14.6,4,F
Fo,7.9,3,F

Pseudocode

  library('dplyr')
o w< - options(warn)$​​ b $ b DF< - read.csv(/ home / masi / CSV / DS.csv,header = T)

#查找第一行,按字段查找,按字段的第一列进行查找?
次< - getTimes($ 1 ==A,Field ==Ane,by =desc(time))

预期输出: 9

预期输出一般化: a b c ,...

  ##数据中标有小字母a,b,c,...的值需要
#uni1 uni2 ...
#time T time ...
#Field1 ac
#Field2 b ...
#... ...

R:3.3.3(2017-03-06)

操作系统:Debian 8.7

硬件:华硕Zenbook UX303UA

解决方案

以您的初始原始数据为起点:

 #读取数据&跳过1&只包含标题信息的第二行
DF< - read.csv(text =,A ,, B ,, C ,, D ,, E ,, F,
字段,时间,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
恩,15.6,2,12.9,1,0,0,0,0,14.3,1 ,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3,header = FALSE,stringsAsFactors = FALSE,skip = 2)

#读取包含头信息的前两行
headers< - read.csv(text =,A ,, B ,, C ,, D ,, E ,, F,
字段,时间,T,时间,T,时间,T,时间,T,时间,T,时间,T
急性,0,0,8.3,1,7.5,1,8.6,2, 0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0, 0,0,0,14,3,1,14.6,4
Fo,9.2,2,0,0,5,4,1,0,0,0,0,7.9,3,header = FALSE,stringsAsFactors = FALSE,nrow = 2)

#提取头数据的大学名称。框
大学< - unlist(headers [1,])
university< - 大学[university!='']

#从中创建列名标题'data.frame
vec< - headers [2,] [headers [2,] =='T']
headers [2,] [headers [2,] =='T' ]< - paste0(vec,seq_along(vec))
名称(DF)< - paste0(headers [2,],headers [1,])
/ pre>

您的数据框现在看起来如下:

 > DF 
现场时间A T1时间B T2时间C T3时间D时间T时间T5时间T6
1:急性0.0 0 8.3 1 7.5 1 8.6 2 0.0 0 8.3 4
2:Ane 9.0 120 7.7 26 7.9 43 7.8 77 7.9 60 8.2 326
3:En 15.6 2 12.9 1 0.0 0 0.0 0 14.3 1 14.6 4
4:Fo 9.2 2 0.0 0 5.4 1 0.0 0 0.0 0 7.9 3

最好将数据转换成长格式:

  library(data.table)
DT< - melt(setDT(DF),id = 1,
measure.vars = patterns('^ time' ^ T'),
variable.name ='university',
value.name = c('time','t')
)[,大学:=大学] []

现在您的数据看起来像:

 > DT 
现场大学时间t
1:急性A 0.0 0
2:Ane A 9.0 120
3:En A 15.6 2
4:Fo A 9.2 2
5:急性B 8.3 1
6:Ane B 7.7 26
7:En B 12.9 1
8:Fo B 0.0 0
9:急性C 7.5 1
10:Ane C 7.9 43
11:En C 0.0 0
12:Fo C 5.4 1
13:急性D 8.6 2
14:Ane D 7.8 77
15:En D 0.0 0
16:Fo D 0.0 0
17:急性E 0.0 0
18:Ane E 7.9 60
19:En E 14.3 1
20:Fo E 0.0 0
21:急性F 8.3 4
22:Ane F 8.2 326
23:En F 14.6 4
24:Fo F 7.9 3

现在您可以选择所需的信息:

  DT [大学=='A &安培;字段=='Ane'] 

其中:

 现场大学时间t 
1:Ane A 9 120






几个 dplyr 过滤数据的示例:



$($)
DT%>%
过滤器(Field ==En& t> 1)

给出:

 字段大学时间t 
1 En A 15.6 2
2 En F 14.6 4

或者:

  DT%>%
arrange(desc(time))%>%
过滤器(时间< 14> 3)

给出:

 现场大学时间t 
1 Ane A 9.0 120
2急性F 8.3 4
3 Ane F 8.2 326
4 Ane C 7.9 43
5 Ane E 7.9 60
6 Ane D 7.8 77
7 Ane B 7.7 26


I am thinking how to lookup time data by University name (first row: A,...,F), Field name (first column: Acute,...,En) and/or graduation time (time) in the following file DS.csv. I am thinking dplyr approach but could not expand numerical ID lookup (thread answer How to overload function parameters in R?) to the lookup by three variables. Challenges

  1. How to lookup by the first row? Maybe, something similar to $1 == "A".
  2. How to Expand university lookup to two columns? Pseudocode $1 == "A" is about the second and third column, ..., $1 == "F" about two last columns.
  3. Do lookup by 3 lookup criterias: first row (no header), first column with header Field and for the header time. Pseudocode

    times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
    

Data DS.csv has the data. The first column denotes experiment. The data below is in crosstab format such that

,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3

and in the straight table format such that

Field,time,T,Experiment       
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B       
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B  
Acute,7.5,1,C       
An,7.9,43,C
En,0,0,C  
Fo,5.4,1,C
Acute,8.6,2,D       
An,7.8,77,D
En,0,0,D  
Fo,0,0,D  
Acute,0,0,E         
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E  
Acute,8.3,4,F       
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F

Pseudocode

library('dplyr')
ow <- options("warn")
DF <- read.csv("/home/masi/CSV/DS.csv", header = T)

# Lookup by first row, Lookup by Field, lookup by Field's first column?
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")

Expected output: 9
Expected output generalised: a, b, c, ...

## Data where values marked by small letters a,b,c, ... are wanted 
#       uni1    uni2 ... 
#       time T  time T ...
#Field1 a       c
#Field2 b       ...
#...    ...

R: 3.3.3 (2017-03-06)
OS: Debian 8.7
Hardware: Asus Zenbook UX303UA

解决方案

Taking your initial raw data as starting point:

# read the data & skip 1st & 2nd line which contain only header information
DF <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, skip=2)

# read the first two lines which contain the header information
headers <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, nrow=2)

# extract the university names for the 'headers' data.frame
universities <- unlist(headers[1,])
universities <- universities[universities != '']

# create column names from the 'headers' data.frame
vec <- headers[2,][headers[2,] == 'T']
headers[2,][headers[2,] == 'T'] <- paste0(vec, seq_along(vec))
names(DF) <- paste0(headers[2,],headers[1,])

You dataframe now looks as follows:

> DF
   Field timeA  T1 timeB T2 timeC T3 timeD T4 timeE T5 timeF  T6
1: Acute   0.0   0   8.3  1   7.5  1   8.6  2   0.0  0   8.3   4
2:   Ane   9.0 120   7.7 26   7.9 43   7.8 77   7.9 60   8.2 326
3:    En  15.6   2  12.9  1   0.0  0   0.0  0  14.3  1  14.6   4
4:    Fo   9.2   2   0.0  0   5.4  1   0.0  0   0.0  0   7.9   3

As it is better to transform you data into long format:

library(data.table)
DT <- melt(setDT(DF), id = 1, 
           measure.vars = patterns('^time','^T'),
           variable.name = 'university', 
           value.name = c('time','t')
           )[, university := universities[university]][]

Now your data looks like:

> DT
    Field university time   t
 1: Acute          A  0.0   0
 2:   Ane          A  9.0 120
 3:    En          A 15.6   2
 4:    Fo          A  9.2   2
 5: Acute          B  8.3   1
 6:   Ane          B  7.7  26
 7:    En          B 12.9   1
 8:    Fo          B  0.0   0
 9: Acute          C  7.5   1
10:   Ane          C  7.9  43
11:    En          C  0.0   0
12:    Fo          C  5.4   1
13: Acute          D  8.6   2
14:   Ane          D  7.8  77
15:    En          D  0.0   0
16:    Fo          D  0.0   0
17: Acute          E  0.0   0
18:   Ane          E  7.9  60
19:    En          E 14.3   1
20:    Fo          E  0.0   0
21: Acute          F  8.3   4
22:   Ane          F  8.2 326
23:    En          F 14.6   4
24:    Fo          F  7.9   3

Now you can select the required info:

 DT[university == 'A' & Field == 'Ane']

which gives:

   Field university time   t
1:   Ane          A    9 120


Several dplyr examples to filter the data:

library(dplyr)
DT %>% 
  filter(Field=="En" & t > 1)

gives:

  Field university time t
1    En          A 15.6 2
2    En          F 14.6 4

Or:

DT %>%
  arrange(desc(time)) %>%
  filter(time < 14 & t > 3)

gives:

  Field university time   t
1   Ane          A  9.0 120
2 Acute          F  8.3   4
3   Ane          F  8.2 326
4   Ane          C  7.9  43
5   Ane          E  7.9  60
6   Ane          D  7.8  77
7   Ane          B  7.7  26

这篇关于如何按行和列查找&gt;栏名?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆