Subset data in a large list based on filename of the dataframes in the list

Question

I'm working with a large list that contains 450 dataframes. I'll make an example of the names of the dataframes:

ALL_SM51_SE1_hourly, ALL_SM201_SE1_hourly, ALL_SM501_SE1_hourly
ALL_SM51_SE2_hourly, ALL_SM201_SE2_hourly, ALL_SM501_SE2_hourly
...................................................................
ALL_SM51_SE150_hourly, ALL_SM201_SE150_hourly, ALL_SM501_SE150_hourly

The dataframes contain measured soil moisture data at different depths (5cm, 20cm, 50cm, represented by "SM51, SM201, SM501" in the filenames) and there are 150 sensors (represented by the "SE1, SE2, SE3, ..." in the filename) which is why I have 450 dataframes that are stored in a list.

What I would like to do: I want to create a new list (make a subset) for each sensor that then contains 3 elements. So I wanna have a list for SE1, SE2, SE3, ..., SE150 with the corresponding measuring depths.

I already searched for an appropriate answer to my question but I only found answers that subset data by specific values but I want to subset by the filenames.

Does anyone know how to do this?

Recommended answer

Using regular expressions, you can extract the sensor number of each element into un.se and paste "SE" onto the unique values to get new.names. The original list lst can then be split by sensor, and each sublist ordered by depth and its elements converted into data.frames.

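# Sensor number of each element, e.g. "ALL_SM51_SE12_hourly" -> 12
un.se <- as.integer(gsub(".*SE(\\d+).*", "\\1", names(lst)))
# split() returns the groups in sorted order, so sort the names the same way
new.names <- paste0("SE", sort(unique(un.se)))
tmp <- setNames(split(lst, un.se), new.names)
res <- lapply(tmp, function(x) {
  # Depth code, e.g. "SM201" -> 201; dropping the trailing 1 gives 20 (cm)
  nm <- as.integer(gsub("1$", "", gsub(".*SM(\\d+).*", "\\1", names(x))))
  o <- order(nm)
  # Reorder data and names together so each data frame keeps its own depth
  setNames(lapply(x[o], data.frame), paste0("d", nm[o]))
})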

Explanation of the gsub regex:

In the regex, .* matches any run of characters up to the literal SE. Then comes a capture group in parentheses ( ), in which \\d+ matches one or more digits. In the second gsub argument, \\1 is a back-reference to the first group (the one in parentheses), and it replaces the whole matched string. The resulting un.se is therefore the number found after SE in each name (see: https://regex101.com/r/zuO8Ts/1; note that backslashes have to be doubled, \\, in R strings).
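
To see what the two patterns extract, here is a minimal sketch applied to three names in the format from the question (the vector `nms` is made up for illustration and is not part of the original data):

```r
# Hypothetical example names, following the naming pattern in the question
nms <- c("ALL_SM51_SE1_hourly", "ALL_SM201_SE12_hourly", "ALL_SM501_SE150_hourly")

# Replace the whole string by the digits captured after "SE"
sensor <- gsub(".*SE(\\d+).*", "\\1", nms)
sensor
# [1] "1"   "12"  "150"

# Same idea for the depth code after "SM"
depth <- gsub(".*SM(\\d+).*", "\\1", nms)
depth
# [1] "51"  "201" "501"
```

Both results are character vectors. Sorting them as text puts "150" before "2" and "501" before "51", so converting with as.integer() before ordering is usually the safer choice.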

The result is a list with one sublist per sensor, each holding one data frame per depth.

res
# $SE1
# $SE1$d5
#   x1 x2 x3
# 1  1  2  3
# 
# $SE1$d20
#   x1 x2 x3
# 1  1  2  3
# 
# $SE1$d50
#   x1 x2 x3
# 1  1  2  3
# 
# 
# $SE2
# $SE2$d5
#   x1 x2 x3
# 1  1  2  3
# 
# $SE2$d20
#   x1 x2 x3
# 1  1  2  3
# 
# $SE2$d50
#   x1 x2 x3
# 1  1  2  3


Toy data

lst <- list(
  ALL_SM51_SE1_hourly  = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM201_SE1_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM501_SE1_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM51_SE2_hourly  = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM201_SE2_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM501_SE2_hourly = list(x1 = 1, x2 = 2, x3 = 3)
)
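
The toy data holds identical values in every element, so it cannot show whether each data frame ends up under the right name. One possible sanity check is sketched below. It uses a made-up list `lst2` of 12 sensors in which every element records its own depth code and sensor number. The names `lst2`, `groups`, and `out` are hypothetical, and the grouping is done on numeric keys so that order and names stay aligned (as text, "10" would sort before "2").

```r
# Hypothetical test list: 3 depth codes x 12 sensors, each element stores its origin
grid <- expand.grid(sm = c(51, 201, 501), se = 1:12)
lst2 <- Map(function(sm, se) list(sm = sm, se = se), grid$sm, grid$se)
names(lst2) <- sprintf("ALL_SM%d_SE%d_hourly", grid$sm, grid$se)

# Group by numeric sensor number; split() returns groups in sorted key order
se <- as.integer(gsub(".*SE(\\d+).*", "\\1", names(lst2)))
groups <- split(lst2, se)
names(groups) <- paste0("SE", names(groups))

out <- lapply(groups, function(x) {
  # Depth in cm: depth code with the trailing 1 removed, as a number
  d <- as.integer(gsub("1$", "", gsub(".*SM(\\d+).*", "\\1", names(x))))
  o <- order(d)
  setNames(lapply(x[o], data.frame), paste0("d", d[o]))
})

# Every data frame should sit under the sensor and depth it came from
for (s in names(out)) {
  for (dn in names(out[[s]])) {
    df <- out[[s]][[dn]]
    stopifnot(paste0("SE", df$se) == s,
              paste0("d", sub("1$", "", df$sm)) == dn)
  }
}
```

If the loop finishes without an error, every element is labelled consistently, and a single data frame can be pulled out with, for example, out$SE10$d20.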
