Extract cells and adjacent cells that may appear anywhere within a dataframe in R


Question

I have a dataframe that contains information about vegetation cover and percent coverage, collected using a quadrat. The dataframe is set up so that each row represents a single quadrat. If there are multiple species within one quadrat, they are all listed in the same row, with each species' percent coverage always following in the next column. Here is an example, where species are represented as 4-letter codes:

The problem is that the species were not recorded in any particular order, and not all species occur in every quadrat. There can also be any number of species per quadrat. I need to be able to extract each species AND its respective coverage, and place them into another dataframe for further analysis. For example, species "bope" from the example data above would look like this:

Any help greatly appreciated. Brian

Recommended answer

You could accomplish this by reshaping the data into a long format and then filtering by row values.

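# Example data: one row per quadrat, each species column followed by its cover column.
# Cover values are random, so the numbers will differ from run to run.
df = data.frame(Quadrat = 1:6,
                Date = seq.Date(as.Date("2014-01-01"), by = 1, length.out = 6),
                Species_1 = c("unk1", "bope", "bope", "stgu", "bg", "bope"),
                covrage = sample(1:100, 6),
                Species_2 = c("bope", "bial", "stgu", "bg", "unk1", "bg"),
                covrage2 = sample(1:100, 6))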

> df
  Quadrat       Date Species_1 covrage Species_2 covrage2
1       1 2014-01-01      unk1      76      bope       63
2       2 2014-01-02      bope      82      bial       33
3       3 2014-01-03      bope      41      stgu        5
4       4 2014-01-04      stgu       6        bg       45
5       5 2014-01-05        bg      65      unk1       21
6       6 2014-01-06      bope      15        bg       96

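# Convert factors to character so the species columns stack cleanly
# (needed in R < 4.0, where data.frame() defaults to stringsAsFactors = TRUE)
df$Species_1 = as.character(df$Species_1)
df$Species_2 = as.character(df$Species_2)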

# Stack the species columns into "Species" and the cover columns into "Covrage"
df2 = reshape(df,
              varying = list(c("Species_1", "Species_2"), c("covrage", "covrage2")),
              v.names = c("Species", "Covrage"),
              direction = "long")

> df2
    Quadrat       Date time Species Covrage id
1.1       1 2014-01-01    1    unk1      76  1
2.1       2 2014-01-02    1    bope      82  2
3.1       3 2014-01-03    1    bope      41  3
4.1       4 2014-01-04    1    stgu       6  4
5.1       5 2014-01-05    1      bg      65  5
6.1       6 2014-01-06    1    bope      15  6
1.2       1 2014-01-01    2    bope      63  1
2.2       2 2014-01-02    2    bial      33  2
3.2       3 2014-01-03    2    stgu       5  3
4.2       4 2014-01-04    2      bg      45  4
5.2       5 2014-01-05    2    unk1      21  5
6.2       6 2014-01-06    2      bg      96  6

> df2[df2$Species == "bope", colnames(df2) %in% c("Quadrat", "Covrage")]
    Quadrat Covrage
2.1       2      82
3.1       3      41
6.1       6      15
1.2       1      63
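
The question mentions that there can be any number of species per quadrat, so the same idea can be sketched without hard-coding the column pairs. This is only an illustration under stated assumptions: the data frame `wide`, its column names (`Species_1`, `Cover_1`, ...) and its values are hypothetical, and the code assumes that every species column is named with a `Species_` prefix and is immediately followed by its cover column.

```r
# Hypothetical wide data: three species/cover pairs, with NA where a quadrat
# has fewer species. Names and values are made up for illustration.
wide <- data.frame(
  Quadrat   = 1:3,
  Species_1 = c("unk1", "bope", "stgu"),
  Cover_1   = c(10, 20, 30),
  Species_2 = c("bope", "bial", NA),
  Cover_2   = c(40, 50, NA),
  Species_3 = c(NA, "stgu", NA),
  Cover_3   = c(NA, 60, NA),
  stringsAsFactors = FALSE
)

# Locate the species columns by name; each cover column is assumed to sit
# immediately to the right of its species column.
sp_cols  <- grep("^Species_", names(wide))
cov_cols <- sp_cols + 1

# Stack every species/cover pair into one long data frame.
long <- do.call(rbind, lapply(seq_along(sp_cols), function(i) {
  data.frame(
    Quadrat = wide$Quadrat,
    Species = wide[[sp_cols[i]]],
    Cover   = wide[[cov_cols[i]]],
    stringsAsFactors = FALSE
  )
}))

# Drop the empty slots, then pull out a single species.
long <- long[!is.na(long$Species), ]
bope <- long[long$Species == "bope", c("Quadrat", "Cover")]
```

If one data frame per species is wanted, `split(long, long$Species)` returns a named list with one element for each species code.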
