Merge .csvs based on a common column but of inconsistent length

Problem description

Afternoon (or morning, evening),

I am trying to merge several .csv files that have a similar layout: each has a class in one column (character) and an abundance (num) in another.

When imported as a data.frame, an example would be:

print(one[1:5,])
  X                            Class Abundance_inds
1 1                      Chaetognath              2
2 2     Copepod_Calanoid_Acartia_spp              9
3 3 Copepod_Calanoid_Centropages_spp              4
4 4      Copepod_Calanoid_Temora_spp              1
5 5         Copepod_Calanoid_Unknown             55 

The class column (its number of rows and their order) changes in every CSV depending on what was found, and I want to bind several (30+) CSVs based on the class column. I had the following (which I am sure was working a while ago...):

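library(data.table)

# list every .csv file in the directory (pattern is a regular expression)
DensityFiles <- list.files(CSVdirectory,
                           pattern = "\\.csv$",
                           full.names = TRUE)

# read each file and stack the results row-wise, matching columns by name
Combined <- rbindlist(
  lapply(
    DensityFiles,
    fread),
  fill = TRUE,
  use.names = TRUE)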

This produces the following:

str(Combined)    
Classes ‘data.table’ and 'data.frame':  461 obs. of  3 variables:

Not quite what I was after! I am looking for the following:

> print(example)
    X                            Class CSV.NAME CSV.NAME.1
1   1                   Bivalve_Larvae        1          3
2   2                   Bryozoa_Larvae        4          6
3   3                      Chaetognath       NA          7
4   4                         Cnidaria        1          8
5   5     Copepod_Calanoid_Acartia_spp       22         NA
6   6     Copepod_Calanoid_Calanus_spp       24          4
7   7     Copepod_Calanoid_Candacia_sp        5          3
8   8 Copepod_Calanoid_Centropages_spp       41          2
9   9      Copepod_Calanoid_Temora_spp       39          8
10 10         Copepod_Calanoid_Unknown      458         NA
11 11  Copepod_Cyclopoid_Corycaeus_spp       46         NA
12 12    Copepod_Cyclopoid_Oithona_spp       NA          4
13 13     Copepod_Cyclopoid_Oncaea_spp       NA          7
14 14             Copepod_Harpacticoid       36         NA
15 15                  Copepod_Nauplii       12          9
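
The target table is what a full outer join on Class produces, not a row-bind: each file contributes one abundance column, and a class missing from a file gets NA. rbindlist(fill = TRUE) instead stacks the files on top of each other, which is consistent with str() reporting 461 observations of only 3 variables. A minimal base-R sketch of the join, using two small hypothetical tables (toy values, not taken from the question's files):

```r
# Two hypothetical per-file tables (toy values for illustration only);
# each lists a different set of classes and has a different number of rows.
file1 <- data.frame(Class = c("Bivalve_Larvae", "Cnidaria", "Copepod_Nauplii"),
                    CSV.NAME = c(1, 1, 12))
file2 <- data.frame(Class = c("Bivalve_Larvae", "Chaetognath"),
                    CSV.NAME.1 = c(3, 7))

# Full outer join on Class: all = TRUE keeps every class found in either
# table and fills NA where a table has no row for that class.
joined <- merge(file1, file2, by = "Class", all = TRUE)
print(joined)
```

With 30+ files, the same two-table join can be folded over a list of data frames with Reduce().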

When using rbindlist from the data.table library, I can get the CSV name into the table with idcol = "origin", but I'm not sure whether this works for all solutions.
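
If the list passed to rbindlist() is named (for example with the file names), idcol = "origin" fills the origin column with those names, giving a long table with one row per class per file. That long table can then be pivoted so each origin becomes its own column. A minimal base-R sketch of that pivot, where the list names site_A and site_B and the toy values are hypothetical, and the stacking step imitates idcol by hand:

```r
# Hypothetical per-file tables, named after made-up source files;
# in the question these would come from reading each file in DensityFiles.
files <- list(
  site_A = data.frame(Class = c("Chaetognath", "Copepod_Nauplii"),
                      Abundance_inds = c(2, 12)),
  site_B = data.frame(Class = c("Copepod_Nauplii", "Cnidaria", "Bivalve_Larvae"),
                      Abundance_inds = c(9, 8, 3))
)

# Stack into one long table with an "origin" column holding the list name,
# similar in spirit to rbindlist(..., idcol = "origin") on a named list.
long <- do.call(rbind, Map(function(df, nm) cbind(origin = nm, df),
                           files, names(files)))

# Pivot long -> wide: one row per Class, one abundance column per origin;
# classes missing from a file get NA.
wide <- reshape(long, direction = "wide",
                idvar = "Class", timevar = "origin")
print(wide)
```

The new columns are named Abundance_inds.site_A and so on. With data.table loaded, dcast(Combined, Class ~ origin, value.var = "Abundance_inds") should perform the same pivot directly on an rbindlist result that has an origin column.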

I have had a good hunt around, but most examples seem to deal with a consistent number of rows.

Any help would be much appreciated!

Jim

Recommended answer

You can use read_csv from readr together with bind_rows from dplyr:

library(dplyr)
library(readr)
# read every file, then stack the results row-wise (columns matched by name)
df <- do.call(bind_rows, lapply(DensityFiles, read_csv))
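
Note that bind_rows, like rbindlist(fill = TRUE), stacks the files, so it returns a long table rather than the one-column-per-file layout shown in the question. A minimal base-R sketch of the wide layout, assuming every file has Class and Abundance_inds columns as in the question's example: read each file, rename its abundance column after the file name, then fold a full outer join over the list with Reduce(). The temporary files and the read_abundance helper are hypothetical and exist only to make the sketch self-contained.

```r
# Write two small hypothetical CSVs to a temporary directory;
# in practice CSVdirectory already holds the real files.
CSVdirectory <- file.path(tempdir(), "density_demo")
dir.create(CSVdirectory, showWarnings = FALSE)
write.csv(data.frame(Class = c("Chaetognath", "Copepod_Nauplii"),
                     Abundance_inds = c(2, 12)),
          file.path(CSVdirectory, "site_A.csv"), row.names = FALSE)
write.csv(data.frame(Class = c("Copepod_Nauplii", "Cnidaria", "Bivalve_Larvae"),
                     Abundance_inds = c(9, 8, 3)),
          file.path(CSVdirectory, "site_B.csv"), row.names = FALSE)

DensityFiles <- list.files(CSVdirectory, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file, keep Class and Abundance_inds, and
# rename the abundance column after the file (no directory, no extension).
read_abundance <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)[, c("Class", "Abundance_inds")]
  names(df)[2] <- tools::file_path_sans_ext(basename(path))
  df
}

# Full outer join of all files on Class; missing classes become NA.
Combined <- Reduce(function(x, y) merge(x, y, by = "Class", all = TRUE),
                   lapply(DensityFiles, read_abundance))
print(Combined)
```

Selecting columns by name also drops the X row-number column seen in the question's data, and merge() sorts the result by Class by default.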
