tolower function and merging two dataframes


Problem description

I have three data frames named barometre2013, barometre2016, and barometre2018.

I have already merged barometre2018 and barometre2016 like this:

baro1618 <- merge(barometre2016, barometre2018, all = TRUE)

All was good: I got all the rows of both data frames, and the columns with the same name were merged into one containing the rows of both data frames. Exactly what I wanted.

The merged table looks like this:

names(baro1618)
    [1] "q0qc"           "regio"          "sexe"           "age"            "langu"          "q1a_1"          "q1a_2"          "q1a_3"          "q1a_4"          "q1a_5"         
    [11] "q1a_6"          "q1a_7"          "q1a_8"          "q1a_9"          "q1a_10"         "q1b_1"          "q1b_2"          "q1b_3"          "q1b_4"          "q1b_5"         
    [21] "q1b_6"          "q1b_7"          "q1b_8"          "q1b_9"          "q1b_10"

Now, my problem starts here.

I want to merge baro1618 with barometre2013, but first I have to convert all the column names to lower case: when I tried to merge without doing so, the uppercase columns of barometre2013 were not merged with the columns of baro1618 that have the same name in lower case.

The data frame barometre2013 looks like this:

names(barometre2013)
    [229] "POND"        "Q1A_1"       "Q1A_2"       "Q1A_3"       "Q1A_4"       "Q1A_5"       "Q1A_6"       "Q1A_7"       "Q1A_8"       "Q1A_9"       "Q1A_10"      "Q1B_1"      
    [241] "Q1B_2"       "Q1B_3"       "Q1B_4"       "Q1B_5"       "Q1B_6"       "Q1B_7"       "Q1B_8"       "Q1B_9"       "Q1B_10"      "Q5A_1"       "Q5A_2"       "Q5A_3"  

So I tried these two ways of lowercasing the names (both work):

barometre2013 <- setnames(barometre2013, tolower(names(barometre2013)))

colnames(barometre2013) <- tolower(colnames(barometre2013))

Result:

[229] "pond"        "q1a_1"       "q1a_2"       "q1a_3"       "q1a_4"       "q1a_5"       "q1a_6"       "q1a_7"       "q1a_8"       "q1a_9"       "q1a_10"      "q1b_1"      
[241] "q1b_2"       "q1b_3"       "q1b_4"       "q1b_5"       "q1b_6"       "q1b_7"       "q1b_8"       "q1b_9"       "q1b_10"      "q5a_1"       "q5a_2"       "q5a_3"  

But when I tried to merge like this:

baro1118 <- merge(baro1618, barometre2013, all = TRUE)

It gave me this error:

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

I don't understand why it worked in the first example and not in this second one. I can't specify the columns by hand because there are too many column names that match and many that do not.

It should be possible without specifying them, right?

Also, I want to keep all the columns of both data frames, those whose names match and those whose names don't.

Sorry for the long explanation, but I really need an answer. I've read a lot of Q&As on SO and didn't find one.

Recommended answer

This might be worth a try:

baro1118 <- merge(baro1618, barometre2013, all = TRUE, by = intersect(names(baro1618), names(barometre2013)))

This merges only on the common columns.
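merge() already defaults to by = intersect(names(x), names(y)), so if the error persists it may be worth checking for column names that occur more than once in either data frame. Duplicated names are one common trigger for this message, though not necessarily the cause here. The sketch below uses a made-up data frame, toy2013, and a hypothetical variable dup.names; neither comes from the question.

```r
# Hypothetical stand-in for barometre2013: two columns whose names
# differ only by case, so lowercasing makes them collide.
toy2013 <- data.frame(ID = 1:2, POND = c(0.5, 1.5), pond = c(9, 9))
names(toy2013) <- tolower(names(toy2013))

# Names that occur more than once after lowercasing.
dup.names <- unique(names(toy2013)[duplicated(names(toy2013))])
print(dup.names)

# One possible repair (an assumption, not the only option):
# make the names unique before merging.
names(toy2013) <- make.unique(names(toy2013))
print(names(toy2013))
```

Running the same duplicated() check on both names(baro1618) and names(barometre2013) would show whether either side has a repeated name.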

That being said, your hunch about using rbind for this is probably more correct. If this is data from different time periods and they don't overlap, rbind will simply stack one on top of the other. This doesn't always go smoothly, but here's a crude hack:

# maybe barometre2013 has missing column names
missing.column.names <- setdiff(names(baro1618), names(barometre2013))
barometre2013[, missing.column.names] <- NA

# maybe baro1618 has missing column names
missing.column.names <- setdiff(names(barometre2013), names(baro1618))
baro1618[, missing.column.names] <- NA
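
Once both data frames have the same set of column names, the stacking itself is a single rbind call, since rbind on data frames matches columns by name. Here is a minimal end-to-end sketch on made-up data; toy1618, toy2013, and stacked are hypothetical names standing in for the real data frames.

```r
# Hypothetical stand-ins for baro1618 and barometre2013.
toy1618 <- data.frame(q0qc = c(1, 2), q1a_1 = c(3, 4))
toy2013 <- data.frame(q1a_1 = c(5, 6), pond = c(0.8, 1.2))

# Add to each data frame the columns it lacks, filled with NA.
toy2013[, setdiff(names(toy1618), names(toy2013))] <- NA
toy1618[, setdiff(names(toy2013), names(toy1618))] <- NA

# Both now share the same column names, so rbind can stack them.
stacked <- rbind(toy1618, toy2013)
print(stacked)
```

The column order of the result follows the first argument, and rows from a data frame that lacked a column hold NA in it.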
