Many dataframes, different row lengths, similar columns and dataframe titles: how to bind?

Problem description

This takes a bit to explain, and the post itself may be a bit too long to be answered.

I have MANY dataframes, one per chess player, each holding that player's rating at different points in time.

Here is what my data looks like. Please forgive the poor formatting used to separate the datasets; Carlsen and Nakamura are separate dataframes.

Player1

 Nakamura, Hikaru       Year
             2364 2001-01-01
             2430 2002-01-01
             2520 2003-01-01
             2571 2004-01-01
             2613 2005-01-01
             2644 2006-01-01
             2651 2007-01-01
             2670 2008-01-01
             2699 2009-01-01
             2708 2010-01-01
             2751 2011-01-01
             2759 2012-01-01
             2769 2013-01-01
             2789 2014-01-01
             2776 2015-01-01
             2787 2016-01-01

Player2

 Carlsen, Magnus       Year
            2127 2002-01-01
            2279 2003-01-01
            2484 2004-01-01
            2553 2005-01-01
            2625 2006-01-01
            2690 2007-01-01
            2733 2008-01-01
            2776 2009-01-01
            2810 2010-01-01
            2814 2011-01-01
            2835 2012-01-01
            2861 2013-01-01
            2872 2014-01-01
            2862 2015-01-01
            2844 2016-01-01

You can download the two sets here:

Download Player2 | Download Player1

Between the data above and the code below, I've deleted two columns and reassigned an observation as a column title.

Hikaru Nakamura / Magnus Carlsen chess ratings over time

Hikaru's data is assigned to a dataframe, Player1. Magnus's data is assigned to a dataframe, Player2.

What I want is what you see below: a single dataframe with both players combined.

The code I used to produce this frame is

 merged <- merge(Player1, Player2, by = "Year", all = TRUE)
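To make explicit what `all = TRUE` does, here is a minimal sketch on a few hypothetical rows shaped like Player1 and Player2 (rating column named after the player, plus a Date `Year` column). `all = TRUE` makes `merge()` a full outer join, so a year present for only one player is kept and the other player's column is `NA` there.

```r
# Hypothetical sample data: a few rows in the same shape as Player1 and Player2
Player1 <- data.frame(`Nakamura, Hikaru` = c(2364, 2430, 2520),
                      Year = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01")),
                      check.names = FALSE)
Player2 <- data.frame(`Carlsen, Magnus` = c(2127, 2279),
                      Year = as.Date(c("2002-01-01", "2003-01-01")),
                      check.names = FALSE)

# all = TRUE is a full outer join: 2001 is kept even though Carlsen has no
# rating for it, and his column is filled with NA for that year.
# merge() puts the key column (Year) first and sorts rows by it.
merged <- merge(Player1, Player2, by = "Year", all = TRUE)
print(merged)
```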

Now, this is all fine and dandy for two datasets, but I am having very annoying difficulties adding more players to this combined dataset.

For example, maybe I would like to add 5, 10, or 15 more players to this set. Examples of these players would be Kramnik, Anand, and Gelfand (famous chess players). As you'd expect, with 5 players the dataframe would have 6 columns, with 10 it would have 11, and with 15 it would have 16, all ordered nicely by the Year variable.

Fortunately, each player always has fewer than 100 observations. Also, each individual player is assigned his/her own dataset.

For example,

 Nakamura is the Player1 dataframe
 Carlsen is the Player2 dataframe
 Kramnik is the Player3 dataframe
 Anand is the Player4 dataframe
 Gelfand is the Player5 dataframe

All of these were created with a for loop that calls assign(), using this code:

player_names <- unique(Timed_set_filtered$Name)
for (i in seq_along(player_names)) {
  # Player1, Player2, ...: one dataframe per distinct player name
  assign(paste0("Player", i), subset(Timed_set_filtered, Name == player_names[i]))
}
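As an alternative to creating numbered objects with assign(), base R's `split()` returns one dataframe per player already collected in a named list, which is the form every merge approach needs anyway. This is a minimal sketch; the small `Timed_set_filtered` below is hypothetical sample data, and the real one presumably has more columns.

```r
# Hypothetical stand-in for Timed_set_filtered: one row per player per year
Timed_set_filtered <- data.frame(
  Name   = c("Nakamura, Hikaru", "Nakamura, Hikaru", "Carlsen, Magnus"),
  Rating = c(2364, 2430, 2127),
  Year   = as.Date(c("2001-01-01", "2002-01-01", "2002-01-01")),
  stringsAsFactors = FALSE
)

# split() returns a named list with one dataframe per distinct Name,
# so no Player1, Player2, ... objects are created in the global environment
players <- split(Timed_set_filtered, Timed_set_filtered$Name)

# split() orders the groups alphabetically; reindex to keep first-appearance order
players <- players[unique(Timed_set_filtered$Name)]
names(players)
```

The list names are the player names themselves, so they can later be used directly as column titles.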

I don't want to have to write out something like:

 merged<- merge(Player1, Player2,.....Player99 ,Player100, by = c("Year"), all = TRUE)

I want to be able to merge all 5, 10, 15, ... i of the Player"i" objects that I created in the loop together by Year.
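One common base R way to do this is to fold the two-argument `merge()` over a list with `Reduce()`, which merges the first two dataframes, then merges that result with the third, and so on. A minimal sketch, assuming each dataframe has a `Year` column plus one rating column named after its player; the data is hypothetical, the third column uses made-up placeholder ratings, and `Year` is a plain number for brevity.

```r
# Hypothetical list of per-player dataframes with different row counts
lst <- list(
  Player1 = data.frame(Year = c(2001, 2002, 2003), Nakamura = c(2364, 2430, 2520)),
  Player2 = data.frame(Year = c(2002, 2003),       Carlsen  = c(2127, 2279)),
  Player3 = data.frame(Year = c(2003, 2004),       Other    = c(1500, 1600))  # placeholder values
)

# Reduce() folds merge() over the list: merge(merge(Player1, Player2), Player3), ...
# all = TRUE keeps every Year that appears in any dataframe (full outer join)
merged <- Reduce(function(x, y) merge(x, y, by = "Year", all = TRUE), lst)
print(merged)
```

This works for any number of players. If two dataframes share a column name other than `Year`, `merge()` disambiguates them with its default `.x`/`.y` suffixes, so renaming each rating column first keeps the result readable.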

Also, right after the loop, each dataset looks like this.

So what ends up happening is that I assign all of the data sets to a list by using the following snippet:

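 # Collect Player1, Player2, ... by explicit name so they stay in numeric order
 # (ls() sorts alphabetically, which puts Player10 before Player2)
 player_names <- unique(Timed_set_filtered$Name)
 lst <- mget(paste0("Player", seq_along(player_names)))
 # Drop the second column of every dataframe and write the results back to the global environment
 lst <- lapply(lst, `[`, -2)
 list2env(lst, envir = .GlobalEnv)

for (i in seq_along(lst)) {
  # Rename each 'Rating' column to that player's name
  names(lst[[i]])[names(lst[[i]]) == 'Rating'] <- as.character(player_names[i])
}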
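The renaming loop can also be written with `Map()`, which walks the list of dataframes and the vector of names in parallel and returns a new list. A minimal sketch; the two dataframes and `player_names` are hypothetical stand-ins for the loop's output.

```r
# Hypothetical per-player dataframes straight out of the loop, each with a
# generic 'Rating' column
lst <- list(
  Player1 = data.frame(Rating = c(2364, 2430), Year = c(2001, 2002)),
  Player2 = data.frame(Rating = c(2127, 2279), Year = c(2002, 2003))
)
player_names <- c("Nakamura, Hikaru", "Carlsen, Magnus")

# Map() pairs each dataframe with its player name; the list keeps the
# names Player1, Player2 because the first argument is a named list
lst <- Map(function(df, nm) {
  names(df)[names(df) == "Rating"] <- nm
  df
}, lst, player_names)

names(lst$Player1)
```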

This is what my list looks like.

Is there a way to write a table that uses Year as the merge key, so that it [cbinds, bind_cols, merges, etc.] each of the Player"i" dataframes in my list, which are not necessarily equal in length, into a combined/merged set like the one produced by merge(Player1, Player2) above?

Here is the diagram again, but it would have to be for many players, not just Carlsen and Nakamura.

Also, is there a way to avoid using a list and just directly do

names(Player"i") [names(Player"i") == 'Rating'] <- eval(unique(Timed_set_filtered$Name)[i])

which would rename the column titles of every dataframe whose name starts with "Player", and then

merge(player1, player2, player3,...., player99, player100, by = c("YEAR"), all = TRUE) 

which would merge all of the Player"i" datasets?
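R has no `Player"i"` syntax for building an object name, but `get()` and `assign()` with a pasted name can rename the columns of each numbered dataframe in place. A minimal sketch; the two dataframes and the `player_names` vector are hypothetical stand-ins. Note that `merge()` joins exactly two dataframes (`x` and `y`), so merging an arbitrary number still needs a fold; here the list is only built on the fly with `mget()` rather than kept as a separate variable.

```r
# Hypothetical setup: two Player<i> dataframes in the current environment
Player1 <- data.frame(Rating = c(2364, 2430), Year = c(2001, 2002))
Player2 <- data.frame(Rating = c(2127, 2279), Year = c(2002, 2003))
player_names <- c("Nakamura, Hikaru", "Carlsen, Magnus")

for (i in seq_along(player_names)) {
  obj <- paste0("Player", i)           # build the name "Player1", "Player2", ...
  player_df <- get(obj)                # fetch the dataframe with that name
  names(player_df)[names(player_df) == "Rating"] <- player_names[i]
  assign(obj, player_df)               # write the renamed copy back under the same name
}

# merge() takes two dataframes, so fold it over Player1, Player2, ... with Reduce()
merged <- Reduce(function(x, y) merge(x, y, by = "Year", all = TRUE),
                 mget(paste0("Player", seq_along(player_names))))
print(merged)
```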

If anything is unclear, please let me know.

Recommended answer

It was pretty funny that one line of code did the trick. After I assigned all of the Player1, Player2, ..., Playeri dataframes to the list, I just joined all of the sets in the list by Year.

The loop used to generate all of the unique datasets:

player_names <- unique(Timed_set_filtered$Name)
for (i in seq_along(player_names)) {
  # Player1, Player2, ...: one dataframe per distinct player name
  assign(paste0("Player", i), subset(Timed_set_filtered, Name == player_names[i]))
}

Put them into a list:

 lst <- mget(ls(pattern='^Player\\d+'))
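One caveat with `mget(ls(pattern = ...))`: `ls()` returns names sorted as text, so once there are ten or more players `Player10` comes right after `Player1`. That does not affect a join by `Year`, but it does matter for any code that pairs `lst[[i]]` with the i-th player name. A minimal sketch with ten hypothetical dummy dataframes, reordering by the numeric suffix:

```r
# Hypothetical: 10 dataframes named Player1 ... Player10 (contents don't matter here)
for (i in 1:10) assign(paste0("Player", i), data.frame(Year = 2000 + i))

# ls() returns names sorted as text, so Player10 lands between Player1 and Player2
ls(pattern = "^Player\\d+$")

# Ordering by the numeric suffix restores Player1, Player2, ..., Player10
nm <- ls(pattern = "^Player\\d+$")
nm <- nm[order(as.integer(sub("^Player", "", nm)))]
lst <- mget(nm)
names(lst)
```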

Merge, or join, them on the common value:

library(plyr)  # join_all() comes from the plyr package
df <- join_all(lst, by = 'Year')

Unfortunately, unlike merge(datasets, ..., all = TRUE), it drops certain observations for an unknown reason; I will have to look into why this happens.
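A likely explanation: plyr's `join_all()` defaults to `type = "left"`, so each dataframe is left-joined onto the running result, and any Year missing from the first dataframe in the list is dropped. Passing `type = "full"` to `join_all()` should keep those rows. The sketch below reproduces the effect with base R only, using `merge(all.x = TRUE)` as the left join; the two frames are hypothetical, built so that the first one starts a year later than the second.

```r
# Hypothetical frames: the first starts in 2002, the second already has a 2001 rating
lst <- list(
  data.frame(Year = c(2002, 2003), Carlsen  = c(2127, 2279)),
  data.frame(Year = c(2001, 2002), Nakamura = c(2364, 2430))
)

# Left join (what type = "left" does): only Years in the first frame survive
left <- Reduce(function(x, y) merge(x, y, by = "Year", all.x = TRUE), lst)

# Full outer join: every Year from every frame is kept, gaps become NA
full <- Reduce(function(x, y) merge(x, y, by = "Year", all = TRUE), lst)

left$Year  # 2002 2003
full$Year  # 2001 2002 2003
```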
