How to merge two separate data sets with common values in a column

Problem description

Hello, I have two data sets that I am trying to combine. The problem is that I need to combine them by a certain column, and only for certain values in that column. Both have a column called player_id. The first data set has only player IDs.

The second data set has the player ID and the number of home runs each player had. The problem is that it contains a ton of irrelevant players I don't need, so I need to merge using only the player IDs from the first data set.

Here is my code. The merge does not work because it brings in all of the player IDs from both data sets.

player2 = subset(player, select = c(player_id, birth_state))

player.mt <- player[ which(player$birth_state =='MT'),]
player.mt2 = subset(player.mt, select = c(player_id))
batting.hr <- subset(batting, select = c(player_id, hr))
batting.hr

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all=TRUE) 

Recommended answer

This is a simple and common problem; search around for it a bit. What you want is an inner merge, where a row is kept only if its player_id appears in both data sets. The only difference from your code is the value of the all argument.

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all=F) 
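To see the difference on something small, here is a minimal sketch with made-up data. The data frames reuse the names from the question, but the IDs and home run counts are invented for illustration and are not the asker's real data.

```r
# Toy stand-ins for the asker's data frames (all values are made up).
player.mt2 <- data.frame(player_id = c("a01", "b02", "c03"))
batting.hr <- data.frame(
  player_id = c("a01", "b02", "x98", "y99"),
  hr        = c(12, 3, 40, 7)
)

# all = TRUE is a full outer join: every id from either data frame is kept,
# and missing values are filled with NA.
combine_outer <- merge(player.mt2, batting.hr, by = "player_id", all = TRUE)

# all = FALSE (the default) is an inner join: only ids found in both are kept.
combine_inner <- merge(player.mt2, batting.hr, by = "player_id", all = FALSE)

print(combine_outer)
print(combine_inner)
```

Note that if batting.hr has several rows for the same player_id, merge() keeps every matching row, so the result can have more than one row per player.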

Alternatively, if you want to keep every player in the player data set (whether or not they appear in the home run data) but not every player in the home run data set, you could do:

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all.x=T, all.y=F) 
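A small sketch of what this left merge returns, again with invented toy data under the question's variable names: players without a batting row are kept with NA, and batting rows for other players are dropped.

```r
# Toy stand-ins for the asker's data frames (all values are made up).
player.mt2 <- data.frame(player_id = c("a01", "b02", "c03"))
batting.hr <- data.frame(
  player_id = c("a01", "b02", "x98"),
  hr        = c(12, 3, 40)
)

# all.x = TRUE keeps every row of the first data frame (player.mt2);
# all.y = FALSE drops rows of batting.hr that have no match in player.mt2.
combine_left <- merge(player.mt2, batting.hr,
                      by = "player_id", all.x = TRUE, all.y = FALSE)

# "c03" has no batting row, so its hr is NA; "x98" does not appear at all.
print(combine_left)
```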

It all comes down to the all arguments of merge() (all, all.x, all.y). The documentation in ?merge is fairly self-explanatory, and this question has been answered many times here and elsewhere.
