Joining tables based on different column names


Question

I was watching a video[1] by Greg Reda about Pandas to see what Pandas can do and how it compares with data.table. I was surprised to learn how difficult it was to join tables in data.table. If you watch the video, specifically from 49:00 to 52:00, you will see that Pandas allows you to join tables based on different column names, and you can choose different suffixes for the left and right tables. I understand that setkey is used for optimization purposes[2] and I understand how to join tables using the same column names[3]. I tried data.table's merge but had much difficulty setting the by= keyword parameter using different column names. So here are my questions.

Is it possible, in data.table, to join tables based on different column names? If so, how? If not, why not? Also, more to the point, wouldn't this feature be useful? I find it surprising that this issue hasn't come up earlier. Pardon me (and please point me to the discussion) if this has been discussed earlier.

BTW, the data that Greg is talking about is found on his github[4].

  1. https://www.youtube.com/watch?v=1uVWjdAbgBg
  2. http://stackoverflow.com/a/13686768/3892933
  3. Joining tables with identical (non-keyed) column names in R data.table
  4. https://github.com/gjreda/pydata2014nyc

Answer

Update: All the features listed below are implemented and available in the current stable version of data.table, v1.9.6, on CRAN.


There are at least these improvements possible for joins in data.tables.

  • merge.data.table gaining by.x and by.y arguments

  • Using secondary keys to join using both forms discussed above without need to set keys, but rather by specifying columns on x and i.

The simplest reason is that we've not managed to get to it yet.
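
As a rough illustration of the two features listed above, here is a minimal sketch, assuming data.table v1.9.6 or later is installed. The tables `employees` and `salaries` and their columns are hypothetical and are not taken from the video's data. The first join uses `merge()` with `by.x`/`by.y` and `suffixes`; the second uses the `on=` argument, which needs no `setkey()` call.

```r
library(data.table)

# Hypothetical example tables: the join columns have different names
# (emp_id vs. id), and both tables share a non-key column named "value".
employees <- data.table(emp_id = c(1L, 2L, 3L),
                        name   = c("Ann", "Bob", "Cy"),
                        value  = c(10, 20, 30))
salaries  <- data.table(id    = c(1L, 2L, 4L),
                        value = c(100, 200, 400))

# 1) merge() with by.x / by.y: an inner join by default (all = FALSE).
#    suffixes= disambiguates the shared non-key column "value".
m <- merge(employees, salaries,
           by.x = "emp_id", by.y = "id",
           suffixes = c("_left", "_right"))

# 2) Ad hoc join with on=, no keys required.
#    In on = c(emp_id = "id"), the name is the column in x (employees)
#    and the value is the column in i (salaries).
#    nomatch = 0L drops rows of i that have no match in x (inner join).
j <- employees[salaries, on = c(emp_id = "id"), nomatch = 0L]

print(m)
print(j)
```

Both calls return only the rows whose identifiers appear in both tables. Which form to prefer is largely a matter of taste: `merge()` will look familiar to base R users, while the `on=` form stays within data.table's `x[i]` syntax.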
