如何在R中的任何匹配子句上保留JOIN? [英] How to LEFT JOIN on ANY of the matching clauses in R?

查看：43 发布时间：2021/5/20 19:11:34 r join left-join non-equi-join

本文介绍了如何在R中的任何匹配子句上保留JOIN?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

您能帮我吗?

我有一个数据框( df1 )，其中包含网站CMS中发布的所有文章的索引.在当前的 URL 中有一列，在发布后被更改的原始URL中有一列(列名 Origin ):

I have a dataframe (df1) that has index of all articles published in the website's CMS. There's a column for current URL and a column of original URLs in case they were changed after publication (column name Origin):

<身体>

URL	来源	ArticleID	作者	类别	费用
https://example.com/article1	https://example.com/article	001	AuthorName	政治	120美元
https://example.com/article2	https://example.com/article2	002	AuthorName	财务	68美元

URL	Origin	ArticleID	Author	Category	Cost
https://example.com/article1	https://example.com/article	001	AuthorName	Politics	120 USD
https://example.com/article2	https://example.com/article2	002	AuthorName	Finance	68 USD

接下来，我有一个巨大的数据框( df2 )，其中包含一个时间范围内的网络分析导出.它有一个日期，URL和浏览量只有1列.

Next I have an huge dataframe (df2)with web analytics export for a timeframe. It has a date, just 1 column for URL and number of pageviews.

<身体>

PageviewDate	URL	浏览量
2019-01-01	https://example.com/article	224544
2019-01-01	https://example.com/article1	656565

我该如何与第一个数据框保持连接，但在 URL = URL 或 Origin = URL

How do I left join this with first dataframe but matching on either URL = URL OR Origin = URL

因此最终结果将如下所示:

So that end result would look like this:

<身体>

PageviewDate	浏览量	ArticleID	作者	类别
2019-01-01	881109	001	AuthorName	政治

即 881109 是将与同一文章相关的 224544 和 656565 相加的结果

i.e 881109 is the result of adding up 224544 and 656565 that both related to the same article

我想我正在寻找的等效于SQL语法，例如:

I guess what I'm looking for is the equivalent of SQL syntax like:

LEFT JOIN ...`enter code here`
ON URL = URL
OR Origin = URL```

推荐答案

您可以获取长格式的数据帧1( df1 )，以便 Origin 和URL 在同一列中，然后与第二个数据帧( df2 )执行联接.

You could get dataframe 1 (df1) in long format so that both Origin and URL are in the same column and then perform the join with second dataframe (df2).

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(cols = c(URL, Origin), values_to = 'URL') %>%
  inner_join(df2, by = 'URL') %>%
  select(-name)

#  ArticleID Author     Category name   URL                          PageviewDate Pageviews
#      <int> <chr>      <chr>    <chr>  <chr>                        <chr>            <int>
#1         1 AuthorName Politics URL    https://example.com/article1 2019-01-01      656565
#2         1 AuthorName Politics Origin https://example.com/article  2019-01-01      224544

数据

df1 <- structure(list(URL = c("https://example.com/article1", "https://example.com/article2"
), Origin = c("https://example.com/article", "https://example.com/article2"
), ArticleID = 1:2, Author = c("AuthorName", "AuthorName"), 
Category = c("Politics", "Finance")), class = "data.frame",row.names =c(NA, -2L))


df2 <- structure(list(PageviewDate = c("2019-01-01", "2019-01-01"), 
    URL = c("https://example.com/article", "https://example.com/article1"), 
Pageviews = c(224544L, 656565L)), class = "data.frame", row.names = c(NA, -2L))

这篇关于如何在R中的任何匹配子句上保留JOIN?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何在R中的任何匹配子句上保留JOIN? [英] How to LEFT JOIN on ANY of the matching clauses in R?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何在R中的任何匹配子句上保留JOIN? [英] How to LEFT JOIN on ANY of the matching clauses in R?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭