Merging a data frame in R. Unable to merge due to quotation marks in vector?


Problem Description

I have a data frame, movies_df, and I am trying to merge it with another data frame, genres_df, by movie title. I'll use one record, the show "Boy Meets World", to illustrate the problem:

>movies_df[30546, ]

      votes  rank title                   date    type
30546 29168  8.1 "Boy Meets World" (1993) 1993 TV Show



>genres_df[13126, ]

      title                  genre_1  genre_2  genre_3  genre_4
13126 Boy Meets World (1993) Comedy   Drama    Family   NA

I tried merging in both directions, and both attempts failed:

>merge.data.frame(movies_df[30546, ], genres_df[13126, ], all.x=TRUE)

  title                    votes rank date type     genre_1  genre_2  genre_3  genre_4
1 "Boy Meets World" (1993) 29168  8.1 1993 TV Show   <NA>    <NA>     <NA>     <NA>

>merge.data.frame(genres_df[13126, ], movies_df[30546, ], all.x=TRUE)

  title                  genre_1  genre_2 genre_3  genre_4 votes rank date type
1 Boy Meets World (1993) Comedy   Drama   Family   NA      <NA> <NA> <NA> <NA>

I am almost certain the problem is that the title fields do not match because of the quotation marks in the movies_df$title values.
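merge() joins rows only when the key values are exactly equal, so the hypothesis can be checked by comparing the two title strings directly. A minimal sketch, using hypothetical one-row stand-ins for movies_df and genres_df (values copied from the printouts above, most columns omitted):

```r
# Hypothetical minimal stand-ins: just the key plus one column from each table
movies <- data.frame(title = "\"Boy Meets World\" (1993)", votes = 29168,
                     stringsAsFactors = TRUE)
genres <- data.frame(title = "Boy Meets World (1993)", genre_1 = "Comedy",
                     stringsAsFactors = TRUE)

# merge() matches on exact equality of the key; the stored strings differ
as.character(movies$title) == as.character(genres$title)   # FALSE

# all.x = TRUE keeps the unmatched movies row and fills genre_1 with NA
merged <- merge(movies, genres, by = "title", all.x = TRUE)
is.na(merged$genre_1)   # TRUE
```

Converting with as.character() before comparing matters: comparing two factors whose level sets differ raises an error instead of returning FALSE.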

Here is how I tried to delete the quotation marks; every attempt failed:

>gsub("\\"", "", movies_df$title[30546])
Error: unexpected string constant in "gsub("\\"", ""

>gsub(""", "", movies_df$title[30546])
Error: unexpected string constant in "gsub(""", ""

>gsub("[[punc:]]", "", movies_df$title[30546])
[1] "\"Boy Meets World\" (1993)"   ##What the heck is this???

>gsub("\\\\", "", movies_df$title[30546])
[1] "\"Boy Meets World\" (1993)"   
##Again, where did the backslashes come from, and why can't I delete them???
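The backslashes in the printed output are not part of the data: print() shows a character string in escaped form, so an embedded " is displayed as \". The first attempt fails to parse because in "\\"" the \\ is an escaped backslash, so the following " closes the string early. "[[punc:]]" is a misspelling of the POSIX class [[:punct:]] and matches nothing here, and "\\\\" matches a literal backslash, which the string does not contain. A short sketch of what is actually stored and of patterns that do parse (the literal below is a hypothetical stand-in for movies_df$title[30546]):

```r
x <- "\"Boy Meets World\" (1993)"   # hypothetical stand-in for the stored title

print(x)       # escaped display, backslashes shown before the quotes
cat(x, "\n")   # raw text, no backslashes
nchar(x)       # 24: the quotes count as characters, the backslashes do not

# Each of these removes only the double quotation marks
gsub("\"", "", x)                # escape the quote inside a double-quoted string
gsub('"', "", x)                 # or delimit the pattern with single quotes
gsub('"', "", x, fixed = TRUE)   # fixed = TRUE treats the pattern as plain text
```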

If anyone can help me with a regex to delete those quotation marks, or help me merge those two records successfully, that would be awesome. I've read several forums where people say it doesn't matter whether the quotation marks are present, but I am almost positive they are the reason I can't merge my two data frames.

More info: the title column in each data frame is of class 'factor'.
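Because the title columns are factors, gsub() coerces them with as.character() and returns a plain character vector. If the column should stay a factor, one option is to clean the factor's levels once instead of every element. A sketch assuming a hypothetical factor of titles:

```r
# Hypothetical factor column containing quoted TV-show titles
titles <- factor(c("\"Boy Meets World\" (1993)", "Heat (1995)",
                   "\"Boy Meets World\" (1993)"))

# gsub() on a factor returns a character vector, not a factor
cleaned_chr <- gsub('"', "", titles)
class(cleaned_chr)   # "character"

# Alternatively, rewrite the levels; every element picks up the change
levels(titles) <- gsub('"', "", levels(titles))
titles
```

Cleaning the levels touches each distinct title only once, which can be cheaper on a large column with many repeated values.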

Recommended Answer

Have you tried this?

gsub("[[:punct:]]", "", movies_df$title[30546])

gsub('"', "", movies_df$title[30546])
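Of these two patterns, the second is the safer choice for this merge: [[:punct:]] also deletes the parentheses around the year, so the result would no longer equal genres_df$title. A minimal end-to-end sketch, using hypothetical one-row stand-ins for the two data frames, that strips only the quotes from the key and then merges:

```r
# Hypothetical one-row stand-ins for the data frames in the question
movies_df <- data.frame(votes = 29168, rank = 8.1,
                        title = "\"Boy Meets World\" (1993)",
                        date = 1993, type = "TV Show", stringsAsFactors = TRUE)
genres_df <- data.frame(title = "Boy Meets World (1993)", genre_1 = "Comedy",
                        genre_2 = "Drama", genre_3 = "Family", genre_4 = NA,
                        stringsAsFactors = TRUE)

# [[:punct:]] is too aggressive: it also removes "(" and ")"
gsub("[[:punct:]]", "", movies_df$title)   # "Boy Meets World 1993"

# Strip only the double quotes; both keys end up as character columns
movies_df$title <- gsub('"', "", movies_df$title)
genres_df$title <- as.character(genres_df$title)

merged <- merge(movies_df, genres_df, by = "title", all.x = TRUE)
merged[, c("title", "votes", "genre_1", "genre_2", "genre_3")]
```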
