Merging a data frame in R. Unable to merge due to quotation marks in vector?
Question
I have a data frame, movies_df. I am trying to merge it with another data frame, genres_df, by the title of the movie. I'll use one record, the show "Boy Meets World", to show the problem:
>movies_df[30546, ]
votes rank title date type
30546 29168 8.1 "Boy Meets World" (1993) 1993 TV Show
>genres_df[13126, ]
title genre_1 genre_2 genre_3 genre_4
13126 Boy Meets World (1993) Comedy Drama Family NA
I tried merging in both directions, and both attempts failed:
>merge.data.frame(movies_df[30546, ], genres_df[13126, ], all.x=TRUE)
title votes rank date type genre_1 genre_2 genre_3 genre_4
1 "Boy Meets World" (1993) 29168 8.1 1993 TV Show <NA> <NA> <NA> <NA>
>merge.data.frame(genres_df[13126, ], movies_df[30546, ], all.x=TRUE)
title genre_1 genre_2 genre_3 genre_4 votes rank date type
1 Boy Meets World (1993) Comedy Drama Family NA <NA> <NA> <NA> <NA>
I am almost positive the problem is that the title fields do not match because of the quotation marks in the records of movies_df$title.
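One way to check that diagnosis is to compare the two keys directly. This is a minimal sketch using hypothetical one-row stand-ins for movies_df and genres_df, built from the values printed above; the keys are compared as character because comparing two factors with different level sets raises an error.

```r
# Hypothetical one-row stand-ins for movies_df and genres_df,
# using the values printed in the question
movies <- data.frame(title = "\"Boy Meets World\" (1993)", votes = 29168,
                     stringsAsFactors = TRUE)
genres <- data.frame(title = "Boy Meets World (1993)", genre_1 = "Comedy",
                     stringsAsFactors = TRUE)

# Compare the keys as text; FALSE means merge() cannot pair these rows
as.character(movies$title) == as.character(genres$title)  # FALSE

# Titles in movies with no exact counterpart in genres
setdiff(as.character(movies$title), as.character(genres$title))

# An inner merge on the mismatched key finds no matching rows
nrow(merge(movies, genres, by = "title"))  # 0
```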
Here is how I tried to delete the quotation marks; every attempt failed:
>gsub("\\"", "", movies_df$title[30546])
Error: unexpected string constant in "gsub("\\"", ""
>gsub(""", "", movies_df$title[30546])
Error: unexpected string constant in "gsub(""", ""
>gsub("[[punc:]]", "", movies_df$title[30546])
[1] "\"Boy Meets World\" (1993)" ##What the heck is this???
>gsub("\\\\", "", movies_df$title[30546])
[1] "\"Boy Meets World\" (1993)"
##Again, where did the backslashes come from, and why can't I delete them???
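The backslashes in that output are not part of the data. print() displays a string the way it would have to be typed as an R literal, so each embedded double quote is shown as \". This also explains the failed patterns: in "\\"" the first four characters, "\\", already form a complete string (a single backslash), so the trailing " opens a new string and R reports a syntax error; and "\\\\" is a regex for one literal backslash, which the title does not contain. A small sketch with a hypothetical variable x:

```r
x <- "\"Boy Meets World\" (1993)"  # inside "...", a double quote is typed as \"

print(x)       # [1] "\"Boy Meets World\" (1993)"  (escaped for display)
writeLines(x)  # "Boy Meets World" (1993)         (the actual characters)
nchar(x)       # 24: no backslashes are counted

# Two equivalent ways to write a pattern that matches one double quote
gsub("\"", "", x)  # "Boy Meets World (1993)"
gsub('"', "", x)   # same result
```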
If anyone can help me with a regex to delete those quotation marks, or help me merge those two records successfully, that would be awesome. I have read different forums, and some say it doesn't matter whether the quotation marks are present; but I am almost positive they are the reason I can't merge my two data frames.
One more detail: the title columns in both data frames are of class 'factor'.
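Since the titles are factors, it helps to know how gsub() treats them. Its documentation says the input may be any object coercible by as.character(), so it matches against the factor's labels rather than its integer codes, and the result is a plain character vector. A small sketch with a hypothetical factor f:

```r
# Hypothetical factor holding one title from movies_df
f <- factor("\"Boy Meets World\" (1993)")

as.integer(f)                # 1: the internal level code, not the text
cleaned <- gsub('"', "", f)  # gsub() matches against the labels
class(cleaned)               # "character": the result is no longer a factor
cleaned                      # "Boy Meets World (1993)"
```

After cleaning, the column is character; converting the other data frame's key with as.character() as well keeps both merge keys the same type.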
Recommended answer
Have you tried this?
gsub("[[:punct:]]", "", movies_df$title[30546])
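or, since [[:punct:]] also strips the parentheses around the year (giving "Boy Meets World 1993", which still won't match genres_df unless you clean that column the same way), remove only the double quotes: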
gsub('"', "", movies_df$title[30546])
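Putting the answer together for the merge, a minimal sketch assuming the column names shown in the question; the two data frames below are hypothetical one-row stand-ins built from the printed records. It strips only the double quotes from the whole title column, so the "(1993)" part that genres_df also carries is kept.

```r
# Hypothetical one-row stand-ins for the question's data frames
movies_df <- data.frame(title = "\"Boy Meets World\" (1993)", votes = 29168,
                        rank = 8.1, stringsAsFactors = TRUE)
genres_df <- data.frame(title = "Boy Meets World (1993)", genre_1 = "Comedy",
                        genre_2 = "Drama", genre_3 = "Family",
                        stringsAsFactors = TRUE)

# Remove only the double quotes; the result is a character column
movies_df$title <- gsub('"', "", movies_df$title)
# Make the other key character too, so both keys are plain text
genres_df$title <- as.character(genres_df$title)

# Left join: keep every movie, attach genres where the titles match
merged <- merge(movies_df, genres_df, by = "title", all.x = TRUE)
merged
```

Applying gsub() to the full column, rather than to one element as in the answer, fixes every title at once. With all.x = TRUE, any movie that still finds no genre keeps NA in the genre columns, so is.na(merged$genre_1) points to the titles that differ in some other way.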