Read multiple *.rtf files in R

Question

I have a folder with more than 2,000 RTF documents. I want to import them into R, preferably into a data frame that can be used with the tidytext package. I also need an additional column holding the file name, so that I can link the content of each RTF document to its file (later, I will also have to extract information from the file name and save it into separate columns of my data set).

I came across a solution by Jens Leerssen that I tried to adapt to my requirements:

require(textreadr)

read_plus <- function(flnm) {
    read_rtf(flnm) %>%
        mutate(filename = flnm)
}

tbl_with_sources <-
    list.files(path = "./data", pattern = "*.rtf",
               full.names = TRUE) %>%
    map_df(~read_plus(.))

However, I get the following error message:

Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character"

Can anyone tell me why this error occurs, or propose another solution to my problem?

Recommended answer

I finally solved the problem with a workaround.
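As for why the original code failed: the error message says `mutate` was applied to an object of class "character", which suggests that `read_rtf()` handed back a plain character vector rather than a data frame. A minimal sketch of that situation and one way around it, using an invented character vector as a stand-in for the `read_rtf()` output (`wrap_text` is a hypothetical helper name, not part of any package):

```r
# Stand-in for what read_rtf() apparently returned: a character vector
txt <- c("First paragraph.", "Second paragraph.")
class(txt)   # "character", which is not something mutate() can extend

# Hypothetical helper: put the text into a data frame first, then add the name
wrap_text <- function(text, flnm) {
  data.frame(paragraph = text,
             filename  = rep(flnm, length(text)),
             stringsAsFactors = FALSE)
}

wrapped <- wrap_text(txt, "./data/example.rtf")
```

Once the text sits in a data frame, a `filename` column can be attached directly. The conversion route described in the steps that follow has the side benefit of stripping the formatting as well.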

1) I converted the *.rtf files to *.txt files with the textutil command in the macOS terminal:

find . -name \*.rtf -print0 | xargs -0 textutil -convert txt

Doing so also strips the formatting.

2) I then used Jens Leerssen's read_plus function, but with read.delim instead of read_rtf, and added two options (stringsAsFactors and quote) to get rid of warnings and/or errors:

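library(dplyr)  # provides %>% and mutate()

# Read one text file (one row per line) and tag each row with its file path
read_plus <- function(flnm) {
    read.delim(flnm, header = FALSE, stringsAsFactors = FALSE, quote = "") %>%
        mutate(filename = flnm)
}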
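To illustrate what the two options change, here is a minimal sketch on a throwaway file (the file contents are invented for the demo). With `quote = ""`, quotation marks are treated as ordinary characters, so a stray `"` in the text cannot swallow the lines that follow it. `stringsAsFactors = FALSE` keeps the text as character strings rather than factors (this is already the default since R 4.0.0).

```r
# Throwaway input file; the text is invented for this demo
tmp <- tempfile(fileext = ".txt")
writeLines(c("He said \"hello and left.",   # note the unbalanced double quote
             "Second paragraph.",
             "Third paragraph."), tmp)

# Same call as in read_plus(): no header row, no quote handling
demo <- read.delim(tmp, header = FALSE, stringsAsFactors = FALSE, quote = "")

str(demo)    # one character column named V1
nrow(demo)   # one row per line of the file
```

One caveat: read.delim splits fields on tab characters, so a line that contains tabs would be spread over several columns instead of landing in V1 alone.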

3) Finally, I read in all the *.txt files and renamed the column V1 to paragraph.

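library(purrr)  # provides map_df(); dplyr (for rename()) is assumed to be loaded

# "\\.txt$" is a regular expression matching names that end in ".txt"
df <- list.files(path = "./data", pattern = "\\.txt$",
                 full.names = TRUE) %>%
    map_df(~read_plus(.)) %>%
    rename(paragraph = V1)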
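The question also mentions pulling information out of the file names later. A minimal sketch of one way to do that in base R, assuming a hypothetical naming scheme of `<author>_<year>.txt` (the actual scheme is not given, so the `author` and `year` columns are placeholders):

```r
# Stand-in for the data frame built above; paths and text are invented
df <- data.frame(
  paragraph = c("First paragraph.", "Second paragraph.", "Only paragraph."),
  filename  = c("./data/smith_2019.txt", "./data/smith_2019.txt",
                "./data/jones_2020.txt"),
  stringsAsFactors = FALSE
)

# Drop the directory and the extension: "./data/smith_2019.txt" -> "smith_2019"
stem <- tools::file_path_sans_ext(basename(df$filename))

# Split on "_" (hypothetical scheme) and bind the pieces into a matrix
parts <- do.call(rbind, strsplit(stem, "_", fixed = TRUE))

df$author <- parts[, 1]
df$year   <- as.integer(parts[, 2])
```

This only works when every file name splits into the same number of pieces. Names that deviate from the assumed scheme would need to be handled separately.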
