How to get text data from help pages in R?
Question
Overall, I'm interested in extracting all the text from R documentation, putting it into data frames, and applying text-mining techniques to it.
- PACKAGE LEVEL: Suppose I'm interested in a package, for instance "utils", and I want to get all of its text data in a vector. This works:
package_d <- packageDescription("utils")
package_d$Description
But this does not:
package_d$Details
- FUNCTIONS LEVEL: Same problem, but for functions. I tried this without success:
function_d <- ?utils::adist
function_d$Description
- SUB-LEVELS: I would like to extract all the details, argument descriptions, and return values of the functions in a particular package.
Thanks a lot for your help!
Recommended answer
I couldn't find a built-in function for this, but after looking at the source of the functions that do most of the work, here's a function that extracts the text from a help page.
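help_text <- function(...) {
  file <- help(...)                  # path(s) of the matching help topic
  if (length(file) == 0L) stop("no help page found", call. = FALSE)
  file <- as.character(file)[1L]     # keep the first match only
  path <- dirname(file)              # <library>/<package>/help
  dirpath <- dirname(path)           # <library>/<package>
  pkgname <- basename(dirpath)
  RdDB <- file.path(path, pkgname)   # the package's Rd database
  # fetchRdDB() is unexported (hence :::) and may change between R versions
  rd <- tools:::fetchRdDB(RdDB, basename(file))
  # render the Rd object as plain text and capture it line by line
  capture.output(tools::Rd2txt(rd, out = "", options = list(underline_titles = FALSE)))
}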
You can use it with both package help pages and function help pages.
h1 <- help_text(utils)
h2 <- help_text(adist)
You'll get a character vector with one element per line of the help page. You can print it with
cat(h1, sep="\n")
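For the sub-level part of the question (details, arguments, and values for every function in a package), one possible approach is to read the parsed Rd objects with the exported tools::Rd_db() and pull out top-level sections by their "Rd_tag" attribute. This is only a rough sketch and not part of the original answer: rd_section and help_df are names made up here, and flattening a section with unlist() is an assumption that keeps the text but throws away the markup structure (for example, argument names and their descriptions end up in one string).

```r
# Parsed Rd objects for every help page of an installed package.
rd_list <- tools::Rd_db("utils")

# Hypothetical helper: return the text of one top-level section
# (e.g. "\\description") of a single Rd object, or NA if it is absent.
rd_section <- function(rd, tag) {
  rd <- unclass(rd)
  tags <- vapply(rd, function(x) {
    t <- attr(x, "Rd_tag")
    if (is.null(t)) "" else t
  }, character(1))
  hits <- rd[tags == tag]
  if (length(hits) == 0L) return(NA_character_)
  # Crude flattening: keep the text leaves, drop the markup structure.
  txt <- paste(unlist(hits), collapse = "")
  trimws(gsub("[[:space:]]+", " ", txt))
}

# Apply the helper to every help page for one section tag.
get_all <- function(tag) {
  vapply(rd_list, rd_section, character(1), tag = tag, USE.NAMES = FALSE)
}

# One row per help page, one column per section of interest.
help_df <- data.frame(
  topic       = get_all("\\name"),
  description = get_all("\\description"),
  details     = get_all("\\details"),
  arguments   = get_all("\\arguments"),
  value       = get_all("\\value"),
  stringsAsFactors = FALSE
)
```

Pages that lack a given section get NA in that column, so something like help_df[help_df$topic == "adist", ] shows which sections exist for a single function before any text mining is applied.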