Most efficient way to read key value pairs where values span multiple lines?

Views: 110 Published: 2017/3/26 2:02:14 Tags: r dataframe

Question

What is the fastest way to parse a text file such as the example below into a two-column data.frame, which can then be transformed into a wide format?

FN Thomson Reuters Web of Science™
VR 1.0
PT J
AU Panseri, Sara
   Chiesa, Luca Maria
   Brizzolari, Andrea
   Santaniello, Enzo
   Passero, Elena
   Biondi, Pier Antonio
TI Improved determination of malonaldehyde by high-performance liquid
   chromatography with UV detection as 2,3-diaminonaphthalene derivative
SO JOURNAL OF CHROMATOGRAPHY B-ANALYTICAL TECHNOLOGIES IN THE BIOMEDICAL
   AND LIFE SCIENCES
VL 976
BP 91
EP 95
DI 10.1016/j.jchromb.2014.11.017
PD JAN 22 2015
PY 2015

Using readLines is problematic because the multi-line fields don't have the keys. Reading as fixed width table also doesn't work. Suggestions? If not for the multiline issue, this would be easily accomplished with a function that operates on each row/record like so:

x <- "FN Thomson Reuters Web of Science"
re <- "^([^\\s]+)\\s*(.*)$"
key <- sub(re, "\\1", x, perl=TRUE)
value <- sub(re, "\\2", x, perl=TRUE)
data.frame(key, value)
  key                          value
1  FN Thomson Reuters Web of Science

Notes: The field keys are always two uppercase characters. The entire title and the list of authors can each be concatenated into a single cell.

Recommended answer

Here's another idea that might be useful if you want to stay in base R:

parseEntry <- function(entry) {
    ## Split at beginning of each line that starts with a non-space character    
    ll <- strsplit(entry, "\\n(?=\\S)", perl=TRUE)[[1]]
    ## Join continuation lines to their field, replacing the line break and
    ## three-space indent with a single space so words do not run together
    ll <- gsub("\\n(\\s){3}", " ", ll)
    ## Split each field into its two components
    read.fwf(textConnection(ll), c(2, max(nchar(ll))))
}

## Read in and collapse entry into one long character string.
## (If file contained more than one entry, you could preprocess it accordingly.)
ee <- paste(readLines("egFile.txt"), collapse="\n")
## Parse the entry
parseEntry(ee)
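
The question also asks for a wide format, and notes that readLines alone loses the key on continuation lines. One possible way around that, sketched here as an alternative rather than as part of the answer above, is to flag the lines that start with a non-space character and use cumsum() to assign each continuation line to the most recent key. The variable names (is_key, field_id, long, wide) are illustrative, the sample lines are a shortened copy of the example record, and the sketch assumes a single record whose first line carries a key and whose keys do not repeat.

```r
## Sample input inlined so the sketch runs without a file; with real data,
## something like lines <- readLines("egFile.txt") would take its place.
lines <- c(
  "FN Thomson Reuters Web of Science",
  "VR 1.0",
  "PT J",
  "AU Panseri, Sara",
  "   Chiesa, Luca Maria",
  "TI Improved determination of malonaldehyde by high-performance liquid",
  "   chromatography with UV detection as 2,3-diaminonaphthalene derivative",
  "PY 2015"
)

## Drop blank lines, if any
lines <- lines[nzchar(trimws(lines))]

## A line opens a new field when it starts with a non-space character
is_key <- grepl("^\\S", lines)
stopifnot(is_key[1])  # assumption: the first line carries a key

## cumsum() gives every line the index of the most recent key line,
## so continuation lines are grouped with the field they belong to
field_id <- cumsum(is_key)

## Keys are the first two characters of the key lines
key <- substr(lines[is_key], 1, 2)

## Drop the key (or the indentation) and trim surrounding whitespace
text <- trimws(substring(lines, 3))

## Join the pieces of each field with a single space (an arbitrary choice)
value <- unname(vapply(split(text, field_id), paste, character(1),
                       collapse = " "))

## Long format: one row per field
long <- data.frame(key = key, value = value, stringsAsFactors = FALSE)

## Wide format: one column per key, one row for this record
wide <- as.data.frame(as.list(setNames(long$value, long$key)),
                      stringsAsFactors = FALSE)
```

For a file with several records, the same grouping idea could be applied per record once the record boundaries are known; how records are delimited is not shown in the example, so that step is left out here.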
