Lists of summaries from rentrez stop working after being merged using append()


Problem description

tl;dr: What is different about an esummary list produced by rentrez, and why do such lists stop working with other rentrez functions after they are merged using append()?

I am accessing Pubmed using rentrez. I am able to search for publications and download esummaries without problem. However, there must be something special about an esummary list that I do not understand, because things fall apart when I use append() to try to merge lists. I have not been able to figure out what that difference is by reading the documentation. Here is the code that allows me to search Pubmed and download records without problem:

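library(rentrez)     # entrez_search(), entrez_summary(), extract_from_esummary()
library(data.table)  # rbindlist()
library(magrittr)    # %>%
# set search term and retmax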
term_set <- '"Transcription, Genetic"[Mesh] AND "Regulatory Sequences, Nucleic Acid"[Mesh] AND 2017:2018[PDAT]'
retmax_set <- 500
# search pubmed using web history
search.l <- entrez_search(db = "pubmed", term = term_set, use_history = T)
# get summaries of search hits using web history 
for (seq_start in seq(0, search.l$count, retmax_set)) {
    if (seq_start == 0) {summary.l <- list()} 
    summary.l[[length(summary.l)+1]] <- entrez_summary(
        db = "pubmed", 
        web_history = search.l$web_history, 
        retmax = retmax_set, 
        retstart = seq_start
    )
}

However, using summary.l <- list() and then summary.l[[length(summary.l)+1]] <- entrez_summary(... results in a list of lists of esummaries (3 sub-lists, in this search). This forces multiple for loops in subsequent steps of the data extraction (below) and is an unwieldy data structure.

# extract desired information from esummary, convert to dataframe
for (i in 1:length(summary.l)) {
    if (i == 1) {faut.laut.l <- list()}
    faut.laut <- summary.l[[i]] %>% 
        extract_from_esummary(
            c("uid", "sortfirstauthor", "lastauthor"), 
            simplify = F
        )
    faut.laut.l <- c(faut.laut.l, faut.laut)
}
faut.laut.df <- rbindlist(faut.laut.l)

Using append() in the code below gives a single list of all 1334 esummaries, avoiding the sub-lists.

# get summaries of search hits using web history 
for (seq_start in seq(0, search.l$count, retmax_set)) {
    batch <- entrez_summary(
        db = "pubmed", 
        web_history = search.l$web_history, 
        retmax = retmax_set, 
        retstart = seq_start
    )
    if (seq_start == 0) {
        # the first batch initialises the list (appending it as well would duplicate it)
        summary.append.l <- batch
    } else {
        # later batches are merged in with append()
        summary.append.l <- append(summary.append.l, batch)
    }
}

However, in the subsequent step extract_from_esummary() throws an error, even though the documentation states that the argument esummaries should be a list of esummary objects.

# extract desired information from esummary, convert to dataframe
faut.laut.append.l <- extract_from_esummary(
    esummaries = summary.append.l,
    elements = c("uid", "sortfirstauthor", "lastauthor"), 
    simplify = F
)
Error in UseMethod("extract_from_esummary", esummaries) : 
no applicable method for 'extract_from_esummary' applied to an object of class "list"

faut.laut.append.df <- rbindlist(faut.laut.append.l)
Error in rbindlist(faut.laut.append.l) : 
object 'faut.laut.append.l' not found

A search that yields fewer than 500 records can be done in a single call of entrez_summary() and does not require the concatenation of lists. As a result, the code below works.

# set search term and retmax
term_set_small <- 'kadonaga[AUTH]'
retmax_set <- 500
# search pubmed using web history
search_small <- entrez_search(db = "pubmed", term = term_set_small, use_history = T)
# get summaries from search with <500 hits
summary_small <- entrez_summary(
    db = "pubmed", 
    web_history = search_small$web_history, 
    retmax = retmax_set
)
# extract desired information from esummary, convert to dataframe
faut.laut_small <- extract_from_esummary(
    esummaries = summary_small,
    elements = c("uid", "sortfirstauthor", "lastauthor"), 
    simplify = F
)
faut.laut_small.df <- rbindlist(faut.laut_small)

Why does append() break the esummaries, and can this be avoided? Thanks.

Recommended answer

The documentation for extract_from_esummary is a little confusing on this. What it really needs is either an esummary object or an esummary_list. Because the esummary object itself inherits from a list, I don't think we can easily have extract_from_esummary work on any list that is thrown at it. I'll fix the docs and maybe think about a better design for the objects.
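
The class loss behind that error can be reproduced without rentrez. The following is a minimal base R sketch using mock objects: make_batch() and get_uid() are hypothetical stand-ins, and the class vectors are assumed for illustration. It shows append() returning a plain list, after which a generic that only has methods for the two classes finds nothing to dispatch to.

```r
# Mock stand-ins for rentrez objects; the class names mirror those in the
# answer, but nothing here calls rentrez or contacts NCBI.
make_batch <- function(ids) {
  recs <- lapply(ids, function(id) {
    structure(list(uid = id), class = c("esummary", "list"))
  })
  names(recs) <- ids
  structure(recs, class = c("esummary_list", "list"))
}

# Hypothetical generic with the same dispatch shape as extract_from_esummary():
# methods exist for the two classes, but there is no list or default method.
get_uid <- function(x) UseMethod("get_uid")
get_uid.esummary <- function(x) x$uid
get_uid.esummary_list <- function(x) vapply(x, get_uid, character(1))

batch1 <- make_batch(c("101", "102"))
batch2 <- make_batch(c("103"))

# append() concatenates with c(), which drops every attribute except names.
merged <- append(batch1, batch2)

class(batch1)   # "esummary_list" "list"
class(merged)   # "list"

get_uid(batch1)                             # dispatches to get_uid.esummary_list
res <- try(get_uid(merged), silent = TRUE)  # no applicable method for a plain list
inherits(res, "try-error")                  # TRUE
```

The individual records inside merged keep their own class; only the outer container loses its class attribute, which is why dispatch on the merged object fails.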

To fix this particular problem there are a few options. One, you can just re-class the list of esummaries:

class(summary.append.l) <- c("list", "esummary_list")
extract_from_esummary(summary.append.l, "sortfirstauthor")
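
If several batches need merging, the re-classing step can be wrapped in a small helper so that it is not forgotten. This is only a sketch: combine_keep_class() is a hypothetical function, not part of rentrez, and the mock batches below assume a class vector purely for illustration. With real data the helper would copy whatever class the first batch carries.

```r
# Hypothetical helper (not part of rentrez): concatenate list-like objects that
# share an S3 class and put the first object's class back on the result.
combine_keep_class <- function(...) {
  parts <- list(...)
  out <- do.call(c, lapply(parts, unclass))  # plain concatenation, names kept
  class(out) <- class(parts[[1]])            # restore the class that c() drops
  out
}

# Mock batches standing in for entrez_summary() results.
b1 <- structure(list(`101` = list(uid = "101"), `102` = list(uid = "102")),
                class = c("esummary_list", "list"))
b2 <- structure(list(`103` = list(uid = "103")),
                class = c("esummary_list", "list"))

merged_batches <- combine_keep_class(b1, b2)
class(merged_batches)    # "esummary_list" "list"
length(merged_batches)   # 3
names(merged_batches)    # "101" "102" "103"
```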

Re-classing should do the trick. Another option would be to extract the relevant data before you do any appending. This is something similar to your example, with more lapply and less for:

all_the_summs <- lapply(seq(0,50,5),  function(s) {
    entrez_summary(db="pubmed", 
                   web_history=search.l$web_history, 
                   retmax=5,  retstart=s)
})
desired_fields <- lapply(all_the_summs, extract_from_esummary, c("uid", "sortfirstauthor", "lastauthor"), simplify=FALSE)  
res <- do.call(cbind.data.frame, desired_fields)
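
If one row per article is wanted, as with rbindlist() in the question, the per-batch results can also be flattened and stacked row-wise. The sketch below uses base R only and made-up records. It assumes that extract_from_esummary(..., simplify = FALSE) returns, for each batch, a list of records that each hold the requested fields; the variable names are illustrative.

```r
# Mock output for two batches (made-up values, not real Pubmed records):
# each batch is a list of records, each record a list of the requested fields.
batch_fields <- list(
  list(
    list(uid = "101", sortfirstauthor = "Author A", lastauthor = "Author X"),
    list(uid = "102", sortfirstauthor = "Author B", lastauthor = "Author Y")
  ),
  list(
    list(uid = "103", sortfirstauthor = "Author C", lastauthor = "Author Z")
  )
)

# Drop one level of nesting so every record sits in a single flat list.
records <- unlist(batch_fields, recursive = FALSE)

# Turn each record into a one-row data frame and stack the rows.
rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
res_long <- do.call(rbind, rows)

dim(res_long)    # 3 rows, 3 columns
res_long$uid     # "101" "102" "103"
```

Because the fields are extracted before anything is concatenated, the class of the merged container no longer matters at this point.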

Hope that provides a way forward.
