Creating a function to fetch Europe PMC literature that skips papers which don't return tables
Question
This is a follow-up to my earlier question.
I am currently following a rather complex way to do what I want.
But a simple solution proposed by Ben was this:
library(tidypmc)
library(tidyverse)
library(europepmc)
doc <- map("PMC7809753", epmc_ftxt)
tbls <- pmc_table(doc[[1]])
tbls[[1]]
My objective is to search Europe PMC for open-access papers on drugs, diseases, etc., then fetch their data in tabular form and save it.
The first part is handled by this:
library(europepmc)
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 20)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
I get pmcids, which is a character vector.
For the second part, what Ben suggested works really well.
doc <- map("PMC7809753", epmc_ftxt)
tbls <- pmc_table(doc[[1]])
tbls[[1]]
To address the above, with help from a generous Stack Overflow user, I got this function:
b <- epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y', limit = 6)
pmcids <- b$pmcid[b$isOpenAccess == "Y"]
pub_tables <- lapply(pmcids, function(pmc_id) {
  message("-- Trying ", pmc_id, "...")
  doc <- tryCatch(pmc_xml(pmc_id),
                  error = function(e) {
                    message("------ Failed to recover PMCID")
                    return(NULL)
                  })
  if (!is.null(doc)) {
    #-- If succeed, try to get table
    tables <- pmc_table(doc)
    if (!is.null(tables)) {
      #-- If succeed, try to get table name
      table_caps <- pmc_caption(doc) %>%
        filter(tag == "table")
      names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ")
    }
    return(tables)
  } else {
    #-- If fail, return NA
    return(NA)
  }
})
names(pub_tables) <- pmcids
This works well, but I got this error:
Error in names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ") :
'names' attribute [3] must be the same length as the vector [2]
These are the PMCIDs returned when I query with the limit set to 6:
"PMC7837979" "PMC7809753" "PMC7790830" "PMC7797573" "PMC7806552" "PMC7836575"
How do I skip the papers for which I don't get any information and move on to the next one? In other words, how do I work around this error?
I have very little experience writing complicated functions, but as I understand the code, this chunk should handle that case, and I'm not sure why it doesn't:
} else {
#-- If fail, return NA
return(NA)
}
Error in names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ") :
'names' attribute [3] must be the same length as the vector [2]
For example, when the limit is set to 4 it works well: pub_tables is returned as a list, and the last PMCID is returned as
$PMC7797573
NULL
But the problem occurs with "PMC7806552". So how do I get a NULL result when there is an error fetching the tables, and then move on to the next PMCID?
Any help would be greatly appreciated.
Or is there a simpler way of doing this?
Recommended answer
Here is the function, modified slightly so that it works. The only edit is that I added these lines:
table_caps <- table_caps %>% group_by(label) %>%
summarise(text = paste(text, collapse=" "),
tag = "table")
after the initial definition of the table_caps object. The problem was that some table captions had multiple sentences, so there were more caption rows than tables (3 names for 2 tables in the error above). Grouping by label pastes the multiple sentences together into one caption per table.
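The length mismatch can be reproduced offline with mock data. The sketch below is only an illustration under stated assumptions: the data frame stands in for the filtered pmc_caption() output, the list stands in for the pmc_table() output, and name_tables() is a hypothetical helper that is not part of tidypmc. It collapses the captions in base R and keeps them in first-appearance order, whereas a grouped summary typically returns groups sorted by label, which may matter for labels such as "Table 10".

```r
# Mock caption rows for one paper (assumed shape): Table 2's caption
# spans two sentences, so there are 3 rows for only 2 tables.
table_caps <- data.frame(
  label = c("Table 1", "Table 2", "Table 2"),
  text  = c("Patient characteristics.", "Response rates.", "CR, complete remission."),
  stringsAsFactors = FALSE
)
# Stand-ins for the list of tables extracted from the same paper.
tables <- list(data.frame(a = 1), data.frame(b = 2))

# Collapse to one caption per label, keeping first-appearance order.
label_order <- factor(table_caps$label, levels = unique(table_caps$label))
caps <- tapply(table_caps$text, label_order, paste, collapse = " ")
new_names <- paste(names(caps), caps, sep = " - ")

# Hypothetical helper: only assign names when the counts agree,
# otherwise leave the tables unnamed instead of raising an error.
name_tables <- function(tables, new_names) {
  if (length(new_names) == length(tables)) {
    names(tables) <- new_names
  } else {
    message("Caption/table count mismatch; leaving tables unnamed")
  }
  tables
}

named <- name_tables(tables, new_names)
print(names(named))
```

With the length check in place, a paper whose captions still do not line up with its tables is returned unnamed rather than stopping the whole lapply() run.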
The full modified function:
library(europepmc)
library(tidypmc)
library(dplyr)
b <- epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y', limit = 10)
pmcids <- b$pmcid[b$isOpenAccess == "Y"]
pub_tables <- lapply(pmcids, function(pmc_id) {
  message("-- Trying ", pmc_id, "...")
  doc <- tryCatch(pmc_xml(pmc_id),
                  error = function(e) {
                    message("------ Failed to recover PMCID")
                    return(NULL)
                  })
  if (!is.null(doc)) {
    #-- If succeed, try to get table
    tables <- pmc_table(doc)
    if (!is.null(tables)) {
      #-- If succeed, try to get table name
      table_caps <- pmc_caption(doc) %>%
        filter(tag == "table")
      #-- Collapse multi-sentence captions into one row per table label
      table_caps <- table_caps %>% group_by(label) %>%
        summarise(text = paste(text, collapse = " "),
                  tag = "table")
      names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ")
    }
    return(tables)
  } else {
    #-- If fail, return NA
    return(NA)
  }
})
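On the original question of skipping a paper whenever anything goes wrong: the tryCatch() in the function only wraps pmc_xml(), so an error raised later (such as the names() failure) still stops the loop. A minimal sketch of one alternative is to wrap the whole per-paper step. Everything here is an assumption for illustration: fetch_tables() is a hypothetical stand-in for the per-paper work, safe_fetch() is a hypothetical wrapper, and the IDs are made up so the example runs offline.

```r
# Hypothetical stand-in for the per-paper work (download, extract, name);
# it fails for one made-up ID to imitate a problematic paper.
fetch_tables <- function(pmc_id) {
  if (pmc_id == "PMC_BAD") stop("simulated failure while naming tables")
  list(data.frame(id = pmc_id))
}

# Wrap the whole step: any error is reported and turned into NULL.
safe_fetch <- function(pmc_id) {
  tryCatch(
    fetch_tables(pmc_id),
    error = function(e) {
      message("-- Skipping ", pmc_id, ": ", conditionMessage(e))
      NULL
    }
  )
}

ids <- c("PMC_OK_1", "PMC_BAD", "PMC_OK_2")  # made-up IDs
results <- lapply(ids, safe_fetch)
names(results) <- ids

# Drop the papers that failed or returned nothing.
results <- Filter(Negate(is.null), results)
print(names(results))
```

Returning NULL for every kind of failure keeps the result list uniform, so a single Filter() call removes both failed papers and papers without tables.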