What do I do about a function that will not accept a vector? Error: `x` must be a string of length 1


Problem description

I am trying to use the xml2 package to read many podcast feeds. I want to be able to calculate the 75th percentile of episode duration for each podcast in a series, and many similar metrics (e.g. frequency of episodes). I use data.table a lot and want to carry on using it. Every time I call the read_xml function on the URLs in a column, I get this error:

Error: `x` must be a string of length 1

I can get it to work if I process just one row, but that defeats the purpose.

Let me give you a simple example. Here is the list of just my statistics podcasts, but in real life I subscribe to more than 100 podcasts across many fields.

library(data.table)
library(xml2)
statml.opml <- read_xml(x = "https://player.fm/farrelbuch/statistics-ml.opml")
statml.items <- xml_find_all(x = statml.opml, "/opml/body/outline")
xml_structure(statml.opml)
statml.dt <- data.table(podcast = xml_attr(statml.items, "text"), url = xml_attr(statml.items, "xmlUrl"))

I start by reading the OPML file my podcast aggregator provides (thank you, player.fm). Then I get a listing of each feed, and by looking at the structure I can see what I need to extract from each feed. I end up with a data.table that has the name of each podcast and its URL.

statml.dt[1, url]
pod1 <- read_xml(x = "https://podcasts.files.bbci.co.uk/p02nrss1.rss")

xml_structure(pod1)
xml_children(pod1)


xml_find_all(x = pod1, "/rss/channel/item/itunes:duration")
xml_text(xml_find_all(x = pod1, "/rss/channel/item/itunes:duration"))

list(xml_text(xml_find_all(x = read_xml(x = "https://podcasts.files.bbci.co.uk/p02nrss1.rss"), "/rss/channel/item/itunes:duration")))

So I can easily display a single URL and read the XML at that URL. xml_find_all gets all the items tagged with itunes:duration, and xml_text isolates the actual durations and jettisons all the tags. Wrapping the result in a list should make it possible to store it in a data.table column.

Look what happens when I try these simple lines of code to add columns by reference using :=. Everything works well if I set i = 1 (in other words, I operate on the first row and the first row only). But alas, if I leave i blank so that it operates on all the rows, or even if I set i to 1:2, the operation fails with the error that `x` must be a string of length 1.

statml.dt[,times:=list(xml_text(xml_find_all(x = read_xml(url), "/rss/channel/item/itunes:duration")))]
statml.dt[1,times:=list(xml_text(xml_find_all(x = read_xml(url), "/rss/channel/item/itunes:duration")))]
statml.dt[,hereIam:=list(read_xml(url))]
statml.dt[1,hereIam:=list(read_xml(url))]

How do I get a function to work on every row of a data.table when it is not expecting a column of values?

Recommended answer

Vectorize(somefunc) converts a non-vectorized function somefunc from one that accepts an argument of length at most one into one that accepts a vector.

Vectorize(somefunc) returns a function, which you then use in a subsequent call. You can either vectorize a function ahead of time or call Vectorize inline; both are easy.

func1 <- function(x) { stopifnot(length(x) == 1L); 2*x; }

data.table(a=1:2)[, b := func1(a) ]
# Error in func1(a) : length(x) == 1L is not TRUE

data.table(a=1:2)[, b := Vectorize(func1)(a) ][]
#    a b
# 1: 1 2
# 2: 2 4

func1_n <- Vectorize(func1)
data.table(a=1:2)[, b := func1_n(a) ][]
#    a b
# 1: 1 2
# 2: 2 4
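
When the wrapped function returns more than one value per input (as a parsed feed would), the default simplification can produce a matrix or a list depending on the lengths involved. The following is a minimal sketch, assuming a hypothetical helper split_one that stands in for any length-1-only function; it passes SIMPLIFY = FALSE so the result is always a list with one element per input.

```r
# Hypothetical helper (not from the question): accepts exactly one string
# and returns several values for it.
split_one <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  strsplit(x, ":", fixed = TRUE)[[1]]
}

# SIMPLIFY = FALSE keeps one list element per input instead of trying
# to collapse the results; USE.NAMES = FALSE drops the input strings as names.
split_all <- Vectorize(split_one, SIMPLIFY = FALSE, USE.NAMES = FALSE)

res <- split_all(c("00:28:03", "41:10"))
str(res)
```

A list result like this is the shape a data.table list column expects, which is why SIMPLIFY = FALSE is often the safer choice for functions that do not return a single value.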


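When you need more complex logic (for example, calling several functions per element of the vector), it is often better to use an anonymous (inline) or predefined function of some sort, even a temporary one. From there, use lapply or sapply: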

data.table(a=1:2)[, b := lapply(a, func1)][]
#    a b
# 1: 1 2
# 2: 2 4
str(data.table(a=1:2)[, b := lapply(a, func1)])
# Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
#  $ a: int  1 2
#  $ b:List of 2
#   ..$ : num 2
#   ..$ : num 4
#  - attr(*, ".internal.selfref")=<externalptr> 
str(data.table(a=1:2)[, b := sapply(a, func1)])
# Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
#  $ a: int  1 2
#  $ b: num  2 4
#  - attr(*, ".internal.selfref")=<externalptr> 
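
Applied to a case like the one in the question, the same pattern might look like the sketch below. It is only an illustration under stated assumptions: make_feed, feeds.dt, get_durations, and p75 are hypothetical names, and the feeds are literal XML strings built in memory so that the code runs without network access. With real feeds, the url column would take the place of feed.

```r
library(data.table)
library(xml2)

# Hypothetical stand-in for real feeds: build a small RSS document as a
# literal XML string (read_xml accepts literal XML as well as a URL).
make_feed <- function(durations) {
  items <- paste0("<item><itunes:duration>", durations,
                  "</itunes:duration></item>", collapse = "")
  paste0('<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">',
         "<channel>", items, "</channel></rss>")
}

feeds.dt <- data.table(
  podcast = c("A", "B", "C"),
  feed    = c(make_feed(c(1200, 1500, 1800)),
              make_feed(c(600, 900)),
              make_feed(300))
)

# Works on exactly one feed, like the single-URL calls in the question.
get_durations <- function(x) {
  doc <- read_xml(x)
  xml_text(xml_find_all(doc, "/rss/channel/item/itunes:duration"))
}

# lapply calls get_durations once per row; the result is a list column
# holding one character vector of durations per podcast.
feeds.dt[, times := lapply(feed, get_durations)]

# Each row yields a single number, so vapply gives a plain numeric column.
feeds.dt[, p75 := vapply(times,
                         function(d) quantile(as.numeric(d), 0.75, names = FALSE),
                         numeric(1))]
```

The point of the design is that read_xml only ever sees one string at a time, while data.table still receives a full-length column back.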

Note that the lapply method looks like it generates a "simple column", but lapply always returns a list; it just happens to print the way one would expect. If you know that your function will always return a "scalar" (which in R is actually a vector of length 1), then you can use sapply or perhaps vapply(a, func1, numeric(1)).
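
One practical difference between the two is easiest to see on empty input. The sketch below reuses func1 from the answer; the other variable names are illustrative only. sapply falls back to an empty list when there is nothing to simplify, whereas vapply always returns the declared type.

```r
func1 <- function(x) { stopifnot(length(x) == 1L); 2 * x }

a <- 1:3
out_s <- sapply(a, func1)              # simplified to a numeric vector
out_v <- vapply(a, func1, numeric(1))  # numeric vector, checked per element

# On zero-length input the two disagree about the return type.
empty_s <- sapply(integer(0), func1)              # an empty list
empty_v <- vapply(integer(0), func1, numeric(1))  # an empty numeric vector

str(empty_s)
str(empty_v)
```

vapply also raises an error if any call returns something other than a length-1 numeric, which makes it the stricter option when the column type matters.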
