Accessing element of a split string in R

Question

If I have a string,

x <- "Hello World"

How can I access the second word, "World", after splitting the string with

x <- strsplit(x, " ")

x[[2]] does not do anything.

Solution

As mentioned in the comments, it's important to realise that strsplit returns a list object. Since your example is only splitting a single item (a vector of length 1) your list is length 1. I'll explain with a slightly different example, inputting a vector of length 3 (3 text items to split):

input <- c( "Hello world", "Hi there", "Back at ya" )

x <- strsplit( input, " " )

> x
[[1]]
[1] "Hello" "world"

[[2]]
[1] "Hi"    "there"

[[3]]
[1] "Back" "at"   "ya"  

Notice that the returned list has 3 elements, one for each element of the input vector. Each of those list elements holds the pieces produced by the strsplit call. So we can extract any of these list elements using [[ (this is what your x[[2]] call was doing, but your list had only one element, so there was no second element to return):

> x[[1]]
[1] "Hello" "world"

> x[[3]]
[1] "Back" "at"   "ya" 

Now we can get the second part of any of those list elements by appending a [ call:

> x[[1]][2]
[1] "world"

> x[[3]][2]
[1] "at"
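Applied to the single string from the question, the same two-step indexing can be sketched as follows. The variable names parts and second are only illustrative and do not appear in the original post:

```r
x <- "Hello World"

# strsplit always returns a list; a length-1 input gives a length-1 list
parts <- strsplit(x, " ")
length(parts)

# First [[1]] picks the only list element, then [2] picks its second word
second <- parts[[1]][2]
second
```

Here [[1]] selects the character vector of words and [2] selects the word itself, which is why x[[2]] on its own cannot work for a single input string.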

This returns the second item from the chosen list element (note that the "Back at ya" input has returned "at" in this case). You can do this for all list elements at once using a function from the apply family. sapply will return a vector, which is convenient in this case:

> sapply( x, "[", 2 )
[1] "world" "there" "at"

The last argument here (2) is passed on to the [ operator, meaning the operation x[2] is applied to every list element.
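To make the argument passing more explicit, here is a minimal sketch of some equivalent spellings. The anonymous functions and the names second_a, second_b, second_c and short are illustrative assumptions, not part of the original answer:

```r
input <- c("Hello world", "Hi there", "Back at ya")
x <- strsplit(input, " ")

# The form used above: "[" is the function, 2 is its extra argument
second_a <- sapply(x, "[", 2)

# The same operation written out as an anonymous function
second_b <- sapply(x, function(words) words[2])

# vapply does the same but checks that each result is one character string
second_c <- vapply(x, function(words) words[2], character(1))

# An element with fewer than two words yields NA rather than an error
short <- sapply(strsplit(c("Hello world", "Hi"), " "), "[", 2)
```

vapply may be preferable in scripts because it fails loudly if a result does not have the declared type and length, whereas sapply silently changes its return shape.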

If instead of the second item, you'd like the last item of each list element, we can use tail within the sapply call instead of [:

> sapply( x, tail, 1 )
[1] "world" "there" "ya"

This time, we've applied tail( x, 1 ) to every list element, giving us the last item.

As a preference, my favourite way to apply actions like these is with the magrittr pipe, for the second word like so:

library(magrittr)  # provides the %>% pipe

x <- input %>%
    strsplit( " " ) %>%
    sapply( "[", 2 )

> x
[1] "world" "there" "at"

Or for the last word:

x <- input %>%
    strsplit( " " ) %>%
    sapply( tail, 1 )

> x
[1] "world" "there" "ya" 
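If you would rather not load a package, the same chains can probably be written with the native pipe |>. This is a sketch that assumes R 4.1.0 or later; the names second and last are illustrative:

```r
input <- c("Hello world", "Hi there", "Back at ya")

# Second word of each string, using the native pipe (assumes R >= 4.1.0)
second <- input |>
    strsplit(" ") |>
    sapply("[", 2)

# Last word of each string
last <- input |>
    strsplit(" ") |>
    sapply(tail, 1)
```

The native pipe passes the left-hand side as the first argument of the call on the right, so each step reads the same way as the magrittr version.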
