Select rows from data.frame ending with a specific character string in R

Question

I'm using R and I have a data.frame with nearly 2,000 entries that looks as follows:

> head(PVs,15)
     LogFreq   Word PhonCV  FreqDev
1593     140    was    CVC 5.480774
482      139    had    CVC 5.438114
1681     138    zou   CVVC 5.395454
1662     137    zei    CVV 5.352794
1619     136   werd   CVCC 5.310134
1592     135  waren CVV-CV 5.267474
620      134    kon    CVC 5.224814
646      133   kwam   CCVC 5.182154
483      132 hadden CVC-CV 5.139494
436      131   ging    CVC 5.096834
734      130  moest  CVVCC 5.054174
1171     129  stond  CCVCC 5.011514
1654     128    zag    CVC 4.968854
1620     127 werden CVC-CV 4.926194
1683     126 zouden CVV-CV 4.883534

What I want to do is create a new data.frame that is identical to PVs, except that every row whose "Word" value does NOT end in either "te" or "de" is removed. In other words, all words not ending in "de" or "te" should be dropped from the data.frame.

I know how to selectively remove entries from data.frames using logical operators, but I have only done that with numeric criteria. I think I need regular expressions for this, but sadly R is the only programming language I "know", so I'm far from knowing what kind of code to use here.

I appreciate your help. Thanks in advance.

Recommended answer

Method 1

You can use grepl with an appropriate regular expression. Consider the following:

x <- c("blank","wade","waste","rubbish","dedekind","bated")
grepl("^.+(de|te)$",x)
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE

The regular expression says: begin (^) with any character one or more times (.+), then find either de or te ((de|te)), then end ($).
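One consequence of the leading ^.+ is that at least one character must come before the suffix, so a word that consists only of "de" or "te" is not matched. The sketch below compares that pattern with a hypothetical simpler variant, "(de|te)$", which only anchors the end of the string. The sample vector is made up for illustration.

```r
# Made-up sample vector; "de" is included to show the edge case
x <- c("de", "wade", "waste", "bated")

# Pattern from the answer: needs one or more characters before the suffix
strict <- grepl("^.+(de|te)$", x)

# Simpler variant (an assumption, not from the answer): anchor the end only
loose <- grepl("(de|te)$", x)

print(strict)  # FALSE  TRUE  TRUE FALSE
print(loose)   #  TRUE  TRUE  TRUE FALSE
```

Which variant fits depends on whether a bare "de" or "te" should count as a match in your data.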

So for your data.frame, try:

subset(PVs,grepl("^.+(de|te)$",Word))
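Since PVs itself is not available here, the following is a minimal, self-contained sketch of the same filter on a toy data.frame. The name toy and its values are made up for illustration. It uses bracket indexing instead of subset(), which the R documentation describes as a convenience function intended mainly for interactive use.

```r
# Hypothetical toy data standing in for PVs; all values are illustrative
toy <- data.frame(
  LogFreq = c(140, 120, 110, 100),
  Word    = c("was", "maakte", "hoorde", "bated"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where Word ends in "de" or "te"
keep <- grepl("^.+(de|te)$", toy$Word)

# Keep only the matching rows; drop = FALSE guarantees a data.frame result
result <- toy[keep, , drop = FALSE]

print(result$Word)  # "maakte" "hoorde"
```

The original row names are preserved in result, which matches how the row numbers in the head(PVs, 15) output behave.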

Method 2

To avoid regular expressions, you can use substr instead.

# substr the last two characters and test
substr(x,nchar(x)-1,nchar(x)) %in% c("de","te")
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE

So try:

subset(PVs,substr(Word,nchar(Word)-1,nchar(Word)) %in% c("de","te"))
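As a further regex-free option that the answer does not mention, base R (3.3.0 and later) provides endsWith(), which tests for a fixed suffix directly. The sketch below applies it to the same sample vector. Both endsWith() and nchar() expect character input, so if the Word column is stored as a factor it would first need as.character().

```r
# Same sample vector as in the answer
x <- c("blank", "wade", "waste", "rubbish", "dedekind", "bated")

# endsWith() is vectorised over x; combine the two suffix tests with |
ends_de_te <- endsWith(x, "de") | endsWith(x, "te")

print(ends_de_te)     # FALSE  TRUE  TRUE FALSE FALSE FALSE
print(x[ends_de_te])  # "wade" "waste"
```

For the data.frame, the equivalent would be along the lines of PVs[endsWith(PVs$Word, "de") | endsWith(PVs$Word, "te"), ], assuming Word is a character column.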
