Hashtag Extract function in R Programming

Views: 128 | Published: 2018/4/17 10:17:05 | Tags: r, function, if-statement, hashtag


Question

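I am trying to create a hashtag extraction function in R. The function should extract the hashtags from a post if there are any, and otherwise return a blank. My function looks like this: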

  hashtag_extract= function(text){
  match = str_extract_all(text,"#\\S+")
  if (match) {
  return match
  }else{
  return ''}}
  String="#letsdoit #Tonewbeginnign world is on a new#route"

But my function is not working and shows me tons of errors. The first error is:

  Error: unexpected symbol in:
  " if (match) {
   return match"

I want to call it as

  hashtag_extract(String)

and the answer should look like

  #letsdoit ##Tonewbeginnign #route

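Eventually I will use sapply to apply this function to a whole column, which is why the if part is important. Please ignore my indentation, since it is not significant in R, but every suggestion will be helpful.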

Solution

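Hashtag regexes aren't that simple.

I'm not sure you understand the commonly accepted "rules" for hashtags.
I do not believe str_extract_all() is returning what you think it is.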

Just use stringi, which the stringr functions are built on top of.
Folks really need to stop analyzing tweets.

This should handle most, if not all, cases:

  library(magrittr)  # provides the %>% pipe used below

  get_tags <- function(x) {
    # via http://stackoverflow.com/a/5768660/1457051
    twitter_hashtag_regex <- "(^|[^&\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\\p{L}\\p{M}][\\p{L}\\p{M}\\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)"
    # Column 4 of each match matrix is the third capture group:
    # the tag text without the leading "#".
    stringi::stri_match_all_regex(x, twitter_hashtag_regex) %>%
      purrr::map(~ .[, 4]) %>%
      purrr::flatten_chr()
  }

  tests <- c("#teste_teste //underscore accepted",
             "#teste-teste //Hyphen not accepted",
             "#leof_gfg.sdfsd //dot not accepted",
             "#f34234@45#6fgh6 // @ not accepted",
             "#leo#leo2#asd //followed hastag without space ",
             "#6663 // only number accepted",
             "_#asd_ // hashtag can't start or finish with underscore",
             "-#sdfsdf- // hashtag can't start or finish with hyphen",
             ".#sdfsdf. // hashtag can't start or finish with dot",
             "#leo_leo__leo__leo____leo // decline followed underline")

  get_tags(tests)
  ## [1] "teste_teste"              "teste"
  ## [3] "leof_gfg"                 "f34234"
  ## [5] "leo"                      NA
  ## [7] NA                         "sdfsdf"
  ## [9] "sdfsdf"                   "leo_leo__leo__leo____leo"

  your_string <- "#letsdoit #Tonewbeginnign world is on a new#route"

  get_tags(your_string)
  ## [1] "letsdoit"       "Tonewbeginnign"

You'll need to tweak the function if you need each set of hashtags to be grouped with each input vector, but you really didn't provide much detail on what you're really trying to accomplish.
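For the grouped case the answer mentions, one possible shape is sketched below. It is not part of the original answer. The name hashtag_extract is reused from the question, while the simple pattern "#\\w+" and the space-joined output are assumptions chosen to resemble the output the asker described. The pattern is much looser than the Twitter-style regex above and will, for example, pick up "#route" inside "new#route". The sketch also sidesteps the two problems in the question's code: in R, return is a function and needs parentheses, as in return(match), and a list of matches cannot be used directly as an if condition.

```r
library(stringi)

# Hypothetical helper (not from the original answer): returns one string per
# input element, with that element's hashtags joined by spaces, or "" if none.
hashtag_extract <- function(text) {
  # Returns a list with one character vector per element of `text`.
  # omit_no_match = TRUE gives character(0) rather than NA when nothing matches.
  # The pattern "#\\w+" is an assumed, deliberately simple hashtag pattern.
  matches <- stri_extract_all_regex(text, "#\\w+", omit_no_match = TRUE)
  # Collapse each vector into a single string; an empty vector collapses to "",
  # so no explicit if/else is needed for the "no hashtags" case.
  vapply(matches, paste, character(1), collapse = " ")
}

posts <- c("#letsdoit #Tonewbeginnign world is on a new#route",
           "no tags here")
hashtag_extract(posts)
```

Because the function is already vectorised over its input, it can be applied to a whole data frame column directly, without sapply. Swapping the simple pattern for a stricter regex would change which tags are picked up.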