MongoDB - Extract Data from Regex


Problem description

I have an assignment where I need to retrieve data from some Twitter posts using MongoDB, and I have been stuck on one problem for a few hours now. I need to extract the mentioned user (on Twitter you write @TheirUsername to mention someone), and I am having a hard time doing so. I've tried using $substrCP with the index of where the "@" begins, but I can't figure out how to find where the username ends, because names have different lengths and any character can follow the name, such as "?" or ".".

So I have been using the regex pattern /@\w+/ to find out whether a tweet contains an @ symbol followed by a word. This works really well for detecting that a tweet has an @Someone in it, but I still cannot figure out how to "extract" it.

(By the way, I've been using aggregate to do this, so that I can pipe it through $match, then $project, and finally $sort.)

It looks like this:

https://hastebin.com/adohogedil.bash

An example of a string from which the username needs to be extracted:
"damnnn! @white_cat22 i missed 11:11"

From this I only want the "@white_cat22" part.

After googling a bit, I think a better way to describe it is this: I need to retrieve the part of the tested string that matched the regex pattern.

What can I do to extract the mentioned username? Any help would be greatly appreciated! (edited)

Recommended answer

It's a little bit tricky: you have to use the $split and $unwind operators and then $match on the @ pattern, as below:

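db.tweets.aggregate([
    {
        // keep only tweets that contain at least one @mention
        $match: { tweet: /@\w+/ }
    },
    {
        // break each tweet into an array of space-separated words
        $project: { tweet: { $split: ["$tweet", " "] } }
    },
    {
        // emit one document per word
        $unwind: "$tweet"
    },
    {
        // keep only the words that contain an @mention
        $match: { tweet: /@\w+/ }
    }
])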

It produces the following result, which is close to what you asked for:

{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "@white_cat22" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "@white_cat23" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "@cat23" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "@KP" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "@kpTesting" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "@kpTesting12" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12" }
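
To see why the result is only close to the requirement, the pipeline stages can be mimicked in plain JavaScript, without a database. This is a rough sketch, not MongoDB code: `extractMentionTokens` is a hypothetical helper, and the third and fourth sample tweets are made up for illustration.

```javascript
// Plain-JavaScript stand-in for the $match / $split / $unwind / $match stages.
// `extractMentionTokens` is a hypothetical helper, not a MongoDB API.
function extractMentionTokens(tweets) {
  const mention = /@\w+/;
  return tweets
    .filter((tweet) => mention.test(tweet))  // first $match: tweets with a mention
    .flatMap((tweet) => tweet.split(" "))    // $split + $unwind: one entry per word
    .filter((word) => mention.test(word));   // second $match: words with a mention
}

const tokens = extractMentionTokens([
  "damnnn! @white_cat22 i missed 11:11",
  "@kpTesting12 i missed 11:11",
  "are you there @KP?",   // illustrative sample, not from the original collection
  "no mention here",      // illustrative sample, not from the original collection
]);
console.log(tokens);
```

Because the second filter keeps whole space-separated words, a mention followed directly by punctuation comes back with that punctuation attached (here "@KP?"), which is the case the question was worried about.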

For reference, a simple find query on the collection used above returns:

> db.tweets.find()
{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "damnnn! @white_cat22 i missed 11:11" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "damnnn! @white_cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "damnnn! @cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "damnnn! @KP i missed 11:11" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "damnnn! @kpTesting i missed 11:11" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "damnnn! @kpTesting12 i missed 11:11" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12 i missed 11:11" }
>

The collection also includes a tweet where the username (the @ word) comes first. The pipeline handles that case, and it also works when the username is the last word of the tweet.

This might be helpful, but you can always optimize this query. I am posting it only to illustrate the idea; it is not an optimized solution to your requirement.
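
One possible optimization, assuming MongoDB 4.2 or later, is the $regexFind aggregation operator, which returns the matched substring itself instead of the whole word. The sketch below only builds the pipeline; the field names `found` and `mention` are chosen here for illustration and do not come from the original question or answer.

```javascript
// Assumes MongoDB 4.2+, where the $regexFind aggregation operator is available.
// The field names `found` and `mention` are illustrative choices.
const pipeline = [
  // keep only tweets that contain at least one @mention
  { $match: { tweet: /@\w+/ } },
  // $regexFind yields a document with `match`, `idx` and `captures` (or null)
  { $project: { found: { $regexFind: { input: "$tweet", regex: /@\w+/ } } } },
  // keep just the matched text, e.g. "@white_cat22"
  { $project: { mention: "$found.match" } },
];

// In the mongo shell this would be run as:
//   db.tweets.aggregate(pipeline)
```

$regexFind reports only the first match in each tweet. For tweets that mention several users, $regexFindAll (also 4.2+) returns an array of all matches, which could then be unwound.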

For more details, please check the references below:

$split (aggregation)

$unwind (aggregation)
