MongoDB - Extract Data from Regex

Views: 157 Published: 2021/6/3 20:50:07 Tags: regex mongodb mongodb-query aggregation-framework


Problem description

I have an assignment where I need to retrieve data from some Twitter posts using MongoDB, and I have been stuck on one problem for a few hours now. I need to extract the mentioned user (on Twitter you write @TheirUsername to mention someone) and am having a hard time doing so. I've tried using substrCP and finding the index where the "@" begins, but I can't figure out how to find where the mention ends, because names have different lengths and any character can follow the name, such as "?" or ".".

So I have been using the regex pattern /@\w+/ to find out whether the tweet contains an @ symbol followed by one or more word characters. This works really well for detecting whether the tweet has an @Someone in it, but I still cannot figure out how to "extract" it.

(By the way, I've been using aggregate to do this, so that I can pipe it through $match, then $project, and finally $sort.)

It looks like this:

https://hastebin.com/adohogedil.bash

An example of a string from which I need to extract the username:
"damnnn! @white_cat22 i missed 11:11"

From this I only want the "@white_cat22" part.

After googling a bit, I think a better way to describe it is this: I need to retrieve the substring that the regex pattern matched in the string being tested.

What can I do to extract the mentioned username? Any help would be greatly appreciated! (edited)

Recommended answer

It's a little bit tricky: you have to use the $split and $unwind operators and then $match on @, as below:

db.tweets.aggregate([ 
    {
        $match: { tweet: /@\w+/ }
    }, 
    {
        $project: {tweet: {$split: ["$tweet", " "]}}
    }, 
    {
        $unwind: "$tweet"
    }, 
    {
        $match: { tweet: /@\w+/  }
    } 
])

It produces the following result, which is close to what you asked for:

{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "@white_cat22" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "@white_cat23" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "@cat23" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "@KP" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "@kpTesting" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "@kpTesting12" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12" }

For reference, a simple find query on the collection used above returns:

> db.tweets.find()
{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "damnnn! @white_cat22 i missed 11:11" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "damnnn! @white_cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "damnnn! @cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "damnnn! @KP i missed 11:11" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "damnnn! @kpTesting i missed 11:11" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "damnnn! @kpTesting12 i missed 11:11" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12 i missed 11:11" }
>

The sample data also includes a tweet where the username (the @ word) comes first, and the query works equally well when the username is the last word of the tweet.
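To make the behaviour of the split-based pipeline easier to see, here is a minimal in-memory sketch in plain JavaScript that imitates the $match, $split, $unwind, $match stages on an array. The function name splitUnwindMatch and the second sample tweet are made up for illustration, and no MongoDB connection is involved. It also shows one limitation worth knowing about: because the text is split on spaces only, punctuation that directly follows a username stays attached to it.

```javascript
// In-memory imitation of the $match -> $split -> $unwind -> $match stages.
// `splitUnwindMatch` and the sample documents are hypothetical, for illustration only.
const mentionPattern = /@\w+/;

function splitUnwindMatch(docs) {
  return docs
    // first $match: keep only tweets that contain a mention
    .filter((doc) => mentionPattern.test(doc.tweet))
    // $split on " " followed by $unwind: one output document per word
    .flatMap((doc) =>
      doc.tweet.split(" ").map((word) => ({ _id: doc._id, tweet: word }))
    )
    // second $match: keep only the words that contain a mention
    .filter((doc) => mentionPattern.test(doc.tweet));
}

const tweets = [
  { _id: 1, tweet: "damnnn! @white_cat22 i missed 11:11" },
  { _id: 2, tweet: "did you see it @KP?" },
];

const tokens = splitUnwindMatch(tweets);
console.log(tokens);
```

With this input the second token comes back as "@KP?" rather than "@KP", so a further cleanup step would be needed if tweets can contain punctuation right after a username.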

This may help, but the query can certainly be optimized. I am posting it only to illustrate the idea, not as an optimized solution to your requirement.
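As one possible direction for such an optimization (not part of the original answer), MongoDB 4.2 and later provide the $regexFindAll aggregation operator, which returns the matched substrings directly. The sketch below assumes MongoDB 4.2+, the same tweets collection with a string field tweet, and hypothetical names mentionsPipeline, mentions and extractMentions. The pipeline itself needs a live database, so a plain-JavaScript equivalent is included for checking the regex locally.

```javascript
// Hypothetical pipeline; assumes MongoDB 4.2+ and a `tweets` collection
// whose documents have a string field `tweet`.
const mentionsPipeline = [
  { $match: { tweet: /@\w+/ } },
  {
    $project: {
      // $regexFindAll yields an array of { match, idx, captures } documents;
      // $map keeps only the matched text of each one.
      mentions: {
        $map: {
          input: { $regexFindAll: { input: "$tweet", regex: /@\w+/ } },
          as: "m",
          in: "$$m.match",
        },
      },
    },
  },
];

// In the mongo shell this would be run as: db.tweets.aggregate(mentionsPipeline)

// Plain-JavaScript equivalent of the extraction step, for local checks.
function extractMentions(text) {
  return text.match(/@\w+/g) || [];
}

console.log(extractMentions("did you see it @KP? ask @white_cat22."));
```

Because the regex itself decides where a mention ends, trailing characters such as "?" or "." are not included in the result.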

For more details, see the following references:

$split (aggregation)

$unwind (aggregation)
