Parsing / Extracting Text from a String in Rails?

Question

I have a string in Rails, e.g. "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!", and I want to parse the text and extract only certain phrases, like "War & Peace by Leo Tolstoy".

Is this a matter of using Regex and lifting the text between "#books" to "."?

What if there's no structure to the message, like "This is a Twitter message #books War & Peace by Leo Tolstoy I love this book!" or "This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books"? How can I reliably pull out the phrase "War & Peace by Leo Tolstoy" without knowing the phrase in advance?

Are there any gems, methods, etc. that can help me do this?

At the very least, what would you call what I'm trying to do? It will help me search for a solution on Google. I've tried a few searches on "parsing" with no luck.

--- edit --- Based on @rogeliog's suggestion, I will add the following:

I can live with the garbage text that comes after #books, but not with anything before it. I tried match(/#books.*/); the results are here: www.rubular.com/r/gM7oSZxF5M.
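
A quick check of what match(/#books.*/) returns for the sample messages suggests why the trailing-tag case fails: .* only reaches forward from #books, so when the tag is the last word there is nothing after it to capture. A minimal Ruby sketch (the messages array simply holds the question's own examples):

```ruby
# The three sample messages from the question.
messages = [
  "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!",
  "This is a Twitter message #books War & Peace by Leo Tolstoy I love this book!",
  "This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books"
]

messages.each do |msg|
  # match returns a MatchData (or nil); [0] is the whole matched text.
  p msg.match(/#books.*/)[0]
end
```

For the first two messages the match runs from #books to the end of the string; for the last one it is just "#books".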

But how can I capture Result #6 (e.g., when someone puts #books at the end of the sentence)?

Is there a way for me to do an if-then with regex? Something like:

if [#books is at the end of the message],
then [take the last 10 words preceding #books],
else [match(/#books.*/)]

If you offer a regex, please post your solution via a permalink using rubular.com

Answer

I think you are trying to parse some pretty complex variations. Do you have a DB with all the book titles? That would help a lot.
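
If such a list exists, a lookup sidesteps the question of where #books sits in the message. A minimal sketch, assuming a hypothetical KNOWN_TITLES array; in a Rails app it might plausibly be loaded from a Book model with Book.pluck(:title), but that model is an assumption, not something from the question:

```ruby
# Hypothetical title list; in a Rails app this might come from the database
# (e.g. a Book model via Book.pluck(:title)), which is assumed, not given.
KNOWN_TITLES = ["War & Peace", "Anna Karenina", "Crime and Punishment"].freeze

# Returns every known title that appears in the message (case-insensitive),
# in the order the titles are listed, regardless of where #books sits.
def find_known_titles(text, titles = KNOWN_TITLES)
  haystack = text.downcase
  titles.select { |title| haystack.include?(title.downcase) }
end

tweet = "This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books"
p find_known_titles(tweet) # => ["War & Peace"]
```

Plain substring matching can give false positives for very short titles (a book called "It" would match inside "Twitter"), so a word-boundary regex may be the stricter choice for a real title table.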

To get the title out of the first example ("This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!"), you can simply do:

text = "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!"
text[/#books\s*([^.]*)/, 1] # capture everything after "#books" up to the next period

That will return: "War & Peace by Leo Tolstoy"

If you want to do an if/else depending on whether #books is at the end of the message, you can:

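if text.match?(/#books\s*\z/)
  # #books ends the message: take up to 10 words right before it
  puts text[/((?:\S+\s+){1,10})#books\s*\z/, 1].to_s.strip
else
  # #books is elsewhere: take everything after it
  puts text[/#books\s*(.*)/, 1].to_s.strip
end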

That gives you the last 10 words preceding #books if #books is at the end of the message, and everything after #books if it is not.
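
The same branching can be wrapped in a reusable method and checked against the messages from the question. This is only a sketch: extract_book_phrase and its max_words keyword are hypothetical names, and for the trailing-tag case the word count is still a guess at where the phrase begins:

```ruby
# Hypothetical helper wrapping the if/else idea in one method.
# Returns nil when the message contains no #books tag at all.
def extract_book_phrase(text, max_words: 10)
  return nil unless text.include?("#books")

  if text.match?(/#books\s*\z/)
    # Tag ends the message: keep up to max_words words immediately before it.
    text[/((?:\S+\s+){1,#{max_words}})#books\s*\z/, 1].to_s.strip
  else
    # Tag appears earlier: keep everything after it (trailing text included).
    text[/#books\s*(.*)/, 1].to_s.strip
  end
end

p extract_book_phrase("This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!")
# => "War & Peace by Leo Tolstoy. I love this book!"
p extract_book_phrase("This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books")
# => "I love the book War & Peace by Leo Tolstoy"
```

The first result keeps the text after the title, which the question says is acceptable; using \z rather than $ makes the "at the end" test apply to the whole string instead of to the end of any line.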

I don't really have a better idea; I hope that works for you. Let me know :)