Regular expression to extract JSON string from text


Problem description


I'm looking for a regex to extract JSON strings from text. The text below contains JSON objects with the keys mTitle, mPoster, mYear and mDate:

{"999999999":"138138138","020202020202":{
"846":{"mTitle":"\u0430","mPoster":{"small":"\/upload\/ms\/b_248.jpg","middle":"600.jpg","big":"400.jpg"},"mYear":"2013","mDate":"2014-01-01"},
"847":{"mTitle":"\u043a","mPoster":{"small":"\/upload\/ms\/241.jpg","middle":"600.jpg","big":"138.jpg"},"mYear":"2013","mDate":"2013-12-26"},
"848":{"mTitle":"\u041f","mPoster":{"small":"\/upload\/movies\/240.jpg","middle":"138.jpg","big":"131.jpg"},"mYear":"2013","mDate":"2013-12-19"}}}

To parse the JSON, I first need to extract it from the text. Could you help me get only the JSON strings out of the text?

I've tried this regular expression with no success:

{"mTitle":(\w|\W)*"mDate":(\w|\W)*}

Solution

The following regex should work:

\{\s*"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster":\s*(.+?)\s*,\s*"mYear"\s*:\s*(.+?)\s*,\s*"mDate"\s*:\s*(.+?)\s*\}


The main difference from your regex is the .+? part, which, broken down, means:

  • Match any character (.)
  • One or more times (+)
  • As little as possible (?)

The ? after the + is essential here: without it, the first .+ (in \{\s*"mTitle"\s*:\s*(.+)) would be greedy and consume as much text as it can, running on to the last "mPoster" in the line instead of stopping at the first one, which is what you want.
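The greedy versus lazy difference can be seen in a small Python sketch. The sample string is a hypothetical, shortened stand-in for the question's data, and the pattern is only the first fragment of the full regex:

```python
import re

# Hypothetical one-line sample with two records (shortened from the question's data).
text = ('{"mTitle":"a","mPoster":"p1","mYear":"2013"},'
        '{"mTitle":"b","mPoster":"p2","mYear":"2012"}')

# Lazy: .+? expands one character at a time and stops at the first "mPoster".
lazy = re.search(r'"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster"', text)

# Greedy: .+ grabs the rest of the line, then backtracks to the last "mPoster".
greedy = re.search(r'"mTitle"\s*:\s*(.+)\s*,\s*"mPoster"', text)

print(lazy.group(1))    # "a"
print(greedy.group(1))  # "a","mPoster":"p1","mYear":"2013"},{"mTitle":"b"
```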

Notice it is just a more complicated version of \{"mTitle":(.+?),"mPoster":(.+?),"mYear":(.+?),"mDate":(.+?)\} (with \s* to match spaces, allowed by the JSON notation).
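As a rough sketch of how the regex might be used in practice, the Python below applies it to a two-record excerpt of the question's text and parses each match with the standard json module. The helper name extract_records is hypothetical, and joining the lines first assumes the line breaks in the input are wrapping artifacts, since . does not match a newline by default:

```python
import json
import re

# The answer's pattern, unchanged, split over two literals for readability.
PATTERN = re.compile(
    r'\{\s*"mTitle"\s*:\s*(.+?)\s*,\s*"mPoster":\s*(.+?)\s*,'
    r'\s*"mYear"\s*:\s*(.+?)\s*,\s*"mDate"\s*:\s*(.+?)\s*\}'
)

def extract_records(text):
    """Return every matched {"mTitle": ..., "mDate": ...} object as a dict."""
    # Assumption: newlines are wrapping artifacts, so they are dropped before matching.
    flat = text.replace("\n", "")
    return [json.loads(m.group(0)) for m in PATTERN.finditer(flat)]

# Two of the three records from the question, kept raw so the JSON escapes survive.
sample = (
    r'{"999999999":"138138138","020202020202":{'
    r'"846":{"mTitle":"\u0430","mPoster":{"small":"\/upload\/ms\/b_248.jpg",'
    r'"middle":"600.jpg","big":"400.jpg"},"mYear":"2013","mDate":"2014-01-01"},'
    r'"848":{"mTitle":"\u041f","mPoster":{"small":"\/upload\/movies\/240.jpg",'
    r'"middle":"138.jpg","big":"131.jpg"},"mYear":"2013","mDate":"2013-12-19"}}}'
)

records = extract_records(sample)
for record in records:
    print(record["mDate"], record["mPoster"]["small"])
```

This only works while the values themselves never contain the text ,"mYear" or a } before the closing brace of the record. If the whole text is already valid JSON, as the sample appears to be, calling json.loads on it directly is likely the more robust route.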
