Extract string from HTML String


Problem description

I want to extract a number from an HTML string (I usually do not know the number).

The crucial part looks like this:

<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>

I want to extract the "286". I want to do something like "start after 'L :' and stop before '<'". How can I do this? Thank you very much in advance.

Recommended answer

If the string "TOTAL : number" is unique, use a regular expression to find that substring first, then extract the number from it.

import re

string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

# Step 1: find the substring TOTAL<whitespace>:<whitespace><number>
reg_expr = r'TOTAL\s:\s\d+'
result = re.findall(reg_expr, string)
if result:
    substring = result[0]

    # Step 2: extract the <number> from that substring
    reg_expr = r'\d+'
    result = re.findall(reg_expr, substring)
    number = int(result[0])

    print(number)
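
The two steps can also be folded into one search with a capture group, which is closer to the "start after ':' and stop before '<'" idea in the question. The following is only a minimal sketch: the helper name `extract_total` is hypothetical, and using `\s*` instead of `\s` assumes you want to accept any amount of whitespace (including none) around the colon, which is looser than the pattern above.

```python
import re
from typing import Optional


def extract_total(html: str) -> Optional[int]:
    """Return the number that follows 'TOTAL :' or None if it is not found.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # The parentheses capture only the digits, so no second regex is needed.
    # \s* tolerates zero or more whitespace characters around the colon.
    match = re.search(r'TOTAL\s*:\s*(\d+)', html)
    return int(match.group(1)) if match else None


html = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'
print(extract_total(html))
```

Returning `None` when nothing matches lets the caller decide how to handle a page without a "TOTAL" entry, rather than failing on an empty result list.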

You can test your own regular expressions at https://regex101.com/
