String parsing

Problem description
The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but am not sure how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. Would
appreciate it if one of you string parsing whizzes would take a stab
at it.

Thanks,

jh

<input type="hidden" name="RFP" value="-1"/>
<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" ?>
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center"
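Since .find already locates 'LastUpdated', the rest can be done with .find and slicing alone. A minimal sketch, assuming that value="..." follows the name inside the same tag (as in the excerpt above); the helper name last_updated and the shortened page string are illustrative, not from the thread:

```python
# Illustrative excerpt standing in for the full ~20000-character page.
page = '''<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>'''


def last_updated(text):
    """Return the LastUpdated value as an int using only str.find and slicing."""
    start = text.find('LastUpdated')
    if start == -1:
        raise ValueError('LastUpdated not found')
    # Assumes value="..." follows the name in the same tag, as in the sample.
    marker = 'value="'
    value_start = text.find(marker, start)
    if value_start == -1:
        raise ValueError('no value attribute after LastUpdated')
    value_start += len(marker)
    value_end = text.find('"', value_start)
    if value_end == -1:
        raise ValueError('unterminated value attribute')
    return int(text[value_start:value_end])


print(last_updated(page))  # 1178658863
```

The second argument to str.find starts the search at the given index, which is what keeps the lookup for value=" from matching an earlier tag such as ServiceIndex.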

Solution

On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:

You really should use an HTML parser here. But assuming that the page will
not change its structure much, you could use a regular expression like
this:

import re
expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
number = expr.search(text).group(1)
(Handling of "not found" and "duplicate" cases is left as an exercise for
the reader)
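One way to do that exercise, reusing the pattern above: collect every match with findall and insist on exactly one. This is a sketch, not Gabriel's code; the name find_last_updated is hypothetical. Note that the regex knows nothing about HTML comments, so a commented-out copy of the tag would also be counted.

```python
import re

# Gabriel's pattern; IGNORECASE also accepts spellings such as "LASTUPDATED".
LAST_UPDATED = re.compile(
    r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"', re.IGNORECASE)


def find_last_updated(text):
    """Return the single LastUpdated value, raising if it is missing or repeated."""
    # With one capturing group, findall() returns a list of the captured strings.
    matches = LAST_UPDATED.findall(text)
    if not matches:
        raise ValueError('LastUpdated not found')
    if len(matches) > 1:
        raise ValueError('LastUpdated found %d times' % len(matches))
    return int(matches[0])


sample = '<input type="hidden" name="LastUpdated" value="1178658863"/>'
print(find_last_updated(sample))  # 1178658863
```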

Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is
as valid as your HTML, but won't match the expression.

--
Gabriel Genellina
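The attribute-order problem from the note above can be worked around while still using regular expressions: match each <input ...> tag as a whole, then read its attributes into a dict. A sketch only, assuming double-quoted attribute values with no '>' inside them; the names input_value, INPUT_TAG and ATTRIBUTE are hypothetical. Like any regex approach it would also pick up commented-out tags, such as the johnxxxx EnteredBy line in the question.

```python
import re

# Match each <input ...> tag as a whole (assumes no '>' inside attribute values),
# then read its attributes in whatever order they appear.
INPUT_TAG = re.compile(r'<input\b[^>]*>', re.IGNORECASE)
ATTRIBUTE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def input_value(text, name):
    """Return the value of the first <input> whose name matches, else None."""
    for tag in INPUT_TAG.findall(text):
        attrs = {key.lower(): val for key, val in ATTRIBUTE.findall(tag)}
        if attrs.get('name', '').lower() == name.lower():
            return attrs.get('value')
    return None


reordered = '<input value="1178658863" type="hidden" name="LastUpdated" />'
print(input_value(reordered, 'LastUpdated'))  # 1178658863
```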


On 8 May 2007 18:09:52 -0700, HMS Surprise <jo**@datavoiceint.com> wrote:

Does this help?

In [7]: s = '<input type="hidden" name="LastUpdated" value="1178658863"/>'

In [8]: int(s.split("=")[-1].split('"')[1])
Out[8]: 1178658863

There's probably a hundred different ways of doing this, but this is
the first that came to mind.

Cheers,

Tim
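Tim's expression works on a single line, so applied to the whole ~20000-character page it first needs that line picked out. A minimal sketch, assuming the LastUpdated tag sits on one line and value is its last attribute; the shortened page string is illustrative:

```python
# Illustrative excerpt; in practice this would be the whole page string.
page = '''<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>'''

# Keep only the one line that mentions LastUpdated (the question says it occurs
# exactly once); next() raises StopIteration if no line matches.
line = next(candidate for candidate in page.splitlines()
            if 'LastUpdated' in candidate)

# Tim's expression: the text after the last '=' is '"1178658863"/>', and
# splitting that on '"' leaves the number at index 1.
number = int(line.split("=")[-1].split('"')[1])
print(number)  # 1178658863
```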

--
http://mail.python.org/mailman/listinfo/python-list


Thanks for posting. Could you recommend an HTML parser that can be
used with Python or Jython?
john
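One parser that needs no third-party install is the standard library's html.parser module (Python 3; in the Python 2 of this thread the same class lived in a module named HTMLParser). A sketch that collects every hidden input into a dict; the class name HiddenInputCollector and the sample page are illustrative, and Jython compatibility is not checked here. Unlike the regex approaches, the parser routes <!-- ... --> to handle_comment, so the commented-out EnteredBy tag is skipped.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name -> value for every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of (name, value).
        if tag == 'input':
            attr = dict(attrs)
            if attr.get('type') == 'hidden' and 'name' in attr:
                self.hidden[attr['name']] = attr.get('value')

    # Self-closing tags like <input ... /> go to handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


page = '''<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>'''

parser = HiddenInputCollector()
parser.feed(page)
parser.close()
print(parser.hidden['LastUpdated'])  # 1178658863
print(parser.hidden['EnteredBy'])    # john
```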

