String parsing
Problem description
The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but I am not sure how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. I would
appreciate it if one of you string-parsing whizzes would take a stab
at it.
Thanks,
jh
<input type="hidden" name="RFP" value="-1"/>
<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" ?>
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center"
On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:
You really should use an HTML parser here. But assuming that the page's
structure will not change much, you could use a regular expression like
this:
import re

# Capture the quoted value that follows name="LastUpdated" (case-insensitive).
expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
number = expr.search(text).group(1)
(Handling of "not found" and "duplicate" cases is left as an exercise for
the reader)
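A minimal sketch of that exercise, assuming the same pattern and a page held in one string. The helper name find_last_updated is hypothetical, and raising ValueError for both cases is just one possible choice:

```python
import re

# Same pattern as in the reply; the constant and function names are
# illustrative assumptions, not part of the original post.
LAST_UPDATED = re.compile(
    r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"', re.IGNORECASE)


def find_last_updated(text):
    """Return the LastUpdated value as an int, or raise ValueError."""
    values = LAST_UPDATED.findall(text)  # one captured string per match
    if not values:
        raise ValueError("LastUpdated field not found")
    if len(values) > 1:
        raise ValueError("LastUpdated field occurs %d times" % len(values))
    return int(values[0])
```

Because the pattern works on raw text, it would also match a field that sits inside an HTML comment, so a commented-out duplicate would be reported as a duplicate.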
Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is
as valid as your HTML, but won't match the expression.
--
Gabriel Genellina
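To illustrate the HTML-parser route, here is a minimal sketch using the standard library's html.parser module. It is not taken from the thread: the class name HiddenFieldParser and the function last_updated are hypothetical, and the sketch targets Python 3 (in Python 2 the class lived in the HTMLParser module, and the code would need small adjustments there). Unlike the regular expression, it does not care about attribute order, and it skips tags that are commented out.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the value attribute of <input> tags with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        if (attributes.get("name") or "").lower() == self.field_name:
            self.values.append(attributes.get("value"))


def last_updated(html_text):
    """Return the single LastUpdated value as an int, or raise ValueError."""
    parser = HiddenFieldParser("LastUpdated")
    parser.feed(html_text)
    parser.close()
    if len(parser.values) != 1:
        raise ValueError(
            "expected exactly one LastUpdated field, found %d"
            % len(parser.values))
    return int(parser.values[0])
```

Calling last_updated(text) on the page text would then give the number directly, with the name compared case-insensitively as in the regular expression.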
On 8 May 2007 18:09:52 -0700, HMS Surprise <jo**@datavoiceint.com> wrote:
Does this help?
In [7]: s = '<input type="hidden" name="LastUpdated" value="1178658863"/>'
In [8]: int(s.split("=")[-1].split('"')[1])
Out[8]: 1178658863
There are probably a hundred different ways of doing this, but this is
the first that came to mind.
Cheers,
Tim
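The split one-liner works on a string that holds only the LastUpdated line. For the full page, a sketch along the lines of the .find approach mentioned in the question could look like the following. The function name value_after is hypothetical, and the sketch assumes the value attribute comes after the name attribute, is double-quoted, and that the field name is spelled with exactly this capitalization:

```python
def value_after(text, field_name="LastUpdated"):
    """Return the quoted value attribute that follows field_name."""
    start = text.find(field_name)
    if start == -1:
        raise ValueError("%s not found" % field_name)

    marker = 'value="'
    value_start = text.find(marker, start)  # search only after the name
    if value_start == -1:
        raise ValueError("no value attribute after %s" % field_name)
    value_start += len(marker)

    value_end = text.find('"', value_start)  # closing quote of the value
    if value_end == -1:
        raise ValueError("unterminated value after %s" % field_name)
    return text[value_start:value_end]
```

The result is a string, so int(value_after(text)) would be needed to get a number.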
--
http://mail.python.org/mailman/listinfo/python-list
Thanks for posting. Could you recommend an HTML parser that can be
used with Python or Jython?
john