How do I loop a re.search for the next data?

I have two sets of data that I crawled from an HTML table using regular expressions.

data:

 <div class = "info"> 
   <div class="name"><td>random</td></div>
   <div class="hp"><td>123456</td></div>
   <div class="email"><td>random@mail.com</td></div> 
 </div>

 <div class = "info"> 
   <div class="name"><td>random123</td></div>
   <div class="hp"><td>654321</td></div>
   <div class="email"><td>random123@mail.com</td></div> 
 </div>

regex:

import re

matchname = re.search(r'<div class="name"><td>(.*?)</td>', match3).group(1)
matchhp = re.search(r'<div class="hp"><td>(.*?)</td>', match3).group(1)
matchemail = re.search(r'<div class="email"><td>(.*?)</td>', match3).group(1)

So, using the regex, I can extract:

random

123456

random@mail.com

So after saving this set of data into my database, I want to save the next set. How do I get the next set of data? I tried using findall and then inserting into my DB, but everything ended up in one line. I need the data to go into the DB set by set.

I'm new to Python; please comment on which part is unclear and I will try to edit.

Solution

You should not be parsing HTML with regex. It's just a mess; do it with BS4 (BeautifulSoup) instead. Doing it the right way:

from bs4 import BeautifulSoup

soup = BeautifulSoup(match3, "html.parser")
names = []
allTds = soup.find_all("td")
# Every three consecutive <td> cells form one set: name, hp, email
for i, item in enumerate(allTds[::3]):
    #            name        hp                       email
    names.append((item.text, allTds[(i*3)+1].text, allTds[(i*3)+2].text))
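The `names` list above already holds one (name, hp, email) tuple per set, which is the shape needed to store the data set by set. The question does not say which database is used, so this sketch assumes Python's built-in `sqlite3` with an in-memory database; the table name `contacts` and its columns are hypothetical. `executemany` runs the parameterized INSERT once per tuple, so each set lands in its own row instead of everything going into one.

```python
import sqlite3

# Stand-in for the `names` list built above: one (name, hp, email)
# tuple per <div class = "info"> set, using the question's sample values.
names = [
    ("random", "123456", "random@mail.com"),
    ("random123", "654321", "random123@mail.com"),
]

# In-memory SQLite database for the sketch; the table name "contacts"
# and its column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, hp TEXT, email TEXT)")

# executemany runs the parameterized INSERT once per tuple,
# so each set becomes its own row.
conn.executemany(
    "INSERT INTO contacts (name, hp, email) VALUES (?, ?, ?)", names
)
conn.commit()

# Read the rows back in insertion order
for row in conn.execute("SELECT name, hp, email FROM contacts ORDER BY rowid"):
    print(row)
```

Using `?` placeholders rather than building the SQL string by hand also keeps scraped text from breaking the statement.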

And for the sake of answering the question as asked, I guess I'll include a horribly ugly regex that you should never use. ESPECIALLY because it's HTML: don't ever use regex for parsing HTML. (Please don't use this.)

import re

for thisMatch in re.findall(r"<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>", match3, re.DOTALL):
    print(thisMatch[0], thisMatch[1], thisMatch[2])
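For the literal question of looping a `re.search`: `re.finditer` is its looping counterpart, yielding one match object per non-overlapping occurrence, so each iteration is one set. This sketch anchors the three groups on the class names from the question's sample markup; that is an assumption, and the pattern breaks as soon as the HTML changes (attribute order, spacing, quoting), which is exactly why the answer above recommends BS4. The helper name `extract_records` is hypothetical.

```python
import re

# Sample markup copied from the question
html = """
 <div class = "info">
   <div class="name"><td>random</td></div>
   <div class="hp"><td>123456</td></div>
   <div class="email"><td>random@mail.com</td></div>
 </div>

 <div class = "info">
   <div class="name"><td>random123</td></div>
   <div class="hp"><td>654321</td></div>
   <div class="email"><td>random123@mail.com</td></div>
 </div>
"""

# One match per set: name, then hp, then email, with anything in between.
# re.DOTALL lets .*? cross the newlines between the inner divs.
RECORD = re.compile(
    r'<div class="name"><td>(.*?)</td>'
    r'.*?<div class="hp"><td>(.*?)</td>'
    r'.*?<div class="email"><td>(.*?)</td>',
    re.DOTALL,
)

def extract_records(markup):
    """Return one (name, hp, email) tuple per set, in document order."""
    return [m.groups() for m in RECORD.finditer(markup)]

for name, hp, email in extract_records(html):
    print(name, hp, email)
```

The returned list of tuples can be passed directly to a row-per-set insert such as `sqlite3`'s `executemany`.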
