How can I crawl web data that is not in tags?

Published: 2018/6/25 14:33:21. Tags: python, html, beautifulsoup, web-crawler, python-requests

Question

<div id="main-content" class="content">
<div class="metaline">
<span class="article-meta author">jorden</span>
</div>
 "
 1.name:jorden> 
 2.age:28

  --
 "
 <span class="D2"> from 111.111.111.111 </span>
  </div>

I only need:

1.name:jorden
2.age:28

xxx.select('#main-content') returns everything inside the div, but I only need part of it. Because that text is not inside any tag, I don't know how to extract it.

Answer

You want to find the tag before the text in question (in your case, <div class="metaline">) and then look at the next sibling in the HTML parse tree:

text = soup.find("div", class_='metaline').next_sibling
print(text)
# "
#  1.name:jorden>
#  2.age:28
#
#   --
#  "
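A self-contained version of the answer's snippet, as a sketch. It assumes Beautiful Soup 4 with the built-in "html.parser", and it assumes the quotation marks around the loose text in the question are how a browser's inspector displays a text node, so they are left out of the HTML string below.

```python
from bs4 import BeautifulSoup

# Assumption: the question's markup, minus the inspector's quote marks
# around the loose text node (they are not part of the real HTML).
html = """
<div id="main-content" class="content">
<div class="metaline">
<span class="article-meta author">jorden</span>
</div>
 1.name:jorden>
 2.age:28

  --
 <span class="D2"> from 111.111.111.111 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the tag that comes right before the untagged text...
metaline = soup.find("div", class_="metaline")

# ...and take the node that follows it in the parse tree: here, the
# text node holding the name/age lines (a NavigableString, a str subclass).
text = metaline.next_sibling
print(repr(text))
```

Note that next_sibling returns the very next node, whatever it is; in markup where a whitespace-only string or a comment sits between the tag and the text you want, you may need to step further along.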

Once you have the raw text, strip the surrounding whitespace and clean it up as needed.
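One way to do that cleanup, sketched under a few assumptions that are not in the original: every wanted line has the form "<number>.<key>:<value>", the "--" line marks the end of the useful text, and the stray ">" after the name should be dropped to match the output the question asks for. The helper name extract_field_lines is hypothetical.

```python
import re

# Assumed shape of a wanted line, e.g. "1.name:jorden" or "2.age:28".
FIELD_LINE = re.compile(r"^\d+\.\w+:")


def extract_field_lines(raw: str) -> list[str]:
    """Hypothetical helper: keep only the numbered key:value lines."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()  # remove indentation and trailing spaces
        if line == "--":
            break  # assumed separator: nothing useful after it
        if FIELD_LINE.match(line):
            # Drop the trailing ">" seen after the name in the question.
            lines.append(line.rstrip(">"))
    return lines


raw = "\n 1.name:jorden>\n 2.age:28\n\n  --\n "
print(extract_field_lines(raw))  # ['1.name:jorden', '2.age:28']
```

Passing the text obtained from next_sibling as raw should yield the two lines the question wants; a different page layout would need a different pattern.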
