Using Python to Scrape Nested Divs and Spans in Twitter?

Views: 272 | Published: 2016/8/5 18:57:52 | Tags: python html twitter web-scraping beautifulsoup


Problem description

I'm trying to scrape the likes and retweets from the results of a Twitter search.

After running the Python code below, I get an empty list, []. I'm not using the Twitter API because its hashtag search doesn't reach back this far.

The code I'm using is:

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)

I can successfully save the HTML to a file using this code. When I search the saved text, large amounts of information are missing, such as the class names I am looking for.

So (part of) the problem is apparently in accurately accessing the source code.

# Save the raw response text so it can be inspected by hand.
filename = 'newfile2.txt'
with open(filename, 'w', encoding='utf-8') as handle:
    handle.write(data)

This screenshot shows the span that I'm trying to scrape.

I've looked at this question, and others like it, but I'm not quite getting there:
How can I use BeautifulSoup to get deeply nested div values? (http://stackoverflow.com/questions/27355051/how-can-i-use-beautifulsoup-to-get-deeply-nested-div-values)

Recommended answer

It seems that your GET request returns valid HTML, but with no tweet elements inside the #timeline element. However, adding a User-Agent to the request headers seems to remedy this.

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
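Once find_all returns the spans, the counts still have to be pulled out of the tags. The following is a minimal sketch of that step, not part of the original answer. It runs against a small hand-written HTML sample rather than a live Twitter response, so the sample markup, the helper name extract_counts, and the choice to map spans without a plain number to None are all assumptions made for illustration. Only the class name comes from the code above.

```python
from bs4 import BeautifulSoup

# Class name taken from the code above.
COUNT_CLASS = 'ProfileTweet-actionCountForPresentation'


def extract_counts(html, class_name=COUNT_CLASS):
    """Return the numeric text of every matching span as a list.

    Hypothetical helper: spans whose text is not a plain number
    (for example, empty spans) are reported as None.
    """
    soup = BeautifulSoup(html, 'html.parser')
    counts = []
    for span in soup.find_all('span', class_=class_name):
        # Strip whitespace and thousands separators before converting.
        text = span.get_text(strip=True).replace(',', '')
        counts.append(int(text) if text.isdigit() else None)
    return counts


# Hand-written sample markup (an assumption, not real Twitter HTML).
sample_html = '''
<div class="tweet">
  <span class="ProfileTweet-actionCountForPresentation">12</span>
  <span class="ProfileTweet-actionCountForPresentation">1,024</span>
  <span class="ProfileTweet-actionCountForPresentation"></span>
  <span class="other">99</span>
</div>
'''

print(extract_counts(sample_html))
```

With a real response, the same helper could be called as extract_counts(r.text). Whether a given span holds a like count or a retweet count depends on its parent elements, which this sketch does not inspect.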
