Using Python to Scrape Nested Divs and Spans in Twitter?

Question

I'm trying to scrape the likes and retweets from the results of a Twitter search.

After running the Python below, I get an empty list, []. I'm not using the Twitter API because it doesn't return tweets by hashtag from this far back.

The code I'm using is:

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
r  = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)

I can successfully save the HTML to a file using the code below. However, when I search the saved text, large amounts of information are missing, such as the class names I am looking for.

So (part of) the problem is apparently in accurately accessing the source code.

filename = 'newfile2.txt'
# Save the fetched HTML (the `data` string from above) for manual inspection
with open(filename, 'w') as handle:
    handle.write(data)
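
A quick way to confirm that the fetched text really lacks the expected markup, without opening the saved file, is to test the string for the class names directly. The following is only a minimal sketch: `missing_markers` is a hypothetical helper name, not part of requests or BeautifulSoup, and the sample strings are made up for illustration.

```python
def missing_markers(html, markers):
    """Return the markers that do not occur anywhere in the HTML text."""
    return [marker for marker in markers if marker not in html]


# Made-up stand-in for r.text; in the real script pass `data` instead.
sample_html = '<div id="timeline"><ol class="stream-items"></ol></div>'

wanted = ['id="timeline"', 'ProfileTweet-actionCountForPresentation']
print(missing_markers(sample_html, wanted))
```

If the class name shows up in the returned list, the server did not send the tweet markup at all, so no change to the BeautifulSoup query can find it.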

This screenshot shows the span that I'm trying to scrape.

[Screenshot: the exact span and content I'm trying to scrape.]

I've looked at this question, and others like it, but I'm not quite getting there:
How can I use BeautifulSoup to get deeply nested div values? (http://stackoverflow.com/questions/27355051/how-can-i-use-beautifulsoup-to-get-deeply-nested-div-values)

Recommended answer

It seems that your GET request returns valid HTML but with no tweet elements in the #timeline element. However, adding a user agent to the request headers seems to remedy this.

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
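
Once the spans are present in the response, the counts still have to be pulled out of the nested divs and turned into numbers. The sketch below shows one possible way to do that, under loud assumptions: the sample markup is hypothetical and heavily simplified, the wrapper class names (`tweet`, `ProfileTweet-action--retweet`, `ProfileTweet-action--favorite`) are assumptions rather than something confirmed above, and `extract_counts` and `count_in` are illustrative helper names. Only the span class comes from the code above. It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; real pages are far more deeply nested.
SAMPLE_HTML = """
<div class="tweet">
  <div class="ProfileTweet-action--retweet">
    <span class="ProfileTweet-actionCountForPresentation">12</span>
  </div>
  <div class="ProfileTweet-action--favorite">
    <span class="ProfileTweet-actionCountForPresentation">1,204</span>
  </div>
</div>
<div class="tweet">
  <div class="ProfileTweet-action--retweet">
    <span class="ProfileTweet-actionCountForPresentation"></span>
  </div>
  <div class="ProfileTweet-action--favorite">
    <span class="ProfileTweet-actionCountForPresentation">3</span>
  </div>
</div>
"""

COUNT_CLASS = 'ProfileTweet-actionCountForPresentation'


def count_in(tweet, action_class):
    """Find the count span nested inside the given action div; 0 if absent or empty."""
    action = tweet.find('div', class_=action_class)
    span = action.find('span', class_=COUNT_CLASS) if action else None
    text = span.get_text(strip=True) if span else ''
    digits = text.replace(',', '')
    # Abbreviated values such as "1.2K" are not handled and fall back to 0.
    return int(digits) if digits.isdigit() else 0


def extract_counts(html):
    """Return one dict of retweet and like counts per tweet container."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        {
            'retweets': count_in(tweet, 'ProfileTweet-action--retweet'),
            'likes': count_in(tweet, 'ProfileTweet-action--favorite'),
        }
        for tweet in soup.find_all('div', class_='tweet')
    ]


print(extract_counts(SAMPLE_HTML))
```

Searching within each tweet container, rather than calling `find_all` on the whole page, keeps each like count paired with the retweet count of the same tweet.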
