How to web scrape daily news once a day using Python?

Problem description

I am trying to build an application for which I need a daily news feed from several websites. One way to do this is with Python's BeautifulSoup library. However, this works well only for sites that have their news on one static page.

Let's consider a site like http://www.techcrunch.com. They show only their headlines, and for more news you need to click on "Read more". Several other news websites work the same way. How do I extract such information and dump it in a file- txt/.dmp or any other kind of file? What tool should I use? What approach should I take to implement this in Python?

I need this script to automatically download news from several websites ONCE EVERY SINGLE DAY and store it in a file with categories such as heading, date, content, etc. I will be uploading this script to an apache2 server. Any suggestions?

Solution

How do I extract such information and dump it in a file- txt/.dmp or any other kind of file? What tool should I use?

for more news you need to click on "Read more".

The tools you might leverage are Selenium, which is pure browser automation, or iMacros.

  1. Here is an example of leveraging Selenium in Python, server side.
  2. Here is a post (and video) on data extraction using iMacros. Since you need it only once a day, you might schedule it to run regularly on Windows or Mac.
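
As a rough illustration of the Selenium route, here is a minimal sketch rather than a definitive implementation. It assumes Selenium 4 with Chrome and a matching driver available on the server. The function names `fetch_articles` and `save_articles`, the CSS selector, and the JSON-lines output format are all hypothetical choices made for this example. The idea is to collect the "Read more" links from the listing page, visit each one, and append heading, date, and content to a file.

```python
import json
from datetime import date


def fetch_articles(listing_url, link_selector, max_articles=5):
    """Open a listing page, follow each matching link, return article dicts."""
    # Imported here so save_articles() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible window, suits a server
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(listing_url)
        # Collect the hrefs first: element handles go stale after navigation.
        hrefs = [a.get_attribute("href")
                 for a in driver.find_elements(By.CSS_SELECTOR, link_selector)]
        # Drop duplicates (keeping order) and links without an href.
        hrefs = [h for h in dict.fromkeys(hrefs) if h]
        articles = []
        for href in hrefs[:max_articles]:
            driver.get(href)
            articles.append({
                "heading": driver.title,
                "date": date.today().isoformat(),  # scrape date, not publish date
                "url": href,
                # Whole-page text; a real script would target the article node.
                "content": driver.find_element(By.TAG_NAME, "body").text,
            })
        return articles
    finally:
        driver.quit()


def save_articles(articles, path):
    """Append one JSON object per line so each daily run adds to the file."""
    with open(path, "a", encoding="utf-8") as f:
        for article in articles:
            f.write(json.dumps(article, ensure_ascii=False) + "\n")


# Example call (hypothetical selector; inspect the real page to find one):
# news = fetch_articles("http://www.techcrunch.com", "a.read-more")
# save_articles(news, "news.jsonl")
```

To run this once a day on a Linux server, one option is a crontab entry along the lines of `0 6 * * * python3 /path/to/scrape_news.py`, where the path and the time are placeholders.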
