FeedParser, Removing Special Characters and Writing to CSV

Problem Description

I'm learning Python. I've set myself a wee goal of building an RSS scraper. I'm trying to gather the Author, Link and Title. From there I want to write to a CSV.

I'm encountering some problems. I've searched for the answer since last night but can't seem to find a solution. I have a feeling there's a bit of knowledge I'm missing between what feedparser is parsing and moving it to a CSV, but I don't yet have the vocabulary to know what to Google.

  1. How do I remove special characters (such as '[' and "'")?
  2. When creating the new file, how do I write the Author, Link and Title to a new line?

1) Special Characters

import feedparser

rssurls = 'http://feeds.feedburner.com/TechCrunch/'

techart = feedparser.parse(rssurls)
# feeds = []

# for url in rssurls:
#     feedparser.parse(url)
# for feed in feeds:
#     for post in feed.entries:
#         print(post.title)

# print(feed.entries)

techdeets = [post.author + " , " + post.title + " , " + post.link for post in techart.entries]
techdeets = [y.strip() for y in techdeets]
techdeets

Output: I get the information I need, but the .strip() call doesn't strip the brackets and quotes.

['Darrell Etherington , Spin launches first city-sanctioned dockless bike sharing in Bay Area , http://feedproxy.google.com/~r/Techcrunch/~3/BF74UZWBinI/', 'Ryan Lawler , With $5.3 million in funding, CarDash wants to change how you get your car serviced , http://feedproxy.google.com/~r/Techcrunch/~3/pkamfdPAhhY/', 'Ron Miller , AlienVault plug-in searches for stolen passwords on Dark Web , http://feedproxy.google.com/~r/Techcrunch/~3/VbmdS0ODoSo/', 'Lucas Matney , Firefox for Windows gets native WebVR support, performance bumps in latest update , http://feedproxy.google.com/~r/Techcrunch/~3/j91jQJm-f2E/',...]
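The characters that "won't strip" are actually not in the strings at all. A short sketch illustrates this, using a hypothetical one-item sample in place of the real parsed output:

```python
# The brackets and quotes in the output above are not part of the strings:
# they belong to the list's repr, which is shown whenever the whole list
# object is echoed or printed.
techdeets = ["Darrell Etherington , Spin launches dockless bike sharing , http://example.com/a"]

print(techdeets)     # the surrounding [ and ' come from the list repr
print(techdeets[0])  # the bare string: no brackets, no quotes

# str.strip() with no arguments removes only leading/trailing whitespace.
# To remove specific characters, pass them explicitly:
s = "['text']"
print(s.strip("[']"))  # strips leading/trailing [, ', ] characters
```

So printing items individually (or writing them as CSV rows, as the answer below does) avoids the "special characters" entirely.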

2) Writing to CSV

import csv

savedfile = open('/test1.txt', 'w')
savedfile.write(str(techdeets) + "\n")
savedfile.close()

import pandas as pd
df = pd.read_csv('/test1.txt', encoding='cp1252')
df

Output: The output was a dataframe with only 1 row and multiple columns.
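The one-row result follows from `str(techdeets)`: it serializes the entire list onto a single line, which read_csv then treats as one record. For comparison, if you stay with the csv module rather than pandas, a minimal sketch (with hypothetical sample posts standing in for techart.entries) writes one row per entry:

```python
import csv

# Hypothetical posts shaped like the feedparser entries in the question.
posts = [
    ("Darrell Etherington", "Spin launches dockless bike sharing", "http://example.com/a"),
    ("Ryan Lawler", "CarDash wants to change how you get your car serviced", "http://example.com/b"),
]

with open("test1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["author", "title", "link"])  # header row
    for author, title, link in posts:
        # One row per post; the csv module quotes embedded commas for us.
        writer.writerow([author, title, link])
```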

Answer

You're almost there :-)

How about using pandas to create a dataframe first and then saving it? Something like this, continuing from your code:

df = pd.DataFrame(columns=['author', 'title', 'link'])
for i, post in enumerate(techart.entries):
    df.loc[i] = post.author, post.title, post.link

Then you can save it:

df.to_csv('myfilename.csv', index=False)
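To check that the file really holds one row per post, the loop above can be exercised with hypothetical sample data and read back:

```python
import pandas as pd

df = pd.DataFrame(columns=['author', 'title', 'link'])
# Hypothetical posts standing in for techart.entries.
posts = [("A. Author", "First title", "http://example.com/1"),
         ("B. Author", "Second title", "http://example.com/2")]
for i, (author, title, link) in enumerate(posts):
    df.loc[i] = author, title, link  # one labeled row per post

df.to_csv('myfilename.csv', index=False)

back = pd.read_csv('myfilename.csv')  # two rows, three columns
```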

OR

You can also write into the dataframe straight from the feedparser entries:

>>> import feedparser
>>> import pandas as pd
>>>
>>> rssurls = 'http://feeds.feedburner.com/TechCrunch/'
>>> techart = feedparser.parse(rssurls)
>>>
>>> df = pd.DataFrame()
>>>
>>> df['author'] = [post.author for post in techart.entries]
>>> df['title'] = [post.title for post in techart.entries]
>>> df['link'] = [post.link for post in techart.entries]
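One caveat: the comprehensions above assume every entry carries an author, which not every feed guarantees. feedparser entries are dict-like, so entry.get('author', '') is a safer lookup. A sketch with hypothetical entries (the second deliberately missing its author):

```python
import pandas as pd

# Hypothetical dict-like entries standing in for techart.entries.
entries = [
    {"author": "Darrell Etherington", "title": "Spin launches bike sharing", "link": "http://example.com/a"},
    {"title": "An entry with no author", "link": "http://example.com/b"},
]

df = pd.DataFrame({
    "author": [e.get("author", "") for e in entries],  # empty string if absent
    "title":  [e["title"] for e in entries],
    "link":   [e["link"] for e in entries],
})

df.to_csv("techcrunch.csv", index=False)  # one row per post, header included
```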
