Stripping Line Breaks in Tweets via Tweepy

Problem description

I'm looking to pull data from the Twitter API and create a pipe-separated file that I can process further. My code currently looks like this:

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

out_file = "tweets.txt"

tweets = api.search(q='foo')
o = open(out_file, 'a')

for tweet in tweets:
        id = str(tweet.id)
        user = tweet.user.screen_name
        post = tweet.text
        post = post.encode('ascii', 'ignore')
        post = post.strip('|') # so pipes in tweets don't create unwanted separators
        post = post.strip('\r\n')
        record = id + "|" + user + "|" + post
        print>>o, record

I have a problem when a user's tweet includes line breaks, which makes the output data look like this:

473565810326601730|usera|this is a tweet 
473565810325865901|userb|some other example 
406478015419876422|userc|line 
separated 
tweet
431658790543289758|userd|one more tweet

I want to strip out the line breaks in the third tweet. In addition to the above, I've tried post.strip('\n') and post.strip('0x0D 0x0A'), but none of them seem to work. Any ideas?

Recommended answer

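That is because strip returns "a copy of the string with leading and trailing characters removed". Line breaks in the middle of a tweet are therefore left in place. The argument is also treated as a set of individual characters, so post.strip('0x0D 0x0A') removes the literal characters 0, x, D, A and the space from the ends of the string, not a carriage return and line feed.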
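
A quick way to see this behavior is the sketch below. The sample string is made up for illustration and is not taken from the question's data; it is written for Python 3, whereas the question's code is Python 2.

```python
# Hypothetical tweet text with line breaks in the middle and at the end.
post = "line\nseparated\ntweet\r\n"

# strip() treats its argument as a set of characters and removes them
# only from the two ends of the string.
stripped = post.strip('\r\n')

# The trailing '\r\n' is gone, but the interior '\n' characters remain.
print(repr(stripped))
```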

You should use replace for the new line and for the pipe:

post = post.replace('|', ' ')
post = post.replace('\n', ' ')
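
Note that replacing only '\n' would leave any carriage return ('\r') in the text. If tweets might contain those as well, one possible variation is to collapse all whitespace in a single step. The sketch below is only an illustration under that assumption: `clean_field` is a hypothetical helper name, not part of Tweepy, and the sample values are invented. It is written for Python 3.

```python
def clean_field(text, sep='|'):
    """Return text that is safe to write as one field of a one-line record.

    Hypothetical helper, not part of Tweepy.
    """
    # Replace the separator so it cannot create an extra column.
    text = text.replace(sep, ' ')
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, '\r', '\n'), so joining the pieces with a single
    # space removes every line break. It also collapses repeated
    # spaces and trims both ends.
    return ' '.join(text.split())


# Invented sample values standing in for tweet.id, screen_name and text.
record = "|".join(["406478015419876422", "userc",
                   clean_field("line\r\nseparated\ntweet")])
print(record)
```

Unlike the two replace calls above, this also changes runs of ordinary spaces, so it fits only when the exact spacing of the tweet does not need to be preserved.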
