Python save file to csv
I have the following code that reads in tweets from Twitter, should process the data, and then save the result to a new file.
This is the code:
#import regex
import re

#start process_tweet
def processTweet(tweet):
    # process the tweets
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet
#end

#Read the tweets one by one and process it
input = open('withoutEmptylines.csv', 'rb')
output = open('editedTweets.csv','wb')
line = input.readline()
while line:
    processedTweet = processTweet(line)
    print (processedTweet)
    output.write(processedTweet)
    line = input.readline()
input.close()
output.close()
The data in my input file looks like this, with each tweet on its own line:
She wants to ride my BMW the go for a ride in my BMW lol http://t.co/FeoNg48AQZ
BMW Sees U.S. As Top Market For 2015 i8 http://t.co/kkFyiBDcaP
My function works well, but I am not happy with the output, which looks like this:
she wants to ride my bmw the go for a ride in my bmw lol URL rt AT_USER Ðun bmw es mucho? yo: bmw. -AT_USER veeergaaa!. hahahahahahahahaha nos hiciste la noche caray!
So it puts everything on one line instead of keeping each tweet on its own line, as in the input file.
Does anyone have an idea how to get each tweet on its own line?
With an example file like this:
tweet number one
tweet number two
tweet number three
This code:
file = open('tweets.txt')
for line in file:
    print line
Produces this output:
tweet number one
tweet number two
tweet number three
Python is reading in the line endings just fine, but your script is replacing them via regular expression substitution.
This regex substitution:
tweet = re.sub('[\s]+', ' ', tweet)
is converting every run of whitespace characters (e.g. tabs and newlines) into a single space.
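To see the effect in isolation, the two character classes can be compared on a single line that still ends in a newline. This is only a minimal sketch, assuming Python 3; the sample string is made up for illustration, and `[ ]+` is a class that matches literal spaces only.

```python
import re

# Made-up sample line that still carries its trailing newline
line = "tweet  number\tone\n"

# \s matches spaces, tabs and newlines, so the trailing "\n" turns into a space
collapsed_all = re.sub(r'[\s]+', ' ', line)

# [ ] matches only the space character, so the tab and the newline survive
collapsed_spaces = re.sub(r'[ ]+', ' ', line)

print(repr(collapsed_all))
print(repr(collapsed_spaces))
```

Printing with `repr` makes the remaining `\t` and `\n` visible instead of rendering them.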
Either add a newline to the tweet before you output it, or modify your regex so that it does not replace newlines, like so:
tweet = re.sub('[ ]+', ' ', tweet)
EDIT: I had left my test substitution command in there. The suggestion has been fixed.
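The other option mentioned above, adding the newline back before writing, could look roughly like the sketch below. It assumes Python 3 and text-mode files with UTF-8 data. `process_line` and `process_file` are hypothetical helper names, not part of the original script, and only the whitespace step of `processTweet` is reproduced.

```python
import re

def process_line(line):
    # Hypothetical helper: only the whitespace step of processTweet is shown.
    # Collapse every whitespace run (this also swallows the trailing newline)
    line = re.sub(r'\s+', ' ', line)
    # Trim the spaces left over at both ends
    return line.strip()

def process_file(in_path, out_path):
    # Text mode with an explicit encoding is an assumption about the data
    with open(in_path, encoding='utf-8') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            # Write the newline back explicitly so each tweet keeps its own line
            dst.write(process_line(line) + '\n')
```

With the file names from the question, this would be called as `process_file('withoutEmptylines.csv', 'editedTweets.csv')`. Iterating over the file object and using `with` also removes the need for the manual `readline()` loop and the explicit `close()` calls.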