Python save file to csv


Problem description

I have the following code that reads in Twitter tweets, processes them, and should then save the result to a new file.

This is the code:

#import regex
import re

#start process_tweet
def processTweet(tweet):
    # process the tweets

    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet
#end

#Read the tweets one by one and process it
input = open('withoutEmptylines.csv', 'rb')
output = open('editedTweets.csv','wb')

line = input.readline()

while line:
    processedTweet = processTweet(line)
    print (processedTweet)
    output.write(processedTweet)
    line = input.readline()

input.close()
output.close()

The data in my input file looks like this, with one tweet per line:

She wants to ride my BMW the go for a ride in my BMW lol http://t.co/FeoNg48AQZ
BMW Sees U.S. As Top Market For 2015 i8 http://t.co/kkFyiBDcaP

My function works well, but I am not happy with the output, which looks like this:

she wants to ride my bmw the go for a ride in my bmw lol URL rt AT_USER Ðun bmw es mucho? yo: bmw. -AT_USER veeergaaa!. hahahahahahahahaha nos hiciste la noche caray! 

So it puts everything on one line instead of one tweet per line, which was the format of the input file.

Does anyone have an idea how to get each tweet on its own line?

Solution

With an example file like this:

tweet number one
tweet number two
tweet number three

This code:

file = open('tweets.txt')
for line in file:
    print(line)

Produces this output:

tweet number one

tweet number two

tweet number three

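Python is reading in the line endings just fine (the blank lines in the output appear because print adds its own newline after the one each line already carries), but your script is replacing them via regular expression substitution.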
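To see the trailing newline on each line directly, one option is to print the `repr` of every line. This is a minimal sketch for Python 3; the in-memory `io.StringIO` object is only a stand-in for `tweets.txt`, assumed here so the example runs without a file on disk.

```python
import io

# In-memory stand-in for tweets.txt (assumption: same three lines as the example file)
sample = io.StringIO("tweet number one\ntweet number two\ntweet number three\n")

# Iterating over a file-like object yields each line with its trailing newline intact
lines = [line for line in sample]

for line in lines:
    # repr() makes the otherwise invisible "\n" at the end of each line visible
    print(repr(line))
```

Each printed value ends in `\n`, which confirms that the newlines survive reading.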

This regex substitution:

tweet = re.sub('[\s]+', ' ', tweet)

is converting every run of whitespace characters (e.g. tabs and newlines) into a single space.
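The effect is easy to reproduce on a single line. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
import re

# Made-up sample line, ending in a newline as it would when read from a file
line = "tweet number one\n"

# \s matches spaces, tabs and newlines, so the trailing "\n" is replaced as well
collapsed = re.sub(r'[\s]+', ' ', line)

print(repr(collapsed))
```

The newline at the end becomes a space, so consecutive tweets written to the output file run together on one line.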

Either add a newline to the tweet before you write it out, or modify your regex so that it does not replace newlines, like so:

tweet = re.sub('[ ]+', ' ', tweet)

EDIT: I had put my test substitution command in there by mistake. The suggestion has been fixed.
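The two options suggested above can be sketched side by side. This is only an illustration for Python 3: the helper names `collapse_spaces` and `collapse_whitespace_keep_line` are hypothetical, and the `strip()` call in the second helper is an added assumption to avoid a stray space before the newline.

```python
import re

def collapse_spaces(tweet):
    # Modified regex: only runs of plain spaces are collapsed, "\n" is left alone
    return re.sub(r'[ ]+', ' ', tweet)

def collapse_whitespace_keep_line(tweet):
    # Original regex, then one newline is added back before the line is written
    # (strip() is an assumption: it drops the space left where "\n" used to be)
    return re.sub(r'\s+', ' ', tweet).strip() + '\n'

# Made-up sample line containing repeated spaces, a tab and a trailing newline
raw = "tweet   number\tone\n"

a = collapse_spaces(raw)
b = collapse_whitespace_keep_line(raw)

print(repr(a))
print(repr(b))
```

Note that the first helper leaves tabs untouched, while the second turns them into spaces; which behaviour is preferable depends on the data.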
