Writing multiple JSON to CSV in Python - Dictionary to CSV


Problem description

I am using Tweepy to stream tweets and would like to record them in CSV format so I can play around with them or load them into a database later. Please keep in mind that I am a noob, but I do realize there are multiple ways of handling this (suggestions are very welcome).

Long story short, I need to convert and append multiple Python dictionaries to a CSV file. I already did my research (How do I write a Python dictionary to a csv file?) and tried doing this with the DictWriter and writer methods.

However, there are a few more things that need to be accomplished:

1) Write the keys as a header only once.

2) As each new tweet is streamed, its values need to be appended without overwriting previous rows.

3) If a value is missing, record NULL.

4) Skip/fix ascii codec errors.

Here is the format of what I would like to end up with (each value is in its individual cell):

Header1_Key_1 Header2_Key_2 Header3_Key_3...

Row1_Value_1 Row1_Value_2 Row1_Value_3...

Row2_Value_1 Row2_Value_2 Row2_Value_3...

Row3_Value_1 Row3_Value_2 Row3_Value_3...

Row4_Value_1 Row4_Value_2 Row4_Value_3...

Here is my code:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):

    def on_data(self, data):
        json_data = json.loads(data)

        data_header = json_data.keys()
        data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'wb') as f:
                w = csv.DictWriter(f, data_header)
                w.writeheader(data_header)
                w.writerow(json_data)
        except BaseException, e:
            print 'Something is wrong', str(e)

        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

Thank you in advance!

Solution

I have done a similar thing with facebook's graph API (facepy module)!

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):
    _headers = None
    def __init__(self,headers,*args,**keys):
        StreamListener.__init__(self,*args,**keys)
        self._headers = headers

    def on_data(self, data):
        json_data = json.loads(data)

        #data_header = json_data.keys()
        #data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'ab') as f: # a for append
                w = csv.writer(f)
                # write!
                w.writerow(self._valToStr(json_data[header])
                           if header in json_data else ''
                           for header in self._headers)
        except Exception, e:
            print 'Something is wrong', str(e)

        return True

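    @staticmethod
    def _valToStr(o):
        # json returns a fixed set of datatypes - convert each accordingly
        # https://docs.python.org/2/library/json.html#encoders-and-decoders
        if isinstance(o, unicode): return StdOutListener._removeNonASCII(o)
        elif isinstance(o, bool): return str(o)
        elif o is None: return ''
        # ... the remaining JSON types are left elided, as in the original
        # answer; unhandled values fall through and return None, which
        # csv.writer writes as an empty field

    @staticmethod
    def _removeNonASCII(s):
        # drop every character outside the ASCII range
        return ''.join(i if ord(i)<128 else '' for i in s)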

    def on_error(self, status):
        print status

if __name__ == '__main__':
    headers = ['look','at','twitter','api',
               'to','find','all','possible',
               'keys']

    # initialize csv file with header info
    with open('csv_tweet3.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerow(headers)

    l = StdOutListener(headers)
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

It's not copy-and-paste ready, but it should be clear enough that you can finish it.
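
As one possible way to finish it, here is a minimal sketch of the four requirements from the question using csv.DictWriter. It is written for Python 3 rather than the Python 2 used in this thread, and it leaves Tweepy out entirely. The function name append_tweet and the FIELDNAMES list are hypothetical; the real column names depend on the keys in the tweet payload.

```python
import csv
import json
import os

# Hypothetical column list; the real keys depend on the tweet payload.
FIELDNAMES = ["id", "text", "lang"]


def append_tweet(path, raw_json, fieldnames=FIELDNAMES):
    """Append one JSON-encoded dict to a CSV file, writing the header once."""
    record = json.loads(raw_json)
    # The header is needed only when the file is new or still empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # "a" appends instead of overwriting; utf-8 avoids ASCII codec errors.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=fieldnames,
            restval="NULL",         # written for keys missing from the record
            extrasaction="ignore",  # drop keys that are not listed as columns
        )
        if need_header:
            writer.writeheader()
        writer.writerow(record)
```

Note that restval only covers keys that are absent from the dictionary. A key that is present with a JSON null value is written as an empty field, so it would need its own handling if NULL is wanted there too.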
For performance, you may want to look into opening the file once, writing several records, and then closing it. That way you are not repeatedly opening the file, initializing the csv writer, appending, and closing it again for every tweet. I'm not familiar with the tweepy API, so I'm not sure exactly how this would work, but it's worth looking into.
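
The idea of keeping the file open across records could look roughly like the sketch below. It is Python 3, independent of Tweepy, and the class name BufferedCsvAppender and its flush_every parameter are made up for illustration. It assumes the header row has already been written, as in the main block above.

```python
import csv


class BufferedCsvAppender:
    """Keep one file handle open and flush to disk every `flush_every` rows."""

    def __init__(self, path, fieldnames, flush_every=100):
        # Open once in append mode; the same writer is reused for every row.
        self._file = open(path, "a", newline="", encoding="utf-8")
        self._writer = csv.DictWriter(
            self._file,
            fieldnames=fieldnames,
            restval="NULL",         # written for keys missing from a record
            extrasaction="ignore",  # drop keys that are not listed as columns
        )
        self._flush_every = flush_every
        self._pending = 0

    def write(self, record):
        self._writer.writerow(record)
        self._pending += 1
        # Push buffered rows to disk only once enough have accumulated.
        if self._pending >= self._flush_every:
            self._file.flush()
            self._pending = 0

    def close(self):
        # Closing the file also flushes any rows still in the buffer.
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

A listener could create one such object in its constructor and call write from on_data, closing it when the stream ends. The trade-off is that rows still in the buffer can be lost if the process is killed before the next flush.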

If you run into any trouble, I'll be happy to help - enjoy!
