Merge multiple JSON files into one file by using Python (stream twitter)


Problem description

I've pulled data from Twitter. Currently, the data is in multiple files and I could not merge it into one single file.

Note: all files are in JSON format.

The code I have used is here and here.

It has been suggested to use glob to combine the JSON files.

I wrote this code, following some tutorials on merging JSON with Python:

from glob import glob
import json
import pandas as pd

with open('Desktop/json/finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):  # matches every .json file in Desktop/json
        with open(fname) as j:
            f.write(str(j.read()))
            f.write('\n')

I successfully merged all the files, and the result is finalmerge.json.

Then, as suggested in several threads, I used this:


df_lines = pd.read_json('finalmerge.json', lines=True)
df_lines


1000000 rows × 23 columns

How can I put each feature in a separate column?

I'm not sure what is wrong with the JSON files. I checked the merged file and found that it is not valid JSON. What should I do to load it as a data frame?

I am asking because I have very basic Python knowledge, and all the answers to similar questions that I have found are far more complicated than I can follow. Please help this new Python user convert multiple JSON files into one JSON file.

Thank you

Solution

I think the problem is that your files are not really JSON (or rather, they are structured as JSONL, with one JSON object per line). You have two ways of proceeding:

  1. You could read every file as a text file and merge them line by line.
  2. You could convert them to JSON (wrap the contents of the file in square brackets and put a comma between consecutive JSON elements).
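
Both options can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: it assumes every input file holds one JSON object per line, and the function names `read_jsonl` and `jsonl_to_json_array` are hypothetical helpers introduced only for illustration.

```python
import json


def read_jsonl(path):
    """Option 1: read a file as text and parse one JSON object per line."""
    records = []
    with open(path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:  # ignore blank lines between records
                records.append(json.loads(line))
    return records


def jsonl_to_json_array(paths, out_path):
    """Option 2: collect all records and write them as one valid JSON array."""
    records = []
    for path in paths:
        records.extend(read_jsonl(path))
    with open(out_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)  # json.dump adds the brackets and commas
    return len(records)
```

Calling `jsonl_to_json_array(glob('Desktop/json/*.json'), 'finalmerge.json')` (with `glob` imported from the `glob` module) would then produce a file that a regular JSON parser accepts. Letting `json.dump` write the array avoids adding brackets and commas by hand, which is easy to get wrong at the last element.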

Try following this question and let me know if it solves your problem: Loading JSONL file as JSON objects

You can also try to edit your code this way:

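from glob import glob

# Append the contents of each input file to finalmerge.json, followed by a newline
with open('finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):
        with open(fname) as j:
            f.write(j.read())
            f.write('\n')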

Every line of finalmerge.json will then be a separate JSON element, which is the layout that pd.read_json(..., lines=True) expects.
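
On the question of getting each feature into its own column: if some fields contain nested objects, they can be flattened before building the data frame. The following is a sketch under the assumption that nesting consists only of dictionaries; `flatten` and `load_flat_records` are hypothetical helper names, and the dotted column names are a convention chosen here. pandas also offers `pd.json_normalize` for the same purpose.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so that inner keys become "outer.inner"
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def load_flat_records(path):
    """Parse a JSONL file and flatten every non-blank line."""
    with open(path, encoding="utf-8") as src:
        return [flatten(json.loads(line)) for line in src if line.strip()]
```

Passing the result to `pd.DataFrame(load_flat_records('finalmerge.json'))` would give one column per flattened key. Lists inside a record are left as they are in this sketch.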
