Slice JSON File into Different Time Intercepts with Python


Problem description

For a current research project, I am trying to slice a JSON file into different time intercepts. Based on the object "Date", I want to analyse the content of the JSON file by quarter, i.e. 01 January - 31 March, 01 April - 30 June etc.

Ideally, the code would pick the oldest date in the file and add quarterly time intercepts on top of that. I have researched this point but have not found any helpful methods yet.

Is there any smart way to include this in the code? The JSON file has the following structure:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]

And the existing relevant code excerpt looks like this:

import json
import string

import pandas as pd

# Load the raw JSON records (a list of dictionaries)
with open(r'Glassdoor_A.json') as file:
    data = json.load(file)

# Create an empty dictionary
d = dict()

# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into time intervals
    line.sort_values(by=['Date'])
    line.tshift(d, int = 90, freq=timedelta, axis='Date')

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

    # Count the total number of words
    total = sum(d.values())
    print(d[key], total)

Solution

The data can be sliced with Pandas by defining a start and an end date and comparing the "Date" field of each JSON record against them.

Important note: the data must be normalised and the dates converted to a Pandas datetime format before filtering.
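To illustrate why the conversion matters: a string such as 05/11/2017 is ambiguous, and without an explicit format Pandas reads it month-first. The following is a minimal sketch, assuming the file stores dates as DD/MM/YYYY. The source does not state this; the end date 31/03/2018 used in the answer merely suggests it.

```python
import pandas as pd

# Sample record in the structure shown in the question
data = [{"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"}]
df = pd.json_normalize(data)

# Without a format, an ambiguous string is interpreted month-first
month_first = pd.to_datetime(df["Date"])

# Assumption: the file stores dates as DD/MM/YYYY
day_first = pd.to_datetime(df["Date"], format="%d/%m/%Y")

print(month_first.iloc[0].date())  # 2017-05-11
print(day_first.iloc[0].date())    # 2017-11-05
```

If the file turns out to be month-first instead, the format string would be "%m/%d/%Y".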

import json

import pandas as pd


# Load the JSON file and flatten it into a DataFrame
with open("Glassdoor_A.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data)

# Convert the dates to Pandas datetimes.
# Assumption: dates are day-first (DD/MM/YYYY), as the end date below suggests.
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")


# Filtering by date (first quarter of 2018)
start_date = pd.Timestamp(2018, 1, 1)
end_date = pd.Timestamp(2018, 3, 31)

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)
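
The question also asks for intervals that start at the oldest date in the file rather than at a fixed date. One possible way to do this, sketched below, is to walk forward from the earliest date in three-month steps and count the words of "Text Main" inside each window. The sample records, the `count_words` helper, and the `window_counts` name are hypothetical, and the DD/MM/YYYY format is again an assumption.

```python
from collections import Counter
import string

import pandas as pd

# Hypothetical sample records in the structure shown in the question
data = [
    {"No": "1", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Good pay, good team."},
    {"No": "2", "Stock Symbol": "A", "Date": "20/01/2018", "Text Main": "Long hours"},
    {"No": "3", "Stock Symbol": "A", "Date": "10/02/2018", "Text Main": "Good benefits"},
]
df = pd.json_normalize(data)
# Assumption: dates are stored as DD/MM/YYYY
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")


def count_words(texts):
    """Count lowercase, punctuation-free words across an iterable of strings."""
    table = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        counts.update(text.strip().lower().translate(table).split())
    return counts


first = df["Date"].min()
last = df["Date"].max()

# Build consecutive three-month windows anchored at the earliest date.
# Offsets are always computed from `first` to avoid month-end drift.
window_counts = {}
k = 0
while True:
    start = first + pd.DateOffset(months=3 * k)
    if start > last:
        break
    end = first + pd.DateOffset(months=3 * (k + 1))
    # Half-open window [start, end) so no row falls into two windows
    in_window = (df["Date"] >= start) & (df["Date"] < end)
    window_counts[(start.date(), end.date())] = count_words(df.loc[in_window, "Text Main"])
    k += 1

for (win_start, win_end), counts in window_counts.items():
    print(win_start, "to", win_end, dict(counts))
```

If calendar quarters (01 January - 31 March and so on) are wanted instead, `df["Date"].dt.to_period("Q")` yields a quarter label per row that can be passed to `groupby`.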
