Python/Pandas CSV Parsing

Views: 118 | Published: 2017/2/26 15:32:17 | Tags: python, parsing, csv, pandas


Problem description

I used the JotForm Configurable List widget to collect data, but I am having trouble parsing the resulting data correctly. When I use

testdf = pd.read_csv("TestLoad.csv")

the data is read in as two records and the details are stored in the "Information" column. I understand why it is parsed that way, but I would like to break the details out into multiple records, as shown below.

Any help would be appreciated.

Sample CSV

"Date","Information","Type"
"2015-12-06","First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;","New"
"2015-12-06","First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;","New"

Current Result

Date        Information                                                                      Type
2015-12-06  First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;  New
2015-12-06  First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;  New
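
To see why each line comes back as a single record, here is a minimal sketch that reads the same sample with the standard csv module instead of pandas (an assumption made only to keep the example dependency-free; `io.StringIO` stands in for TestLoad.csv). The quoting keeps the commas and semicolons inside one "Information" field, so every row has three fields while each cell holds two people.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for TestLoad.csv.
csv_text = '''"Date","Information","Type"
"2015-12-06","First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;","New"
"2015-12-06","First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;","New"
'''

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Each data row has three fields; the quoted commas stay inside "Information".
field_counts = [len(row) for row in data]

# Splitting "Information" on ';' shows how many people are packed into each cell.
people_per_row = [
    len([part for part in row[1].split(";") if part.strip()]) for row in data
]

print(field_counts)
print(people_per_row)
```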

Desired Result

Date        First  Last   School Type
2015-12-06  Tom    Smith  MCAA   New
2015-12-06  Tammy  Smith  MCAA   New
2015-12-06  Jim    Jones  MCAA   New
2015-12-06  Jane   Jones  MCAA   New

Solution

This is useless text that is required to keep an answer from being downvoted by the moderators. Here is the data I used:

"Date","Information","Type"
"2015-12-07","First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;","Old"
"2015-12-06","First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;","New"

import pandas as pd
import csv
import re
import pprint
import datetime as dt

records = [] #Construct a complete record for each person

colon_pairs = r"""
    (\w+)   #Match a 'word' character, one or more times, captured in group 1, followed by..
    :       #A colon, followed by...
    \s*     #Whitespace, 0 or more times, followed by...
    (\w+)   #A 'word' character, one or more times, captured in group 2.
"""

colon_pairs_per_person = 3

with open("csv1.csv", encoding='utf-8', newline='') as f:  #newline='' is what the csv module docs recommend
    next(f) #skip header line
    record = {}

    for date, info, the_type in csv.reader(f):
        info_parser = re.finditer(colon_pairs, info, flags=re.X)

        for i, match_obj in enumerate(info_parser):
            key, val = match_obj.groups()
            record[key] = val

            if (i+1) % colon_pairs_per_person == 0: #then done with info for a person
                record['Date'] = dt.datetime.strptime(date, '%Y-%m-%d') #So that you can sort the DataFrame rows by date.
                record['Type'] = the_type

                records.append(record)
                record = {}

pprint.pprint(records)
df = pd.DataFrame(
        sorted(records, key=lambda record: record['Date'])
)
print(df)
df.set_index('Date', inplace=True)
print(df)

--output:--
[{'Date': datetime.datetime(2015, 12, 7, 0, 0),
  'First': 'Jim',
  'Last': 'Jones',
  'School': 'MCAA',
  'Type': 'Old'},
 {'Date': datetime.datetime(2015, 12, 7, 0, 0),
  'First': 'Jane',
  'Last': 'Jones',
  'School': 'MCAA',
  'Type': 'Old'},
 {'Date': datetime.datetime(2015, 12, 6, 0, 0),
  'First': 'Tom',
  'Last': 'Smith',
  'School': 'MCAA',
  'Type': 'New'},
 {'Date': datetime.datetime(2015, 12, 6, 0, 0),
  'First': 'Tammy',
  'Last': 'Smith',
  'School': 'MCAA',
  'Type': 'New'}]

        Date  First   Last School Type
0 2015-12-06    Tom  Smith   MCAA  New
1 2015-12-06  Tammy  Smith   MCAA  New
2 2015-12-07    Jim  Jones   MCAA  Old
3 2015-12-07   Jane  Jones   MCAA  Old

            First   Last School Type
Date                                
2015-12-06    Tom  Smith   MCAA  New
2015-12-06  Tammy  Smith   MCAA  New
2015-12-07    Jim  Jones   MCAA  Old
2015-12-07   Jane  Jones   MCAA  Old
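
For comparison, the same reshaping can probably be done inside pandas. This is only a sketch, not the method used in the answer above: it assumes a pandas version that provides `DataFrame.explode` (0.25 or later), that every person entry has exactly the three fields First, Last and School in that order, and that values are single words. `io.StringIO` stands in for csv1.csv.

```python
import io
import pandas as pd

# Sample data copied from the answer above; io.StringIO stands in for csv1.csv.
csv_text = '''"Date","Information","Type"
"2015-12-07","First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;","Old"
"2015-12-06","First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;","New"
'''

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# One row per person: split on ';', then drop the empty piece left by the trailing ';'.
people = df.assign(Information=df["Information"].str.split(";")).explode("Information")
people["Information"] = people["Information"].str.strip()
people = people[people["Information"] != ""].reset_index(drop=True)

# Pull the three "key: value" fields out of each person's string (assumed fixed order).
pattern = r"First:\s*(?P<First>\w+),\s*Last:\s*(?P<Last>\w+),\s*School:\s*(?P<School>\w+)"
fields = people["Information"].str.extract(pattern)

# Reassemble the columns and sort by date, keeping the original order within a date.
result = (
    pd.concat([people[["Date"]], fields, people[["Type"]]], axis=1)
    .sort_values("Date", kind="stable")
    .reset_index(drop=True)
)
print(result)
```

The regex loop in the answer is more tolerant of a different number or order of keys; this version trades that flexibility for shorter code.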
