Separate the contents of a .txt file into multiple cells in a .csv file


Question

I'm using Python 2.7, and I have a txt file like this, which I'm opening with Python:

TIME    FLIGHT  FROM    AIRLINE AIRCRAFT        STATUS
8:40 AM LH1334  
Frankfurt (FRA)
Lufthansa   A320 (D-AIPP)   
Landed 8:40 AM
8:45 AM OK786   
Prague (PRG)
Czech Airlines  AT45 (OK-KFP)   
Landed 8:32 AM

I want to export it to CSV with six columns (Time, Flight, From, Airline, Aircraft, Status). This is the result I want:

TIME            FLIGHT  FROM            AIRLINE         AIRCRAFT      STATUS
Jul 21 8:40 AM  LH1334  Frankfurt (FRA) Lufthansa   A320 (D-AIPP) Landed 8:40 AM
...

It's a bit hard for me because a single field can contain multiple words, so I don't have a good idea of how to get to this form.

My code:

import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd

def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]

f = open('proba.txt', 'r')
x = f.read()

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

maindatatable = to_2d(x, 6)
print maindatatable
output.writerows(x)

resultcsv.close()

Recommended answer

It looks like the records are grouped as four lines each.

We can process the first line

8:40 AM LH1334

as follows:

import re

# Raw string avoids escape warnings; \s+ accepts a tab or a space between the fields
matches = re.match(r'(\d{1,2}:\d{2} [AP]M)\s+(\w+\d+)', line)
time = matches.group(1)
flight = matches.group(2)

Edit: the regex is overkill. There is a tab separating the two fields, so it's actually very easy:

time, flight = line.strip().split('\t')  # strip() drops the trailing tab/newline
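
One caveat, assuming the raw lines end with a trailing tab and a newline (the sample listing suggests trailing whitespace, but the exact bytes are an assumption): a bare `split('\t')` would then return three pieces and the two-name unpacking would fail. A minimal sketch of the difference, using a made-up `line` value and Python 3 syntax:

```python
# Hypothetical raw line: tab-separated, with a trailing tab and newline.
line = '8:40 AM\tLH1334\t\n'

# Without stripping, the trailing tab produces a third piece.
raw_fields = line.split('\t')

# strip() removes the trailing tab and newline, leaving exactly two fields.
fields = line.strip().split('\t')
time_, flight = fields  # time_ avoids shadowing the time module

print(raw_fields)
print(fields)
```
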

The second line:

Frankfurt (FRA)

is trivial:

from_ = line

The third line:

Lufthansa   A320 (D-AIPP)

can be processed with:

airline, aircraft = line.split('\t')

The fourth line:

Landed 8:40 AM

is also trivial:

status = line


Altogether, you can process them in batches of four lines each:

from itertools import islice

with open('my.txt') as f:
    header = f.readline()  # skip header

    while True:
        # read four lines
        lines = list(islice(f, 4))
        if len(lines) < 4:
            break

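        # strip() removes the newline (and any trailing tab) from each line
        time, flight = lines[0].strip().split('\t')
        from_ = lines[1].strip()
        airline, aircraft = lines[2].strip().split('\t')
        status = lines[3].strip()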

        # Output a row into your csv file here
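
To fill in the output step, here is a minimal end-to-end sketch rather than a definitive implementation. It is written for Python 3 and uses the standard `csv` module instead of the `unicodecsv` package from the question. The helper name `parse_flights`, the `HEADER` list, and the in-memory `sample` text are illustrative assumptions, as is the layout of the input (tab-separated fields, possibly with a trailing tab). The `;` delimiter is taken from the question's code.

```python
import csv
import io
from itertools import islice

HEADER = ['TIME', 'FLIGHT', 'FROM', 'AIRLINE', 'AIRCRAFT', 'STATUS']

def parse_flights(lines):
    """Yield one six-field row for every group of four input lines."""
    it = iter(lines)
    next(it, None)  # skip the header line of the text file
    while True:
        # Read four lines and drop trailing tabs/newlines from each.
        group = [line.strip() for line in islice(it, 4)]
        if len(group) < 4:
            break  # end of input or incomplete record
        time_, flight = group[0].split('\t')
        airline, aircraft = group[2].split('\t')
        yield [time_, flight, group[1], airline, aircraft, group[3]]

# Assumed input layout, mirroring the sample in the question.
sample = (
    'TIME\tFLIGHT\tFROM\tAIRLINE\tAIRCRAFT\tSTATUS\n'
    '8:40 AM\tLH1334\t\n'
    'Frankfurt (FRA)\n'
    'Lufthansa\tA320 (D-AIPP)\t\n'
    'Landed 8:40 AM\n'
    '8:45 AM\tOK786\t\n'
    'Prague (PRG)\n'
    'Czech Airlines\tAT45 (OK-KFP)\t\n'
    'Landed 8:32 AM\n'
)

rows = list(parse_flights(io.StringIO(sample)))

# Write to an in-memory buffer here; a real file works the same way.
out = io.StringIO()
writer = csv.writer(out, delimiter=';')
writer.writerow(HEADER)
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

To write to disk instead, replace the `io.StringIO()` buffer with `open('output.csv', 'w', newline='')` and pass an open text file to `parse_flights`. The sketch does not add a date such as `Jul 21` to the time column, because the input shown does not contain one.
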
