Separate the .txt file contents into multiple cells in a .csv file
Question
I'm using Python 2.7, and I have a txt file like this, which I open with Python:
TIME FLIGHT FROM AIRLINE AIRCRAFT STATUS
8:40 AM LH1334
Frankfurt (FRA)
Lufthansa A320 (D-AIPP)
Landed 8:40 AM
8:45 AM OK786
Prague (PRG)
Czech Airlines AT45 (OK-KFP)
Landed 8:32 AM
I want to export it to a csv file with 6 columns (Time, Flight, From, Airline, Aircraft, Status), so that I get this:
TIME FLIGHT FROM AIRLINE AIRCRAFT STATUS
Jul 21 8:40 AM LH1334 Frankfurt (FRA) Lufthansa A320 (D-AIPP) Landed 8:40 AM
...
It's a little hard for me, because a single field can contain several words, so I don't have a useful idea of how to get to this form.
My code:
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]
f = open('proba.txt', 'r')
x = f.read()
filename=r'output.csv'
resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
maindatatable = to_2d(x, 6)
print maindatatable
output.writerows(x)
resultcsv.close()
Recommended answer
It looks like the records are grouped as four lines each.
We can process the first line
8:40 AM LH1334
as follows:
import re
matches = re.match(r'(\d{1,2}:\d{2} [AP]M) (\w+\d+)', line)
time = matches.group(1)
flight = matches.group(2)
Edit: the regex above is overkill. A tab separates the two fields, so it's actually very easy:
time, flight = line.split('\t')
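If it is not certain that every first line really contains a tab, the two approaches can be combined. The following is only a sketch: the helper name split_time_flight and the fallback pattern are assumptions for illustration, not part of the original answer. It tries the tab split first and falls back to a regex when no tab is present.

```python
import re

# Hypothetical fallback pattern: a time such as "8:40 AM", whitespace, then the flight code.
TIME_FLIGHT_RE = re.compile(r'(\d{1,2}:\d{2} [AP]M)\s+(\S+)')


def split_time_flight(line):
    """Return (time, flight) from a line such as '8:40 AM<TAB>LH1334'."""
    line = line.strip()
    if '\t' in line:
        # The easy case described above: the two fields are tab-separated.
        time_, flight = line.split('\t', 1)
        return time_.strip(), flight.strip()
    # No tab found, so fall back to matching the time format explicitly.
    match = TIME_FLIGHT_RE.match(line)
    if match is None:
        raise ValueError('unrecognised time/flight line: %r' % line)
    return match.group(1), match.group(2)
```

The variable is called time_ rather than time so that it does not shadow the time module imported in the question's code.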
The second line:
Frankfurt (FRA)
is trivial:
from_ = line
The third line:
Lufthansa A320 (D-AIPP)
can be processed with:
airline, aircraft = line.split('\t')
The fourth line:
Landed 8:40 AM
is also trivial:
status = line
Altogether, you can process them in batches of four lines each:
from itertools import islice

with open('my.txt') as f:
    header = f.readline()  # skip header
    while True:
        # read four lines, dropping the trailing newline of each
        lines = [line.rstrip('\n') for line in islice(f, 4)]
        if len(lines) < 4:
            break
        time, flight = lines[0].split('\t')
        from_ = lines[1]
        airline, aircraft = lines[2].split('\t')
        status = lines[3]
        # Output a row into your csv file here
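To fill in the "output a row" step, here is one possible sketch using the standard csv module. It is written for Python 3 (the question uses 2.7) and assumes tab-separated fields as described above. The function names records and write_csv are made up for this example, and the ';' delimiter is taken from the question's code.

```python
import csv
from itertools import islice


def records(lines):
    """Yield one six-field row for every complete group of four input lines."""
    it = iter(lines)
    next(it, None)  # skip the header line
    while True:
        group = [line.rstrip('\r\n') for line in islice(it, 4)]
        if len(group) < 4:
            break  # end of input, or an incomplete trailing record
        time_, flight = group[0].split('\t')
        airline, aircraft = group[2].split('\t')
        yield [time_, flight, group[1], airline, aircraft, group[3]]


def write_csv(lines, out):
    """Write a header row plus all parsed records to the file-like object out."""
    writer = csv.writer(out, delimiter=';')
    writer.writerow(['TIME', 'FLIGHT', 'FROM', 'AIRLINE', 'AIRCRAFT', 'STATUS'])
    writer.writerows(records(lines))
```

With real files this could be called as write_csv(open('proba.txt'), open('output.csv', 'w', newline='')). The date prefix shown in the desired output ("Jul 21") does not appear in the input file, so this sketch does not try to produce it.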