Splitting tables


Problem description



Hi, I have a problem with a small python program I'm trying to write
and I hope somebody may help me. I'm working on tables of this kind:

CGA 1988 06 21 13 48 G500-050 D 509.62 J.. R1 1993 01 28 00 00 880006
CGA 1988 06 21 14 04 G500-051 D 550.62 J.. R1 1993 01 28 00 00 880007

I have to read each line of the table and put it into comma-separated
lists like these for later manipulation:

CGA,1988,06,21,13,48,G500-050,D,509.62,J..,R1,1993,01,28,00,00,880006
CGA,1988,06,21,14,04,G500-051,D,550.62,J..,R1,1993,01,28,00,00,880007

The 'split' function works pretty well, except when there is an error in
the original data table. For example if an element is missing in a line,
like this:

CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065

This error happens quite often in my dataset and the tables are too
large to check for it manually. In this case what I get splitting the
line string is of course this:

CGA,1990,08,15,13,16,G500-105,D,524.45,J..,R1,1993,01,29,00,00,900069
CGA,1990,08,16,01,22,D,508.06,J..,R1,1993,01,27,00,00,900065

And when the program tries to work on the second list it stops (of course!).
Is there any way to avoid this problem? This kind of error happens quite
often in my dataset and the tables are usually too large to check for it
manually. Thanks a lot for any suggestions.

R

Recommended answer



r> The 'split' function works pretty well, except when there is an error in
r> the original data table. For example if an element is missing in a line,
r> like this:

r> CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
r> CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065

If the data are truly fixed width, just slice the strings:

>>> s = 'CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065'
>>> s[0:3], s[4:8], s[9:11], s[12:14], s[15:17], s[18:20], s[21:29], s[30:31]
('CGA', '1990', '08', '16', '01', '22', '        ', 'D')

Skip
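
A minimal sketch of the slicing idea applied to the whole row. The column boundaries in FIELD_SLICES and the helper name slice_fields are assumptions read off the two sample rows, not something given in the thread, so they would need adjusting to the real file layout.

```python
# Column boundaries (start, end) are an assumption taken from the sample rows.
FIELD_SLICES = [
    (0, 3), (4, 8), (9, 11), (12, 14), (15, 17), (18, 20),
    (21, 29), (30, 31), (32, 38), (39, 42), (43, 45),
    (46, 50), (51, 53), (54, 56), (57, 59), (60, 62), (63, 69),
]


def slice_fields(line):
    """Cut one fixed-width row into fields; a blank column becomes ''."""
    return [line[start:end].strip() for start, end in FIELD_SLICES]


good = "CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069"
# The missing 8-character field is written out explicitly as padding.
bad = "CGA 1990 08 16 01 22 " + " " * 8 + " D 508.06 J.. R1 1993 01 27 00 00 900065"

print(",".join(slice_fields(good)))
print(",".join(slice_fields(bad)))
```

Both rows come back with 17 fields, and the missing one shows up as an empty string in position 6 instead of shifting everything after it.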


On 7 Feb 2004, robsom <no*****@no.mail.it> wrote:
And when the program tries to work on the second list it stops (of course!).
Is there any way to avoid this problem? This kind of error happens quite




What do you want to be done? To see if an item is missing is trivial:
just check the length of the split line (a list). But what the right
action is in that case is up to you: should the user be asked? Is it
always the same column that is missing? Is it possible to reliably
distinguish the entries from each other, so the program can decide which
column is missing?

KP

--
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe. "Lewis Carroll" "Jabberwocky"
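
The length check suggested in this reply could look something like the sketch below. EXPECTED_FIELDS = 17 is counted from the complete sample rows, and split_rows is a hypothetical helper name; what to do with the suspect rows afterwards is still an open choice.

```python
EXPECTED_FIELDS = 17  # assumption: number of columns in a complete sample row


def split_rows(lines):
    """Split whitespace-separated rows into (complete, suspect) lists."""
    complete, suspect = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) == EXPECTED_FIELDS:
            complete.append(fields)
        else:
            # Keep the line number so the row can be found again later.
            suspect.append((number, line))
    return complete, suspect


table = [
    "CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069",
    "CGA 1990 08 16 01 22 D 508.06 J.. R1 1993 01 27 00 00 900065",
]
complete, suspect = split_rows(table)
print(len(complete), "complete rows;", "suspect line numbers:", [n for n, _ in suspect])
```

This only detects that something is missing. It cannot say which column it was, which is why the questions above matter.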


On Sat, 7 Feb 2004 20:08:50 +0000 (UTC), robsom <no*****@no.mail.it> wrote:

Hi, I have a problem with a small python program I'm trying to write
and I hope somebody may help me. I'm working on tables of this kind:

CGA 1988 06 21 13 48 G500-050 D 509.62 J.. R1 1993 01 28 00 00 880006
CGA 1988 06 21 14 04 G500-051 D 550.62 J.. R1 1993 01 28 00 00 880007

I have to read each line of the table and put it into comma-separated
lists like these for later manipulation:

CGA,1988,06,21,13,48,G500-050,D,509.62,J..,R1,1993,01,28,00,00,880006
CGA,1988,06,21,14,04,G500-051,D,550.62,J..,R1,1993,01,28,00,00,880007

The 'split' function works pretty well, except when there is an error in
the original data table. For example if an element is missing in a line,
like this:

CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065

This error happens quite often in my dataset and the tables are too
large to check for it manually. In this case what I get splitting the
line string is of course this:

CGA,1990,08,15,13,16,G500-105,D,524.45,J..,R1,1993,01,29,00,00,900069
CGA,1990,08,16,01,22,D,508.06,J..,R1,1993,01,27,00,00,900065

And when the program tries to work on the second list it stops (of course!).
Is there any way to avoid this problem? This kind of error happens quite
often in my dataset and the tables are usually too large to check for it
manually. Thanks a lot for any suggestions.

>>> s = """\
... CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
... CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065
... """
>>> import re
>>> rxo = re.compile(
...     '(...) (....) (..) (..) (..) (..) (........) (.) '
...     '(......) (...) (..) (....) (..) (..) (..) (..) (......)'
... )
>>> import csv
>>> import sys
>>> writer = csv.writer(sys.stdout)
>>> for line in s.splitlines(): writer.writerow(*rxo.findall(line))
...
CGA,1990,08,15,13,16,G500-105,D,524.45,J..,R1,1993,01,29,00,00,900069
CGA,1990,08,16,01,22,        ,D,508.06,J..,R1,1993,01,27,00,00,900065

To write the csv lines to a file instead of sys.stdout, substitute (untested)
open('your_csv_output_file.csv', 'w') in place of sys.stdout in the above, and get your
lines from something like (note chopping off the trailing newline)

    for line in open('your_table_file'):
        line = line.rstrip('\n')

instead of

    for line in s.splitlines()

If you have possible short lines that create no match, you'll need to check for those
before unpacking (by using the prefixed *) into writer.writerow's arg list.

That's it for clp today ;-)

Regards,
Bengt Richter
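
For the short-line caveat, one option is to test each match before writing instead of unpacking findall directly. This is only a sketch, assuming the same column layout as the pattern in this reply; convert and ROW are hypothetical names, and unlike the output shown above it strips the padding, so a missing field becomes an empty CSV cell rather than eight spaces.

```python
import csv
import io
import re

# Same fixed-width layout as the pattern above: one group per column.
ROW = re.compile(
    r"(...) (....) (..) (..) (..) (..) (........) (.) "
    r"(......) (...) (..) (....) (..) (..) (..) (..) (......)"
)


def convert(lines, out):
    """Write matching rows as CSV to `out`; return the lines that did not match."""
    writer = csv.writer(out, lineterminator="\n")
    rejected = []
    for line in lines:
        match = ROW.match(line.rstrip("\n"))
        if match is None:
            rejected.append(line)  # too short or otherwise malformed
            continue
        writer.writerow([field.strip() for field in match.groups()])
    return rejected


table = [
    "CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069\n",
    "CGA 1990 08 16 01 22 " + " " * 8 + " D 508.06 J.. R1 1993 01 27 00 00 900065\n",
    "CGA 1990\n",  # deliberately truncated row
]
out = io.StringIO()
rejected = convert(table, out)
print(out.getvalue(), end="")
print("rejected:", rejected)
```

With a real file, `out` could be something like open('your_csv_output_file.csv', 'w', newline='') and `lines` the opened table file. The rejected list can then be logged or checked by hand.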

