Browsing text; Python the right tool?


Problem description


I need a tool to browse text files with a size of 10-20 Mb. These
files have a fixed record length of 800 bytes (CR/LF), and contain
records used to create printed pages by an external company.

Each line (record) contains a 2-character identifier, like 'A0' or
'C1'. The identifier identifies the record format for the line,
thereby allowing different record formats to be used in a text file.
For example:

An A0 record may consist of:
recordnumber [1:4]
name [5:25]
filler [26:800]

while a C1 record consists of:
recordnumber [1:4]
phonenumber [5:15]
zipcode [16:20]
filler [21:800]

As you see, all records have a fixed column format. I would like to
build a utility which allows me (in a Windows environment) to open a
text file and browse through the records (ideally with a search
option), where each record type is displayed according to its
record format ('Attributename: Value' format). This would mean that
browsing from an A0 to a C1 record results in a different list of
attributes + values on the screen, allowing me to analyze the
generated data a lot more easily than I do now, browsing in a text
editor with a stack of printed record formats at hand.
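A minimal sketch of the "Attributename: Value" display described above, assuming (the question does not say) that the 2-character identifier sits in columns 1-2, so the example fields are shifted two columns to the right. The layouts, the `latin-1` encoding, and the function names are illustrative, not an existing tool.

```python
# Hypothetical layouts. The identifier is assumed to be in columns 1-2, so the
# example fields are shifted two columns right. Columns are 1-based, inclusive.
LAYOUTS = {
    "A0": [("recordnumber", 3, 6), ("name", 7, 27)],
    "C1": [("recordnumber", 3, 6), ("phonenumber", 7, 17), ("zipcode", 18, 22)],
}


def describe(record: str) -> list[str]:
    """Return 'Attributename: Value' lines for one fixed-width record."""
    rec_type = record[0:2]
    layout = LAYOUTS.get(rec_type)
    if layout is None:
        return [f"Unknown record type: {rec_type!r}"]
    lines = [f"Rec: {rec_type}"]
    for name, start, end in layout:
        # 1-based inclusive columns map to the Python slice [start - 1:end]
        lines.append(f"{name}: {record[start - 1:end].rstrip()}")
    return lines


def browse(path: str, search: str = "") -> None:
    """Print each record field by field, optionally only those containing `search`."""
    # latin-1 is an assumption: it maps every byte to a character, so decoding never fails.
    with open(path, encoding="latin-1", newline="") as f:
        for lineno, raw in enumerate(f, start=1):
            record = raw.rstrip("\r\n")
            if search and search not in record:
                continue
            print(f"--- record {lineno} ---")
            print("\n".join(describe(record)))
```

For example, `browse("print_job.txt", search="12345")` (hypothetical file name) would list only the records containing that text. A GUI front end could call `describe` for whichever record is currently selected.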

This is of course quite a common way of encoding data in text files.
I've tried to find a generic text-based browser which allows me to do
just this, but cannot find anything. Enter Python: I know the language
by name, and I know it handles text just fine, but I am not really
interested in learning Python just now; I just need a tool to do what
I want.

What I would REALLY like is a way to define standard record formats in a
separate definition, like:
- defining a common record length;
- defining the different record formats (attributes, position in the
line);
- and defining when a specific record format is to be used, dependent
on 1 or more identifiers in the record.
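One way to keep the formats in a separate definition is a small plain-text file read at startup. The syntax below (`RECLEN`, `ID`, `FORMAT`) is invented for this sketch, the identifier position is an assumption, and it simplifies the request by selecting the format from a single identifier slice rather than "1 or more identifiers".

```python
from dataclasses import dataclass, field

# A hypothetical definition syntax, not any standard. Columns are 1-based, inclusive.
EXAMPLE_DEFINITION = """
RECLEN 800
ID 1 2            # where the record-type identifier sits (assumed)
FORMAT A0
recordnumber 3 6
name 7 27
FORMAT C1
recordnumber 3 6
phonenumber 7 17
zipcode 18 22
"""


@dataclass
class Definition:
    record_length: int = 0
    id_cols: tuple[int, int] = (1, 2)
    formats: dict[str, list[tuple[str, int, int]]] = field(default_factory=dict)


def parse_definition(text: str) -> Definition:
    """Parse the hypothetical layout-definition syntax above."""
    d = Definition()
    current = None  # field list of the FORMAT block being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "RECLEN":
            d.record_length = int(args[0])
        elif keyword == "ID":
            d.id_cols = (int(args[0]), int(args[1]))
        elif keyword == "FORMAT":
            current = d.formats.setdefault(args[0], [])
        elif current is None:
            raise ValueError(f"field defined before any FORMAT line: {raw!r}")
        else:
            current.append((keyword, int(args[0]), int(args[1])))
    return d


def fields_for(d: Definition, record: str) -> dict[str, str]:
    """Choose the format from the identifier columns, then slice out each field."""
    start, end = d.id_cols
    layout = d.formats[record[start - 1:end]]  # KeyError for an unknown record type
    return {name: record[s - 1:e].rstrip() for name, s, e in layout}
```

Adding a new record type then only means editing the definition text, not the code.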

I CAN probably build something from scratch, but if I can (re)use
something that already exists it would be so much better and faster...
And a utility to do what I just described would be REALLY useful in
LOTS of environments.

This means I have the following questions:

1. Does anybody know of a generic tool (not necessarily Python based)
that does the job I've outlined?
2. If not, is there some framework or widget in Python I can adapt to
do what I want?
3. If not, should I consider building all this just from scratch in
Python - which would probably mean not only learning Python, but some
other GUI related modules?
4. Or should I forget about Python and build something in another
environment?

Any help would be appreciated.

Recommended answers


Here is an elementary suggestion. It would not be difficult to write a
Python script to make a csv file from your text files, adding commas at
the appropriate places to separate fields. Then the csv file can be
browsed in Excel (or some other spreadsheet). A0 and C1 records could
be written to separate csv files.
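A minimal sketch of that conversion using the standard `csv` module, writing one CSV file per record type. The layouts are hypothetical (identifier assumed in columns 1-2, example fields shifted two columns right), and the whole file is grouped in memory, which seems reasonable for 10-20 MB.

```python
import csv
from pathlib import Path

# Hypothetical layouts: identifier assumed in columns 1-2, fields shifted
# two columns right; columns are 1-based and inclusive.
LAYOUTS = {
    "A0": [("recordnumber", 3, 6), ("name", 7, 27)],
    "C1": [("recordnumber", 3, 6), ("phonenumber", 7, 17), ("zipcode", 18, 22)],
}


def group_records(lines):
    """Split fixed-width lines into per-type row lists, each starting with a header row."""
    groups: dict[str, list[list[str]]] = {}
    for raw in lines:
        record = raw.rstrip("\r\n")
        rec_type = record[:2]
        layout = LAYOUTS.get(rec_type)
        if layout is None:
            continue  # this sketch silently skips unknown record types
        rows = groups.setdefault(rec_type, [[name for name, _, _ in layout]])
        rows.append([record[s - 1:e].rstrip() for _, s, e in layout])
    return groups


def write_csvs(groups, out_dir):
    """Write one CSV file per record type, e.g. A0.csv and C1.csv."""
    for rec_type, rows in groups.items():
        path = Path(out_dir) / f"{rec_type}.csv"
        with open(path, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
```

Usage might look like `with open("print_job.txt", encoding="latin-1", newline="") as f: write_csvs(group_records(f), ".")` (hypothetical file name). Note that Excel will show a value such as `0001` as `1` unless the column is imported as text.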

There are Python programs to create Excel spreadsheets, and they could
be used to format the data in more sophisticated ways.




Paul Kooistra wrote:
> I need a tool to browse text files with a size of 10-20 Mb. These
> files have a fixed record length of 800 bytes (CR/LF), and contain
> records used to create printed pages by an external company.
>
> Each line (record) contains a 2-character identifier, like 'A0' or
> 'C1'. The identifier identifies the record format for the line,
> thereby allowing different record formats to be used in a text file.
> For example:
>
> An A0 record may consist of:
> recordnumber [1:4]
> name [5:25]
> filler [26:800]

1. Python syntax calls these [0:4], [4:25], etc. One has to get into
the habit of deducting 1 from the start column position given in a
document.
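The conversion is mechanical: Python indexes from 0 and a slice's end is exclusive, so only the start column shifts. A tiny helper (the name `col_slice` is made up for this sketch) makes the habit explicit:

```python
def col_slice(start: int, end: int) -> slice:
    """Turn a 1-based, inclusive column range from a spec into a Python slice."""
    # Python indexes from 0 and a slice's end is exclusive, so only the start shifts.
    return slice(start - 1, end)


record = "0001John Smith"
print(record[col_slice(1, 4)])   # recordnumber [1:4] -> record[0:4]
print(record[col_slice(5, 25)])  # name [5:25]        -> record[4:25]
```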

2. So where's the "A0"? Are the records really 804 bytes wide -- "A0"
plus the above plus CR LF? What is "recordnumber" -- can't be a line
number (4 digits -> max 10k; 10k * 800 -> only 8Mb); looks too small to
be a customer identifier; is it the key to a mapping that produces
"A0", "C1", etc?
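The arithmetic behind point 2 can be checked directly. Assuming "Mb" means 10^6 bytes, even the smallest file holds more than 9,999 records whether a record is 800 bytes or 800 plus CR LF, so a 4-digit field cannot number every line:

```python
def record_count(file_size: int, record_length: int) -> int:
    """Number of whole fixed-length records in a file of the given size (bytes)."""
    return file_size // record_length


MAX_4_DIGIT = 9999  # largest value a 4-digit record number can hold
for size in (10_000_000, 20_000_000):   # "10-20 Mb", read as 10^6-byte megabytes
    for reclen in (800, 802):            # 800 bytes, or 800 plus CR LF
        n = record_count(size, reclen)
        print(f"{size:>11,} bytes / {reclen} = {n:,} records, over 9999: {n > MAX_4_DIGIT}")
```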

> while a C1 record consists of:
> recordnumber [1:4]
> phonenumber [5:15]
> zipcode [16:20]
> filler [21:800]
>
> As you see, all records have a fixed column format. I would like to
> build a utility which allows me (in a Windows environment) to open a
> text file and browse through the records (ideally with a search
> option), where each record type is displayed according to its
> record format ('Attributename: Value' format). This would mean that
> browsing from an A0 to a C1 record results in a different list of
> attributes + values on the screen, allowing me to analyze the
> generated data a lot more easily than I do now, browsing in a text
> editor with a stack of printed record formats at hand.

> This is of course quite a common way of encoding data in text files.
> I've tried to find a generic text-based browser which allows me to do
> just this, but cannot find anything. Enter Python: I know the language
> by name, and I know it handles text just fine, but I am not really
> interested in learning Python just now; I just need a tool to do what
> I want.

> What I would REALLY like is a way to define standard record formats in a
> separate definition, like:
> - defining a common record length;
> - defining the different record formats (attributes, position in the
> line);
Add in the type, number of decimal places, etc as well ..
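A sketch of what typed fields could look like, using the standard `decimal` module for implied decimal places. The type codes (`A`, `N`), the field tuple shape, and the `amount` field are all invented for illustration; signed, overpunched, or packed numeric fields are not handled.

```python
from decimal import Decimal

# Hypothetical field spec: (name, start, end, type, decimals), 1-based inclusive.
# 'A' = alphanumeric, 'N' = unsigned numeric with implied decimal places.
FIELDS = [
    ("recordnumber", 3, 6, "N", 0),
    ("amount", 7, 15, "N", 2),  # invented example field, not from the question
]


def convert(raw: str, ftype: str, decimals: int = 0):
    """Turn a raw fixed-width field into a Python value according to its type."""
    if ftype == "A":
        return raw.rstrip()
    if ftype == "N":
        digits = raw.strip() or "0"  # treat an all-blank numeric field as zero
        return Decimal(digits).scaleb(-decimals)  # "001234" with 2 decimals -> 12.34
    raise ValueError(f"unknown field type {ftype!r}")


def parse(record: str) -> dict:
    """Convert every field in FIELDS for one record."""
    return {name: convert(record[s - 1:e], t, dec) for name, s, e, t, dec in FIELDS}
```

`Decimal` is used rather than `float` so that money amounts keep their exact decimal digits.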
> - and defining when a specific record format is to be used, dependent
> on 1 or more identifiers in the record.

> I CAN probably build something from scratch, but if I can (re)use
> something that already exists it would be so much better and faster...
> And a utility to do what I just described would be REALLY useful in
> LOTS of environments.

> This means I have the following questions:
>
> 1. Does anybody know of a generic tool (not necessarily Python based)
> that does the job I've outlined?
No, but please post if you hear of one.
> 2. If not, is there some framework or widget in Python I can adapt to
> do what I want?
> 3. If not, should I consider building all this just from scratch in
> Python - which would probably mean not only learning Python, but some
> other GUI related modules?
Approach I use is along the lines of what you suggested, but w/o the
GUI.
I have a Python script that takes layout info and an input file and can
produce an output file in one of two formats:

Format 1:
something like:
Rec:A0 recordnumber:0001 phonenumber:(123) 555-1234 zipcode:12345

This is usually much shorter than the fixed length record, because you
leave out the fillers (after checking they are blank!), and strip
trailing spaces from alphanumeric fields. Whether you leave integers,
money, date etc fields as per file or translated into human-readable
form depends on who will be reading it.
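A minimal sketch of a Format 1 writer along these lines: fillers are verified blank and then omitted, and trailing spaces are stripped. The layouts (with an `F` kind marking fillers) are hypothetical, with the identifier assumed in columns 1-2; this is not the poster's actual script.

```python
# Hypothetical layouts with a kind per field: 'A' = alphanumeric, 'F' = filler.
# Identifier assumed in columns 1-2; columns are 1-based and inclusive.
LAYOUTS = {
    "A0": [("recordnumber", 3, 6, "A"), ("name", 7, 27, "A"), ("filler", 28, 800, "F")],
    "C1": [("recordnumber", 3, 6, "A"), ("phonenumber", 7, 17, "A"),
           ("zipcode", 18, 22, "A"), ("filler", 23, 800, "F")],
}


def format1(record: str) -> str:
    """Render one fixed-width record as a single 'Rec:XX name:value ...' line."""
    rec_type = record[:2]
    parts = [f"Rec:{rec_type}"]
    for name, start, end, kind in LAYOUTS[rec_type]:
        value = record[start - 1:end]
        if kind == "F":
            # Leave fillers out, but only after checking they really are blank.
            if value.strip():
                raise ValueError(f"non-blank {name} in {rec_type} record")
            continue
        parts.append(f"{name}:{value.rstrip()}")  # strip trailing spaces
    return " ".join(parts)


def convert_file(src: str, dst: str) -> None:
    """Write the Format 1 rendering of every record in src to dst."""
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1") as fout:
        for raw in fin:
            fout.write(format1(raw.rstrip("\r\n")) + "\n")
```

Raising on a non-blank filler, rather than silently dropping it, means the shorter output never hides data.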

You then use a robust text editor (preferably one which supports
regular expressions in its find function) to browse the output file.
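Because each record becomes one line, the same kind of regular-expression query an editor would run can also be scripted with Python's standard `re` module. The sample lines and the query below are invented for illustration:

```python
import re


def grep(lines, pattern: str):
    """Return (line number, line) pairs whose text matches the regular expression."""
    rx = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, start=1) if rx.search(line)]


# Example query: C1 records whose zipcode begins with 123.
output = [
    "Rec:A0 recordnumber:0001 name:John Smith",
    "Rec:C1 recordnumber:0002 phonenumber:01234567890 zipcode:12345",
]
for n, line in grep(output, r"^Rec:C1 .*\bzipcode:123"):
    print(n, line)
```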

Format 2:
Rec:A0
recordnumber:0001
etc etc, i.e. one field per line. Why, you ask? If you are a consumer of
such files, you can take small chunks of this and drop them into
Excel; testers take a copy, make lots of juicy test data, and run it
through another script which makes a flat file out of it.
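A sketch of Format 2 and of the reverse step, rebuilding a flat record from "name:value" lines. The layouts and the identifier position (columns 1-2) are assumptions, and every field is left-justified and blank-padded; real numeric fields would likely need right-justified, zero-padded output instead.

```python
# Hypothetical layouts: identifier assumed in columns 1-2, 1-based inclusive columns.
RECORD_LENGTH = 800
LAYOUTS = {
    "A0": [("recordnumber", 3, 6), ("name", 7, 27)],
    "C1": [("recordnumber", 3, 6), ("phonenumber", 7, 17), ("zipcode", 18, 22)],
}


def to_format2(record: str) -> list[str]:
    """One 'name:value' line per field, preceded by a 'Rec:XX' line."""
    rec_type = record[:2]
    lines = [f"Rec:{rec_type}"]
    for name, start, end in LAYOUTS[rec_type]:
        lines.append(f"{name}:{record[start - 1:end].rstrip()}")
    return lines


def from_format2(lines: list[str]) -> str:
    """Rebuild a fixed-width record from Format 2 lines (the inverse of to_format2)."""
    header, *field_lines = lines
    rec_type = header.split(":", 1)[1]
    values = dict(line.split(":", 1) for line in field_lines)
    buf = [" "] * RECORD_LENGTH
    buf[0:2] = rec_type  # assumed identifier position
    for name, start, end in LAYOUTS[rec_type]:
        width = end - start + 1
        value = values.get(name, "")
        if len(value) > width:
            raise ValueError(f"{name} is longer than {width} characters")
        buf[start - 1:end] = value.ljust(width)  # left-justified, blank-padded
    return "".join(buf)
```

Round-tripping a record through `to_format2` and `from_format2` should give back the original line, which is a cheap check on the layout definitions themselves.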
> 4. Or should I forget about Python and build something in another
> environment?







No way!