Mail extraction problem (something's wrong with split methods)


Problem description

Hello,

I have a little problem, and although it's little it's extremely difficult
for me to describe, but I'll try.
I have written a program which extracts certain portions of my received
e-mail. The content of the e-mail is actually predictable: it has one very
long list of numbers, looking something like this:

[34234,35435,657789,6756735,12312378,09678567,23424]

Of course I cannot manipulate my mail while connected to the POP3 server,
so I decided to transfer the mail locally, write it to a file, and then
manipulate it. Another problem is that in e-mails there is a lot of output,
garbage characters and all sorts of nasty things, but somehow I managed
to solve it (to download the e-mail and extract the interesting parts), and here
is how (I'll only show the "interesting parts" part):

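# Read the whole saved message into one string.
temp = [mail.read()]
enc_txt = "\n".join(temp)
# Locate the text between ", '[" and "]', ", which holds the number list.
begin = enc_txt.find(", '[")+len(", '[")
ending = enc_txt.find("]', ")

enc_txt2 = (enc_txt[begin:ending])
mail.close()
# Join the wrapped lines back into one line, then split on commas.
lines = enc_txt2.splitlines()
enc_txt3 = ' '.join([line.strip() for line in lines])
split = re.split(",", enc_txt3)
# Convert every piece to an integer (the two lines below are equivalent).
enc = [int(elem) for elem in split]
enc = map(int, split)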

And this code works! But there is a problem! When the list of numbers is
longer than 350 bytes, at the 350th place I don't get a number, but
some quotes and commas and strange things. When the list is longer than
700 bytes, this problem occurs twice (actually it does not occur because
the interpreter complains, but there are two mistakes of this type). Is there
something I'm missing? Can the split methods handle more than 350 bytes of
text? What's actually happening?

To make it clearer (because I think you will not understand it
completely) I could upload the errors, but the log is large, so I'll shorten
it.

[6964, 7086, 3211, 7522, 9472, 3265, 3610, 104, 9729, 6706, 8035, 5439,
7142, 360, 677, 1667, 1382, 9417, 4493, 8289, 9613, 3470, 889, 1021, 3381,
3480, 2483, 6579, 8928, 3240, 4437, 5908, 2290, 9587, 866, 202, 859, 2184,
8328, ..........] - the list of numbers 705 bytes long.

When I run the program (with the command print split inside my code, to see
what's going on):

['6964', ' 7086', ' 3211', ' 7522', ' 9472', ' 3265', ' 3610', ' 104', '
9729', ' 6706', ' 8035', ' 5439', ' 7142', ' 360', ' 677', ' 1667', '
1382', ' 9417', ' 4493', ' 8289', ' 9613', ' 3470', ' 889', ' 1021', '
3381', ' 3480', ' 2483', ' 6579', ' 8928', ' 3240', ' 4437', ' 5908', '
2290', ' 9587', ' 866', ' 202', ' 859', ' 2184', ' 8328', ..... " 6730'",
" '", ' 6793'...... , " '", " '6573", ' 869'...]

File "OTPAenc_dec.py", line 258, in decr
enc = [int(elem) for elem in split]
ValueError: invalid literal for int(): 6730'

Please help me, any help will be appreciated.

Thanks in advance.

Sorry for my bad English and my bad expression style, I really don't know
how to explain it more thoroughly.


Recommended answer

> File "OTPAenc_dec.py", line 258, in decr
> enc = [int(elem) for elem in split]
> ValueError: invalid literal for int(): 6730'

The problem is the trailing ' in your number - that of course can't be
converted. And I see that the number 6573 has similar problems - it has a
leading '.

So your splitting code does not work, or your data is malformed - without
more information, I can't say anything about that, but it seems to me the
latter is the case.
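
One way to tell the two explanations apart is to list every token that int() rejects, together with its position in the split result. The following is only a sketch in Python 3 (the thread itself uses Python 2); the helper name `find_bad_tokens` and the sample string are made up for illustration and are not the poster's real data.

```python
def find_bad_tokens(text, sep=","):
    """Return (index, token) pairs for tokens that int() cannot parse."""
    bad = []
    for index, token in enumerate(text.split(sep)):
        try:
            int(token)  # int() tolerates surrounding whitespace, but not quotes
        except ValueError:
            bad.append((index, token))
    return bad


# Hypothetical sample: clean numbers mixed with stray quote characters,
# similar to the ones visible in the printed list.
sample = "6964, 7086, 6730', ', 6793, '6573, 869"
bad_tokens = find_bad_tokens(sample)
print(bad_tokens)
```

If the positions of the bad tokens fall at regular intervals, that points at something in the saved data (for example a wrap or chunk boundary) rather than at split itself.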
--
Regards,

Diez B. Roggisch


On Sat, 11 Sep 2004 19:01:35 +0200, Diez B. Roggisch wrote:

> The problem is the trailing ' in your number - that of course can't be
> converted. And I see that the number 6573 has similar problems - it has a
> leading '.

Yes, I know that, but I don't understand why it works normally for lists
under 350 bytes? It works perfectly...

> So your splitting code does not work, or your data is malformed -
> without more information, I can't say anything about that, but it seems
> to me the latter is the case.





Data is actually not malformed, because before splitting it looks normal
(I mean, no ' or double quotes or other strange characters). The splitting
code is the problem, and I don't know how to fix it. I mean, if it were
wrong, the smaller lists wouldn't work either, but it seems the
problems occur with big lists.


> Yes, I know that, but I don't understand why it works normally for lists
> under 350 bytes? It works perfectly...







That certainly has _nothing_ to do with the size of 350 - this snippet works
perfectly:

len(",".join([str(i) for i in xrange(20000)]).split(','))
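
For readers on Python 3, where xrange no longer exists, the same check could be written roughly as follows. This is a sketch of the idea rather than the original snippet; the variable names are chosen here for illustration.

```python
# Build a comma-separated string far longer than 350 bytes.
numbers = list(range(20000))
text = ",".join(str(n) for n in numbers)

# Split it again and convert every piece back to an integer.
parts = text.split(",")
round_trip = [int(p) for p in parts]

print(len(text))   # total length of the joined string in characters
print(len(parts))  # number of pieces returned by split
```

The round trip returns exactly the numbers that went in, so the length of the string alone does not make split misbehave.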

>> So your splitting code does not work, or your data is malformed -
>> without more information, I can't say anything about that, but it seems
>> to me the latter is the case.



> Data is actually not malformed, because before splitting it looks normal
> (I mean, no ' or double quotes or other strange characters). The splitting
> code is the problem, and I don't know how to fix it. I mean, if it were
> wrong, the smaller lists wouldn't work either, but it seems the
> problems occur with big lists.







As I proved above, it has nothing to do with that. Unless you provide the actual
data I can't say more. I can only guess that 350 bytes has something to do
with line boundaries or similar stuff - you hit some sort of special case
you didn't think of, or something like that.

Do post the data, and I'm sure things will be sorted out soon.
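
The raw data is never shown in the thread, so the real cause stays unknown. Purely as an illustration of the line-boundary guess, the Python 3 sketch below assumes the number list was saved as the repr() of several text chunks, which would leave quote characters at every chunk boundary. The chunk contents and all variable names are hypothetical.

```python
import re

# Hypothetical input: assume the number list was saved as the repr() of
# several text chunks, so quote characters sit at each chunk boundary.
chunks = ["[6964, 7086, 3211,", " 7522, 9472, 3265]"]
stored = repr(chunks)

# Splitting on commas keeps the stray quotes and brackets attached to tokens.
tokens = stored.split(",")
dirty = [t for t in tokens if not t.strip().isdigit()]

# Pulling out runs of digits ignores quotes, brackets, and whitespace.
numbers = [int(m) for m in re.findall(r"\d+", stored)]

print(dirty)
print(numbers)
```

Extracting digit runs is forgiving, but it would also hide a boundary that falls in the middle of a number (the two halves would come out as two separate numbers), so inspecting the raw text first is still the safer step.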

--
Regards,

Diez B. Roggisch

