Is this possible in Perl? Mail archive data manipulation


Problem description

Hello all;

I have a .csv file that contains messages exported from one discussion forum that I want to import into another forum (phpBB), but I need to do some data manipulation on the original export file first.

The original .csv file contains a subject or topic field on each line. Subsequent lines contain replies and other subjects. The file is in chronological order.

Here is a sample layout:

ForumID,Subject,UNIX TimeStamp,Poster_Name,Message
12,Wings,830799174,John Doe,Lorem ipsum...
13,Wheels,831213087,Jane Q. Public,Dolor sit amet...
12,RE: Wings,860225019,Bob Builder,Praesent adipiscing sapien ut purus....

I need to insert a topic_id in each line of the .csv file. The topic_id for the sample above would be 1 for the first and third line and 2 for the second line.

Here's how I imagine this automated insertion of topic_id would work:

A script would look at each row and read the Subject, then compare it to a table with Subjects. If the Subject does not exist, then it would create a topic_id for that subject and assign it to the row and write the topic_id and subject to the table. If the subject already exists, then it would look up the topic_id from the table and assign it to the row, and so on.
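The lookup described above is essentially a hash (dictionary) keyed on the subject. Below is a minimal sketch of that idea, written in Python rather than Perl purely to illustrate the logic; the function name `assign_topic_ids` is hypothetical, and it assumes reply prefixes such as "RE:" have already been removed from the subjects.

```python
def assign_topic_ids(subjects):
    """Return one topic_id per subject, in the same order as the input."""
    topic_table = {}  # subject -> topic_id; plays the role of the "table with Subjects"
    topic_ids = []
    for subject in subjects:
        if subject not in topic_table:
            # First time this subject is seen: give it the next unused id.
            topic_table[subject] = len(topic_table) + 1
        topic_ids.append(topic_table[subject])
    return topic_ids


# Subjects from the sample layout, with the reply prefix already removed.
print(assign_topic_ids(["Wings", "Wheels", "Wings"]))  # prints [1, 2, 1]
```

Because every lookup and insert on the dictionary is a constant-time operation on average, a file of about 35,000 posts is a small job for this approach.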

I think the "RE:" in each follow-up post may be a complicating factor, but since all posts are in chronological order, the "RE:" could be stripped from the Subject column.
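Stripping the prefix could be done with a single regular expression before the subject is looked up. The sketch below is in Python for illustration only; `normalize_subject` is a hypothetical helper, and treating the prefix as case-insensitive and possibly repeated ("Re: RE: ...") is an assumption, since the sample only shows a single "RE: ".

```python
import re

# One or more leading "RE:" markers, any letter case, with optional spaces
# around the colon. Subjects that merely start with "Re" (no colon) are untouched.
REPLY_PREFIX = re.compile(r"^\s*(?:re\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Return the subject with leading reply markers and outer whitespace removed."""
    return REPLY_PREFIX.sub("", subject).strip()


print(normalize_subject("RE: Wings"))  # prints Wings
```

Normalizing first means that "Wings" and "RE: Wings" map to the same key, so the reply ends up with the topic_id of the original post.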

A larger sample of the actual file can be seen here:
http://www.smofco.com/dicky/Sample_fields_for_import_Rev2.csv

The complete file contains about 35,000 lines or posts.

Is Perl a good programming language for such a task?

I hate to shamelessly ask for assistance in writing this, but my knowledge of Perl is minimal. However, I am willing to learn! So, any pointers, hints, ideas, code samples, etc. will be greatly appreciated.

Cheers,
Omar Filipovic

Recommended answer

Where does the topic id get inserted?




Anywhere; it can be at the end of each line, for example.

Or it can be in a separate output file and I can later simply add this file as a column in Excel to the existing csv file, as long as that output file is in the same sort order as the original csv and each topic_id is on a separate line. I know it's not an elegant solution, but it would work.

However, if somehow a row got skipped and at the end there are 30,000 posts and 29,999 topic_ids then I'm stuck...

I can also separate out only the Subject column from the csv file if that will simplify things.
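One way to avoid the skipped-row problem is to write the topic_id in the same pass that copies each row, so the two can never drift apart. The following is a rough sketch in Python (not Perl), under several assumptions: the function name `add_topic_ids` is hypothetical, the Subject is the second column as in the sample layout, the first line is a header, and the file is ordinary comma-separated text that a standard CSV parser can read.

```python
import csv
import io
import re

REPLY_PREFIX = re.compile(r"^\s*(?:re\s*:\s*)+", re.IGNORECASE)


def add_topic_ids(src, dst, subject_column=1):
    """Copy CSV rows from src to dst with a topic_id appended to each row.

    Returns the number of data rows written. Raises ValueError instead of
    silently skipping a row that has no Subject field.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(next(reader) + ["topic_id"])  # copy the header line
    topic_table = {}  # normalized subject -> topic_id
    written = 0
    for row in reader:
        if not row:
            continue  # a completely blank line is not a post
        if len(row) <= subject_column:
            raise ValueError(f"row {reader.line_num} has no Subject field")
        subject = REPLY_PREFIX.sub("", row[subject_column]).strip()
        # setdefault returns the existing id, or stores and returns the next one.
        topic_id = topic_table.setdefault(subject, len(topic_table) + 1)
        writer.writerow(row + [topic_id])
        written += 1
    return written


sample = io.StringIO(
    "ForumID,Subject,UNIX TimeStamp,Poster_Name,Message\n"
    "12,Wings,830799174,John Doe,Lorem ipsum...\n"
    "13,Wheels,831213087,Jane Q. Public,Dolor sit amet...\n"
    "12,RE: Wings,860225019,Bob Builder,Praesent adipiscing sapien ut purus....\n"
)
result = io.StringIO()
print(add_topic_ids(sample, result))
print(result.getvalue(), end="")
```

For real files, the two streams would be opened with `open(path, newline="")` so that the CSV parser can handle messages that contain quoted line breaks. Using a CSV parser rather than splitting on commas also matters because the Message field is likely to contain commas of its own.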


Are the "subjects" always unique?

