HTML::TableExtract: how to run the right argument [see live example]

Views: 94 | Published: 2020/8/14 9:03:33 | Tags: mysql perl extract separator lwp


Problem description

A question regarding a parser: is there any way to get separators between the fields of each table row? The parser script already runs nicely. Note that I want to store the data in a MySQL database, so it would be great to have separators (commas, tabs, or something else). Tab-separated or comma-separated values are handy formats to work with.

(Here is the data from the following site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20)

    lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
    1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
    2 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
    6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg Regensburger Straße 60 93326 Abensberg 09443/709191 09443/709193 Berufsschulen zur sonderpädog. Förderung www.berufsschule-abensberg.de

Well, I need to have those lines divided into at least three columns. Take one record, for example:

    name: Volksschule Abenberg (Grundschule)
    street: Güssübelstr. 2
    postal-code and town: 91183 Abenberg
    fax and telephone: 09178/215 09178/905060
    type of school: Volksschulen
    website: home.t-online.de/home/vs-abenberg

Or, even better, have the postal code and town divided into two separate columns. Question: is this possible?
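A minimal sketch of how the combined "PLZ Ort" cell could be split into two columns. The helper name `split_plz_ort` is hypothetical, and the sketch assumes every cell starts with a five-digit German postal code followed by the town name; cells that do not match are returned with an empty postal code.

```perl
use strict;
use warnings;

# Hypothetical helper: split a combined "PLZ Ort" cell into (postal code, town).
# Assumption: the cell starts with a five-digit German postal code.
sub split_plz_ort {
    my ($cell) = @_;
    if ( defined $cell && $cell =~ /^\s*(\d{5})\s+(.*?)\s*$/ ) {
        return ( $1, $2 );
    }
    # No match: return an empty postal code and the cell unchanged.
    return ( '', defined $cell ? $cell : '' );
}

my ( $plz, $ort ) = split_plz_ort('91183 Abenberg');
print "$plz\t$ort\n";
```

The two returned values can then be written as separate fields in place of the single combined cell.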

By the way, look at these records (only the school names are shown here):

    1 0401 Mädchenrealschule Marienburg, Abenberg,
    6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg

These have commas inside the name; does this make it difficult to write a parser that produces CSV format?
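Commas inside a field are usually handled by quoting that field. The sketch below illustrates the common CSV convention: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded double quotes are doubled. The helper `csv_line` is hypothetical and written by hand to keep the example dependency-free; a CPAN module such as Text::CSV is probably the more robust choice for real data.

```perl
use strict;
use warnings;

# Hypothetical helper: build one CSV line from a list of fields.
# Fields containing a comma, double quote, or line break are wrapped in
# double quotes, and embedded double quotes are doubled.
sub csv_line {
    my @fields = @_;    # copy, so the caller's values are not modified
    my @out;
    for my $field (@fields) {
        $field = '' unless defined $field;
        if ( $field =~ /[",\r\n]/ ) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
        push @out, $field;
    }
    return join ',', @out;
}

print csv_line(
    '0401',
    'Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt',
    'Marienburg 1',
), "\n";
```

With quoting in place, the school name stays a single field even though it contains commas.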

Any idea how to do this in Perl? If it is possible, that would be just great! Many thanks for a hint regarding this little issue. Besides this, everything is great and fascinating!

zero

BTW, if you want, I can add the code; no problem. Here it is:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;
    use Cwd;
    use POSIX qw(strftime);
    my $te = HTML::TableExtract->new;
    my $total_records = 0;
    my $suchbegriffe = "e";
    my $treffer = 50;
    my $range = 0;
    my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
    my $processdir = "processing";
    my $counter = 50;
    my $displaydate = "";
    my $percent = 0;

    &workDir();
    chdir $processdir;
    &processURL();
    print "\nPress <enter> to continue\n";
    <>;
    $displaydate = strftime('%Y%m%d%H%M%S', localtime);
    open OUTFILE, '>', "webdata_for_${suchbegriffe}_$displaydate.txt" or die "Cannot open output file: $!";
    &processData();
    close OUTFILE;
    print "Finished processing $total_records records...\n";
    print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$suchbegriffe\_$displaydate.txt\n";
    unlink 'processing.html';
    exit 0;

    sub processURL() {
    print "\nProcessing $url_to_process$suchbegriffe&a=$treffer&s=$range\n";
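    # getstore returns the HTTP status code, which is always true, so test it with is_success
    my $status = getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'tempfile.html');
    die "Unable to get page (HTTP status $status)" unless is_success($status);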

       while( <tempfile.html> ) {
          open( FH, "$_" ) or die;
          while( <FH> ) {
             if( $_ =~ /^.*?(Treffer <b>)(\d+)( - )(\d+)(<\/b> \w+ \w+ <b>)(\d+).*/ ) {
                $total_records = $6;
                print "Total records to process is $total_records\n";
                }
             }
             close FH;
       }
       unlink 'tempfile.html';
    }

    sub processData() {
       while ( $range <= $total_records) {
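          # getstore returns the HTTP status code, which is always true, so test it with is_success
          my $status = getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html');
          die "Unable to get page (HTTP status $status)" unless is_success($status);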
          $te->parse_file('processing.html');
          my ($table) = $te->tables;
          for my $row ( $table->rows ) {
             cleanup(@$row);
             print OUTFILE "@$row\n";
          }
          $| = 1; 
          print "Processed records $range to $counter";
          print "\r";
          $counter = $counter + 50;
          $range = $range + 50;
          $te = HTML::TableExtract->new;
       }
    }

    sub cleanup {
       for ( @_ ) {
          s/\s+/ /g;
       }
    }

    sub workDir() {
    # Use home directory to process data
    chdir or die "$!";
    if ( ! -d $processdir ) {
       mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
       }
    }
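The script writes each row with `print OUTFILE "@$row\n"`, which joins the cells with single spaces, so the column boundaries are lost. A minimal sketch of one alternative is to join the cells with tabs instead. The helper `tsv_line` and the sample row are hypothetical; the sketch assumes empty table cells may arrive as undefined values and that no cell needs to keep a literal tab or line break.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one table row (an array reference of cells)
# into a tab-separated line. Undefined cells become empty strings, and
# runs of whitespace inside a cell are collapsed to one space, so a cell
# can never contain a tab or a line break.
sub tsv_line {
    my ($row) = @_;
    my @cells = map {
        my $cell = defined $_ ? $_ : '';
        $cell =~ s/\s+/ /g;      # collapse whitespace runs
        $cell =~ s/^ | $//g;     # trim the leading and trailing space
        $cell;
    } @$row;
    return join "\t", @cells;
}

# Sample row standing in for one element of $table->rows.
my $row = [
    '2', '6581', "Volksschule Abenberg\n(Grundschule)", 'Güssübelstr. 2',
    '91183 Abenberg', '09178/215', '09178/905060', 'Volksschulen',
    'home.t-online.de/home/vs-abenberg',
];
print tsv_line($row), "\n";
```

In the script this would be used as `print OUTFILE tsv_line($row), "\n";`. Tab-separated output also happens to match the default field terminator of MySQL's `LOAD DATA INFILE`.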
