Screen scraping: Automating a vim script


Question

In vim, I loaded a series of web pages (one at a time) into a vim buffer (using the vim netrw plugin) and then parsed the HTML (using the vim elinks plugin). All good. I then wrote a series of vim scripts using regexes, with a final result of a few thousand lines, each formatted correctly as CSV for uploading into a database.

To do that, I had to use vim's marking functionality so that I could loop over specific points of the document and reassemble them into one CSV line. Now I am considering automating this with Perl's "Mechanize" library of classes (UserAgent, etc.).

Questions:

  1. Can vim's ability to "mark" sections of a document (in order to perform substitutions on them) be reproduced in Perl?
  2. It was suggested that I use "elinks" directly, which I take to mean loading the page into a headless browser using elinks and running Perl scripts on the content from there(?)
  3. If that's correct, would there be a deployment problem with elinks when I migrate the site from my localhost LAMP stack setup to a hosting company like Bluehost?

Thanks

TRYING TO MIGRATE KNOWLEDGE FROM VIM TO PERL:

If @flesk (below) is right, how would I go about porting this routine (written in vim), which "marks" lines in a text file ("i" and "j") and then uses them as a range ('i,'j) to perform the last two substitutions?

:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|
\   'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g

I am not seeing this capability in the perldoc perlre manual. Am I missing a module, or some basic Perl understanding of m// or qr//?
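
For illustration, here is a rough sketch of how that routine might look in Perl. Perl regexes have no notion of marks, but a range such as 'i,'j can usually be emulated by remembering state while looping over the lines. The sketch assumes the Vim command works on groups made of a header line (starting with a letter) followed by detail lines of the form "12 - The ..."; that reading of the command, the sub name groups_to_csv, and the sample data are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: emulates the 'i,'j range by remembering the most
# recent header line instead of marking the lines that follow it.
sub groups_to_csv {
    my @lines = @_;
    my ($header, @out);
    for my $line (@lines) {
        next unless $line =~ /\S/;                # skip blank lines (a simplification)
        if ($line =~ /^\s*[A-Za-z_]/) {           # Vim's \h: a letter or underscore
            ($header = $line) =~ s/\s+and\s+/,/;  # first " and " becomes a comma
            next;                                 # the header line itself is dropped
        }
        unless (defined $header) {                # lines before any header pass through
            push @out, $line;
            next;
        }
        my $row = $line;
        $row =~ s/^\s*(\d+)\s+-\s+The/$header,$1,/;  # prepend header and number
        $row =~ s/\s+//g;                            # strip all whitespace
        push @out, $row;
    }
    return @out;
}

# Made-up sample input, already split into lines without newlines.
my @sample = (
    'Alice and Bob',
    '  12 - The quick fox',
    '  7 - The end',
    'Carol',
    '3 - The x y',
);
print "$_\n" for groups_to_csv(@sample);
```

With the sample above, the first detail line comes out as Alice,Bob,12,quickfox. When the start and end of a range are themselves patterns, Perl's range operator in scalar context (for example, if (/start/ .. /end/) { ... }) may be a closer match to a Vim range.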

Recommended answer

I'm sure all you need is some kind of HTML parser. For example, I'm using HTML::TreeBuilder::XPath.
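
As a minimal sketch of what that could look like: the snippet below parses an HTML string and turns table rows into CSV-style lines using XPath queries. The markup, the table id "results", and the sub name table_to_csv are hypothetical stand-ins, since the actual pages are not shown in the question. It needs the HTML::TreeBuilder::XPath module from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup standing in for one fetched page.
my $html = <<'HTML';
<html><body>
<table id="results">
  <tr><td>Alice</td><td>12</td></tr>
  <tr><td>Bob</td><td>7</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: one comma-joined line per table row.
sub table_to_csv {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);
    $tree->eof;                                   # signal end of input to the parser
    my @rows;
    for my $tr ($tree->findnodes('//table[@id="results"]//tr')) {
        # Query relative to the current row, then trim each cell's text.
        my @cells = map { $_->as_trimmed_text } $tr->findnodes('./td');
        push @rows, join ',', @cells;             # naive join, no CSV quoting
    }
    $tree->delete;                                # release the parse tree
    return @rows;
}

print "$_\n" for table_to_csv($html);
```

The plain join does no quoting or escaping, so real data containing commas or quotes would probably be better served by a dedicated CSV module such as Text::CSV.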
