Emulation of lex-like functionality in Perl or Python

Question

Here's the deal: is there a way to tokenize the strings in a line based on multiple regexes?

An example:

I need to get all href tags, their corresponding link text, and some other text matched by a different regex. So I have three expressions and would like to tokenize the line and extract the tokens of text that match each expression.
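
Here is a minimal sketch of that idea in Perl, the language used in the answer. It keeps a list of named patterns and, at each position in the line, tries them in order using `\G` with the `/gc` modifiers, a common Perl idiom for hand-written lexers. The rule names (HREF, END_A, TAG, TEXT) and their patterns are hypothetical and deliberately simplified. They are not a robust way to parse HTML. Unlike lex, which picks the longest match, this sketch takes the first rule that matches, so rule order matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical token rules, tried in order at the current position.
# These simplified patterns are assumptions, not a robust HTML grammar.
my @rules = (
    [ HREF  => qr/<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>/i ],  # captures the URL
    [ END_A => qr/(<\/a\s*>)/i ],
    [ TAG   => qr/(<[^>]*>)/ ],                               # any other tag
    [ TEXT  => qr/([^<]+)/ ],                                 # text between tags
);

# Split one line into [NAME, text] tokens; the first rule that matches wins.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    pos($line) = 0;
    TOKEN: while (pos($line) < length $line) {
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            # \G anchors the match at pos(); /gc advances pos() on success
            # and leaves it unchanged on failure.
            if ($line =~ /\G$re/gc) {
                push @tokens, [ $name, $1 ];
                next TOKEN;
            }
        }
        pos($line) += 1;    # nothing matched here: skip one character
    }
    return @tokens;
}

for my $token (tokenize('<p>See <a href="http://example.com/">Example</a> now</p>')) {
    printf "%-5s %s\n", @$token;
}
```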

I have actually done this using flex (not to be confused with Adobe Flex), which is an implementation of the good old lex. lex provides an elegant way to do this by executing "actions" based on expressions. One can also control the way lex reads a file (block- or line-based reading).
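
lex's "pattern, then action" model maps fairly directly onto a table of regexes paired with code references. The sketch below illustrates that mapping under stated assumptions. It is not a port of flex. `run_lexer` and its rules are hypothetical. Input is read line by line from a filehandle (an in-memory one here), which corresponds to line-based reading.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lex-style driver: each rule pairs a pattern (with one capture group) with an
# "action" code ref, called with the captured text like a lex rule body.
sub run_lexer {
    my ($fh, @rules) = @_;
    # Read the input one line at a time; a block-based reader could use
    # read() or a different $/ instead.
    while (my $line = <$fh>) {
        pos($line) = 0;
        TOKEN: while (pos($line) < length $line) {
            for my $rule (@rules) {
                my ($re, $action) = @$rule;
                if ($line =~ /\G$re/gc) {
                    $action->($1);
                    next TOKEN;
                }
            }
            pos($line) += 1;    # no rule matched: skip this character
        }
    }
    return;
}

# Hypothetical rules: collect link targets, ignore other tags, count words.
my @urls;
my $words = 0;
my @rules = (
    [ qr/<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>/i, sub { push @urls, $_[0] } ],
    [ qr/(<[^>]*>)/,                              sub { } ],
    [ qr/(\w+)/,                                  sub { $words++ } ],
);

# An in-memory filehandle stands in for a real input file.
my $html = qq{<a href="/a">one two</a>\n<b>three</b> <a href="/b">four</a>\n};
open my $fh, '<', \$html or die "Cannot open in-memory handle: $!";
run_lexer($fh, @rules);
close $fh;
print "links: @urls; words: $words\n";    # prints: links: /a /b; words: 4
```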

The problem is that flex actually produces C/C++ code that does the tokenizing job, and I have a makefile that wraps all of this. I was wondering if Perl/Python can do the same thing in some way. It's just that I would like to do everything in a single programming language.

Tokenizing is just one of the things that I want to do as part of my application.

Apart from Perl or Python, can any other language (functional ones included) do this?

I did read about PLY and ANTLR in another question ("Parsing, where can I learn about it").

But is there a way to do it natively in Python itself? Pardon my ignorance, but are these tools used in any popular products/services?

Thanks.
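
On doing this natively, without a generator: the regex engine alone can handle it. One approach, sketched here in Perl 5.10+, folds every token pattern into a single alternation of named groups and then checks `%+` to see which group matched. Python's `re` module supports the same technique with `(?P<name>...)` groups and `Match.lastgroup`. The token names and patterns are hypothetical. As with any alternation, the first alternative that matches wins rather than the longest one, so order the alternatives from most to least specific.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All token rules folded into one pattern (Perl 5.10+ named captures).
# Each alternative owns exactly one named group, so after a match the single
# key left in %+ names the rule that fired. Patterns are illustrative only.
my $token_re = qr{
      <a\s[^>]*?href\s*=\s*"(?<HREF>[^"]*)"[^>]*>   # link start; keep the URL
    | (?<END_A> </a\s*> )
    | (?<TAG>   <[^>]*> )                           # any other tag
    | (?<TEXT>  [^<]+ )                             # text between tags
}xi;

sub tokenize_combined {
    my ($line) = @_;
    my @tokens;
    # In scalar context, //g resumes from pos() on each iteration.
    while ($line =~ /$token_re/g) {
        my ($kind) = keys %+;    # only groups that captured appear in %+
        push @tokens, [ $kind, $+{$kind} ];
    }
    return @tokens;
}

for my $token (tokenize_combined('<p>See <a href="http://example.com/">Example</a> now</p>')) {
    printf "%-5s %s\n", @$token;
}
```

The single-pattern version lets the regex engine do the dispatch. The explicit rule loop with `\G`/`/gc` makes it easier to attach a separate action to each rule.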

Recommended answer

If you're specifically after parsing links out of web pages, then Perl's WWW::Mechanize module will figure things out for you in a very elegant fashion. Here's a sample program that grabs the first page of Stack Overflow and parses out all the links, printing their text and corresponding URLs:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

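my $mech = WWW::Mechanize->new;

# Fetch the page; check success explicitly in case autocheck is disabled.
$mech->get("https://stackoverflow.com/");

$mech->success or die "Oh no! Couldn't fetch stackoverflow.com";

# Each $link is a WWW::Mechanize::Link object.
foreach my $link ($mech->links) {
    # Image-only links have no text, so fall back to an empty string.
    my $text = defined $link->text ? $link->text : "";
    print "* [", $text, "] points to ", $link->url, "\n";
}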

In the main loop, each $link is a WWW::Mechanize::Link object, so you're not limited to the text and URL; the object also provides methods such as url_abs, tag, and attrs.

All the best,

Paul