Parse a text file with regex

Problem description

I'm trying to parse some JS files (ExtJS) and find all dependencies that are used by the class in each file.

A sample JS file looks like this:

Ext.define('Pandora.controller.Station', {
    extend: 'Ext.app.Controller',

    refs: [{
        ref: 'stationsList',
        selector: 'stationslist'
    }],

    stores: ['Stations', 'RecentSongs'],
    ...

What I want to get is Ext.app.Controller.

With my code I'm able to get all lines that contain extend:

public void ReadAndFilter(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains("extend"))
            {
                listBox2.Items.Add(line);
            }
        }
    }
}

But this also returns comments and other unnecessary things. My idea was to use RegEx to find all strings.

My problem is that a line sometimes has spaces before and after extend.
Here are some samples that can be found in the JS files:

extend          : 'Ext.AbstractPlugin',
extend: 'Ext.util.Observable',
@extends Sch.feature.AbstractTimeSpan
extend      : "Sch.feature.AbstractTimeSpan",
extend              : "Sch.plugin.Lines",
extend : "Sch.util.DragTracker",

Running RegEx on this should return:

Ext.AbstractPlugin
Ext.util.Observable
Sch.feature.AbstractTimeSpan
Sch.plugin.Lines
Sch.util.DragTracker

Here is my attempt: extend[ ]*:[ ]*['"][a-zA-Z.]*['"]. I've tested it here, but I want to get only the part between the quotes or double quotes. Can this also be validated, so that we exclude values that open with a single quote and close with a double quote?
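Both problems with this attempt can be reproduced in a few lines. The following is a minimal sketch, assuming C# 9 or later (top-level statements) and System.Text.RegularExpressions. It shows that Match.Value returns the whole matched text rather than only the class name, and that the two independent ['"] classes accept a value opened with ' and closed with ".

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question. There is no capture group, and the two
// ['"] classes are independent, so nothing forces the quotes to match.
// (In a C# verbatim string, "" stands for a single " character.)
var attempt = new Regex(@"extend[ ]*:[ ]*['""][a-zA-Z.]*['""]");

// Match.Value is everything the pattern consumed, not just the name.
Match m = attempt.Match("extend      : \"Sch.feature.AbstractTimeSpan\",");
Console.WriteLine(m.Value);
// prints: extend      : "Sch.feature.AbstractTimeSpan"

// A value opened with ' and closed with " is still accepted.
Console.WriteLine(attempt.IsMatch("extend: 'Ext.util.Observable\","));
// prints: True
```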

Regex may not be the fastest approach, but I have no idea how else I could do this.
Any advice is welcome.

Recommended answer

You can simply use a capture group: wrap the part you want between parentheses:

extend[ ]*:[ ]*['"]([a-zA-Z.]*)['"]

You can then access it via .Groups[1].Value.
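As an illustration, here is a minimal sketch (assuming C# 9 or later for top-level statements) that runs this pattern over the sample lines from the question and collects Groups[1].Value from each successful match. The variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Group 1 captures only the text between the quotes.
var extendPattern = new Regex(@"extend[ ]*:[ ]*['""]([a-zA-Z.]*)['""]");

// The sample lines from the question, including the @extends doc tag.
string[] lines =
{
    "extend          : 'Ext.AbstractPlugin',",
    "extend: 'Ext.util.Observable',",
    "@extends Sch.feature.AbstractTimeSpan",
    "extend      : \"Sch.feature.AbstractTimeSpan\",",
    "extend              : \"Sch.plugin.Lines\",",
    "extend : \"Sch.util.DragTracker\",",
};

var names = new List<string>();
foreach (string line in lines)
{
    Match m = extendPattern.Match(line);
    if (m.Success)
    {
        // Groups[0] is the whole match; Groups[1] is the parenthesized part.
        names.Add(m.Groups[1].Value);
    }
}

// Prints the five class names listed in the question, one per line.
Console.WriteLine(string.Join(Environment.NewLine, names));
```

The @extends line is skipped because "extends" is not followed by optional spaces and a colon. To read a real file instead of the array, File.ReadLines(path) from System.IO can be iterated in the same foreach loop.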

As requested, to also reject values whose opening and closing quotes differ:

extend *: *('|")(?<inside>[a-zA-Z.]*)\1

With this one, the backreference \1 requires the closing quote to be the same character as the opening one, and you can access the captured value with .Groups["inside"].Value.
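A minimal sketch of the stricter pattern, again assuming C# 9 or later with top-level statements. ExtractBaseClass and ReadBaseClasses are hypothetical helper names, not part of any library; ReadBaseClasses adapts the question's line-by-line reading loop using File.ReadLines. In the original WinForms code, each returned name could then be passed to listBox2.Items.Add.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// ('|") captures the opening quote as group 1; \1 then requires the
// same character to close the value, so 'Foo" is rejected.
var strictPattern = new Regex(@"extend *: *('|"")(?<inside>[a-zA-Z.]*)\1");

// Hypothetical helper: the class name, or null when the line has no
// well-formed extend entry.
string? ExtractBaseClass(string line)
{
    Match match = strictPattern.Match(line);
    return match.Success ? match.Groups["inside"].Value : null;
}

// Hypothetical helper: reads the file lazily, line by line, like the
// question's ReadAndFilter, but keeps only the extracted names.
List<string> ReadBaseClasses(string path)
{
    var result = new List<string>();
    foreach (string line in File.ReadLines(path))
    {
        string? name = ExtractBaseClass(line);
        if (name != null)
        {
            result.Add(name);
        }
    }
    return result;
}

Console.WriteLine(ExtractBaseClass("extend : \"Sch.util.DragTracker\","));
// prints: Sch.util.DragTracker
Console.WriteLine(ExtractBaseClass("extend: 'Ext.util.Observable\",") ?? "(no match)");
// prints: (no match)
```

Note that the pattern only looks at the text of each line, so a commented-out entry such as // extend: 'Foo' would still match; the @extends doc tag does not.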
