Extract line from a text file


Problem description

Hi. I would like to ask for some help with a utility I am currently working on.
I want to extract lines according to date and time and IP address.
This is the example file (IP address, date and time, GET value):


172.21.128.221 - - [22/Jan/2013:16:00:00 +0900] "GET /iv/iv_1_5_1/js/annt/con_annt_resize.js?v=200711220 HTTP/1.0" 304 -
10.144.100.63 - - [22/Jan/2013:16:00:00 +0900] "GET /iv/iv_lite/files/iv.key HTTP/1.0" 200 114

I don't know how to parse this into 3 parts so that I can filter on each.
What I have done so far only outputs a line if it contains "172.21.128.221":

using (StreamReader reader = new StreamReader(txtSource))
{
    using (StreamWriter writer = new StreamWriter(txtOutput))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(Filter.Text)) // Filter.Text = "172.21.128.221"
            {
                writer.WriteLine(line);
                counter++;
                dr[col1] = line;
            }
        }
        dt.Rows.Add(dr);
        dgvResult.DataSource = dt;
    }
}

Recommended answer

Without knowing more about your data and what you want to do with it, I would suggest a regex. Assuming you have already broken your text into lines:
public static Regex regex = new Regex("(?<IPAddr>\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?\\[(?<Date>.*?)\\] (?<Action>.*)",
                                      RegexOptions.CultureInvariant | RegexOptions.Compiled);

    ...
    string InputText = "172.21.128.221 - - [22/Jan/2013:16:00:00 +0900] \"GET /iv/iv_1_5_1/js/annt/con_annt_resize.js?v=200711220 HTTP/1.0\" 304 -";
    Match m = regex.Match(InputText);
    // Check Success first: the groups are empty when the line does not match.
    if (m.Success && m.Groups["IPAddr"].Value == Filter.Text)
    {
        string date = m.Groups["Date"].Value;
        string action = m.Groups["Action"].Value;
        ...
    }





"Oh. I'm sorry. Here.
Given that I have this data:


172.21.128.221 - - [22/Jan/2013:16:00:00 +0900] "GET /iv/iv_1_5_1/js/annt/con_annt_resize.js?v=200711220 HTTP/1.0" 304 -
10.144.100.63 - - [22/Jan/2013:16:00:00 +0900] "GET /iv/iv_lite/files/iv.key HTTP/1.0" 200 114 
10.144.100.64 - - [22/Jan/2013:16:00:00 +0900] "POST /iv/iv_lite/files/iv.key HTTP/1.0" 200 114 



and I would like to extract the data with the Date: [22/Jan/2013:16:00:00 +0900] and the Method: GET

How would I write my condition?"


Try:

(?<IPAddr>\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?\\[(?<Date>.*?)\\]\\s\"(?<Method>\\w+)(?<Action>.*)



You will then get a group called "Method" added to your Match.

Get a copy of Expresso - it's free, and it examines and generates regular expressions.
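
As a rough sketch of how the condition from the follow-up question might be written with this pattern: the code below keeps only the lines whose Date and Method groups equal the requested values. The names LogFilter, LineRegex and FilterLines are made up for illustration, and the date is compared as plain text without the square brackets, which assumes you want an exact match rather than a range.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogFilter
{
    // Same pattern as above, written as a verbatim string so the
    // backslashes do not need doubling ("" is a literal double quote).
    private static readonly Regex LineRegex = new Regex(
        @"(?<IPAddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*?\[(?<Date>.*?)\]\s""(?<Method>\w+)(?<Action>.*)",
        RegexOptions.CultureInvariant | RegexOptions.Compiled);

    // Returns the lines whose bracketed date and HTTP method both match exactly.
    // Lines that do not fit the pattern at all are skipped.
    public static List<string> FilterLines(IEnumerable<string> lines, string date, string method)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            Match m = LineRegex.Match(line);
            if (m.Success
                && m.Groups["Date"].Value == date
                && m.Groups["Method"].Value == method)
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

In the reading loop from the question, the same three-part test could replace line.Contains(Filter.Text), for example by calling LogFilter.FilterLines(File.ReadLines(txtSource), "22/Jan/2013:16:00:00 +0900", "GET").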


To parse the parts of a single line, you can use a regular expression like this:
^(?<ip>\S*) - - \[(?<date>.*)\] "(?<method>\S+) (?<url>.+)" (?<code>\d+) (?<tail>.+)



The parts will be accessible through the named groups ip, date, method, url, code and tail, like this:

if (line.StartsWith("172.21.128.221"))
{
    var match = ParseLineRegex.Match(line);

    if (match.Success)
    {
        var ip = match.Groups["ip"].Value;
        var date = match.Groups["date"].Value;
        // and so on...
    }
}



For performance reasons, make sure that your regex instance is created only once and is compiled:

static readonly Regex ParseLineRegex = new Regex(@"^(?<ip>\S*)...", RegexOptions.Compiled);
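
Putting the pieces together, here is a minimal sketch that uses the full pattern shown earlier and also turns the date group into a DateTimeOffset, so that you could filter on a time range instead of comparing text. LogEntry, LogParser and TryParse are names invented for this example, and the date format string is an assumption based on the two sample lines only.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for the parsed parts of one log line.
public sealed class LogEntry
{
    public string Ip { get; set; }
    public DateTimeOffset Timestamp { get; set; }
    public string Method { get; set; }
    public string Url { get; set; }
    public int Code { get; set; }
}

public static class LogParser
{
    // Created once and compiled, as recommended above.
    private static readonly Regex ParseLineRegex = new Regex(
        @"^(?<ip>\S*) - - \[(?<date>.*)\] ""(?<method>\S+) (?<url>.+)"" (?<code>\d+) (?<tail>.+)",
        RegexOptions.Compiled);

    // Assumed format of the bracketed date, e.g. "22/Jan/2013:16:00:00 +0900".
    // The separators are quoted so they are matched literally.
    private const string DateFormat = "dd'/'MMM'/'yyyy':'HH':'mm':'ss zzz";

    // Returns false when the line does not fit the pattern or the date cannot be read.
    public static bool TryParse(string line, out LogEntry entry)
    {
        entry = null;
        Match match = ParseLineRegex.Match(line);
        if (!match.Success)
            return false;

        DateTimeOffset timestamp;
        if (!DateTimeOffset.TryParseExact(match.Groups["date"].Value, DateFormat,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp))
            return false;

        entry = new LogEntry
        {
            Ip = match.Groups["ip"].Value,
            Timestamp = timestamp,
            Method = match.Groups["method"].Value,
            Url = match.Groups["url"].Value,   // includes the trailing " HTTP/1.0" part
            Code = int.Parse(match.Groups["code"].Value, CultureInfo.InvariantCulture)
        };
        return true;
    }
}
```

With an entry in hand, a condition such as entry.Method == "GET" && entry.Timestamp >= from && entry.Timestamp < to (where from and to are your own DateTimeOffset bounds) filters on both parts.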


The following can be your code flow:
1. Start reading the file line by line (you are already doing that).
2. Read each line character by character:
foreach (char ch in strLine)
{
}
3. Start appending to a string until you reach a space.
4. As soon as you reach a space, save the appended string in the IP address variable and reset the appended string to blank.
5. Now skip all characters until you reach an opening square bracket '['.
6. Start appending again until you reach a closing square bracket ']'.
7. As soon as you reach ']', save the appended string in the dtDatetime variable and reset the appended string to blank.
8. Now skip all characters until you reach a double quote '"'.
9. Start appending again until you reach another double quote '"'.
10. As soon as you reach '"' the second time, save the appended string as getValue.
11. Repeat steps 2 to 10 until the end of the file.

Hope this will help you.

~Amol
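
The steps above can be sketched as a small state machine over the characters of one line. This is only one possible reading of the steps: ManualLogParser and TryParse are hypothetical names, and the numeric state values are an implementation choice, not part of the original answer.

```csharp
using System;
using System.Text;

public static class ManualLogParser
{
    // Walks the line once and collects three parts:
    //   ip       - everything before the first space
    //   dateTime - everything between '[' and ']'
    //   getValue - everything between the first pair of double quotes
    // Returns false if the line ends before all three parts were found.
    public static bool TryParse(string line, out string ip, out string dateTime, out string getValue)
    {
        ip = null;
        dateTime = null;
        getValue = null;

        var buffer = new StringBuilder();
        // 0: reading IP, 1: skipping to '[', 2: reading date,
        // 3: skipping to '"', 4: reading request, 5: done
        int state = 0;

        foreach (char ch in line)
        {
            switch (state)
            {
                case 0:
                    if (ch == ' ') { ip = buffer.ToString(); buffer.Clear(); state = 1; }
                    else buffer.Append(ch);
                    break;
                case 1:
                    if (ch == '[') state = 2;
                    break;
                case 2:
                    if (ch == ']') { dateTime = buffer.ToString(); buffer.Clear(); state = 3; }
                    else buffer.Append(ch);
                    break;
                case 3:
                    if (ch == '"') state = 4;
                    break;
                case 4:
                    if (ch == '"') { getValue = buffer.ToString(); state = 5; }
                    else buffer.Append(ch);
                    break;
                // state 5: everything after the closing quote is ignored
            }
        }
        return state == 5;
    }
}
```

Calling TryParse once per line inside the existing ReadLine loop covers step 11; the three output strings can then be compared against whatever filter values you need.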

