PHP - combine two TXT files with conditions


Question

(Sorry in advance for the long question. The problem is actually simple, but explaining it is maybe not so simple.)

My noob PHP skills are challenged by this:

The input is two TXT files with a structure like this:

$rowidentifier //number,letter,string etc..
$some semi-fixed-string $somedelimiter $semi-fixed-string
$content //with unknown length or strings or lines number.

In the above, by "semi-fixed-string" I mean a string with a KNOWN structure but UNKNOWN content.

To give a practical example, let's take an SRT file (I just use it as a guinea pig, as its structure is very similar to what I need):

1
00:00:12,759 --> 00:00:17,458
"some content here "
that continues here

2
00:00:18,298 --> 00:00:20,926
here we go again...

3
00:00:21,368 --> 00:00:24,565
...and this can go forever...

4
.
.
.

What I want to do is take the $content part from one file and put it IN THE RIGHT PLACE in the second file.

Going back to the SRT example, having:

//file1 

    1
    00:00:12,759 --> 00:00:17,458
    "this is the italian content "
    which continues in italian here

    2
    00:00:18,298 --> 00:00:20,926
    here we go talking italian again ...

//file2 

    1
    00:00:12,756 --> 00:00:17,433
    "this is the spanish, chinese, or any content "
    which continues in spanish, or chinese here

    2
    00:00:16,293 --> 00:00:20,96
    here we go talking spanish, chinese or german again ...

would result in

//file3 

        1
        00:00:12,756 --> 00:00:17,433
        "this is the italian content "
        which continues in italian here
        "this is the spanish, chinese, or any content "
        which continues in spanish, or chinese here

        2
        00:00:16,293 --> 00:00:20,96
        here we go talking italian again ...
        here we go talking spanish, chinese or german again ...

or, more PHP-like:

$rowidentifier //unchanged
$some semi-fixed-string $somedelimiter $semi-fixed-string //unchanged, except maybe an option to choose if to keep file1 or file2 ...
$content //from file 1
$content //from file 2

So, after all this introduction, this is what I have (which actually amounts to nothing):

$first_file = file('file1.txt'); // no need to comment, right?
$second_file = file('file2.txt'); // see above comment
$result_array = array(); // construct array
foreach ($first_file as $key => $value) // loop array and....
    $result_array[] = trim($value) . "\r" . trim($second_file[$key]); // ..here is my problem ...

// $value is $content - but LINE BY LINE, and in our case it could be 2, 3 or even 4 lines
// should I go by delimiters \n\r ?? (not a good idea - how can I know they are there ??)
// or should I go for regex to look up string patterns ? that is insane, no ?

$fp = fopen('merge.txt', 'w+');
fwrite($fp, join("\r\n", $result_array));
fclose($fp);

This works line by line, which is not what I need. I need conditions. Also, I am sure this is not smart code and that there are many better ways to go at it, so any help would be appreciated.

Recommended answer

What you actually want to do is iterate over both files in parallel and then combine the parts that belong to each other.

But you cannot use the line numbers, because those might differ between the files. You need to use the number of the entry (block) instead. More precisely, you need a way to get one entry after the other out of a file.

So you need an iterator for the data in question that is able to turn a run of lines into a block.

Instead of:

foreach($first_file as $number => $line)

you have:

foreach($first_file_blocks as $number => $block)

This can be done by writing your own iterator that takes a file's lines as input and converts lines into blocks on the fly. For that you need to parse the data. Here is a small example of a state-based parser that converts lines into blocks:

$state = 0;
$blocks = array();
foreach($lines as $line)
{
    switch($state)
    {
        case 0:
            unset($block);
            $block = array();
            $blocks[] = &$block;
            $block['number'] = $line;
            $state = 1;
            break;
        case 1:
            $block['range'] = $line;
            $state = 2;
            break;
        case 2:
            $block['text'] = '';
            $state = 3;
            # fall-through intended
        case 3:
            if ($line === '') {
                $state = 0;
                break;
            }
            $block['text'] .= ($block['text'] ? "\n" : '') . $line;
            break;
        default:
            throw new Exception(sprintf('Unhandled %d.', $state));
    }
}
unset($block);

It just runs along the lines and changes its state. Based on the state, each line is processed as the part of the block it belongs to. When a new block starts, it is created. It works for the SRT file you outlined in your question.

To make it more flexible to use, turn it into an iterator that takes $lines in its constructor and offers the blocks while iterating. This needs a little adaptation in how the parser gets the lines to work on, but it works generally the same.

class SRTBlocks implements Iterator
{
    private $lines;
    private $current;
    private $key;
    public function __construct($lines)
    {
        if (is_array($lines))
        {
            $lines = new ArrayIterator($lines);
        }
        $this->lines = $lines;
    }
    public function rewind()
    {
        $this->lines->rewind();
        $this->current = NULL;
        $this->key = 0;
    }
    public function valid()
    {
        return $this->lines->valid();
    }
    public function current()
    {
        if (NULL !== $this->current)
        {
            return $this->current;
        }
        $state = 0;
        $block = NULL;
        while ($this->lines->valid() && $line = $this->lines->current())
        {
            switch($state)
            {
                case 0:
                    $block = array();
                    $block['number'] = $line;
                    $state = 1;
                    break;
                case 1:
                    $block['range'] = $line;
                    $state = 2;
                    break;
                case 2:
                    $block['text'] = '';
                    $state = 3;
                    # fall-through intended
                case 3:
                    if ($line === '') {
                        $state = 0;
                        break 2;
                    }
                    $block['text'] .= ($block['text'] ? "\n" : '') . $line;
                    break;
                default:
                    throw new Exception(sprintf('Unhandled %d.', $state));
            }
            $this->lines->next();
        }
        if (NULL === $block)
        {
            throw new Exception('Parser invalid (empty).');
        }
        $this->current = $block;
        $this->key++;
        return $block;
    }
    public function key()
    {
        return $this->key;
    }
    public function next()
    {
        $this->lines->next();
        $this->current = NULL;
    }
}

The basic usage is as follows:

$blocks = new SRTBlocks($lines); 
foreach($blocks as $index => $block)
{
    printf("Block #%d:\n", $index);
    print_r($block);
}
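
For comparison, the same line-to-block conversion can also be sketched as a generator (PHP 5.5 or later) instead of a hand-written Iterator class. This is an alternative technique, not the answer's own code: the function name `srt_blocks` is made up for illustration, and the sketch assumes that blocks are separated by blank lines and that each block starts with a number line followed by a range line.

```php
<?php
// Hypothetical generator-based alternative to the SRTBlocks class.
// Assumption: blocks are separated by blank lines; the first line of a
// block is its number, the second its range, the rest its text.
function srt_blocks($lines)
{
    $block = null;
    foreach ($lines as $line) {
        // Strip trailing EOL characters so array and file input behave alike.
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            // A blank line closes the current block, if one is open.
            if ($block !== null) {
                yield $block + array('range' => '', 'text' => '');
                $block = null;
            }
            continue;
        }
        if ($block === null) {
            $block = array('number' => $line);
        } elseif (!isset($block['range'])) {
            $block['range'] = $line;
        } elseif (!isset($block['text'])) {
            $block['text'] = $line;
        } else {
            $block['text'] .= "\n" . $line;
        }
    }
    // Emit the last block when the input does not end with a blank line.
    if ($block !== null) {
        yield $block + array('range' => '', 'text' => '');
    }
}

$lines = array(
    '1', '00:00:12,759 --> 00:00:17,458', '"some content here "', 'that continues here', '',
    '2', '00:00:18,298 --> 00:00:20,926', 'here we go again...',
);

foreach (srt_blocks($lines) as $index => $block) {
    printf("Block #%d:\n", $index + 1);
    print_r($block);
}
```

Because a generator is itself an Iterator, it should be usable wherever the class-based iterator is used. The class version remains the better fit if you need to rewind and iterate the same object more than once.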

So now it is possible to iterate over all blocks in an SRT file. The only thing left is to iterate over both SRT files in parallel. Since PHP 5.3 the SPL comes with the MultipleIterator, which does exactly this. It is now pretty straightforward. For the example I use the same lines twice:

$multi = new MultipleIterator();
$multi->attachIterator(new SRTBlocks($lines));
$multi->attachIterator(new SRTBlocks($lines));

foreach($multi as $blockPair)
{
    list($block1, $block2) = $blockPair;
    echo $block1['number'], "\n", $block1['range'], "\n", 
        $block1['text'], "\n", $block2['text'], "\n\n";
}
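
If the two files might not contain the same number of blocks, the flags passed to MultipleIterator matter. The following is a minimal sketch with made-up block data (the arrays only imitate the parser output): by default MultipleIterator uses MIT_NEED_ALL and stops at the shortest input, while MIT_NEED_ANY continues until every input is exhausted and supplies NULL for the missing side.

```php
<?php
// Two hypothetical block lists of different length, shaped like the parser output.
$italian = new ArrayIterator(array(
    array('number' => '1', 'range' => '00:00:12,759 --> 00:00:17,458', 'text' => 'ciao'),
    array('number' => '2', 'range' => '00:00:18,298 --> 00:00:20,926', 'text' => 'di nuovo'),
));
$spanish = new ArrayIterator(array(
    array('number' => '1', 'range' => '00:00:12,756 --> 00:00:17,433', 'text' => 'hola'),
));

// MIT_NEED_ANY keeps going while at least one attached iterator is valid;
// an exhausted iterator contributes NULL for its slot.
$multi = new MultipleIterator(
    MultipleIterator::MIT_NEED_ANY | MultipleIterator::MIT_KEYS_NUMERIC
);
$multi->attachIterator($italian);
$multi->attachIterator($spanish);

$merged = array();
foreach ($multi as $blockPair) {
    $texts = array();
    foreach ($blockPair as $block) {
        // Skip the side that has no block left.
        if ($block !== null) {
            $texts[] = $block['text'];
        }
    }
    $merged[] = implode("\n", $texts);
}
print_r($merged);
```

Whether a block without a counterpart should be kept or dropped depends on your data, so treat the choice of MIT_NEED_ANY here as one option rather than a recommendation.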

Storing the string in a file (instead of outputting it) is rather trivial, so I leave that out of the answer.
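
For completeness, here is one way that last step could look. It is only a sketch: the helper name `format_merged_block`, the sample blocks and the temporary target file are all assumptions for illustration, and the block arrays are written by hand to stand in for what the parser would produce.

```php
<?php
// Hypothetical helper: formats one merged entry from two parsed blocks,
// keeping number and range from the first block, as the echo loop above does.
function format_merged_block(array $block1, array $block2)
{
    return $block1['number'] . "\n" . $block1['range'] . "\n"
        . $block1['text'] . "\n" . $block2['text'] . "\n\n";
}

// Hand-written stand-ins for the pairs produced by the parallel iteration.
$pairs = array(
    array(
        array('number' => '1', 'range' => '00:00:12,759 --> 00:00:17,458', 'text' => 'italian content'),
        array('number' => '1', 'range' => '00:00:12,756 --> 00:00:17,433', 'text' => 'spanish content'),
    ),
);

// Collect the merged text first, then write it in one call.
$output = '';
foreach ($pairs as $pair) {
    list($block1, $block2) = $pair;
    $output .= format_merged_block($block1, $block2);
}

// A temporary file is used here only so the sketch can run anywhere.
$target = tempnam(sys_get_temp_dir(), 'srt');
file_put_contents($target, $output);
echo file_get_contents($target);
```

For very large inputs you would probably append each entry with fwrite() inside the loop instead of building one big string first.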

So what is there to remark? First, sequential data like the lines in a file can be parsed easily with a loop and some state. That works not only for lines in a file but also for strings.

Second, why did I suggest an iterator here? For one, it is easy to use: it was only a small step from handling one file to handling two files in parallel. Beyond that, the iterator can operate on another iterator as well, for example the SplFileObject class, which provides an iterator over all lines in a file. If you have large files, you can use SplFileObject (instead of an array) and you won't need to load both files into arrays first. This needs one small addition to SRTBlocks that removes trailing EOL characters from the end of each line:

$line = rtrim($line, "\n\r");

Then this just works:

$multi = new MultipleIterator();
$multi->attachIterator(new SRTBlocks(new SplFileObject($file1)));
$multi->attachIterator(new SRTBlocks(new SplFileObject($file2)));

foreach($multi as $blockPair)
{
    list($block1, $block2) = $blockPair;
    echo $block1['number'], "\n", $block1['range'], "\n", 
        $block1['text'], "\n", $block2['text'], "\n\n";
}

That done, you can process even really large files with (nearly) the same code. Flexible, isn't it?
