Making a large processing job smaller


Question

This is the code I'm using as I work my way toward a solution.

    public function indexAction()
    {
        //id3 options
        $options = array("version" => 3.0, "encoding" => Zend_Media_Id3_Encoding::ISO88591, "compat" => true);
        //path to collection
        $path = APPLICATION_PATH . '/../public/Media/Music/'; //currently approx. 2000 files
        //inner iterator
        $dir = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
        //iterator
        $iterator = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
        foreach ($iterator as $file) {
            if (!$file->isDir() && $file->getExtension() === 'mp3') {
                //real path to mp3 file
                $filePath = $file->getRealPath();
                Zend_Debug::dump($filePath); //current results: accepted path, no errors
                $id3 = new Zend_Media_Id3v2($filePath, $options);
                $data = array(); //reset per file so frames don't carry over between files
                foreach ($id3->getFramesByIdentifier("T*") as $frame) {
                    $data[$frame->identifier] = $frame->text;
                }
                //note: "T*" matches text frames only, so APIC (cover art) is never fetched
                Zend_Debug::dump($data); //currently can scan the whole collection without timing out, but APIC data not being processed
            }
        }
    }

The problem: process a file system of mp3 files in multiple directories. Extract the ID3 tag data to a database (3 tables) and extract the cover image from the tag to a separate file.

I can handle the actual extraction and data handling. My issue is with output.

With the way that Zend Framework 1.x handles output buffering, outputting an indicator that the files are being processed is difficult. In an old-style PHP script without output buffering, you could print out a bit of HTML with every iteration of the loop and have some indication of progress.
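That old-style approach can be sketched in plain PHP: disable the output buffers and flush after each iteration. This is a minimal sketch, not ZF-specific; the file list is a placeholder, and the ZF1 helper calls mentioned in the comment are an assumption about how you would bypass the layout and view rendering in a controller action.

```php
<?php
// Minimal plain-PHP sketch: stream progress to the client by flushing
// after each file instead of letting output accumulate in a buffer.
// Inside a ZF1 action you would presumably first disable rendering
// (e.g. $this->_helper->layout()->disableLayout();
//       $this->_helper->viewRenderer->setNoRender(true);).

ob_implicit_flush(true);          // flush PHP's buffer on every output call
while (ob_get_level() > 0) {      // close any buffers already started
    ob_end_flush();
}

$files = array('a.mp3', 'b.mp3', 'c.mp3'); // placeholder file list
foreach ($files as $i => $file) {
    // ... process $file here ...
    printf("Processed %d/%d: %s<br>\n", $i + 1, count($files), $file);
    flush();                      // push the bytes toward the client
}
```

Note that even with this, intermediate buffering by the web server or browser can still delay what the user sees, which is the issue described later in the question.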

I would like to be able to process each album's directory, output the results, and then continue on to the next album's directory, requiring user intervention only on certain errors.

Any help would be appreciated.

JavaScript is not the solution I'm looking for. I feel that this should be possible within the constructs of PHP and the ZF 1 MVC.

I'm doing this mostly for my own enlightenment; it seems like a good way to learn some important concepts.



Ok, how about some ideas on how to break this down into smaller chunks: process one chunk, commit, process the next chunk, that kind of thing. In or out of ZF.
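One way to sketch that chunked approach: process a single album directory per run, record it as done, and resume from the record on the next run. This is a hypothetical sketch; the base path and progress file are placeholders, and the real version would write to the database instead of a text file.

```php
<?php
// Hypothetical "one chunk per run" sketch: handle one album directory,
// commit it to a progress record, then stop until the next invocation.
// $base and the progress file location are assumptions, not from the post.

$base     = '/path/to/Media/Music';
$progress = sys_get_temp_dir() . '/id3_progress.txt';

$done = file_exists($progress)
    ? file($progress, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    : array();

$albums = glob($base . '/*', GLOB_ONLYDIR) ?: array();

foreach ($albums as $album) {
    if (in_array($album, $done, true)) {
        continue;                  // committed in an earlier chunk
    }
    foreach (glob($album . '/*.mp3') ?: array() as $mp3) {
        // ... extract the ID3 data and write it to the database here ...
    }
    // "Commit" the chunk: remember this album, then stop until the next run.
    file_put_contents($progress, $album . "\n", FILE_APPEND);
    break;                         // one album per chunk
}
```

Each run then stays short, and the caller (cron, a redirect loop, or repeated requests) drives the overall job forward one album at a time.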



I'm beginning to see the problem with what I'm trying to accomplish. It seems that output buffering is not just happening in ZF; it's happening everywhere, from ZF all the way to the browser. Hmmmm...

Answer

Introduction

This is a classic example of what you should not do, because:

  • You are trying to parse ID3 tags with PHP, which is slow, and trying to parse multiple files at once would definitely make it even slower.
  • RecursiveDirectoryIterator loads all the files in a folder and its subfolders, and from what I can see there is no limit. It can be 2,000 files today and 100,000 the next day, so the total processing time is unpredictable and can definitely take hours in some cases.
  • Heavy dependence on a single file system: with your current architecture the files are stored on the local system, so it would be difficult to split the files and do proper load balancing.
  • You are not checking whether a file's information has already been extracted, which results in duplicated looping and extraction.
  • No locking system: this process can be initiated multiple times simultaneously, resulting in generally slow performance on the server.

My advice is not to loop with RecursiveDirectoryIterator to process the files in bulk.

Target each file as soon as it is uploaded or transferred to the server. That way you are only working with one file at a time, which spreads out the processing time.

Your problem is exactly what job queues are designed for. You are also not limited to implementing the parsing in PHP; you can take advantage of C or C++ for performance.

Advantages

  • Transfer jobs to other machines or processes that are better suited to do the work
  • Do work in parallel and load-balance the processing
  • Reduce the latency of page views in high-volume web applications by running time-consuming tasks asynchronously
  • Use multiple languages: client in PHP, server in C

Examples (tested):

  • ZeroMQ
  • Gearman
  • Beanstalkd

Expected process: client

  • Connect to the job queue, e.g. Gearman
  • Connect to the database, e.g. MongoDB or Redis
  • Loop over the folder path
  • Check the file extension
  • If the file is an mp3, generate a file hash, e.g. with sha1_file
  • Check whether the file has already been sent for processing
  • Send the hash and file to the job server
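The client steps above could be sketched as follows. GearmanClient and Redis come from the pecl gearman and phpredis extensions; the server addresses, the job name 'extract_id3', and the Redis set 'mp3:sent' are all assumptions for illustration.

```php
<?php
// Sketch of the client side: hash each mp3, skip files already queued,
// and submit the rest as background jobs. Queue/DB details are placeholders.

function queue_mp3s($musicDir, $queue, $redis)
{
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($musicDir,
            RecursiveDirectoryIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isDir() || strtolower($file->getExtension()) !== 'mp3') {
            continue;                               // mp3 files only
        }
        $hash = sha1_file($file->getRealPath());    // content-based identity
        if ($redis->sIsMember('mp3:sent', $hash)) {
            continue;                               // already queued, skip it
        }
        // Fire-and-forget background job: payload is the hash plus the path.
        $queue->doBackground('extract_id3', json_encode(
            array('hash' => $hash, 'path' => $file->getRealPath())
        ));
        $redis->sAdd('mp3:sent', $hash);            // mark as sent
    }
}

if (class_exists('GearmanClient') && class_exists('Redis')) {
    $queue = new GearmanClient();
    $queue->addServer('127.0.0.1', 4730);
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    queue_mp3s('/path/to/Media/Music', $queue, $redis);
}
```

The sha1_file check is what gives you the deduplication the original code lacks: a file that has already been submitted is skipped on every later scan.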

Expected process: server

  • Connect to the job queue, e.g. Gearman
  • Connect to the database, e.g. MongoDB or Redis
  • Receive the hash/file
  • Extract the ID3 tag
  • Update the database with the ID3 tag information
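The worker side might look like this, again assuming the pecl gearman extension. The ID3 extraction and database update are stubs, since they depend on your parser and schema.

```php
<?php
// Sketch of the worker: register a handler for the 'extract_id3' job and
// process jobs one at a time. Job name and addresses are placeholders.

function register_id3_worker($worker)
{
    $worker->addFunction('extract_id3', function ($job) {
        $payload = json_decode($job->workload(), true);
        // ... open $payload['path'], extract the ID3 frames (including the
        // APIC cover image), then update the database keyed by
        // $payload['hash'] ...
    });
}

if (class_exists('GearmanWorker')) {
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    register_id3_worker($worker);
    while ($worker->work()) {
        // each iteration handles one job; run several workers in parallel
    }
}
```

Because each worker handles one file at a time, you can start as many workers as your servers can bear, on as many machines as you like.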

Finally, this processing can be done on multiple servers in parallel.

