PHP: Searching through a CSV file the OOP way

Problem description

    I need to write a script that will search through a CSV file and perform certain search functions on it:

    1. find duplicate entries in a column
    2. find matches to a list of banned entries in another column
    3. find entries through regular expression matching on a column specified

    Now, I have no problem at all coding this procedurally, but as I am now moving on to object-oriented programming, I would like to use classes and instances of objects instead.

    However, thinking in OOP doesn't come naturally to me yet, so I'm not entirely sure which way to go. I'm not looking for specific code, but rather suggestions on how I could design the script.

    My current thinking is this:

    1. Create a file class. This will handle import/export of data
    2. Create a search class. A child class of file. This will contain the various search methods

    How it would function in index.php:

    1. get an array from the csv in the file object in index.php
    2. create a loop to iterate through the values of the array
    3. call the methods in the loop from a search object and echo them out

    The problems I see with this approach are these:

    • I will want to point at different elements in my array to look at particular "columns". I could just put my loop in a function and pass the column as a parameter, but this kind of defeats the point of OOP, I feel.
    • My search methods will work in different ways. Searching for duplicate entries is fairly straightforward with nested loops, but I do not need a nested loop to do a simple word or regular expression search.

    Should I instead go like this?

    1. Create a file class. This will handle import/export of data
    2. Create a loop class. A child class of file. This will contain methods that deal with iterating through the array
    3. Create a search class. A child class of loop. This will contain the various search methods

    My main issue with this is that I may need multiple search objects and have to iterate through them within my loop class.

    Any help would be much appreciated. I'm very new to OOP, and while I understand the individual parts, I'm not yet able to see the bigger picture. I may be overcomplicating what it is I'm trying to do, or there may be a much simpler way that I can't see yet.

    Solution

    I'm going to illustrate a reasonable approach to designing OOP code that serves your stated needs. While I firmly believe that the ideas presented below are sound, please be aware that:

    • the design can be improved -- the aim here is to show the approach, not the final product
    • the implementation is only meant as an example -- if it (barely) works, it's good enough

    How to go about doing this

    A highly engineered solution would start by trying to define the interface to the data. That is, think about what would be a representation of the data that allows you to perform all your query operations. Here's one that would work:

    • A dataset is a finite collection of rows. Each row can be accessed given its zero-based index.
    • A row is a finite collection of values. Each value is a string and can be accessed given its zero-based index (i.e. column index). All rows in a dataset have exactly the same number of values.

    This definition is enough to implement all three types of queries you mention by looping over the rows and performing some type of test on the values of a particular column.

    The next move is to define an interface that describes the above in code. A not particularly nice but still adequate approach would be:

    interface IDataSet {
        public function getRowCount();
        public function getValueAt($row, $column);
    }
    

    Now that this part is done, you can go and define a concrete class that implements this interface and can be used in your situation:

    class InMemoryDataSet implements IDataSet {
        private $_data = array();
    
        public function __construct(array $data) {
            $this->_data = $data;
        }
    
        public function getRowCount() {
            return count($this->_data);
        }
    
        public function getValueAt($row, $column) {
            // Reject row indexes that fall outside the data set.
            if ($row < 0 || $row >= $this->getRowCount()) {
                throw new OutOfRangeException();
            }
    
            return isset($this->_data[$row][$column])
                ? $this->_data[$row][$column]
                : null;
        }
    }
    

    The next step is to go and write some code that converts your input data to some kind of IDataSet:

    function CSVToDataSet($file) {
        return new InMemoryDataSet(array_map('str_getcsv', file($file)));
    }
    

    Now you can trivially create an IDataSet from a CSV file, and you know that you can perform your queries on it because IDataSet was explicitly designed for that purpose. You're almost there.
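One caveat with a loader built on file(): it splits the input on every line break, so a quoted field that itself contains a line break would end up spread over two rows. The following is a minimal sketch of an alternative that reads records with fgetcsv() instead. `readCsvRows` is a hypothetical helper name, not part of the code above, and the delimiter, enclosure and escape characters are assumed to be the common defaults.

```php
<?php
// Hypothetical helper: reads a CSV file into an array of rows with fgetcsv(),
// which keeps a quoted field that contains line breaks inside a single row.
function readCsvRows($file) {
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }

    $rows = array();
    // Length 0 means "no line length limit"; delimiter, enclosure and escape
    // are passed explicitly (assumed defaults).
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// Demonstration with a temporary file; the second record spans two lines.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id,comment\n1,\"first line\nsecond line\"\n\n2,plain\n");
$rows = readCsvRows($path);
unlink($path);
```

The returned array could be handed to the data set directly, for example `new InMemoryDataSet(readCsvRows($file))` as the body of CSVToDataSet.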

    The only thing missing is creating a reusable class that can perform your queries on an IDataSet. Here is one of them:

    class DataQuery {
        private $_dataSet;
    
        public function __construct(IDataSet $dataSet) {
            $this->_dataSet = $dataSet;
        }
    
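        // Not static: the method reads the data set stored on the instance.
        public function getRowsWithDuplicates($columnIndex) {
            $values = array();
            for ($i = 0; $i < $this->_dataSet->getRowCount(); ++$i) {
                // Group row indexes by the value found in the requested column.
                $values[$this->_dataSet->getValueAt($i, $columnIndex)][] = $i;
            }
    
            // Keep only the values that occur in more than one row.
            return array_filter($values, function($row) { return count($row) > 1; });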
        }
    }
    

    This code will return an array where the keys are values in your CSV data and the values are arrays with the zero-based indexes of the rows where each value appears. Since only duplicate values are returned, each array will have at least two elements.

    So at this point you are ready to go:

    $dataSet = CSVToDataSet("data.csv");
    $query = new DataQuery($dataSet);
    $dupes = $query->getRowsWithDuplicates(0);
    

    What you gain by doing this

    Clean, maintainable code that supports being modified in the future without requiring edits all over your application.

    If you want to add more query operations, add them to DataQuery and you can instantly use them on all concrete types of data sets. The data set and any other external code will not need any modifications.
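As an illustration, the other two searches from the question (banned entries and regular expression matches) might look roughly like this. The class name `ColumnQuery` and both method names are hypothetical; the interface and a trimmed-down data set are repeated only so that the sketch runs on its own.

```php
<?php
// Repeated from the answer so that this sketch is self-contained.
interface IDataSet {
    public function getRowCount();
    public function getValueAt($row, $column);
}

class InMemoryDataSet implements IDataSet {
    private $_data;

    public function __construct(array $data) {
        $this->_data = $data;
    }

    public function getRowCount() {
        return count($this->_data);
    }

    public function getValueAt($row, $column) {
        return isset($this->_data[$row][$column]) ? $this->_data[$row][$column] : null;
    }
}

// Hypothetical query class; the two methods could equally live in DataQuery.
class ColumnQuery {
    private $_dataSet;

    public function __construct(IDataSet $dataSet) {
        $this->_dataSet = $dataSet;
    }

    // Indexes of rows whose value in $columnIndex appears in $banned.
    public function getRowsWithBannedValues($columnIndex, array $banned) {
        $lookup = array_flip($banned); // banned value => position, for fast checks
        $rows = array();
        for ($i = 0; $i < $this->_dataSet->getRowCount(); ++$i) {
            $value = $this->_dataSet->getValueAt($i, $columnIndex);
            if ($value !== null && isset($lookup[$value])) {
                $rows[] = $i;
            }
        }
        return $rows;
    }

    // Indexes of rows whose value in $columnIndex matches the PCRE $pattern.
    public function getRowsMatching($columnIndex, $pattern) {
        $rows = array();
        for ($i = 0; $i < $this->_dataSet->getRowCount(); ++$i) {
            $value = $this->_dataSet->getValueAt($i, $columnIndex);
            if ($value !== null && preg_match($pattern, $value) === 1) {
                $rows[] = $i;
            }
        }
        return $rows;
    }
}

// Made-up sample data: a name column and an e-mail column.
$dataSet = new InMemoryDataSet(array(
    array('alice', 'alice@example.com'),
    array('bob', 'spam@example.net'),
    array('carol', 'carol@example.org'),
));
$query = new ColumnQuery($dataSet);
$banned = $query->getRowsWithBannedValues(1, array('spam@example.net'));
$orgs = $query->getRowsMatching(1, '/\.org$/');
```

Both methods follow the same shape as the duplicate search: one pass over the rows and a test on a single column, which is all the data set definition promises.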

    If you want to change the internal representation of the data, modify InMemoryDataSet accordingly or create another class that implements IDataSet and use that one instead from CSVToDataSet. The query class and any other external code will not need any modifications.
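To make this concrete, here is a rough sketch of a second implementation that stores the values column by column. `ColumnMajorDataSet` and `countValue` are made-up names for illustration; `countValue` stands in for any code written against IDataSet, such as the query class.

```php
<?php
// Repeated from the answer so that this sketch is self-contained.
interface IDataSet {
    public function getRowCount();
    public function getValueAt($row, $column);
}

// Hypothetical alternative representation: one array per column
// instead of one array per row.
class ColumnMajorDataSet implements IDataSet {
    private $_columns = array();
    private $_rowCount;

    public function __construct(array $rows) {
        $this->_rowCount = count($rows);
        foreach (array_values($rows) as $rowIndex => $row) {
            foreach (array_values($row) as $columnIndex => $value) {
                $this->_columns[$columnIndex][$rowIndex] = $value;
            }
        }
    }

    public function getRowCount() {
        return $this->_rowCount;
    }

    public function getValueAt($row, $column) {
        if ($row < 0 || $row >= $this->_rowCount) {
            throw new OutOfRangeException();
        }
        return isset($this->_columns[$column][$row]) ? $this->_columns[$column][$row] : null;
    }
}

// Stand-in for client code: it only relies on the IDataSet interface.
function countValue(IDataSet $dataSet, $columnIndex, $needle) {
    $count = 0;
    for ($i = 0; $i < $dataSet->getRowCount(); ++$i) {
        if ($dataSet->getValueAt($i, $columnIndex) === $needle) {
            ++$count;
        }
    }
    return $count;
}

// Made-up sample data: a name column and a role column.
$dataSet = new ColumnMajorDataSet(array(
    array('alice', 'admin'),
    array('bob', 'user'),
    array('carol', 'admin'),
));
$admins = countValue($dataSet, 1, 'admin');
```

Because `countValue` is typed against the interface, it would accept an InMemoryDataSet just as well without any change.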

    If you need to change the definition of the data set (perhaps to allow more types of queries to be performed efficiently) then you have to modify IDataSet, which also brings all the concrete data set classes into the picture and probably DataQuery as well. While this won't be the end of the world, it's exactly the kind of thing you would want to avoid.

    And this is precisely why I suggested starting here: if you come up with a good definition for the data set, everything else will just fall into place.
