Algorithm for splitting large amount of text data

Question

I have a text area which I populate dynamically (to be specific, a QPlainTextEdit in Qt, but that is not important for the algorithm suggestion).

Now the problem is that sometimes large amounts of data come in, and as more data arrives my application becomes heavy, since all of the text data sits in main memory.

So I thought of the following: we could store all of the text data in a file and display only a limited amount of it dynamically, while giving the user the illusion that the visible document is as large as the file, by generating scroll events as new lines arrive.

Is there any standard algorithm for this kind of problem?

Answer

Subclass QAbstractListModel and implement a cache there. When a cell value is read, fetch the data from the cache, and update the cache if the value is not present.

Tweak QTableView by altering the delegate to achieve the needed visualization of cells. Note that you have to use QTableView, since the other QAbstractItemViews have broken item recycling and do not handle very large models well (QTableView does not have that issue). A rough sketch of the delegate idea follows below.
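As an illustration of the delegate tweak (this sketch is not part of the original answer; the class name HexCellDelegate is made up), a minimal QStyledItemDelegate that renders every cell centered in a monospace font could look like this:

#include <QStyledItemDelegate>
#include <QStyleOptionViewItem>
#include <QFont>

// Hypothetical minimal delegate: monospace, centered cells.
class HexCellDelegate : public QStyledItemDelegate
{
public:
    using QStyledItemDelegate::QStyledItemDelegate;

protected:
    void initStyleOption(QStyleOptionViewItem *option,
                         const QModelIndex &index) const override
    {
        QStyledItemDelegate::initStyleOption(option, index);
        option->font = QFont("Courier New");        // monospace cells
        option->displayAlignment = Qt::AlignCenter; // center the hex digits
    }
};

// Usage: view->setItemDelegate(new HexCellDelegate(view));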

Some time ago I wrote a hex viewer for large files and tested it with a 2 GB file, and it worked perfectly.

OK, I found my old code, which could serve as a good example:

#include <QAbstractTableModel>

class LargeFileCache;

class LageFileDataModel : public QAbstractTableModel
{
    Q_OBJECT
public:
    explicit LageFileDataModel(QObject *parent);

    // QAbstractTableModel
    int rowCount(const QModelIndex &parent) const;
    int columnCount(const QModelIndex &parent) const;
    QVariant data(const QModelIndex &index, int role) const;

signals:

public slots:
    void setFileName(const QString &fileName);

private:
    LargeFileCache *cachedData;
};

// ----- cpp file -----
#include "lagefiledatamodel.h"
#include "largefilecache.h"
#include <QSize>

static const int kBytesPerRow = 16;

LageFileDataModel::LageFileDataModel(QObject *parent)
    : QAbstractTableModel(parent)
{
    cachedData = new LargeFileCache(this);
}

int LageFileDataModel::rowCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    return (cachedData->FileSize() + kBytesPerRow - 1)/kBytesPerRow;
}

int LageFileDataModel::columnCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    return kBytesPerRow;
}

QVariant LageFileDataModel::data(const QModelIndex &index, int role) const
{
    if (index.parent().isValid())
        return QVariant();
    if (index.isValid()) {
        if (role == Qt::DisplayRole) {
            qint64 pos = index.row()*kBytesPerRow + index.column();
            if (pos>=cachedData->FileSize())
                return QString();
            return QString("%1").arg((unsigned char)cachedData->geByte(pos), 2, 0x10, QChar('0'));
        } else if (role == Qt::SizeHintRole) {
            return QSize(30, 30);
        }
    }

    return QVariant();
}

void LageFileDataModel::setFileName(const QString &fileName)
{
    beginResetModel();
    cachedData->SetFileName(fileName);
    endResetModel();
}

And here is the cache implementation:

#include <QObject>
#include <QFile>
#include <QQueue>
#include <QByteArray>
#include <QString>

class LargeFileCache : public QObject
{
    Q_OBJECT
public:
    explicit LargeFileCache(QObject *parent = 0);

    char geByte(qint64 pos);
    qint64 FileSize() const;

signals:

public slots:
    void SetFileName(const QString& filename);

private:
    static const int kPageSize;

    struct Page {
        qint64 offset;
        QByteArray data;
    };

private:
    int maxPageCount;
    qint64 fileSize;

    QFile file;
    QQueue<Page> pages;
};

// ----- cpp file -----
#include "largefilecache.h"

const int LargeFileCache::kPageSize = 1024*4;

LargeFileCache::LargeFileCache(QObject *parent)
    : QObject(parent)
    , maxPageCount(1024)
    , fileSize(0)
{

}

char LargeFileCache::geByte(qint64 pos)
{
    if (pos >= fileSize)
        return 0;

    // Look for a cached page containing pos; on a hit, move that page to the
    // back of the queue so the least recently used page stays at the front.
    for (int i = 0, n = pages.size(); i < n; ++i) {
        qint64 k = pos - pages.at(i).offset;
        if (k >= 0 && k < pages.at(i).data.size()) {
            pages.enqueue(pages.takeAt(i));
            return pages.back().data.at(k);
        }
    }

    // Cache miss: read the page containing pos from the file.
    Page newPage;
    newPage.offset = (pos/kPageSize)*kPageSize;
    file.seek(newPage.offset);
    newPage.data = file.read(kPageSize);
    pages.enqueue(newPage);

    // Evict the least recently used pages when the cache grows too large.
    while (pages.count() > maxPageCount)
        pages.dequeue();

    return newPage.data.at(pos - newPage.offset);
}

qint64 LargeFileCache::FileSize() const
{
    return fileSize;
}

void LargeFileCache::SetFileName(const QString &filename)
{
    file.close();
    pages.clear();
    file.setFileName(filename);
    file.open(QFile::ReadOnly);
    fileSize = file.size();
}
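For completeness, a minimal, hypothetical usage sketch (not part of the original code) wiring the model above to a QTableView; the file name is just a placeholder:

#include <QApplication>
#include <QTableView>
#include <QHeaderView>
#include "lagefiledatamodel.h"

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    LageFileDataModel model(nullptr);
    model.setFileName("huge_file.bin");      // placeholder file name

    QTableView view;
    view.setModel(&model);                   // cells are requested lazily while scrolling
    view.horizontalHeader()->setDefaultSectionSize(30);
    view.show();

    return app.exec();
}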

I wrote the cache manually since I was handling the raw data myself, but you can use QCache, which should help with the caching logic.
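If you go the QCache route, the page bookkeeping could be sketched roughly as follows (an untested outline; QCachePageStore, byteAt and maxPages are illustrative names, not from the code above). QCache takes ownership of the inserted pages and evicts the least recently used ones for you:

#include <QCache>
#include <QFile>
#include <QByteArray>
#include <QString>

class QCachePageStore
{
public:
    explicit QCachePageStore(const QString &fileName, int maxPages = 1024)
        : pages(maxPages)                     // maxCost = maximum number of cached pages
    {
        file.setFileName(fileName);
        file.open(QFile::ReadOnly);
    }

    // The caller is expected to keep pos < file.size(), as geByte() does above.
    char byteAt(qint64 pos)
    {
        const qint64 pageIndex = pos / kPageSize;
        QByteArray *page = pages.object(pageIndex);
        if (!page) {
            // Cache miss: read the page from disk and hand it to QCache.
            file.seek(pageIndex * kPageSize);
            page = new QByteArray(file.read(kPageSize));
            pages.insert(pageIndex, page);    // cost of 1 per page
        }
        return page->at(int(pos - pageIndex * kPageSize));
    }

private:
    static const qint64 kPageSize = 4096;
    QFile file;
    QCache<qint64, QByteArray> pages;
};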
