将大型数据集拆分为更小的部分以供 Web 查看 [英] Splitting up large data sets into smaller pieces for web viewing

查看：62 发布时间：2021/6/14 19:47:45 php mysql partitioning

本文介绍了将大型数据集拆分为更小的部分以供 Web 查看的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

背景

我正在为 Web 应用程序开发 Web 平台.该平台的关键功能之一是用户管理.因此，我实现了一个登录系统，使注册用户能够登录平台和应用程序.这是使用 MySQL 数据库实现的，其中包含登录凭据、用户 ID、联系信息等...

I am developing a web platform for web applications. One of the key functions of this platform is user management. Therefore I have implemented a login system that enables registered users to log into the platform and the application. This is implemented using a MySQL database which contains login credentials, userids, contact information, etc...

问题

我遇到的问题是用户表.这张桌子可能有很大的潜力.因此，如果管理员想要编辑特定用户的信息，根据用户在列表中的位置，管理员将不得不滚动可能的数千条记录以找到正确的记录.我正在寻找一种算法，它将把它分成大小为 N 的组，并将这些组显示为管理员可以点击的范围:#0-B、CF、GL、MR 等....

The issue that I am running in to is with the users table. This table can have the potential to be quite large. So if an admin wants to edit the information for a specific user, depending on where the user is in the list, the admin will have to scroll through potentially thousands of records to find the correct one. I am looking for is an algorithm that will split this up into groups of about size N and display those groups as ranges that the admin can click on: #0-B, C-F, G-L, M-R, etc....

研究

这是我迄今为止进行的研究:

This is the research that I have conducted so far:

数据库分区 - 用 PHP 还是 MySQL 更好?

1 个非常大的表或3 大桌子?MySQL 性能

关于分区，我在网上找到了很多关于对数据库表进行分区的文章，但都没有解决我想要做的事情.

With partitioning, there are many articles I found on the web to partition a database table, but nothing addresses what I'm trying to do.

到目前为止我所做的

我编写了一个从 Web 服务器离线运行的程序，该程序返回每个桶 0-9 和 A-Z 的计数.对于小于阈值的计数，它将桶组合成一个范围并将其保存到数据库中.因此，如果桶 0-9 和 A、B 的总计数小于阈值，则范围变为 #0-B.如果桶的大小大于阈值，我在试图弄清楚如何从桶中获取范围时遇到问题.

I have written a program that runs offline from the web server that returns the counts for each bucket 0-9 and A-Z. For counts less than the threshold, it combines the buckets into a range and saves that to the database. So if total count of buckets 0-9 and A, B are less than the threshold, then the range becomes #0-B. I'm having problems trying to figure out how to get a range from a bucket if the size of the bucket is greater than the threshold.

我考虑了以下解决方案:

I have considered the following solutions:

一个递归函数，不断深入用户名，直到它在阈值范围内.如果我可以让它工作，这将是理想的.
由于它是离线运行的，我喜欢的另一个解决方案是将整个存储桶加载到内存中并从那里拆分.这样做的缺点是潜在的内存使用量.
另一个让我感到有趣的想法是 #2 的变体，即将存储桶的一部分加载到内存中并对其进行处理.这将或多或少地将内存使用量减少到固定数量，但需要更多时间来处理.

编辑 5-11/2020:

EDIT 5-11/2020:

根据一个答案和评论，我可以有一个搜索字段，当条目数缩小到低于阈值时，该字段将使用 JSON 填充列表(这是个好主意).但是，我确实有代码可以与您分享:

Based on the one answer and the comments, I could have a search field that would use JSON to populate the list when the number of entries is narrowed down to something below the threshold value (That is a good idea). But, I do have code which I will share with you:

此代码有效，但对于测试数据，它在分区表中创建了约 17,500 个条目，并且有相当多的条目为零计数.

This code works, but with the test data, it creates ~17,500 entries in the partition table, and there's quite a few entries that have zero counts.

function part_recurse2($prefix)
{
    global $CONFIGVAR;
    global $charList;
    global $charCount;

    $list = dbCountScan($prefix);
    for ($i = 0; $i < count($list); $i++)
    {
        $char = substr($charList, $i, 1);
        if ($list[$i] >= $CONFIGVAR['user_partition_max_size']['value'])
        {
            part_recurse2($prefix . $char);
        }
        else
        {
            writeRange($prefix . $char, $prefix . $char, $list[$i],
                substr($prefix . $char, 0, 1));
        }
    }
}

以及来自数据库的结果...

And the results from the database...

mysql> select sum(`count`) from userpart;
+--------------+
| sum(`count`) |
+--------------+
|       100004 |
+--------------+
1 row in set (0.01 sec)

mysql> select count(*) from userpart where count = 0;
+----------+
| count(*) |
+----------+
|     1139 |
+----------+
1 row in set (0.01 sec)

此代码部分有效，但计数不相加，并且其中也有零.正确的用户计数是 100,004，但下面的函数产生的总计数为 105,234，这比用户表中的实际条目多 5,230.这个版本的好处是它会合并范围，直到达到阈值.这就是我想要的工作.

This code partially works, but the counts do not add up, and there are zeros in it as well. The correct user count is 100,004 but the below function produces a total count that is 105,234 which is 5,230 more entries in the users table than there actually is. The nice thing about this version is that it will combine ranges until the threshold is met. This is what I would like to get working.

function part_recurse($prefix)
{
    global $charCount;
    global $charList;
    global $CONFIGVAR;

    $list = dbCountScan($prefix);
    $acc = 0;
    $start = substr($charList, 0, 1);
    for ($i = 0; $i < count($list); $i++)
    {
        if ($list[$i] == 0) continue;
        if ($list[$i] > $CONFIGVAR['user_partition_max_size']['value'])
        {
            // If the number of entries > threshold...
            if ($i == 0)
            {
                // Only for the first bucket.
                $start = substr($charList, 1, 1);
            }
            else
            {
                // All other entries.
                if ($i >= $charCount - 1)
                {
                    // Last entry
                    $end = substr($charList, $i - 1, 1);
                    $acc += $list[$i];
                    writeRange($prefix . $start, $prefix . $end, $acc,
                        substr($prefix . substr($charList, $i, 1), 0, 1));
                    $acc = 0;
                }
                else
                {
                    $end = substr($charList, $i - 1, 1);
                    writeRange($prefix . $start, $prefix . $end, $acc,
                        substr($prefix . substr($charList, $i + 1, 1), 0, 1));
                    $acc = 0;
                    $start = substr($charList, $i, 1);
                }
            }
            part_recurse($prefix . substr($charList, $i, 1));
        }
        else
        {
            if (($acc + $list[$i]) >= $CONFIGVAR['user_partition_max_size']['value'])
            {
                $end = substr($charList, $i - 1, 1);
                writeRange($prefix . $start, $prefix . $end, $acc,
                    substr($prefix . substr($charList, $i, 1), 0, 1));
                $start = substr($charList, $i, 1);
                $acc = $list[$i];
            }
            else
            {
                $acc += $list[$i];
            }
        }
    }

    // Write the final entry.
    if ($acc > 0)
    {
        $end = substr($charList, $charCount - 1, 1);
        $bucket = substr($prefix . substr($charList, $i, 1), 0, 1);
        writeRange($prefix . $start, $prefix . $end, $acc, $bucket);
    }
}

以及它的数据库结果:

mysql> select sum(`count`) from userpart;
+--------------+
| sum(`count`) |
+--------------+
|       105234 |
+--------------+
1 row in set (0.00 sec)
mysql> select count(*) from userpart where count = 0;
+----------+
| count(*) |
+----------+
|      316 |
+----------+
1 row in set (0.00 sec)

正确的条目数是 100,004，没有零计数.我会不断完善代码，但如果有人发现我做错了什么，请赐教.书外函数具有以下属性:

The correct number of entries is 100,004 with no zero counts. I will keep refining the code, but if someone sees something that I am doing wrong, please enlighten me. The off book functions have the following properties:

dbCountScan($prefix): 此函数使用 SELECT COUNT(*) FROM USERS WHERE username 像 '?%';在哪里 ?是 $prefix 和 for 循环中当前字母的串联.

dbCountScan($prefix): This function steps through the characters 0-9 and A-Z using SELECT COUNT(*) FROM USERS WHERE username like '?%'; where ? is a concatenation of $prefix and the current letter in the for loop.

WriteRange($start, $end, $count, $bucket): 此函数将范围写入数据库.$start 和 $end 是范围的开始和结束.$count 是范围内的条目数.而 $bucket 是这个范围所属的顶级bucket(0-9，A-Z).

WriteRange($start, $end, $count, $bucket): This function writes the range to the database. $start and $end is the start and end of the range. $count is the number of entries in the range. And $bucket is the top level bucket (0-9, A-Z) that this range belongs to.

这里是数据库表用户部分:

And here's the database table userpart:

CREATE TABLE `userpart` (
  `rangest` varchar(16) NOT NULL COMMENT 'Database query starting range.',
  `rangeed` varchar(16) NOT NULL COMMENT 'Database query ending range.',
  `count` int unsigned NOT NULL COMMENT 'Count in range.',
  `bucket` varchar(5) DEFAULT NULL COMMENT 'Primary bucket.'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Table that is used to partition usernames.'

将大型数据集拆分为更小的部分以供 Web 查看 [英] Splitting up large data sets into smaller pieces for web viewing

问题描述

推荐答案

相关文章

PHP最新文章

热门教程

热门工具

登录关闭

将大型数据集拆分为更小的部分以供 Web 查看 [英] Splitting up large data sets into smaller pieces for web viewing

问题描述

推荐答案

相关文章

PHP最新文章

热门教程

热门工具

登录 关闭

登录关闭