Price Filter Grouping Algorithm
Problem description
I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort the products pulled from the database into reasonably appropriate groups. I have tried simply dividing the highest price into 4 and basing each group off that. I also tried standard deviations around the mean. Both could result in price ranges that no product falls into, which isn't a useful filtering option.
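To make the two failure modes concrete, here is a small sketch on made-up, skewed sample prices. The helper names equalWidthCounts and meanStdBoundaries are hypothetical, and the population standard deviation is assumed. With one $4,000 item, the equal-width split puts all the other products in the first range and leaves two ranges empty, while mean ± one standard deviation gives a negative lower boundary and a band between the mean and mean + sd that no product falls into.

```php
<?php
// Hypothetical helpers illustrating the two approaches from the question.

// Approach 1: split [0, max] into $groups equal-width ranges and count
// how many prices land in each range.
function equalWidthCounts(array $prices, int $groups): array
{
    $width = max($prices) / $groups;
    $counts = array_fill(0, $groups, 0);
    foreach ($prices as $p) {
        // The maximum price itself belongs in the last range
        $idx = min((int) floor($p / $width), $groups - 1);
        $counts[$idx]++;
    }
    return $counts;
}

// Approach 2: boundaries at mean - sd, mean and mean + sd
// (population standard deviation assumed).
function meanStdBoundaries(array $prices): array
{
    $n = count($prices);
    $mean = array_sum($prices) / $n;
    $sumSq = 0.0;
    foreach ($prices as $p) {
        $sumSq += ($p - $mean) ** 2;
    }
    $sd = sqrt($sumSq / $n);
    return array($mean - $sd, $mean, $mean + $sd);
}

// Made-up sample: mostly cheap items plus one rare $4,000 item
$prices = array(1, 2, 3, 5, 8, 12, 15, 20, 4000);
print_r(equalWidthCounts($prices, 4)); // counts per range: 8, 0, 0, 1
print_r(meanStdBoundaries($prices));
```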
I also tried taking quartiles of the products, but my problem is that prices range from $1 items to $4,000 items. The $4,000 items almost never sell and are far less important, but they keep skewing my results.
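Quartile cut points taken by rank depend only on the order of the prices, not on how large the largest ones are. The sketch below (hypothetical helper quartileCutPoints, using the same floor(n * k / 4) indexing as the update's code, on made-up prices) returns the same cut points whether the most expensive item costs $40 or $4,000; the outlier only stretches the top group, which is where the skew shows up in a label such as "$15 to $4,000".

```php
<?php
// Hypothetical helper: three quartile cut points taken by rank from the
// sorted list, at index floor(n * k / 4) for k = 1, 2, 3.
function quartileCutPoints(array $prices): array
{
    sort($prices);
    $n = count($prices);
    if ($n === 0) {
        return array(); // nothing to split
    }
    $cuts = array();
    for ($k = 1; $k <= 3; $k++) {
        $cuts[] = $prices[(int) floor($n * $k / 4)];
    }
    return $cuts;
}

// Made-up samples that differ only in their most expensive item
print_r(quartileCutPoints(array(1, 2, 3, 5, 8, 12, 15, 20, 4000))); // 3, 8, 15
print_r(quartileCutPoints(array(1, 2, 3, 5, 8, 12, 15, 20, 40)));   // 3, 8, 15
```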
Any thoughts? I should have paid more attention in stats class ...
Update:
I ended up combining methods a bit. I used the quartile/bucket method, but hacked it a bit by hardcoding certain ranges within which a greater number of price groups would appear.
//Price range algorithm
sort($prices);
//Divide the number of prices into four groups
$quartilelength = count($prices) / 4;
//Round down to the nearest ...
$simplifier = 10;
//Get the total range of the prices (0 if there are none)
$range = count($prices) > 0 ? max($prices) - min($prices) : 0;
$priceranges = array();
//Assuming we actually are working with multiple prices
if ($range > 0)
{
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range > 20 && count($prices) > 10)
    {
        $priceranges[0] = floor($prices[(int) floor($quartilelength)] / $simplifier) * $simplifier;
    }
    // Always grab the median price
    $priceranges[1] = floor($prices[(int) floor($quartilelength * 2)] / $simplifier) * $simplifier;
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range > 20 && count($prices) > 10)
    {
        $priceranges[2] = floor($prices[(int) floor($quartilelength * 3)] / $simplifier) * $simplifier;
    }
}
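One way to turn breakpoints like $priceranges into filter options is sketched below; rangeLabels is a hypothetical helper. It assumes whole-dollar breakpoints and lower-inclusive ranges, drops duplicate or zero breakpoints (both can come out of the rounding), and leaves the top range open-ended so a rare, very expensive item does not stretch its label.

```php
<?php
// Hypothetical helper: build filter labels from a list of breakpoints.
// Assumes whole-dollar breakpoints; the top range is open-ended.
function rangeLabels(array $breakpoints): array
{
    // Rounding can produce duplicates or a 0 breakpoint; drop both
    $points = array();
    foreach ($breakpoints as $b) {
        if ($b > 0) {
            $points[] = (int) $b;
        }
    }
    $points = array_values(array_unique($points));
    sort($points);

    if (count($points) === 0) {
        return array('All prices'); // a single, unfiltered group
    }
    $labels = array('Under $' . $points[0]);
    for ($i = 1; $i < count($points); $i++) {
        $labels[] = '$' . $points[$i - 1] . ' to $' . $points[$i];
    }
    $labels[] = '$' . end($points) . ' and up';
    return $labels;
}

// e.g. breakpoints where two quartiles rounded to the same value
print_r(rangeLabels(array(0 => 10, 1 => 40, 2 => 40)));
// labels: "Under $10", "$10 to $40", "$40 and up"
```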
Here is an idea: sort the prices into buckets of 10 distinct prices each, where each price is a key in the bucket's array and the value is a count of how many products are at that price point:
function priceBuckets($prices)
{
    sort($prices);
    $buckets = array(array());
    $a = 0;
    $c = count($prices);
    for ($i = 0; $i !== $c; ++$i) {
        $price = $prices[$i];
        // Prices are sorted, so a repeated price is always already a key of the current bucket
        if (isset($buckets[$a][$price])) {
            ++$buckets[$a][$price];
            continue;
        }
        // The current bucket already holds 10 distinct prices: start a new one
        if (count($buckets[$a]) === 10) {
            ++$a;
            $buckets[$a] = array();
        }
        // Note: PHP truncates float array keys to integers, so store prices with cents as whole cents
        $buckets[$a][$price] = 1;
    }
    return $buckets;
}
//TEST CODE
$prices = array();
for ($i = 0; $i !== 50; ++$i) {
    $prices[] = rand(1, 100);
}
var_dump(priceBuckets($prices));
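From the result, you can use array_key_first() and array_key_last() (PHP 7.3+) to get the min/max price of each bucket. On older versions, call key() after reset() and end(), since those two return the first/last value, which here is a count rather than a price.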
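A minimal sketch of reading each bucket's range, assuming PHP 7.3+ for array_key_first()/array_key_last(). In these buckets the prices are the keys and the counts are the values. bucketSummaries is a hypothetical helper, and the input is hand-written data in the same shape priceBuckets returns.

```php
<?php
// Hypothetical helper: for each bucket (price => count, keys ascending),
// report its lowest price, highest price and number of products.
function bucketSummaries(array $buckets): array
{
    $summaries = array();
    foreach ($buckets as $bucket) {
        if (count($bucket) === 0) {
            continue; // skip empty buckets
        }
        $summaries[] = array(
            'min' => array_key_first($bucket),
            'max' => array_key_last($bucket),
            'products' => array_sum($bucket),
        );
    }
    return $summaries;
}

// Hand-written input in the same shape as priceBuckets() output
$buckets = array(
    array(1 => 2, 3 => 1, 7 => 4),
    array(12 => 1, 30 => 3),
);
print_r(bucketSummaries($buckets));
// first bucket: min 1, max 7, 7 products; second: min 12, max 30, 4 products
```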
Kinda brute force, but might be useful...