找到字符串数组的共同preFIX [英] finding common prefix of array of strings

查看:98
本文介绍了找到字符串数组的共同preFIX的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个这样的数组

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

和我想找到字符串的最长的共同部分 所以在这种情况下,这将是'垒球 - ';

and i would like to find the longest common part of the string so in this instance, it would be 'Softball - ';

我在想,我会按照这个过程

I am thinking that I would follow the this process

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0]) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

问题

  1. 有没有这样做的内置函数或者更简单的方式?

  1. is there a built in function or much simpler way of doing this ?

我的5线阵这可能是好的,但如果我是做几千线阵列,也将是一个很大的开销,所以我就必须移动计算的$我我的起始值,例如:$ I =串的中途,如果失败,则$ I / 2,直到它的工作原理,然后增加$ i增加1,直到我们取得成功。因此,我们正在做的最少数量的比较得到的结果

for my 5 line array that is probably fine, but if i were to do several thousand line arrays, there would be a lot of overhead, so i would have to be move calculated with my starting values of $i, eg $i = halfway of string, if it fails, then $i/2 until it works, then increment $i by 1 until we succeed. so that we are doing the least number of comparisons to get a result

有一个公式/算法出已经在那里了这样那样的问题?

Is there a formula/algorithm out already out there for this kind of problem ?

感谢

亚历

推荐答案

我实现@diogoriba算法为code,有这样的结果:

I implemented @diogoriba algorithm into code, with this result:

  • 查找前两个字符串的共同preFIX,然后从第三个启动所有下列字符串比较这一点,并修剪常见字符串,如果没有共同的发现,胜在情况下,有更多的共同点在在prefixes比不同。
  • 但是bumperbox原来的算法(除了bug修正)赢得其中字符串有少常见于他们的preFIX比不同。 在code评论详情!

另一个想法,我实现:

首先检查最短的字符串数组中,并以此作比较,而不是简单的第一个字符串。 在code,这与自定义编写的函数arrayStrLenMin()来实现。

First check for the shortest string in the array, and use this for comparison rather than simply the first string. In the code, this is implemented with the custom written function arrayStrLenMin().

  • 可以降低迭代显着,但功能arrayStrLenMin()本身可能导致(或多或少)的迭代。
  • 只要在数组第一个字符串的长度开始显得相当笨拙,但可能会变成有效的,如果arrayStrLenMin()需要多次重复。

code +广泛的测试+备注:

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1;	// Return value for error: Array is empty
    $errOtherType = -2;		// Return value for error: Found other type (than string in array)
    $errStrNone = -3;		// Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
    	if (is_string($val)) {
    		$min = strlen($val);
    		$strFirstFound = $key;
    		// echo("Key\tLength / Notification / Error\n");
    		// echo("$key\tFound first string member at key with length: $min!\n");
    		break;
    	}
    	else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
    	foreach ($arr as $key => $val) {
    		if (is_string($val)) {
    			$cur = strlen($val);
    			// echo("$key\t$cur\n");
    			if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
    			if ($cur < $min) { $min = $cur; }
    		}
    	// else { echo("$key\tNo string!\n"); }
    	}
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
    	for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
    		if (is_string($arr[$i])) {
    			$cur = strlen($arr[$i]);
    			// echo("$i\t$cur\n");
    			if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
    			if ($cur < $min) { $min = $cur; }
    		}
    		// else { echo("$i\tNo string!\n"); }
    	}
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
    	// Grab the part from 0 up to $i
    	$commonStrMax = substr($arr[0], 0, $i);
    	echo("Match: $i\t$commonStrMax\n");
    	// Loop through all the values in array, and compare if they match
    	foreach ($arr as $key => $str) {
    		echo("  Str: $key\t$str\n");
    		// Didn't match, return the part that did match
    		if ($commonStrMax != substr($str, 0, $i)) {
    				return substr($commonStrMax, 0, $i-1);
    		}
    	}
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
    	// Grab char $i
    	$char = substr($arr[0], $i, 1);
    	echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
    	// Loop through all the values in array, and compare if they match
    	foreach ($arr as $key => $str) {
    		echo("  Str: $key\t$str\n");
    		// Didn't match, return the part that did match
    		if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
    				return substr($arr[0], 0, $i);
    		}
    	}
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
    	echo("  STR: $i\t{$arr[$i]}\n");

    	/// Compare the maximum common string with the next neighbour

    	/*
    	//// Compare by char: Method unsuitable!

    	// Iterate from string end to string beginning
    	for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
    		echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
    		// If you find the first mismatch from the end, break.
    		if ($arr[$i][$ii] != $strCommonMax[$ii]) {
    			$strCommonMaxLength = $ii - 1; break;
    			// BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
    		}
    	}
    	*/

    	//// Compare by string

    	for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
    		echo("MATCH: $ii\t$strCommonMax\n");
    		if (substr($arr[$i],0,$ii) == $strCommonMax) {
    			break;
    		}
    		else {
    			$strCommonMax = substr($strCommonMax,0,$ii - 1);
    			$strCommonMaxLength--;
    		}
    	}
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key	Length / Notification / Error
    0	Found first string member at key with length: 3!
    1	3
    2	5
    3	4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key	Length / Notification / Error
    0	Found first string member at key with length: 3!
    0	3
    1	3
    2	5
    3	4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key	Length / Notification / Error
    0	Found first string member at key with length: 3!
    1	3
    2	5
    3	4
    Result: 3

    NEW ANALYSIS:
    Key	Length / Notification / Error
    4	Found first string member at key with length: 4!
    5	No string!
    6	No string!
    7	5
    8	No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key	Length / Notification / Error
    0	Found first string member at key with length: 3!
    0	3
    1	3
    2	5
    3	4
    Result: 3

    NEW ANALYSIS:
    Key	Length / Notification / Error
    4	Found first string member at key with length: 4!
    0	No string!
    1	No string!
    2	No string!
    3	No string!
    4	4
    5	No string!
    6	No string!
    7	5
    8	No string!
    Result: 4

*/

这篇关于找到字符串数组的共同preFIX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆