Replacing variables in a string


Problem description

I am working on a multilingual website in PHP, and in my language files I often have strings containing multiple variables that are filled in later to complete the sentences.

Currently I place {VAR_NAME} in the string and manually replace each occurrence with its matching value when the string is used.

So basically:

{X} created a thread on {Y}

becomes:

Dany created a thread on Stack Overflow

I have already thought of sprintf, but I find it inconvenient because it depends on the order of the variables, which can change from one language to another.

I have also checked How replace variable in string with value in php? and for now I basically use that method.

But I am interested in knowing whether there is a convenient way in PHP (built-in or not) to do this, considering that I already have variables named exactly X and Y as in the previous example, more like $$ for a variable variable.

So instead of doing str_replace on the string, I would maybe call a function like so:

$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example']);

which would also print out:

Dany created a thread on Stack Overflow

Thanks!

Edit

The strings serve as templates and can be used multiple times with different inputs.

So basically doing "{$X} ... {$Y}" won't do the trick, because I would lose the template and the string would be initialized with the starting values of $X and $Y, which aren't yet determined.

Solution

I'm going to add an answer here because none of the current answers really cut the mustard in my view. I'll dive straight in and show you the code I would use to do this:

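function parse(
    /* string */ $subject,
    array        $variables,
    /* string */ $escapeChar = '@',
    /* string */ $errPlaceholder = null
) {
    // Quote the escape character (and the / delimiter) for use in the pattern
    $esc = preg_quote($escapeChar, '/');

    // Three alternatives: an escaped escape character leading up to a brace,
    // an escaped opening brace, or a {placeholder}
    $expr = "/
        $esc$esc(?=$esc*+{)
      | $esc{
      | {(\w+)}
    /x";

    $callback = function($match) use($variables, $escapeChar, $errPlaceholder) {
        switch ($match[0]) {
            // Doubled escape character: emit one literal escape character
            case $escapeChar . $escapeChar:
                return $escapeChar;

            // Escaped brace: emit a literal opening brace
            case $escapeChar . '{':
                return '{';

            // Placeholder: look the name up in $variables
            default:
                if (isset($variables[$match[1]])) {
                    return $variables[$match[1]];
                }

                // Unknown name: use $errPlaceholder, or keep the placeholder as-is
                return isset($errPlaceholder) ? $errPlaceholder : $match[0];
        }
    };

    return preg_replace_callback($expr, $callback, $subject);
}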

What does that do?

In a nutshell:

  • Create a regular expression using the specified escape character that will match one of three sequences (more on that below)
  • Feed that into preg_replace_callback(), where the callback handles two of those sequences exactly and treats everything else as a replacement operation.
  • Return the resulting string

The regex

The regex matches any one of these three sequences:

  • Two occurrences of the escape character, followed by zero or more occurrences of the escape character, followed by an opening curly brace. Only the first two occurrences of the escape character are consumed. This is replaced by a single occurrence of the escape character.
  • A single occurrence of the escape character followed by an opening curly brace. This is replaced by a literal open curly brace.
  • An opening curly brace, followed by one or more Perl word characters (alphanumerics and the underscore character), followed by a closing curly brace. This is treated as a placeholder, and the name between the braces is looked up in the $variables array. If it is found, the replacement value is returned; if not, the value of $errPlaceholder is returned. By default this is null, which is treated as a special case: the original placeholder is returned (i.e. the string is not modified).
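
To see the three alternatives in isolation, here is a small sketch. It assumes the default escape character @ is written directly into the pattern rather than interpolated, and the subject string is made up purely for illustration. It lists what the expression matches, in order:

```php
<?php
// Same three alternatives as in parse(), with '@' written out literally.
$pattern = '/@@(?=@*+\{)|@\{|\{(\w+)\}/';

// Illustrative subject: a lone '@', an escaped escape character,
// an escaped brace and two ordinary placeholders.
$subject = 'a@b @@{X} @{Y} {Z}';

preg_match_all($pattern, $subject, $matches);

// $matches[0] holds each full match in the order it was found:
// '@@', '{X}', '@{' and '{Z}'.
print_r($matches[0]);
```

The lone @ in a@b is not matched at all, which is why it passes through untouched, and the {Y} after @{ is never seen as a placeholder because its opening brace has already been consumed.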

Why is it better?

To understand why it's better, let's look at the replacement approaches taken by the other answers. With one exception (whose only failings are incompatibility with PHP<5.4 and slightly non-obvious behaviour), these fall into two categories:

  • strtr() - This provides no mechanism for handling an escape character. What if your input string needs a literal {X} in it? strtr() does not account for this, and the placeholder would be replaced with the value of X.
  • str_replace() - This suffers from the same issue as strtr(), and has another problem as well. When you call str_replace() with arrays for the search/replace arguments, it behaves as if you had called it multiple times - once for each replacement pair. This means that if one of your replacement strings contains a value that appears later in the search array, you will end up substituting that as well.

To demonstrate this issue with str_replace(), consider the following code:

$pairs = array('A' => 'B', 'B' => 'C');
echo str_replace(array_keys($pairs), array_values($pairs), 'AB');

Now, you'd probably expect the output here to be BC, but it will actually be CC. This is because the first iteration replaced A with B, so in the second iteration the subject string was BB, and both of these occurrences of B were replaced with C.
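
For contrast, here is a sketch of the same pairs passed to strtr(), which works through the subject in a single pass and does not re-scan text it has already substituted. It only illustrates the cascading difference; it does nothing about the missing escape mechanism noted for strtr().

```php
<?php
$pairs = array('A' => 'B', 'B' => 'C');

// str_replace() applies the pairs one after another, so the B produced
// by the first pair is picked up again by the second pair.
$cascaded = str_replace(array_keys($pairs), array_values($pairs), 'AB');

// strtr() with an array translates in one pass over the subject.
$singlePass = strtr('AB', $pairs);

echo $cascaded, "\n";   // CC
echo $singlePass, "\n"; // BC
```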

This issue also betrays a performance consideration that might not be immediately obvious. Because each pair is handled separately, the operation is O(n) in the number of replacement pairs: for each pair the entire string is searched and a separate replacement operation is carried out. If you had a very large subject string and a lot of replacement pairs, that's a sizeable operation going on under the bonnet.

Arguably this performance consideration is a non-issue - you would need a very large string and a lot of replacement pairs before you got a meaningful slowdown, but it's still worth remembering. It's also worth remembering that regex has performance penalties of its own, so in general this consideration shouldn't be included in the decision-making process.

Instead we use preg_replace_callback(). This visits any given part of the string looking for matches exactly once, within the bounds of the supplied regular expression. I add this qualifier because if you write an expression that causes catastrophic backtracking then it will be considerably more than once, but in this case that shouldn't be a problem (to help avoid this I made the only repetition in the expression possessive).

We use preg_replace_callback() instead of preg_replace() to allow us to apply custom logic while looking for the replacement string.
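
As a rough illustration of the single-pass behaviour, the A/B pairs from the str_replace() example can be run through preg_replace_callback(). This is only a sketch: building the pattern from the array keys is an assumption made for the demo and is not how parse() works.

```php
<?php
$pairs = array('A' => 'B', 'B' => 'C');

// Build an alternation of the search keys (assumed to be non-empty).
$alternatives = array_map(function ($key) {
    return preg_quote($key, '/');
}, array_keys($pairs));
$pattern = '/' . implode('|', $alternatives) . '/';

// Each match is looked up once; replacement text is never re-scanned.
$result = preg_replace_callback($pattern, function ($match) use ($pairs) {
    return $pairs[$match[0]];
}, 'AB');

echo $result, "\n"; // BC
```

Because the callback runs once per match and its return value is never searched again, the B that replaces A is left alone.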

What this allows you to do

The original example from the question

$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example']);

This becomes:

$pairs = array(
    'X' => 'Dany',
    'Y' => 'Stack Overflow',
);

$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example'], $pairs);
// Dany created a thread on Stack Overflow

Something more advanced

Now let's say we have:

$lang['example'] = '{X} created a thread on {Y} and it contained {X}';
// Dany created a thread on Stack Overflow and it contained Dany

...and we want the second {X} to appear literally in the resulting string. Using the default escape character of @, we would change it to:

$lang['example'] = '{X} created a thread on {Y} and it contained @{X}';
// Dany created a thread on Stack Overflow and it contained {X}

OK, looks good so far. But what if that @ was supposed to be a literal?

$lang['example'] = '{X} created a thread on {Y} and it contained @@{X}';
// Dany created a thread on Stack Overflow and it contained @Dany

Note that the regular expression has been designed to only pay attention to escape sequences that immediately precede an opening curly brace. This means that you don't need to escape the escape character unless it appears immediately in front of a placeholder.

A note about the use of an array as an argument

Your original code sample uses variables named the same way as the placeholders in the string. Mine uses an array with named keys. There are two very good reasons for this:

  1. Clarity and security - it's much easier to see what will end up being substituted, and you don't risk accidentally substituting variables you don't want to be exposed. It wouldn't be much good if someone could simply feed in {dbPass} and see your database password, now would it?
  2. Scope - it's not possible to import variables from the calling scope unless the caller is the global scope. This makes the function useless if called from another function, and importing data from another scope is very bad practice.

If you really want to use named variables from the current scope (and I do not recommend this due to the aforementioned security issues) you can pass the result of a call to get_defined_vars() to the second argument.
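
If you do go down that route, one way to limit the exposure is to filter the result of get_defined_vars() against a whitelist before passing it on. The following is a sketch under stated assumptions: fill() is a hypothetical, stripped-down stand-in for parse() with no escape handling (so the example runs on its own), and render() and the variable names are invented for illustration.

```php
<?php
// Hypothetical minimal stand-in for parse(): placeholders only, no escapes.
function fill($template, array $vars) {
    return preg_replace_callback('/\{(\w+)\}/', function ($m) use ($vars) {
        return isset($vars[$m[1]]) ? $vars[$m[1]] : $m[0];
    }, $template);
}

// Hypothetical caller with a local variable that must not leak.
function render($template) {
    $X = 'Dany';
    $Y = 'Stack Overflow';
    $dbPass = 'secret';

    // Keep only the names that templates are allowed to use.
    $allowed = array_flip(array('X', 'Y'));
    $vars = array_intersect_key(get_defined_vars(), $allowed);

    return fill($template, $vars);
}

$out = render('{X} created a thread on {Y} {dbPass}');
echo $out, "\n"; // Dany created a thread on Stack Overflow {dbPass}
```

The {dbPass} placeholder is left as-is because that name never makes it into the array handed to the replacement function.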

A note about choosing an escape character

You'll notice I chose @ as the default escape character. You can use any character (or sequence of characters, it can be more than one) by passing it to the third argument - and you may be tempted to use \ since that's what many languages use, but hold on before you do that.

The reason you don't want to use \ is because many languages use it as their own escape character, which means that when you want to specify your escape character in, say, a PHP string literal, you run into this problem:

$lang['example'] = '\\{X}';   // results in {X}
$lang['example'] = '\\\{X}';  // results in \Dany
$lang['example'] = '\\\\{X}'; // results in \Dany

It can lead to a readability nightmare, and some non-obvious behaviour with complex patterns. Pick an escape character that is not used by any other language involved (for example, if you are using this technique to generate fragments of HTML, don't use & as an escape character either).
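
The confusion comes from PHP's own handling of backslashes in string literals, before the template function ever sees the text. A quick sketch of what the three literals above actually contain (the variable names are only for illustration):

```php
<?php
// In single-quoted PHP literals, \\ collapses to one backslash, while a
// backslash before any other character (such as {) is kept as-is.
$one   = '\\{X}';   // one backslash, then {X}
$three = '\\\{X}';  // two backslashes, then {X}
$four  = '\\\\{X}'; // two backslashes, then {X}

var_dump(strlen($one));     // int(4)
var_dump($three === $four); // bool(true)
```

The second and third literals are the same string, which is why they produce the same result.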

To sum up

What you are doing has edge-cases. To solve the problem properly, you need to use a tool capable of handling those edge-cases - and when it comes to string manipulation, the tool for the job is most often regex.
