Special characters throwing off str_pad in PHP?


Problem description

I'm writing a module that is supposed to be able to export transaction records in BankOne format.

Here is the specification of the format

Here is an example file

The fields are placed in specific column ranges on each line, and records are separated by newlines. Lots of spaces need to be added to ensure that the fields start and end at specific points in the line.

I wrote a function in PHP for this. It takes the fields as parameters and should return a properly formatted record.

function record4($checknum='', $nameid='', $purpose='', $pledge='', $payment='',
                 $frequency='', $title='', $fname='', $lname='', $suffix='',
                 $address='', $postalcode='', $city='', $state='', $greeting='')
{
    // Maximum length and 1-based start column of each field.
    $fields = array(
        'checknum' => array('length' => 8, 'start' => 37),
        'nameid' => array('length' => 7, 'start' => 45),
        'purpose' => array('length' => 5, 'start' => 52),
        'pledge' => array('length' => 10, 'start' => 57),
        'payment' => array('length' => 10, 'start' => 67),
        'frequency' => array('length' => 1, 'start' => 77),
        'title' => array('length' => 20, 'start' => 78),
        'fname' => array('length' => 40, 'start' => 98),
        'lname' => array('length' => 40, 'start' => 138),
        'suffix' => array('length' => 20, 'start' => 178),
        'address' => array('length' => 35, 'start' => 198),
        'postalcode' => array('length' => 10, 'start' => 233),
        'city' => array('length' => 28, 'start' => 243),
        'state' => array('length' => 5, 'start' => 271),
        'greeting' => array('length' => 40, 'start' => 276)
    );

    $str = '4';
    foreach ($fields as $field_name => $field)
    {
        if ($$field_name)
        {
            // Pad up to the column just before the field, then append the truncated value.
            $str = str_pad($str, $field['start']-1, ' ');
            $str = $str.substr(trim((string)$$field_name), 0, $field['length']);
        }
    }

    return $str."\n";
}

It seems to work as intended, but when I looked at the output file I found this (scroll to the end):

4                                                                 1                              David                                   Landrum
4                                                                 3                              Hazel                                   Baker
4                                                                 3                              Jerome                                  Zehnder
4                                                                 1                              Víctor                               Nadales
4                                                                 2                              Philip                                  Nauert
4                                                                 1                              Jana                                    Ortcutter

The file contains 900 records pulled from a database, and all of them are formatted correctly except VÃ­ctor Nadales. After that first name, every other field is three spaces to the left of where it is supposed to be. The only anomalous thing about this record appears to be the 'Ã' in the first name.

The function is supposed to pad out the string to the proper length after each and every field it processes, yet it somehow gets fooled on this one line?

Can anyone tell me what is going on here?

EDIT: I just realized that whatever imports files of this format might not even support special UTF-8 characters. Therefore I added this line to my code:

$$field_name = iconv('UTF-8', 'ASCII//TRANSLIT', $$field_name);

The Ã comes out looking like this: ~A-. Not ideal, but at least the file is formatted properly now.

Solution

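This is happening because what displays as 'Ã' in that name is stored as a multi-byte sequence (4 bytes long in this record), and str_pad counts bytes rather than logical characters. A correctly encoded UTF-8 'í' takes only 2 bytes, so a 4-byte sequence suggests the value was converted to UTF-8 twice, which would also match the '~A-' transliteration mentioned in the question's edit.

This is why you are missing three spaces: str_pad is counting those 4 bytes as four single-byte characters instead of one displayed character.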
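To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the data is UTF-8; the byte escapes and variable names are illustrative and do not come from the original code. The last part shows one hypothetical way a single accented letter can end up occupying 4 bytes (double encoding), which is only a guess at what happened to the data in the question.

```php
<?php
// Assumption: mbstring is available and strings are UTF-8.
$name = "V\xC3\xADctor"; // "Víctor" with a correctly encoded "í" (2 bytes)

$bytes = strlen($name);             // counts bytes
$chars = mb_strlen($name, 'UTF-8'); // counts characters

// str_pad() pads to 10 *bytes*, so one space fewer than expected is added.
$padded = str_pad($name, 10, ' ');
$paddedChars = mb_strlen($padded, 'UTF-8');

// Hypothetical double encoding: the UTF-8 bytes of "í" are treated as
// ISO-8859-1 and converted to UTF-8 again. The result is "Ã" followed by
// an invisible soft hyphen, 4 bytes in total.
$double = mb_convert_encoding("\xC3\xAD", 'UTF-8', 'ISO-8859-1');
$doubleBytes = strlen($double);
$doubleChars = mb_strlen($double, 'UTF-8');
```

With the correctly encoded name, strlen reports one more unit than mb_strlen, so every later field shifts left by one column. With the double-encoded value the gap between bytes and what is visible on screen grows to three.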

Try this function (credit here).

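<?php
// PHP 8.3+ ships a native mb_str_pad(), so only define this fallback when it is missing.
if (!function_exists('mb_str_pad')) {
    function mb_str_pad($input, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT)
    {
        // Extra bytes that str_pad() would otherwise count as characters
        // (assumes UTF-8 input, as in the question's iconv call).
        $diff = strlen($input) - mb_strlen($input, 'UTF-8');
        return str_pad($input, $pad_length + $diff, $pad_string, $pad_type);
    }
}
?>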
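As a rough sketch of how character-aware padding could be applied to the record builder from the question, the example below pads and truncates by characters rather than bytes. The names pad_chars and build_record are hypothetical, the function takes an array instead of fifteen parameters purely to keep the example short, and it assumes UTF-8 input with mbstring loaded. Whether the importing system counts columns in characters or in bytes is also an assumption; if it counts bytes, transliterating to ASCII as in the question's edit may be the safer route.

```php
<?php
// Hypothetical helper: right-pad $input with spaces to $width characters (not bytes).
function pad_chars(string $input, int $width, string $encoding = 'UTF-8'): string
{
    $missing = $width - mb_strlen($input, $encoding);
    return $missing > 0 ? $input . str_repeat(' ', $missing) : $input;
}

// Hypothetical, trimmed-down variant of record4() that takes an array of values.
function build_record(array $values, array $fields): string
{
    $str = '4';
    foreach ($fields as $name => $field) {
        $value = trim((string) ($values[$name] ?? ''));
        if ($value === '') {
            continue; // skip empty fields, as the original loop does
        }
        // Pad up to the column just before the field starts (columns are 1-based).
        $str = pad_chars($str, $field['start'] - 1);
        // Truncate by characters so a multi-byte character is never cut in half.
        $str .= mb_substr($value, 0, $field['length'], 'UTF-8');
    }
    return $str . "\n";
}

$fields = [
    'fname' => ['length' => 40, 'start' => 98],
    'lname' => ['length' => 40, 'start' => 138],
];

$line = build_record(['fname' => "V\xC3\xADctor", 'lname' => 'Nadales'], $fields);

// 1-based column where the last name starts, counted in characters.
$column = mb_strpos($line, 'Nadales', 0, 'UTF-8') + 1;
```

Here $column comes out as 138, the start column configured for lname, even though the first name contains a two-byte character.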
