String parsing question


Question

I'm wondering about the best way to do the following:

I have a string delimited by semicolons. The items delimited may be in any of
the following formats:
1) 14 alphanum characters
2) 5 alphanums space 8 alphanums
3) 6 alphanums colon 8 alphanums
4) 5 alphanums colon 8 alphanums

My task is to convert items in the third format to the first format, and items
in the fourth format to the second. Also, I need to count the number of items
in the string, which may or may not have a trailing semicolon.

My plan (which I feel is sub-optimal - hence this post), is to step through
the initial string one character at a time to accomplish these things in one
pass. While I could count semicolons easily with strchr(), deleting the
colons properly means stepping through the whole string anyway (right?) and so
I may as well count semicolons simultaneously. I'd also like to validate the
data format (i.e., 15-character items are not allowed).

int myfunc( const char *list )
{
    int items=0;
    char *cp=strdup( list ); /* nonstandard */
    char *newstr=cp;
    int shifts=0;
    int chars=0;

    for( ; *cp ; cp++ ) {
        if( *cp == ':' ) {
            if( chars == 6 ) {
                shifts++;
                continue;
            }
            if( chars == 5 ) {
                *(cp-shifts)=' ';
                chars++;
                continue;
            }
            return( -1 ); /* error */
        }
        if( *cp == ';' ) {
            items++;
            if( chars != 14 ) {
                return( -1 ); /* error */
            }
            chars=0;
        }
        else if( ++chars > 14 ) {
            return( -1 ); /* error */
        }
        *(cp-shifts)=*cp;
    }
    *(cp-shifts)='\0';
    if( chars == 14 ) {
        items++;
    }
    if( !items || (chars && chars != 14) ) {
        return( -1 ); /* error */
    }
    printf( "The string '%s' has %d items.", newstr, items );
    free( newstr );
    return( 0 ); /* success */
}

Is there a better way?

--
Christopher Benson-Manica | Upon the wheel thy fate doth turn,
ataru(at)cyberspace.org | upon the rack thy lesson learn.

Recommended answer

In <bm**********@chessie.cirr.com> Christopher Benson-Manica <at***@nospam.cyberspace.org> writes:
Is there a better way?




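1. Code like this is a maintenance nightmare (imagine having to make some
changes to it 5 years from now).

2. I might be missing something, but I can't find any attempt to check that
your characters really are alphanums; you're only looking for your
separators.

I'd implement this with sscanf calls. The result would be slower, but more
readable. The conversion specifier for alphanumerics can use the following
macro:

#define ALNUM "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]"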
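
As a rough illustration of the sscanf idea, here is a minimal sketch that normalises a single item with the semicolon already stripped. The helper name convert_item and its return convention are assumptions made for this example, not part of the original reply. It relies on a field width before the scanset and on %n to learn how many characters were consumed.

```c
#include <stdio.h>
#include <string.h>

#define ALNUM "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]"

/* Hypothetical helper: rewrite one item (no semicolon) into out[15].
   Formats 1 and 2 are copied as is, format 3 loses its colon and
   format 4 gets a space instead of the colon.
   Returns 0 on success, -1 if the item matches none of the formats
   (out may then hold partial data). */
int convert_item(const char *item, char out[15])
{
    char head[7], tail[9], sep;
    size_t hlen;
    int n = 0;

    /* Format 1: exactly 14 alphanumerics and nothing after them. */
    if (sscanf(item, "%14" ALNUM "%n", out, &n) == 1
        && n == 14 && item[14] == '\0')
        return 0;

    /* Formats 2-4: 5 or 6 alphanumerics, one separator, 8 alphanumerics. */
    n = 0;
    if (sscanf(item, "%6" ALNUM "%c%8" ALNUM "%n", head, &sep, tail, &n) != 3
        || item[n] != '\0' || strlen(tail) != 8)
        return -1;

    hlen = strlen(head);
    if (hlen == 6 && sep == ':')                       /* format 3 -> 1 */
        sprintf(out, "%s%s", head, tail);
    else if (hlen == 5 && (sep == ':' || sep == ' '))  /* format 4 -> 2, or 2 */
        sprintf(out, "%s %s", head, tail);
    else
        return -1;
    return 0;
}
```

Calling convert_item("ABCDE:12345678", buf) would leave "ABCDE 12345678" in buf. Splitting the list on semicolons and counting the items still has to happen around this helper.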


Dan
--
Dan Pop
DESY Zeuthen, RZ Group
Email: Da*****@ifh.de


Christopher Benson-Manica wrote:
[question and code snipped]

Is there a better way?





Another method would be to parse the string like a language. Analyze the
data to find its current format, then apply the conversion.

Let's look closer at the formats. Let A represent any character
from the set of alphanumerics.
[1] AAAAAAAAAAAAAA
[2] AAAAA AAAAAAAA
[3] AAAAAA:AAAAAAAA
[4] AAAAA:AAAAAAAA
Looking at the above lines, the formats differ at the 6th
column (starting with column 1 as the first column).
The variations are:
6th char   Format Number
--------   -------------
':'        4
' '        2
A          1 or 3
This last value requires looking at column 7:
7th char   Format Number
--------   -------------
':'        3
A          1

Based on this analysis, format selection looks easy.
Format conversion is left for the reader & OP.
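
The column analysis above can be sketched in C roughly as follows. The function names classify_item and all_alnum are made up for this illustration, the item is assumed to have had its semicolon removed already, and isalnum() is assumed to be an acceptable test (it is locale dependent, unlike an explicit character set).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: 1 if the first n characters of s are all
   alphanumeric, 0 otherwise. The caller guarantees s has n characters. */
static int all_alnum(const char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!isalnum((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: returns the format number (1-4) of one item,
   or 0 if it matches none of the four formats. */
int classify_item(const char *item)
{
    size_t len = strlen(item);

    /* Every format starts with 5 alphanumerics and is 14 or 15 long. */
    if (len < 14 || len > 15 || !all_alnum(item, 5))
        return 0;

    switch (item[5]) {                      /* 6th character */
    case ':':
        return (len == 14 && all_alnum(item + 6, 8)) ? 4 : 0;
    case ' ':
        return (len == 14 && all_alnum(item + 6, 8)) ? 2 : 0;
    default:
        if (!isalnum((unsigned char)item[5]))
            return 0;
        if (item[6] == ':')                 /* 7th character */
            return (len == 15 && all_alnum(item + 7, 8)) ? 3 : 0;
        return (len == 14 && all_alnum(item + 6, 8)) ? 1 : 0;
    }
}
```

Once the format number is known, the conversion is a small fixed edit: drop the colon for format 3, overwrite it with a space for format 4.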

Format1 ::= AlphaNum AlphaNum {...} AlphaNum

Format2 ::= AlphaNum AlphaNum AlphaNum AlphaNum
            AlphaNum ' '

Etc. You could try using a lexer and parser generator, such as
Lex and Yacc (Flex and Bison).

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book


Have you looked at strspn and strcspn? The latter will locate the (next)
semi-colon, and the former can verify that the characters from the current
to the semi-colon are all alphanumerics.

#include <string.h>

const char *alnum = "abcdefghijklmnopqrstuvwxyz"
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "0123456789";

/* Returns the length of the token before the next semicolon, or 0 if
   there is no semicolon or the token is not all alphanumeric. */
size_t tokenLength( const char *tkn )
{
    size_t len, semi;

    if ( !tkn )
        return (size_t)0;

    len = strlen( tkn );
    semi = strcspn( tkn, ";" );
    if ( semi == len ) // There's no semi-colon
        return (size_t)0;

    if ( strspn( tkn, alnum ) != semi )
        return (size_t)0; // Not all alpha-num

    return semi;
}
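
Since the original question allows the last item to appear without a trailing semicolon, a loop built on strcspn might look like the sketch below. The name count_items and the length-only check are assumptions for illustration; checking the characters inside each item would still be needed on top of this.

```c
#include <string.h>

/* Hypothetical helper: counts the ';'-separated items in list, with or
   without a trailing semicolon. Only item lengths are checked here:
   raw items are 14 characters (formats 1, 2, 4) or 15 (format 3).
   Returns the item count, or -1 if any item has another length. */
int count_items(const char *list)
{
    int items = 0;

    while (*list != '\0') {
        size_t len = strcspn(list, ";");  /* up to next ';' or the end */

        if (len < 14 || len > 15)
            return -1;
        items++;
        list += len;
        if (*list == ';')
            list++;                       /* step over the separator */
    }
    return items;
}
```

Because strcspn stops at either a semicolon or the terminating null, the last item is handled by the same code path as the others.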

--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Per the FCA, this address may not be added to any commercial mail list

