Hi...about some strange character in textfile


Problem description

Hi all,
I'm a newbie in VB.Net programming.

I hope some of you can help me solve this.

I'm working on reading, parsing, and saving a text file into SQL Server.
The text file contains thousands of rows, with about 50 columns in every row.

Everything went well until I found a text file with some strange characters. They seem to be Japanese characters (the text file belongs to a Japanese company).

The problem is:
Not all rows in this file have these strange characters, and because of them the parse function can't work properly (I'm using Substring to read every column),
because the length of the rows becomes different.
A row with the strange characters has a length of 208. A row without them has a length of only 206.

Can someone please tell me how to fix this kind of problem?

Thanks for any suggestions and answers.

Solution

Hi. I don't know much about .NET programming at all, but I know a lot about dealing with such Unicode characters in VB6, and maybe .NET is similar in this respect...?

A single one of these characters takes two bytes in a double-byte encoding such as Shift-JIS, where a plain ASCII letter takes one.
If, for example, you ask VB to give you the length of...
"aaaか" (3 of the letter 'a', then a Hiragana 'ka'; I don't know if that'll come out all right on these forums)
...using Len(), then VB will say that it is 4 characters long, while the same text is 5 bytes long in Shift-JIS. (In VB6, LenB() on the string itself returns 8, because strings are held internally as 2-byte Unicode; LenB(StrConv(s, vbFromUnicode)), where s is the string, gives 5 on a Japanese-locale system.)
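
The difference between counting characters and counting bytes can be checked directly. The lines below are only an illustrative sketch in Python (not VB6 or VB.Net), using the same sample string; the variable names are made up for the example. In .NET, the comparable calls would be String.Length and System.Text.Encoding.GetByteCount with the relevant encoding.

```python
# Compare the character count with the encoded byte count for the sample string.
text = "aaaか"  # three ASCII letters plus the Hiragana "ka"

char_count = len(text)                        # counts characters (code points)
sjis_bytes = len(text.encode("shift_jis"))    # 1 byte per ASCII letter, 2 for か
utf16_bytes = len(text.encode("utf-16-le"))   # 2 bytes for each of these characters
utf8_bytes = len(text.encode("utf-8"))        # か takes 3 bytes in UTF-8

print(char_count, sjis_bytes, utf16_bytes, utf8_bytes)
```

So the "length" of the same text depends on whether characters or bytes are counted, and on which encoding the bytes are in.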

I'm not understanding your problem though.
You say that it's not letting something parse properly, and you also seem to think that VB counting it as 2 characters is wrong.

Well, I've tried to explain why the length is 2 less when you remove the Unicode character, but I don't understand what you mean about the parsing, sorry. -_-;

EDIT: So basically, if .NET has different functions that let you deal with strings by 'number of characters' and by 'number of bytes', as VB6 does, maybe you need to play around with the different functions to find ones that work...? (Yep, I'm still not understanding >__<)


Thanks, Robbie.

I know that it has something to do with Unicode, but I don't know how to get around this in my code.

So, it is like this:
I have one text file with almost 40,000 rows and 50 columns.
One of these columns has some strange characters in it.
But they show up only in some rows, not in all rows of the text file. That makes the lengths of the rows differ.

As I said before, I'm working on reading, parsing, and saving the text file into SQL Server.
First I read the text file line by line. After that I try to read and save the data column by column in every line. Here I'm using Substring to do it.
Simply put, I'm counting the length of every column to get the data and save it to the database.

For example:

The columns:
ItemCode ItemDescription InvoiceNo

The data:
CV1025 HandkerchiefRED SX100 --> no strange character
SC22254 Leather Purse Orange U SC452 --> with strange character

Let's say the length of column ItemCode is 10,
the length of column ItemDescription is 15 (this one sometimes contains strange characters),
and the length of column InvoiceNo is 10.


But because the length differs in some rows, the function I wrote to read out the data per column no longer works properly.

In a row without strange characters I get the data exactly as it is in the text file, column by column:

column ItemCode : CV1025
column ItemDescription : HandkerchiefRED
InvoiceNo : SX100
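
The approach described above, cutting each line at fixed character positions, might look roughly like the following. This is a Python sketch rather than the poster's VB.Net code, which is not shown in the thread; the function name `parse_by_chars` and the padded sample row are assumptions built from the three example widths (10, 15, 10).

```python
# Hypothetical layout taken from the example: (column name, width in characters).
COLUMNS = [("ItemCode", 10), ("ItemDescription", 15), ("InvoiceNo", 10)]

def parse_by_chars(line):
    """Cut a line into fields by character position, like repeated Substring calls."""
    fields = {}
    start = 0
    for name, width in COLUMNS:
        fields[name] = line[start:start + width].strip()
        start += width
    return fields

# Sample row rebuilt from the example data, each field padded to its column width.
row = "CV1025".ljust(10) + "HandkerchiefRED".ljust(15) + "SX100".ljust(10)
print(parse_by_chars(row))
```

This works as long as every character occupies exactly one position in the line, which is the assumption that breaks once some rows contain multi-byte characters.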


But in rows with the strange character, I get data like this:

column ItemCode : SC22254
column ItemDescription : Leather Purse Orange (the strange character is missing)
InvoiceNo : 452 (the SC is missing)

and it can't be saved into the database.

I hope the problem is clearer now.

Can anyone help me?

Thanks.


Sorry, I can't help with SQL Server at all. It means nothing to me.
But I think your function would only fail if different parts of the process count length differently. For example, the column widths in the file may be measured in bytes (a Japanese character counts as 2), while getting the length of a piece of text, or picking some of it out with Substring, measures in characters (a Japanese character counts as 1).
Or something to that effect.
That's all I can do, only hint towards where I think the problem is; I don't know how to solve it. T_T
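
One possible way to act on that hint is to slice the raw bytes of each line at the fixed offsets and decode each field afterwards. The sketch below is in Python and rests on two assumptions that the thread does not confirm: that the file is Shift-JIS encoded, and that its column widths are defined in bytes. The helper names and the sample row are made up. If the widths are actually defined in characters, the fix would instead be to read the file with the correct encoding before slicing. In .NET, the comparable pieces would be System.Text.Encoding.GetEncoding, GetBytes, and GetString(bytes, index, count).

```python
# Assumptions (not from the thread): the file is Shift-JIS encoded and the
# column widths are defined in bytes, not characters.
ENCODING = "shift_jis"
COLUMNS = [("ItemCode", 10), ("ItemDescription", 15), ("InvoiceNo", 10)]

def parse_by_bytes(raw_line):
    """Slice the undecoded bytes at fixed offsets, then decode each field."""
    fields = {}
    start = 0
    for name, width in COLUMNS:
        fields[name] = raw_line[start:start + width].decode(ENCODING).strip()
        start += width
    return fields

def pad_bytes(text, width):
    """Encode text and pad it with spaces to a fixed byte width (builds sample data)."""
    data = text.encode(ENCODING)
    return data + b" " * (width - len(data))

# Made-up sample row: the description holds one double-byte character.
raw = pad_bytes("SC22254", 10) + pad_bytes("Purse か", 15) + pad_bytes("SC452", 10)

print(len(raw), len(raw.decode(ENCODING)))  # byte length vs. character length
print(parse_by_bytes(raw))
```

Under these assumptions every row has the same byte length even though its character length varies, so the offsets stay aligned. Decoding would raise an error if a double-byte character straddled a column boundary, which would itself be a sign that the widths are not byte-based.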

