Spaces between words in a foreign language

Question

I am using Split to get all the words from a line that contains many words.
Everything works when the words use English letters, but with a foreign language such as Arabic things behave differently: I have found that there must be at least 3 spaces between Arabic words, and 1 space is not enough, for this call to work:
ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
Look at the program below.
The method private void EnglishIf() works fine without any errors.
The method private void ArabicIf() also works fine if I comment out Line B and uncomment Line A.
However, private void ArabicIf() does not work if I comment out Line A and uncomment Line B.
MY QUESTION IS:
Is there any way to obtain J=20 if I use Line B with
ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
when ssFeedo[0] contains several Arabic words that are separated by only a single space?

//-----------
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
namespace testIf
{
    public partial class Form1 : Form
    {
        StreamWriter sw = null;
        FileStream fsW = null;
        public Form1()
        {
            InitializeComponent();
             fsW = new FileStream("pursuit.txt", FileMode.Open, FileAccess.Write);
             sw = new StreamWriter(fsW);
            EnglishIf();
            ArabicIf();
            sw.Close(); fsW.Close();
        }
        private void EnglishIf()
        {
            string[] ss2 = new string[120];
            string[] ssFeedo = new string[2000];
            int J = -1;
                ssFeedo[0] = "hellow0 hellow1 hellow2";
                ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                   sw.Write(ss2.Length.ToString() + "\n");
                sw.Write("ss2[0]=" + ss2[0] + "  ss2[1]=" +ss2[1] + "  ss2[2]=" +ss2[2]+"\n");
                if (ss2.Length == 3)
                {
                   sw.Write("yes ss2[0]=" + ss2[0] + "\n");
                   sw.Write("yes ss2[1]=" + ss2[1] + "\n");
                   sw.Write("yes ss2[2]=" + ss2[2] + "\n");
                   if(ss2[2]=="hellow2")
                   {
                    J=10;
                    MessageBox.Show("J=",J.ToString());
                    sw.Write("J=" + J.ToString() + "\n");
                    goto outy2;
                   }
                   else;
                }//if ss2.length==3
                else;
            outy2:
                ;
        }//EnglishIf
        private void ArabicIf()
        {
            string[] ss2 = new string[120];
            string[] ssFeedo = new string[2000];
            int J = -1;
            //there are 3 spaces between the words
            //ssFeedo[0] = "مرحبا0   مرحبا1  مرحبا2";//<--------------- Line A
            
            
            //there are only one space between the words
            ssFeedo[0] = "مرحبا0 مرحبا1 مرحبا2";// <------------------- Line B
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            sw.Write(ss2.Length.ToString() + "\n");
            sw.Write("ss2[0]=" + ss2[0] + "  ss2[1]=" + ss2[1] + "  ss2[2]=" + ss2[2] + "\n");
            if (ss2.Length == 3)
            {
                sw.Write("yes ss2[0]=" + ss2[0] + "\n");
                sw.Write("yes ss2[1]=" + ss2[1] + "\n");
                sw.Write("yes ss2[2]=" + ss2[2] + "\n");
                if (ss2[2] == "مرحبا2")
                {
                    J = 20;
                    //you will never get J=20 if you used the line B instead of line A
                    MessageBox.Show("J=", J.ToString());
                    sw.Write("J="+J.ToString()+"\n");
                    goto outy2;
                }
                else;
            }//if ss2.length==3
            else ;
        outy2:
            ;
        }//ArabicIf
    }
}




[edit]Code block added - OriginalGriff[/edit]

Recommended answer

Running this:

//The arabic is fine on my machine, it might be dodgy here!!!
string foo = "مرحب مرحبا مرحبا";
string[] bar = foo.Split(' ');
Console.WriteLine(bar.Length);


This outputs 3 (the three Arabic "welcome" words are split correctly).
Replacing the string with

string foo = "مرحبا0 مرحبا1 مرحبا2";

also returns 3, as expected, as does

string[] bar = foo.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);


Note that I have used only single spaces.

This implies (though not conclusively) that there is something wrong with your code, but I cannot see what.

The other thing to do is to check that the space you are using to split is the same as the one in the text. I notice that the Windows Arabic character set (see link below) has a non-breaking space at U+00A0 as well as the normal space at U+0020. These are not the same character, so the split would fail! This doesn't explain why extra spaces work, unless you copied the standard space or typed the additional spaces with the keyboard set to "en". The best way to check is to copy the string you are splitting and output the hex code values; if the two spaces are different, there's your problem. You can fix the problem by adding the non-breaking space to the list of split characters.

http://en.wikipedia.org/wiki/Windows-1256
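
The check and the fix suggested above could be sketched roughly as follows. This is only an illustration: the class and helper names (SpaceCheck, DumpCodePoints, SplitWords) are made up for this example, and the sample string assumes the words are separated by a non-breaking space (U+00A0), which has not been confirmed for the asker's data.

```csharp
using System;
using System.Linq;

static class SpaceCheck
{
    // Hypothetical helper: renders every UTF-16 code unit of the text as U+XXXX
    // so that look-alike characters (U+0020 vs U+00A0) become visible.
    public static string DumpCodePoints(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Hypothetical helper: splits on the normal space and on the non-breaking space.
    public static string[] SplitWords(string text)
    {
        return text.Split(new[] { ' ', '\u00A0' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Assumption: the separators are U+00A0 rather than U+0020.
        string line = "مرحبا0\u00A0مرحبا1\u00A0مرحبا2";

        Console.WriteLine(DumpCodePoints(line));     // the separators show up as U+00A0
        Console.WriteLine(line.Split(' ').Length);   // 1: no U+0020 in the string
        Console.WriteLine(SplitWords(line).Length);  // 3: both kinds of space are separators
    }
}
```

If the dump shows only U+0020 between the words, the separator is not the cause and the problem lies elsewhere. As an alternative to listing separators explicitly, passing a null separator array, as in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries), should make Split treat every Unicode white-space character as a delimiter.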


Hmm, this is a difficult one. I pasted your code into VS 2010, using .NET 4.0, and adapted it slightly (removing parts not immediately related to the problem, e.g. the file I/O), and it's running as expected. The words are correctly split into 3 elements; the number of spaces separating them makes no difference.

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            English();
            Arabic("مرحبا0   مرحبا1  مرحبا2"); //<--------------- Line A (3 spaces between the words)
            Arabic("مرحبا0 مرحبا1 مرحبا2"); // <------------------- Line B (only one space between the words)
            Console.ReadLine(); // Pause window before it disappears
        }

        private static void English()
        {
            Console.WriteLine("---------English---------");
            string[] ss2; // = new string[120]; <------ This is unnecessary, the array created here will be overwritten by String.Split
            string[] ssFeedo = new string[2000];
            ssFeedo[0] = "hellow0 hellow1 hellow2";
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine("ss2[0]=" + ss2[0] + "  ss2[1]=" + ss2[1] + "  ss2[2]=" + ss2[2] + "\n");
            if (ss2.Length == 3)
            {
                Console.WriteLine("Correct");
            }
        }

        private static void Arabic(string text)
        {
            Console.WriteLine("---------Arabic---------");
            string[] ss2; // = new string[120]; <------ This is unnecessary, the array created here will be overwritten by String.Split
            string[] ssFeedo = new string[2000];
            int J = -1;
            ssFeedo[0] = text;
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine("ss2[0]=" + ss2[0] + "  ss2[1]=" + ss2[1] + "  ss2[2]=" + ss2[2] + "\n");
            if (ss2.Length == 3)
            {
                Console.WriteLine("Correct");
                if (ss2[2] == "مرحبا2")
                {
                    J = 20;
                    //you will never get J=20 if you used the line B instead of line A
                    Console.WriteLine("J=" + J.ToString() + "\n");
                }
            }//if ss2.length==3
        }
    }
}



In each of the Arabic text cases J is indeed set to 20.

Would you mind trying this little app on your system (it's a console application) and verifying whether you also get the correct output? Granted, the Arabic characters display as question marks in the console window, but the contents of the variables are nevertheless correct, as verified under a debugger.

If you don't get the same results, then I'm wondering whether our systems may perhaps differ in terms of some regional settings.

My output:

---------English---------
ss2[0]=hellow0  ss2[1]=hellow1  ss2[2]=hellow2
Correct
---------Arabic---------
ss2[0]=?????0  ss2[1]=?????1  ss2[2]=?????2
Correct
J=20
---------Arabic---------
ss2[0]=?????0  ss2[1]=?????1  ss2[2]=?????2
Correct
J=20


I'm getting the same result with both lines...

What do you get on ss2[2] when it fails?
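
One possible way to answer that question is to compare the failing token with the expected literal character by character. The sketch below is illustrative only: TokenDiff and FirstDifference are hypothetical names, and the failing value (a token with a trailing non-breaking space) is an assumed example, not something observed on the asker's machine.

```csharp
using System;

static class TokenDiff
{
    // Hypothetical helper: returns the index of the first UTF-16 code unit
    // where the two strings differ, or -1 when they are identical.
    public static int FirstDifference(string actual, string expected)
    {
        int shared = Math.Min(actual.Length, expected.Length);
        for (int i = 0; i < shared; i++)
        {
            if (actual[i] != expected[i]) return i;
        }
        // Same prefix: either equal, or one string has extra characters.
        return actual.Length == expected.Length ? -1 : shared;
    }

    static void Main()
    {
        string expected = "مرحبا2";
        // Assumed failure case: a non-breaking space stayed attached to the token.
        string actual = "مرحبا2\u00A0";

        int index = FirstDifference(actual, expected);
        Console.WriteLine("First difference at index: " + index);
        if (index >= 0 && index < actual.Length)
        {
            // Shows the code of the unexpected character in the actual token.
            Console.WriteLine("Actual character: U+" + ((int)actual[index]).ToString("X4"));
        }
    }
}
```

Writing ss2[2] through such a check (or simply logging its Length) would show whether the token carries an extra invisible character that makes the == comparison fail.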

