Can you change the contents of a (immutable) string via an unsafe method?


Problem description


I know that strings are immutable and that any change to a string simply creates a new string in memory (and marks the old one as free). However, I'm wondering whether my logic below is sound, in that you actually can, in a roundabout fashion, modify the contents of a string.

const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Pin the string
GCHandle gcHandle = GCHandle.Alloc(candidateString, GCHandleType.Pinned);

//Copy the contents of the base string to the candidate string
unsafe
{
    char* cCandidateString = (char*) gcHandle.AddrOfPinnedObject();
    for (int i = 0; i < baseString.Length; i++)
    {
        cCandidateString[i] = baseString[i];
    }
}

Does this approach indeed change the contents of candidateString (without creating a new candidateString in memory), or does the runtime see through my tricks and treat it as a normal string?

Solution

Your example works just fine, thanks to several elements:

  • candidateString lives in the managed heap, so it's safe to modify. Compare this with baseString, which is interned. If you try to modify an interned string, unexpected things may happen, because every use of that literal shares the same instance. There's no guarantee that the string won't live in write-protected memory at some point, although it seems to work today. This is much like assigning a string literal to a char* variable in C and then modifying it, which is undefined behavior in C.

  • You preallocate enough space in candidateString - so you're not overflowing the buffer.

  • Character data is not stored at offset 0 of the String class. It's stored at an offset equal to RuntimeHelpers.OffsetToStringData.

    public static int OffsetToStringData
    {
        // This offset is baked in by string indexer intrinsic, so there is no harm
        // in getting it baked in here as well.
        [System.Runtime.Versioning.NonVersionable] 
        get {
            // Number of bytes from the address pointed to by a reference to
            // a String to the first 16-bit character in the String.  Skip 
            // over the MethodTable pointer, & String 
            // length.  Of course, the String reference points to the memory 
            // after the sync block, so don't count that.  
            // This property allows C#'s fixed statement to work on Strings.
            // On 64 bit platforms, this should be 12 (8+4) and on 32 bit 8 (4+4).
    #if WIN32
            return 8;
    #else
            return 12;
    #endif // WIN32
        }
    }
    

    Except...

  • GCHandle.AddrOfPinnedObject is special-cased for two kinds of types: strings and arrays. Instead of returning the address of the object itself, it lies and returns the address of the data. See the source code in CoreCLR.

    // Get the address of a pinned object referenced by the supplied pinned
    // handle.  This routine assumes the handle is pinned and does not check.
    FCIMPL1(LPVOID, MarshalNative::GCHandleInternalAddrOfPinnedObject, OBJECTHANDLE handle)
    {
        FCALL_CONTRACT;
    
        LPVOID p;
        OBJECTREF objRef = ObjectFromHandle(handle);
    
        if (objRef == NULL)
        {
            p = NULL;
        }
        else
        {
            // Get the interior pointer for the supported pinned types.
            if (objRef->GetMethodTable() == g_pStringClass)
                p = ((*(StringObject **)&objRef))->GetBuffer();
            else if (objRef->GetMethodTable()->IsArray())
                p = (*((ArrayBase**)&objRef))->GetDataPtr();
            else
                p = objRef->GetData();
        }
    
        return p;
    }
    FCIMPLEND
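
The special case in the last bullet can be observed from managed code without any pointers. The following is only a sketch and is not part of the original answer: PinnedStringProbe and its method names are made up for illustration, and it assumes the CLR/CoreCLR behavior shown in the source above. Marshal.ReadInt16 reads the two bytes at the address reported by AddrOfPinnedObject. If that address were the start of the object, the read would return part of the header instead of the first character.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class PinnedStringProbe   // hypothetical helper, for illustration only
{
    // Reads the UTF-16 code unit stored at the address that
    // GCHandle.AddrOfPinnedObject reports for a pinned string.
    public static char FirstCharViaHandle(string s)
    {
        if (string.IsNullOrEmpty(s))
            throw new ArgumentException("Need a non-empty string.", nameof(s));

        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return unchecked((char)Marshal.ReadInt16(address));
        }
        finally
        {
            handle.Free(); // release the pin once the address is no longer needed
        }
    }

    // Returns the offset the runtime reports for the character data.
    // The value is runtime-specific; the 8 and 12 quoted above are what
    // the CLR source comment states, not something this sketch guarantees.
    public static int ReportedOffset()
    {
#pragma warning disable CS0618 // the property is marked obsolete on newer .NET versions
        return RuntimeHelpers.OffsetToStringData;
#pragma warning restore CS0618
    }
}
```

Calling PinnedStringProbe.FirstCharViaHandle("fox") should hand back the first character of the string, which is consistent with the GetBuffer() branch in the CoreCLR code.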
    

In summary, the runtime lets you play with its data and doesn't complain. You're using unsafe code after all. I've seen worse runtime messing than that, including creating reference types on the stack ;-)
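
What the runtime will not do is stop you from scribbling over an interned string, which is the risk described in the first bullet. If in doubt, you can ask whether a given instance is the one held in the intern pool. A small sketch, where InternCheck is a made-up name and string.IsInterned is the real API it wraps:

```csharp
using System;

public static class InternCheck   // hypothetical helper, for illustration only
{
    // True when 's' is the very instance stored in the runtime's intern pool.
    // string.IsInterned returns the pooled instance with the same content,
    // or null if no such string has been interned.
    public static bool IsInternedInstance(string s)
    {
        return object.ReferenceEquals(string.IsInterned(s), s);
    }
}
```

A string created with new string('\0', n) for n > 0 is a fresh instance, so it reports false. A literal such as baseString normally reports true, but that depends on how the assembly was compiled and loaded, so treat it as an observation rather than a guarantee.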

Just remember to add one additional \0 after all the characters (at offset Length) if your final string is shorter than what's allocated. This won't overflow: each string has an implicit null character at the end to ease interop scenarios.
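
Here is a sketch of the shorter-than-allocated case, under the assumption that you fill only a prefix of the buffer. ShortFill and FillPrefix are hypothetical names. The buffer is pre-filled with '#' so the explicit terminator is visible, and Marshal.Copy and Marshal.WriteInt16 are used instead of raw pointers so the snippet compiles without the unsafe switch:

```csharp
using System;
using System.Runtime.InteropServices;

public static class ShortFill   // hypothetical helper, for illustration only
{
    // Overwrites the start of a freshly allocated string with 'text' and
    // writes an explicit '\0' right after it. Returns the mutated string.
    public static string FillPrefix(int capacity, string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (text.Length >= capacity)
            throw new ArgumentException("text must be shorter than capacity.", nameof(text));

        string buffer = new string('#', capacity); // fresh, non-interned instance
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr data = handle.AddrOfPinnedObject();
            // Copy the characters, then place the terminator after them.
            Marshal.Copy(text.ToCharArray(), 0, data, text.Length);
            Marshal.WriteInt16(data, text.Length * sizeof(char), (short)0);
        }
        finally
        {
            handle.Free();
        }
        return buffer;
    }
}
```

Note that the terminator only helps code that reads the data as a null-terminated buffer. The managed Length of the string is not changed by these writes, so the result still reports the full allocated length and keeps whatever characters follow the terminator.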


Now take a look at how StringBuilder creates a string, here's StringBuilder.ToString:

[System.Security.SecuritySafeCritical]  // auto-generated
public override String ToString() {
    Contract.Ensures(Contract.Result<String>() != null);

    VerifyClassInvariant();

    if (Length == 0)
        return String.Empty;

    string ret = string.FastAllocateString(Length);
    StringBuilder chunk = this;
    unsafe {
        fixed (char* destinationPtr = ret)
        {
            do
            {
                if (chunk.m_ChunkLength > 0)
                {
                    // Copy these into local variables so that they are stable even in the presence of race conditions
                    char[] sourceArray = chunk.m_ChunkChars;
                    int chunkOffset = chunk.m_ChunkOffset;
                    int chunkLength = chunk.m_ChunkLength;

                    // Check that we will not overrun our boundaries. 
                    if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
                    {
                        fixed (char* sourcePtr = sourceArray)
                            string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
                    }
                    else
                    {
                        throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
                    }
                }
                chunk = chunk.m_ChunkPrevious;
            } while (chunk != null);
        }
    }
    return ret;
}

Yes, it uses unsafe code, and yes, you can optimize yours by using fixed, as this type of pinning is much more lightweight than allocating a GC handle:

const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Copy the contents of the base string to the candidate string
unsafe
{
    fixed (char* cCandidateString = candidateString)
    {
        for (int i = 0; i < baseString.Length; i++)
            cCandidateString[i] = baseString[i];
    }
}

When you use fixed, the GC only discovers that an object needs to be pinned when it stumbles upon it during a collection. If there's no collection going on, the GC isn't even involved. When you use GCHandle, a handle is registered with the GC each time.
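
The registration is also something you have to undo yourself: a pinned GCHandle keeps its target pinned until Free is called, whereas fixed unpins when the block ends. A minimal sketch of that lifetime, with HandleLifetime as a made-up name and GCHandle.IsAllocated as the real property being observed:

```csharp
using System;
using System.Runtime.InteropServices;

public static class HandleLifetime   // hypothetical helper, for illustration only
{
    // Returns { allocated while pinned, allocated after Free }.
    public static bool[] Observe(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        bool whilePinned = handle.IsAllocated;

        handle.Free(); // without this call the string would stay pinned

        bool afterFree = handle.IsAllocated;
        return new[] { whilePinned, afterFree };
    }
}
```

In real code the Free call belongs in a finally block, so that an exception between Alloc and Free cannot leave the object pinned.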
