How does Java store primitive types in RAM?


Problem Description







This is NOT about whether primitives go to the stack or heap, it's about where they get saved in the actual physical RAM.


Take a simple example:

int a = 5;

I know 5 gets stored into a memory block.

My area of interest is where does the variable 'a' get stored?

Related Sub-questions: Where does it happen where 'a' gets associated to the memory block that contains the primitive value of 5? Is there another memory block created to hold 'a'? But that will seem as though a is a pointer to an object, but it's a primitive type involved here.

Solution

To expound on "Do Java primitives go on the Stack or the Heap?" -

Let's say you have a function foo():

void foo() {
   int a = 5;
   System.out.println(a);
}

Then when the compiler compiles that function, it'll create bytecode instructions that leave 4 bytes of room on the stack whenever that function is called. The name 'a' is only useful to you - to the compiler, it just creates a spot for it, remembers where that spot is, and everywhere it wants to use the value of 'a' it instead inserts references to the memory location it reserved for that value.
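You can watch the compiler do this bookkeeping by disassembling a compiled class with `javap -c`. Here's a runnable sketch (the class name Foo is invented for this sketch; the bytecode in the comments is roughly what a standard javac emits, though exact slot numbers can vary):

```java
public class Foo {
    static void foo() {
        int a = 5;             // javap shows roughly: iconst_5, istore_0
                               // the name 'a' is gone; only "local slot 0" remains
        System.out.println(a); // getstatic System.out, iload_0, invokevirtual println
    }

    public static void main(String[] args) {
        foo(); // prints 5
    }
}
```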

If you're not sure how the stack works, it works like this: every program has at least one thread, and every thread has exactly one stack. The stack is a contiguous block of memory (that can also grow if needed). Initially the stack is empty, until the first function in your program is called. Then, when your function is called, your function allocates room on the stack for itself, for all of its local variables, for its return types etc.

When your function main calls another function foo, here's one example of what could happen (there are a couple of simplifying white lies here):

  • main wants to pass parameters to foo. It pushes those values onto the top of the stack in such a way that foo will know exactly where they will be put (main and foo will pass parameters in a consistent way).
  • main pushes the address of where program execution should return to after foo is done. This increments the stack pointer.
  • main calls foo.
  • When foo starts, it sees that the stack is currently at address X.
  • foo wants to allocate 3 int variables on the stack, so it needs 12 bytes.
  • foo will use X + 0 for the first int, X + 4 for the second int, X + 8 for the third.
    • The compiler can compute this at compile time, and the compiler can rely on the value of the stack pointer register (ESP on x86 system), and so the assembly code it writes out does stuff like "store 0 in the address ESP + 0", "store 1 into the address ESP + 4" etc.
  • The parameters that main pushed on the stack before calling foo can also be accessed by foo by computing some offset from the stack pointer.
    • foo knows how many parameters it takes (say 3) so it knows that, say, X - 8 is the first one, X - 12 is the second one, and X - 16 is the third one.
  • So now that foo has room on the stack to do its work, it does so and finishes.
  • Right before main called foo, main wrote its return address on the stack before incrementing the stack pointer.
  • foo looks up the address to return to - say that address is stored at ESP - 4 - foo looks at that spot on the stack, finds the return address there, and jumps to the return address.
  • Now the rest of the code in main continues to run and we've made a full round trip.

Note that each time a function is called, it can do whatever it wants with the memory pointed to by the current stack pointer and everything after it. Each time a function makes room on the stack for itself, it increments the stack pointer before calling other functions to make sure that everybody knows where they can use the stack for themselves.
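The walkthrough above can be sketched as a toy simulation - an int array standing in for stack memory and an index standing in for the stack pointer. (This sketch grows the stack upward and uses one array slot per int; real x86 stacks grow downward and address bytes, so take it as an illustration only.)

```java
public class StackDemo {
    static int[] stack = new int[64]; // one thread's stack
    static int sp = 0;                // stack pointer: next free slot

    static int foo() {
        int base = sp;        // "address X" where foo's frame begins
        sp += 3;              // reserve room for 3 int locals
        stack[base] = 10;     // X + 0: first local
        stack[base + 1] = 20;                                // X + 1: second local
        stack[base + 2] = stack[base - 3] + stack[base - 2]; // X + 2: sum of main's params
        int result = stack[base + 2];
        sp = base;            // pop foo's frame on the way out
        return result;
    }

    public static void main(String[] args) {
        stack[sp++] = 5;      // parameter A for foo
        stack[sp++] = 7;      // parameter B for foo
        stack[sp++] = 42;     // stand-in for the return address
        int r = foo();
        sp -= 3;              // main cleans up what it pushed
        System.out.println(r); // prints 12
    }
}
```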

I know this explanation blurs the line between x86 and java a little bit, but I hope it helps to illustrate how the hardware actually works.

Now, this only covers 'the stack'. The stack exists for each thread in the program and captures the state of the chain of function calls between each function running on that thread. However, a program can have several threads, and so each thread has its own independent stack.
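That per-thread independence is easy to demonstrate: two threads run the same function, and each gets its own copy of the local variable n on its own stack (the shared results array lives on the heap, which the next section covers):

```java
public class PerThreadStack {
    static int count() {
        int n = 0;                       // lives on the calling thread's own stack
        for (int i = 0; i < 1000; i++) n++;
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] results = new int[2];      // heap-allocated, reachable from both threads
        Thread t1 = new Thread(() -> results[0] = count());
        Thread t2 = new Thread(() -> results[1] = count());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Neither thread's n disturbed the other's:
        System.out.println(results[0] + " " + results[1]); // prints "1000 1000"
    }
}
```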

What happens when two function calls want to deal with the same piece of memory, regardless of what thread they're on or where they are in the stack?

This is where the heap comes in. Typically (but not always) one program has exactly one heap. The heap is called a heap because, well, it's just a big ol' heap of memory.

To use memory in the heap, you have to call allocation routines - routines that find unused space and give it to you, and routines that let you return space you allocated but are no longer using. The memory allocator gets big pages of memory from the operating system, and then hands out individual little bits to whatever needs it. It keeps track of what the OS has given to it, and out of that, what it has given out to the rest of the program. When the program asks for heap memory, it looks for the smallest chunk of memory that it has available that fits the need, marks that chunk as being allocated, and hands it back to the rest of the program. If it doesn't have any more free chunks, it could ask the operating system for more pages of memory and allocate out of there (up until some limit).
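That bookkeeping can be sketched in miniature. The names here (ToyHeap, alloc, free) are invented for this sketch, and a real allocator would also coalesce adjacent free chunks, which this one skips:

```java
import java.util.Map;
import java.util.TreeMap;

public class ToyHeap {
    // free chunks in our one "page" of memory: start offset -> chunk size
    private final TreeMap<Integer, Integer> freeChunks = new TreeMap<>();

    ToyHeap(int pageSize) { freeChunks.put(0, pageSize); }

    // Find the smallest free chunk that fits, mark it allocated,
    // and hand back its "address" (an offset into the page).
    int alloc(int size) {
        Integer bestStart = null;
        int bestSize = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : freeChunks.entrySet())
            if (e.getValue() >= size && e.getValue() < bestSize) {
                bestStart = e.getKey();
                bestSize = e.getValue();
            }
        if (bestStart == null) throw new OutOfMemoryError("toy page exhausted");
        freeChunks.remove(bestStart);
        if (bestSize > size)                      // keep the leftover as a free chunk
            freeChunks.put(bestStart + size, bestSize - size);
        return bestStart;
    }

    // Return a chunk to the free list (no coalescing in this sketch).
    void free(int start, int size) { freeChunks.put(start, size); }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(64);
        int a = heap.alloc(16);  // offset 0
        int b = heap.alloc(8);   // offset 16
        heap.free(a, 16);
        int c = heap.alloc(8);   // best fit reuses the freed 16-byte chunk at offset 0
        System.out.println(a + " " + b + " " + c); // prints "0 16 0"
    }
}
```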

In languages like C, those memory allocation routines I mentioned are usually called malloc() to ask for memory and free() to return it.

Java, on the other hand, doesn't have explicit memory management like C does; instead it has a garbage collector - you allocate whatever memory you want, and then when you're done, you just stop using it. The Java runtime environment will keep track of what memory you've allocated, will scan your program to find allocations you're no longer using, and will automatically deallocate those chunks.

So now that we know that memory is allocated on the heap or the stack, what happens when I create a private variable in a class?

public class Test {
     private int balance;
     ...
}

Where does that memory come from? The answer is the heap. You have some code that creates a new Test object - Test myTest = new Test(). Calling Java's new operator causes a new instance of Test to be allocated on the heap. Your variable myTest stores the address of that allocation. balance is then just some offset from that address - probably 0, actually.
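A fleshed-out version of that Test class makes the accounting visible: each new Test() is its own heap allocation, the variables only hold references to those allocations, and copying a reference does not copy the object:

```java
public class Test {
    private int balance;

    public static void main(String[] args) {
        Test myTest = new Test();  // allocation #1 on the heap
        Test other = new Test();   // allocation #2, completely separate
        myTest.balance = 100;      // writes into allocation #1 at balance's offset
        other.balance = 7;         // writes into allocation #2
        Test alias = myTest;       // copies the reference, not the object
        alias.balance += 1;        // same allocation as myTest
        System.out.println(myTest.balance + " " + other.balance); // prints "101 7"
    }
}
```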

The answer at the very bottom is all just... accounting.

...

The white lies I spoke about? Let's address a few of those.

  • Java is first a computer model - when you compile your program to bytecode, you're compiling to a completely made-up computer architecture that doesn't have registers or assembly instructions like any common CPU. Java, and .Net, and a few others, use a stack-based virtual machine instead of a register-based machine (like x86 processors). The reason is that stack-based processors are easier to reason about, so it's easier to build tools that manipulate that code, which is especially important for building tools that compile that code to machine code that will actually run on common processors.

  • The stack pointer for a given thread typically starts at some very high address and then grows down, instead of up, at least on most x86 computers. That said, since that's a machine detail, it's not actually Java's problem to worry about (Java has its own made-up machine model to worry about; it's the Just-In-Time compiler's job to translate that to your actual CPU).

  • I mentioned briefly how parameters are passed between functions, saying stuff like "parameter A is stored at ESP - 8, parameter B is stored at ESP - 12" etc. This is generally called the "calling convention", and there are more than a few of them. On x86-32, registers are sparse, and so many calling conventions pass all parameters on the stack. This has some tradeoffs, particularly that accessing those parameters might mean a trip to RAM (though cache might mitigate that). x86-64 has a lot more named registers, which means that the most common calling conventions pass the first few parameters in registers, which presumably improves speed. Additionally, since the Java JIT is the only one that generates machine code for the entire process (excepting native calls), it can choose to pass parameters using any convention it wants.

  • I mentioned how when you declare a variable in some function, the memory for that variable comes from the stack - that's not always true, and it's really up to the whims of the environment's runtime to decide where to get that memory from. In C#/DotNet's case, the memory for that variable could come from the heap if the variable is used as part of a closure - this is called "heap promotion". Most languages deal with closures by creating hidden classes. So what often happens is that the method-local variables involved in a closure are rewritten to be members of some hidden class; when the method is invoked, the runtime allocates a new instance of that class on the heap and stores its address on the stack, and all references to the originally-local variable then occur through that heap reference.
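Java has its own flavor of that last point: a lambda may capture a local variable, but the variable must be effectively final, and its value is copied into the heap-allocated lambda object so it can outlive the method's stack frame (makeSupplier is an invented name for this sketch):

```java
import java.util.function.IntSupplier;

public class ClosureDemo {
    static IntSupplier makeSupplier() {
        int base = 40;          // a local on makeSupplier's stack frame
        return () -> base + 2;  // base's value is captured into the lambda on the heap
    }

    public static void main(String[] args) {
        IntSupplier s = makeSupplier();   // makeSupplier's frame is gone by now...
        System.out.println(s.getAsInt()); // ...but the captured value lives on: prints 42
    }
}
```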
