Just enough Java for Hadoop


Question

I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop, and I doubt I will be doing anything else in Java. So I would like a list of the things I need to pick up. Of course, I need to learn the core language, but what else?

I did Google around for this, and it could be seen as a possible duplicate of "I want to learn Java. Show me how?", but it's not. Java is a huge programming language with lots of libraries, and what I need to learn will depend largely on what I am using Hadoop for. Still, I suppose it is possible to say something like "don't bother learning this"; that would be quite useful too.

Answer

In my day job, I've just spent some time helping a C++ person to pick up enough Java to use some Java libraries via JNI (Java Native Interface) and then shared memory into their primarily C++ application. Here are some of the key things I noticed:

  1. You cannot manage anything beyond a toy project without an IDE. The very first thing you should do is download a popular Java IDE (Eclipse is a fine choice, but there are also alternatives including NetBeans and IntelliJ). Do not be tempted to try and manage with vi / emacs and javac / make. You will be living in a cave and not realising it. Once you're up to speed with even basic IDE functions you will be many times more productive than without an IDE.
  2. Learn how to lay out a simple project structure and packages. There will be simple walkthroughs of how to do this on the Eclipse site or elsewhere. Never put anything into the default package.
  3. Java has a type system whereby the reference and primitive types are relatively separate for historic / performance reasons.
  4. Java's generics are not the same as C++ templates. Read up on "type erasure".
  5. You may wish to understand how Java's GC works. Just google "mark and sweep" - at first, you can just settle for the naivest mental model and then learn the details of how a modern production GC would do it later.
  6. The core of the Collections API should be learned without delay. Map / HashMap, List / ArrayList & LinkedList and Set should be enough to get going.
  7. Learn modern Java concurrency. Thread is an assembly-language level primitive compared to some of the cool stuff in java.util.concurrent. Learn ConcurrentHashMap, Atomic*, Lock, Condition, CountDownLatch, BlockingQueue and the threadpools from Executors. Good books here are those by Brian Goetz and Doug Lea.
  8. As soon as you want to use 3rd party libraries, you'll need to learn how the classpath works. It's not rocket science, but it is a bit verbose.
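
To make items 3 and 4 above a little more concrete, here is a minimal sketch that uses only the JDK. The class and method names (ErasureDemo, sameRuntimeClass, boxedEqual, sum) are made up for illustration and are not from the original answer. It shows that two differently parameterized lists share one runtime class, and that generic collections hold boxed references rather than primitives.

```java
// Default package only to keep the sketch to a single file.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ErasureDemo {

    // Type erasure: the type argument is gone at runtime, so both lists
    // report the same class (java.util.ArrayList).
    static boolean sameRuntimeClass() {
        List<String> names = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        return names.getClass() == counts.getClass();
    }

    // Boxed Integers are objects, so compare them with equals(), not ==.
    static boolean boxedEqual(int a, int b) {
        Integer boxedA = a; // autoboxing: int -> Integer
        Integer boxedB = b;
        return boxedA.equals(boxedB);
    }

    // A List<Integer> stores references; each element is unboxed to int
    // by the loop variable declaration.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());          // true
        System.out.println(boxedEqual(1000, 1000));      // true
        System.out.println(sum(Arrays.asList(1, 2, 3))); // 6
    }
}
```

Coming from C++, the thing to notice is that there is no separate instantiation per type argument as there would be with templates: List<String> and List<Integer> are the same class at runtime, and there is no List<int> at all.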

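As a rough illustration of items 6 and 7 in the list above, the following sketch counts words across several lines using a thread pool from Executors and a ConcurrentHashMap. It uses only JDK classes and is not Hadoop code; the class name WordCountDemo, the method countWords and the pool size of 4 are assumptions chosen for the example.

```java
// Default package only to keep the sketch to a single file.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WordCountDemo {

    // Counts words in all lines, processing one line per submitted task.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary size
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String line : lines) {
                pending.add(pool.submit(() -> {
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) {
                            // merge() is atomic on ConcurrentHashMap.
                            counts.merge(word, 1, Integer::sum);
                        }
                    }
                }));
            }
            // Wait for every task so the map is complete before returning.
            for (Future<?> task : pending) {
                try {
                    task.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("a b a", "b a"));
        System.out.println("a=" + counts.get("a") + " b=" + counts.get("b")); // a=3 b=2
    }
}
```

Note that no Thread object is created by hand: the tasks are handed to the pool, and the only shared state is a collection designed for concurrent updates.
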
If you're a low-level C++ guy, then you may find some of this interesting also:

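  1. Java has virtual dispatch by default. The keyword static on a Java method is used to indicate a class method. private Java methods have traditionally been compiled to invokespecial, which dispatches to the exact type in use rather than through a virtual lookup (newer javac versions may emit invokevirtual for them instead, but a private method still cannot be overridden).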
  2. On an Oracle VM at least, objects comprise two machine words of header (the mark word and the class word). The mark word is a bunch of flags the VM uses - notably for thread synchronization. The class word you can think of as a pointer to the VM's representation of the Class object (which is where the vtables for methods live). Following the class word are the member fields of the instance of the object.
  3. Java .class files are an intermediate language, and not really that similar to x86 object code. In particular there are lots more useful tools for .class files (including the javap disassembler, which ships with the JDK).
  4. The Java equivalent of the symbol table is called the Constant Pool. It's typed and it has a lot of information in it - arguably more than the x86 object code equivalent.
  5. Java virtual method dispatch consists of looking up the method to be called in the Constant Pool and then converting that to an offset into a vtable. Conceptually, the VM then walks up the class hierarchy until it finds a non-null value at that vtable offset.
  6. Java starts off interpreted and then goes compiled (for Oracle and some other VMs anyway). The switch to compiled mode is done method-by-method on an as-needed basis. When benchmarking and perf tuning you need to make sure that you've warmed the system up before you start, and you should typically profile at the method level to start with. The optimizations that are made can be quite aggressive / optimistic (with a check and a fallback if the assumptions are violated) - so perf tuning is a bit of an art.
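
The difference between virtual and static dispatch described in item 1 of this list can be seen in a small sketch. The names DispatchDemo, Base, Derived, name, kind and describe are hypothetical and exist only for this example.

```java
// Default package only to keep the sketch to a single file.
final class DispatchDemo {

    static class Base {
        // Instance methods are virtual by default (no keyword needed).
        String name() {
            return "Base";
        }

        // A static method belongs to the class; the call target is fixed
        // at compile time from the class named in the call.
        static String kind() {
            return "Base.kind";
        }

        // The call to name() here is dispatched on the runtime type of this.
        String describe() {
            return "I am " + name();
        }
    }

    static class Derived extends Base {
        @Override
        String name() {
            return "Derived";
        }

        // This hides Base.kind() rather than overriding it.
        static String kind() {
            return "Derived.kind";
        }
    }

    public static void main(String[] args) {
        Base b = new Derived();
        System.out.println(b.name());        // Derived
        System.out.println(b.describe());    // I am Derived
        System.out.println(Base.kind());     // Base.kind
        System.out.println(Derived.kind());  // Derived.kind
    }
}
```

After compiling, running javap -c -p on the generated class files prints the bytecode, which should show which invoke instruction each call was compiled to.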

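To illustrate the warm-up advice in item 6, here is a deliberately naive timing sketch. The workload (sumOfSquares), the class name WarmupDemo and the iteration counts are arbitrary assumptions, and the numbers it prints will vary from run to run and machine to machine. For real measurements a dedicated harness such as JMH is the usual choice; this only shows the idea of timing the same code before and after the JIT has had a chance to compile it.

```java
// Default package only to keep the sketch to a single file.
final class WarmupDemo {

    // Written to after each timed run so the JIT cannot discard the work.
    static volatile long sink;

    // Hypothetical workload: sum of i*i for i in 1..n.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs the workload 'iterations' times and returns elapsed nanoseconds.
    static long timeNanos(int iterations, int n) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeNanos(2_000, 1_000);  // mostly interpreted at first
        timeNanos(50_000, 1_000);             // warm-up, result ignored
        long warm = timeNanos(2_000, 1_000);  // likely compiled by now
        System.out.println("cold run: " + cold + " ns");
        System.out.println("warm run: " + warm + " ns");
    }
}
```

On a typical JVM the warm run tends to be noticeably faster than the cold one, but that is not guaranteed, which is part of why hand-rolled timings like this should be treated with suspicion.
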
Hopefully there's some useful stuff in there to be going on with - please comment / ask followup questions.
