HashMap JDK9 源码阅读笔记

摘要:本文基于 JDK9 详细介绍 HashMap 的实现方法。

概述

  1. HashMap 是基于哈希表的 Map 接口的非同步实现。
  2. 此实现提供所有可选的映射操作,并允许使用 null 值和 null 键。
  3. 此类不保证映射的顺序,特别是它不保证该顺序恒久不变。
  4. 此实现假定哈希函数将元素适当地分布在各桶之间,可为基本操作(get 和 put)提供稳定的性能。
  5. 迭代 collecti on 视图所需的时间与 HashMap 实例的“容量”(桶的数量)及其大小(键-值映射关系数)成比例。
  6. :Hashmap 不是同步的,如果多个线程同时访问一个 HashMap,而其中至少一个线程从结构 上(指添加或者删除一个或多个映射关系的任何操作)修改了,则必须保持外部同步,以防止对映射进行意外的 非同步访问。

HashMap 已有的研究比较多,虽然基于的源码一般为 JDK7或8,但是也还是有参考价值的:

基于 JDK8:

Java 8系列之重新认识HashMap

基于 JDK 7:

HashMap 的实现原理

数据结构

基础数据结构

HashMap实际上是一个“链表散列”的数据结构,即数组和链表的结合体。

1
2
3
4
5
6
7
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;

其中Node<K,V>为一个内部静态类:

1
2
3
4
5
6
7
static class Node<K,V> implements Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
// other methods
}

扩展数据结构

从 JDK8 开始,一旦数组中某个链表长度达到一定的阈值(TREEIFY_THRESHOLD=8),这时候为了提高查找效率,该链表将会变为平衡树(balance tree),其数据结构同样也是一个内部静态类 TreeNode<K,V>

1
2
3
4
5
6
7
8
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
TreeNode<K,V> parent; // red-black tree links
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev; // needed to unlink next upon deletion
boolean red;
//....
}

核心成员变量

常量

以下是内部使用的常量

1
2
3
4
5
6
7
8
9
10
11
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

static final int MAXIMUM_CAPACITY = 1 << 30;

static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;

static final int MIN_TREEIFY_CAPACITY = 64;

成员变量

主要成员变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/* ---------------- Fields -------------- */

// 核心存储
transient Node<K,V>[] table;

/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Entry<K,V>> entrySet;

// 键值对总数
transient int size;

/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
// fail-fast 机制
transient int modCount;

/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
// 决定是否 resize 的阈值
int threshold;

/**
* The load factor for the hash table.
*
* @serial
*/
//加载因子
final float loadFactor;

核心方法

构造函数

1
2
3
4
public HashMap(int initialCapacity, float loadFactor)
public HashMap(int initialCapacity)
public HashMap()
public HashMap(Map<? extends K, ? extends V> m)

也就是说实际上 HashMap 构造函数是可以自己指定初始容量加载因子的。

如果自己指定初始值的话需要符合如下要求:

  1. 初始值不能小于0
  2. 初始值不能超过最大容量:MAXIMUM_CAPACITY
  3. 如果指定了初始值,那么 resize 的阈值 threshold 需要重新计算,这个计算方法为tableSizeFor,实际上就是找到比初始值大并且最接近初始值的2的整数次幂。(比如如果初始值是13,那么 threshold 就为 16。如果初始值是39,那么 threshold 就为 64)。

如果自己指定加载因子

  1. 加载因子不能小于0
  2. 加载因子不能为 Null (Float.isNaN(loadFactor)

如果没有指定则:

  1. 加载因子为 DEFAULT_LOAD_FACTOR = 0.75
  2. 初始容量实际上为 16:DEFAULT_INITIAL_CAPACITY = 1 << 4
  3. threhold 实际上一开始是0,只有开始存放内容的时候才会开始设置。

tableSizeFor

这里我们对这个函数进行解析,同时强化一下 Java 的移位操作

该函数的主要用途:

Returns a power of two size for the given target capacity.

该函数的目的是将传入的整数转化为另一个整数,这个整数的特点是

  1. 是2的倍数 1,2,4,8,16,32…..
  2. 我们假设输出的整数是2的 x 次方,传入的整数为 n,那么 n <= 2 power x && n > 2 power (x-1)

通俗的讲,就是输出的数应该是最接近输入值,同时要大于或等于输入值,并且是2的整倍数。

我们通过该函数巩固一下 移位操作符

cap = 11, int 32bit

0000 0000 0000 0000 0000 0000 0000 1011
*
cap - 1 = 10
0000 0000 0000 0000 0000 0000 0000 1010 2+8=10
n >>> 1 ,右移一位,高位补零
0000 0000 0000 0000 0000 0000 0000 0101 4+1=5
n |= 或操作
0000 0000 0000 0000 0000 0000 0000 1111 15
n >>> 2 ,右移2位,高位补零
0000 0000 0000 0000 0000 0000 0000 0011 2+1=3
n |= 或操作
0000 0000 0000 0000 0000 0000 0000 1111 15
n >>> 4 ,右移2位,高位补零
0000 0000 0000 0000 0000 0000 0000 0000 0
n |= 或操作
0000 0000 0000 0000 0000 0000 0000 1111 15
n >>> 8 ,右移2位,高位补零
0000 0000 0000 0000 0000 0000 0000 0000 0
n |= 或操作
0000 0000 0000 0000 0000 0000 0000 1111 15
n >>> 16 ,右移2位,高位补零
0000 0000 0000 0000 0000 0000 0000 0000 0
n |= 或操作
0000 0000 0000 0000 0000 0000 0000 1111 15

cap = 35, int 32bit

0000 0000 0000 0000 0000 0000 0010 0011
cap - 1 = 34
0000 0000 0000 0000 0000 0000 0010 0010
n >>> 1 ,右移一位,高位补零
0000 0000 0000 0000 0000 0000 0001 0001 16+1=17
n |= 或操作
0000 0000 0000 0000 0000 0000 0011 0011 32+16+3=51
n >>> 2 ,右移2位,高位补零
0000 0000 0000 0000 0000 0000 0000 1100 8+4=12
n |= 或操作
0000 0000 0000 0000 0000 0000 0011 1111 63
之后 或操作就会一直是 63了

cap = 1073741825

1073741825 - 1 = 1073741824

1000000000000000000000000000000

1
0100000000000000000000000000000
|=
1100000000000000000000000000000
2
0011000000000000000000000000000
|=
1111000000000000000000000000000
4
0000111100000000000000000000000
|=
1111111100000000000000000000000
8
0000000011111111000000000000000
|=
1111111111111111000000000000000
16
0000000000000001111111111111111
|=
11111111111111111111111111111111 = (2 power 32) -1

put value

HashMap put 方法流程图

这张流程图十分详细,本文另辟蹊径,通过在源码上添加打印 log 来追踪变化。具体代码参考 Github, 同时有相应的 spock 测试来验证。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
HashMap: The default constructor is invoked: HashMap
HashMap: put key: test1 with value value1
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: test1 (hash is 110250829) with value value1
HashMap: putVal: if table == null true or table.length == 0 0, tab need to resize()
HashMap: putVal: if table[13] is null and insert directly
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true
HashMap: put key: test2 with value value2
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: test2 (hash is 110250866) with value value2
HashMap: putVal: if table[2] is null and insert directly
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true
HashMap: put key: test1 with value modified_value1
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: test1 (hash is 110250829) with value modified_value1
HashMap: putVal: if key test1 has exist, replace the value(the real new assign in e null check block), then e point p
HashMap: putVal: e is not null which mean that find a existing node whose key is same with insert key
HashMap: putVal: onlyIfAbsent is true or oldValue is null, replace the value
HashMap: putVal: afterNodeAccess callback invoked!! p: test1=modified_value1
HashMap: put key: AaAaAaAaAa with value AaAaAaAaAa
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: AaAaAaAaAa (hash is 341679389) with value AaAaAaAaAa
HashMap: putVal: if p is Linked Node, put value to the end of link
HashMap: putVal-insert-link: binCount:0, p.next is null? true
HashMap: putVal-insert-link: p.next is null and insert the node to the end, use e as the bridge to continue track link node
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true
HashMap: put key: AaAaAaAaBB with value AaAaAaAaBB
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: AaAaAaAaBB (hash is 341679389) with value AaAaAaAaBB
HashMap: putVal: if p is Linked Node, put value to the end of link
HashMap: putVal-insert-link: binCount:0, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:1, p.next is null? true
HashMap: putVal-insert-link: p.next is null and insert the node to the end, use e as the bridge to continue track link node
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true
HashMap: put key: AaAaAaBBAa with value AaAaAaBBAa
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: AaAaAaBBAa (hash is 341679389) with value AaAaAaBBAa
HashMap: putVal: if p is Linked Node, put value to the end of link
HashMap: putVal-insert-link: binCount:0, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:1, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:2, p.next is null? true
HashMap: putVal-insert-link: p.next is null and insert the node to the end, use e as the bridge to continue track link node
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true
HashMap: put key: AaAaAaBBBB with value AaAaAaBBBB
HashMap: putVal: p is current check node in the tab. e is the existing target node
HashMap: putVal: put key: AaAaAaBBBB (hash is 341679389) with value AaAaAaBBBB
HashMap: putVal: if p is Linked Node, put value to the end of link
HashMap: putVal-insert-link: binCount:0, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:1, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:2, p.next is null? false
HashMap: putVal-insert-link: next node in link
HashMap: putVal-insert-link: binCount:3, p.next is null? true
HashMap: putVal-insert-link: p.next is null and insert the node to the end, use e as the bridge to continue track link node
HashMap: putVal: afterNodeInsertion callback invoked!! evict: true

get value:确定哈希桶数组索引位置

这里我们可以参考上面提到博文,有较好的介绍,不过其源码不是最新的,我们这里基于 JDK9 再做一次介绍:

这里涉及的方法有三个

  1. static final int hash(Object key)
  2. final Node getNode(int hash, Object key)
  3. public V get(Object key)

get 方法比较简单,就是在从 getNode 的返回结果中取出 value。这里的核心是 getNode。首先我们来看一下源码

hash方法

1
2
3
4
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

该方法主要是在原有的 hashCode() 基础上向右移位 16,再和原来的 hashCode 进行异或运算。主要目的是让高16位参与运算。

我们看看官方注释的解释:

计算key.hashCode()并散布(XORs)高位的哈希值到低位。 由于该表使用“幂次幂掩码”,所以只在当前掩码之上的位上变化的散列总是会发生冲突。 (在已知的例子中,浮点数键在小表中保持连续的整数)。因此,我们应用一个变换,向下扩展高位的影响。这样做是 在速度,效用和比特传播质量之间进行权衡。 由于许多常见的散列集合已经合理地分布(所以不能从扩散中受益),并且因为我们使用树来处理箱子中的大量碰撞,所以我们只需以代价最小的方式对一些位移进行异或运算,以减少系统的损失, 以及合并由于表边界而不会用于索引计算的最高位的影响。

原文:

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don’t benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

通俗的将这样做是因为,我们后面会通过很 (length-1) 进行与运算来取代模运算,从而提高效率,但是如果 length 过小,这时候与运算的位数比较少,这样就只能和低位进行运算,效率较低。说白了就是我们通过这用方法能够通过给定 key 更精确地计算对应的数组下标。

getNode 方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/**
* Implements Map.get and related methods
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}

该方法由于 JDK8+ 支持红黑平衡树的处理,所以其取 node 自然也更复杂,但是其核心还是通过与运算代替取模运算,这用代替是需要一定的前提的(取模的对象需要是2的幂次方),这个性能和原理,可以参考该文(英文),当然维基百科也有介绍。核心代码: (first = tab[(n - 1) & hash]) != null

get 方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}

Resize

性能

最佳使用实践