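There is a question I have had for a long time. The production JVM settings for the business services at Fengchao (丰巢) disable System.gc(), that is, they set -XX:+DisableExplicitGC, yet production has never run into an off-heap (direct) memory overflow. Some background: Fengchao uses Alibaba's open-source dubbo, and by default dubbo's underlying transport uses netty 3.2.5.Final. Our usual understanding of netty is that it must use off-heap memory, and that once System.gc() calls are disabled, off-heap memory is bound to leak if the service never actively reclaims the off-heap memory it allocated. With this question in mind, and with some free time the night before last, I read through the netty 3.2.5 source. Once again it was while waiting for Mantou's mom at Kexing Science Park (科兴科学园) that I found where the secret lies. All I can say is that Kexing Science Park really is my lucky place.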
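Before reading any source, it can help to check whether off-heap usage is really growing in a running JVM. Below is a minimal sketch using the JDK's BufferPoolMXBean (Java 7+). The class name DirectMemoryProbeSketch is made up for illustration, and the pool name "direct" is what HotSpot reports; other JVMs may name it differently, so treat that as an assumption.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of netty or dubbo.
final class DirectMemoryProbeSketch {

    // Bytes currently reserved by the JVM's "direct" buffer pool,
    // or -1 if this JVM does not expose a pool with that name (assumption: HotSpot naming).
    static long directMemoryUsed() {
        for (BufferPoolMXBean bean : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(bean.getName())) {
                return bean.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB of off-heap memory
        long after = directMemoryUsed();
        // buf is used below so that it stays reachable while we measure.
        System.out.println("direct bytes before=" + before + ", after=" + after
                + ", held capacity=" + buf.capacity());
    }
}
```

Sampling this value periodically in a service started with -XX:+DisableExplicitGC should show whether direct memory stays flat or keeps climbing.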
The netty classes involved: NioWorker, HeapChannelBufferFactory, BigEndianHeapChannelBuffer, SocketReceiveBufferPool
The core of the secret is in SocketReceiveBufferPool:
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

final class SocketReceiveBufferPool {

    private static final int POOL_SIZE = 8;

    @SuppressWarnings("unchecked")
    private final SoftReference<ByteBuffer>[] pool = new SoftReference[POOL_SIZE];

    SocketReceiveBufferPool() {
        super();
    }

    final ByteBuffer acquire(int size) {
        final SoftReference<ByteBuffer>[] pool = this.pool;
        for (int i = 0; i < POOL_SIZE; i ++) {
            SoftReference<ByteBuffer> ref = pool[i];
            if (ref == null) {
                continue;
            }

            ByteBuffer buf = ref.get();
            if (buf == null) {
                // The GC has cleared this soft reference - free the slot.
                pool[i] = null;
                continue;
            }

            if (buf.capacity() < size) {
                continue;
            }

            // Large enough - take it out of the pool and reuse it.
            pool[i] = null;

            buf.clear();
            return buf;
        }

        // Nothing suitable in the pool - allocate a new direct buffer.
        ByteBuffer buf = ByteBuffer.allocateDirect(normalizeCapacity(size));
        buf.clear();
        return buf;
    }

    final void release(ByteBuffer buffer) {
        final SoftReference<ByteBuffer>[] pool = this.pool;
        for (int i = 0; i < POOL_SIZE; i ++) {
            SoftReference<ByteBuffer> ref = pool[i];
            if (ref == null || ref.get() == null) {
                pool[i] = new SoftReference<ByteBuffer>(buffer);
                return;
            }
        }

        // pool is full - replace one
        final int capacity = buffer.capacity();
        for (int i = 0; i < POOL_SIZE; i ++) {
            SoftReference<ByteBuffer> ref = pool[i];
            ByteBuffer pooled = ref.get();
            if (pooled == null) {
                pool[i] = null;
                continue;
            }

            if (pooled.capacity() < capacity) {
                pool[i] = new SoftReference<ByteBuffer>(buffer);
                return;
            }
        }
    }

    private static final int normalizeCapacity(int capacity) {
        // Normalize to multiple of 1024
        int q = capacity >>> 10;
        int r = capacity & 1023;
        if (r != 0) {
            q ++;
        }
        return q << 10;
    }
}
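SocketReceiveBufferPool maintains an array of SoftReference<ByteBuffer>; if you are not familiar with Java's SoftReference, it is easy to look up. In effect, the class maintains a pool of direct buffers, and the memory in this pool can be reused. That raises a question: if netty handed the direct buffer it uses for receiving network data straight to dubbo's business code, what would this pool be for, and how would the memory ever be released back into it? With that doubt in mind, let's continue with the NioWorker code that calls SocketReceiveBufferPool.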
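To make the reuse behaviour concrete, here is a small self-contained sketch of the same idea. SoftDirectBufferPoolSketch is a simplified stand-in I wrote for illustration, not netty's class: it keeps the acquire/release shape and the round-up-to-1024 rule, but drops the "pool is full, replace a smaller one" branch.

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for netty's SocketReceiveBufferPool.
final class SoftDirectBufferPoolSketch {
    private static final int POOL_SIZE = 8;

    @SuppressWarnings("unchecked")
    private final SoftReference<ByteBuffer>[] pool = new SoftReference[POOL_SIZE];

    // Return a pooled buffer that is large enough, or allocate a new direct buffer.
    ByteBuffer acquire(int size) {
        for (int i = 0; i < POOL_SIZE; i++) {
            SoftReference<ByteBuffer> ref = pool[i];
            ByteBuffer buf = (ref == null) ? null : ref.get();
            if (buf == null) {
                pool[i] = null; // empty slot, or the GC cleared the soft reference
                continue;
            }
            if (buf.capacity() >= size) {
                pool[i] = null; // hand ownership to the caller
                buf.clear();
                return buf;
            }
        }
        return ByteBuffer.allocateDirect(normalizeCapacity(size));
    }

    // Put the buffer into the first free slot; simply drop it if the pool is full.
    void release(ByteBuffer buffer) {
        for (int i = 0; i < POOL_SIZE; i++) {
            SoftReference<ByteBuffer> ref = pool[i];
            if (ref == null || ref.get() == null) {
                pool[i] = new SoftReference<ByteBuffer>(buffer);
                return;
            }
        }
    }

    // Round up to the next multiple of 1024.
    static int normalizeCapacity(int capacity) {
        int q = capacity >>> 10;
        if ((capacity & 1023) != 0) {
            q++;
        }
        return q << 10;
    }

    public static void main(String[] args) {
        SoftDirectBufferPoolSketch pool = new SoftDirectBufferPoolSketch();
        ByteBuffer first = pool.acquire(1000);
        System.out.println(first.isDirect() + " " + first.capacity()); // true 1024
        pool.release(first);
        ByteBuffer second = pool.acquire(500);
        // 'first' is still strongly reachable here, so the soft reference cannot have been cleared.
        System.out.println(first == second); // true
    }
}
```

The point of the SoftReference is that the pool never pins the direct buffers: under memory pressure the GC may clear the references, and the off-heap memory behind them is then freed along with the ByteBuffer objects.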
private boolean read(SelectionKey k) {
    final SocketChannel ch = (SocketChannel) k.channel();
    final NioSocketChannel channel = (NioSocketChannel) k.attachment();

    final ReceiveBufferSizePredictor predictor =
        channel.getConfig().getReceiveBufferSizePredictor();
    final int predictedRecvBufSize = predictor.nextReceiveBufferSize();

    int ret = 0;
    int readBytes = 0;
    boolean failure = true;

    ByteBuffer bb = recvBufferPool.acquire(predictedRecvBufSize);

    try {
        while ((ret = ch.read(bb)) > 0) {
            readBytes += ret;
            if (!bb.hasRemaining()) {
                break;
            }
        }
        failure = false;
    } catch (ClosedChannelException e) {
        // Can happen, and does not need a user attention.
    } catch (Throwable t) {
        fireExceptionCaught(channel, t);
    }

    if (readBytes > 0) {
        bb.flip();

        final ChannelBufferFactory bufferFactory =
            channel.getConfig().getBufferFactory();
        final ChannelBuffer buffer = bufferFactory.getBuffer(readBytes);
        buffer.setBytes(0, bb);
        buffer.writerIndex(readBytes);
        //if(buffer instanceof BigEndianHeapChannelBuffer){
        //    logger2.info("buffer instanceof BigEndianHeapChannelBuffer.");
        //}
        recvBufferPool.release(bb);

        // Update the predictor.
        predictor.previousReceiveBufferSize(readBytes);

        // Fire the event.
        fireMessageReceived(channel, buffer);
    } else {
        recvBufferPool.release(bb);
    }

    if (ret < 0 || failure) {
        k.cancel(); // Some JDK implementations run into an infinite loop without this.
        close(channel, succeededFuture(channel));
        return false;
    }

    return true;
}
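The code shows that netty creates a second object, a ChannelBuffer, and copies the contents of the direct buffer into it. With the HeapChannelBufferFactory listed above, this ChannelBuffer is a BigEndianHeapChannelBuffer, which means it lives in heap memory. Netty then decodes from this heap buffer and passes the result up to the calling service, while the direct buffer goes back to the pool via recvBufferPool.release(bb) right after the copy. In other words, the direct buffer is never handed to the dubbo service itself. That explains why the JVMs serving dubbo have never had an off-heap memory leak even though System.gc() is disabled. Later I will find time to analyze in detail how netty 4 and kafka use direct buffers.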
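The copy step can be sketched with plain JDK buffers, without netty on the classpath. This is only an illustration of the pattern: CopyToHeapSketch and copyToHeap are names I made up, and a heap ByteBuffer stands in for netty's heap ChannelBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "read into direct, copy to heap, reuse direct".
final class CopyToHeapSketch {

    // Copy the readable bytes of a (direct) receive buffer into a fresh heap buffer.
    static ByteBuffer copyToHeap(ByteBuffer received) {
        ByteBuffer heap = ByteBuffer.allocate(received.remaining());
        heap.put(received.duplicate()); // duplicate() leaves the source position untouched
        heap.flip();
        return heap;
    }

    public static void main(String[] args) {
        // Stand-in for the pooled receive buffer filled by SocketChannel.read().
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        direct.put("hello".getBytes(StandardCharsets.UTF_8));
        direct.flip();

        ByteBuffer heap = copyToHeap(direct);
        direct.clear(); // the direct buffer is now free to be reused for the next read

        System.out.println(heap.isDirect()); // false
        System.out.println(heap.hasArray()); // true
        System.out.println(new String(heap.array(), 0, heap.remaining(), StandardCharsets.UTF_8)); // hello
    }
}
```

The cost of this design is one extra memory copy per read; the benefit is that only a handful of direct buffers ever exist per worker, and everything the upper layers hold on to is ordinary heap memory that normal GC cycles can reclaim.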