JDK 1.4.2_05 has been released

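I have long felt that the JDK's NIO has serious bugs, at least on Windows 2000. I think it was around 1.4.2_02 that I ran into those baffling problems, and only now, in _05, have some of them been fixed. I don't know whether other, more deeply hidden problems are still waiting to be uncovered. Server projects like my SMS gateway run 24x7, and running that long is enough to expose all sorts of problems. 8)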

The NIO fixes are mainly in two places: wakeup(), and OP_ACCEPT when running as a server. The details are as follows:

Bug ID:  4850373  
Votes  95  
Synopsis  Blocking Selector stops Blocking occasionally  
Category  java:classes_nio  
Reported Against  1.4.2_02 , mantis-beta  
Release Fixed  tiger-beta2 , 1.4.2_05  
State  Closed, fixed  

FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Microsoft Windows 2000 [Version 5.00.2195]

A DESCRIPTION OF THE PROBLEM :
Blocking Selector stops Blocking occasionally.

It would appear that when wakeup() is called at a certain point then the selector.select() method starts returning instantly with no selected keys from then on.

This problem is difficult to consistently reproduce, hence it is difficult to track down the exact cause of the problem.

It is a very serious problem for anyone using blocking io as the cpu usage jumps to 100%.

As it is difficult to produce a small test case I have done some research to help find the bug:

* once the problem is occurring the following debug information is available:
keys == {key with interestOps == 1 fd == 1788}
cancelledKeys == {}
sourceFD == 1552
sinkFD == 1600
interruptTriggered == false
timeout == -1
threads == {}
2 channels
exceptIDs == {0, 0, 0, ...}
readIDs == {1, 1552, 1788, 0, 0, 0, ...}
writeIDs == {0, 1788, 0, 0, 0, ...}

* the trace through the code of WindowsSelectorImpl doSelect(long aTimout)
1) calls processDeregisterQueue();
2) calls adjustThreadsCount();
3) calls finishLock.reset();
4) calls startLock.startThreads();
5) calls begin();
6) calls subSelector.poll(); (this doesn't block, I guess this is the problem)
7) calls end();
8) calls resetWakeupSocket();
9) calls finishLock.checkForException();
10) calls processDeregisterQueue();
11) calls updateSelectedKeys();
12) returns 0

Note the same problem also exists in JDK1.4.1_02 (I thought maybe the synchronization changes would have fixed it, however they didn't)
This is a replacement bug for review ID: 183301 (review id 183301 is no longer important)

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Difficult to provide a test case due to the nature of the problem.
I have only been able to reproduce the problem when wakeup() has been called.
It is too difficult to establish at what point in select() the code is up to when wakeup() causes the problem.
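
For readers unfamiliar with the pattern involved, here is a minimal sketch of one thread blocking in select() while another calls wakeup(), followed by a second select() that is expected to block again. This is not the reporter's test case (none was provided) and it is not expected to trigger the bug on its own. The class name WakeupDemo and its method are made up for illustration, and System.nanoTime() assumes Java 5 or later (on 1.4.2 itself, System.currentTimeMillis() would have to be used instead).

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical demo class; this is not the bug reporter's test case.
class WakeupDemo {

    /**
     * Blocks in select(), lets a second thread call wakeup() after
     * wakeDelayMillis, then calls select(timeoutMillis) once more.
     * timeoutMillis must be positive (0 would mean "block indefinitely").
     * Returns { keys reported by the first select(),
     *           milliseconds the second select() spent blocked }.
     */
    static long[] wakeThenSelect(final long wakeDelayMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        final Selector selector = Selector.open();
        try {
            Thread waker = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(wakeDelayMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    selector.wakeup(); // makes the blocked select() return
                }
            });
            waker.start();

            // No channels are registered, so only wakeup() can end this call.
            int keys = selector.select();
            waker.join();

            // On a healthy JVM this call blocks for the full timeout.
            // The bug report describes it returning instantly instead.
            long start = System.nanoTime();
            selector.select(timeoutMillis);
            long blockedMillis = (System.nanoTime() - start) / 1000000L;

            return new long[] { keys, blockedMillis };
        } finally {
            selector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] r = wakeThenSelect(100, 300);
        System.out.println("first select() returned " + r[0] + " keys");
        System.out.println("second select() blocked for about " + r[1] + " ms");
    }
}
```

If the second select() came back almost immediately with no keys, that would match the symptom described in this report.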

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
When no keys are ready to be selected, select() should block again until keys become ready for selection or wakeup/interrupt is called.
ACTUAL -
select() returns no keys without blocking.

REPRODUCIBILITY :
This bug can be reproduced occasionally.

Bug ID:  4881228  
Votes  13  
Synopsis  (so) Selector.select() fails on OP_ACCEPT when the network is unplugged (wxp)  
Category  java:classes_nio  
Reported Against  1.4.2 , 1.4.2_02 , 1.4.2_03 , mantis-beta  
Release Fixed  tiger-beta2 , 1.4.2_05  
State  Closed, fixed  

FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)



FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]

A DESCRIPTION OF THE PROBLEM :
When a selector is registered with a non-blocking server socket for operation OP_ACCEPT, when windows recognizes that the network cable is unplugged, the method int Selector.select() will not block and return 0.  The SelectedKey set will be empty.  This continues even when the network cable is plugged in again.

This is a problem because according to most examples, Selector.select() is safe to put in an infinite loop.  The abnormal behavior of continuously returning immediately when there are no connections to be accepted causes the loop accepting connections to spin, hogging the CPU.
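
The reporter's example code is not reproduced in this post, so the following is only a sketch of the kind of accept loop "most examples" use, to show where the spin happens. The class name AcceptLoopSketch and the method acceptConnections are hypothetical, and the loop stops after a fixed number of connections only so that it can be exercised. The generic Iterator assumes Java 5 or later (on 1.4.2 a raw Iterator with a cast would be needed).

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical sketch of a typical OP_ACCEPT loop; not the reporter's code.
class AcceptLoopSketch {

    /**
     * Accepts connections on an already-bound server channel until
     * 'expected' connections have been accepted, then returns the count.
     * The caller remains responsible for closing the server channel.
     */
    static int acceptConnections(ServerSocketChannel server, int expected)
            throws IOException {
        Selector selector = Selector.open();
        try {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            int accepted = 0;
            while (accepted < expected) {
                // Should block until at least one connection is pending.
                int ready = selector.select();
                if (ready == 0) {
                    // With the behavior described in the report, select()
                    // keeps returning 0 and this branch spins at full CPU.
                    continue;
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // the selected-key set is not cleared automatically
                    if (key.isAcceptable()) {
                        SocketChannel client =
                                ((ServerSocketChannel) key.channel()).accept();
                        if (client != null) { // accept() may return null when non-blocking
                            accepted++;
                            client.close();
                        }
                    }
                }
            }
            return accepted;
        } finally {
            selector.close();
        }
    }
}
```

A caller would bind a ServerSocketChannel to a port and pass it in. In a real server the loop would run forever, which is exactly why a select() that stops blocking turns it into a busy loop.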

It seems this is a Windows-specific problem.  The Apple JVM for Mac OS X will not spin when the network is unplugged.  Instead, the method Selector.select() remains blocked, as it should according to the spec.

I tested with apple vm version:
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-39)
Java HotSpot(TM) Client VM (build 1.4.1_01-14, mixed mode)


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the example code, following directions.  When prompted, unplug the network cable and wait 30 seconds.  Shortly, the program will begin printing thousands of lines of output.  At this point, ctrl-c the program, as it will not exit.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The program should not print anything out while the network connection is unplugged.  The operation Selector.select() should only return when an xxxxx  operation specified in the documentation occurs.  When the network is unplugged, nothing is connecting to the selector, and thus, no output should be printed.
ACTUAL -
The program will start printing out hundreds of lines of output beginning with "".  These are all printed out in the accept loop, which now does not block on the Selector.select() operation, but instead passes right through it, spinning in an infinite loop.

REPRODUCIBILITY :
This bug can be reproduced always.

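In short: one bug is that calling wakeup() at a certain point during select() drives CPU usage to 100%. The other is that when the network address is not available (because of the network card or for some other reason), a selector registered for OP_ACCEPT goes into a busy loop and does not recover even after the network comes back.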
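
From application code both bugs look the same: select() keeps returning 0 almost instantly. For a server stuck on an affected JDK, one defensive option is at least to detect that condition. The sketch below is my own assumption, not something either bug report proposes: the class SelectSpinGuard, its parameters, and the thresholds are all hypothetical, and what to do once a spin is detected (log, sleep briefly, reopen the selector) is left to the caller.

```java
// Hypothetical helper; names and thresholds are illustrative assumptions.
class SelectSpinGuard {
    private final int maxFastEmptyReturns;
    private final long fastThresholdNanos;
    private int consecutive;

    /**
     * @param maxFastEmptyReturns how many suspicious returns in a row count as a spin
     * @param fastThresholdNanos  a select() shorter than this is considered "instant"
     */
    SelectSpinGuard(int maxFastEmptyReturns, long fastThresholdNanos) {
        this.maxFastEmptyReturns = maxFastEmptyReturns;
        this.fastThresholdNanos = fastThresholdNanos;
    }

    /**
     * Records the outcome of one select() call.
     *
     * @param selectedKeys    value returned by select()
     * @param elapsedNanos    how long the call took
     * @param wakeupRequested true if the application itself called wakeup(),
     *                        in which case a fast empty return is legitimate
     * @return true once the loop looks like it is spinning
     */
    boolean record(int selectedKeys, long elapsedNanos, boolean wakeupRequested) {
        boolean suspicious = selectedKeys == 0
                && !wakeupRequested
                && elapsedNanos < fastThresholdNanos;
        // Any normal return resets the streak.
        consecutive = suspicious ? consecutive + 1 : 0;
        return consecutive >= maxFastEmptyReturns;
    }
}
```

In a select loop this would wrap each call, roughly: take a timestamp, call selector.select(), pass the return value and elapsed time to record(), and react when it returns true. On 1.4.2 itself the elapsed time would have to come from System.currentTimeMillis(), since System.nanoTime() only exists from Java 5 on.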