java.io.IOException: An existing connection was forcibly closed by the remote host

Jenkins JIRA | James Noonan | 2 years ago
  1. 0

    The Jenkins master reports that all of its JNLP slaves (and all our nodes are such) are offline, while the nodes themselves report that they are connected. The only way out is to restart the master. Of the roughly six times this has occurred, about half the time all the slaves also needed their slave process restarted to recover. We also see cases where, after restarting Jenkins, it recovers for a short time and then the problem recurs; however, if it is still running 10 minutes after a restart, we seem to be fine for 3-4 days.

    We were running version 565 when this first occurred, and had run fine for three months. What changed for us is that we increased the number of nodes: we now have 93, up from about 50, with a corresponding increase in the number of jobs. We use the vSphere Cloud Plugin. We changed one slave to use SSH instead of JNLP; the problem was resolved for this slave, and it is not disconnected when the problem occurs. We did not find the same for a vSphere/JNLP slave where we removed the vSphere configuration (that is, recreated the slave without vSphere). This seems similar to JENKINS-24155, JENKINS-24050, JENKINS-22714, JENKINS-22932, and JENKINS-23384.

    We have examined the VM logs, the network logs, and the firewalls; there is no obvious issue. I've attached the err.log of one of the incidents. Though it is clear that there is a problem with the slave connections, there is no clear cause. I've attached a thread graph of the problem (from a different occurrence). Normally, Jenkins runs at about 200 threads; after the problem occurs, the thread count grows linearly until reboot. In the graph, we see there was a problem on Friday night, as activity died off. After a Saturday service restart, we see the problem occur again six hours later, with corresponding thread growth. We suspect that the time to the next occurrence is lower if only the Jenkins service is restarted than if we restart the Jenkins master host machine.

    We also configure some nodes to turn off when idle. Though we originally suspected this as a possible cause, we have not found anything to further corroborate this theory. The thread graph was obtained with the Java Melody plugin (https://wiki.jenkins-ci.org/display/JENKINS/Monitoring); we have since disabled it, but the problem has recurred.

    I've attached a thread dump. Again, I cannot see anything amiss here myself, but this is not my area of expertise. The thread dump is not from the same incident as the attached log. I've also attached the output from one of the JNLP slaves; there is a SEVERE error reported at the slave, though it seems to recover. This is not from the same time as the error log. In the error log, I believe this is the first sign of the issue:

    Aug 07, 2014 7:30:20 PM org.jenkinsci.remoting.nio.NioChannelHub run
    WARNING: Communication problem
    java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)

    Just prior to this, a remote slave successfully completes a job. I believe the ci-25b-linux messages just before this are not related, as that slave was displaying problems in the time leading up to the crash.

    Jenkins JIRA | 2 years ago | James Noonan
    java.io.IOException: An existing connection was forcibly closed by the remote host
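The "forcibly closed" message in this report corresponds to a TCP RST arriving at the master's NIO read path. As an illustrative sketch (not the Jenkins remoting code itself), the difference between an orderly peer close and a forcible reset can be reproduced locally: closing a socket with SO_LINGER set to 0 makes the OS send an RST instead of a FIN, and the reader's next `read` then throws an IOException instead of returning -1. The class and method names here are hypothetical demo names, and the exact exception message is platform-dependent.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ResetDemo {
    // Returns "reset" if the read failed with an IOException (peer sent RST),
    // or "eof" if the peer closed cleanly (read returned -1).
    public static String demo() throws Exception {
        // Listen on an ephemeral local port.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Client connects, then closes with SO_LINGER=0, which sends a TCP RST
        // instead of a normal FIN -- the "forcible" close seen in the logs.
        Socket client = new Socket("127.0.0.1", port);
        SocketChannel accepted = server.accept();
        client.setSoLinger(true, 0);
        client.close();

        Thread.sleep(200); // give the RST a moment to arrive

        ByteBuffer buf = ByteBuffer.allocate(64);
        try {
            int n = accepted.read(buf); // returns -1 on an orderly FIN
            return n < 0 ? "eof" : "data";
        } catch (IOException e) {
            // On Windows the message is "An existing connection was forcibly
            // closed by the remote host"; on Linux, "Connection reset".
            return "reset";
        } finally {
            accepted.close();
            server.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peer close looked like: " + demo());
    }
}
```

This only demonstrates how the exception arises at the socket layer; it says nothing about why the remoting channel in the report resets, which could equally be a firewall, an idle-timeout, or the vSphere-managed node powering down.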
  3. 0

    Supported .Net and Elasticsearch versions?

    GitHub | 2 years ago | prabhu
    java.io.IOException: An existing connection was forcibly closed by the remote host
  5. 0

    How to configure Apache QPID to work with RabbitMQ?

    Stack Overflow | 7 months ago | Altair
    javax.jms.JMSException: An existing connection was forcibly closed by the remote host
  6. 0

    How to catch the IOException "Connection reset by peer"?

    Stack Overflow | 5 years ago | chrisapotek
    java.io.IOException: Connection reset by peer
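The last question above asks how to catch "Connection reset by peer" specifically. The JDK has no dedicated exception type for TCP resets (it typically surfaces as a `java.net.SocketException`, a subclass of `IOException`), so callers usually resort to matching the message text. A minimal sketch of that heuristic, assuming the two message variants seen on Linux and Windows (the class name is made up for illustration):

```java
import java.io.IOException;

public class ResetCheck {
    // Heuristic: match the platform-dependent message text. These two strings
    // cover Linux ("Connection reset" / "Connection reset by peer") and
    // Windows ("An existing connection was forcibly closed by the remote host").
    public static boolean looksLikeReset(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        return msg.contains("Connection reset")
            || msg.contains("forcibly closed by the remote host");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeReset(new IOException("Connection reset by peer"))); // true
        System.out.println(looksLikeReset(new IOException("Read timed out")));           // false
    }
}
```

Message matching is fragile (it can break across JDK versions and locales), so it is best used only to decide between "peer went away, reconnect" and "genuinely unexpected I/O failure", not for anything stricter.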


    Root Cause Analysis

    java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)