java.lang.Throwable

Terracotta Project Issue Tracker | Allen Qin | 4 years ago
  1. 0

    I have 2 Tomcat session servers running in active/standby mode. There are 14 app servers running Tomcat 6.0 and connecting to the session servers. Problem: The Tomcat server will crash sometimes and unable to recover. This has happened on prod a few times and I have been able to reproduce it on our test env. It seems load is a factor contributing to the failure but it's not the root cause. Sometimes the problem happens when the load is normal. I believe this has something to do with the Terracotta setup when I checked the TC server and client log file. Below are some of the errors I found after a load test. The problem started to happen after 13:40 in the log file. I've attached the following info: 1. TC server log 2. TC client log 3. Tomcat thread dump 4. TC config Please let me know if you need anything else. TC Client: {noformat} 2012-10-01 13:41:02,563 [http-8080-275] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879335] : Took more than 1000ms to add to sequencer : 1004 ms 2012-10-01 13:41:02,564 [http-8080-728] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879338] : Took more than 1000ms to add to sequencer : 1012 ms 2012-10-01 13:41:11,397 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 1 2012-10-01 13:41:15,388 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 2 2012-10-01 13:41:19,491 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 3 2012-10-01 13:41:23,472 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 4 2012-10-01 13:41:27,412 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 5 2012-10-01 13:41:31,420 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 6 2012-10-01 13:41:34,210 [ClientLockManager LockGC] INFO com.tc.object.locks.ClientLockManager - ClientID[17]: Lock GC collected 254 garbage locks 2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Socket Connect to lt-mobses01.sb.com.au:9510(callbackport:9510) taking long time. probably not reachable. 2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 is DEAD 2012-10-01 13:41:41,432 [HealthChecker] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Declared connection dead ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] idle time 38812ms 2012-10-01 13:41:41,560 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@430312049: connected: false, closed: true local=10.32.4.81:12735 remote=10.32.4.95:9510 connect=[Mon Oct 01 12:19:32 EST 2012] idle=7149ms [627982104 read, 908628612 write]. STATUS : DISCONNECTED 2012-10-01 13:41:41,564 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.tcm.ClientMessageChannel - ClientMessageChannel moves to SessionID=[1] 2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Disconnected: Pausing from State[ RUNNING ] RemoteNode : GroupID[0]. Disconnect count: 0 2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: ClientHandshakeManager moves to SessionID=[1] 2012-10-01 13:41:41,679 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: ConnectionRequest [type=RECONNECT, cmt=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513] 2012-10-01 13:41:41,687 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Reconnect attempt 0 of unlimited reconnect tries to lt-mobses02.sb.com.au:9510:Tc-Group-0, timeout=10000 2012-10-01 13:41:41,695 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Connection refused [lossy interval: 10000ms] 2012-10-01 13:41:45,698 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1318131925: connected: true, closed: false local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=3001ms [0 read, 0 write] 2012-10-01 13:41:45,775 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Health monitoring agent started for lt-mobses01.sb.com.au:9510 2012-10-01 13:41:45,790 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - HealthCheckCallbackPort verification PASSED for lt-mobses01.sb.com.au:9510(callbackport: 9510) 2012-10-01 13:41:45,821 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Connected: Unpausing from State[ PAUSED ] RemoteNode : GroupID[0]. Disconnect count : 1 2012-10-01 13:41:45,847 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1318131925: connected: false, closed: true local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=23ms [216 read, 352 write]. STATUS : ESTABLISHED 2012-10-01 13:41:45,848 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Connection to [lt-mobses01.sb.com.au:9510] DISCONNECTED. Health Monitoring for this node is now disabled. 2012-10-01 13:41:45,859 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: RestoreConnectionRequest [type=RESTORE_CONNECTION, clientMessageTransport=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513, callback=com.tc.net.protocol.delivery.OOOConnectionWatcher@506bda8, timeoutMillis=5000, sa=10.32.4.95:9510] 2012-10-01 13:41:45,870 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@2106417963: connected: true, closed: false local=10.32.4.81:39190 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=11ms [0 read, 0 write] 2012-10-01 13:41:45,960 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] ERROR com.tc.net.protocol.transport.TransportHandshakeErrorHandlerForL1 - com.tc.net.protocol.transport.TransportHandshakeErrorContext: com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array."Message Class: com.tc.net.protocol.transport.TransportMessageImpl Sealed: true, Header Length: 32, Data Length: 426, Total Length: 458 Header (com.tc.net.protocol.transport.WireProtocolHeader) Version: 2, Header Length: 8, TOS: 0, TTL: 64, Protocol: TRANSPORT HANDSHAKE Total Packet Length: 458 Adler32 Checksum: 1988298213 (valid: true) Source Addresss: 10.32.4.95 Destination Addresss: 10.32.4.81 Source Port: 9510, Destination Port: 39190 Total Msg Count: 1 Header Validity: true (no message) Payload: type: SYN_ACK, connectionId: ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[], errorContext com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array." {noformat} TC Server log {noformat} 2012-10-01 13:52:31,852 [L2_L2:TCWorkerComm # 10_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(7.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@845175459: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7112 connect=[Mon Oct 01 13:52:31 EST 2012] idle=448ms [136 read, 0 write] 2012-10-01 13:52:31,755 [L2_L2:TCWorkerComm # 6_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException 2012-10-01 13:52:31,880 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write]. STATUS : START 2012-10-01 13:52:31,880 [WorkerThread(group_handshake_message_stage, 0)] WARN tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au left the cluster 2012-10-01 13:52:31,882 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Remove dead member from waitFor response list, dead member: NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:31,933 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write] 2012-10-01 13:52:31,970 [WorkerThread(garbage_collect_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: DGC Message: DGC[ 24 ] started. 2012-10-01 13:52:31,936 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=1] 2012-10-01 13:52:31,935 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] left the cluster 2012-10-01 13:52:31,935 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Removed old member Group Member: NodeID[lt-mobses01.sb.com.au:9510] <-> NodeID[lt-mobses02.sb.com.au:9510] ChannelID[NULL_ID, Status:CLOSED]:10.32.4.95:9530 <--> 10.32.4.96:6895; Ready:false; Joined: false; memberAdding:false; HighPri: true for NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:31,973 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [11] from jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f 2012-10-01 13:52:31,990 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated 2012-10-01 13:52:31,991 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@735607632 2012-10-01 13:52:31,993 [WorkerThread(group_handshake_message_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au joined the cluster 2012-10-01 13:52:32,060 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(2.c8dd256f25c04e5eabe8bfff0c9ffbea.88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=68ms [0 read, 0 write] 2012-10-01 13:52:31,973 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] left the cluster 2012-10-01 13:52:32,092 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - : Received transport disconnect. Shutting down client NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:32,093 [WorkerThread(l2_state_message_handler_stage, 0)] ERROR com.tc.l2.state.StateManagerImpl - State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE 2012-10-01 13:52:32,093 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] ERROR com.tc.l2.ha.ReplicatedClusterStateManagerImpl - Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] : msg = com.tc.l2.msg.ClusterStateMessage@490cb88d while publishing Cluster State: Killing the node 2012-10-01 13:52:32,093 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.92f6a363-81ef-4ab7-955c-9bf242cd6c15-13a1a1dd3fe)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@618420666: connected: true, closed: false local=10.32.4.95:31166 remote=10.32.4.96:9530 connect=[Mon Oct 01 13:52:32 EST 2012] idle=156ms [0 read, 0 write] 2012-10-01 13:52:32,094 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.persistence.db.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0 2012-10-01 13:52:32,135 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=144ms [136 read, 0 write]. STATUS : START 2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=145ms [136 read, 0 write] 2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException 2012-10-01 13:52:32,136 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - Health monitoring agent started for lt-mobses02.sb.com.au:9530 2012-10-01 13:52:32,136 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - State[ ACTIVE-COORDINATOR ] received Zap Node request from another State[ ACTIVE-COORDINATOR ] NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster Details : Recd ClusterStateMessage from : NodeID[lt-mobses01.sb.com.au:9510] while in ACTIVE-COORDINATOR state Exception : java.lang.Throwable at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage(ReplicatedClusterStateManagerImpl.java:155) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.messageReceived(ReplicatedClusterStateManagerImpl.java:143) at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:721) at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:681) at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22) at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:145) 2012-10-01 13:52:32,812 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster 2012-10-01 13:52:32,812 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster 2012-10-01 13:52:33,002 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster 2012-10-01 13:52:33,317 [L2_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:9530)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - HealthCheckCallbackPort verification PASSED for lt-mobses02.sb.com.au:9530(callbackport: 9530) 2012-10-01 13:52:33,003 [WorkerThread(group_discovery_stage, 3, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Connected, current: Member-In-Group 2012-10-01 13:52:33,363 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster 2012-10-01 13:52:33,621 [WorkerThread(group_handshake_message_stage, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: TCGroupManager - Connection to [lt-mobses02.sb.com.au:9530] CLOSED. Health Monitoring for this node is now disabled. 2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Init, current: Member-In-Group 2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connected, to: Init, current: Member-In-Group 2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,799 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - A Terracotta server tried to join the mirror group as a second ACTIVE : My weights = 0,-3576867000,12980939,11132628575526,8778404219038062035 Other servers weights = 7,-42000,227168,11132628641062,-2552508849076222291 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: SPLIT BRAIN, lt-mobses01.sb.com.au and lt-mobses02.sb.com.au are ACTIVE 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - NodeID[lt-mobses02.sb.com.au:9510] wins : Backing off : Exiting !!! 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,801 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: NodeID[lt-mobses02.sb.com.au:9510] has more clients. Exiting!! 2012-10-01 13:52:33,801 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1809ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1810ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,801 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE my weight = [0, -3578404000, 12980939, 11132628575526, 4964923724550007331] 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] while publishing Cluster State Exception : java.lang.Throwable at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.validateResponse(ReplicatedClusterStateManagerImpl.java:89) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.publishToAll(ReplicatedClusterStateManagerImpl.java:132) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.connectionIDCreated(ReplicatedClusterStateManagerImpl.java:118) at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.fireCreationEvent(ConnectionIDFactoryImpl.java:71) at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.restoreConnectionId(ConnectionIDFactoryImpl.java:65) at com.tc.net.protocol.transport.ServerStackProvider.attachNewConnection(ServerStackProvider.java:124) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.handleSyn(ServerStackProvider.java:301) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.verifyAndHandleSyn(ServerStackProvider.java:245) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.putMessage(ServerStackProvider.java:226) at com.tc.net.protocol.transport.WireProtocolAdaptorImpl.addReadData(WireProtocolAdaptorImpl.java:54) at com.tc.net.protocol.ProtocolSwitch.addReadData(ProtocolSwitch.java:50) at com.tc.net.core.TCConnectionImpl.addNetworkData(TCConnectionImpl.java:777) at com.tc.net.core.TCConnectionImpl.doReadFromBufferInternal(TCConnectionImpl.java:463) at com.tc.net.core.TCConnectionImpl.doReadFromBuffer(TCConnectionImpl.java:306) at com.tc.net.core.TCConnectionImpl.doReadInternal(TCConnectionImpl.java:290) at com.tc.net.core.TCConnectionImpl.doRead(TCConnectionImpl.java:266) at com.tc.net.core.CoreNIOServices$CommThread.selectLoop(CoreNIOServices.java:624) at com.tc.net.core.CoreNIOServices$CommThread.run(CoreNIOServices.java:290) my weight = [0, -3579675000, 12980939, 11132628575526, 359689043280585659] 2012-10-01 13:52:33,946 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=2] 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [2] from jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCWorkerCommManager - Selecting [L2_L1:TCWorkerComm # 3, FD, wt:0] from [[L2_L1:TCWorkerComm # 0, FD, wt:0], [L2_L1:TCWorkerComm # 1, FD, wt:0], [L2_L1:TCWorkerComm # 2, FD, wt:0], [L2_L1:TCWorkerComm # 3, FD, wt:0], [L2_L1:TCWorkerComm # 4, FD, wt:0], [L2_L1:TCWorkerComm # 5, FD, wt:0], [L2_L1:TCWorkerComm # 6, FD, wt:0], [L2_L1:TCWorkerComm # 7, FD, wt:0], [L2_L1:TCWorkerComm # 8, FD, wt:0], [L2_L1:TCWorkerComm # 9, FD, wt:0], [L2_L1:TCWorkerComm # 10, FD, wt:0], [L2_L1:TCWorkerComm # 11, FD, wt:0], [L2_L1:TCWorkerComm # 12, FD, wt:0], [L2_L1:TCWorkerComm # 13, FD, wt:0], [L2_L1:TCWorkerComm # 14, FD, wt:0], [L2_L1:TCWorkerComm # 15, FD, wt:0]] 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@998902629 {noformat} When I had a look at the Tomcat thread dump, it seem all the threads are in a BLOCKED status. {noformat} Thread 28127: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Interpreted frame) - com.tc.object.locks.ClientLockManagerImpl.waitUntilRunning() @bci=98, line=636 (Interpreted frame) - com.tc.object.locks.ClientLockManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=1, line=93 (Compiled frame) - com.tc.object.bytecode.ManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=14, line=869 (Compiled frame) - com.tc.object.bytecode.ManagerUtil.beginLock(java.lang.String, int) @bci=18, line=221 (Compiled frame) - com.tc.object.bytecode.ManagerUtil.beginLockWithoutTxn(java.lang.String, int) @bci=2, line=229 (Compiled frame) - com.terracotta.session.util.Lock.getReadLock() @bci=5, line=48 (Compiled frame) - com.terracotta.session.util.DefaultSessionId.getSessionInvalidatorReadLock() @bci=12, line=83 (Compiled frame) - com.terracotta.session.SessionDataStore.find(com.terracotta.session.SessionId) @bci=15, line=140 (Compiled frame) - com.terracotta.session.TerracottaSessionManager.getSessionIfExists(com.terracotta.session.SessionId, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=46, line=579 (Compiled frame) - com.terracotta.session.SessionRequest.getTerracottaSession(boolean) @bci=55, line=150 (Compiled frame) - com.terracotta.session.SessionRequest.getSession(boolean) @bci=2, line=109 (Compiled frame) - org.springframework.web.context.request.ServletRequestAttributes.<init>(javax.servlet.http.HttpServletRequest) @bci=29, line=81 (Compiled frame) - org.springframework.web.servlet.DispatcherServlet.doDispatch(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=825 (Compiled frame) - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=241, line=793 (Compiled frame) - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=476 (Compiled frame) - org.springframework.web.servlet.FrameworkServlet.doGet(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=431 (Compiled frame) - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=617 (Compiled frame) - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=717 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=376, line=290 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - com.mobenga.core.filter.DebugFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=24, line=36 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain) @bci=59, line=96 (Compiled frame) - org.springframework.web.filter.OncePerRequestFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=90, line=75 (Compiled frame) - org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(javax.servlet.Filter, javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=5, line=183 (Compiled frame) - org.springframework.web.filter.DelegatingFilterProxy.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=34, line=138 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - org.apache.catalina.core.StandardWrapperValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=804, line=233 (Compiled frame) - org.apache.catalina.core.StandardContextValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=365, line=191 (Compiled frame) - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.tcInvoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=89, line=100 (Compiled frame) - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=85 (Compiled frame) - org.terracotta.session.ModernTomcatSessionValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=66 (Compiled frame) - org.apache.catalina.core.StandardHostValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=64, line=127 (Compiled frame) - org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=102 (Compiled frame) - org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=42, line=109 (Compiled frame) - org.apache.catalina.valves.AccessLogValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=615 (Compiled frame) - org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) @bci=158, line=293 (Compiled frame) - org.apache.coyote.http11.Http11Processor.process(java.net.Socket) @bci=514, line=859 (Compiled frame) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(java.net.Socket) @bci=82, line=602 (Compiled frame) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run() @bci=41, line=489 (Compiled frame) - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame) {noformat} Can someone please take a look? Thank you very much.

    Terracotta Project Issue Tracker | 4 years ago | Allen Qin
    java.lang.Throwable
  2. 0

    I have 2 Tomcat session servers running in active/standby mode. There are 14 app servers running Tomcat 6.0 and connecting to the session servers. Problem: The Tomcat server will crash sometimes and unable to recover. This has happened on prod a few times and I have been able to reproduce it on our test env. It seems load is a factor contributing to the failure but it's not the root cause. Sometimes the problem happens when the load is normal. I believe this has something to do with the Terracotta setup when I checked the TC server and client log file. Below are some of the errors I found after a load test. The problem started to happen after 13:40 in the log file. I've attached the following info: 1. TC server log 2. TC client log 3. Tomcat thread dump 4. TC config Please let me know if you need anything else. TC Client: {noformat} 2012-10-01 13:41:02,563 [http-8080-275] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879335] : Took more than 1000ms to add to sequencer : 1004 ms 2012-10-01 13:41:02,564 [http-8080-728] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879338] : Took more than 1000ms to add to sequencer : 1012 ms 2012-10-01 13:41:11,397 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 1 2012-10-01 13:41:15,388 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 2 2012-10-01 13:41:19,491 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 3 2012-10-01 13:41:23,472 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 4 2012-10-01 13:41:27,412 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 5 2012-10-01 13:41:31,420 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 6 2012-10-01 13:41:34,210 [ClientLockManager LockGC] INFO com.tc.object.locks.ClientLockManager - ClientID[17]: Lock GC collected 254 garbage locks 2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Socket Connect to lt-mobses01.sb.com.au:9510(callbackport:9510) taking long time. probably not reachable. 2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 is DEAD 2012-10-01 13:41:41,432 [HealthChecker] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Declared connection dead ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] idle time 38812ms 2012-10-01 13:41:41,560 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@430312049: connected: false, closed: true local=10.32.4.81:12735 remote=10.32.4.95:9510 connect=[Mon Oct 01 12:19:32 EST 2012] idle=7149ms [627982104 read, 908628612 write]. STATUS : DISCONNECTED 2012-10-01 13:41:41,564 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.tcm.ClientMessageChannel - ClientMessageChannel moves to SessionID=[1] 2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Disconnected: Pausing from State[ RUNNING ] RemoteNode : GroupID[0]. Disconnect count: 0 2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: ClientHandshakeManager moves to SessionID=[1] 2012-10-01 13:41:41,679 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: ConnectionRequest [type=RECONNECT, cmt=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513] 2012-10-01 13:41:41,687 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Reconnect attempt 0 of unlimited reconnect tries to lt-mobses02.sb.com.au:9510:Tc-Group-0, timeout=10000 2012-10-01 13:41:41,695 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Connection refused [lossy interval: 10000ms] 2012-10-01 13:41:45,698 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1318131925: connected: true, closed: false local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=3001ms [0 read, 0 write] 2012-10-01 13:41:45,775 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Health monitoring agent started for lt-mobses01.sb.com.au:9510 2012-10-01 13:41:45,790 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - HealthCheckCallbackPort verification PASSED for lt-mobses01.sb.com.au:9510(callbackport: 9510) 2012-10-01 13:41:45,821 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Connected: Unpausing from State[ PAUSED ] RemoteNode : GroupID[0]. Disconnect count : 1 2012-10-01 13:41:45,847 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1318131925: connected: false, closed: true local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=23ms [216 read, 352 write]. STATUS : ESTABLISHED 2012-10-01 13:41:45,848 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Connection to [lt-mobses01.sb.com.au:9510] DISCONNECTED. Health Monitoring for this node is now disabled. 2012-10-01 13:41:45,859 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: RestoreConnectionRequest [type=RESTORE_CONNECTION, clientMessageTransport=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513, callback=com.tc.net.protocol.delivery.OOOConnectionWatcher@506bda8, timeoutMillis=5000, sa=10.32.4.95:9510] 2012-10-01 13:41:45,870 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@2106417963: connected: true, closed: false local=10.32.4.81:39190 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=11ms [0 read, 0 write] 2012-10-01 13:41:45,960 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] ERROR com.tc.net.protocol.transport.TransportHandshakeErrorHandlerForL1 - com.tc.net.protocol.transport.TransportHandshakeErrorContext: com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array."Message Class: com.tc.net.protocol.transport.TransportMessageImpl Sealed: true, Header Length: 32, Data Length: 426, Total Length: 458 Header (com.tc.net.protocol.transport.WireProtocolHeader) Version: 2, Header Length: 8, TOS: 0, TTL: 64, Protocol: TRANSPORT HANDSHAKE Total Packet Length: 458 Adler32 Checksum: 1988298213 (valid: true) Source Addresss: 10.32.4.95 Destination Addresss: 10.32.4.81 Source Port: 9510, Destination Port: 39190 Total Msg Count: 1 Header Validity: true (no message) Payload: type: SYN_ACK, connectionId: ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[], errorContext com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array." {noformat} TC Server log {noformat} 2012-10-01 13:52:31,852 [L2_L2:TCWorkerComm # 10_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(7.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@845175459: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7112 connect=[Mon Oct 01 13:52:31 EST 2012] idle=448ms [136 read, 0 write] 2012-10-01 13:52:31,755 [L2_L2:TCWorkerComm # 6_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException 2012-10-01 13:52:31,880 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write]. STATUS : START 2012-10-01 13:52:31,880 [WorkerThread(group_handshake_message_stage, 0)] WARN tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au left the cluster 2012-10-01 13:52:31,882 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Remove dead member from waitFor response list, dead member: NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:31,933 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write] 2012-10-01 13:52:31,970 [WorkerThread(garbage_collect_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: DGC Message: DGC[ 24 ] started. 2012-10-01 13:52:31,936 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=1] 2012-10-01 13:52:31,935 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] left the cluster 2012-10-01 13:52:31,935 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Removed old member Group Member: NodeID[lt-mobses01.sb.com.au:9510] <-> NodeID[lt-mobses02.sb.com.au:9510] ChannelID[NULL_ID, Status:CLOSED]:10.32.4.95:9530 <--> 10.32.4.96:6895; Ready:false; Joined: false; memberAdding:false; HighPri: true for NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:31,973 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [11] from jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f 2012-10-01 13:52:31,990 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated 2012-10-01 13:52:31,991 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@735607632 2012-10-01 13:52:31,993 [WorkerThread(group_handshake_message_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au joined the cluster 2012-10-01 13:52:32,060 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(2.c8dd256f25c04e5eabe8bfff0c9ffbea.88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=68ms [0 read, 0 write] 2012-10-01 13:52:31,973 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] left the cluster 2012-10-01 13:52:32,092 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - : Received transport disconnect. Shutting down client NodeID[lt-mobses02.sb.com.au:9510] 2012-10-01 13:52:32,093 [WorkerThread(l2_state_message_handler_stage, 0)] ERROR com.tc.l2.state.StateManagerImpl - State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE 2012-10-01 13:52:32,093 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] ERROR com.tc.l2.ha.ReplicatedClusterStateManagerImpl - Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] : msg = com.tc.l2.msg.ClusterStateMessage@490cb88d while publishing Cluster State: Killing the node 2012-10-01 13:52:32,093 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.92f6a363-81ef-4ab7-955c-9bf242cd6c15-13a1a1dd3fe)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@618420666: connected: true, closed: false local=10.32.4.95:31166 remote=10.32.4.96:9530 connect=[Mon Oct 01 13:52:32 EST 2012] idle=156ms [0 read, 0 write] 2012-10-01 13:52:32,094 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.persistence.db.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0 2012-10-01 13:52:32,135 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=144ms [136 read, 0 write]. STATUS : START 2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=145ms [136 read, 0 write] 2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException 2012-10-01 13:52:32,136 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - Health monitoring agent started for lt-mobses02.sb.com.au:9530 2012-10-01 13:52:32,136 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - State[ ACTIVE-COORDINATOR ] received Zap Node request from another State[ ACTIVE-COORDINATOR ] NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster Details : Recd ClusterStateMessage from : NodeID[lt-mobses01.sb.com.au:9510] while in ACTIVE-COORDINATOR state Exception : java.lang.Throwable at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage(ReplicatedClusterStateManagerImpl.java:155) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.messageReceived(ReplicatedClusterStateManagerImpl.java:143) at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:721) at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:681) at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22) at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:145) 2012-10-01 13:52:32,812 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster 2012-10-01 13:52:32,812 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster 2012-10-01 13:52:33,002 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster 2012-10-01 13:52:33,317 [L2_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:9530)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - HealthCheckCallbackPort verification PASSED for lt-mobses02.sb.com.au:9530(callbackport: 9530) 2012-10-01 13:52:33,003 [WorkerThread(group_discovery_stage, 3, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Connected, current: Member-In-Group 2012-10-01 13:52:33,363 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster 2012-10-01 13:52:33,621 [WorkerThread(group_handshake_message_stage, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: TCGroupManager - Connection to [lt-mobses02.sb.com.au:9530] CLOSED. Health Monitoring for this node is now disabled. 2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Init, current: Member-In-Group 2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connected, to: Init, current: Member-In-Group 2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,799 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - A Terracotta server tried to join the mirror group as a second ACTIVE : My weights = 0,-3576867000,12980939,11132628575526,8778404219038062035 Other servers weights = 7,-42000,227168,11132628641062,-2552508849076222291 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: SPLIT BRAIN, lt-mobses01.sb.com.au and lt-mobses02.sb.com.au are ACTIVE 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - NodeID[lt-mobses02.sb.com.au:9510] wins : Backing off : Exiting !!! 2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,801 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: NodeID[lt-mobses02.sb.com.au:9510] has more clients. Exiting!! 2012-10-01 13:52:33,801 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1809ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1810ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2 2012-10-01 13:52:33,801 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE my weight = [0, -3578404000, 12980939, 11132628575526, 4964923724550007331] 2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] while publishing Cluster State Exception : java.lang.Throwable at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.validateResponse(ReplicatedClusterStateManagerImpl.java:89) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.publishToAll(ReplicatedClusterStateManagerImpl.java:132) at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.connectionIDCreated(ReplicatedClusterStateManagerImpl.java:118) at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.fireCreationEvent(ConnectionIDFactoryImpl.java:71) at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.restoreConnectionId(ConnectionIDFactoryImpl.java:65) at com.tc.net.protocol.transport.ServerStackProvider.attachNewConnection(ServerStackProvider.java:124) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.handleSyn(ServerStackProvider.java:301) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.verifyAndHandleSyn(ServerStackProvider.java:245) at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.putMessage(ServerStackProvider.java:226) at com.tc.net.protocol.transport.WireProtocolAdaptorImpl.addReadData(WireProtocolAdaptorImpl.java:54) at com.tc.net.protocol.ProtocolSwitch.addReadData(ProtocolSwitch.java:50) at com.tc.net.core.TCConnectionImpl.addNetworkData(TCConnectionImpl.java:777) at com.tc.net.core.TCConnectionImpl.doReadFromBufferInternal(TCConnectionImpl.java:463) at com.tc.net.core.TCConnectionImpl.doReadFromBuffer(TCConnectionImpl.java:306) at com.tc.net.core.TCConnectionImpl.doReadInternal(TCConnectionImpl.java:290) at com.tc.net.core.TCConnectionImpl.doRead(TCConnectionImpl.java:266) at com.tc.net.core.CoreNIOServices$CommThread.selectLoop(CoreNIOServices.java:624) at com.tc.net.core.CoreNIOServices$CommThread.run(CoreNIOServices.java:290) my weight = [0, -3579675000, 12980939, 11132628575526, 359689043280585659] 2012-10-01 13:52:33,946 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=2] 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [2] from jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCWorkerCommManager - Selecting [L2_L1:TCWorkerComm # 3, FD, wt:0] from [[L2_L1:TCWorkerComm # 0, FD, wt:0], [L2_L1:TCWorkerComm # 1, FD, wt:0], [L2_L1:TCWorkerComm # 2, FD, wt:0], [L2_L1:TCWorkerComm # 3, FD, wt:0], [L2_L1:TCWorkerComm # 4, FD, wt:0], [L2_L1:TCWorkerComm # 5, FD, wt:0], [L2_L1:TCWorkerComm # 6, FD, wt:0], [L2_L1:TCWorkerComm # 7, FD, wt:0], [L2_L1:TCWorkerComm # 8, FD, wt:0], [L2_L1:TCWorkerComm # 9, FD, wt:0], [L2_L1:TCWorkerComm # 10, FD, wt:0], [L2_L1:TCWorkerComm # 11, FD, wt:0], [L2_L1:TCWorkerComm # 12, FD, wt:0], [L2_L1:TCWorkerComm # 13, FD, wt:0], [L2_L1:TCWorkerComm # 14, FD, wt:0], [L2_L1:TCWorkerComm # 15, FD, wt:0]] 2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@998902629 {noformat} When I had a look at the Tomcat thread dump, it seem all the threads are in a BLOCKED status. {noformat} Thread 28127: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Interpreted frame) - com.tc.object.locks.ClientLockManagerImpl.waitUntilRunning() @bci=98, line=636 (Interpreted frame) - com.tc.object.locks.ClientLockManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=1, line=93 (Compiled frame) - com.tc.object.bytecode.ManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=14, line=869 (Compiled frame) - com.tc.object.bytecode.ManagerUtil.beginLock(java.lang.String, int) @bci=18, line=221 (Compiled frame) - com.tc.object.bytecode.ManagerUtil.beginLockWithoutTxn(java.lang.String, int) @bci=2, line=229 (Compiled frame) - com.terracotta.session.util.Lock.getReadLock() @bci=5, line=48 (Compiled frame) - com.terracotta.session.util.DefaultSessionId.getSessionInvalidatorReadLock() @bci=12, line=83 (Compiled frame) - com.terracotta.session.SessionDataStore.find(com.terracotta.session.SessionId) @bci=15, line=140 (Compiled frame) - com.terracotta.session.TerracottaSessionManager.getSessionIfExists(com.terracotta.session.SessionId, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=46, line=579 (Compiled frame) - com.terracotta.session.SessionRequest.getTerracottaSession(boolean) @bci=55, line=150 (Compiled frame) - com.terracotta.session.SessionRequest.getSession(boolean) @bci=2, line=109 (Compiled frame) - org.springframework.web.context.request.ServletRequestAttributes.<init>(javax.servlet.http.HttpServletRequest) @bci=29, line=81 (Compiled frame) - org.springframework.web.servlet.DispatcherServlet.doDispatch(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=825 (Compiled frame) - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=241, line=793 (Compiled frame) - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=476 (Compiled frame) - org.springframework.web.servlet.FrameworkServlet.doGet(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=431 (Compiled frame) - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=617 (Compiled frame) - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=717 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=376, line=290 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - com.mobenga.core.filter.DebugFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=24, line=36 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain) @bci=59, line=96 (Compiled frame) - org.springframework.web.filter.OncePerRequestFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=90, line=75 (Compiled frame) - org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(javax.servlet.Filter, javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=5, line=183 (Compiled frame) - org.springframework.web.filter.DelegatingFilterProxy.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=34, line=138 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame) - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame) - org.apache.catalina.core.StandardWrapperValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=804, line=233 (Compiled frame) - org.apache.catalina.core.StandardContextValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=365, line=191 (Compiled frame) - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.tcInvoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=89, line=100 (Compiled frame) - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=85 (Compiled frame) - org.terracotta.session.ModernTomcatSessionValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=66 (Compiled frame) - org.apache.catalina.core.StandardHostValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=64, line=127 (Compiled frame) - org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=102 (Compiled frame) - org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=42, line=109 (Compiled frame) - org.apache.catalina.valves.AccessLogValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=615 (Compiled frame) - org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) @bci=158, line=293 (Compiled frame) - org.apache.coyote.http11.Http11Processor.process(java.net.Socket) @bci=514, line=859 (Compiled frame) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(java.net.Socket) @bci=82, line=602 (Compiled frame) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run() @bci=41, line=489 (Compiled frame) - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame) {noformat} Can someone please take a look? Thank you very much.

    Terracotta Project Issue Tracker | 4 years ago | Allen Qin
    java.lang.Throwable
  3. Speed up your debug routine!

    Automated exception search integrated into your IDE

  4. 0

    Why Apache CXF not being used in WebLogic 12c?

    Stack Overflow | 2 years ago | FĂ©lix 1973
    java.lang.Throwable

    Not finding the right solution?
    Take a tour to get the most out of Samebug.

    Tired of useless tips?

    Automated exception search integrated into your IDE

    Root Cause Analysis

    1. java.lang.Throwable

      No message provided

      at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage()
    2. com.tc.l2
      ReplicatedClusterStateManagerImpl.messageReceived
      1. com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage(ReplicatedClusterStateManagerImpl.java:155)
      2. com.tc.l2.ha.ReplicatedClusterStateManagerImpl.messageReceived(ReplicatedClusterStateManagerImpl.java:143)
      2 frames
    3. com.tc.net
      TCGroupManagerImpl.messageReceived
      1. com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:721)
      2. com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:681)
      2 frames
    4. com.tc.objectserver
      ReceiveGroupMessageHandler.handleEvent
      1. com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22)
      1 frame
    5. com.tc.async
      StageImpl$WorkerThread.run
      1. com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:145)
      1 frame