java.lang.Throwable

Terracotta Project Issue Tracker | Allen Qin | 4 years ago

I have 2 Tomcat session servers running in active/standby mode, and 14 app servers running Tomcat 6.0 that connect to the session servers.

Problem: the Tomcat server sometimes crashes and is unable to recover. This has happened on prod a few times, and I have been able to reproduce it on our test env. Load seems to be a contributing factor, but it is not the root cause; sometimes the problem happens when the load is normal. After checking the TC server and client log files, I believe this has something to do with the Terracotta setup. Below are some of the errors I found after a load test; the problem started to happen after 13:40 in the log files. I've attached the following info:
1. TC server log
2. TC client log
3. Tomcat thread dump
4. TC config
Please let me know if you need anything else.

TC Client:
{noformat}
2012-10-01 13:41:02,563 [http-8080-275] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879335] : Took more than 1000ms to add to sequencer : 1004 ms
2012-10-01 13:41:02,564 [http-8080-728] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[17]: TransactionID=[879338] : Took more than 1000ms to add to sequencer : 1012 ms
2012-10-01 13:41:11,397 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 1
2012-10-01 13:41:15,388 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 2
2012-10-01 13:41:19,491 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 3
2012-10-01 13:41:23,472 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 4
2012-10-01 13:41:27,412 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 5
2012-10-01 13:41:31,420 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 might be in Long GC. Ping-probe cycles completed since last reply : 6
2012-10-01 13:41:34,210 [ClientLockManager LockGC] INFO com.tc.object.locks.ClientLockManager - ClientID[17]: Lock GC collected 254 garbage locks
2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Socket Connect to lt-mobses01.sb.com.au:9510(callbackport:9510) taking long time. probably not reachable.
2012-10-01 13:41:41,425 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - lt-mobses01.sb.com.au:9510 is DEAD
2012-10-01 13:41:41,432 [HealthChecker] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Declared connection dead ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] idle time 38812ms
2012-10-01 13:41:41,560 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@430312049: connected: false, closed: true local=10.32.4.81:12735 remote=10.32.4.95:9510 connect=[Mon Oct 01 12:19:32 EST 2012] idle=7149ms [627982104 read, 908628612 write]. STATUS : DISCONNECTED
2012-10-01 13:41:41,564 [L1_L2:TCComm Main Selector Thread_W (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.tcm.ClientMessageChannel - ClientMessageChannel moves to SessionID=[1]
2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Disconnected: Pausing from State[ RUNNING ] RemoteNode : GroupID[0]. Disconnect count: 0
2012-10-01 13:41:41,564 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: ClientHandshakeManager moves to SessionID=[1]
2012-10-01 13:41:41,679 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: ConnectionRequest [type=RECONNECT, cmt=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513]
2012-10-01 13:41:41,687 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Reconnect attempt 0 of unlimited reconnect tries to lt-mobses02.sb.com.au:9510:Tc-Group-0, timeout=10000
2012-10-01 13:41:41,695 [ConnectionEstablisher] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Connection refused [lossy interval: 10000ms]
2012-10-01 13:41:45,698 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1318131925: connected: true, closed: false local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=3001ms [0 read, 0 write]
2012-10-01 13:41:45,775 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - Health monitoring agent started for lt-mobses01.sb.com.au:9510
2012-10-01 13:41:45,790 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Client - HealthCheckCallbackPort verification PASSED for lt-mobses01.sb.com.au:9510(callbackport: 9510)
2012-10-01 13:41:45,821 [WorkerThread(client_coordination_stage, 0)] INFO com.tc.object.handshakemanager.ClientHandshakeManagerImpl - ClientID[17]: Connected: Unpausing from State[ PAUSED ] RemoteNode : GroupID[0]. Disconnect count : 1
2012-10-01 13:41:45,847 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1318131925: connected: false, closed: true local=10.32.4.81:39188 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=23ms [216 read, 352 write]. STATUS : ESTABLISHED
2012-10-01 13:41:45,848 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Client - Connection to [lt-mobses01.sb.com.au:9510] DISCONNECTED. Health Monitoring for this node is now disabled.
2012-10-01 13:41:45,859 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientConnectionEstablisher$AsyncReconnect - Handling connection request: RestoreConnectionRequest [type=RESTORE_CONNECTION, clientMessageTransport=com.tc.net.protocol.transport.ClientMessageTransport@32f0c513, callback=com.tc.net.protocol.delivery.OOOConnectionWatcher@506bda8, timeoutMillis=5000, sa=10.32.4.95:9510]
2012-10-01 13:41:45,870 [ConnectionEstablisher] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@2106417963: connected: true, closed: false local=10.32.4.81:39190 remote=10.32.4.95:9510 connect=[Mon Oct 01 13:41:45 EST 2012] idle=11ms [0 read, 0 write]
2012-10-01 13:41:45,960 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] ERROR com.tc.net.protocol.transport.TransportHandshakeErrorHandlerForL1 - com.tc.net.protocol.transport.TransportHandshakeErrorContext: com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array."Message Class: com.tc.net.protocol.transport.TransportMessageImpl Sealed: true, Header Length: 32, Data Length: 426, Total Length: 458 Header (com.tc.net.protocol.transport.WireProtocolHeader) Version: 2, Header Length: 8, TOS: 0, TTL: 64, Protocol: TRANSPORT HANDSHAKE Total Packet Length: 458 Adler32 Checksum: 1988298213 (valid: true) Source Addresss: 10.32.4.95 Destination Addresss: 10.32.4.81 Source Port: 9510, Destination Port: 39190 Total Msg Count: 1 Header Validity: true (no message) Payload: type: SYN_ACK, connectionId: ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[], errorContext com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array."
{noformat}

TC Server log:
{noformat}
2012-10-01 13:52:31,852 [L2_L2:TCWorkerComm # 10_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(7.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@845175459: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7112 connect=[Mon Oct 01 13:52:31 EST 2012] idle=448ms [136 read, 0 write]
2012-10-01 13:52:31,755 [L2_L2:TCWorkerComm # 6_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException
2012-10-01 13:52:31,880 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write]. STATUS : START
2012-10-01 13:52:31,880 [WorkerThread(group_handshake_message_stage, 0)] WARN tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au left the cluster
2012-10-01 13:52:31,882 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Remove dead member from waitFor response list, dead member: NodeID[lt-mobses02.sb.com.au:9510]
2012-10-01 13:52:31,933 [L2_L2:TCWorkerComm # 12_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(8.3efdc8a7efb44662af7065f819369af7.e1c35822-4c48-4c11-831a-f8c4a58ec3e5-13a1a1ec057)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1651010777: connected: false, closed: true local=10.32.4.95:9530 remote=10.32.4.96:7113 connect=[Mon Oct 01 13:52:31 EST 2012] idle=492ms [136 read, 135 write]
2012-10-01 13:52:31,970 [WorkerThread(garbage_collect_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: DGC Message: DGC[ 24 ] started.
2012-10-01 13:52:31,936 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=1]
2012-10-01 13:52:31,935 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] left the cluster
2012-10-01 13:52:31,935 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Removed old member Group Member: NodeID[lt-mobses01.sb.com.au:9510] <-> NodeID[lt-mobses02.sb.com.au:9510] ChannelID[NULL_ID, Status:CLOSED]:10.32.4.95:9530 <--> 10.32.4.96:6895; Ready:false; Joined: false; memberAdding:false; HighPri: true for NodeID[lt-mobses02.sb.com.au:9510]
2012-10-01 13:52:31,973 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [11] from jvm f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f
2012-10-01 13:52:31,990 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated
2012-10-01 13:52:31,991 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@735607632
2012-10-01 13:52:31,993 [WorkerThread(group_handshake_message_stage, 0)] INFO tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: Node lt-mobses02.sb.com.au joined the cluster
2012-10-01 13:52:32,060 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(2.c8dd256f25c04e5eabe8bfff0c9ffbea.88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=68ms [0 read, 0 write]
2012-10-01 13:52:31,973 [WorkerThread(group_events_dispatch_stage, 0)] WARN com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] left the cluster
2012-10-01 13:52:32,092 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - : Received transport disconnect. Shutting down client NodeID[lt-mobses02.sb.com.au:9510]
2012-10-01 13:52:32,093 [WorkerThread(l2_state_message_handler_stage, 0)] ERROR com.tc.l2.state.StateManagerImpl - State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE
2012-10-01 13:52:32,093 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] ERROR com.tc.l2.ha.ReplicatedClusterStateManagerImpl - Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] : msg = com.tc.l2.msg.ClusterStateMessage@490cb88d while publishing Cluster State: Killing the node
2012-10-01 13:52:32,093 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.92f6a363-81ef-4ab7-955c-9bf242cd6c15-13a1a1dd3fe)[]: Attaching new connection: com.tc.net.core.TCConnectionImpl@618420666: connected: true, closed: false local=10.32.4.95:31166 remote=10.32.4.96:9530 connect=[Mon Oct 01 13:52:32 EST 2012] idle=156ms [0 read, 0 write]
2012-10-01 13:52:32,094 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.persistence.db.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0
2012-10-01 13:52:32,135 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=144ms [136 read, 0 write]. STATUS : START
2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(11.c8dd256f25c04e5eabe8bfff0c9ffbea.f78856f5-791e-400e-8757-1393a2f133d1-13a1a1eea1f)[]: closing down connection - com.tc.net.core.TCConnectionImpl@1874829407: connected: false, closed: true local=10.32.4.95:9510 remote=10.32.4.85:43507 connect=[Mon Oct 01 13:52:31 EST 2012] idle=145ms [136 read, 0 write]
2012-10-01 13:52:32,136 [L2_L1:TCWorkerComm # 2_W] WARN com.tc.net.core.CoreNIOServices - Exception trying to process interest request: java.nio.channels.ClosedChannelException
2012-10-01 13:52:32,136 [WorkerThread(group_discovery_stage, 3, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - Health monitoring agent started for lt-mobses02.sb.com.au:9530
2012-10-01 13:52:32,136 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - State[ ACTIVE-COORDINATOR ] received Zap Node request from another State[ ACTIVE-COORDINATOR ] NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster Details : Recd ClusterStateMessage from : NodeID[lt-mobses01.sb.com.au:9510] while in ACTIVE-COORDINATOR state Exception : java.lang.Throwable
  at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage(ReplicatedClusterStateManagerImpl.java:155)
  at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.messageReceived(ReplicatedClusterStateManagerImpl.java:143)
  at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:721)
  at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:681)
  at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22)
  at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:145)
2012-10-01 13:52:32,812 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster
2012-10-01 13:52:32,812 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.terracottatech.console - Requesting node to quit : NodeID : NodeID[lt-mobses02.sb.com.au:9510] Error Type : Two or more Active servers detected in the cluster
2012-10-01 13:52:33,002 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.tc.l2.ha.L2HACoordinator - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster
2012-10-01 13:52:33,317 [L2_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:9530)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - HealthCheckCallbackPort verification PASSED for lt-mobses02.sb.com.au:9530(callbackport: 9530)
2012-10-01 13:52:33,003 [WorkerThread(group_discovery_stage, 3, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Connected, current: Member-In-Group
2012-10-01 13:52:33,363 [WorkerThread(group_events_dispatch_stage, 0)] INFO com.terracottatech.console - NodeID[lt-mobses02.sb.com.au:9510] joined the cluster
2012-10-01 13:52:33,621 [WorkerThread(group_handshake_message_stage, 0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: TCGroupManager - Connection to [lt-mobses02.sb.com.au:9530] CLOSED. Health Monitoring for this node is now disabled.
2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connecting, to: Init, current: Member-In-Group
2012-10-01 13:52:33,622 [WorkerThread(group_handshake_message_stage, 0)] WARN com.tc.net.groups.TCGroupMemberDiscoveryStatic - DiscoverStateMachine [Node{host=lt-mobses02.sb.com.au:9510}]: Ignored switching state from: Connected, to: Init, current: Member-In-Group
2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,745 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1753ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,799 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2
2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - A Terracotta server tried to join the mirror group as a second ACTIVE : My weights = 0,-3576867000,12980939,11132628575526,8778404219038062035 Other servers weights = 7,-42000,227168,11132628641062,-2552508849076222291
2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: SPLIT BRAIN, lt-mobses01.sb.com.au and lt-mobses02.sb.com.au are ACTIVE
2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1808ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,800 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - NodeID[lt-mobses02.sb.com.au:9510] wins : Backing off : Exiting !!!
2012-10-01 13:52:33,800 [WorkerThread(l2_state_message_handler_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2
2012-10-01 13:52:33,801 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : lt-mobses01.sb.com.au Subsystem: CLUSTER_TOPOLOGY Message: NodeID[lt-mobses02.sb.com.au:9510] has more clients. Exiting!!
2012-10-01 13:52:33,801 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@835486647: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.131:57157 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1809ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - com.tc.net.core.TCConnectionImpl@1712347668: connected: true, closed: false local=10.32.4.95:9510 remote=10.32.4.124:22752 connect=[Mon Oct 01 13:52:31 EST 2012] idle=1810ms [0 read, 0 write] is not active; Max allowed Idle time:8000; Transport Established: false
2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 2
2012-10-01 13:52:33,801 [WorkerThread(l2_state_message_handler_stage, 0)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[lt-mobses02.sb.com.au:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[lt-mobses02.sb.com.au:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE my weight = [0, -3578404000, 12980939, 11132628575526, 4964923724550007331]
2012-10-01 13:52:33,802 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.groups.TCGroupManagerImpl - Zapping node : NodeID[lt-mobses02.sb.com.au:9510] type = 255 reason = Recd wrong response from : NodeID[lt-mobses02.sb.com.au:9510] while publishing Cluster State Exception : java.lang.Throwable
  at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.validateResponse(ReplicatedClusterStateManagerImpl.java:89)
  at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.publishToAll(ReplicatedClusterStateManagerImpl.java:132)
  at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.connectionIDCreated(ReplicatedClusterStateManagerImpl.java:118)
  at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.fireCreationEvent(ConnectionIDFactoryImpl.java:71)
  at com.tc.objectserver.persistence.db.ConnectionIDFactoryImpl.restoreConnectionId(ConnectionIDFactoryImpl.java:65)
  at com.tc.net.protocol.transport.ServerStackProvider.attachNewConnection(ServerStackProvider.java:124)
  at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.handleSyn(ServerStackProvider.java:301)
  at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.verifyAndHandleSyn(ServerStackProvider.java:245)
  at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.putMessage(ServerStackProvider.java:226)
  at com.tc.net.protocol.transport.WireProtocolAdaptorImpl.addReadData(WireProtocolAdaptorImpl.java:54)
  at com.tc.net.protocol.ProtocolSwitch.addReadData(ProtocolSwitch.java:50)
  at com.tc.net.core.TCConnectionImpl.addNetworkData(TCConnectionImpl.java:777)
  at com.tc.net.core.TCConnectionImpl.doReadFromBufferInternal(TCConnectionImpl.java:463)
  at com.tc.net.core.TCConnectionImpl.doReadFromBuffer(TCConnectionImpl.java:306)
  at com.tc.net.core.TCConnectionImpl.doReadInternal(TCConnectionImpl.java:290)
  at com.tc.net.core.TCConnectionImpl.doRead(TCConnectionImpl.java:266)
  at com.tc.net.core.CoreNIOServices$CommThread.selectLoop(CoreNIOServices.java:624)
  at com.tc.net.core.CoreNIOServices$CommThread.run(CoreNIOServices.java:290)
my weight = [0, -3579675000, 12980939, 11132628575526, 359689043280585659]
2012-10-01 13:52:33,946 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - Allocated connection license for jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e; ConnectionPolicy[maxConnections=2147483647, connectedJvmCount=2]
2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ConnectionPolicyImpl - New connection [2] from jvm 88dd022a-04b0-4e0d-b9ec-c2c785ced97c-13a1a1eea1e
2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.protocol.transport.ServerStackProvider - User null successfully authenticated
2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.TCWorkerCommManager - Selecting [L2_L1:TCWorkerComm # 3, FD, wt:0] from [[L2_L1:TCWorkerComm # 0, FD, wt:0], [L2_L1:TCWorkerComm # 1, FD, wt:0], [L2_L1:TCWorkerComm # 2, FD, wt:0], [L2_L1:TCWorkerComm # 3, FD, wt:0], [L2_L1:TCWorkerComm # 4, FD, wt:0], [L2_L1:TCWorkerComm # 5, FD, wt:0], [L2_L1:TCWorkerComm # 6, FD, wt:0], [L2_L1:TCWorkerComm # 7, FD, wt:0], [L2_L1:TCWorkerComm # 8, FD, wt:0], [L2_L1:TCWorkerComm # 9, FD, wt:0], [L2_L1:TCWorkerComm # 10, FD, wt:0], [L2_L1:TCWorkerComm # 11, FD, wt:0], [L2_L1:TCWorkerComm # 12, FD, wt:0], [L2_L1:TCWorkerComm # 13, FD, wt:0], [L2_L1:TCWorkerComm # 14, FD, wt:0], [L2_L1:TCWorkerComm # 15, FD, wt:0]]
2012-10-01 13:52:33,947 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] INFO com.tc.net.core.CoreNIOServices - selection key cancelled key@998902629
{noformat}

When I had a look at the Tomcat thread dump, it seems all the threads are in
a BLOCKED state.

{noformat}
Thread 28127: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Interpreted frame)
 - com.tc.object.locks.ClientLockManagerImpl.waitUntilRunning() @bci=98, line=636 (Interpreted frame)
 - com.tc.object.locks.ClientLockManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=1, line=93 (Compiled frame)
 - com.tc.object.bytecode.ManagerImpl.lock(com.tc.object.locks.LockID, com.tc.object.locks.LockLevel) @bci=14, line=869 (Compiled frame)
 - com.tc.object.bytecode.ManagerUtil.beginLock(java.lang.String, int) @bci=18, line=221 (Compiled frame)
 - com.tc.object.bytecode.ManagerUtil.beginLockWithoutTxn(java.lang.String, int) @bci=2, line=229 (Compiled frame)
 - com.terracotta.session.util.Lock.getReadLock() @bci=5, line=48 (Compiled frame)
 - com.terracotta.session.util.DefaultSessionId.getSessionInvalidatorReadLock() @bci=12, line=83 (Compiled frame)
 - com.terracotta.session.SessionDataStore.find(com.terracotta.session.SessionId) @bci=15, line=140 (Compiled frame)
 - com.terracotta.session.TerracottaSessionManager.getSessionIfExists(com.terracotta.session.SessionId, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=46, line=579 (Compiled frame)
 - com.terracotta.session.SessionRequest.getTerracottaSession(boolean) @bci=55, line=150 (Compiled frame)
 - com.terracotta.session.SessionRequest.getSession(boolean) @bci=2, line=109 (Compiled frame)
 - org.springframework.web.context.request.ServletRequestAttributes.<init>(javax.servlet.http.HttpServletRequest) @bci=29, line=81 (Compiled frame)
 - org.springframework.web.servlet.DispatcherServlet.doDispatch(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=825 (Compiled frame)
 - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=241, line=793 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=476 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.doGet(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=431 (Compiled frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=35, line=617 (Compiled frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=717 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=376, line=290 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame)
 - com.mobenga.core.filter.DebugFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=24, line=36 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame)
 - org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain) @bci=59, line=96 (Compiled frame)
 - org.springframework.web.filter.OncePerRequestFilter.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=90, line=75 (Compiled frame)
 - org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(javax.servlet.Filter, javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=5, line=183 (Compiled frame)
 - org.springframework.web.filter.DelegatingFilterProxy.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain) @bci=34, line=138 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=119, line=235 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=206 (Compiled frame)
 - org.apache.catalina.core.StandardWrapperValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=804, line=233 (Compiled frame)
 - org.apache.catalina.core.StandardContextValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=365, line=191 (Compiled frame)
 - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.tcInvoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=89, line=100 (Compiled frame)
 - org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=85 (Compiled frame)
 - org.terracotta.session.ModernTomcatSessionValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=66 (Compiled frame)
 - org.apache.catalina.core.StandardHostValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=64, line=127 (Compiled frame)
 - org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=6, line=102 (Compiled frame)
 - org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=42, line=109 (Compiled frame)
 - org.apache.catalina.valves.AccessLogValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=615 (Compiled frame)
 - org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) @bci=158, line=293 (Compiled frame)
 - org.apache.coyote.http11.Http11Processor.process(java.net.Socket) @bci=514, line=859 (Compiled frame)
 - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(java.net.Socket) @bci=82, line=602 (Compiled frame)
 - org.apache.tomcat.util.net.JIoEndpoint$Worker.run() @bci=41, line=489 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
{noformat}

Can someone please take a look? Thank you very much.

org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=42, line=109 (Compiled frame) - org.apache.catalina.valves.AccessLogValve.invoke(org.apache.catalina.connector.Request, org.apache.catalina.connector.Response) @bci=24, line=615 (Compiled frame) - org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) @bci=158, line=293 (Compiled frame) - org.apache.coyote.http11.Http11Processor.process(java.net.Socket) @bci=514, line=859 (Compiled frame) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(java.net.Socket) @bci=82, line=602 (Compiled frame) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run() @bci=41, line=489 (Compiled frame) - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame) {noformat} Can someone please take a look? Thank you very much.
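For reference, the top frames of the dump (`LockSupport.park` via `AbstractQueuedSynchronizer$ConditionObject.await()`) are what you see whenever a thread is parked on a `Condition`, which is what `ClientLockManagerImpl.waitUntilRunning()` does while the Terracotta client is not in the RUNNING state. A minimal sketch that reproduces the same parked state using plain JDK classes (this is not Terracotta code; the class and thread names are illustrative):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ParkedWorkerDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Condition running = lock.newCondition();

        // The worker parks on the condition, producing the same top frames
        // as the dump: LockSupport.park -> ConditionObject.await.
        Thread worker = new Thread(() -> {
            lock.lock();
            try {
                running.await();              // parks until signalled or interrupted
            } catch (InterruptedException ignored) {
            } finally {
                lock.unlock();
            }
        }, "parked-worker");
        worker.start();

        Thread.sleep(200);                    // let the worker reach await()
        System.out.println(worker.getState()); // WAITING while parked

        worker.interrupt();                   // analogue of the client resuming
        worker.join(1000);
    }
}
```

A jstack of this process during the sleep shows the identical `Unsafe.park` / `ConditionObject.await` frames; the difference in the reported case is that the signal (the client returning to RUNNING) never arrives, so every request thread stays parked.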


Root Cause Analysis

    java.lang.Throwable (no message provided)
        at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.handleClusterStateMessage(ReplicatedClusterStateManagerImpl.java:155)
        at com.tc.l2.ha.ReplicatedClusterStateManagerImpl.messageReceived(ReplicatedClusterStateManagerImpl.java:143)
        at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:721)
        at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:681)
        at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22)
        at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:145)
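When reproducing this on a test environment, the stuck state can also be confirmed programmatically rather than by reading a full jstack dump: the standard `ThreadMXBean` API lists every thread whose state is WAITING or BLOCKED, together with the lock it is waiting on. A hedged sketch (plain JDK management API; the `session-worker` thread here is purely illustrative, standing in for a Tomcat worker blocked in `waitUntilRunning()`):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class FindStuckThreads {
    public static void main(String[] args) throws InterruptedException {
        final Object sessionLock = new Object();

        // Illustrative thread that blocks the way the dumped Tomcat workers do.
        Thread worker = new Thread(() -> {
            synchronized (sessionLock) {
                try {
                    sessionLock.wait();       // WAITING until notified
                } catch (InterruptedException ignored) {
                }
            }
        }, "session-worker");
        worker.start();
        Thread.sleep(200);                    // let it reach wait()

        // Enumerate threads that are not making progress.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo ti : mx.getThreadInfo(mx.getAllThreadIds(), 5)) {
            if (ti == null) continue;         // thread may have died meanwhile
            Thread.State s = ti.getThreadState();
            if (s == Thread.State.WAITING || s == Thread.State.BLOCKED) {
                System.out.println(ti.getThreadName() + " -> " + s
                        + " on " + ti.getLockName());
            }
        }

        synchronized (sessionLock) {
            sessionLock.notifyAll();          // release the worker
        }
        worker.join(1000);
    }
}
```

Run against a hung Tomcat instance (e.g. via a small JMX client), this would list the hundreds of request threads parked inside `ClientLockManagerImpl`, which matches the dump above.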