You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In a crash scenario, 80.64 was previous leader then pull out its link interface. 80.58 become new leader during its start process. isConnected(rc, endipoint) return true and new leader start Replicator even in a down connection.
It seems sofa-bolt didn't close the connection in this short period yet.
So the trick thing is if called getAlivePeers() through cliService will return the unreachable nodes between 2023-02-06 10:40:45 and 2023-02-06 10:40:50 because r.lastRpcSendTimestamp = Utils.monotonicMs(); was updated to now.
below is some log info
2023-02-06 10:40:29.655 [JRaft-FSMCaller-Disruptor-0] INFO StateMachineAdapter:84 - onStopFollowing: LeaderChangeContext [leaderId=192.168.80.64:8082, term=4, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 192.168.80.64:8082.]].
2023-02-06 10:40:29.655 [JRaft-ElectionTimer-<election--1/192.168.80.58:8082>0] INFO NodeImpl:2650 - Node <election--1/192.168.80.58:8082> term 4 start preVote.
2023-02-06 10:40:29.658 [Bolt-conn-event-executor-15-thread-1] INFO ClientServiceConnectionEventProcessor:50 - Peer 192.168.80.118:8082 is connected
2023-02-06 10:40:29.663 [JRaft-RPC-Processor-4] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.118:8082, term=4, granted=true.
2023-02-06 10:40:30.145 [Rpc-netty-server-worker-10-thread-9] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #11,5,main].
2023-02-06 10:40:30.145 [rheakv-raft-rpc-executor #11] INFO NodeImpl:1697 - Node <election--1/192.168.80.58:8082> received PreVoteRequest from 192.168.80.117:8082, term=5, currTerm=4, granted=true, requestLastLogId=LogId [index=45295, term=4], lastLogId=LogId [index=45295, term=4].
2023-02-06 10:40:30.148 [JRaft-RPC-Processor-5] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.117:8082, term=4, granted=true.
2023-02-06 10:40:30.148 [JRaft-RPC-Processor-5] INFO NodeImpl:1126 - Node <election--1/192.168.80.58:8082> start vote and grant vote self, term=4.
2023-02-06 10:40:30.154 [JRaft-RPC-Processor-5] INFO LocalRaftMetaStorage:132 - Save raft meta, path=/tmp/election/raft_data_region_-1_8082/meta, term=5, votedFor=192.168.80.58:8082, cost time=5 ms
2023-02-06 10:40:31.156 [Rpc-netty-server-worker-10-thread-9] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #12,5,main].
2023-02-06 10:40:31.156 [rheakv-raft-rpc-executor #12] INFO NodeImpl:1768 - Node <election--1/192.168.80.58:8082> received RequestVoteRequest from 192.168.80.117:8082, term=5, currTerm=5.
2023-02-06 10:40:34.659 [JRaft-RPC-Processor-8] WARN NodeImpl:2639 - Node <election--1/192.168.80.58:8082> PreVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:35.158 [JRaft-RPC-Processor-9] WARN NodeImpl:2576 - Node <election--1/192.168.80.58:8082> RequestVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:37.127 [Rpc-netty-server-worker-10-thread-10] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #13,5,main].
2023-02-06 10:40:37.128 [rheakv-raft-rpc-executor #13] INFO NodeImpl:1697 - Node <election--1/192.168.80.58:8082> received PreVoteRequest from 192.168.80.118:8082, term=6, currTerm=5, granted=true, requestLastLogId=LogId [index=45295, term=4], lastLogId=LogId [index=45295, term=4].
2023-02-06 10:40:37.401 [JRaft-VoteTimer-<election--1/192.168.80.58:8082>0] WARN NodeImpl:2717 - Candidate node <election--1/192.168.80.58:8082> term 5 steps down when election reaching vote timeout: fail to get quorum vote-granted.
2023-02-06 10:40:37.401 [JRaft-VoteTimer-<election--1/192.168.80.58:8082>0] INFO NodeImpl:2650 - Node <election--1/192.168.80.58:8082> term 5 start preVote.
2023-02-06 10:40:37.403 [JRaft-RPC-Processor-10] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.118:8082, term=5, granted=true.
2023-02-06 10:40:37.863 [Rpc-netty-server-worker-10-thread-9] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #14,5,main].
2023-02-06 10:40:37.863 [rheakv-raft-rpc-executor #14] INFO NodeImpl:1697 - Node <election--1/192.168.80.58:8082> received PreVoteRequest from 192.168.80.117:8082, term=6, currTerm=5, granted=true, requestLastLogId=LogId [index=45295, term=4], lastLogId=LogId [index=45295, term=4].
2023-02-06 10:40:37.864 [JRaft-RPC-Processor-11] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.117:8082, term=5, granted=true.
2023-02-06 10:40:37.864 [JRaft-RPC-Processor-11] INFO NodeImpl:1126 - Node <election--1/192.168.80.58:8082> start vote and grant vote self, term=5.
2023-02-06 10:40:37.871 [JRaft-RPC-Processor-11] INFO LocalRaftMetaStorage:132 - Save raft meta, path=/tmp/election/raft_data_region_-1_8082/meta, term=6, votedFor=192.168.80.58:8082, cost time=7 ms
2023-02-06 10:40:38.870 [Rpc-netty-server-worker-10-thread-9] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #15,5,main].
2023-02-06 10:40:38.870 [rheakv-raft-rpc-executor #15] INFO NodeImpl:1768 - Node <election--1/192.168.80.58:8082> received RequestVoteRequest from 192.168.80.117:8082, term=6, currTerm=6.
2023-02-06 10:40:38.872 [Rpc-netty-server-worker-10-thread-10] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #16,5,main].
2023-02-06 10:40:38.873 [rheakv-raft-rpc-executor #16] INFO NodeImpl:1768 - Node <election--1/192.168.80.58:8082> received RequestVoteRequest from 192.168.80.118:8082, term=6, currTerm=6.
2023-02-06 10:40:42.407 [JRaft-RPC-Processor-14] WARN NodeImpl:2639 - Node <election--1/192.168.80.58:8082> PreVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:42.867 [JRaft-RPC-Processor-15] WARN NodeImpl:2576 - Node <election--1/192.168.80.58:8082> RequestVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:44.401 [JRaft-VoteTimer-<election--1/192.168.80.58:8082>0] WARN NodeImpl:2717 - Candidate node <election--1/192.168.80.58:8082> term 6 steps down when election reaching vote timeout: fail to get quorum vote-granted.
2023-02-06 10:40:44.401 [JRaft-VoteTimer-<election--1/192.168.80.58:8082>0] INFO NodeImpl:2650 - Node <election--1/192.168.80.58:8082> term 6 start preVote.
2023-02-06 10:40:44.404 [JRaft-RPC-Processor-16] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.117:8082, term=6, granted=true.
2023-02-06 10:40:45.091 [Rpc-netty-server-worker-10-thread-10] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #17,5,main].
2023-02-06 10:40:45.091 [rheakv-raft-rpc-executor #17] INFO NodeImpl:1697 - Node <election--1/192.168.80.58:8082> received PreVoteRequest from 192.168.80.118:8082, term=7, currTerm=6, granted=true, requestLastLogId=LogId [index=45295, term=4], lastLogId=LogId [index=45295, term=4].
2023-02-06 10:40:45.092 [JRaft-RPC-Processor-17] INFO NodeImpl:2604 - Node <election--1/192.168.80.58:8082> received PreVoteResponse from 192.168.80.118:8082, term=6, granted=true.
2023-02-06 10:40:45.092 [JRaft-RPC-Processor-17] INFO NodeImpl:1126 - Node <election--1/192.168.80.58:8082> start vote and grant vote self, term=6.
2023-02-06 10:40:45.098 [JRaft-RPC-Processor-17] INFO LocalRaftMetaStorage:132 - Save raft meta, path=/tmp/election/raft_data_region_-1_8082/meta, term=7, votedFor=192.168.80.58:8082, cost time=6 ms
2023-02-06 10:40:45.821 [Rpc-netty-server-worker-10-thread-9] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-raft-rpc-executor #18,5,main].
2023-02-06 10:40:45.822 [rheakv-raft-rpc-executor #18] INFO NodeImpl:1697 - Node <election--1/192.168.80.58:8082> received PreVoteRequest from 192.168.80.117:8082, term=7, currTerm=7, granted=true, requestLastLogId=LogId [index=45295, term=4], lastLogId=LogId [index=45295, term=4].
2023-02-06 10:40:45.833 [JRaft-RPC-Processor-19] INFO NodeImpl:1223 - Node <election--1/192.168.80.58:8082> become leader of group, term=7, conf=192.168.80.58:8082,192.168.80.64:8082,192.168.80.117:8082,192.168.80.118:8082, oldConf=.
2023-02-06 10:40:45.848 [JRaft-RPC-Processor-19] INFO Replicator:911 - Replicator=Replicator [state=Created, statInfo=<running=null, firstLogIndex=0, lastLogIncluded=0, lastLogIndex=0, lastTermIncluded=0>, peerId=192.168.80.64:8082, waitId=-1, type=Follower]@192.168.80.64:8082 is started
2023-02-06 10:40:45.855 [JRaft-RPC-Processor-19] INFO Replicator:911 - Replicator=Replicator [state=Created, statInfo=<running=null, firstLogIndex=0, lastLogIncluded=0, lastLogIndex=0, lastTermIncluded=0>, peerId=192.168.80.118:8082, waitId=-1, type=Follower]@192.168.80.118:8082 is started
2023-02-06 10:40:45.855 [JRaft-RPC-Processor-19] INFO Replicator:911 - Replicator=Replicator [state=Created, statInfo=<running=null, firstLogIndex=0, lastLogIncluded=0, lastLogIndex=0, lastTermIncluded=0>, peerId=192.168.80.117:8082, waitId=-1, type=Follower]@192.168.80.117:8082 is started
2023-02-06 10:40:45.874 [JRaft-FSMCaller-Disruptor-0] INFO StateMachineAdapter:62 - onLeaderStart: term=7.
2023-02-06 10:40:46.860 [Rpc-netty-server-worker-10-thread-12] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #0,5,main].
2023-02-06 10:40:46.869 [Rpc-netty-server-worker-10-thread-13] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #1,5,main].
2023-02-06 10:40:46.881 [Rpc-netty-server-worker-10-thread-12] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #2,5,main].
2023-02-06 10:40:46.892 [Rpc-netty-server-worker-10-thread-12] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #3,5,main].
2023-02-06 10:40:48.857 [Append-Entries-Thread-Send2] WARN Replicator:1201 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082]
2023-02-06 10:40:49.407 [JRaft-RPC-Processor-20] WARN NodeImpl:2639 - Node <election--1/192.168.80.58:8082> PreVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:50.098 [JRaft-RPC-Processor-21] WARN NodeImpl:2576 - Node <election--1/192.168.80.58:8082> RequestVote to 192.168.80.64:8082 error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082].
2023-02-06 10:40:53.265 [Rpc-netty-server-worker-10-thread-13] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #4,5,main].
2023-02-06 10:40:53.268 [Rpc-netty-server-worker-10-thread-13] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #5,5,main].
2023-02-06 10:40:53.274 [Rpc-netty-server-worker-10-thread-13] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #6,5,main].
2023-02-06 10:40:59.241 [Rpc-netty-server-worker-10-thread-12] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #7,5,main].
2023-02-06 10:41:07.387 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=11, error=Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is 192.168.80.64:8082]
2023-02-06 10:41:08.242 [Rpc-netty-server-worker-10-thread-13] INFO NamedThreadFactory:82 - Creates new Thread[rheakv-kv-store-rpc-executor #8,5,main].
2023-02-06 10:41:17.337 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=21, error=Status[EINTERNAL<1004>: Check connection[192.168.80.64:8082] fail and try to create new one]
2023-02-06 10:41:19.841 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=31, error=Status[EINTERNAL<1004>: Check connection[192.168.80.64:8082] fail and try to create new one]
2023-02-06 10:41:22.343 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=41, error=Status[EINTERNAL<1004>: Check connection[192.168.80.64:8082] fail and try to create new one]
2023-02-06 10:41:24.846 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=51, error=Status[EINTERNAL<1004>: Check connection[192.168.80.64:8082] fail and try to create new one]
2023-02-06 10:41:27.347 [Append-Entries-Thread-Send2] WARN Replicator:1434 - Fail to issue RPC to 192.168.80.64:8082, consecutiveErrorTimes=61, error=Status[EINTERNAL<1004>: Check connection[192.168.80.64:8082] fail and try to create new one]
Expected behavior
Original connection is exists, but not able to use.
Do an explicit connection ping-pong check to make sure the connection is works fine.
diff --git a/jraft-core/src/main/java/com/alipay/sofa/jraft/rpc/impl/AbstractClientService.java b/jraft-core/src/main/java/com/alipay/sofa/jraft/rpc/impl/AbstractClientService.java
index fec82f7..9166db1 100644
--- a/jraft-core/src/main/java/com/alipay/sofa/jraft/rpc/impl/AbstractClientService.java
+++ b/jraft-core/src/main/java/com/alipay/sofa/jraft/rpc/impl/AbstractClientService.java
@@ -139,9 +139,10 @@ public abstract class AbstractClientService implements ClientService {
if (rc == null) {
throw new IllegalStateException("Client service is uninitialized.");
}
- if (isConnected(rc, endpoint)) {
- return true;
- }
+ isConnected(rc, endpoint);
try {
final PingRequest req = PingRequest.newBuilder() //
.setSendTimestamp(System.currentTimeMillis()) //
Actual behavior
trusted an unreachable connection.
Steps to reproduce
hard to reproduce.
Minimal yet complete reproducer code (or GitHub URL to code)
Environment
SOFAJRaft version: 1.3.11
JVM version (e.g. java -version):
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
OS version (e.g. uname -a): Linux uni58 5.4.42-200.el7.x86_64 #1 SMP Fri May 22 14:46:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Maven version:
IDE version:
The text was updated successfully, but these errors were encountered:
Describe the bug
In a crash scenario, 80.64 was previous leader then pull out its link interface. 80.58 become new leader during its start process.
isConnected(rc, endipoint) return true
and new leader startReplicator
even in a down connection.It seems
sofa-bolt
didn't close the connection in this short period yet.So the trick thing is if called
getAlivePeers()
throughcliService
will return the unreachable nodes between2023-02-06 10:40:45
and2023-02-06 10:40:50
becauser.lastRpcSendTimestamp = Utils.monotonicMs();
was updated to now.below is some log info
Expected behavior
Original connection is exists, but not able to use.
Do an explicit connection ping-pong check to make sure the connection is works fine.
Actual behavior
trusted an unreachable connection.
Steps to reproduce
hard to reproduce.
Minimal yet complete reproducer code (or GitHub URL to code)
Environment
java -version
):uname -a
):Linux uni58 5.4.42-200.el7.x86_64 #1 SMP Fri May 22 14:46:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: