- Fixed: f632b36c when a snapshot is installed, all logs covered by snapshot should be purged; by 张炎泼; 2023-04-04
-
Changed: 1bd22edc remove AddLearnerError::Exists, which is not actually used; by 张炎泼; 2022-09-30
-
Changed: c6fe29d4 change-membership does not return error when replication lags; by 张炎泼; 2022-10-22
If `blocking` is `true`, `Raft::change_membership(..., blocking)` will block until replication to new nodes becomes up to date. But it won't return an error when proposing the change-membership log.
- Change: remove two errors: `LearnerIsLagging` and `LearnerNotFound`.
- Fix: #581
-
-
Fixed: 2896b98e changing membership should not remove replication to all learners; by 张炎泼; 2022-09-30
When changing membership, replications to the learners (non-voters) that are not added as voters should be kept.
E.g.: with a cluster of voters `{0}` and learners `{1, 2, 3}`, changing membership to `{0, 1, 2}` should not remove replication to node `3`.
Only replications to removed members should be removed.
- Added: 9a22bb03 add rocks-store as a `RaftStorage` implementation based on rocks-db; by 张炎泼; 2023-02-22
-
Changed: 25e94c36 InstallSnapshotResponse: replies the last applied log id; Do not install a smaller snapshot; by 张炎泼; 2022-09-22
A snapshot may not be installed by a follower if it already has a higher `last_applied` log id locally. In such a case, it just ignores the snapshot and responds with its local `last_applied` log id.
This way the applied state (i.e., `last_applied`) will never revert back.
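The rule above can be sketched as follows. This is a minimal illustration, not openraft's implementation: the `Follower` struct, the `handle_snapshot` method and the tuple-based `LogId` are hypothetical names introduced only for this example.

```rust
/// Hypothetical, simplified log id: `(term, index)`.
/// Tuples compare by term first, then by index.
pub type LogId = (u64, u64);

/// Hypothetical follower-side state; only the field needed here.
pub struct Follower {
    pub last_applied: Option<LogId>,
}

impl Follower {
    /// Handle an incoming snapshot whose last covered log id is `snapshot_last`.
    /// Returns the `last_applied` log id to reply to the leader with.
    pub fn handle_snapshot(&mut self, snapshot_last: LogId) -> Option<LogId> {
        // Install only if the snapshot is ahead of the local applied state;
        // `None` (nothing applied yet) is smaller than any `Some(_)`.
        if Some(snapshot_last) > self.last_applied {
            self.last_applied = Some(snapshot_last);
        }
        // Otherwise the snapshot is ignored and the local value is reported.
        self.last_applied
    }
}
```

In this sketch a snapshot that is behind the local state leaves `last_applied` untouched, so the value only ever moves forward.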
-
Fixed: 21684bbd potential inconsistency when installing snapshot; by 张炎泼; 2022-09-22
The conflicting logs that are before `snapshot_meta.last_log_id` should be deleted before installing a snapshot.
Otherwise there is a chance that the snapshot is installed but conflicting logs are left in the store when a node crashes.
- Added: 568ca470 add Raft::remove_learner(); by 张炎泼; 2022-09-02
-
Added: ea696474 add feature-flag: `bt` enables backtrace; by 张炎泼; 2022-03-12
`--features bt` enables backtrace when generating errors. By default errors do not contain backtrace info.
Thus openraft can be built on stable rust by default.
To use on stable rust with backtrace, set `RUSTC_BOOTSTRAP=1`, e.g.: `RUSTUP_TOOLCHAIN=stable RUSTC_BOOTSTRAP=1 make test`
- Changed: f99ade30 API: move default impl methods in RaftStorage to StorageHelper; by 张炎泼; 2022-07-04
-
Fixed: 44381b0c when handling append-entries, if prev_log_id is purged, it should not delete any logs.; by 张炎泼; 2022-08-14
When handling append-entries, if the local log at `prev_log_id.index` is purged, a follower should not believe it is a conflict and should not delete all logs. Doing so would lose committed logs.
To fix this issue, use `last_applied` instead of `committed`: `last_applied` is always a committed log id, while `committed` is not persisted and may be smaller than the actually applied log id when a follower is restarted.
-
Fixed: 30058c03 #424 wrong range when searching for membership entries: `[end-step, end)`.; by 张炎泼; 2022-07-03
The iterating range searching for membership log entries should be `[end-step, end)`, not `[start, end)`. With this bug it will return duplicated membership entries.
- Bug: #424
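To illustrate the range fix, here is a small sketch of a backward scan that visits disjoint windows `[end-step, end)`. The function name `backward_windows` and its signature are assumptions made for this example; the real search code in openraft is not shown in this changelog.

```rust
/// Hypothetical helper: list the half-open index windows visited when scanning
/// the log backward from `end` toward `start`, `step` entries at a time.
pub fn backward_windows(start: u64, end: u64, step: u64) -> Vec<(u64, u64)> {
    assert!(step > 0, "step must be positive");
    let mut windows = Vec::new();
    let mut end = end;
    while end > start {
        // Clamp so that the last window does not go below `start`.
        let begin = end.saturating_sub(step).max(start);
        // The window is [begin, end), i.e. [end-step, end), not [start, end).
        windows.push((begin, end));
        end = begin;
    }
    windows
}
```

Because each window starts where the previous one ended, no index is read twice. Using `[start, end)` for every iteration would rescan the same prefix and yield the same membership entries repeatedly.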
-
Fixed: d836d85c if there may be more logs to replicate, continue to call send_append_entries in next loop, no need to wait heartbeat tick; by lichuang; 2022-01-04
-
Fixed: 5a026674 defensive_no_dirty_log hangs tests; by YangKian; 2022-01-08
-
Fixed: 8651625e save leader_id if a higher term is seen when handling append-entries RPC; by 张炎泼; 2022-01-10
Problem:
A follower saves hard state `(term=msg.term, voted_for=None)` when `msg.term > local.term` while handling an append-entries RPC.
This is quite enough to be correct but not perfect. Correct because:
- In one term, only an established leader will send append-entries;
- Thus, there is a quorum that voted for this leader;
- Thus, no matter what `voted_for` is saved, it is still correct. E.g. when handling append-entries, a follower node could save hard state `(term=msg.term, voted_for=Some(ANY_VALUE))`.
The problem is that a follower already knows the legal leader for a term but still does not save it. This leads to an unstable cluster state: the test sometimes fails.
Solution:
A follower always saves hard state with the id of a known legal leader.
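A minimal sketch of the solution, assuming a simplified `HardState` with `term` and `voted_for` fields as in the description above. The function `on_append_entries` and its boolean return value are hypothetical and only illustrate the higher-term case discussed here.

```rust
/// Simplified hard state, mirroring `(term, voted_for)` from the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HardState {
    pub term: u64,
    pub voted_for: Option<u64>,
}

/// Hypothetical handler fragment: update the hard state when an
/// append-entries RPC arrives. Returns `false` for a stale request.
pub fn on_append_entries(local: &mut HardState, msg_term: u64, leader_id: u64) -> bool {
    if msg_term < local.term {
        // A request from an older term is rejected; nothing is saved.
        return false;
    }
    if msg_term > local.term {
        // The sender is the established leader of `msg_term`,
        // so record its id instead of `None`.
        local.term = msg_term;
        local.voted_for = Some(leader_id);
    }
    true
}
```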
-
-
Fixed: 1a781e1b when lack entry, the snapshot to build has to include at least all purged logs; by 张炎泼; 2022-01-18
-
Fixed: a0a94af7 span.enter() in async loop causes memory leak; by 张炎泼; 2022-06-17
It is explained in: https://onesignal.com/blog/solving-memory-leaks-in-rust/
-
Changed: c9c8d898 trait RaftStore: remove get_membership_config(), add last_membership_in_log() and get_membership() with default impl; by drdr xp; 2022-01-04
Goal: minimize the work for users to implement a correct raft application.
Now RaftStorage provides default implementations for `get_membership()` and `last_membership_in_log()`.
These two methods can be implemented with other basic user-implemented methods.
- fix: #59
-
Changed: abda0d10 rename RaftStorage methods do_log_compaction: build_snapshot, delete_logs_from: delete_log; by 张炎泼; 2022-01-15
-
Changed: a52a9300 RaftStorage::get_log_state() returns last purge log id; by 张炎泼; 2022-01-16
- Change: `get_log_state()` returns the `last_purged_log_id` instead of the `first_log_id`. Because there are some cases in which logs are empty: when a snapshot is installed that covers all logs, or when `max_applied_log_to_keep` is 0.
  Returning `None` is not clear about whether there are no logs at all or all logs are deleted.
  In such cases, raft still needs to maintain log continuity when replicating. Thus the last log id that once existed is important. Previously this was done by checking the `last_applied_log_id`, which is dirty and buggy.
  Now an implementation of `RaftStorage` has to maintain the `last_purged_log_id` in its store.
- Change: Remove `first_id_in_log()`, `last_log_id()`, `first_known_log_id()`, because concepts are changed.
- Change: Split `delete_logs()` into two methods for clarity:
  `delete_conflict_logs_since()` for deleting conflict logs when the replication receiving end finds a conflict log.
  `purge_logs_upto()` for cleaning applied logs.
- Change: Rename `finalize_snapshot_installation()` to `install_snapshot()`.
- Refactor: Remove `initial_replicate_to_state_machine()`, which does nothing more than a normal applying-logs.
- Refactor: Remove `enum UpdateCurrentLeader`. It is just a wrapper of Option.
-
-
Changed: 7424c968 remove unused error MembershipError::Incompatible; by 张炎泼; 2022-01-17
-
Changed: beeae721 add ChangeMembershipError sub error for reuse; by 张炎泼; 2022-01-17
-
Fixed: 4d58a51e a non-voter not in joint config should not block replication; by drdr xp; 2021-08-31
-
Fixed: eed681d5 race condition of concurrent snapshot-install and apply.; by drdr xp; 2021-09-01
Problem:
Concurrent snapshot-install and apply mess up `last_applied`.
`finalize_snapshot_installation` runs in the `RaftCore` thread. `apply_to_state_machine` runs in a separate tokio task (thread).
Thus there is a chance of `last_applied` being reset to a previous value:
- `apply_to_state_machine` is called and finished in a thread.
- `finalize_snapshot_installation` is called in the `RaftCore` thread and finished with `last_applied` updated.
- `RaftCore` thread finished waiting for `apply_to_state_machine`, and updated `last_applied` to a previous value.

    RaftCore: -.    install-snapshot,         .-> replicate_to_sm_handle.next(),
               |    update last_applied=5     |   update last_applied=2
               |                              |
               v                              |
    task:      apply 2------------------------'
    --------------------------------------------------------------------> time

Solution:
Rule: All changes to state machine must be serialized.
A temporary simple solution for now is to call all methods that modify state machine in the `RaftCore` thread. But this way it blocks the `RaftCore` thread.
A better way is to move all tasks that modify state machine to a standalone thread, and send update requests back to `RaftCore` to update its fields such as `last_applied`.
-
Fixed: a48a3282 handle-vote should compare last_log_id in dictionary order, not in vector order; by drdr xp; 2021-09-09
A log `{term:2, index:1}` is definitely greater than log `{term:1, index:2}` in raft spec. Comparing log id in the way of `term1 >= term2 && index1 >= index2` blocks election: no one can become a leader.
-
Fixed: 228077a6 a restarted follower should not wait too long to elect. Otherwise the entire cluster hangs; by drdr xp; 2021-11-19
-
Fixed: 6c0ccaf3 consider joint config when starting up and committing.; by drdr xp; 2021-12-24
-
Change: MembershipConfig supports more than 2 configs
-
Makes fields in MembershipConfig private. Provides methods to manipulate membership.
-
Fix: commit without replication only when membership contains only one node. Previously it just checks the first config, which results in data loss if the cluster is in a joint config.
-
Fix: when starting up, count all nodes but not only the nodes in the first config to decide if it is a single node cluster.
-
-
Fixed: b390356f first_known_log_id() should return the min one in log or in state machine; by drdr xp; 2021-12-28
-
Fixed: cd5a570d clippy warning; by lichuang; 2022-01-02
-
Changed: deda6d76 remove PurgedMarker. keep logs clean; by drdr xp; 2021-09-09
Changing log (adding a PurgedMarker (original SnapshotPointer)) makes it difficult to impl `install-snapshot` for a RaftStore without a lock protecting both logs and state machine.
Adding a PurgedMarker and installing the snapshot has to be atomic in the storage layer. But usually logs and state machine are separate stores, e.g., logs are stored in fast flash disk and state machine is stored somewhere else.
To get rid of the big lock, PurgedMarker is removed and installing a snapshot does not need to keep consistent with logs any more.
-
Changed: 734eec69 VoteRequest: use last_log_id:LogId to replace last_log_term and last_log_index; by drdr xp; 2021-09-09
-
Changed: 74b16524 introduce StorageError. RaftStorage gets rid of anyhow::Error; by drdr xp; 2021-09-13
`StorageError` is an `enum` of DefensiveError and StorageIOError. An error a RaftStorage impl returns could be a defensive check error or an actual io operation error.
Why:
anyhow::Error is not enough to support the flow control in RaftCore. It is typeless thus RaftCore can not decide what next to do depending on the returned error.
Inside raft, anyhow::Error should never be used, although it could be used as `source()` of some other error types.
-
Changed: 46bb3b1c `RaftStorage::finalize_snapshot_installation` is no longer responsible for deleting logs included in snapshot; by drdr xp; 2021-09-13
A RaftStorage should be as simple and intuitive as possible.
One should be able to correctly impl a RaftStorage without reading the guide but just by guessing what a trait method should do.
RaftCore is able to do the job of deleting logs that are included in the state machine, RaftStorage should just do what is asked.
-
Changed: 2cd23a37 use structopt to impl config default values; by drdr xp; 2021-09-14
-
Changed: ac4bf4bd InitialState: rename last_applied_log to last_applied; by drdr xp; 2021-09-14
-
Changed: 74283fda RaftStorage::do_log_compaction() does not need to delete logs any more; raft-core will delete them.; by drdr xp; 2021-09-14
-
Changed: 112252b5 RaftStorage add 2 API: last_id_in_log() and last_applied_state(), remove get_last_log_id(); by drdr xp; 2021-09-15
-
Changed: 7f347934 simplify membership change; by drdr xp; 2021-09-16
- Change: if leadership is lost, the cluster is left with the joint config. A caller that does not receive a response to the change-membership request should always re-send it to ensure the membership config is applied.
- Change: remove joint-uniform logic from RaftCore, which brings a lot of complexity to raft impl. This logic is now done in Raft (which is a shell to control RaftCore).
- Change: RaftCore.membership is changed to `ActiveMembership`, which includes a log id and a membership config. This change lets raft check if a membership is committed by comparing the log index and its committed index.
- Change: when adding an existent non-voter, it returns an `Ok` value instead of an `Err`.
- Change: add arg `blocking` to `add_non_voter` and `change_membership`. A blocking `change_membership` still waits for the two config change logs to commit. `blocking` only indicates whether to wait for replication to non-voter to be up to date.
- Change: remove `non_voters`. Merge it into `nodes`. Now both voters and non-voters share the same replication handle.
- Change: remove field `ReplicationState.is_ready_to_join`, it can be just calculated when needed.
- Change: remove `is_stepping_down`, `membership.contains()` is quite enough.
- Change: remove `consensus_state`.
-
-
Changed: df684131 bsearch to find matching log between leader and follower; by drdr xp; 2021-12-17
- Refactor: simplify algo to find matching log between leader and follower. It adopts a binary-search like algo:
  The leader tracks the max matched log id (`self.matched`) and the least unmatched log id (`self.max_possible_matched_index`).
  The follower just responds if the `prev_log_id` in AppendEntriesRequest matches the log at `prev_log_id.index` in its store.
  Remove the case-by-case algo.
- Change: RaftStorage adds new APIs: `try_get_log_entries()`, `first_id_in_log()` and `first_known_log_id()`.
  These are not stable, may be removed soon.
- Fix: the timeout for `Wait()` should be a total timeout. Otherwise a `Wait()` never quits.
- Fix: when sending an append-entries request, if a log is not found, it should retry loading, but not enter snapshot state. Because a log may be deleted by RaftCore just after Replication read `prev_log_id` from the store.
- Refactor: The two replication loops, line-rate loop and snapshot loop, should not change the `ReplicationState`, but instead return an error. Otherwise it has to check the state everywhere.
- Refactor: simplify receiving RaftCore messages: split `drain_raft_rx()` into `process_raft_event()` and `try_drain_raft_rx()`.
- Feature: a store impl has to add an initial log at index 0 to make the store mathematics complete.
- Feature: add `ReplicationError` to describe all errors that are emitted when replicating entries or snapshot.
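The binary-search like matching described in the first item of this entry can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: logs are modelled as slices where position `i` holds the entry at index `i`, index 0 holds an identical initial entry on both sides, and the names `follower_matches` and `find_matched` are hypothetical.

```rust
/// Hypothetical log id: `(term, index)`. The entry at index `i` is stored at position `i`.
pub type LogId = (u64, u64);

/// Follower side: does the local log contain exactly `prev` at `prev.index`?
pub fn follower_matches(log: &[LogId], prev: LogId) -> bool {
    log.get(prev.1 as usize) == Some(&prev)
}

/// Leader side: find the highest index at which both logs hold the same log id.
/// Returns the matched index and the number of probes that were sent.
pub fn find_matched(leader: &[LogId], follower: &[LogId]) -> (u64, usize) {
    assert!(!leader.is_empty(), "the initial log at index 0 is assumed to exist");
    let mut matched = 0u64; // highest index known to match
    let mut max_possible = leader.len() as u64 - 1; // highest index that may still match
    let mut probes = 0;
    while matched < max_possible {
        // Probe the upper middle so that the range shrinks on every round.
        let mid = matched + (max_possible - matched + 1) / 2;
        probes += 1;
        if follower_matches(follower, leader[mid as usize]) {
            matched = mid;
        } else {
            max_possible = mid - 1;
        }
    }
    (matched, probes)
}
```

The search is valid because of the raft log matching property: if two logs agree at some index they agree on everything before it, and once they disagree they do not agree on any later index.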
-
-
Changed: 6625484c remove EntryNormal; by drdr xp; 2021-12-23
-
Changed: 61551178 remove EntryMembership; by drdr xp; 2021-12-23
-
Changed: c61b4c49 remove ConflictOpt, which is a wrapper of log_id; add matched log id in AppendEntriesResponse; by drdr xp; 2021-12-23
-
Changed: 3511e439 rename MembershipConfig to Membership; by drdr xp; 2021-12-27
-
Changed: b43c085a track committed log id instead of just a commit index; by drdr xp; 2021-12-29
-
Changed: 8506102f remove unused field SnapshotMeta::membership; by drdr xp; 2021-12-29
-
Dependency: 7848c219 update pretty_assertions requirement from 0.7.2 to 1.0.0; by dependabot[bot]; 2021-09-28
Updates the requirements on pretty_assertions to permit the latest version.
updated-dependencies:
- dependency-name: pretty_assertions dependency-type: direct:production ...
Signed-off-by: dependabot[bot] support@github.com
-
Dependency: cd080192 update tracing-subscriber requirement from 0.2.10 to 0.3.3; by dependabot[bot]; 2021-11-30
Updates the requirements on tracing-subscriber to permit the latest version.
updated-dependencies:
- dependency-name: tracing-subscriber dependency-type: direct:production ...
Signed-off-by: dependabot[bot] support@github.com
-
Added: 1451f962 Membership provides method is_majority() and simplify quorum calculation for voting; by drdr xp; 2021-12-25
-
Added: a2a48c56 make DefensiveCheck a reusable trait; by drdr xp; 2021-12-26
- Defensive checks in `MemStore` are moved out into a trait `DefensiveCheck`.
- Let user impl a base `RaftStorage`. Then raft wraps it with a `StoreExt` thus the defensive checks apply to every impl of `RaftStorage`.
-
-
Changed: 79a39970 to get last_log and membership, Storage should search for both logs and state machines.; by drdr xp; 2021-08-24
Why:
depending on the impl, a RaftStore may have logs that are included in the state machine still present. This may be caused by a non-transactional impl of the store, e.g. installing snapshot and removing logs are not atomic.
Thus when searching for last_log or last membership, a RaftStore should search for both logs and state machine, and return the greater one that is found.
- Test: add tests to prove these behaviors, which include: `get_initial_state()` and `get_membership()`.
- Refactor: Make store tests a suite that could be applied to other impl.
-
-
Changed: 07d71c67 RaftStore::delete_logs_from() use range instead of (start, end); by drdr xp; 2021-08-28
-
Changed: 1c46a712 RaftStore::get_log_entries use range as arg; add try_get_log_entry() that does not return error even when defensive check is on; by drdr xp; 2021-08-28
-
Added: 420cdd71 add defensive check to MemStore; by drdr xp; 2021-08-28
-
Added: ab6689d9 RaftStore::get_last_log_id() to get the last known log id in log or state machine; by drdr xp; 2021-08-29
-
Fixed: 6d53aa12 too many(50) inconsistent log should not live lock append-entries; by drdr xp; 2021-08-31
- Reproduce the bug that when append-entries, if there are more than 50 inconsistent logs, the responded `conflict` is always set to `self.last_log`, which blocks replication forever. Because the next time append-entries uses the same `prev_log_id`, it actually does not search backward for the first consistent log entry.

The test to reproduce it fakes a cluster of node 0,1,2: R0 has 100 uncommitted logs at term 2. R2 has 100 uncommitted logs at term 3.

    R0 ... 2,99 2,100
    R1
    R2 ... 3,99 3,100

Before this fix, when the cluster is brought up, R2 becomes leader and will never sync any log to R0.
The fix is also quite simple:
- Search backward instead of searching forward, to find the last log entry that matches `prev_log_id.term`, and respond with this log id to the leader to let it send the next `append_entries` RPC starting from this log id.
- If no such matching term is found, use the first log id it sees, e.g., the entry at index `prev_log_id.index - 50` for the next `append_entries`.
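The two rules of the fix can be sketched like this. It is a simplified model, not the actual handler: `conflict_hint` and `MAX_SCAN` are hypothetical names, the log is a slice where position `i` holds the entry at index `i`, and the function is assumed to be called only after a conflict at `prev` has been detected.

```rust
/// Hypothetical log id: `(term, index)`.
pub type LogId = (u64, u64);

/// Assumed upper bound on how far back one request searches (the text mentions 50).
pub const MAX_SCAN: u64 = 50;

/// Returns the log id the follower reports back, so that the leader
/// sends its next append-entries starting from there.
pub fn conflict_hint(log: &[LogId], prev: LogId) -> Option<LogId> {
    if log.is_empty() {
        return None;
    }
    // Start at `prev.index`, or at the last local entry if the log is shorter.
    let hi = prev.1.min(log.len() as u64 - 1);
    let lo = hi.saturating_sub(MAX_SCAN);
    // Walk backward looking for the last entry with the same term as `prev`.
    for i in (lo..=hi).rev() {
        if log[i as usize].0 == prev.0 {
            return Some(log[i as usize]);
        }
    }
    // No matching term in the window: fall back to the earliest entry examined.
    Some(log[lo as usize])
}
```

Since every reply moves the reported log id strictly backward when no term matches, repeated requests make progress instead of returning the same `prev_log_id` forever.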
-
-
Fixed: 9540c904 when append-entries, deleting entries after prev-log-id causes committed entry to be lost; by drdr xp; 2021-08-31
Problem:
When append-entries, raft core removes old entries after `prev_log_id.index`, then appends new logs sent from leader.
Since deleting then appending entries are not atomic (two calls to `RaftStore`), deleting consistent entries may cause loss of committed entries, if the server crashes after the delete.
E.g., an example cluster state with logs as following and R1 now is the leader:

    R1 1,1 1,2 1,3
    R2 1,1 1,2
    R3

Committed entry `{1,2}` gets lost after the following steps:
- R1 to R2: `append_entries(entries=[{1,2}, {1,3}], prev_log_id={1,1})`
- R2 deletes 1,2
- R2 crashes
- R2 is elected as leader with R3, and only sees 1,1; the committed entry 1,2 is lost.

Solution:
The safe way is to skip every entry that is consistent with the leader, and delete only the inconsistent entries.
Another issue with this solution is that:
Because we can not just delete `log[prev_log_id.index..]`, the commit index:
- must be updated only after append-entries,
- and must point to a log entry that is consistent with the leader.

Or there could be a chance of applying an uncommitted entry:

    R0 1,1 1,2 3,3
    R1 1,1 1,2 2,3
    R2 1,1 1,2 3,3

- R0 to R1 `append_entries: entries=[{1,2}], prev_log_id = {1,1}, commit_index = 3`
- R1 accepted this `append_entries` request but was not aware that entry {2,3} is inconsistent with the leader. Updating commit index to 3 allows it to apply an uncommitted entry `{2,3}`.
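A sketch of the solution, under simplifying assumptions: the log is an in-memory `Vec` where position `i` holds the entry at index `i`, `entries` are consecutive and directly follow `prev_log_id`, and the names `append_entries` and `safe_commit` are hypothetical.

```rust
/// Hypothetical log id: `(term, index)`.
pub type LogId = (u64, u64);

/// Append `entries` to `log`. Entries that are already present and identical
/// are skipped; the local log is truncated only from the first real conflict.
pub fn append_entries(log: &mut Vec<LogId>, entries: &[LogId]) {
    for (n, e) in entries.iter().enumerate() {
        let i = e.1 as usize;
        match log.get(i) {
            // Consistent with the leader: keep it, never delete it.
            Some(local) if local == e => continue,
            // Conflict: drop this entry and everything after it.
            Some(_) => log.truncate(i),
            // Past the end of the local log: nothing to delete.
            None => {}
        }
        // Append the remaining entries from the leader and stop.
        log.extend_from_slice(&entries[n..]);
        return;
    }
}

/// The commit index a follower may adopt: never beyond the last entry
/// that is known to be consistent with the leader.
pub fn safe_commit(leader_commit: u64, last_consistent_index: u64) -> u64 {
    leader_commit.min(last_consistent_index)
}
```

With the second example above, R1 receives `commit_index = 3` but the last entry it knows to be consistent is at index 2, so `safe_commit(3, 2)` keeps the uncommitted entry `{2,3}` from being applied.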
-
Changed: a1a05bb4 rename Network methods to send_xxx; by drdr xp; 2021-08-23
-
Changed: f168696b rename RaftStorage::Snapshot to RaftStorage::SnapshotData; by drdr xp; 2021-08-23
-
Changed: fea63b2f rename CurrentSnapshotData to Snapshot; by drdr xp; 2021-08-23
-
Changed: fabf3e74 rename RaftStorage::create_snapshot() to RaftStorage::begin_receiving_snapshot; by drdr xp; 2021-08-23
-
Changed: 90329fbf RaftStorage: merge append_entry_to_log and replicate_to_log into one method append_to_log; by drdr xp; 2021-08-24
-
Changed: daf2ed89 RaftStorage: remove apply_entry_to_state_machine; by drdr xp; 2021-08-24
-
Changed: a18b98f1 SnapshotPointer do not store any info.; by drdr xp; 2021-08-24
Using SnapshotPointer to store membership is a bad idea. It brings in troubles proving the consistency, e.g.:
- When concurrent `do_log_compaction()` is called (it is not possible for now, may be possible in future. The correctness proof involving multiple components is a nightmare.)
- Proof of correctness of consistency between `StateMachine.last_membership` and `SnapshotPointer.membership` is complicated.

What we need is actually:
- At least one committed log is left in the log storage,
- and info in the purged log must be present in the state machine, e.g. membership
-
-
Changed: 5d0c0b25 rename SnapshotPointer to PurgedMarker; by drdr xp; 2021-08-24
-
Changed: 72b02249 rename replicate_to_state_machine to apply_to_state_machine; by drdr xp; 2021-08-24
-
Fixed: eee8e534 snapshot replication does not need to send a last 0 size chunk; by drdr xp; 2021-08-22
-
Fixed: 8cd24ba0 RaftCore.entries_cache is inconsistent with storage. removed it.; by drdr xp; 2021-08-23
- When leader changes, `entries_cache` is cleared. Thus there may be cached entries that won't be applied to state machine.
- When applying finished, the applied entries are not removed from the cache. Thus there could be entries being applied more than once.
-
-
Fixed: 2eccb9e1 install snapshot req with offset GE 0 should not start a new session.; by drdr xp; 2021-08-22
An install-snapshot always ends with a req with data len to be 0 and offset GE 0. If such a req is re-sent, e.g., when timeout, the receiver will try to install a snapshot with empty data, if it just finished the previous install snapshot req (`snapshot_state` is None) and does not reject an install snapshot req with offset GE 0. This results in a `fatal storage error`, since the storage tries to decode an empty snapshot data.
- feature: add config `install_snapshot_timeout`.
-
Fixed: beb0302b leader should not commit when there is no replication to voters.; by drdr xp; 2021-08-18
When there is no replication to voters but there are replications to non-voters, the leader did not check non-voters for a quorum but just commits a log at once.
This causes the membership change log from a single node to always commit. E.g. start node 0, and non-voter 1, 2; then `change_membership({0, 1, 2})`, it just commits the joint-log at once. But according to raft paper, it should await a quorum of {0} and a quorum of {0, 1, 2}.
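The joint quorum requirement can be sketched as follows. The free functions `is_majority` and `is_committed` are hypothetical helpers written for this example (node ids are plain `u64`); they are not the library's API.

```rust
use std::collections::BTreeSet;

/// True if `granted` contains a strict majority of the nodes in `config`.
pub fn is_majority(config: &BTreeSet<u64>, granted: &BTreeSet<u64>) -> bool {
    config.intersection(granted).count() * 2 > config.len()
}

/// In a joint membership a log is committed only if a majority
/// of every config has replicated it.
pub fn is_committed(joint: &[BTreeSet<u64>], granted: &BTreeSet<u64>) -> bool {
    joint.iter().all(|config| is_majority(config, granted))
}
```

For the example above, the joint config is `[{0}, {0, 1, 2}]`. Node 0 alone is a majority of `{0}` but not of `{0, 1, 2}`, so the joint log must not commit until node 1 or node 2 has it as well.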
-
Changed: 6350514c change-membership should be log driven but not channel driven; by drdr xp; 2021-08-18
A membership change involves two steps: the joint config phase and the final config phase. Each phase has a corresponding log involved.
Previously raft set up several channels to organize this workflow, which makes the logic hard to understand and introduces complexity when restarting or when leadership is transferred: it needs to re-establish the channels and tasks.
According to the gist of raft, all workflow should be log driven. Thus the new approach:
- Write two logs (the joint and the final) at once when it receives a change-membership request.
- All following jobs are done according to just what log is committed.

This simplifies the workflow and makes it more reliable and intuitive to understand.
Related changes:
- When `change_membership` is called, append 2 logs at once.
- Introduce universal response channel type to send back a message when some internal task is done: `ResponseTx`, and a universal response error type: `ResponseError`.
- Internal response channel is now an `Option<ResponseTx>`, since the first step of membership change does not need to respond to the caller.
- When a new leader is established, if the last log is a joint config log, append a final config log to let the partial change-membership be able to complete.
  And the test is added.
- Removed membership related channels.
- Refactor: convert several func from async to sync.
- Changed: 8b59966d MembershipConfig.member type is changed from HashSet to BTreeSet; by drdr xp; 2021-08-17
-
Changed: adc24f55 pass all logs to apply_entry_to_state_machine(), not just Normal logs.; by drdr xp; 2021-08-16
Pass `Entry<D>` to `apply_entry_to_state_machine()`, not just the only `EntryPayload::Normal(normal_log)`.
Thus the state machine is able to save the membership changes if it prefers to.
Why:
In practice, a snapshot contains info about all applied logs, including the membership config log. Before this change, the state machine does not receive any membership log thus when making a snapshot, one needs to walk through all applied logs to get the last membership that is included in state machine.
By letting the state machine remember the membership log applied, the snapshot creation becomes more convenient and intuitive: it does not need to scan the applied logs any more.
-
Changed: 82a3f2f9 use LogId to track last applied instead of using just an index.; by drdr xp; 2021-07-19
Using LogId to track the last applied log provides more info. E.g. when creating a snapshot, it needs to walk through logs to find the term of the last applied log, just like it did in memstore impl.
Using LogId{term, index} is a more natural way in every aspect.
changes: RaftCore: change type of `last_applied` from u64 to LogId.
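One consequence of carrying both term and index is that log ids can be ordered term first, then index, which is the dictionary order that the handle-vote fix (a48a3282) in this changelog asks for. The sketch below uses a simplified stand-in for `LogId`; the derive-based ordering and the `vector_ge` helper are assumptions of this example, not the library's definitions.

```rust
/// Simplified stand-in for `LogId`. The derived `Ord` compares fields in
/// declaration order: `term` first, then `index`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// The flawed "vector order" comparison: both fields must be greater or equal.
/// Shown only for contrast with the derived ordering.
pub fn vector_ge(a: LogId, b: LogId) -> bool {
    a.term >= b.term && a.index >= b.index
}
```

With `a = {term:2, index:1}` and `b = {term:1, index:2}`, the derived ordering gives `a > b`, while `vector_ge` is false in both directions, so neither log id would be accepted as at least as up to date as the other.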
-
Fixed: fc8e92a8 typo; by drdr xp; 2021-07-12
-
Fixed: 447dc11c when finalize_snapshot_installation, memstore should not load membership from its old log that are going to be overridden by snapshot.; by drdr xp; 2021-07-13
-
Fixed: dba24036 after 2 log compaction, membership should be able to be extracted from prev compaction log; by drdr xp; 2021-07-14
-
Changed: 7792cccd add CurrentSnapshotData.meta: SnapshotMeta, which is a container of all meta data of a snapshot: last log id included, membership etc.; by drdr xp; 2021-07-13
-
Changed: 0c870cc1 reduce one unnecessary snapshot serialization; by drdr xp; 2021-07-14
- Change: `get_current_snapshot()`: remove double-serialization: convert MemStoreSnapshot to CurrentSnapshotData instead of serializing MemStoreSnapshot:

  Before:

      MemStoreSnapshot.data = serialize(state-machine)
      CurrentSnapshotData.data = serialize(MemStoreSnapshot)

  After:

      MemStoreSnapshot.data = serialize(state-machine)
      CurrentSnapshotData.data = MemStoreSnapshot.data

  When `finalize_snapshot_installation`, extract snapshot meta info from `InstallSnapshotRequest`. Reduce one unnecessary deserialization.
- Change: InstallSnapshotRequest: merge `snapshot_id`, `last_log_id`, `membership` into one field `meta`.
- Refactor: use SnapshotMeta(`snapshot_id`, `last_log_id`, `membership`) as a container of metadata of a snapshot. Reduce parameters.
- Refactor: remove redundant param `delete_through` from `finalize_snapshot_installation`.
-
-
Changed: 954c67a9 InstallSnapshotRequest: merge last_included{term,index} into last_included; by drdr xp; 2021-07-08
-
Changed: 933e0b32 use snapshot-id to identify a snapshot stream; by drdr xp; 2021-07-09
A snapshot stream should be identified by some id, since the server end should not assume messages arrive in the correct order. Without an id, two `install_snapshot` requests belonging to different snapshot data may corrupt the snapshot data, explicitly or even worse, silently.
- Add SnapshotId to identify a snapshot stream.
- Add SnapshotSegmentId to identify a segment in a snapshot stream.
- Add field `snapshot_id` to snapshot related data structures.
- Add error `RaftError::SnapshotMismatch`.
- `Storage::create_snapshot()` does not need to return an id. Since the receiving end only keeps one snapshot stream session at most. Instead, `Storage::do_log_compaction()` should build a unique id every time it is called.
- When the raft node receives an `install_snapshot` request, the id must match to continue. A request with a different id should be rejected. A new id with offset=0 indicates the sender has started a new stream. In this case, the old unfinished stream is dropped and cleaned.
- Add test for `install_snapshot` API.
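The id-matching rule for incoming `install_snapshot` requests can be sketched as below. `Receiver`, `Outcome` and `on_chunk` are hypothetical names; using a `String` as the snapshot id and additionally checking that `offset` equals the number of bytes already received are assumptions of this example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Accepted,
    Rejected,
}

/// Hypothetical receiving end: keeps at most one snapshot stream session.
#[derive(Default)]
pub struct Receiver {
    /// `(snapshot_id, bytes received so far)` of the current stream, if any.
    current: Option<(String, Vec<u8>)>,
}

impl Receiver {
    pub fn on_chunk(&mut self, snapshot_id: &str, offset: u64, data: &[u8]) -> Outcome {
        let same_stream = matches!(&self.current, Some((id, _)) if id.as_str() == snapshot_id);
        if !same_stream {
            if offset != 0 {
                // A different id in the middle of a stream: mismatch, reject.
                return Outcome::Rejected;
            }
            // A new id with offset 0: drop the old unfinished stream, start anew.
            self.current = Some((snapshot_id.to_string(), Vec::new()));
        }
        let (_, buf) = self.current.as_mut().expect("a session exists at this point");
        // Assumption of this sketch: chunks of one stream must arrive contiguously.
        if offset as usize != buf.len() {
            return Outcome::Rejected;
        }
        buf.extend_from_slice(data);
        Outcome::Accepted
    }
}
```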
-
-
Changed: 85859d07 CurrentSnapshotData: merge `term` and `index` into `included`.; by drdr xp; 2021-07-09
-
Changed: 5eb9d3af RaftCore: replace `snapshot_index` with `snapshot_last_included: LogId`. Keeps track of both snapshot last log term and index.; by drdr xp; 2021-07-09
Also `SnapshotUpdate::SnapshotComplete` now contains a LogId instead of a u64 index.
-
Changed: 9c5f3d7e RaftCore: merge last_log_{term,index} into last_log: LogId; by drdr xp; 2021-07-09
-
Changed: 58d8e3a2 AppendEntriesRequest: merge prev_log_{term,index} into prev_log: LogId; by drdr xp; 2021-07-10
-
Changed: 9e4fb64f InitialState: merge last_log_{term,index} into last_log: LogId; by drdr xp; 2021-07-10
-
Changed: 24e38130 Entry: merge term and index to log_id: LogId; by drdr xp; 2021-07-11
-
Added: 8e0b0df9 report snapshot metrics to RaftMetrics::snapshot, which is a LogId: (term, index) that a snapshot includes; by drdr xp; 2021-07-09
- Add: `Wait.snapshot()` to watch snapshot changes.
- Test: replace `sleep()` with `wait_for_snapshot()` to speed up tests.
- Dependency: b351c87f upgrade tokio from 1.7 to 1.8; by drdr xp; 2021-07-08
-
Fixed: cf4badd0 leader should re-create and send snapshot when `threshold/2 < last_log_index - snapshot < threshold`; by drdr xp; 2021-07-08
The problem:
If `last_log_index` is too far ahead of `snapshot.applied_index`, i.e.: `threshold/2 < last_log_index - snapshot < threshold` (e.g., `10/2 < 16-10 < 20` in the test that reproduces this bug), the leader tries to re-create a new snapshot. But when `last_log_index < threshold`, it won't create one, which results in a dead loop.
Solution:
In such a case, force creation of a snapshot without considering the threshold.
- Dependency: 70e1773e adapt to changes of rand-0.8: gen_range() accepts a range instead of two args; by drdr xp; 2021-06-21
-
Added: 32a67e22 add metrics about leader; by drdr xp; 2021-06-29
In LeaderState it also reports metrics about the replication to other nodes when reporting metrics.
When switched to another state, LeaderState will be destroyed, as well as the cached replication metrics.
Other states report a `None` to raft core to override the previous metrics data.
At some point the raft core, without knowing the state, just reports metrics with an `Update::Ignore`, to indicate that replication metrics should be left intact.
- Fixed: d60f1e85 client_read is using wrong quorum=majority-1; by drdr xp; 2021-07-02
-
Added: 1ad17e8e move wait_for_xxx util into metrics.; by drdr xp; 2021-06-16
Introduce struct `Wait` as a wrapper of the metrics channel to impl wait-for utils:
- `log()`: wait for log to apply.
- `current_leader()`: wait for known leader.
- `state()`: wait for the role.
- `members()`: wait for membership_config.members.
- `next_members()`: wait for membership_config.members_after_consensus.

E.g.:

    // wait forever for raft node's current leader to become 2:
    r.wait(None).current_leader(2).await?;

The timeout is now an optional arg to all wait_for_xxx functions in fixtures. wait_for_xxx_timeout are all removed.
-
Added: 3388f1a2 link to discord server.; by Anthony Dodd; 2021-05-21
-
Added: bcc246cc a pull request template.; by Anthony Dodd; 2021-05-26
-
Added: ea539069 wait_for_nodes_log(); by drdr xp; 2021-05-24
-
Added: 668ad478 some wait_for func:; by drdr xp; 2021-05-24
- wait_for_log()
- wait_for_log_timeout()
- wait_for_state()
- wait_for_state_timeout()
-
Fixed: 89bb48f8 last_applied should be updated only when logs actually applied.; by drdr xp; 2021-05-20
-
Fixed: e9f40450 usage of get_storage_handle; by drdr xp; 2021-05-23
-
Fixed: 22cd1a0c clippy complains; by drdr xp; 2021-05-23
-
Fixed: 6202138f a conflict is expected even when appending empty entries; by drdr xp; 2021-05-24
-
Fixed: f449b64a discarded log in replication_buffer should be finally sent.; by drdr xp; 2021-05-22
Internally when replication goes to LaggingState (a non-leader lacks a lot of logs), the ReplicationCore purges `outbound_buffer` and `replication_buffer` and then sends all committed logs found in storage.
Thus if there are uncommitted logs in `replication_buffer`, these logs will never have a chance to be replicated, even when replication goes back to LineRateState. Since LineRateState only replicates logs from `ReplicationCore.outbound_buffer` and `ReplicationCore.replication_buffer`.
This test ensures that when replication goes to LineRateState, it tries to re-send all logs found in storage (including those that are removed from the two buffers).
-
Fixed: 6d680484 #112 : when a follower is removed, leader should stop sending logs to it.; by drdr xp; 2021-05-21
A leader adds all follower replication states to a hashset `nodes` when the leader is established. But the leader does not do so when membership changes. Thus when a follower is removed, the leader cannot stop replication to it because the follower is not in `nodes`.
The solution is to move the replication state from `non_voters` to `nodes`, so that the next time a follower is removed the leader is able to remove the replication from `nodes`.
-
Fixed: 39690593 a NonVoter should stay as NonVoter instead of Follower after restart; by drdr xp; 2021-05-14
-
Fixed: d882e743 when calc quorum, the non-voter should be counted; by drdr xp; 2021-06-02
Counting only the followers (`nodes`) as the quorum for the new config (c1) results in an unexpected log commit. E.g.: when changing from 012 to 234 while 3 and 4 are unreachable, the first log of the joint config should not be committed.
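
The 012 to 234 example can be checked with a small sketch of the joint-consensus commit rule: an entry commits only when a majority of the old config (c0) and a majority of the new config (c1) have both acknowledged it. The function names below are illustrative and are not taken from the crate.

```rust
use std::collections::BTreeSet;

/// True if `acked` contains a strict majority of `config`.
fn has_majority(config: &BTreeSet<u64>, acked: &BTreeSet<u64>) -> bool {
    let granted = config.intersection(acked).count();
    granted > config.len() / 2
}

/// In a joint config an entry is committed only when both the old
/// config (c0) and the new config (c1) have a majority of acks.
fn is_committed_joint(c0: &BTreeSet<u64>, c1: &BTreeSet<u64>, acked: &BTreeSet<u64>) -> bool {
    has_majority(c0, acked) && has_majority(c1, acked)
}
```

With c0 = {0, 1, 2}, c1 = {2, 3, 4} and acks from {0, 1, 2} only, c1 has a single ack out of three, so the joint entry is not committed. Every member of c1 has to be counted for this check to work, including nodes that are still non-voters from the leader's point of view.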
-
Fixed: a10d9906 when handle_update_match_index(), non-voter should also be considered, because during a membership change a non-voter is also counted as a quorum member; by drdr xp; 2021-06-16
-
Fixed: 11cb5453 doc-include can only be used in nightly build; by drdr xp; 2021-06-16
-
Simplify CI test: test all in one action.
-
Disable clippy: it suggests inappropriate assert_eq to assert conversion which is used in a macro.
-
Add makefile
-
Build only with nightly rust. Add rust-toolchain to specify toolchain version.
-
- Dependency: 919d91cb upgrade tokio from 1.0 to 1.7; by drdr xp; 2021-06-16
- Fixed #105 where function `set_target_state` was missing an `else` condition.
- Fixed #106 which ensures that counting of replicas to determine a new commit value only considers entries replicated as part of the current term.
- Fixed a bug where Learner nodes could be restarted and come back as voting members.
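
The rule behind #106 (a leader only advances its commit index by counting replicas for an entry from its own term) can be sketched as below. This is a generic illustration of the Raft rule, not the crate's code; `Entry` and `new_commit_index` are hypothetical names, and `match_indexes` is assumed to include the leader's own last log index.

```rust
/// A log entry reduced to the two fields that matter here.
struct Entry {
    index: u64,
    term: u64,
}

/// Return the new commit index: the highest index stored on a majority,
/// but only if the entry at that index belongs to `current_term`.
fn new_commit_index(log: &[Entry], match_indexes: &[u64], current_term: u64, commit_index: u64) -> u64 {
    if match_indexes.is_empty() {
        return commit_index;
    }
    // Sort descending: the value at position len/2 is held by a majority.
    let mut sorted = match_indexes.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let candidate = sorted[sorted.len() / 2];

    let term_at = log.iter().find(|e| e.index == candidate).map(|e| e.term);
    if candidate > commit_index && term_at == Some(current_term) {
        candidate
    } else {
        commit_index
    }
}
```

If the majority-replicated entry is from an older term, the commit index stays where it is; it moves forward once an entry from the current term reaches a majority, which commits the older entries indirectly.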
The big news for this release is that we are now based on Tokio 1.0! Big shoutout to @xu-cheng for doing all of the heavy lifting for the Tokio 1.0 update, along with many other changes which are part of this release.
It is important to note that 0.6.0 does include two breaking changes from 0.5: the new `RaftStorage::ShutdownError` associated type, and Tokio 1.0. Both of these changes are purely code related, and it is not expected that they will negatively impact running systems.
- Updated to Tokio 1.0!
- BREAKING: this introduces a `RaftStorage::ShutdownError` associated type. This allows for the Raft system to differentiate between fatal storage errors which should cause the system to shutdown vs errors which should be propagated back to the client for application specific error handling. These changes only apply to the `RaftStorage::apply_entry_to_state_machine` method.
- A small change to Raft startup semantics. When a node comes online and successfully recovers state (the node was already part of a cluster), the node will start with a 30 second election timeout, ensuring that it does not disrupt a running cluster.
- #89 removes the `Debug` bounds requirement on the `AppData` & `AppDataResponse` types.
- The `Raft` type can now be cloned. The clone is very cheap and helps to facilitate async workflows while feeding client requests and Raft RPCs into the Raft instance.
- The `Raft.shutdown` interface has been changed slightly. Instead of returning a `JoinHandle`, the method is now async and simply returns a result.
- The `ClientWriteError::ForwardToLeader` error variant has been modified slightly. It now exposes the data (generic type `D` of the type) of the original client request directly. This ensures that the data can actually be used for forwarding, if that is what the parent app wants to do.
- Implemented #12. This is a pretty old issue and a pretty solid optimization. The previous implementation of this algorithm would go to storage (typically disk) for every process of replicating entries to the state machine. Now, we are caching entries as they come in from the leader, and using only the cache as the source of data. There are a few simple measures needed to ensure this is correct, as the leader entry replication protocol takes care of most of the work for us in this case.
- Updated / clarified the interface for log compaction. See the guide or the updated `do_log_compaction` method docs for more details.
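
The purpose of the `RaftStorage::ShutdownError` associated type in this release (telling fatal storage errors apart from errors that go back to the client) can be illustrated with a simplified, synchronous sketch. Everything here is hypothetical: the `Storage` trait, `ApplyError`, `MemStore`, and `handle` are stand-ins written for this example and do not reproduce the real `RaftStorage` signatures.

```rust
use std::fmt::Debug;

/// Hypothetical error returned when applying an entry.
#[derive(Debug, PartialEq)]
enum ApplyError<S> {
    /// Fatal storage error: the node should shut down.
    Shutdown(S),
    /// Application-level error: propagate back to the client.
    App(String),
}

/// Stand-in for a storage trait with an associated shutdown error type.
trait Storage {
    type ShutdownError: Debug;
    fn apply(&mut self, data: &str) -> Result<String, ApplyError<Self::ShutdownError>>;
}

#[derive(Debug, PartialEq)]
struct DiskFailure;

struct MemStore {
    disk_ok: bool,
    applied: Vec<String>,
}

impl Storage for MemStore {
    type ShutdownError = DiskFailure;

    fn apply(&mut self, data: &str) -> Result<String, ApplyError<DiskFailure>> {
        if !self.disk_ok {
            return Err(ApplyError::Shutdown(DiskFailure));
        }
        if data.is_empty() {
            return Err(ApplyError::App("empty payload".to_string()));
        }
        self.applied.push(data.to_string());
        Ok(format!("applied #{}", self.applied.len()))
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Reply(String),
    ClientError(String),
    ShutDown,
}

/// The caller decides what to do based on which kind of error came back.
fn handle<S: Storage>(store: &mut S, data: &str) -> Outcome {
    match store.apply(data) {
        Ok(resp) => Outcome::Reply(resp),
        Err(ApplyError::App(msg)) => Outcome::ClientError(msg),
        Err(ApplyError::Shutdown(_)) => Outcome::ShutDown,
    }
}
```

The design point is that the storage implementation names its own fatal error type, so the caller can recognize it without guessing from error messages.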
- #97 adds the new `Raft.current_leader` method. This is a convenience method which builds upon the Raft metrics system to quickly and easily identify the current cluster leader.
- Fixed #98 where heartbeats were being passed along into the log consistency check algorithm. This had the potential to cause a Raft node to go into shutdown under some circumstances.
- Fixed a bug where the timestamp of the last received heartbeat from a leader was not being stored, resulting in degraded cluster stability under some circumstances.
- Updated async-raft dependency to `0.6.0` & updated storage interface as needed.
- Fixed #76 by moving the process of replicating log entries to the state machine off of the main task. This ensures that the process never blocks the main task. This also includes a few nice optimizations mentioned below.
- Added `#[derive(Serialize, Deserialize)]` to `RaftMetrics`, `State`.
- Fixed #82 where client reads were not behaving correctly for single node clusters. Single node integration tests have been updated to ensure this functionality is working as needed.
- Fixed #79 ... for real this time! Add an integration test to prove it.
- Fixed #79. The Raft core state machine was not being properly updated in response to shutdown requests. That has been addressed and shutdowns are now behaving as expected.
- `ChangeMembershipError::NodeNotLeader` now returns the ID of the current cluster leader if known.
- Fix off-by-one error in `get_log_entries` during the replication process.
- Added `#[derive(Serialize, Deserialize)]` to `Config`, `ConfigBuilder` & `SnapshotPolicy`.
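
Off-by-one errors around `get_log_entries` usually come from mixing up inclusive and exclusive bounds. The sketch below assumes a half-open `[start, stop)` convention (start inclusive, stop exclusive) and uses a `BTreeMap` as a stand-in log; the function body is illustrative and is not the crate's implementation.

```rust
use std::collections::BTreeMap;

/// Return the entries whose index lies in the half-open range `[start, stop)`.
fn get_log_entries(log: &BTreeMap<u64, String>, start: u64, stop: u64) -> Vec<String> {
    // `BTreeMap::range` panics when start > stop, so guard against it.
    if start >= stop {
        return Vec::new();
    }
    log.range(start..stop).map(|(_, v)| v.clone()).collect()
}
```

Under this convention a caller that wants entries 2 through 4 inclusive has to pass `stop = 5`; passing `4` silently drops the last entry.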
The only thing which hasn't changed is that this crate is still an implementation of the Raft protocol. Pretty much everything else has changed.
- Everything is built directly on Tokio now.
- The guide has been updated.
- Docs have been updated.
- The `Raft` type is now the primary API of this crate, and is a simple struct with a few public methods.
- Lots of fixes to the implementation of the protocol, ranging from subtle issues in joint consensus to non-voter syncing.
- Implemented `Error` for `config::ConfigError`
Added a few convenience derivations.
- Derive `Eq` on `messages::MembershipConfig`.
- Derive `Eq` on `metrics::State`.
- Derive `PartialEq` & `Eq` on `metrics::RaftMetrics`.
- Update development dependencies.
- Fixed bug #41 where nodes were not starting a new election timeout task after coming down from leader state. Thanks @lionesswardrobe for the report!
A few QOL improvements.
- Fixed an issue where the value for `current_leader` was not being set to `None` when becoming a candidate. This isn't really a bug per se, as no functionality depended on this value as far as Raft is concerned, but it is an issue that impacts the metrics system. This value is now being updated properly.
- Made the `messages::ClientPayload::new_base` constructor `pub(crate)` instead of `pub`, which is what the intention was originally, but I was apparently tired :).
- Implemented #25. Implementing Display+Error for the admin error types.
A few bug fixes.
- Fixed an issue where a node in a single-node Raft was not resuming as leader after a crash.
- Fixed an issue where hard state was not being saved after a node becomes leader in a single-node Raft.
- Fixed an issue where the client request pipeline (a `Stream` with the `actix::StreamFinish`) was being closed after an error was returned during processing of client requests (which should not cause the stream to close). This was unexpected and undocumented behavior, very simple fix though.
This changeset introduces a new `AppDataResponse` type which represents a concrete data type which must be sent back from the `RaftStorage` impl from the `ApplyEntryToStateMachine` handler. This provides a more direct path for returning application level data from the storage impl. Often times this is needed for responding to client requests in a timely / efficient manner.
- `AppDataResponse` type has been added (see above).
- A few handlers have been updated in the `RaftStorage` type. The handlers are now separated based on where they are invoked from the Raft node. The three changed handlers are:
  - `AppendEntryToLog`: this is the same. It is the initial step of handling client requests to apply an entry to the log. This is still where application level errors may be safely returned to the client.
  - `ReplicateToLog`: this is for replicating entries to the log. This is part of the replication process.
  - `ApplyEntryToStateMachine`: this is for applying an entry to the state machine as the final part of a client request. This is where the new `AppDataResponse` type must be returned.
  - `ReplicateToStateMachine`: this is for replicating entries to the state machine. This is part of the replication process.
Overhauled the election timeout mechanism. This uses an interval job instead of juggling rescheduling processes. Seems to offer quite a lot more stability. Along with the interval job, we are using `std::time::Instant`s for performing the comparisons against the last received heartbeat.
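
The check that such an interval job could run on each tick can be sketched as a pure function over `Instant`s. This is an illustration under assumptions, not the crate's code: `election_due` is a hypothetical name, and treating "no heartbeat received yet" as due is a choice made for this sketch.

```rust
use std::time::{Duration, Instant};

/// Decide on an interval tick whether an election should start.
/// `last_heartbeat` is the time the last leader heartbeat was received.
fn election_due(last_heartbeat: Option<Instant>, now: Instant, timeout: Duration) -> bool {
    match last_heartbeat {
        // Assumption for this sketch: with no heartbeat on record, an election is due.
        None => true,
        // Otherwise compare the elapsed time against the election timeout.
        Some(t) => now.saturating_duration_since(t) >= timeout,
    }
}
```

Because the tick only compares two instants, a heartbeat handler just has to store `Instant::now()`; nothing needs to be cancelled or rescheduled when a heartbeat arrives.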
Another backwards incompatible change to the `RaftStorage` trait. It is now using associated types to better express the needed trait constraints. These changes were the final bit of work needed to get the entire actix-raft system to work with a Synchronous `RaftStorage` impl. Async impls continue to work as they have, the `RaftStorage` impl block will need to be updated to use the associated types though. The recommended pattern is as follows:
```rust
impl RaftStorage<..., ...> for MyStorage {
    type Actor = Self;
    type Context = Context<Self>; // Or SyncContext<Self>;
}
```
My hope is that this will be the last backwards incompatible change needed before a 1.0 release. This crate is still young though, so we will see.
- Made a few backwards incompatible changes to the `RaftStorage` trait. Overwrite its third type parameter with `actix::SyncContext<Self>` to enable sync storage.
- Also removed the `RaftStorage::new` constructor, as it is a bit restrictive. Just added some docs instead describing what is needed.
- Added a few additional top-level exports for convenience.
- Changes to the README for docs.rs.
- Changes to the README for docs.rs.
- Initial release!