v0.8.9

Summary:

Detail:

v0.8.7

Summary:

Detail:

v0.8.6

Summary:

  • Fixed:
    • 63e69b9a restore replication progress when a leader starts up (#884).
    • c7725c79 Do not report snapshot.last_log_id to metrics until snapshot is finished building/installing.
    • f469878c AsyncReadExt::read_buf() only reads at most 2MB per call.
    • 2c715d6e End tick_loop() when the receiver is gone.; by Ivan Schréter

Detail:

Fixed:

  • Fixed: 63e69b9a restore replication progress when a leader starts up (#884); by 张炎泼; 2023-06-29

    As a leader, the replication progress to itself should be restored upon startup.

    And if this leader is the only node in a cluster, it should re-apply all of the logs to state machine at once.

    • Fix: #883
  • Fixed: c7725c79 Do not report snapshot.last_log_id to metrics until snapshot is finished building/installing; by 张炎泼; 2023-10-18

    Before this commit, RaftMetrics.snapshot contained the last log id of a snapshot that was about to be installed. As a result, the metrics could be updated while the store was not.

    In this commit, RaftMetrics.snapshot is only updated when a snapshot has finished building or installing, by adding a new field snapshot to IOState for tracking persisted snapshot metadata.

    • Fix: #912
  • Fixed: f469878c AsyncReadExt::read_buf() only reads at most 2MB per call; by 张炎泼; 2023-11-08

    When streaming a snapshot chunk, it should repeatedly call read_buf() until snapshot_max_chunk_size is full or EOF is reached.
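The read-until-full pattern described above can be sketched with the standard library's blocking `Read` trait. This is only a synchronous analogue of the async code in the commit, not the actual fix: `read_chunk` and the `Capped` reader (which imitates a per-call size limit) are hypothetical names introduced for illustration.

```rust
use std::io::{self, Read};

/// Fill `buf` by calling `read()` repeatedly until it is full or the reader
/// reports EOF. Returns the number of bytes actually read.
fn read_chunk<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF: no more data will arrive
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical reader that hands out at most `cap` bytes per call,
/// imitating a per-call limit like the one described above.
struct Capped<'a> {
    data: &'a [u8],
    cap: usize,
}

impl Read for Capped<'_> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let n = self.cap.min(out.len()).min(self.data.len());
        let (head, tail) = self.data.split_at(n);
        out[..n].copy_from_slice(head);
        self.data = tail;
        Ok(n)
    }
}
```

A single `read()` on `Capped` never returns more than `cap` bytes, so a caller that reads only once per chunk would send undersized chunks; looping fills the chunk buffer whenever enough data is available.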

  • Fixed: 2c715d6e End tick_loop() when the receiver is gone.; by Ivan Schréter; 2023-11-13

    Currently, tick_loop() would keep printing the trace message every tick even when the receiver (Raft main loop) is gone in this form:

    INFO openraft::core::tick: .../tick.rs:70: Tick fails to send, receiving end quit: channel closed

    If the tick message fails to send, then terminate the loop, since every future message will fail to send as well.

    Also adjust the trace message to better describe what happened.
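A minimal sketch of the idea, using a standard-library channel rather than the async channel Openraft uses: once a send fails because the receiver was dropped, every later send would fail too, so the loop stops. The function name and the `max_ticks` bound are assumptions made to keep the example finite, and the sleep between ticks is omitted.

```rust
use std::sync::mpsc::Sender;

/// Send up to `max_ticks` ticks and stop at the first failed send.
/// Returns the number of ticks that were delivered to the channel.
fn tick_loop(tx: Sender<u64>, max_ticks: u64) -> u64 {
    let mut sent = 0;
    for i in 0..max_ticks {
        // `send` fails only when the receiver has been dropped, which is
        // permanent, so there is no point in trying again on the next tick.
        if tx.send(i).is_err() {
            break;
        }
        sent += 1;
    }
    sent
}
```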

v0.8.4

Summary:

  • Added:
    • e39da9f0 define custom Entry type for raft log.
    • 87d62d56 add feature flag storage-v2 to enable RaftLogStorage and RaftStateMachine.
    • 229f3368 add backoff strategy for unreachable nodes.
    • f0dc0eb7 add RaftMetrics::purged to report the last purged log id.
    • 37e96482 add Wait::purged() to wait for purged to become the expected.
    • 0b419eb6 add RaftMetrics.vote, Wait::vote().
    • ee7b8853 new RaftNetwork API with argument RPCOption.
    • 1ee82cb8 RaftNetwork::send_append_entries() can return PartialSuccess.
    • e9eed210 leader lease.
    • 269d221c add SnapshotPolicy::Never.
    • 9e7195a1 add Raft::purge_log().
  • Improved:
    • dbac91d5 send AppendEntries response before committing entries.
    • 6769cdd0 move state machine operations to another task.
    • dcd18c53 getting a snapshot does not block RaftCore task.
    • 6eeb5246 reduce rate to flush metrics.
    • 1f3bf203 create a channel notify specifically for internal messages.
    • 54154202 move receiving snapshot chunk to sm::Worker.
    • fa4085b9 build snapshot in another task.
    • 47048a53 IntoIterator::IntoIter should be Send.
    • 8dae9ac6 impl Clone for Raft does not require Clone for its type parameters.
  • Fixed:
    • cd31970d trait RaftLogId should be public.
    • 26dc8837 ProgressEntry::is_log_range_inflight() checks a log range, not a log entry.
    • 04e40606 if the application does not persist snapshot, build a snapshot when starting up.
    • 3433126c compat07::SnapshotMeta should decode v08 SnapshotMeta.
    • b97edb49 incorrect debug level log.
    • d012705d replication should be able to shutdown when replicating snapshot to unreachable node.
    • f505d7e6 Raft::add_learner() should block forever.
  • Changed:
    • a92499f2 StoreBuilder does not need to run a test, it only needs to build a store.
    • 6e9d3573 remove Clone from trait AppData.
    • 285e6225 instead of a slice, RaftStorage::append_to_log() now accepts an IntoIterator.
    • e0569988 remove unused trait RaftStorageDebug.
    • 88f947a6 remove defensive check utilities.
    • eaf45dfa move RaftStateMachine out of RaftStorage.
    • 9f8ae43e RaftMetrics.replication type to BTreeMap<NodeId, Option<LogId>>.
    • 84539cb0 move snapshot type definition from storage traits to RaftTypeConfig.
    • e78bbffe remove unused error CommittedAdvanceTooMany.

Detail:

Added:

  • Added: e39da9f0 define custom Entry type for raft log; by 张炎泼; 2023-03-16

    This commit introduces a new feature that allows applications to define a custom type for Raft log entries in RaftTypeConfig. By setting Entry = MyEntry, where MyEntry implements the RaftEntry trait, an application can now define its own log entry type that reflects its architecture. The default implementation, the Entry type, is still available.

    This change provides more flexibility for applications to tailor the Raft log entry to their specific needs.

    • Fix: #705

    • Changes: RaftStorage::append_to_log() and RaftStorage::apply_to_state_machine() accept a slice of entries instead of a slice of entry references.

  • Added: 87d62d56 add feature flag storage-v2 to enable RaftLogStorage and RaftStateMachine; by 张炎泼; 2023-04-19

    storage-v2: enables RaftLogStorage and RaftStateMachine as the v2 storage. This is a temporary feature flag and will be removed in the future, when v2 storage is stable. This feature disables Adapter, which lets v1 storage be used as v2. V2 storage separates the log store and the state machine store so that log IO and state machine IO can be parallelized naturally.

  • Added: 229f3368 add backoff strategy for unreachable nodes; by 张炎泼; 2023-04-21

    Implements a backoff strategy for temporarily or permanently unreachable nodes. If the Network implementation returns an Unreachable error, Openraft will back off for a while before sending the next RPC to this target. This mechanism prevents a flood of error logs.

    Adds a new method backoff() to RaftNetwork to let an application return a customized backoff policy. The default backoff simply sleeps for a constant 500ms.

    Adds an unreachable_nodes setting to the testing router TypedRaftRouter to emulate unreachable nodes. Adds a new error Unreachable and an RPCError variant Unreachable.

    • Fix: #462
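To illustrate what a backoff policy can look like, here is a sketch that models a policy as an iterator of sleep durations. The function names are hypothetical and this is not Openraft's `backoff()` signature: `constant_backoff` mirrors the constant default described above, and `exponential_backoff` is one possible custom alternative.

```rust
use std::time::Duration;

/// Constant policy: always wait the same amount of time between attempts.
fn constant_backoff(delay: Duration) -> impl Iterator<Item = Duration> {
    std::iter::repeat(delay)
}

/// Hypothetical custom policy: double the delay after every failed attempt,
/// but never wait longer than `max`.
fn exponential_backoff(initial: Duration, max: Duration) -> impl Iterator<Item = Duration> {
    let mut next = initial;
    std::iter::from_fn(move || {
        let current = next;
        next = next.saturating_mul(2).min(max);
        Some(current)
    })
}
```

With an initial delay of 100ms and a cap of 500ms, the exponential policy yields 100ms, 200ms, 400ms, and then 500ms for every later attempt.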
  • Added: f0dc0eb7 add RaftMetrics::purged to report the last purged log id; by 张炎泼; 2023-05-01

  • Added: 37e96482 add Wait::purged() to wait for purged to become the expected; by 张炎泼; 2023-05-01

  • Added: 0b419eb6 add RaftMetrics.vote, Wait::vote(); by 张炎泼; 2023-05-02

    The latest approved value of Vote, which has been saved to disk, is referred to as RaftMetrics.vote. Additionally, a new vote() method has been included in Wait to enable the application to wait for vote to reach the anticipated value.

  • Added: ee7b8853 new RaftNetwork API with argument RPCOption; by 张炎泼; 2023-05-02

    • RaftNetwork introduces 3 new APIs, append_entries, install_snapshot and vote, which accept an additional argument RPCOption, and deprecates the old APIs send_append_entries, send_install_snapshot and send_vote.

    • The old API will be removed in 0.9. An application can still implement the old API without any changes. Openraft calls only the new API and the default implementation will delegate to the old API.

    • Implementing the new APIs will disable the old APIs.

    • The new APIs accept an additional argument RPCOption, which lets an application control its networking behavior based on the parameters in RPCOption.

      The hard_ttl() and soft_ttl() in RPCOption sets the hard limit and the moderate limit of the duration for which an RPC should run. Once the soft_ttl() ends, the RPC implementation should start to gracefully cancel the RPC, and once the hard_ttl() ends, Openraft will terminate the ongoing RPC at once.

    • Fix: #819
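The relationship between the two limits can be sketched as a small decision function. Everything here is hypothetical (`RpcAction`, `rpc_action`, and passing the limits as plain durations); it only restates the rule above and is not how Openraft or any particular network implementation structures this.

```rust
use std::time::Duration;

/// What an RPC implementation might do at a given point in time.
#[derive(Debug, PartialEq, Eq)]
enum RpcAction {
    /// Still within the soft limit: keep going.
    Continue,
    /// Soft limit passed: start cancelling gracefully.
    CancelGracefully,
    /// Hard limit passed: the RPC is terminated at once.
    Terminate,
}

/// Map the elapsed time of an RPC to an action, given both limits.
fn rpc_action(elapsed: Duration, soft_ttl: Duration, hard_ttl: Duration) -> RpcAction {
    if elapsed >= hard_ttl {
        RpcAction::Terminate
    } else if elapsed >= soft_ttl {
        RpcAction::CancelGracefully
    } else {
        RpcAction::Continue
    }
}
```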

  • Added: 1ee82cb8 RaftNetwork::send_append_entries() can return PartialSuccess; by 张炎泼; 2023-05-03

    If there are too many log entries and the RPCOption.ttl is not sufficient, an application can opt to send only a portion of the entries and return AppendEntriesResponse::PartialSuccess(Option<LogId>) to inform Openraft of the last replicated log id. Thus replication can still make progress.

    For example, when asked to send log entries [1-2..3-10], the application is allowed to send just [1-2..1-3] and return PartialSuccess(1-3).

    Caution

    The returned matching log id must be greater than or equal to the first log id (AppendEntriesRequest::prev_log_id) of the entries to send. If no RPC reply is received, RaftNetwork::send_append_entries must return an RPCError to inform Openraft that the first log id (AppendEntriesRequest::prev_log_id) may not match on the remote target node.

    • Fix: #822
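The partial-send rule can be illustrated with simplified stand-in types. `LogId`, `Response`, and `send_some` below are hypothetical and much smaller than Openraft's real types; the `budget` parameter stands in for whatever the ttl allows the application to send.

```rust
/// Simplified stand-in for a log id: (term, index).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct LogId {
    term: u64,
    index: u64,
}

/// Simplified stand-in for the append-entries response.
#[derive(Debug, PartialEq, Eq)]
enum Response {
    Success,
    PartialSuccess(Option<LogId>),
}

/// Pretend to send at most `budget` entries and report how far we got.
/// `entries` holds the log ids of the entries that follow `prev_log_id`.
fn send_some(prev_log_id: Option<LogId>, entries: &[LogId], budget: usize) -> Response {
    if budget >= entries.len() {
        return Response::Success;
    }
    // Only the first `budget` entries were sent. If none were sent, the
    // matching log id falls back to `prev_log_id`, so the reported value is
    // never smaller than `prev_log_id`.
    let matching = entries[..budget].last().copied().or(prev_log_id);
    Response::PartialSuccess(matching)
}
```

With entries 1-2, 1-3, 2-4 and a budget of two, the sketch reports `PartialSuccess(1-3)`, which matches the example above.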
  • Added: e9eed210 leader lease; by 张炎泼; 2023-05-19

    The leader records the most recent time point when an RPC is initiated towards a target node. The highest timestamp associated with RPCs made to a quorum serves as the starting time point for a leader lease.

    Improve: use tokio::Instant to replace TimeState

    Use Instant for timekeeping instead of a custom TimeState struct, because multiple components need to generate timestamps, not only RaftCore but also, e.g., ReplicationCore. Generating a timestamp is not in the hot path, so caching it introduces unnecessary complexity.
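A sketch of how the lease starting point described above could be computed: given, for each voter, the most recent time an RPC was initiated towards it, take the greatest time that a majority of voters have reached. This assumes a simple majority quorum and ignores joint membership; `quorum_acked` is a hypothetical helper, not an Openraft function.

```rust
/// Return the greatest value `t` such that a majority of `times` are >= `t`,
/// or `None` if there are no voters.
fn quorum_acked<T: Ord + Copy>(times: &[T]) -> Option<T> {
    if times.is_empty() {
        return None;
    }
    let mut sorted = times.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // newest first
    let quorum = times.len() / 2 + 1;
    // The `quorum`-th newest value is reached by exactly a majority.
    Some(sorted[quorum - 1])
}
```

The function is generic over any ordered timestamp type, so it works with `std::time::Instant` as well as with plain integers in tests.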

  • Added: 269d221c add SnapshotPolicy::Never; by 张炎泼; 2023-05-24

    With SnapshotPolicy::Never, Openraft will not build snapshots automatically based on a policy. Instead, the application has full control over when snapshots are built. In this scenario, the application can call the Raft::trigger_snapshot() API at the desired times to manually trigger Openraft to build a snapshot.

    Rename integration tests:

    • log_compaction -> snapshot_building

    • snapshot -> snapshot_streaming

    • Fix: #851

  • Added: 9e7195a1 add Raft::purge_log(); by 张炎泼; 2023-05-24

    This method allows users to purge logs when required. It initiates the log purge up to and including the given upto log index.

    Logs that are not included in a snapshot will NOT be purged. In such a scenario it will delete as many logs as possible. The max_in_snapshot_log_to_keep config is not taken into account when purging logs.

    Openraft won't purge logs at once, e.g. it may be delayed by several seconds, because if it is a leader and a replication task has been replicating the logs to a follower, the logs can't be purged until the replication task is finished.

    • Fix: #852
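The rule that only logs covered by a snapshot are purged can be written as a small helper. This is a sketch under simplifying assumptions (log positions are plain indexes, and the function name is hypothetical); the delayed, replication-aware part of purging is not modelled.

```rust
/// Index up to which logs may actually be purged, inclusive.
/// Returns `None` when there is no snapshot, i.e. nothing may be purged.
fn effective_purge_upto(requested_upto: u64, snapshot_last_index: Option<u64>) -> Option<u64> {
    // Never purge past the last log covered by the snapshot.
    snapshot_last_index.map(|snap| requested_upto.min(snap))
}
```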

Improved:

  • Improved: dbac91d5 send AppendEntries response before committing entries; by 张炎泼; 2023-04-04

    When a follower receives an append-entries request that includes a series of log entries to append and the log id that the leader has committed, it responds with an append-entries response after committing and applying the entries.

    However, this is not strictly necessary. The follower could simply send the response as soon as the log entries have been appended and flushed to disk, without waiting for them to be committed.

  • Improved: 6769cdd0 move state machine operations to another task; by 张炎泼; 2023-04-13

    State machine operations, such as applying log entries and building/installing/getting a snapshot, are moved to core::sm::Worker, which runs in a standalone task separate from the one running RaftCore. In this way, log IO operations (mostly appending log entries) and state machine IO operations (mostly applying log entries) can be parallelized.

    • Log IO is still running in the RaftCore task.

    • Snapshot receiving/streaming are removed from RaftCore.

    • Add IOState to RaftState to track the applied log id.

      This field is used to determine whether a certain command, such as sending a response, can be executed after a specific log has been applied.

    • Refactor: leader_step_down() can only be run when the response of the second change-membership is sent. Before this commit, updating the committed was done atomically with sending back the response. Since this commit, these two steps are done separately, because applying log entries is moved to another task. Therefore leader_step_down() must wait for these two steps to be finished.

  • Improved: dcd18c53 getting a snapshot does not block RaftCore task; by 张炎泼; 2023-04-16

    RaftCore no longer blocks on receiving a snapshot from the state-machine worker while replicating a snapshot. Instead, it sends the Receiver to the replication task and the replication task blocks on receiving the snapshot.

  • Improved: 6eeb5246 reduce rate to flush metrics; by 张炎泼; 2023-04-23

    The performance increases by 40% with this optimization:

    | clients | put/s   | ns/op | Changes                    |
    |---------|---------|-------|----------------------------|
    | 64      | 652,000 | 1,532 | Reduce metrics report rate |
    | 64      | 467,000 | 2,139 |                            |
  • Improved: 1f3bf203 create a channel notify specifically for internal messages; by 张炎泼; 2023-04-25

    tx_notify will be used for components such as state-machine worker or replication stream to send back notification when an action is done.

    tx_api is left for receiving only external messages, such as append-entries request or client-write request.

    A Balancer is added to prevent one channel from starving the others.

    The benchmark shows a better performance with 64 clients:

    | clients | put/s   | ns/op | Changes         |
    |---------|---------|-------|-----------------|
    | 64      | 730,000 | 1,369 | This commit     |
    | 64      | 652,000 | 1,532 | Previous commit |
  • Improved: 54154202 move receiving snapshot chunk to sm::Worker; by 张炎泼; 2023-04-27

    Receiving snapshot chunk should not be run in RaftCore task. Otherwise it will block RaftCore.

    In this commit this work is moved to sm::Worker, which runs in another task. The corresponding response command will not be run until sm::Worker notifies RaftCore that receiving is finished.

  • Improved: fa4085b9 build snapshot in another task; by 张炎泼; 2023-05-02

    Before this commit, a snapshot was built in the sm::Worker, which blocked other state-machine writes, such as applying log entries.

    This commit parallelizes applying log entries and building a snapshot: a snapshot is built in another tokio::task.

    Because building a snapshot is a read operation, it does not have to block the entire state machine. Instead, it only needs a consistent view of the state machine, or to hold a lock on the state machine.

    • Fix: #596
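The "consistent view" idea can be sketched with standard-library threads in place of tokio tasks. All names here are hypothetical and the state machine is reduced to a map; the snapshot format (one `key=value` line per entry) is invented for the example.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

type StateMachine = Arc<Mutex<BTreeMap<String, String>>>;

/// Take a consistent copy of the state machine, then serialize it on another
/// thread so that later writes are not blocked while the snapshot is built.
fn build_snapshot_in_background(sm: &StateMachine) -> thread::JoinHandle<Vec<u8>> {
    // The lock is held only for the duration of the clone.
    let view = sm.lock().unwrap().clone();
    thread::spawn(move || {
        let mut out = Vec::new();
        for (k, v) in &view {
            out.extend_from_slice(format!("{}={}\n", k, v).as_bytes());
        }
        out
    })
}
```

Writes applied after the copy is taken do not appear in the snapshot being built. Cloning the whole map is the simplest way to get a consistent view; a real state machine would more likely rely on a storage-level snapshot or copy-on-write structure.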
  • Improved: 47048a53 IntoIterator::IntoIter should be Send; by 张炎泼; 2023-06-16

    The RaftStateMachine::apply() and RaftLogStorage::append_to_log() methods have a Send bound on the IntoIterator passed to them. However, the actual iterator returned from IntoIterator doesn't have it. Thus, it's impossible to iterate across awaits in the implementation.

    The correct API should be:

    async fn apply<I>(&mut self, entries: I) -> Result<...>
    where
        I: IntoIterator<Item = C::Entry> + Send,
        I::IntoIter: Send;
    

    Thanks to schreter

    • Fix: #860
  • Improved: 8dae9ac6 impl Clone for Raft does not require Clone for its type parameters; by 张炎泼; 2023-06-16

    Thanks to xDarksome

    • Fix: #870

Fixed:

  • Fixed: cd31970d trait RaftLogId should be public; by 张炎泼; 2023-03-21

  • Fixed: 26dc8837 ProgressEntry::is_log_range_inflight() checks a log range, not a log entry; by 张炎泼; 2023-04-12

    This bug causes replication to try to send purged logs.

  • Fixed: 04e40606 if the application does not persist snapshot, build a snapshot when starting up; by 张炎泼; 2023-04-15

  • Fixed: 3433126c compat07::SnapshotMeta should decode v08 SnapshotMeta; by 张炎泼; 2023-04-15

  • Fixed: b97edb49 incorrect debug level log; by 张炎泼; 2023-04-22

    This results in unnecessary debug log output.

  • Fixed: d012705d replication should be able to shutdown when replicating snapshot to unreachable node; by 张炎泼; 2023-05-01

    If a replication is sending a snapshot, it should periodically verify the input channel's status. When the input channel is closed during replication rebuilding, it should immediately exit the loop instead of attempting retries indefinitely.

    • Fix: #808
  • Fixed: f505d7e6 Raft::add_learner() should block forever; by 张炎泼; 2023-05-20

    The Raft::add_learner() method, when invoked with the blocking parameter set to true, should block forever until the learner synchronizes its logs with the leader.

    In its current implementation, add_learner() calls the Raft::wait() method, which has a default timeout of 500ms. To achieve the desired blocking behavior, the default timeout should be increased significantly.

    • Fix: #846

Changed:

  • Changed: a92499f2 StoreBuilder does not need to run a test, it only needs to build a store; by 张炎泼; 2023-03-21

    StoreBuilder::run_test() is removed, and a new method build() is added. To test a RaftStorage implementation, just implementing StoreBuilder::build() is enough now. It returns a store instance and a guard, for disk backed store to clean up the data when the guard is dropped.

  • Changed: 6e9d3573 remove Clone from trait AppData; by 张炎泼; 2023-03-26

    Application data AppData does not have to be Clone.

    Upgrade tip:

    Nothing needs to be done. The default RaftEntry implementation Entry provided by openraft is still Clone, if the AppData in it is Clone.

  • Changed: 285e6225 instead of a slice, RaftStorage::append_to_log() now accepts an IntoIterator; by 张炎泼; 2023-03-27

    Using an IntoIterator is more generic than using a slice, and could avoid potential memory allocation for the slice.

    Upgrade tip:

    Update the method signature in the RaftStorage implementation and ensure that it compiles without errors. The method body may require minimal modifications, as the new input type is just a more general type.

  • Changed: e0569988 remove unused trait RaftStorageDebug; by 张炎泼; 2023-04-10

    RaftStorageDebug has only one method get_state_machine(), and state machine is entirely a user defined struct. Obtaining a state machine does not imply anything about the struct or behavior of it.

  • Changed: 88f947a6 remove defensive check utilities; by 张炎泼; 2023-04-11

    Most defensive checks are replaced with debug_assert! embedded in Engine. StoreExt, a RaftStorage wrapper that implements defensive checks, is no longer needed. StoreExt was mainly used for testing, and it is too slow to be used in production.

    • Remove structs: StoreExt, DefensiveStoreBuilder
    • Remove traits: Wrapper, DefensiveCheckBase, DefensiveCheck
  • Changed: eaf45dfa move RaftStateMachine out of RaftStorage; by 张炎泼; 2023-04-01

    In Raft, the state machine is an independent storage component that operates separately from the log store. As a result, accessing the log store and accessing the state machine can be naturally parallelized.

    This commit replaces the type parameter RaftStorage in Raft<.., S: RaftStorage> with two type parameters: RaftLogStorage and RaftStateMachine.

    • Add: RaftLogReaderExt to provide additional log access methods based on a RaftLogReader implementation. Some of the methods are moved from StorageHelper to this trait.

    • Add: Adapter to let an application use the separated log store and state machine framework without rewriting its RaftStorage implementation.

    • Refactor: shorten type names for the 2 example crates

    Upgrade tip

    Use an adapter to wrap RaftStorage:

    // Before:
    let store = MyRaftStorage::new();
    Raft::new(..., store);
    
    // After:
    let store = MyRaftStorage::new();
    let (log_store, sm) = Adaptor::new(store);
    Raft::new(..., log_store, sm);
  • Changed: 9f8ae43e RaftMetrics.replication type to BTreeMap<NodeId, Option<LogId>>; by 张炎泼; 2023-04-24

    The RaftMetrics.replication used to be of type ReplicationMetrics{ replication: BTreeMap<NodeId, ReplicationTargetMetrics> } which contained an atomic log index value for each ReplicationTargetMetrics stored in the BTreeMap. The purpose of this type was to reduce the cost of copying a metrics instance. However, since the metrics report rate has been significantly reduced, this cost is now negligible. As a result, these complicated implementations have been removed. When reporting metrics, they can simply be cloned from the progress information maintained by Engine.

    Upgrade tip

    Replace usage of RaftMetrics.replication.data().replication.get(node_id) with RaftMetrics.replication.get(node_id).

  • Changed: 84539cb0 move snapshot type definition from storage traits to RaftTypeConfig; by 张炎泼; 2023-04-26

    Similar to NodeId or Entry, SnapshotData is also a data type that is specified by the application and needs to be defined in RaftTypeConfig, which is a collection of all application types.

    Public types changes:

    • Add SnapshotData to RaftTypeConfig:
      pub trait RaftTypeConfig {
          /// ...
          type SnapshotData: AsyncRead + AsyncWrite + AsyncSeek + Send + Sync + Unpin + 'static;
      }
    • Remove associated type SnapshotData from storage::RaftStorage.
    • Remove associated type SnapshotData from storage::v2::RaftStateMachine.

    Corresponding API changes:

    • Change storage::RaftSnapshotBuilder<C: RaftTypeConfig, SNAPSHOT_DATA> to RaftSnapshotBuilder<C>
    • Change storage::Snapshot<NID: NodeId, N: Node, SNAPSHOT_DATA> to storage::Snapshot<C>

    Upgrade tip:

    Update generic type parameter in application types to pass compilation.

  • Changed: e78bbffe remove unused error CommittedAdvanceTooMany; by 张炎泼; 2023-05-14

    Upgrade tip:

    Do not use it.

v0.8.3

Improved:

  • Improved: 23f4a73b AppDataResponse does not need a Clone trait bound; by 张炎泼; 2023-03-09

    • Fix: #703
  • Improved: 664635e0 loosen validity check with RaftState.snapshot_last_log_id(); by 张炎泼; 2023-03-10

    An application may not persist its snapshot. When it restarts, the last-purged-log-id is not None but snapshot_last_log_id() is None. This is a valid state and should not emit an error.

  • Improved: 54aea8a2 fix: delay election if a greater last log id is seen; by 张炎泼; 2023-03-14

    If this node sees a greater last-log-id on another node, it will be less likely to be elected as a leader. In this case, it is necessary to sleep for a longer period of time smaller_log_timeout so that other nodes with a greater last-log-id have a chance to elect themselves.

    Fix: such a state should be kept until the next election, i.e., it should be a field of Engine instead of a field of internal_server_state. And this value should be greater than the election_timeout of every other node.

Changed:

  • Changed: 9ddb5715 RaftState: make RaftState.vote private. Accesses vote via 2 new public methods: vote_ref() and vote_last_modified().; by 张炎泼; 2023-03-12

  • Changed: 3b4f4e18 move log id related traits to mod openraft::log_id; by 张炎泼; 2023-03-14

    Move trait RaftLogId, LogIdOptionExt and LogIndexOptionExt from openraft::raft_types to mod openraft::log_id

v0.8.2

Changed:

  • Changed: 342d0de2 rename variants in ChangeMembers, add AddVoters; by 张炎泼; 2023-03-01

    Rename ChangeMembers::AddVoter to AddVoterIds, because it just updates voter ids.

    Rename ChangeMembers::RemoveVoter to RemoveVoters.

    Add ChangeMembers::AddVoters(BTreeMap) to add voters with corresponding Node, i.e., it adds nodes as learners and updates the voter-ids in a Membership.

Added:

  • Added: 50821c37 impl PartialEq for Entry; by 张炎泼; 2023-03-02

Fixed:

  • Fixed: 97fa1581 discard blank log heartbeat, revert to the standard heartbeat; by 张炎泼; 2023-03-04

    The blank log heartbeat design has two problems:

    • The heartbeat that sends a blank log introduces additional I/O, as a follower has to persist every log to maintain correctness.

    • Although (term, log_index) serves as a pseudo time in Raft, measuring whether a node has caught up with the leader and is capable of becoming a new leader, leadership is not solely determined by this pseudo time. Wall clock time is also taken into account.

      There may be a case where the pseudo time is not up to date but the clock time is, and the node should not become the leader. For example, in a cluster of three nodes, if the leader (node-1) is busy sending a snapshot to node-2 (it has not yet replicated the latest logs to a quorum, but node-2 has received a message from node-1 and thus knows there is an active leader), node-3 should not seize leadership from node-1. This is why two types of time are needed, pseudo time (term, log_index) and wall clock time, to protect leadership.

      In the following graph:

      • node-1 is the leader, has 4 log entries, and is sending a snapshot to node-2,
      • node-2 has received several chunks of the snapshot, and because it perceives an active leader it has extended the leader lease.
      • node-3 tries to send a vote request to node-2. Although node-2 does not have as many logs as node-3, it should still reject node-3's vote request because the leader lease has not yet expired.

      In the obsolete design, which extended the pseudo time (term, index) with a tick, node-3 would seize the leadership from node-1 in this case.

      Ni: Node i
      Ei: log entry i
      
      N1 E1 E2 E3 E4
            |
            v
      N2    snapshot
            +-----------------+
                     ^        |
                     |        leader lease
                     |
      N3 E1 E2 E3    | vote-request
      ---------------+----------------------------> clock time
                     now
      
      

    The original document is presented below for reference.

  • Fixed: b5caa44d Wait::members() should not count learners as members; by 张炎泼; 2023-03-04

    Wait::members() waits until membership becomes the expected value. It should not check against all nodes. Instead, it should only check voters, excluding learners.

v0.8.1

Added:

  • Added: b3c2ff7e add Membership methods: voter_ids(), learner_ids(), get_node(); by 张炎泼; 2023-02-28

v0.8.0

Fixed:

  • Fixed: 86e2ccd0 a single Candidate should be able to vote itself.; by 张炎泼; 2022-01-20

    A Candidate should check if it is the only member in a cluster before sending a vote request. Otherwise a single node cluster does not work.

  • Fixed: 4015cc38 a Candidate should revert to Follower at once when a higher vote is seen; by 张炎泼; 2022-02-03

    When a Candidate sees a higher vote, it stores it at once. After that, no further granted votes are valid for this candidate, because the vote they granted has changed.

    Thus it was wrong to compare last_log_id before deciding if to revert to Follower. The right way is to revert to Follower at once and stop the voting procedure.

  • Fixed: 1219a880 consistency issue between ReplicationCore.last_log_id and last_log_state.last_log_id; by 张炎泼; 2022-02-28

  • Fixed: efdc321d a leader should report leader metrics with value Update::AsIs instead of Update::Update(None). Otherwise it mistakenly purges metrics about replication; by 张炎泼; 2022-04-01

  • Fixed: 797fb9b1 update replication metrics only when the replication task stopped, to provide a consistent view of RaftMetrics; by 张炎泼; 2022-06-04

  • Fixed: 918b48bc #424 wrong range when searching for membership entries: [end-step, end).; by 张炎泼; 2022-07-03

    The iterating range searching for membership log entries should be [end-step, end), not [start, end). With this bug it will return duplicated membership entries.

    • Related: #424
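The difference between the two ranges can be shown with a simplified backward scan. This is only an illustration of the windowing idea, not Openraft's code: the log is reduced to a slice of flags, and `last_membership_indices` is a hypothetical name.

```rust
/// Return the indices of the last `want` membership entries, oldest first.
/// The log is scanned backwards in windows of `step` entries.
fn last_membership_indices(is_membership: &[bool], step: usize, want: usize) -> Vec<usize> {
    assert!(step > 0, "step must be positive");
    let mut found = Vec::new(); // newest first while scanning
    let mut end = is_membership.len();
    while end > 0 && found.len() < want {
        let window_start = end.saturating_sub(step);
        // Scan only the current window [end - step, end). Scanning from the
        // very beginning of the log on every round would visit the same
        // entries again and could report them more than once.
        for i in (window_start..end).rev() {
            if is_membership[i] {
                found.push(i);
                if found.len() == want {
                    break;
                }
            }
        }
        end = window_start;
    }
    found.reverse();
    found
}
```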
  • Fixed: 8594807c metrics has to be updated last; by 张炎泼; 2022-07-13

    Otherwise the application receives updated metrics while the internal raft state is still stale.

  • Fixed: 59ddc982 avoid creating log-id with uninitialized matched.leader_id.; by 张炎泼; 2022-07-26

    When waiting for a newly added learner to become up to date, it tries to compare last-log-id and the reported matched replication state. But matched may not have received any update yet and is uninitialized. In such a case, it tries to create a temporary LogId with leader_id(0, 0), which is illegal.

    The fix is simple: do not use log-id. Just calculate replication lag by log index.

    Add test to reproduce it: openraft/tests/membership/t99_issue_471_adding_learner_uses_uninit_leader_id.rs

    • Fix: #471
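Computing the lag from indexes alone can be sketched as follows. The helper name and the use of `Option<u64>` for "no log yet" are assumptions made for illustration; the point is that an uninitialized `matched` needs no placeholder log id.

```rust
/// Number of log entries a learner is behind the leader.
/// `None` means "no log at all", e.g. `matched` has not been reported yet.
fn replication_lag(matched_index: Option<u64>, last_log_index: Option<u64>) -> u64 {
    // Convert "last index" to "next index": None -> 0, Some(i) -> i + 1.
    let next = |i: Option<u64>| i.map_or(0, |x| x + 1);
    next(last_log_index).saturating_sub(next(matched_index))
}
```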
  • Fixed: 43dd8b6f when leader reverts to follower, send error to waiting clients; by 张炎泼; 2022-08-06

    When a leader reverts to follower, e.g., if a higher vote is seen, it should inform waiting clients that leadership is lost.

  • Fixed: 71a290cd when handling append-entries, if prev_log_id is purged, it should not treat it as a conflict; by 张炎泼; 2022-08-14

    When handling append-entries, if prev_log_id is purged, it should not be treated as a conflicting log and no log should be deleted.

    This bug is caused by using committed as last_applied. committed may be smaller than last_applied when a follower just starts up.

    The solution is merging committed and last_applied into one field: committed, which is always greater than or equal to the actually committed (applied) log id.

  • Fixed: 674e78aa potential inconsistency when installing snapshot; by 张炎泼; 2022-09-21

    The conflicting logs that are before snapshot_meta.last_log_id should be deleted before installing a snapshot.

    Otherwise, if a node crashes, there is a chance that the snapshot is installed but conflicting logs are left in the store.

  • Fixed: 4ea66acd stop tick task when shutting down Raft; by Matthias Wahl; 2022-09-27

  • Fixed: 56486a60 Error after change_membership: assertion failed: value > prev: #584; by 张炎泼; 2022-10-29

    Problem:

    Error occurs after calling change_membership(): assertion failed: value > prev, when changing membership by converting a learner to a voter.

    Because the replication streams are re-spawned, thus progress reverts to zero. Then a reverted progress causes the panic.

    Solution:

    When re-spawning replications, remember the previous progress.

    • Fix: #584
  • Fixed: 678af4a8 when responding ForwardToLeader, make leader_id a None if the leader is no longer in the cluster; by 张炎泼; 2022-11-02

  • Fixed: 0023cff1 delay leader step down; by 张炎泼; 2022-11-06

    When a membership that removes the leader is committed, the leader continues to work for a short while before reverting to a learner. This lets the leader replicate the membership-log-is-committed message to followers.

    Otherwise, if the leader steps down at once, the followers might have to re-commit the membership log.

    After committing the membership log that does not contain the leader, the leader will step down in the next tick.

  • Fixed: ff9a9335 it should make a node non-leader when restarting single node cluster; by 张炎泼; 2022-12-03

    A node should not set server_state to Leader when just starting up, even when it's the only voter in a cluster. It still needs several steps to initialize leader related fields to become a leader.

    • Fix: #607
  • Fixed: 0e7ab5a7 workaround cargo leaking SSL_CERT_FILE issue; by 张炎泼; 2022-12-09

    On Linux, the command cargo run pollutes environment variables: it leaks SSL_CERT_FILE and SSL_CERT_DIR to the testing subprocess it runs, which causes reqwest to spend ~50 ms loading the certificates for every RPC.

    We just extend the RPC timeout to work around.

    • Fix: #550
  • Fixed: cc8af8cd last_purged_log_id is not loaded correctly; by 张炎泼; 2023-01-08

    • Fix: last_purged_log_id should be None, not LogId{index=0, ..}, when raft starts up with a store that has a log at index 0.

      This is fixed by adding another field next_purge to distinguish last_purged_log_id value None and LogId{index=0, ..}, because RaftState.log_ids stores LogId but not Option<LogId>.

    • Add a wrapper Valid<RaftState> of RaftState to check if the state is valid every time it is accessed. This check is done only when debug_assertions is turned on.

  • Fixed: 9dbbe14b check_is_leader() should return at once if encountering StorageError; by 张炎泼; 2023-02-12

    Refactor: ExtractFatal is not used any more. Fatal error should only be raised by Command executor, no more by API handler. There is no need to extract Fatal error from an API error.

  • Fixed: a80579ef a stepped down leader should ignore replication progress message; by 张炎泼; 2023-02-12

  • Fixed: c8fccb22 when adding a learner, ensure the last membership is committed; by 张炎泼; 2023-02-19

    Previously, when adding a learner to a Raft cluster, the last membership was not always marked as committed, which could cause issues when a follower tried to truncate logs by reverting to the last committed membership. To prevent this issue, we have updated the code to ensure the last membership is committed when adding a learner.

    In addition to this fix, we have also made several refactoring changes, including refining method names for trait Coherent, renaming Membership::next_safe() to next_coherent() for consistency, and updating enum ChangeMembers to include more variants for adding and removing learners. We have also removed RaftCore::add_learner() in favor of using change_membership() for all membership operations, and added a ChangeHandler to build new membership configurations for change-membership requests.

    Finally, we have updated the Membership API with a new method new_with_nodes() for building a new membership configuration, and moved the validation check out into a separate function, ensure_valid(). Validation is now done only when needed.

Changed:

  • Changed: 86e2ccd0 Wait::log_at_least() use Option<u64> as the input log index, instead of using u64; by 张炎泼; 2022-01-20

  • Changed: 71a290cd remove RaftState.last_applied, use committed to represent the already committed and applied log id; by 张炎泼; 2022-08-14

  • Changed: 2254ffc5 add sub error types of ReplicationError; by 张炎泼; 2022-01-20

    • Add sub errors such as Timeout and NetworkError.

    • Remove ReplicationError::IO, use StorageError instead.

  • Changed: f08a3e6d RaftNetwork return RPCError instead of anyhow::Error; by 张炎泼; 2022-01-23

    • When a remote error is encountered during replication, the replication is stopped at once.

    • Fix: #140

  • Changed: d55fa625 add ConfigError sub error; remove anyhow; by 张炎泼; 2022-01-23

    • Fix: #144
  • Changed: 58f2491f RaftStorage: use Vote to replace HardState; by 张炎泼; 2022-01-25

    • Rename: save_hard_state() and read_hard_state() to save_vote() and read_vote().

    • Replace term, node_id pair with Vote in RaftCore and RPC struct-s.

  • Changed: a68a9a9a use term, node_id, index to identify a log entry; by 张炎泼; 2022-01-26

  • Changed: 0b753622 Raft::add_learner() accepts optional arg Node.; by 张炎泼; 2022-02-17

    When adding a learner, an optional Node can be provided to store additional info of a node in Membership.

    A common usage is to store the node address in the Membership so that an application does not need another component to get the address of a node when implementing RaftNetwork.

  • Changed: 5ba730c9 Replace replication state in RaftMetrics with a reference to atomic values; by Ivan Schréter; 2022-02-22

  • Changed: a76f41ac Extract RaftLogReader, RaftSnapshotBuilder from RaftStorage, split RaftNetwork and RaftNetworkFactory; by Ivan Schréter; 2022-02-22

    RaftStorage is now refactored to:

    • RaftLogReader to read data from the log in parallel tasks independent of the main Raft loop
    • RaftStorage to modify the log and the state machine (implements also RaftLogReader) intended to be used in the main Raft loop
    • RaftSnapshotBuilder to build the snapshot in background independent of the main Raft loop

    The RaftStorage API offers to create new RaftLogReader or RaftSnapshotBuilder on it.

    RaftNetwork is also refactored to:

    • RaftNetwork responsible for sending RPCs
    • RaftNetworkFactory responsible for creating instances of RaftNetwork for sending data to a particular node
  • Changed: f40c2055 Add a RaftTypeConfig trait to configure common types; by Ivan Schréter; 2022-02-25

  • Changed: 650e2352 Membership remove redundant field learners: the node ids that are in Membership.nodes but not in Membership.configs are learners; by 张炎泼; 2022-03-07
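
    The rule above (learners are the nodes that are in nodes but in none of the configs) can be sketched as follows. This is a minimal illustration under stated assumptions: ToyMembership, its u64 node ids and its String node values are hypothetical stand-ins, not openraft's Membership type.

    ```rust
    use std::collections::{BTreeMap, BTreeSet};

    /// Toy stand-in for a membership, not openraft's `Membership` type.
    pub struct ToyMembership {
        /// Voter sets; more than one entry means a joint config.
        pub configs: Vec<BTreeSet<u64>>,
        /// Every known node (voters and learners), keyed by node id.
        pub nodes: BTreeMap<u64, String>,
    }

    impl ToyMembership {
        /// Voters are the union of all configs.
        pub fn voter_ids(&self) -> BTreeSet<u64> {
            self.configs.iter().flatten().copied().collect()
        }

        /// Learners are derived, not stored: nodes that appear in no config.
        pub fn learner_ids(&self) -> BTreeSet<u64> {
            let voters = self.voter_ids();
            self.nodes
                .keys()
                .copied()
                .filter(|id| !voters.contains(id))
                .collect()
        }
    }
    ```

    Because the learner set is computed on demand, it cannot drift out of sync with nodes and configs, which is the point of removing the redundant field.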

  • Changed: 81cd3443 EffectiveMembership.log_id to Option; by 张炎泼; 2022-04-05

  • Changed: 67375a2a RaftStorage: use EffectiveMembership instead of Option<_>; by 张炎泼; 2022-04-05

  • Changed: ffc82682 rename ReplicationMetrics and methods in MetricsChangeFlags; by 张炎泼; 2022-04-05

    • Change: rename ReplicationMetrics to ReplicationTargetMetrics

    • Change: rename LeaderMetrics to ReplicationMetrics

  • Changed: 7b1d4660 rename RaftMetrics.leader_metrics to replication; by 张炎泼; 2022-04-06

  • Changed: 30b485b7 rename State to ServerState; by 张炎泼; 2022-04-16

  • Changed: ca8a09c1 rename InitialState to RaftState; by 张炎泼; 2022-04-16

  • Changed: 8496a48a add error Fatal::Panicked, storing RaftCore panic; by 张炎泼; 2022-05-09

    Changes:

    • Add committed_membership to RaftState, to store the previous committed membership config.

    • Change: RaftStorage::get_membership() returns a vec of at most 2 memberships.

    • Change: RaftStorage::last_membership_in_log() returns a vec of at most 2 memberships.

  • Changed: 1f645feb add last_membership to SnapshotMeta; by 张炎泼; 2022-05-12

  • Changed: bf4e0497 Make serde optional; by devillve084; 2022-05-22

  • Changed: b96803cc external_request() replace the 1st arg ServerState with RaftState; by 张炎泼; 2022-06-08

    This change lets users do more things with an external fn request.

  • Changed: d81c7279 after shutdown(), it should return an error when accessing Raft, instead of panicking.; by devillve084; 2022-06-16

  • Changed: 0de003ce remove RaftState.last_log_id and RaftState.last_purged_log_id; by 张炎泼; 2022-06-22

    Remove these two fields, which are already included in RaftState.log_ids; use last_log_id() and last_purged_log_id() instead.

  • Changed: 7f00948d API: cleanup APIs in Membership and EffectiveMembership; by 张炎泼; 2022-06-29

    • Refactor: move impl of QuorumSet from Membership to EffectiveMembership.

      Add a field EffectiveMembership.quorum_set, to store a QuorumSet built from the Membership config. This quorum set can have a different structure from the Membership, to optimize quorum checks.

    • Refactor: impl methods in Membership or EffectiveMembership with Iterator if possible.

    • Refactor: use term voter and learner for methods and fields.

  • Changed: 01a16d08 remove tx from spawn_replication_stream(); by 张炎泼; 2022-07-01

    Replication should not be responsible for invoking the callback when replication becomes up to date. It makes the logic dirty. Such a job can be done by watching the metrics change.

    • Change: API: AddLearnerResponse has a new field membership_log_id which is the log id of the membership log that contains the newly added learner.
  • Changed: 6b9ae52f remove error AddLearnerError::Exists; by 张炎泼; 2022-07-01

    Even when the learner to add already exists, the caller may still want to block until the replication catches up. Thus it does not expect an error.

    And Exists is not an issue the caller has to deal with, it does not have to be an error.

  • Changed: d7afc721 move default impl methods in RaftStorage to StorageHelper.; by 张炎泼; 2022-07-01

    • get_initial_state()
    • get_log_id()
    • get_membership()
    • last_membership_in_log()

    In the trait RaftStorage, these methods have default implementations that users do not need to care about. They should no longer be methods that a user may need to implement.

    To upgrade:

    If you have been using these methods, replace sto.xxx() with StorageHelper::new(&mut sto).xxx().
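
    The helper pattern behind this upgrade step can be sketched as below. It is only an illustration: ToyStorage, ToyStorageHelper, MemStore and last_log_index() are hypothetical names, and the real StorageHelper wraps RaftStorage and exposes the four methods listed above.

    ```rust
    /// Hypothetical storage trait: only the basic method an application implements.
    pub trait ToyStorage {
        /// Indexes of all log entries currently in the store, in ascending order.
        fn log_indexes(&mut self) -> Vec<u64>;
    }

    /// Helper that borrows a store and offers methods derived from the basic ones.
    pub struct ToyStorageHelper<'a, S: ToyStorage> {
        sto: &'a mut S,
    }

    impl<'a, S: ToyStorage> ToyStorageHelper<'a, S> {
        pub fn new(sto: &'a mut S) -> Self {
            Self { sto }
        }

        /// Derived method: built purely on top of `log_indexes()`.
        pub fn last_log_index(&mut self) -> Option<u64> {
            self.sto.log_indexes().last().copied()
        }
    }

    /// Minimal in-memory store used to exercise the helper.
    pub struct MemStore {
        pub indexes: Vec<u64>,
    }

    impl ToyStorage for MemStore {
        fn log_indexes(&mut self) -> Vec<u64> {
            self.indexes.clone()
        }
    }
    ```

    A call site then reads ToyStorageHelper::new(&mut sto).last_log_index(), mirroring the sto.xxx() to StorageHelper::new(&mut sto).xxx() rewrite described above.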

  • Changed: a010fddd Stop replication to removed node at once when new membership is seen; by 张炎泼; 2022-07-12

    Before this commit, when membership changes, e.g., from a joint config [(1,2,3), (3,4,5)] to uniform config [3,4,5] (assuming the leader is 3), the leader stops replication to 1,2 when [3,4,5] is committed.

    This is an unnecessarily complicated solution. It is OK for the leader to stop replication to 1,2 as soon as config [3,4,5] is seen, instead of when config [3,4,5] is committed.

    • If the leader(3) finally committed [3,4,5], it will eventually stop replication to 1,2.
    • If the leader(3) crashes before committing [3,4,5]:
      • And a new leader sees the membership config log [3,4,5], it will continue to commit it and finally stop replication to 1,2.
      • Or a new leader does not see membership config log [3,4,5], it will re-establish replication to 1,2.

    In any case, stopping replication at once is OK.

    One of the considerations about this modification is: The nodes, e.g., 1,2 do not know they have been removed from the cluster:

    • A removed node will enter the candidate state and keep increasing its term and electing itself. This won't affect the working cluster:

      • The nodes in the working cluster have greater logs; thus, the election will never succeed.

      • The leader won't try to communicate with the removed nodes thus it won't see their higher term.

    • Removed nodes should eventually be shut down. No matter whether the leader replicates the membership without these removed nodes to them, there should always be an external process that shuts them down, because there is no guarantee that a removed node can receive the membership log in a finite time.
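
    The set of replication streams to drop when the new config is seen can be sketched like this, using the [(1,2,3), (3,4,5)] to [3,4,5] example above. The function names and the u64 node ids are assumptions made for illustration; this is not the Engine's actual code.

    ```rust
    use std::collections::BTreeSet;

    /// Nodes the leader replicates to under a (possibly joint) config:
    /// every node in any config, except the leader itself.
    pub fn replication_targets(configs: &[BTreeSet<u64>], leader: u64) -> BTreeSet<u64> {
        configs
            .iter()
            .flatten()
            .copied()
            .filter(|id| *id != leader)
            .collect()
    }

    /// Targets to drop as soon as the new config is seen, without waiting
    /// for it to be committed.
    pub fn targets_to_remove(
        old: &[BTreeSet<u64>],
        new: &[BTreeSet<u64>],
        leader: u64,
    ) -> BTreeSet<u64> {
        let before = replication_targets(old, leader);
        let after = replication_targets(new, leader);
        before.difference(&after).copied().collect()
    }
    ```

    With the joint config as old, [3,4,5] as new and leader 3, the function yields nodes 1 and 2, the two streams that are stopped at once.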

    Changes:

    • Change: remove config remove_replication, since replication will be removed at once.

    • Refactor: Engine outputs Command::UpdateReplicationStream to inform the Runtime to update replication, when membership changes.

    • Refactor: remove ReplicationState.failures; replication no longer needs to count failures to decide when to remove it.

    • Refactor: remove ReplicationState.matched: the matched log id has been tracked by Engine.state.leader.progress.

    • Fix: #446

  • Changed: 2d1aff03 error InProgress: add field committed; by 张炎泼; 2022-07-15

    • Refactor: Simplify Engine command executor
  • Changed: 8c7f0857 remove ClientWriteRequest; by 张炎泼; 2022-08-01

    Remove struct ClientWriteRequest. ClientWriteRequest is barely a wrapper that does not provide any additional function.

    Raft::client_write(ClientWriteRequest) is changed to Raft::client_write(app_data: D), where D is application defined AppData implementation.

  • Changed: 565b6921 ErrorSubject::Snapshot(SnapshotSignature); by 张炎泼; 2022-08-02

    Change ErrorSubject::Snapshot(SnapshotMeta) to ErrorSubject::Snapshot(SnapshotSignature).

    SnapshotSignature is the same as SnapshotMeta except it does not include Membership information. This way errors do not have to depend on type Node, which is used in Membership and is an application-specific type.

    Then when a user-defined generic type NodeData is introduced, error types do not need to change.

    • Part of: #480
  • Changed: e4b705ca Turn Node into a trait (#480); by Heinz N. Gies; 2022-08-03

    Structs that depend on Node now have to implement trait Node, or use a predefined basic implementation BasicNode. E.g., struct Membership now has two type parameters: impl<NID, N> Membership<NID, N> where N: Node, NID: NodeId.

  • Changed: c836355a Membership.nodes remove Option from value; by 张炎泼; 2022-08-04

    Before this commit, the value of Membership.nodes is Option<N: Node>: Membership.nodes: BTreeMap<NID, Option<N>>

    The value does not have to be an Option. If an application does not need openraft to store the Node data, it can just implement trait Node with an empty struct, or just use BasicNode as a placeholder.

    • Using Option<N> as the value is a legacy and since #480 is merged, we do not need the Option any more.
  • Changed: 70e3318a SnapshotMeta.last_log_id from LogId to Option of LogId; by 张炎泼; 2022-08-17

    SnapshotMeta.last_log_id should be the same type as StateMachine.last_applied.

    By making SnapshotMeta.last_log_id an Option of LogId, a snapshot can be built on an empty state machine (in which last_applied is None).

  • Changed: d0d04b28 only purge logs that are in snapshot; by 张炎泼; 2022-08-28

    Let snapshot+logs be a complete state of a raft node.

    The previous assumption was that state_machine+logs is a complete state of a raft node. This requires the state machine to persist its state every time a log is applied, which would be an unnecessary overhead.

    • Change: remove ENV config entries. Do not let a lib be affected by environment variables.

    • Change: remove Config.keep_unsnapshoted_log: now by default, logs not included in snapshot won't be deleted.

      Rename Config.max_applied_log_to_keep to max_in_snapshot_log_to_keep.

  • Changed: 3111e7e6 RaftStorage::install_snapshot() does not need to return state changes; by 张炎泼; 2022-08-28

    The caller of RaftStorage::install_snapshot() knows about what changes have been made, the return value is unnecessary.

  • Changed: a12fd8e4 remove error MissingNodeInfo; by 张炎泼; 2022-11-02

    Because in a membership the type Node is not an Option any more, MissingNodeInfo error will never occur.

  • Changed: dbeae332 rename IntoOptionNodes to IntoNodes; by 张炎泼; 2022-11-02

  • Changed: e8ec9c50 EffectiveMembership::get_node() should return an Option; by 张炎泼; 2022-11-02

    EffectiveMembership::get_node() should return an Option<&Node> instead of a &Node. Otherwise it panics if the node is not found.

  • Changed: 93116312 remove error NodeNotFound; by 张炎泼; 2022-12-28

    A node is stored in Membership thus it should always be found. Otherwise it is a bug of openraft. In either case, there is no need for an application to deal with RPCError::NodeNotFound error.

    An application that needs such an error should define it as an application error.

    • Migration guide: if you have been using it, you can just replace NodeNotFound with NetworkError.

    • Fix: #623

  • Changed: e1238428 RaftState: add field snapshot_meta; by 张炎泼; 2022-12-30

    Snapshot meta should be part of the RaftState. Move it from Engine to RaftState

  • Changed: 2dd81018 make Raft::new() async and let it return error during startup; by 张炎泼; 2023-01-02

    • Change: move startup process from RaftCore::do_main() to Raft::new(), so that an error during startup can be returned earlier.

      Upgrade guide: an application has to consume the returned future with Raft::new().await and handle the error returned by the future.

    • Refactor: move id from Engine.id to Engine.config.id, so that accessing constant attribute does not depend on a reference to Engine.

  • Changed: 3d5e0016 A restarted leader should enter leader state at once, without another round of election; by 张炎泼; 2023-01-04

    • Test: single-node restart test does not expect the node to run election any more.

    • Refactor: add VoteHandler to handle vote related operations.

    • Change: make ServerState default value Learner.

    • Fix: #607

  • Changed: 77e87a39 remove InitializeError::NotAMembershipEntry error; by 张炎泼; 2023-02-12

    Such an error can only be caused by internal calls. An application does not need to handle it.

  • Changed: fbb3f211 add RaftError as API return error type.; by 张炎泼; 2023-02-12

    Add RaftError<E> as the error type returned by every Raft::xxx() API. RaftError has two variants: a Fatal error or an API-specific error. This way, API errors such as AppendEntriesError do not have to include a Fatal in them.

    Upgrade tip:

    The affected type is mainly the trait RaftNetwork: an application should replace AppendEntriesError, VoteError, InstallSnapshotError with RaftError<_>, RaftError<_>, and RaftError<_, InstallSnapshotError>.

    The same applies to other parts, e.g., Raft::append_entries() now returns Result<AppendEntriesResponse, RaftError<_>>; an application should also rewrite the error handling that calls these APIs.

    See changes in examples/.
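
    The two-variant shape described above can be sketched with toy types. Everything here is a hypothetical stand-in (ToyRaftError, ToyFatal, ToyInstallSnapshotError, describe()); the real RaftError has additional type parameters, as the RaftError<_, InstallSnapshotError> spelling above suggests.

    ```rust
    use std::fmt::Debug;

    /// Toy stand-in for a fatal error: the node cannot continue to work.
    #[derive(Debug, PartialEq)]
    pub enum ToyFatal {
        StorageError(String),
        Stopped,
    }

    /// Toy version of the two-variant error: either API specific or fatal.
    #[derive(Debug, PartialEq)]
    pub enum ToyRaftError<E> {
        /// Error that only one particular API can return.
        ApiError(E),
        /// Error that any API can return.
        Fatal(ToyFatal),
    }

    /// Hypothetical API-specific error.
    #[derive(Debug, PartialEq)]
    pub enum ToyInstallSnapshotError {
        SnapshotMismatch,
    }

    /// Example of caller-side handling: the fatal case is matched once,
    /// in the same way for every API.
    pub fn describe<E: Debug>(res: Result<u64, ToyRaftError<E>>) -> String {
        match res {
            Ok(v) => format!("ok: {}", v),
            Err(ToyRaftError::ApiError(e)) => format!("api error: {:?}", e),
            Err(ToyRaftError::Fatal(f)) => format!("fatal: {:?}", f),
        }
    }
    ```

    With this layout an API-specific enum such as ToyInstallSnapshotError no longer needs its own Fatal variant.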

  • Changed: d1b3b232 remove RaftNetworkFactory::ConnectionError and AddLearnerError::NetworkError; by 张炎泼; 2023-02-12

    RaftNetworkFactory::new_client() does not return an error because openraft can only ignore it. Therefore it should not create a connection but rather a client that will connect when required. Thus there is a chance it will build a client that is unable to send out anything, e.g., in case the Node network address is configured incorrectly.

    Because of the above change, AddLearnerError will not include a NetworkError any more: when adding a learner, the connectivity cannot be effectively detected.

    Upgrade tip:

    Just update the application network implementation so that it compiles.

  • Changed: 0161a3d2 remove AddLearnerResponse and AddLearnerError; by 张炎泼; 2023-02-17

    In openraft, adding a learner is done by committing a membership config log, which is almost the same as committing any other log.

    AddLearnerResponse contains a field matched to indicate the replication state to the learner, which is not included in ClientWriteResponse. This information can be retrieved via Raft::metrics().

    Therefore to keep the API simple, replace AddLearnerResponse with ClientWriteResponse.

    Behavior change: adding a learner always commits a new membership config log, no matter whether it already exists in the membership. To avoid a duplicated add, an application should check existence first by examining Raft::metrics().

    • Fix: #679

    Upgrade tips:

    • Replace AddLearnerResponse with ClientWriteResponse
    • Replace AddLearnerError with ClientWriteError

    Then make the application compile.

    See the changes in examples/.

  • Changed: 9906d6e9 remove non-blocking membership change; by 张炎泼; 2023-02-18

    When changing membership in nonblocking mode, the leader submits a membership config log but does not wait for the log to be committed.

    This is useless because the caller has to assert the log is committed, by periodically querying the metrics of a raft node until it is finally committed, which actually makes it a blocking routine.

    API changes:

    • Removes allow_lagging parameter from Raft::change_membership()
    • Removes error LearnerIsLagging

    Upgrade tip:

    Adjust API calls to make it compile.

    Refactor: move leader_append_entries() to LeaderHandler.

  • Changed: f591726a trait IntoNodes adds two new method has_nodes() and node_ids(); by 张炎泼; 2023-02-19

    trait IntoNodes converts types T such as Vec or BTreeSet into BTreeMap<NID, Node>.

    This patch changes the functionality of the IntoNodes trait to provide two new methods has_nodes() and node_ids(), in addition to the existing into_nodes() method. The has_nodes() method returns true if the type T contains any Node objects, and node_ids() returns a Vec of the NodeId objects associated with the Node objects in T.

    Refactor:

    The patch also refactors the Membership::next_safe() method to return an Err(LearnerNotFound) if it attempts to build a Membership object containing a voter_id that does not correspond to any Node.

  • Changed: 55217aa4 move default implemented method from trait RaftLogReader to StorageHelper; by 张炎泼; 2023-02-21

    Functions get_log_entries() and try_get_log_entry() are provided by trait RaftLogReader with default implementations. However, they do not need to be part of this trait and an application does not have to implement them.

    Therefore in this patch they are moved to StorageHelper struct, which provides additional storage access methods that are built based on the RaftStorage trait.

  • Changed: 0a1dd3d6 replace EffectiveMembership with StoredMembership in RaftStorage; by 张炎泼; 2023-02-26

    EffectiveMembership is a struct used at runtime, which contains additional information such as an optimized QuorumSet implementation that has different structure from a Membership.

    To better separate concerns, a new struct called StoredMembership has been introduced specifically for storage purpose. It contains only the information that needs to be stored in storage. Therefore, StoredMembership is used instead of EffectiveMembership in RaftStorage.

    Upgrade tip:

    Replace EffectiveMembership with StoredMembership in an application.

    Fields in EffectiveMembership are made private and can be accessed via corresponding methods such as: EffectiveMembership.log_id and EffectiveMembership.membership should be replaced with EffectiveMembership::log_id() and EffectiveMembership::membership().

Added:

  • Added: 966eb287 use a version to track metrics change; by 张炎泼; 2022-03-03

    Add Versioned<D> to track changes of an Arc<D>.

    In openraft, some frequently updated objects such as metrics are wrapped in an Arc, and some modifications are made in place, by storing an AtomicU64.

  • Added: 80f89134 Add support for external requests to be executed inside of Raft core loop; by Ivan Schréter; 2022-03-05

    The new feature is also exposed via RaftRouter test fixture and tested in the initialization test (in addition to the original checks).

  • Added: 2a5c1b9e add feature-flag: bt enables backtrace; by 张炎泼; 2022-03-12

  • Added: 16406aec add error NotAllowed; by 张炎泼; 2022-04-06

  • Added: a8655446 InitialState: add last_purged_log_id; by 张炎泼; 2022-04-07

  • Added: 6f20e1fc add trait RaftPayload RaftEntry to access payload and entry without the need to know about user data, i.e., AppData or AppDataResponse.; by 张炎泼; 2022-04-07

  • Added: 67c870e2 add err: NotAMembershipEntry, NotInMembers; by 张炎泼; 2022-04-16

  • Added: 675a0f8f Engine stores log ids; by 张炎泼; 2022-04-16

  • Added: 2262c79f LogIdList: add method purge() to delete log ids; by 张炎泼; 2022-05-08

  • Added: ff898cde Engine: add method: purge_log(); by 张炎泼; 2022-05-08

  • Added: 4d0918f2 Add rocks based example; by Heinz N. Gies; 2022-07-05

  • Added: 86eb2981 Raft::enable_tick() to enable or disable election timeout; by 张炎泼; 2022-07-31

  • Added: 956177df use blank log for heartbeat (#483); by 张炎泼; 2022-08-01

    • Feature: use blank log for heartbeat

    Heartbeat in standard raft is the way for a leader to assert it is still alive.

    • A leader sends a heartbeat at a regular interval.
    • A follower that receives a heartbeat believes there is an active leader; thus, for a short period, it rejects election requests (send_vote) from another node that cannot reach the leader.

    Openraft heartbeat is a blank log

    Such a heartbeat mechanism depends on clock time. But raft as a distributed consensus already has its own pseudo time defined very well. The pseudo time in openraft is a tuple (vote, last_log_id), compared in dictionary order.

    Why it works

    To refuse the election by a node that does not receive recent messages from the current leader, just let the active leader send a blank log to increase the pseudo time on a quorum.

    Because the leader must have the greatest pseudo time, by comparing the pseudo time a follower automatically refuses election requests from a node that cannot reach the leader.

    And comparing the pseudo time is already done by handle_vote_request(), there is no need to add another timer for the active leader.
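
    The dictionary-order comparison of the (vote, last_log_id) tuple can be illustrated with derived ordering. This sketch only shows the ordering itself; ToyVote, ToyLogId, PseudoTime and after_blank_log() are hypothetical names, and it does not reproduce the checks in handle_vote_request().

    ```rust
    /// Toy vote, ordered by term and then by node id.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub struct ToyVote {
        pub term: u64,
        pub node_id: u64,
    }

    /// Toy log id, ordered by term and then by index.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub struct ToyLogId {
        pub term: u64,
        pub index: u64,
    }

    /// Derived `Ord` compares fields in declaration order, which gives the
    /// dictionary order of the tuple (vote, last_log_id).
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub struct PseudoTime {
        pub vote: ToyVote,
        pub last_log_id: Option<ToyLogId>,
    }

    impl PseudoTime {
        /// Appending a blank log in the current term advances the pseudo
        /// time without consulting any wall clock.
        pub fn after_blank_log(self) -> PseudoTime {
            let next_index = self.last_log_id.map_or(0, |id| id.index + 1);
            PseudoTime {
                vote: self.vote,
                last_log_id: Some(ToyLogId {
                    term: self.vote.term,
                    index: next_index,
                }),
            }
        }
    }
    ```

    Each blank log moves the pseudo time of the nodes that store it forward, so a node that has not heard from the leader falls behind in last_log_id.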

    Other changes:

    • Feature: add API to switch timeout based events:

      • Raft::enable_tick(): switch on/off election and heartbeat.
      • Raft::enable_heartbeat(): switch on/off heartbeat.
      • Raft::enable_elect(): switch on/off election.

      These methods make some testing code easier to write. The corresponding Config entries are also added: Config::enable_tick, Config::enable_heartbeat, Config::enable_elect.

    • Refactor: remove Engine Command::RejectElection. Rejecting election now is part of handle_vote_req() as blank-log heartbeat is introduced.

    • Refactor: heartbeat is removed from ReplicationCore. Instead, heartbeat is emitted by RaftCore.

    • Fix: when failing to send append-entries, do not clear the need_to_replicate flag.

    • CI: add test with higher network delay.

    • Doc: explain why using blank log as heartbeat.

    • Fix: #151

  • Added: b6817758 add Raft::trigger_elect() and Raft::trigger_heartbeat() to let user manually trigger a election or send a heartbeat log; by 张炎泼; 2022-08-06

  • Added: f437cda0 add Raft::trigger_snapshot() to manually trigger to build snapshot at once; by 张炎泼; 2022-08-07

  • Added: eae08515 Added sled store example based on rocks example; by kus; 2022-08-16

  • Added: 07a2a677 adding a snapshot finalize timeout config; by Zach Schoenberger; 2022-11-09

  • Added: 2877be0c add config: send_snapshot_timeout; by Zach Schoenberger; 2022-11-09

  • Added: 541e9d36 add "Inflight" to store info about inflight replication data; by 张炎泼; 2023-01-17

  • Added: 4a85ee93 feature flag "single-term-leader": standard raft mode; by 张炎泼; 2023-02-13

    With this feature on, only one leader can be elected in each term, but the LogId size is reduced from LogId{term, node_id, index} to LogId{term, index}.

    Add CommittedLeaderId as the leader-id type used in LogId: The leader-id used in LogId can be different(smaller) from leader-id used in Vote, depending on LeaderId definition. CommittedLeaderId is the smallest data that can identify a leader after the leadership is granted by a quorum(committed).

    Change: Vote stores a LeaderId in it.

    // Before
    pub struct Vote<NID> {
        pub term: u64,
        pub node_id: NID,
        pub committed: bool,
    }
    
    // After
    pub struct Vote<NID> {
        #[cfg_attr(feature = "serde", serde(flatten))]
        pub leader_id: LeaderId<NID>,
    
        pub committed: bool,
    }

    Upgrade tip:

    If you manually serialize Vote, i.e. without using serde, the serialization part should be rewritten.

    Otherwise, nothing needs to be done.

    • Fix: #660
  • Added: 4f4b05f6 add v07-v08 compatible store rocksstore-compat07; by 张炎泼; 2023-02-25

v0.7.4

Changed:

  • Changed: 1bd22edc remove AddLearnerError::Exists, which is not actually used; by 张炎泼; 2022-09-30

  • Changed: c6fe29d4 change-membership does not return error when replication lags; by 张炎泼; 2022-10-22

    If blocking is true, Raft::change_membership(..., blocking) will block until replication to the new nodes becomes up to date. But it won't return an error when proposing the change-membership log.

    • Change: remove two errors: LearnerIsLagging and LearnerNotFound.

    • Fix: #581

Fixed:

  • Fixed: 2896b98e changing membership should not remove replication to all learners; by 张炎泼; 2022-09-30

    When changing membership, replications to the learners(non-voters) that are not added as voter should be kept.

    E.g.: with a cluster of voters {0} and learners {1, 2, 3}, changing membership to {0, 1, 2} should not remove replication to node 3.

    Only replications to removed members should be removed.
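
    The rule can be sketched as a set computation over the example above. The function name and the u64 node ids are assumptions for illustration; the leader's own id is simply never part of the target set here.

    ```rust
    use std::collections::BTreeSet;

    /// Replication targets to keep after a membership change: only voters
    /// that were removed lose their replication stream, so learners that
    /// were not promoted are still replicated to.
    pub fn targets_after_change(
        targets: &BTreeSet<u64>,
        old_voters: &BTreeSet<u64>,
        new_voters: &BTreeSet<u64>,
    ) -> BTreeSet<u64> {
        let removed: BTreeSet<u64> = old_voters.difference(new_voters).copied().collect();
        targets.difference(&removed).copied().collect()
    }
    ```

    For voters {0}, learners {1, 2, 3} and a change to {0, 1, 2}, no voter is removed, so node 3 stays in the result.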

Added:

  • Added: 9a22bb03 add rocks-store as a RaftStorage implementation based on rocks-db; by 张炎泼; 2023-02-22

v0.7.3

Changed:

  • Changed: 25e94c36 InstallSnapshotResponse: replies the last applied log id; Do not install a smaller snapshot; by 张炎泼; 2022-09-22

    A snapshot may not be installed by a follower if it already has a higher last_applied log id locally. In such a case, it just ignores the snapshot and responds with its local last_applied log id.

    This way the applied state(i.e., last_applied) will never revert back.
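
    A minimal sketch of this rule, with hypothetical types (ToyLogId, ToyStateMachine) that stand in for the real state machine and response: the snapshot is installed only if it is ahead of the local state, and the reply always carries the resulting last_applied.

    ```rust
    /// Toy log id, ordered by term and then by index.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    pub struct ToyLogId {
        pub term: u64,
        pub index: u64,
    }

    /// Toy state machine that only tracks how far it has applied.
    pub struct ToyStateMachine {
        pub last_applied: Option<ToyLogId>,
    }

    impl ToyStateMachine {
        /// Install the snapshot only if it is ahead of the local state;
        /// a snapshot that is not ahead is ignored. Either way the reply
        /// is the local `last_applied` after the call, so it never reverts.
        pub fn install_snapshot(&mut self, snapshot_last_log_id: Option<ToyLogId>) -> Option<ToyLogId> {
            if snapshot_last_log_id > self.last_applied {
                self.last_applied = snapshot_last_log_id;
            }
            self.last_applied
        }
    }
    ```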

Fixed:

  • Fixed: 21684bbd potential inconsistency when installing snapshot; by 张炎泼; 2022-09-22

    The conflicting logs that are before snapshot_meta.last_log_id should be deleted before installing a snapshot.

    Otherwise there is a chance that the snapshot is installed but conflicting logs are left in the store when a node crashes.

v0.7.2

Added:

  • Added: 568ca470 add Raft::remove_learner(); by 张炎泼; 2022-09-02

v0.7.1

Added:

  • Added: ea696474 add feature-flag: bt enables backtrace; by 张炎泼; 2022-03-12

    --features bt enables backtrace when generating errors. By default errors do not contain backtrace info.

    Thus openraft can be built on stable rust by default.

    To use on stable rust with backtrace, set RUSTC_BOOTSTRAP=1, e.g.:

    RUSTUP_TOOLCHAIN=stable RUSTC_BOOTSTRAP=1 make test
    

v0.7.0

v0.7.0-alpha.5

v0.7.0-alpha.4

v0.7.0-alpha.3

Changed:

  • Changed: f99ade30 API: move default impl methods in RaftStorage to StorageHelper; by 张炎泼; 2022-07-04

Fixed:

  • Fixed: 44381b0c when handling append-entries, if prev_log_id is purged, it should not delete any logs.; by 张炎泼; 2022-08-14

    When handling append-entries, if the local log at prev_log_id.index is purged, a follower should not believe it is a conflict and should not delete all logs. Otherwise committed logs would be lost.

    To fix this issue, use last_applied instead of committed: last_applied is always the committed log id, while committed is not persisted and may be smaller than the actually applied, when a follower is restarted.

v0.7.0-alpha.2

Fixed:

  • Fixed: 30058c03 #424 wrong range when searching for membership entries: [end-step, end).; by 张炎泼; 2022-07-03

    The iterating range searching for membership log entries should be [end-step, end), not [start, end). With this bug it will return duplicated membership entries.

    • Bug: #424
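
    A backward, chunked search of this kind can be sketched as follows. It is a simplified illustration, not the code that was fixed: the function name, the limit parameter and the use of a set of indexes to mark membership entries are all assumptions.

    ```rust
    use std::collections::BTreeSet;

    /// Scan `[start, end)` backwards in chunks of `step` and return the
    /// indexes of up to `limit` membership entries, newest first.
    pub fn last_membership_indexes(
        membership_at: &BTreeSet<u64>,
        start: u64,
        mut end: u64,
        step: u64,
        limit: usize,
    ) -> Vec<u64> {
        assert!(step > 0, "step must be positive");
        let mut found = Vec::new();
        while start < end && found.len() < limit {
            // Each round reads only [end - step, end), clamped to `start`.
            // Reading [start, end) here would revisit earlier entries.
            let chunk_start = end.saturating_sub(step).max(start);
            for index in (chunk_start..end).rev() {
                if membership_at.contains(&index) {
                    found.push(index);
                    if found.len() == limit {
                        break;
                    }
                }
            }
            end = chunk_start;
        }
        found
    }
    ```

    Because every round shrinks end to the start of the chunk just read, no index is visited twice and no entry can be returned twice.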

v0.7.0-alpha.1

Fixed:

  • Fixed: d836d85c if there may be more logs to replicate, continue to call send_append_entries in next loop, no need to wait heartbeat tick; by lichuang; 2022-01-04

  • Fixed: 5a026674 defensive_no_dirty_log hangs tests; by YangKian; 2022-01-08

  • Fixed: 8651625e save leader_id if a higher term is seen when handling append-entries RPC; by 张炎泼; 2022-01-10

    Problem:

    A follower saves hard state (term=msg.term, voted_for=None) when msg.term > local.term while handling an append-entries RPC.

    This is quite enough to be correct but not perfect. Correct because:

    • In one term, only an established leader will send append-entries;

    • Thus, there is a quorum voted for this leader;

    • Thus, no matter what voted_for is saved, it is still correct. E.g. when handling append-entries, a follower node could save hard state (term=msg.term, voted_for=Some(ANY_VALUE)).

    The problem is that a follower already knows the legal leader for a term but still does not save it. This leads to an unstable cluster state: The test sometimes fails.

    Solution:

    A follower always saves hard state with the id of a known legal leader.
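
    The difference can be sketched with a toy hard state. ToyHardState and on_append_entries() are hypothetical names used only for illustration; the sketch covers just the "higher term is seen" branch discussed above.

    ```rust
    /// Toy hard state: the current term and who this node voted for.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct ToyHardState {
        pub term: u64,
        pub voted_for: Option<u64>,
    }

    /// Hard state to persist when an append-entries RPC from `leader_id`
    /// arrives with term `msg_term`; `None` if no higher term is seen.
    pub fn on_append_entries(
        local: ToyHardState,
        msg_term: u64,
        leader_id: u64,
    ) -> Option<ToyHardState> {
        if msg_term > local.term {
            // Before the fix this stored `voted_for: None`; the sender is
            // already known to be the legal leader of `msg_term`.
            Some(ToyHardState {
                term: msg_term,
                voted_for: Some(leader_id),
            })
        } else {
            None
        }
    }
    ```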

  • Fixed: 1a781e1b when lack entry, the snapshot to build has to include at least all purged logs; by 张炎泼; 2022-01-18

  • Fixed: a0a94af7 span.enter() in async loop causes memory leak; by 张炎泼; 2022-06-17

    It is explained in: https://onesignal.com/blog/solving-memory-leaks-in-rust/

Changed:

  • Changed: c9c8d898 trait RaftStore: remove get_membership_config(), add last_membership_in_log() and get_membership() with default impl; by drdr xp; 2022-01-04

    Goal: minimize the work for users to implement a correct raft application.

    Now RaftStorage provides default implementations for get_membership() and last_membership_in_log().

    These two methods can be implemented on top of the other basic methods that a user implements.

    • fix: #59
  • Changed: abda0d10 rename RaftStorage methods do_log_compaction: build_snapshot, delete_logs_from: delete_log; by 张炎泼; 2022-01-15

  • Changed: a52a9300 RaftStorage::get_log_state() returns last purge log id; by 张炎泼; 2022-01-16

    • Change: get_log_state() returns the last_purged_log_id instead of the first_log_id, because there are cases in which the log is empty: when a snapshot is installed that covers all logs, or when max_applied_log_to_keep is 0.

      Returning None does not make clear whether there were never any logs or all logs have been deleted.

      In such cases, raft still needs to maintain log continuity when replicating. Thus the last log id that once existed is important. Previously this was done by checking the last_applied_log_id, which is dirty and buggy.

      Now an implementation of RaftStorage has to maintain the last_purged_log_id in its store.

    • Change: Remove first_id_in_log(), last_log_id(), first_known_log_id(), because concepts are changed.

    • Change: Split delete_logs() into two methods for clarity:

      delete_conflict_logs_since() for deleting conflicting logs when the receiving end of replication finds a conflicting log.

      purge_logs_upto() for cleaning applied logs.

    • Change: Rename finalize_snapshot_installation() to install_snapshot().

    • Refactor: Remove initial_replicate_to_state_machine(), which does nothing more than a normal applying-logs.

    • Refactor: Remove enum UpdateCurrentLeader. It is just a wrapper of Option.
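
Why a purged log id is more informative than a first log id can be sketched as below. The struct `LogState`, its field names and the helper methods are assumptions made for illustration; they are not taken from the commit.

```rust
/// Simplified log id: ordered by term first, then index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// Hypothetical return value of get_log_state() (field names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogState {
    /// Last log id that was purged; None if nothing was ever purged.
    pub last_purged_log_id: Option<LogId>,
    /// Last log id that exists or once existed; equals last_purged_log_id
    /// when every entry has been purged.
    pub last_log_id: Option<LogId>,
}

impl LogState {
    /// True if no entry is physically present in the log.
    pub fn has_no_entries(&self) -> bool {
        self.last_log_id == self.last_purged_log_id
    }

    /// True only for a store that never held any log.
    pub fn never_had_logs(&self) -> bool {
        self.last_log_id.is_none() && self.last_purged_log_id.is_none()
    }

    /// Log id that the next appended entry must follow, so that log
    /// continuity can be checked even when the log is empty.
    pub fn prev_of_next_entry(&self) -> Option<LogId> {
        self.last_log_id
    }
}
```

With a first_log_id alone, a fresh store and a store whose logs are all covered by a snapshot would both report None; here they differ in `never_had_logs()`.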

  • Changed: 7424c968 remove unused error MembershipError::Incompatible; by 张炎泼; 2022-01-17

  • Changed: beeae721 add ChangeMembershipError sub error for reuse; by 张炎泼; 2022-01-17

v0.6.4

v0.6.3

v0.6.2

Fixed:

  • Fixed: 4d58a51e a non-voter not in joint config should not block replication; by drdr xp; 2021-08-31

  • Fixed: eed681d5 race condition of concurrent snapshot-install and apply.; by drdr xp; 2021-09-01

    Problem:

    Concurrent snapshot-install and apply mess up last_applied.

    finalize_snapshot_installation runs in the RaftCore thread. apply_to_state_machine runs in a separate tokio task(thread).

    Thus there is a chance that last_applied is reset to a previous value:

    • apply_to_state_machine is called and finished in a thread.

    • finalize_snapshot_installation is called in RaftCore thread and finished with last_applied updated.

    • RaftCore thread finished waiting for apply_to_state_machine, and updated last_applied to a previous value.

    RaftCore: -.    install-snapshot,         .-> replicate_to_sm_handle.next(),
               |    update last_applied=5     |   update last_applied=2
               |                              |
               v                              |
    task:      apply 2------------------------'
    --------------------------------------------------------------------> time
    

    Solution:

    Rule: All changes to state machine must be serialized.

    A temporary, simple solution for now is to call all methods that modify the state machine in the RaftCore thread. But this blocks the RaftCore thread.

    A better way is to move all tasks that modify the state machine to a standalone thread, and send update requests back to RaftCore to update its fields such as last_applied.

  • Fixed: a48a3282 handle-vote should compare last_log_id in dictionary order, not in vector order; by drdr xp; 2021-09-09

    A log {term:2, index:1} is definitely greater than log {term:1, index:2} in raft spec. Comparing log id in the way of term1 >= term2 && index1 >= index2 blocks election: no one can become a leader.
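
The difference between the two comparisons can be sketched with a simplified `LogId`; the struct and the function names `vector_ge` and `dictionary_ge` are illustrative assumptions, not the crate's code. Deriving `Ord` on a struct compares fields in declaration order, which gives exactly the dictionary order on (term, index).

```rust
/// Simplified log id. The derived ordering compares `term` first and
/// falls back to `index` only when the terms are equal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// The buggy comparison: both components must be greater or equal.
pub fn vector_ge(a: LogId, b: LogId) -> bool {
    a.term >= b.term && a.index >= b.index
}

/// The correct comparison: dictionary (lexicographic) order.
pub fn dictionary_ge(a: LogId, b: LogId) -> bool {
    a >= b
}
```

For `{term:2, index:1}` and `{term:1, index:2}`, `vector_ge` is false in both directions, so neither candidate would be considered up to date by the other.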

  • Fixed: 228077a6 a restarted follower should not wait too long to elect. Otherwise the entire cluster hangs; by drdr xp; 2021-11-19

  • Fixed: 6c0ccaf3 consider joint config when starting up and committing.; by drdr xp; 2021-12-24

    • Change: MembershipConfig support more than 2 configs

    • Makes fields in MembershipConfig private. Provides methods to manipulate membership.

    • Fix: commit without replication only when membership contains only one node. Previously it just checked the first config, which resulted in data loss if the cluster was in a joint config.

    • Fix: when starting up, count all nodes, not only the nodes in the first config, to decide if it is a single node cluster.
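
A minimal sketch of the single-node check, assuming a membership is a list of configs that each hold a set of node ids. The type and method names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Simplified membership: one config when uniform, two or more when joint.
pub struct Membership {
    pub configs: Vec<BTreeSet<u64>>,
}

impl Membership {
    /// Every node id that appears in any config.
    pub fn all_nodes(&self) -> BTreeSet<u64> {
        self.configs.iter().flatten().copied().collect()
    }

    /// The check described by the fix: look at all configs.
    pub fn is_single_node(&self, id: u64) -> bool {
        let all = self.all_nodes();
        all.len() == 1 && all.contains(&id)
    }

    /// The old behaviour, kept only for comparison: look at the first config.
    pub fn first_config_is_single_node(&self, id: u64) -> bool {
        match self.configs.first() {
            Some(c) => c.len() == 1 && c.contains(&id),
            None => false,
        }
    }
}
```

In a joint config `[{0}, {0,1,2}]` the old check reports a single-node cluster for node 0, which would let it commit without replicating to nodes 1 and 2.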

  • Fixed: b390356f first_known_log_id() should return the min one in log or in state machine; by drdr xp; 2021-12-28

  • Fixed: cd5a570d clippy warning; by lichuang; 2022-01-02

Changed:

  • Changed: deda6d76 remove PurgedMarker. keep logs clean; by drdr xp; 2021-09-09

    Changing the log (adding a PurgedMarker, originally SnapshotPointer) makes it difficult to impl install-snapshot for a RaftStore without a lock protecting both logs and state machine.

    Adding a PurgedMarker and installing the snapshot have to be atomic in the storage layer. But usually logs and state machine are separate stores, e.g., logs are stored on a fast flash disk and the state machine is stored somewhere else.

    To get rid of the big lock, PurgedMarker is removed and installing a snapshot does not need to be kept consistent with logs any more.

  • Changed: 734eec69 VoteRequest: use last_log_id:LogId to replace last_log_term and last_log_index; by drdr xp; 2021-09-09

  • Changed: 74b16524 introduce StorageError. RaftStorage gets rid of anyhow::Error; by drdr xp; 2021-09-13

    StorageError is an enum of DefensiveError and StorageIOError. An error a RaftStorage impl returns could be a defensive check error or an actual io operation error.

    Why:

    anyhow::Error is not enough to support the flow control in RaftCore. It is untyped, thus RaftCore cannot decide what to do next depending on the returned error.

    Inside raft, anyhow::Error should never be used, although it could be used as source() of some other error types.
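
How a typed error enables flow control can be sketched as below. Only the names StorageError, DefensiveError and StorageIOError come from the note above; the variant shapes, the fields, and the `Action` enum with its policy are assumptions for illustration.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// A defensive check failed: the caller violated a storage invariant.
#[derive(Debug)]
pub struct DefensiveError {
    pub violation: String,
}

/// An actual IO operation failed.
#[derive(Debug)]
pub struct StorageIOError {
    pub source: io::Error,
}

#[derive(Debug)]
pub enum StorageError {
    Defensive(DefensiveError),
    IO(StorageIOError),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::Defensive(e) => write!(f, "defensive check failed: {}", e.violation),
            StorageError::IO(e) => write!(f, "storage io error: {}", e.source),
        }
    }
}

impl Error for StorageError {
    /// An untyped error can still be exposed as the source of a typed one.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StorageError::Defensive(_) => None,
            StorageError::IO(e) => Some(&e.source),
        }
    }
}

/// Hypothetical reactions; the real policy is up to RaftCore.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    ReportBug,
    Shutdown,
}

/// The caller can branch on the variant, which is impossible with an opaque error.
pub fn decide(err: &StorageError) -> Action {
    match err {
        StorageError::Defensive(_) => Action::ReportBug,
        StorageError::IO(_) => Action::Shutdown,
    }
}
```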

  • Changed: 46bb3b1c RaftStorage::finalize_snapshot_installation is no more responsible to delete logs included in snapshot; by drdr xp; 2021-09-13

    A RaftStorage should be as simple and intuitive as possible.

    One should be able to correctly impl a RaftStorage without reading the guide but just by guessing what a trait method should do.

    RaftCore is able to do the job of deleting logs that are included in the state machine, RaftStorage should just do what is asked.

  • Changed: 2cd23a37 use structopt to impl config default values; by drdr xp; 2021-09-14

  • Changed: ac4bf4bd InitialState: rename last_applied_log to last_applied; by drdr xp; 2021-09-14

  • Changed: 74283fda RaftStorage::do_log_compaction() do not need to delete logs any more raft-core will delete them.; by drdr xp; 2021-09-14

  • Changed: 112252b5 RaftStorage add 2 API: last_id_in_log() and last_applied_state(), remove get_last_log_id(); by drdr xp; 2021-09-15

  • Changed: 7f347934 simplify membership change; by drdr xp; 2021-09-16

    • Change: if leadership is lost, the cluster is left with the joint config. A caller that does not receive a response to the change-membership request should always re-send it to ensure the membership config is applied.

    • Change: remove joint-uniform logic from RaftCore, which brings a lot of complexity to the raft impl. This logic is now done in Raft (which is a shell to control RaftCore).

    • Change: RaftCore.membership is changed to ActiveMembership, which includes a log id and a membership config. This change lets raft check whether a membership is committed by comparing the log index and its committed index.

    • Change: when adding an existing non-voter, it returns an Ok value instead of an Err.

    • Change: add arg blocking to add_non_voter and change_membership. A blocking change_membership still waits for the two config change logs to commit. blocking only indicates whether to wait for replication to the non-voter to be up to date.

    • Change: remove non_voters. Merge it into nodes. Now both voters and non-voters share the same replication handle.

    • Change: remove field ReplicationState.is_ready_to_join, it can be just calculated when needed.

    • Change: remove is_stepping_down, membership.contains() is quite enough.

    • Change: remove consensus_state.

  • Changed: df684131 bsearch to find matching log between leader and follower; by drdr xp; 2021-12-17

    • Refactor: simplify algo to find matching log between leader and follower. It adopts a binary-search like algo:

      The leader tracks the max matched log id (self.matched) and the least unmatched log id (self.max_possible_matched_index).

      The follower just responds whether the prev_log_id in AppendEntriesRequest matches the log at prev_log_id.index in its store.

      Remove the case-by-case algo.

    • Change: RaftStorage adds 3 new APIs: try_get_log_entries(), first_id_in_log() and first_known_log_id().

      These are not stable and may be removed soon.

    • Fix: the timeout for Wait() should be a total timeout. Otherwise a Wait() never quits.

    • Fix: when sending an append-entries request, if a log is not found, it should retry loading, but not enter snapshot state, because a log may be deleted by RaftCore just after Replication reads prev_log_id from the store.

    • Refactor: The two replication loops (line-rate loop and snapshot loop) should not change the ReplicationState, but should instead return an error. Otherwise the state has to be checked everywhere.

    • Refactor: simplify receiving RaftCore messages: split drain_raft_rx() into process_raft_event() and try_drain_raft_rx().

    • Feature: a store impl has to add an initial log at index 0 to make the store mathematically complete.

    • Feature: add ReplicationError to describe all errors that are emitted when replicating entries or a snapshot.
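
The binary-search-like matching described in the first Refactor item above can be sketched as follows. This is a model, not the crate's code: logs are reduced to slices of terms, one probe stands for one AppendEntriesRequest round trip, and the function names are hypothetical. It relies on the raft log matching property (if the entry at an index matches, all earlier entries match) and on the initial log at index 0 that both sides share.

```rust
/// Follower side: does the entry at `prev_index` have term `prev_term`?
/// `follower_terms[i]` is the term of the follower's entry at index `i`.
pub fn follower_matches(follower_terms: &[u64], prev_index: usize, prev_term: u64) -> bool {
    follower_terms.get(prev_index) == Some(&prev_term)
}

/// Leader side: find the highest index at which both logs agree.
/// Assumes both logs share the initial entry at index 0.
pub fn find_matched_index(leader_terms: &[u64], follower_terms: &[u64]) -> usize {
    assert!(!leader_terms.is_empty(), "index 0 must hold the initial entry");
    // Highest index known to match so far.
    let mut matched = 0;
    // Highest index that may still match.
    let mut max_possible = leader_terms.len() - 1;
    while matched < max_possible {
        // Probe the upper midpoint so that every round makes progress.
        let probe = matched + (max_possible - matched + 1) / 2;
        if follower_matches(follower_terms, probe, leader_terms[probe]) {
            matched = probe;
        } else {
            max_possible = probe - 1;
        }
    }
    matched
}
```

Each probe halves the candidate range, so the number of round trips grows with the logarithm of the log length instead of linearly.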

  • Changed: 6625484c remove EntryNormal; by drdr xp; 2021-12-23

  • Changed: 61551178 remove EntryMembership; by drdr xp; 2021-12-23

  • Changed: c61b4c49 remove ConflictOpt, which is a wrapper of log_id; add matched log id in AppendEntriesResponse; by drdr xp; 2021-12-23

  • Changed: 3511e439 rename MembershipConfig to Membership; by drdr xp; 2021-12-27

  • Changed: b43c085a track committed log id instead of just a commit index; by drdr xp; 2021-12-29

  • Changed: 8506102f remove unused field SnapshotMeta::membership; by drdr xp; 2021-12-29

Dependency:

Added:

  • Added: 1451f962 Membership provides method is_majority() and simplify quorum calculation for voting; by drdr xp; 2021-12-25
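
What a joint-aware is_majority() has to compute can be sketched as below, assuming a membership is a list of configs and a quorum needs a strict majority in every one of them. The struct layout and the signature are assumptions, not the crate's API.

```rust
use std::collections::BTreeSet;

/// Simplified membership: one config when uniform, two or more when joint.
pub struct Membership {
    pub configs: Vec<BTreeSet<u64>>,
}

impl Membership {
    /// True if `granted` contains a strict majority of every config.
    pub fn is_majority(&self, granted: &BTreeSet<u64>) -> bool {
        !self.configs.is_empty()
            && self.configs.iter().all(|config| {
                let votes = config.intersection(granted).count();
                votes > config.len() / 2
            })
    }
}
```

For a joint config of `{0,1,2}` and `{2,3,4}`, the set `{0,1,2}` is a majority of the first config only, so it is not a quorum.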

  • Added: a2a48c56 make DefensiveCheck a reuseable trait; by drdr xp; 2021-12-26

    • Defensive checks in MemStore are moved out into a trait DefensiveCheck.

    • Let user impl a base RaftStorage. Then raft wraps it with a StoreExt thus the defensive checks apply to every impl of RaftStorage.

v0.6.2-alpha.16

Changed:

  • Changed: 79a39970 to get last_log and membership, Storage should search for both logs and state machines.; by drdr xp; 2021-08-24

    Why:

    Depending on the impl, a RaftStore may still contain logs that are already included in the state machine. This may be caused by a non-transactional impl of the store, e.g. installing a snapshot and removing logs are not atomic.

    Thus when searching for last_log or last membership, a RaftStore should search both logs and state machine, and return the greater one that is found.

    • Test: add test to prove these behaviors, which includes: get_initial_state() and get_membership().

    • Refactor: Make store tests a suite that could be applied to other impl.

  • Changed: 07d71c67 RaftStore::delete_logs_from() use range instead of (start, end); by drdr xp; 2021-08-28

  • Changed: 1c46a712 RaftStore::get_log_entries use range as arg; add try_get_log_entry() that does not return error even when defensive check is on; by drdr xp; 2021-08-28
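
A range argument of this kind can be sketched with `std::ops::RangeBounds`. The `MemLog` type and the exact signatures are assumptions for illustration, not the trait's real methods.

```rust
use std::collections::BTreeMap;
use std::ops::RangeBounds;

/// Toy in-memory log keyed by log index.
pub struct MemLog {
    pub entries: BTreeMap<u64, String>,
}

impl MemLog {
    /// Accepts `a..b`, `a..=b`, `a..` or `..` instead of a (start, end) pair.
    pub fn get_log_entries<R: RangeBounds<u64>>(&self, range: R) -> Vec<(u64, String)> {
        self.entries
            .range(range)
            .map(|(index, entry)| (*index, entry.clone()))
            .collect()
    }

    /// Deletes every entry whose index falls in `range`.
    pub fn delete_logs_from<R: RangeBounds<u64>>(&mut self, range: R) {
        let keys: Vec<u64> = self.entries.range(range).map(|(index, _)| *index).collect();
        for key in keys {
            self.entries.remove(&key);
        }
    }
}
```

An open-ended delete is then written as `log.delete_logs_from(4..)` without a sentinel end value. Note that `BTreeMap::range` panics when the start of the range is greater than its end.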

Added:

  • Added: 420cdd71 add defensive check to MemStore; by drdr xp; 2021-08-28

  • Added: ab6689d9 RaftStore::get_last_log_id() to get the last known log id in log or state machine; by drdr xp; 2021-08-29

Fixed:

  • Fixed: 6d53aa12 too many(50) inconsistent log should not live lock append-entries; by drdr xp; 2021-08-31

    The test to reproduce it fakes a cluster of nodes 0, 1, 2: R0 has 100 uncommitted logs at term 2. R2 has 100 uncommitted logs at term 3.

    R0 ... 2,99 2,100
    R1
    R2 ... 3,99 3,100
    

    Before this fix, when the cluster is brought up, R2 becomes leader and never syncs any log to R0.

    The fix is also quite simple:

    • Search backward instead of forward to find the last log entry that matches prev_log_id.term, and respond with this log id so that the leader sends the next append_entries RPC starting from it.

    • If no such matching term is found, use the first log id it sees, e.g., the entry at index prev_log_id.index - 50, for the next append_entries.
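
The two rules above can be sketched as a follower-side function. The window size of 50 comes from the text; the function name `conflict_hint`, the slice-based log and the exact window boundaries are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// How far back the follower scans, as in the description above.
const MAX_SCAN: u64 = 50;

/// Called when `prev` does not match the follower's log. Returns the log id
/// from which the leader should continue, or None if the window is empty.
pub fn conflict_hint(log: &[LogId], prev: LogId) -> Option<LogId> {
    let start = prev.index.saturating_sub(MAX_SCAN);
    // Entries in [start, prev.index), in ascending index order.
    let window: Vec<LogId> = log
        .iter()
        .copied()
        .filter(|e| e.index >= start && e.index < prev.index)
        .collect();
    window
        .iter()
        .rev()
        // Rule 1: walk backward to the last entry with the same term as prev.
        .find(|e| e.term == prev.term)
        .copied()
        // Rule 2: otherwise fall back to the first entry in the window.
        .or_else(|| window.first().copied())
}
```

In the scenario above, R0 holds only term-2 entries and receives prev_log_id `{3,100}`, so every rejection moves the leader back by the whole window instead of by one entry.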

  • Fixed: 9540c904 when append-entries, deleting entries after prev-log-id causes committed entry to be lost; by drdr xp; 2021-08-31

    Problem:

    When handling append-entries, raft core removes old entries after prev_log_id.index, then appends new logs sent from the leader.

    Since deleting and then appending entries is not atomic (two calls to RaftStore), deleting consistent entries may cause loss of committed entries if the server crashes after the delete.

    E.g., an example cluster state with logs as follows, where R1 is now the leader:

    R1 1,1  1,2  1,3
    R2 1,1  1,2
    R3
    

    Committed entry {1,2} gets lost after the following steps:

    • R1 to R2: append_entries(entries=[{1,2}, {1,3}], prev_log_id={1,1})
    • R2 deletes 1,2
    • R2 crashes
    • R2 is elected as leader with R3, and only sees 1,1; the committed entry 1,2 is lost.

    Solution:

    The safe way is to skip every entry that is consistent with the leader, and delete only the inconsistent entries.

    Another issue with this solution is that:

    Because we can not just delete log[prev_log_id.index..], the commit index:

    • must be updated only after append-entries,
    • and must point to a log entry that is consistent with the leader.

    Otherwise there is a chance of applying an uncommitted entry:

    R0 1,1  1,2  3,3
    R1 1,1  1,2  2,3
    R2 1,1  1,2  3,3
    
    • R0 to R1 append_entries: entries=[{1,2}], prev_log_id = {1,1}, commit_index = 3
    • R1 accepted this append_entries request but was not aware that entry {2,3} is inconsistent with the leader. Updating the commit index to 3 allows it to apply the uncommitted entry {2,3}.
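
Both parts of the fix can be sketched on a toy log that maps index to term. The function names and the map-based log are assumptions; the real code works through RaftStore calls.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// `log` maps index -> term. Entries that already agree with the leader are
/// skipped; only the suffix starting at the first difference is deleted and
/// rewritten. Returns the number of entries deleted.
pub fn append_skipping_consistent(log: &mut BTreeMap<u64, u64>, entries: &[LogId]) -> usize {
    let first_diff = entries
        .iter()
        .position(|e| log.get(&e.index) != Some(&e.term));
    let pos = match first_diff {
        Some(pos) => pos,
        None => return 0, // everything sent is already present and consistent
    };
    let stale: Vec<u64> = log.range(entries[pos].index..).map(|(i, _)| *i).collect();
    for index in &stale {
        log.remove(index);
    }
    for e in &entries[pos..] {
        log.insert(e.index, e.term);
    }
    stale.len()
}

/// The commit index may only advance to an entry verified by this request:
/// the last entry sent, or `prev` when no entries were sent.
pub fn safe_commit_index(leader_commit: u64, prev: LogId, entries: &[LogId]) -> u64 {
    let last_verified = entries.last().map_or(prev.index, |e| e.index);
    leader_commit.min(last_verified)
}
```

In the second scenario above, R1 keeps its stale entry `{2,3}` after accepting `entries=[{1,2}]`, and `safe_commit_index` caps the commit index at 2 so that the stale entry is not applied.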

v0.6.2-alpha.15

Changed:

  • Changed: a1a05bb4 rename Network methods to send_xxx; by drdr xp; 2021-08-23

  • Changed: f168696b rename RaftStorage::Snapshot to RaftStorage::SnapshotData; by drdr xp; 2021-08-23

  • Changed: fea63b2f rename CurrentSnapshotData to Snapshot; by drdr xp; 2021-08-23

  • Changed: fabf3e74 rename RaftStorage::create_snapshot() to RaftStorage::begin_receiving_snapshot; by drdr xp; 2021-08-23

  • Changed: 90329fbf RaftStorage: merge append_entry_to_log and replicate_to_log into one method append_to_log; by drdr xp; 2021-08-24

  • Changed: daf2ed89 RaftStorage: remove apply_entry_to_state_machine; by drdr xp; 2021-08-24

  • Changed: a18b98f1 SnapshotPointer do not store any info.; by drdr xp; 2021-08-24

    Using SnapshotPointer to store membership is a bad idea. It makes proving consistency troublesome, e.g.:

    • When do_log_compaction() is called concurrently (not possible for now, but may be possible in the future): a correctness proof involving multiple components is a nightmare.

    • Proof of correctness of consistency between StateMachine.last_membership and SnapshotPointer.membership is complicated.

    What we need is actually:

    • At least one committed log is left in the log storage,
    • and info in the purged log must be present in the state machine, e.g. membership
  • Changed: 5d0c0b25 rename SnapshotPointer to PurgedMarker; by drdr xp; 2021-08-24

  • Changed: 72b02249 rename replicate_to_state_machine to apply_to_state_machine; by drdr xp; 2021-08-24

v0.6.2-alpha.14

Fixed:

  • Fixed: eee8e534 snapshot replication does not need to send a last 0 size chunk; by drdr xp; 2021-08-22

  • Fixed: 8cd24ba0 RaftCore.entries_cache is inconsistent with storage. removed it.; by drdr xp; 2021-08-23

    • When the leader changes, entries_cache is cleared. Thus there may be cached entries that won't be applied to the state machine.

    • When applying finished, the applied entries are not removed from the cache. Thus there could be entries being applied more than once.

v0.6.2-alpha.13

Fixed:

  • Fixed: 2eccb9e1 install snapshot req with offset GE 0 should not start a new session.; by drdr xp; 2021-08-22

    An install-snapshot always ends with a req whose data len is 0 and whose offset is GE 0. If such a req is re-sent, e.g., on timeout, and the receiver has just finished the previous install-snapshot req (snapshot_state is None) and does not reject an install-snapshot req with offset GE 0, the receiver will try to install a snapshot with empty data. This results in a fatal storage error, since the storage tries to decode empty snapshot data.

    • feature: add config install_snapshot_timeout.

v0.6.2-alpha.12

v0.6.2-alpha.11

v0.6.2-alpha.10

Fixed:

  • Fixed: beb0302b leader should not commit when there is no replication to voters.; by drdr xp; 2021-08-18

    When there is no replication to voters but there are replications to non-voters, the leader did not check non-voters for a quorum but just commits a log at once.

    This causes the membership change log from a single node to always commit. E.g. start node 0 and non-voters 1, 2; then call change_membership({0, 1, 2}): it just commits the joint log at once. But according to the raft paper, it should await a quorum of {0} and a quorum of {0, 1, 2}.

Changed:

  • Changed: 6350514c change-membership should be log driven but not channel driven; by drdr xp; 2021-08-18

    A membership change involves two steps: the joint config phase and the final config phase. Each phase has a corresponding log involved.

    Previously raft set up several channels to organize this workflow, which made the logic hard to understand and introduced complexity on restart or leadership transfer: it needed to re-establish the channels and tasks.

    According to the gist of raft, all workflow should be log driven. Thus the new approach:

    • Write two logs (the joint and the final) as soon as it receives a change-membership request.
    • All subsequent work is driven solely by which log is committed.

    This simplifies the workflow and makes it more reliable and intuitive to understand.

    Related changes:

    • When change_membership is called, append 2 logs at once.

    • Introduce universal response channel type to send back a message when some internal task is done: ResponseTx, and a universal response error type: ResponseError.

    • Internal response channel is now an Option<ResponseTx>, since the first step of membership change does not need to respond to the caller.

    • When a new leader is established, if the last log is a joint config log, append a final config log so that the partial change-membership can complete.

      A test is added for this.

    • Removed membership related channels.

    • Refactor: convert several func from async to sync.

v0.6.2-alpha.9

Changed:

  • Changed: 8b59966d MembershipConfig.member type is changed from HashSet to BTreeSet; by drdr xp; 2021-08-17

v0.6.2-alpha.8

Changed:

  • Changed: adc24f55 pass all logs to apply_entry_to_state_machine(), not just Normal logs.; by drdr xp; 2021-08-16

    Pass Entry<D> to apply_entry_to_state_machine(), not just the only EntryPayload::Normal(normal_log).

    Thus the state machine is able to save the membership changes if it prefers to.

    Why:

    In practice, a snapshot contains info about all applied logs, including the membership config log. Before this change, the state machine does not receive any membership log thus when making a snapshot, one needs to walk through all applied logs to get the last membership that is included in state machine.

    By letting the state machine remember the membership log applied, snapshot creation becomes more convenient and intuitive: it does not need to scan the applied logs any more.
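
A state machine that records membership entries as they are applied can be sketched as below. Only `EntryPayload::Normal` is named in the note above; the `Membership` variant, the `StateMachine` struct and its methods are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Simplified payload: a normal application command or a membership config.
pub enum EntryPayload {
    Normal(String),
    Membership(BTreeSet<u64>),
}

pub struct Entry {
    pub index: u64,
    pub payload: EntryPayload,
}

#[derive(Debug, Default)]
pub struct StateMachine {
    pub last_applied: Option<u64>,
    pub last_membership: Option<BTreeSet<u64>>,
    pub data: Vec<String>,
}

impl StateMachine {
    /// Receives every entry, not only the normal ones.
    pub fn apply(&mut self, entry: &Entry) {
        match &entry.payload {
            EntryPayload::Normal(cmd) => self.data.push(cmd.clone()),
            EntryPayload::Membership(m) => self.last_membership = Some(m.clone()),
        }
        self.last_applied = Some(entry.index);
    }

    /// Snapshot metadata is available without scanning the applied logs.
    pub fn snapshot_meta(&self) -> (Option<u64>, Option<BTreeSet<u64>>) {
        (self.last_applied, self.last_membership.clone())
    }
}
```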

v0.6.2-alpha.7

v0.6.2-alpha.6

Changed:

  • Changed: 82a3f2f9 use LogId to track last applied instead of using just an index.; by drdr xp; 2021-07-19

    Using LogId to track the last applied log provides more info. E.g., previously, creating a snapshot required walking through logs to find the term of the last applied log, as the memstore impl did.

    Using LogId{term, index} is a more natural way in every aspect.

    changes: RaftCore: change type of last_applied from u64 to LogId.

v0.6.2-alpha.5

Fixed:

  • Fixed: fc8e92a8 typo; by drdr xp; 2021-07-12

  • Fixed: 447dc11c when finalize_snapshot_installation, memstore should not load membership from its old log that are going to be overridden by snapshot.; by drdr xp; 2021-07-13

  • Fixed: dba24036 after 2 log compaction, membership should be able to be extract from prev compaction log; by drdr xp; 2021-07-14

Changed:

  • Changed: 7792cccd add CurrentSnapshotData.meta: SnapshotMeta, which is a container of all meta data of a snapshot: last log id included, membership etc.; by drdr xp; 2021-07-13

  • Changed: 0c870cc1 reduce one unnecessary snapshot serialization; by drdr xp; 2021-07-14

    • Change: get_current_snapshot(): remove double-serialization: convert MemStoreSnapshot to CurrentSnapshotData instead of serializing MemStoreSnapshot:

      Before:

      MemStoreSnapshot.data = serialize(state-machine)
      CurrentSnapshotData.data = serialize(MemStoreSnapshot)
      

      After:

      MemStoreSnapshot.data = serialize(state-machine)
      CurrentSnapshotData.data = MemStoreSnapshot.data
      

      when finalize_snapshot_installation, extract snapshot meta info from InstallSnapshotRequest. Reduce one unnecessary deserialization.

    • Change: InstallSnapshotRequest: merge snapshot_id, last_log_id, membership into one field meta.

    • Refactor: use SnapshotMeta(snapshot_id, last_log_id, membership) as a container of metadata of a snapshot. Reduce parameters.

    • Refactor: remove redundant param delete_through from finalize_snapshot_installation.

v0.6.2-alpha.4

Changed:

  • Changed: 954c67a9 InstallSnapshotRequest: merge last_included{term,index} into last_included; by drdr xp; 2021-07-08

  • Changed: 933e0b32 use snapshot-id to identify a snapshot stream; by drdr xp; 2021-07-09

    A snapshot stream should be identified by some id, since the server end should not assume messages arrive in the correct order. Without an id, two install_snapshot requests belonging to different snapshot data may corrupt the snapshot data, explicitly or even worse, silently.

    • Add SnapshotId to identify a snapshot stream.

    • Add SnapshotSegmentId to identify a segment in a snapshot stream.

    • Add field snapshot_id to snapshot related data structures.

    • Add error RaftError::SnapshotMismatch.

    • Storage::create_snapshot() does not need to return an id, since the receiving end keeps at most one snapshot stream session. Instead, Storage::do_log_compaction() should build a unique id every time it is called.

    • When the raft node receives an install_snapshot request, the id must match to continue. A request with a different id should be rejected. A new id with offset=0 indicates the sender has started a new stream. In this case, the old unfinished stream is dropped and cleaned.

    • Add test for install_snapshot API.
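
The receiving rules above can be sketched as a small session tracker. The names `SnapshotReceiver`, `Outcome` and `on_chunk` are hypothetical, a plain string stands in for SnapshotId, and the extra check that offsets are contiguous is an assumption that the text does not state.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// A new stream was started; any unfinished one was dropped.
    Started,
    /// The chunk continues the current stream.
    Accepted,
    /// Wrong id, or no session to continue: the request is rejected.
    Mismatch,
}

struct Session {
    id: String,
    next_offset: u64,
}

/// Keeps at most one snapshot stream session.
#[derive(Default)]
pub struct SnapshotReceiver {
    session: Option<Session>,
    pub data: Vec<u8>,
}

impl SnapshotReceiver {
    pub fn on_chunk(&mut self, snapshot_id: &str, offset: u64, chunk: &[u8]) -> Outcome {
        if offset == 0 {
            // offset=0 means the sender started a new stream.
            self.session = Some(Session {
                id: snapshot_id.to_string(),
                next_offset: chunk.len() as u64,
            });
            self.data = chunk.to_vec();
            return Outcome::Started;
        }
        match self.session.as_mut() {
            // The id must match to continue (contiguity is an added assumption).
            Some(s) if s.id == snapshot_id && s.next_offset == offset => {
                self.data.extend_from_slice(chunk);
                s.next_offset += chunk.len() as u64;
                Outcome::Accepted
            }
            _ => Outcome::Mismatch,
        }
    }
}
```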

  • Changed: 85859d07 CurrentSnapshotData: merge term and index into included.; by drdr xp; 2021-07-09

  • Changed: 5eb9d3af RaftCore: replace snapshot_index with snapshot_last_included: LogId. Keep tracks of both snapshot last log term and index.; by drdr xp; 2021-07-09

    Also SnapshotUpdate::SnapshotComplete now contains a LogId instead of a u64 index.

  • Changed: 9c5f3d7e RaftCore: merge last_log_{term,index} into last_log: LogId; by drdr xp; 2021-07-09

  • Changed: 58d8e3a2 AppendEntriesRequest: merge prev_log_{term,index} into prev_log: LogId; by drdr xp; 2021-07-10

  • Changed: 9e4fb64f InitialState: last_log_{term,index} into last_log: LogId; by drdr xp; 2021-07-10

  • Changed: 24e38130 Entry: merge term and index to log_id: LogId; by drdr xp; 2021-07-11

Added:

  • Added: 8e0b0df9 report snapshot metrics to RaftMetrics::snapshot, which is a LogId: (term, index) that a snapshot includes; by drdr xp; 2021-07-09

    • Add: Wait.snapshot() to watch snapshot changes.
    • Test: replace sleep() with wait_for_snapshot() to speed up tests.

v0.6.2-alpha.3

Dependency:

  • Dependency: b351c87f upgrade tokio from 1.7 to 1.8; by drdr xp; 2021-07-08

Fixed:

  • Fixed: cf4badd0 leader should re-create and send snapshot when threshold/2 < last_log_index - snapshot < threshold; by drdr xp; 2021-07-08

    The problem:

    If last_log_index advances too far beyond snapshot.applied_index, i.e.: threshold/2 < last_log_index - snapshot < threshold (e.g., 10/2 < 16-10 < 20 in the test that reproduces this bug), the leader tries to re-create a new snapshot. But when last_log_index < threshold, it won't create one, which results in a dead loop.

    Solution:

    In such a case, force creation of a snapshot without considering the threshold.

v0.6.2-alpha.2

Dependency:

  • Dependency: 70e1773e adapt to changes of rand-0.8: gen_range() accepts a range instead of two args; by drdr xp; 2021-06-21

Added:

  • Added: 32a67e22 add metrics about leader; by drdr xp; 2021-06-29

    When reporting metrics, LeaderState also reports metrics about the replication to other nodes.

    When switching to another state, LeaderState is destroyed along with the cached replication metrics.

    Other states report a None to raft core to override the previous metrics data.

    At some points the raft core, without knowing the state, just reports metrics with an Update::Ignore, to indicate that the replication metrics should be left intact.
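
The three reporting cases can be sketched with a small update enum. Only `Update::Ignore` is named in the note above; the `Update::Update` variant, the `Metrics` struct and `report()` are assumptions for illustration.

```rust
/// Either replace a value or leave the current one untouched.
#[derive(Debug, Clone, PartialEq)]
pub enum Update<T> {
    Update(T),
    Ignore,
}

/// Replication progress as (node id, matched index) pairs.
pub type ReplicationMetrics = Vec<(u64, u64)>;

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Metrics {
    pub current_term: u64,
    /// Some(..) while this node is a leader, None otherwise.
    pub replication: Option<ReplicationMetrics>,
}

impl Metrics {
    pub fn report(&mut self, term: u64, replication: Update<Option<ReplicationMetrics>>) {
        self.current_term = term;
        if let Update::Update(r) = replication {
            self.replication = r;
        }
    }
}
```

A leader passes `Update::Update(Some(..))`, other states pass `Update::Update(None)`, and code that does not know the state passes `Update::Ignore`.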

Fixed:

  • Fixed: d60f1e85 client_read is using wrong quorum=majority-1; by drdr xp; 2021-07-02

v0.6.2-alpha.1

Added:

  • Added: 1ad17e8e move wait_for_xxx util into metrics.; by drdr xp; 2021-06-16

    Introduce struct Wait as a wrapper of the metrics channel to impl wait-for utils:

    • log(): wait for log to apply.
    • current_leader(): wait for known leader.
    • state(): wait for the role.
    • members(): wait for membership_config.members.
    • next_members(): wait for membership_config.members_after_consensus.

    E.g.:

    // wait forever for raft node's current leader to become 2:
    r.wait(None).current_leader(2).await?;

    The timeout is now an optional arg to all wait_for_xxx functions in fixtures. The wait_for_xxx_timeout functions are all removed.

v0.6.2-alpha

Added:

  • Added: 3388f1a2 link to discord server.; by Anthony Dodd; 2021-05-21

  • Added: bcc246cc a pull request template.; by Anthony Dodd; 2021-05-26

  • Added: ea539069 wait_for_nodes_log(); by drdr xp; 2021-05-24

  • Added: 668ad478 some wait_for func:; by drdr xp; 2021-05-24

    • wait_for_log()
    • wait_for_log_timeout()
    • wait_for_state()
    • wait_for_state_timeout()

Fixed:

  • Fixed: 89bb48f8 last_applied should be updated only when logs actually applied.; by drdr xp; 2021-05-20

  • Fixed: e9f40450 usage of get_storage_handle; by drdr xp; 2021-05-23

  • Fixed: 22cd1a0c clippy complains; by drdr xp; 2021-05-23

  • Fixed: 6202138f a conflict is expected even when appending empty entries; by drdr xp; 2021-05-24

  • Fixed: f449b64a discarded log in replication_buffer should be finally sent.; by drdr xp; 2021-05-22

    Internally when replication goes to LaggingState (a non-leader lacks a lot of logs), the ReplicationCore purges outbound_buffer and replication_buffer and then sends all committed logs found in storage.

    Thus if there are uncommitted logs in replication_buffer, these logs will never have a chance to be replicated, even when replication goes back to LineRateState, since LineRateState only replicates logs from ReplicationCore.outbound_buffer and ReplicationCore.replication_buffer.

    This test ensures that when replication goes to LineRateState, it tries to re-send all logs found in storage (including those that were removed from the two buffers).

  • Fixed: 6d680484 #112 : when a follower is removed, leader should stop sending log to it.; by drdr xp; 2021-05-21

    A leader adds all follower replication states to a hashset nodes when the leader is established. But the leader does not do so when membership changes. Thus when a follower is removed, the leader can not stop replication to it because the follower is not in nodes.

    The solution is to move replication state from non_voters to nodes, so that the next time a follower is removed, the leader is able to remove the replication from nodes.

  • Fixed: 39690593 a NonVoter should stay as NonVoter instead of Follower after restart; by drdr xp; 2021-05-14

  • Fixed: d882e743 when calc quorum, the non-voter should be counted; by drdr xp; 2021-06-02

    Counting only the followers (nodes) toward the quorum of the new config (c1) results in an unexpected log commit. E.g.: change from 012 to 234, when 3 and 4 are unreachable, the first log of joint should not be committed.

  • Fixed: a10d9906 when handle_update_match_index(), non-voter should also be considered, because when member change a non-voter is also count as a quorum member; by drdr xp; 2021-06-16

  • Fixed: 11cb5453 doc-include can only be used in nightly build; by drdr xp; 2021-06-16

    • Simplify CI test: test all in one action.

    • Disable clippy: it suggests inappropriate assert_eq to assert conversion which is used in a macro.

    • Add makefile

    • Build only with nightly rust. Add rust-toolchain to specify toolchain version.

Dependency:

  • Dependency: 919d91cb upgrade tokio from 1.0 to 1.7; by drdr xp; 2021-06-16

v0.6.1

async-raft 0.6.1

fixed

  • Fixed #105 where function set_target_state was missing an else condition.
  • Fixed #106 which ensures that counting of replicas to determine a new commit value only considers entries replicated as part of the current term.
  • Fixed a bug where Learner nodes could be restarted and come back as voting members.
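
The rule behind the #106 fix, that only an entry of the current term may be committed by counting replicas, can be sketched as below. The function name, the slice-based inputs and the index-0 placeholder are assumptions, not async-raft's code.

```rust
/// `matched[i]`: highest log index known to be replicated on voter `i`
/// (the leader included).
/// `log_terms[i]`: term of the leader's entry at index `i` (index 0 is a
/// placeholder).
pub fn new_commit_index(
    matched: &[u64],
    log_terms: &[u64],
    current_term: u64,
    commit_index: u64,
) -> u64 {
    if matched.is_empty() {
        return commit_index;
    }
    // Sort descending: the value at position n/2 is replicated on a majority.
    let mut sorted = matched.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let candidate = sorted[sorted.len() / 2];
    let term_of_candidate = log_terms.get(candidate as usize).copied();
    // Advance only if the candidate entry belongs to the current term.
    if candidate > commit_index && term_of_candidate == Some(current_term) {
        candidate
    } else {
        commit_index
    }
}
```

An entry from an earlier term that sits on a majority is left uncommitted until an entry of the current term is replicated on a majority as well, at which point both become committed together.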

async-raft 0.6.0

The big news for this release is that we are now based on Tokio 1.0! Big shoutout to @xu-cheng for doing all of the heavy lifting for the Tokio 1.0 update, along with many other changes which are part of this release.

It is important to note that 0.6.0 does include two breaking changes from 0.5: the new RaftStorage::ShutdownError associated type, and Tokio 1.0. Both of these changes are purely code related, and it is not expected that they will negatively impact running systems.

changed

  • Updated to Tokio 1.0!
  • BREAKING: this introduces a RaftStorage::ShutdownError associated type. This allows for the Raft system to differentiate between fatal storage errors which should cause the system to shutdown vs errors which should be propagated back to the client for application specific error handling. These changes only apply to the RaftStorage::apply_entry_to_state_machine method.
  • A small change to Raft startup semantics. When a node comes online and successfully recovers state (the node was already part of a cluster), the node will start with a 30 second election timeout, ensuring that it does not disrupt a running cluster.
  • #89 removes the Debug bounds requirement on the AppData & AppDataResponse types.
  • The Raft type can now be cloned. The clone is very cheap and helps to facilitate async workflows while feeding client requests and Raft RPCs into the Raft instance.
  • The Raft.shutdown interface has been changed slightly. Instead of returning a JoinHandle, the method is now async and simply returns a result.
  • The ClientWriteError::ForwardToLeader error variant has been modified slightly. It now exposes the data (generic type D of the type) of the original client request directly. This ensures that the data can actually be used for forwarding, if that is what the parent app wants to do.
  • Implemented #12. This is a pretty old issue and a pretty solid optimization. The previous implementation of this algorithm would go to storage (typically disk) for every process of replicating entries to the state machine. Now, we are caching entries as they come in from the leader, and using only the cache as the source of data. There are a few simple measures needed to ensure this is correct, as the leader entry replication protocol takes care of most of the work for us in this case.
  • Updated / clarified the interface for log compaction. See the guide or the updated do_log_compaction method docs for more details.

added

  • #97 adds the new Raft.current_leader method. This is a convenience method which builds upon the Raft metrics system to quickly and easily identify the current cluster leader.

fixed

  • Fixed #98 where heartbeats were being passed along into the log consistency check algorithm. This had the potential to cause a Raft node to go into shutdown under some circumstances.
  • Fixed a bug where the timestamp of the last received heartbeat from a leader was not being stored, resulting in degraded cluster stability under some circumstances.

memstore 0.2.0

changed

  • Updated async-raft dependency to 0.6.0 & updated storage interface as needed.

fixed

  • Fixed #76 by moving the process of replicating log entries to the state machine off of the main task. This ensures that the process never blocks the main task. This also includes a few nice optimizations mentioned below.

0.5.5

changed

  • Added #[derive(Serialize, Deserialize)] to RaftMetrics, State.

0.5.4

fixed

  • Fixed #82 where client reads were not behaving correctly for single node clusters. Single node integration tests have been updated to ensure this functionality is working as needed.

0.5.3

fixed

  • Fixed #79 ... for real this time! Added an integration test to prove it.

0.5.2

fixed

  • Fixed #79. The Raft core state machine was not being properly updated in response to shutdown requests. That has been addressed and shutdowns are now behaving as expected.

0.5.1

changed

  • ChangeMembershipError::NodeNotLeader now returns the ID of the current cluster leader if known.
  • Fix off-by-one error in get_log_entries during the replication process.
  • Added #[derive(Serialize, Deserialize)] to Config, ConfigBuilder & SnapshotPolicy.

0.5.0

changed

The only thing which hasn't changed is that this crate is still an implementation of the Raft protocol. Pretty much everything else has changed.

  • Everything is built directly on Tokio now.
  • The guide has been updated.
  • Docs have been updated.
  • The Raft type is now the primary API of this crate, and is a simple struct with a few public methods.
  • Lots of fixes to the implementation of the protocol, ranging from subtle issues in joint consensus to non-voter syncing.

0.4.4

  • Implemented Error for config::ConfigError

0.4.3

Added a few convenience derivations.

  • Derive Eq on messages::MembershipConfig.
  • Derive Eq on metrics::State.
  • Derive PartialEq & Eq on metrics::RaftMetrics.
  • Update development dependencies.
  • Fixed bug #41 where nodes were not starting a new election timeout task after coming down from leader state. Thanks @lionesswardrobe for the report!

0.4.2

A few QOL improvements.

  • Fixed an issue where the value for current_leader was not being set to None when becoming a candidate. This isn't really a bug per se, as no functionality depended on this value as far as Raft is concerned, but it is an issue that impacts the metrics system. This value is now being updated properly.
  • Made the messages::ClientPayload::new_base constructor pub(crate) instead of pub, which is what the intention was originally, but I was apparently tired :).
  • Implemented #25. Implementing Display+Error for the admin error types.

0.4.1

A few bug fixes.

  • Fixed an issue where a node in a single-node Raft was not resuming as leader after a crash.
  • Fixed an issue where hard state was not being saved after a node becomes leader in a single-node Raft.
  • Fixed an issue where the client request pipeline (a Stream with the actix::StreamFinish) was being closed after an error was returned during processing of client requests (which should not cause the stream to close). This was unexpected and undocumented behavior, very simple fix though.

0.4.0

This changeset introduces a new AppDataResponse type, a concrete data type which the RaftStorage impl must send back from its ApplyEntryToStateMachine handler. This provides a more direct path for returning application level data from the storage impl. Oftentimes this is needed for responding to client requests in a timely / efficient manner.

  • AppDataResponse type has been added (see above).
  • A few handlers have been updated in the RaftStorage type. The handlers are now separated based on where they are invoked from the Raft node. The three changed handlers are:
    • AppendEntryToLog: this is the same. It is the initial step of handling client requests to apply an entry to the log. This is still where application level errors may be safely returned to the client.
    • ReplicateToLog: this is for replicating entries to the log. This is part of the replication process.
    • ApplyEntryToStateMachine: this is for applying an entry to the state machine as the final part of a client request. This is where the new AppDataResponse type must be returned.
    • ReplicateToStateMachine: this is for replicating entries to the state machine. This is part of the replication process.

0.3.1

Overhauled the election timeout mechanism. This uses an interval job instead of juggling a rescheduling process, which seems to offer quite a lot more stability. Along with the interval job, we are using std::time::Instant values for performing the comparisons against the last received heartbeat.
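
A minimal sketch of the kind of comparison an interval job could run on each tick, assuming the node keeps the Instant of the last received heartbeat. The function name, its parameters and the treatment of a missing heartbeat are assumptions made for illustration; this is not the crate's actual code.

```rust
use std::time::{Duration, Instant};

/// Hypothetical check run on every interval tick: returns true when no
/// heartbeat has been received within `timeout`, measured at `now`.
fn election_timeout_elapsed(
    last_heartbeat: Option<Instant>,
    now: Instant,
    timeout: Duration,
) -> bool {
    match last_heartbeat {
        // saturating_duration_since yields zero if `t` is later than `now`,
        // so a heartbeat recorded just after the tick does not panic.
        Some(t) => now.saturating_duration_since(t) >= timeout,
        // Assumption for this sketch: a node that has never seen a
        // heartbeat is treated as timed out.
        None => true,
    }
}
```

Passing `now` in as a parameter, rather than calling Instant::now() inside the function, keeps the check deterministic and easy to test.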

0.3.0

Another backwards incompatible change to the RaftStorage trait. It is now using associated types to better express the needed trait constraints. These changes were the final bit of work needed to get the entire actix-raft system to work with a Synchronous RaftStorage impl. Async impls continue to work as they have, though the RaftStorage impl block will need to be updated to use the associated types. The recommended pattern is as follows:

impl RaftStorage<..., ...> for MyStorage {
    type Actor = Self;
    type Context = Context<Self>; // Or SyncContext<Self>;
}

My hope is that this will be the last backwards incompatible change needed before a 1.0 release. This crate is still young though, so we will see.

0.2.0

  • Made a few backwards incompatible changes to the RaftStorage trait. Overwrite its third type parameter with actix::SyncContext<Self> to enable sync storage.
  • Also removed the RaftStorage::new constructor, as it is a bit restrictive. Just added some docs instead describing what is needed.

0.1.3

  • Added a few additional top-level exports for convenience.

0.1.2

  • Changes to the README for docs.rs.

0.1.1

  • Changes to the README for docs.rs.

0.1.0

  • Initial release!