-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OnDiskSTM: Inconsistent OnDiskIndex during node start will silently be ignored #156
Comments
@stffabi I am not sure what is the inconsistency here. The situation you described above is legitimate - let's say your IOnDiskStateMachine have all entries up to index 33 applied, that index 33 value is recorded by your Update() method to reflect such progress. In this case, for all __regular user proposals __ applied into the on disk SM, 33 is the highest index value. After that index 33 regular user proposal entry, there might be many other Raft entries that are not regular user proposals as well, e.g. config change entries and noop entries caused by new leaders, those entries do not need to be handled by the on disk SM's Update() method but still need to be included in the snapshot as well. For the situation you described above, the real question is whether it is possible to have regular user proposals with index value [34, 50]. Please have a look at the concurrentSave() method in internal/rsm/statemachine.go, it is not possible. When your snapshot index is 50, it means prepare() returned a SSMeta with index = 50 when the state machine is locked. A sync() is issued after prepare() to make sure all IOnDiskStateMachine state is persistent. If during recover the on disk SM reports its index as 33, it show no regular entries between [34, 50]. |
@lni thanks for your reply. I totally agree with you that this is legitimate iff there were no user proposals between index values [34, 50]. But in my case those indexes included user proposals. I also agree with you that if everything works fine such an inconsistency could not happen. My issue was not directed to any inconsistency caused by dragonboat itself but if it was caused by something outside of dragonboat it wont be detected by dragonboat. For example if the IOnDiskStateMachine was not correct in the sense that Sync() was not correctly implemented or if the filesystem on which the IOnDiskStateMachine works had any problem with the fsync() triggered in the Sync(). In such cases it might happen that the SSMeta has recorded index=50 and also record OnDiskIndex=50 because the last user proposal's index was also 50, but during the next start onDiskInitIndex as returned by the Open from IOnDiskStateMachine would return some value before 50. As far as I can tell from the code (maybe I'm overlooking something) this condition is not detected (at least for a dummy snapshot) by dragonboat. Isn't that an invariant, that the OnDiskIndex of a snapshot which gets applied during startup must always be smaller or equal to the onDisk(Init)Index? If this invariant is hurt, something went somewhere terribly wrong and for sake of robustness I would love to see this gets detected by dragonboat. |
@stffabi thanks for the input. I agree that it will be useful to check the OnDiskIndex of the snapshot and the onDiskInitIndex value of the SM when restarting an on disk state machine. Maybe something like the following in the recover() method in internal/rsm/statemachine.go -
Would you like to send in a PR to give it a try? Thanks. |
Awesome, yes I would be interested in giving a PR a try. Unfortunately it would take me 2-3 days until I have time to look into it in detail. I'll create the PR as soon as I'm ready. Thanks again. |
Sounds great. |
I've created a draft PR where we could discuss the changes. |
If an OnDiskSTM returns during the open an old index (let's say index=33) and the latest snapshot is on a newer index e.g. 50, this inconsistency will silently be ignored and assumed the OnDiskSTM is on 50. Therefore newer entries (e.g. with index=51, 52,...) will be recovered as it is assumed the STM is on index 50.
The node will just start fine and begin working with the old OnDiskSTM and and applying newer indexes. Which results in a gab of data.
It seems like one of the problems is that a dummy snapshot is just applied without any checks here:
dragonboat/internal/rsm/statemachine.go
Lines 330 to 333 in 6dcfdf3
Dragonboat version
3.3.0
Expected behavior
Starting of the node should fail, because the OnDiskSTM is on an older index from which a recovery is not possible anymore.
Actual behavior
Inconsistent OnDisk index is ignored and normal operation will start with a gap on the on disk data.
Steps to reproduce the behavior
Use OnDisk Example adjust to code to a one node cluster only and put key values until at least one snapshot has been created. Afterwards adjust the code to return index 1 on opening. Then start the node again and see that the node starts just fine.
The text was updated successfully, but these errors were encountered: