Use size of events to batch Watch responses #18975

serathius · 2024-12-01T13:39:28Z

Part of #16839
Instead of batching by number of revisions (I do not understand its purpose), batch by size of events to avoid resync generating huge responses. For example, when a Watch is opened on a very old revision (say 5 minutes behind, but still within the compaction window), resync will load the whole history and try to push it in one response. If the watch is congested or the client is not reading, this can cause huge spikes in memory.

Implements #18965 without needing to reconfigure clients.

This basically implements the watch fragment feature without needing to configure it on the client side. We still need to maintain the watch Atomic property, so we cannot split a revision between responses. The rule is simple: if we start a revision, we continue it even if we go over the limit.

Proposing to use the same response size as used by fragmentation: the size of the request + gRPC overhead (2MB). As we cannot split revisions, the maximum response size without fragmenting is 2x that (4MB).
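One way to read the 2x bound, as a sketch under two assumptions that are not spelled out above: a batch only accepts a new revision while it is still below the limit $L$, and the events of a single revision never exceed $L$. Here $S$ is the batch size before the last revision is added and $r$ is the size of that revision (both symbols are introduced only for this illustration).

```latex
S < L, \quad r \le L
\;\Longrightarrow\;
S + r < 2L = 2 \times 2\,\mathrm{MB} = 4\,\mathrm{MB}
```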

| Patch | Object size [KB] | Watchers | Throughput [events/s] | Memory [MB] | 50%ile [s] | 90%ile [s] | 99%ile [s] |
|---|---|---|---|---|---|---|---|
| Base | 1000 | 10 | 1051.6745 | 4411.4 | 0.004 | 0.005 | 0.010 |
| Base | 1000 | 20 | 2217.3058 | 2317.1 | 0.238 | 0.593 | 0.670 |
| Base | 1000 | 100 | 2212.7293 | 5031.7 | 3.651 | 7.369 | 7.700 |
| Base | 100 | 1000 | 19027.3147 | 1753.3 | 5.011 | 9.219 | 10.12 |
| Base | 20 | 1000 | 109533.6274 | 189.7 | 0.132 | 0.765 | 1.582 |
| Base | 10 | 1000 | 104001.1763 | 102.7 | 0.084 | 0.151 | 0.506 |
| Batch by size | 1000 | 10 | 1051.9756 (~%) | 4383.0 (~%) | 0.004 | 0.005 | 0.012 |
| Batch by size | 1000 | 20 | 2218.4787 (~%) | 2297.5 (~%) | 0.148 | 0.526 | 0.595 |
| Batch by size | 1000 | 100 | 2426.7258 (+10%) | 1796.9 (-65%) | 3.549 | 6.236 | 6.851 |
| Batch by size | 100 | 1000 | 28430.9434 (+50%) | 1116.9 (-36%) | 3.356 | 5.765 | 6.365 |
| Batch by size | 20 | 1000 | 109291.3042 (~%) | 173.9 (-8%) | 0.121 | 0.680 | 1.45 |
| Batch by size | 10 | 1000 | 104132.5942 (~%) | 103.1 (~%) | 0.082 | 0.146 | 0.471 |

For the etcd overload scenario (>=100KB object size), it significantly reduces memory use (5GB -> 1.8GB) while improving 99%ile latency (7.7s -> 6.9s). For all other scenarios, memory and latency stay similar.

Tested using `go run ./tools/benchmark/main.go watch-latency --watch-per-stream 1000 --streams 1 --put-total 200 --val-size 100000`

k8s-ci-robot · 2024-12-01T13:39:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [serathius]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2024-12-01T13:57:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.76%. Comparing base (b038739) to head (3231b06).

Additional details and impacted files

Files with missing lines	Coverage Δ
server/config/config.go	`80.23% <100.00%> (ø)`
server/etcdserver/api/v3rpc/grpc.go	`100.00% <100.00%> (ø)`
server/etcdserver/api/v3rpc/watch.go	`84.06% <100.00%> (+0.93%)`	⬆️
server/etcdserver/server.go	`81.86% <100.00%> (-0.34%)`	⬇️
server/storage/mvcc/watchable_store.go	`94.80% <100.00%> (+1.85%)`	⬆️
server/storage/mvcc/watcher_group.go	`90.90% <100.00%> (+0.46%)`	⬆️

... and 17 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18975      +/-   ##
==========================================
- Coverage   68.82%   68.76%   -0.07%     
==========================================
  Files         420      420              
  Lines       35621    35627       +6     
==========================================
- Hits        24517    24498      -19     
- Misses       9683     9700      +17     
- Partials     1421     1429       +8

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b038739...3231b06. Read the comment docs.

serathius · 2024-12-01T15:56:33Z

cc @ahrtr @jmhbnz

serathius · 2024-12-01T20:03:40Z

Split out #18976

serathius · 2024-12-02T14:04:22Z

/retest

k8s-ci-robot added approved size/L labels Dec 1, 2024

k8s-ci-robot added the area/testing label Dec 1, 2024

serathius force-pushed the watch-batch-size branch from bfb3026 to fa2c9c0 Compare December 1, 2024 15:53

serathius mentioned this pull request Dec 1, 2024

Watch sync refactor #18976

Merged

serathius force-pushed the watch-batch-size branch from fa2c9c0 to 2ac6613 Compare December 1, 2024 17:33

serathius marked this pull request as draft December 1, 2024 18:07

k8s-ci-robot added the do-not-merge/work-in-progress label Dec 1, 2024

serathius force-pushed the watch-batch-size branch 2 times, most recently from b4506a9 to e2d2e95 Compare December 1, 2024 20:02

serathius changed the title ~~Use size of events to batch responses~~ Use size of events to batch Watch responses Dec 2, 2024

serathius force-pushed the watch-batch-size branch 2 times, most recently from ad9b317 to 9a2fd06 Compare December 2, 2024 13:27

serathius requested a review from ahrtr December 2, 2024 13:28

serathius marked this pull request as ready for review December 2, 2024 13:28

k8s-ci-robot removed the do-not-merge/work-in-progress label Dec 2, 2024

serathius mentioned this pull request Dec 2, 2024

Watch starvation can cause OOMs #16839

Open

k8s-ci-robot added the needs-rebase label Dec 2, 2024

serathius added 3 commits December 3, 2024 09:13

Implement batching by event size

90b2c4c

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>

Use maxRequestSizeBytes for batch limiting

cc64d28

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>

Expect watch fragment test to allways pass as responses should always…

7e00cc2

… fit under size Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>

serathius force-pushed the watch-batch-size branch from 9a2fd06 to 7e00cc2 Compare December 3, 2024 08:14

k8s-ci-robot removed the needs-rebase label Dec 3, 2024

serathius marked this pull request as draft December 3, 2024 09:14

k8s-ci-robot added the do-not-merge/work-in-progress label Dec 3, 2024

Speedup resync loop if number of unsynced watchers doesn't go down

3231b06

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
