Define an official performance validation suite for etcd #16467

jmhbnz · 2023-08-24T07:23:10Z

What would you like to be added?

The current performance validation process for etcd relies heavily on the Kubernetes scalability tests. While this approach has been valuable, we need to create an official performance validation for etcd that is maintained within the project and is therefore more accessible and better integrated into regular project activity.

In my mind this will include developing a comprehensive suite of performance tests that cover various real-world usage scenarios, integrating these tests into some form of on-demand or scheduled etcd CI pipeline, and making this accessible to work being undertaken. For example, a pull request proposing a golang version upgrade could be validated against performance regressions.

With this issue I would like to capture the recent discussion in #16463 (comment) and the intent to progress towards an independent and dedicated performance validation mechanism for etcd, so that we do not lose sight of this work. We can use this issue to track any ideas and further conversation before starting any work.

References:

Why is this needed?

- Reduce reliance on external testing suites that are less accessible.
- Establish an official project perspective on performance.
- Create the mechanism to track and drive future performance improvements.
- Reduce cognitive burden for future etcd contributors and maintainers.

Sub task tracking


serathius · 2023-08-24T09:22:42Z

Talked with @mborsz, who is a member of Kubernetes SIG Scalability, about how we should approach performance testing of etcd. We came to the conclusion that we need 3 things:

- etcd SLIs - important dimensions whose performance we want to measure and protect from regressions. It's important that we have dedicated benchmark scenarios, each analysing an independent dimension, as that makes it easier to reason about and analyse the results. For that we can cherry pick existing etcd benchmarks. We can use current performance as the baseline.
- Reproducibility - benchmark results need to be repeatable. We need to run them in the same environment. Benchmark execution should happen not locally, but remotely on a dedicated machine. Best case, we run the benchmarks periodically on a large GitHub runner. We should also try to reduce the noise, so we should not run the scenarios sequentially on one machine, but each on a separate machine, so we avoid bursts impacting results. Of course cloud VMs don't have the most stable performance, however it should suffice for now.
- Visualization - To spot regressions we need to be able to observe trends and compare performance. Aside from per-result reports, we should have a dashboard that aggregates results. At Google we use an internal version of https://github.com/google/mako, which is great; unfortunately it looks like the project has been archived. Kubernetes uses http://perf-dash.k8s.io/, which is pretty limited and will require code changes to support etcd. Please let me know if you have better suggestions.

Based on the above points, the work is:

- Propose a list of benchmarks to run.
- Set up a periodic job that executes the benchmarks.
- Pick a dashboard for visualization and integrate benchmark reports.

geetasg · 2023-08-31T20:56:06Z

Should the etcd SLIs be part of the contract? Ref: https://docs.google.com/document/d/1NUZDiJeiIH5vo_FMaTWf0JtrQKCx0kpEaIIuPoj9P6A/edit#heading=h.tlkin1a8b8bl?

jmhbnz · 2023-09-06T20:06:55Z

> Should the etcd SLIs be part of the contract? Ref: https://docs.google.com/document/d/1NUZDiJeiIH5vo_FMaTWf0JtrQKCx0kpEaIIuPoj9P6A/edit#heading=h.tlkin1a8b8bl?

Potentially - Let's try and get some SLIs proposed initially and see how they fit in relation to the current contract? I have been meaning to sit down and list out potential SLIs here that we can cherry pick from, feel free to do the same 🙏🏻

jmhbnz · 2023-11-09T15:11:37Z

Recording a discussion during KubeCon NA - Along with identifying service level indicators as a starting point for this work, we can also take lessons from Kubernetes sig-scale to identify a set of dimensions that our new performance validation suite will have an envelope within: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md

We can review the older benchmark tooling to get a starting point on dimensions and iterate from there.

chaochn47 · 2024-03-18T21:40:32Z

Expect the performance test suite should help detect/prevent #17529 or in the robustness test kubernetes traffic.

Do we think there is a gap in general on performance testing? I can help addressing it.

@jmhbnz @serathius @ahrtr

jmhbnz · 2024-04-03T18:27:03Z

> Expect the performance test suite should help detect/prevent #17529 or in the robustness test kubernetes traffic.
>
> Do we think there is a gap in general on performance testing? I can help addressing it.

Thanks @chaochn47 - Yes, my expectation of the updated performance validation suite, once complete, is that we can catch issues like the one linked earlier. @ivanvc is currently getting some basic prow jobs running that will run existing tools like tools/benchmark and tools/rw-heatmaps. We will need to think about whether any additional tooling, or further updates to that existing tooling, are required. If you have any ideas on that, please feel free to draft a feature issue so we can discuss 🙏🏻

serathius · 2024-04-18T07:25:19Z

> Expect the performance test suite should help detect/prevent #17529 or in the robustness test kubernetes traffic.
>
> Do we think there is a gap in general on performance testing? I can help addressing it.

Don't think so, performance and correctness are pretty different beasts that need different approaches. Checking correctness adds a lot of overhead, while performance measurement wants as little noise as possible to provide reproducible results.

What failed in #17529 was an unknown throughput breaking point that was hiding a correctness issue under it. I think we can use performance testing to discover more of such breaking points, and then try to simulate them during correctness testing. This was already done in the e2e test that you provided in #17555. Failpoint beforeSendWatchResponse can be used to simulate slow response writing, which can simulate the same performance breaking point. Please see https://github.com/etcd-io/etcd/pull/17680/files where I managed to reproduce the issue using the breaking point.

jmhbnz · 2024-06-02T06:32:31Z

Hi Team - @ivanvc and I would like to propose the first service level indicator. We are keen for your feedback on this first one before we move on to proposing additional ones.

Latency of processing mutating API calls, measured as 99th percentile over last 5 minutes

Mutating calls being put or del. This is an etcd iteration on the first entry in https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md.

Please let us know what you think. If this first SLI is accepted we will be updating tools/benchmark and/or tools/rw-heatmaps as required to support measurement of it and enable a formal SLO to be created in future.

chaochn47 · 2024-06-02T18:24:50Z

If we intend to optimize etcd performance in Kubernetes, IMHO we should generate k8s-like traffic.

For example, the rw-heatmaps tool uses a mix of read-only and write-only transactions and does not simulate watch traffic. Hopefully that is already on the roadmap.

serathius · 2024-06-03T08:46:32Z

> If we intend to optimize etcd performance in Kubernetes, IMHO we should generate k8s-like traffic.

We need both. This issue is important, but not getting enough attention. Unfortunately I don't have enough time to lead this. Is there someone who could work on this with my guidance?

chaochn47 · 2024-06-04T17:44:19Z

/assign

I can help, since I have recently been looking into etcd performance.

jmhbnz · 2024-06-04T20:04:10Z

@serathius, @chaochn47 - Please let us know if the first etcd SLI drafted above looks ok. Agreed that watch is critical, there should be an SLI relating to this also. We intend to work iteratively to propose a larger table of SLIs, as the k8s project has done.

serathius · 2024-06-04T20:44:35Z

@chaochn47 Can you start by creating a document where we can start discussing the SLIs? Maybe just copy the K8s SLIs that make sense for etcd and we can iterate on that.

chaochn47 · 2024-06-05T03:27:29Z

@serathius This is the bare-minimum doc, etcd performance work stream, that I created off the top of my head. I will fill in more details and a PoC soon.

marseel · 2024-06-20T10:29:24Z

> Visualization - To spot regressions we need to be able to observe trends and compare performance. Aside from per-result reports, we should have a dashboard that aggregates results. At Google we use an internal version of https://github.com/google/mako, which is great; unfortunately it looks like the project has been archived. Kubernetes uses http://perf-dash.k8s.io/, which is pretty limited and will require code changes to support etcd. Please let me know if you have better suggestions.

For perfdash, I can offer guidance. It should be fairly straightforward: essentially the benchmark needs to output data in a specific JSON format and copy it to a GCS bucket.

As mentioned in one of the comments in the doc above, etcd performance work stream (it would also be great to make that doc public):

I think that just replicating the kube-apiserver access pattern in benchmarking is not enough. The kube-apiserver access pattern to etcd is very specific, with very few clients connecting. Also, Kubernetes itself has some workarounds, due to the poor performance of some etcd features. For example, looking at the events in Kubernetes:

So currently Kubernetes creates 1 lease per 1 minute window or per 1k attached keys as a workaround. I suspect that most of that workaround was due to issue and probably the poor performance of cleaning up expired leases.

Similarly, I recently chatted with Marek about the performance of range requests with a limit. Currently, range requests with a limit have linear performance (linear in the number of keys in the range, not in the limit), as etcd counts all keys within the range. I don't think anyone would expect range requests with a limit to have such degraded performance.

Also, the etcd benchmarking doc mentions O(100k) watchers, which would never be replicated with a k8s-based access pattern. While that benchmark is nice, AFAIK it doesn't capture the fact that a client usually needs to make a range request first, before establishing a watch.

The SLIs mentioned in the doc above:

- Latency of processing mutating API calls for single key, measured as 99th percentile over last 5 minutes
- Latency of processing non-streaming read-only API calls, measured as 99th percentile over last 5 minutes
- Watch latency for a key prefix (from the moment when object is stored in database to when it's ready to be sent to a dedicated watcher), measured as 99th percentile over last 5 minutes
- Grant / Revoke Lease latency

make sense, but only in the context of limits similar to the Kubernetes limits.
For example:

- For mutating API calls: what is the QPS of requests, and how many watchers are watching these changes?
- For non-streaming read-only API calls: how many concurrent reads can happen, and how many keys are being read?
- For granting/revoking leases: how many keys is the lease attached to?

To spice things up, in Cilium we also use etcd quite heavily. Usually, we have O(5k) clients watching O(100k) keys with very small values, O(100 bytes), compared to k8s, which uses much larger values.
We hit both of the issues mentioned above (large leases and linear range requests), and fixing them would probably benefit Kubernetes too.
We can provide input on what SLIs / limits we would expect from the Cilium perspective and help with validating results later on (/cc @giorio94 )

giorio94 · 2024-07-29T10:23:22Z

> We can provide input on what SLIs / limits we would expect from the Cilium perspective and help with validating results later on.

Sorry for the delay. In this respect, I've prepared an initial document summarizing the scale/performance aspects and SLIs/SLOs from the Cilium perspective. Feel free to ask any further questions or for clarification.

serathius · 2024-07-30T08:50:30Z

Thanks @giorio94, very detailed and thorough work. We will definitely include it in etcd SLIs. Is the goal of this document just to clarify Cilium requirements or is there any intention to help etcd qualify it?

giorio94 · 2024-07-30T14:13:17Z

Thanks!

> Is the goal of this document just to clarify Cilium requirements or is there any intention to help etcd qualify it?

I'm personally happy to help with the definition of the benchmark suite, although I don't have a lot of context on the etcd internals. Just a note that I'll be out of office for the next couple of weeks.

jmhbnz added area/performance type/feature labels Aug 24, 2023

jmhbnz mentioned this issue Aug 24, 2023

Document project golang version policy #16463

Merged

jmhbnz added the area/tooling label Aug 24, 2023

ahrtr mentioned this issue Aug 31, 2023

server: optimizing memory overhead of copy operation in ConcurrentReadTxn #16508

Merged

chaochn47 mentioned this issue Sep 26, 2023

Hook up etcd to kubernetes' prow infrastructure to run CI jobs #16653

Closed

serathius mentioned this issue Sep 28, 2023

Proposal to mark etcd-io/dbtester as public archive etcd-io/dbtester#367

Open

serathius mentioned this issue Oct 27, 2023

Handle Extremely High Throughput by holding back requests to etcd until the throughput decreases. #16837

Open

jmhbnz self-assigned this Dec 30, 2023

jmhbnz mentioned this issue Feb 11, 2024

Re-write rw-heatmaps tool in golang #17404

Closed

jmhbnz added the priority/important-soon label Feb 11, 2024

jmhbnz pinned this issue Apr 3, 2024

jmhbnz assigned ivanvc Apr 3, 2024

ivanvc mentioned this issue Apr 8, 2024

Add Prow job to run periodic performance benchmarks #17754

Closed

k8s-ci-robot assigned chaochn47 Jun 4, 2024

jmhbnz removed their assignment Jun 4, 2024

ivanvc removed their assignment Jun 10, 2024

serathius unpinned this issue Jun 11, 2024

serathius mentioned this issue Jul 3, 2024

etcdserver: separate "raft log compact" from snapshot #18235

Closed

ivanvc mentioned this issue Sep 5, 2024

etcd: remove performance ratio periodic jobs kubernetes/test-infra#33425

Merged

serathius mentioned this issue Sep 19, 2024

Reduce memory usage of etcd member catchup mechanism #17098

Open

ahrtr pinned this issue Nov 28, 2024
