
kubelet parameter --eviction-max-pod-grace-period does not work as documented in the official help text #118172

Open
rainbowBPF2 opened this issue May 22, 2023 · 18 comments · Fixed by FabianIMV/kubernetes#1 · May be fixed by #128945
Labels
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
kind/documentation: Categorizes issue or PR as related to documentation.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@rainbowBPF2

rainbowBPF2 commented May 22, 2023

What happened?

Hi, I'd like to report a confusing point about kubelet soft eviction.

As the kubelet --help output shows:
--eviction-max-pod-grace-period int32 Maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met. If negative, defer to pod specified value.

Based on this, I expected that if the parameter is set to a negative value, such as -1, soft eviction would use the value specified by the pod (TerminationGracePeriodSeconds).

However, when I trigger eviction by creating node pressure (such as memory pressure), I found that -1 is always sent to the CRI runtime. The pod's containers are then stopped immediately with SIGKILL and exit code 137.

In short, this parameter behaves as follows:
(1) set to a positive number: works as expected;
(2) not set, so the kubelet uses its default value of 0: works as expected;
(3) set to a negative number: does not work as the kubelet help text describes.

Thanks for your response. Could this be a bug?

What did you expect to happen?

With --eviction-max-pod-grace-period set to a negative value, the kubelet's pod-termination logic should send the pod's TerminationGracePeriodSeconds as the grace period to the CRI runtime when stopping containers.
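
To make the expectation concrete, here is a minimal Go sketch of how the grace period could be resolved under that reading of the help text. This is an illustrative assumption, not the actual kubelet source; the function name and default constant are invented for the example:

```go
package eviction

import (
	v1 "k8s.io/api/core/v1"
)

// defaultGracePeriodSeconds mirrors the Kubernetes default for
// terminationGracePeriodSeconds when a pod does not set one.
const defaultGracePeriodSeconds int64 = 30

// effectiveGracePeriod is a hypothetical helper showing the behavior the
// help text describes: a negative max-pod-grace-period defers to the value
// specified in the pod spec.
func effectiveGracePeriod(maxPodGracePeriod int32, pod *v1.Pod) int64 {
	if maxPodGracePeriod < 0 {
		// Expected behavior: use the pod-specified grace period.
		if pod.Spec.TerminationGracePeriodSeconds != nil {
			return *pod.Spec.TerminationGracePeriodSeconds
		}
		return defaultGracePeriodSeconds
	}
	// Non-negative values bound the grace period directly.
	return int64(maxPodGracePeriod)
}
```

The observed behavior instead suggests the configured value is passed through unchanged, so the runtime receives -1 and the containers are killed immediately.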

How can we reproduce it (as minimally and precisely as possible)?

(1) set this parameter to a negative number, such as -1;
(2) configure kubelet soft eviction (see the example configuration below);
(3) run a pod that requests a lot of memory, to trigger node memory pressure;
(4) check the CRI runtime logs (for example, containerd's);
(5) you will see the timeout parameter -1 sent for the pod's containers, and the containers exit with code 137 (SIGKILL).
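
For step (2), a KubeletConfiguration fragment along these lines should exercise the soft-eviction path. The eviction fields are real KubeletConfiguration fields, but the threshold and grace-period values here are arbitrary examples, not recommendations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "1Gi"       # soft threshold: signal eviction when free memory drops below 1Gi
evictionSoftGracePeriod:
  memory.available: "30s"       # how long the threshold must hold before the kubelet evicts
evictionMaxPodGracePeriod: -1   # per the help text, this should defer to the pod's own value
```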

Anything else we need to know?

I ran these tests on Tencent Cloud TKE.

That said, I don't think the behavior is related to the cloud service provider.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.3-tke.34", GitCommit:"61cc96d2f7e9277e89e29b4b04d045f27d6e75df", GitTreeState:"clean", BuildDate:"2023-03-08T07:40:04Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.3-tke.32", GitCommit:"8ca5807ae095365d68754a2ba20b10bbe5a8998c", GitTreeState:"clean", BuildDate:"2022-10-08T04:00:33Z", GoVersion:"go1.12.14", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Tencent Cloud TKE

OS version

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Install tools

Container runtime (CRI) and version (if applicable)

containerd --version
containerd github.com/containerd/containerd v1.4.3-tke.2 a11500cbfa0b4d2fc9b905e03c35f349ef5b1a9f

Related plugins (CNI, CSI, ...) and versions (if applicable)

@rainbowBPF2 rainbowBPF2 added the kind/bug Categorizes issue or PR as related to a bug. label May 22, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 22, 2023
@tzneal
Contributor

tzneal commented May 22, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 22, 2023
@SergeyKanzhelev
Member

/triage accepted

The question is whether we fix the code and make a backward-incompatible change, or fix the documentation. I'd vote for the doc change, which is less work, but it blocks an interesting scenario.

@rainbowBPF2 were you looking for the described behavior? Does your scenario require it?

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 24, 2023
@rainbowBPF2
Author

rainbowBPF2 commented May 25, 2023

@SergeyKanzhelev

Thanks for your attention to this point.

  1. For day-to-day operation there is no bad effect for now, and the behavior does not seem strictly required, so a doc change is also an option.
  2. In the long run, though, it would be a great feature, because it would give us a more reasonable way to terminate pods under soft eviction and realize graceful pod shutdown.

In short: not required, but deeply desired. 😄

@fgksgf

fgksgf commented May 25, 2023

/assign

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 24, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 21, 2024
@haircommander haircommander moved this from Triaged to Triage in SIG Node Bugs Oct 23, 2024
@haircommander
Contributor

/triage accepted
/kind documentation
/priority backlog

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Oct 23, 2024
@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2024
@haircommander
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 23, 2024
@SergeyKanzhelev
Member

/help
/good-first-issue

At this stage I doubt we will fix the inconsistency by changing the code; we need to update the field documentation instead.

If you want to take this issue:

  1. List the set of steps you performed, with all command output, to reproduce this inconsistency.
  2. Send a PR to update the description of this field.
  3. Since the field is defined in both places (the command line and the kubelet config), make sure both descriptions are tested and updated (see the sketch below).
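
For orientation, the kubelet declares its command-line flags with spf13/pflag, and the help text being corrected lives next to the flag definition. The standalone sketch below shows roughly how such a flag is registered; the flag-set name, variable, and abbreviated description are assumptions for illustration, not the kubelet's actual source:

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	// Hypothetical standalone registration of the flag; in the real kubelet,
	// updating the usage string here is what the documentation fix amounts to.
	var evictionMaxPodGracePeriod int32
	fs := pflag.NewFlagSet("kubelet", pflag.ExitOnError)
	fs.Int32Var(&evictionMaxPodGracePeriod, "eviction-max-pod-grace-period", 0,
		"Maximum allowed grace period (in seconds) to use when terminating pods "+
			"in response to a soft eviction threshold being met.")
	_ = fs.Parse([]string{"--eviction-max-pod-grace-period=-1"})
	fmt.Println(evictionMaxPodGracePeriod) // prints -1
}
```

The kubelet config side is the matching EvictionMaxPodGracePeriod field in the KubeletConfiguration type, whose doc comment would need the same correction.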

@k8s-ci-robot
Contributor

@SergeyKanzhelev:
This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/help
/good-first-issue

At this stage I doubt we will fix the inconsistency by changing the code; we need to update the field documentation instead.

If you want to take this issue:

  1. List the set of steps you performed, with all command output, to reproduce this inconsistency.
  2. Send a PR to update the description of this field.
  3. Since the field is defined in both places (the command line and the kubelet config), make sure both descriptions are tested and updated.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 23, 2024
@ArpanSolanki29

/Assign

@anneheartrecord

/assign

@potato-boii

/assign

@FabianIMV

Hi! I'd like to help with this documentation issue. I plan to update the documentation to accurately reflect the actual behavior and submit a PR with the changes.

Is this issue still available to work on? Happy to collaborate with other assignees if needed.

@akshayamadhuri

Hi @FabianIMV! I am working on the docs too; happy to collaborate with you.

@FabianIMV

Thanks @akshayamadhuri! I'm new to Kubernetes contributions. Would you like to share how we can work together on this? Happy to learn and help with any part you think would be useful.

@FabianIMV

/assign

FabianIMV added a commit to FabianIMV/kubernetes that referenced this issue Nov 22, 2024
The documentation for the eviction-max-pod-grace-period parameter claimed that
negative values would defer to the pod's specified grace period. However,
the current implementation does not honor this behavior. This commit updates
the documentation to accurately reflect the actual behavior.

Fixes kubernetes#118172
@potato-boii potato-boii removed their assignment Nov 24, 2024
Projects
Status: Triaged