Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Blackhole implementation for e2e tests #18641

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

henrybear327
Copy link
Contributor

Based on the ideas discussed in the issues [1] and PRs [2][3][6], we switch from using an L4 reverse proxy to an L7 forward proxy to properly block peer network traffic, without the need to use external tools.

The design aims to implement only the minimal required features that satisfy blocking incoming and outgoing peer traffic. Complicated features such as packet reordering, packet delivery delay, etc. to a future container-based solution.

Background

A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).

Thus, the current mechanism of only blocking peer traffic via the peer's existing reverse proxy is insufficient, since only scenario (a) is handled, and network traffic in scenario (b) is not blocked at all.

Proposed solution

We introduce an L7 forward proxy for each peer, which will be proxying all the connections initiated from a peer to its peers.

We will remove the current use of the L4 reverse proxy, as the L7 forward proxy holds the information of the destination, we can block all incoming and outgoing traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables.

The modified architecture will look something like this:

A --- A's forward proxy --- B
   ^ newly introduced

Implementation

The main subtasks are

  • redesigned as an L7 forward proxy
  • introduce a new environment variable E2E_TEST_FORWARD_PROXY_IP to bypass the limitation of http.ProxyFromEnvironment
  • implement an L7 forward proxy

Known limitations are

  • Doesn't support unix socket, as L7 HTTP transport proxy only supports HTTP/HTTPS/and socks5 -> Although the e2e test supports unix sockets for peer communication, only a few of the e2e test cases use unix sockets as the majority of e2e test cases use HTTP/HTTPS. It's been discussed and decided that it is ok for now without the unix socket support.
  • it's L7 so we need to send a perfectly crafted HTTP request
  • doesn’t support reordering, dropping, or manipulating packets on the fly

Testing

  • make gofail-enable && make build && make gofail-disable && go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1
  • make gofail-enable && make build && make gofail-disable && go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1
  • go test -timeout 30s -run ^TestServer_ go.etcd.io/etcd/pkg/v3/proxy -v -failfast

References

[1] issue #17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) #17891
[4] #17938 (comment)
[5] #17985 (comment)
[6] #17938

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@henrybear327
Copy link
Contributor Author

/cc @serathius
/cc @fuweid

This PR only contains the L7 forward proxy code.

After merging #18640, this PR will be a lot cleaner to review. (As I can't set this PR to be based on #18640, thus there are quite some duplicated commits.)

@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 49.27536% with 105 lines in your changes missing coverage. Please review.

Project coverage is 68.72%. Comparing base (c86c93c) to head (85dbfc3).

Current head 85dbfc3 differs from pull request most recent head 8c7320a

Please upload reports for the commit 8c7320a to get more accurate results.

Files with missing lines Patch % Lines
pkg/proxy/server.go 49.27% 95 Missing and 10 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines Coverage Δ
pkg/proxy/server.go 51.71% <49.27%> (-9.24%) ⬇️

... and 19 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18641      +/-   ##
==========================================
- Coverage   68.77%   68.72%   -0.05%     
==========================================
  Files         420      420              
  Lines       35535    35451      -84     
==========================================
- Hits        24438    24363      -75     
+ Misses       9668     9659       -9     
  Partials     1429     1429              

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c86c93c...8c7320a. Read the comment docs.

Part of the patches to fix etcd-io#17737

During the development of etcd-io#17938,
we agreed that during the transition to L7 forward proxy, unused
features and features targeting L4 reverse proxy will be dropped.

This feature falls under the unused feature.

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Part of the patches to fix etcd-io#17737

During the development of etcd-io#17938,
we agreed that during the transition to L7 forward proxy, unused
features and features targeting L4 reverse proxy will be dropped.

This feature falls under the unused feature.

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Part of the patches to fix etcd-io#17737

During the development of etcd-io#17938,
we agreed that during the transition to L7 forward proxy, unused
features and features targeting L4 reverse proxy will be dropped.

This feature falls under the unused feature. Also, the initial
implementation has a bug: if connections are not created continuously,
the latency accept will not work. Consider the following case:
a) set latency accept
b) put latency accept into effect
c) latency accept will start idling the goroutine
d) block-wait at accept() - waiting for new connections
e) new connection comes in - establish it
f) go to c -> as we can see, if the request come every x seconds, where
x is larger than the latency accept time we set, we can see that the
latency accept has no effect.

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Part of the patches to fix etcd-io#17737

During the development of etcd-io#17938,
we agreed that during the transition to L7 forward proxy, unused
features and features targeting L4 reverse proxy will be dropped.

This feature falls under the unused feature.

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Based on the ideas discussed in the issues [1] and PRs [2][3][6], it's
been decided to switch from using an L4 reverse proxy to an L7 forward
proxy to properly block peer network traffic, without the need to use
external tools.

The design aims to implement only the minimal required features that
satisfy blocking incoming and outgoing peer traffic. Complicated features
such as packet reordering, packet delivery delay, etc. to a future
container-based solution.

A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).

Thus, the current mechanism of only blocking peer traffic via the peer's
existing reverse proxy is insufficient, since only scenario (a) is
handled, and network traffic in scenario (b) is not blocked at all.

We introduce an L7 forward proxy for each peer, which will be proxying
all the connections initiated from a peer to its peers.

We will remove the current use of the L4 reverse proxy, as the L7
forward proxy holds the information of the destination, we can block all
incoming and outgoing traffic that is initiated from a peer to others,
without having to resort to external tools, such as iptables.

The modified architecture will look something like this:
```
A --- A's forward proxy --- B
   ^ newly introduced
```

The main subtasks are
- redesigned as an L7 forward proxy
- introduce a new environment variable `E2E_TEST_FORWARD_PROXY_IP` to
bypass the limitation of `http.ProxyFromEnvironment`
- implement an L7 forward proxy

Known limitations are
- Doesn't support unix socket, as L7 HTTP transport proxy only supports
HTTP/HTTPS/and socks5 -> Although the e2e test supports unix sockets for
peer communication, only a few of the e2e test cases use unix sockets as
the majority of e2e test cases use HTTP/HTTPS. It's been discussed and
decided that it is ok for now without the unix socket support.
- it's L7 so we need to send a perfectly crafted HTTP request
- doesn’t support reordering, dropping, or manipulating packets on the
fly

- `make gofail-enable && make build && make gofail-disable && go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`
- `make gofail-enable && make build && make gofail-disable && go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`
- `go test -timeout 30s -run ^TestServer_ go.etcd.io/etcd/pkg/v3/proxy -v -failfast`

[1] issue etcd-io#17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) etcd-io#17891
[4] etcd-io#17938 (comment)
[5] etcd-io#17985 (comment)
[6] etcd-io#17938

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Iván Valdés Castillo <iv@nvald.es>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
@k8s-ci-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants