Skip to content

android: harden phase-3.2 datapath validation#776

Open
TryDying wants to merge 11 commits intotailscale:mainfrom
TryDying:wide
Open

android: harden phase-3.2 datapath validation#776
TryDying wants to merge 11 commits intotailscale:mainfrom
TryDying:wide

Conversation

@TryDying
Copy link
Copy Markdown

Summary

  • harden the phase-3.2 datapath prototype with stable flow_id, richer lifecycle logs, and relay resource snapshots
  • add reusable host-side baseline test services plus dynamic LAN/tailnet target discovery for adb validation
  • add phase-3.2 validation scripts covering baseline, concurrent SOCKS/direct traffic, wrong-port /32 behavior, and TCP lifecycle cases
  • update Chinese docs and progress records for the phase-3.2 engineering prototype

Testing

  • make apk
  • make install
  • BUILD_FIRST=false INSTALL_FIRST=false sh scripts/tsocks-test-phase32.sh

TryDying and others added 11 commits March 20, 2026 02:42
- Add AGENTS.md with project overview, documentation index, architecture highlights
- Add docs/01-项目指南.md: Project overview and quick start guide
- Add docs/02-开发指南.md: Development workflow and feature guidelines
- Add docs/03-技术指南.md: Architecture design and technical stack
- Add docs/04-更新日志.md: Version history and bug fix tracking
- Add PROGRESS.md: High-signal lessons learned log template

All documentation follows established language and maintenance rules.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: trydying <trydying@mail.ustc.edu.cn>
Copilot AI review requested due to automatic review settings April 12, 2026 15:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Android phase-3.2 datapath validation prototype by adding stable flow correlation, richer lifecycle/relay observability, host-side baseline services, and end-to-end adb-driven validation scripts.

Changes:

  • Add tsocks datapath routing rules, stable flow_id, relay/resource snapshots, and TCP lifecycle logging across the Go datapath stack.
  • Introduce reusable host-side HTTP/TCP baseline test services plus scripts to trigger, run, and machine-verify scenarios (including concurrency and wrong-port /32 behavior).
  • Add Android debug-only adb harness pieces (WorkManager worker + activity) to drive probes and datapath HTTP checks; update docs/progress records and bump versionCode.

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
scripts/tsocks-test-trigger.sh Scenario-based adb trigger for broadcast/activity-based tests.
scripts/tsocks-test-services-stop.sh Stops the host-side baseline test server via pidfile.
scripts/tsocks-test-services-start.sh Starts the host-side baseline test server and runs a health check.
scripts/tsocks-test-services-health.sh Host-side readiness checks for HTTP/TCP baselines.
scripts/tsocks-test-run-all.sh Orchestrates build/install/connect + runs scenario suite and summaries.
scripts/tsocks-test-phase32.sh Phase-3.2 validation suite (baseline/concurrency/wrong-port/lifecycle).
scripts/tsocks-test-pass-fail.sh Log-based PASS/FAIL extraction and machine-verifiable assertions.
scripts/tsocks-test-logs.sh Convenience logcat filter for tsocks validation tags.
scripts/tsocks-test-install.sh Installs the APK via make install.
scripts/tsocks-test-env.sh Discovers default LAN/tailnet hosts and exports baseline ports.
scripts/tsocks-test-build.sh Builds the APK via make apk.
scripts/tsocks_test_server.py Threaded HTTP/TCP baseline server for healthz and TCP behaviors.
PROGRESS.md Adds Chinese progress/lessons-learned entry for phase-3.2 prototype.
libtailscale/tsocks.go Implements shared probe execution and datapath SOCKS relay with logging.
libtailscale/tsocks_rules.go Centralizes rules, injected-route target derivation, and flow ID generation.
libtailscale/tsocks_rules_test.go Unit tests for rule matching, injected routes, offload state, flow IDs.
libtailscale/tsocks_observability.go Adds relay start/end snapshots and close-reason classification.
libtailscale/step0_tun.go Adds “step0” gVisor proof-stack interception and lifecycle event logging.
libtailscale/net.go Injects /32 routes and wraps the TUN device to enable step0 interception.
libtailscale/interfaces.go Extends Application interface with RunTsocksProbe for adb automation.
libtailscale/backend.go Wires tsocks controller into backend/netstack TCP flow handling.
android/src/main/java/com/tailscale/ipn/IPNReceiver.java Adds debug-only broadcast action to enqueue probe worker requests.
android/src/main/java/com/tailscale/ipn/DatapathTestActivity.kt Debug-only activity to initiate a raw-socket HTTP datapath request.
android/src/main/java/com/tailscale/ipn/AdbTcpHttpTestWorker.kt WorkManager worker that calls into libtailscale RunTsocksProbe.
android/src/main/java/com/tailscale/ipn/AdbTcpHttpTestContract.kt Contract constants for adb test actions/extras/log tags.
android/src/main/AndroidManifest.xml Declares DatapathTestActivity component.
android/build.gradle Bumps Android versionCode.
AGENTS.md Adds repository workflow/documentation rules and validated commands.
.gitignore Ignores local dev artifacts (.envrc, etc.).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libtailscale/net.go
Comment on lines +127 to +131
for _, routeTarget := range tsocksInjectedRouteTargets() {
if err := builder.AddRoute(routeTarget.String(), 32); err != nil {
return err
}
b.logger.Logf("updateTUN: added tsocks injected route %s/32", routeTarget)
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsocksInjectedRouteTargets() currently returns hard-coded public IPs, and these /32 routes are added to the VPN unconditionally on every updateTUN call. That will divert traffic to those IPs into the TUN for all users, which is high-risk outside the test harness. Please gate these injected routes behind an explicit debug/feature flag (and ensure the default is disabled), or only add them when running the phase-3.x validation flow.

Copilot uses AI. Check for mistakes.
Comment thread libtailscale/net.go
Comment on lines +189 to +193
if tunDev, err = newStep0Tun(tunDev, b.appCtx, b.tsocks); err != nil {
closeFileDescriptor()
return err
}
b.logger.Logf("updateTUN: wrapped TUN device for step0")
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping the TUN with step0Tun unconditionally changes packet handling and enables gVisor listeners for the intercept targets in all builds. This should be guarded behind a debug/feature flag (or only enabled when tsocks/datapath validation is explicitly turned on), otherwise production users may see unexpected routing/overhead or failures if the SOCKS server isn’t reachable.

Copilot uses AI. Check for mistakes.
Comment thread libtailscale/backend.go
if err != nil {
return nil, fmt.Errorf("netstack.Create: %w", err)
}
ns.GetTCPHandlerForFlow = b.tsocks.datapathHandler
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ns.GetTCPHandlerForFlow is set unconditionally to tsocks.datapathHandler, which enables offload behavior for matching flows in all runs. Given the rules include hard-coded public IPs and a hard-coded SOCKS server, this needs a runtime/compile-time gate so the handler is only installed when tsocks validation is explicitly enabled; otherwise normal users can have unrelated traffic hijacked or fail when the SOCKS server is unreachable.

Copilot uses AI. Check for mistakes.
Comment thread libtailscale/step0_tun.go
Comment on lines +267 to +268
if decision.Route != tsocksRouteTailnetSocks || !slices.Contains(tsocksInterceptTargets(), target) {
return false
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldIntercept calls tsocksInterceptTargets() and then slices.Contains(...) on every packet. tsocksInterceptTargets() allocates and sorts a slice each call, so this becomes a hot-path overhead. Consider precomputing the intercept targets once (e.g., store a map/set in step0Tun during init) and reusing it for membership checks.

Copilot uses AI. Check for mistakes.
Comment on lines +141 to +147
func tsocksShouldOffloadTarget(dst netip.AddrPort) bool {
for _, target := range tsocksInterceptTargets() {
if target == dst {
return true
}
}
return false
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsocksShouldOffloadTarget calls tsocksInterceptTargets(), which rebuilds and sorts the intercept target list every time. This is avoidable overhead in routing/packet-processing paths. Consider caching the computed intercept targets (and/or a set) at init time since tsocksDatapathRules are static.

Copilot uses AI. Check for mistakes.
#

resolve_default_lan_host() {
ip -4 -o addr show up scope global | awk '$2 != "tailscale0" { split($4, a, "/"); print a[1]; exit }'
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_default_lan_host assumes the ip command exists. Because most caller scripts run with set -e and source this file, missing ip will abort the whole test flow. Consider guarding with command -v ip (and returning an empty/default value with a clear error message) to make failures more actionable across dev environments.

Suggested change
ip -4 -o addr show up scope global | awk '$2 != "tailscale0" { split($4, a, "/"); print a[1]; exit }'
if command -v ip >/dev/null 2>&1; then
ip -4 -o addr show up scope global | awk '$2 != "tailscale0" { split($4, a, "/"); print a[1]; exit }'
else
echo "tsocks-test-env: 'ip' command not found; leaving TSOCKS_TEST_LAN_HOST unset unless provided explicitly" >&2
fi

Copilot uses AI. Check for mistakes.
assert_contains "socks-close" "event=conn_close .*closeReason=" "$log_file"
assert_contains "socks-connect" "event=socks_connect .*stage=connect_success" "$log_file"
assert_not_contains "socks-test-fail" "event=TEST_FAIL" "$log_file"
assert_not_contains "socks-cross-target" "flow_id=.*dst=104\.18\.26\.120:80.*\n.*flow_id=.*dst=104\.18\.27\.120:80" "$log_file" || true
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cross-target assertion is currently ineffective: grep -E doesn’t match across newlines, and the check is also silenced with || true, so it can’t ever fail the run. If you want to detect flow_id reuse across different dst values, use a line-oriented check (e.g., extract flow_id->dst mappings and assert 1:1) or drop this dead assertion.

Suggested change
assert_not_contains "socks-cross-target" "flow_id=.*dst=104\.18\.26\.120:80.*\n.*flow_id=.*dst=104\.18\.27\.120:80" "$log_file" || true
if ! awk '
{
flow_id = ""
dst = ""
if (match($0, /flow_id=[^[:space:]]+/)) {
flow_id = substr($0, RSTART + 8, RLENGTH - 8)
}
if (match($0, /dst=[^[:space:]]+/)) {
dst = substr($0, RSTART + 4, RLENGTH - 4)
}
if (flow_id != "" && dst != "") {
if ((flow_id in seen_dst) && seen_dst[flow_id] != dst) {
printf "flow_id %s reused across dst values: %s and %s\n", flow_id, seen_dst[flow_id], dst > "/dev/stderr"
exit 1
}
seen_dst[flow_id] = dst
}
}
' "$log_file"; then
printf 'ASSERT FAIL [%s]: flow_id reused across different dst values\n' "socks-cross-target" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
<activity
android:name=".DatapathTestActivity"
android:excludeFromRecents="true"
android:exported="true"
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatapathTestActivity is declared as exported in the main manifest. Even though it exits immediately when !BuildConfig.DEBUG, in debug builds any app can start it and provide an arbitrary url extra, turning it into a network request primitive. Prefer moving this activity declaration to a debug-only manifest/source set, or mark it non-exported / protect it with a signature permission so it’s not externally invokable.

Suggested change
android:exported="true"
android:exported="false"

Copilot uses AI. Check for mistakes.
Comment thread libtailscale/tsocks.go
Comment on lines +292 to +297
headers := fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\nUser-Agent: tailscale-android-tsocks-test\r\n", method, path, req.Host)
hostHeader := strings.TrimSpace(req.HostHeader)
if hostHeader == "" {
hostHeader = req.Host
}
headers = fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\nUser-Agent: tailscale-android-tsocks-test\r\n", method, path, hostHeader)
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probeHTTP assigns headers twice (first using req.Host, then immediately overwriting with hostHeader). This is dead code and makes the request construction harder to follow. Please remove the first assignment and build the headers once using the chosen Host header value.

Suggested change
headers := fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\nUser-Agent: tailscale-android-tsocks-test\r\n", method, path, req.Host)
hostHeader := strings.TrimSpace(req.HostHeader)
if hostHeader == "" {
hostHeader = req.Host
}
headers = fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\nUser-Agent: tailscale-android-tsocks-test\r\n", method, path, hostHeader)
hostHeader := strings.TrimSpace(req.HostHeader)
if hostHeader == "" {
hostHeader = req.Host
}
headers := fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\nUser-Agent: tailscale-android-tsocks-test\r\n", method, path, hostHeader)

Copilot uses AI. Check for mistakes.
Comment thread android/build.gradle
minSdkVersion 26
targetSdkVersion 35
versionCode 468
versionCode 503
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGENTS.md states that after modifying any code, android/build.gradle versionCode must be incremented by 1. This change jumps from 468 to 503, which conflicts with that rule. Either adjust the documented rule (if larger bumps are allowed) or update versionCode to match the stated +1 policy.

Suggested change
versionCode 503
versionCode 469

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants