# POP-Controlled Reverse Access Gateway Contract

## Implementation Status

Last updated: 2026-05-29 Asia/Shanghai.

Current phase: MVP bootstrap in progress.

Direction update under evaluation:

* Legacy agent naming has been migrated to `Client Agent` for the first MVP path.
* First capability being implemented:
  * Client connects to Client Agent.
  * Client Agent connects outbound to POP Server.
  * Client Agent wraps all traffic into LayLink frames.
  * POP Server authenticates/authorizes the request.
  * POP Server connects directly to the public target.
  * POP Server relays target data back to Client Agent through LayLink frames.
* KCP acceleration requirement:
  * The Agent-to-POP transport should support a KCP-over-UDP mode.
  * TCP framed mode should remain as a fallback/control-friendly transport.
  * KCP should be introduced behind a transport abstraction instead of leaking into session, policy, or routing code.
* Transport protocol configuration:
  * POP Server uses `POP_ALLOWED_AGENT_TRANSPORTS` to allow one or more Agent-to-POP transports.
  * Agent uses `AGENT_TRANSPORT_PROTOCOL` to choose one concrete transport.
  * Allowed names are `tcp`, `udp`, and `kcp`.
  * Current runnable implementations are `tcp` and experimental `kcp`; `udp` is reserved.
* Feasibility:
  * Workerman supports long-running async TCP servers and custom protocols; it is suitable for the framed fallback/control channel.
  * KCP itself is a UDP-based reliable ARQ protocol, so adding KCP means adding a UDP transport layer and session demultiplexing below the existing LayLink frame protocol.
  * Native PHP KCP support is possible through an extension or FFI binding to `ikcp.c`; pure-PHP KCP is not recommended for production performance.
  * Recommended implementation order:
    1. Complete Client Agent naming migration in code, docs, config, and entrypoints.
    2. Implement TCP-framed Client Agent -> POP -> public target path.
    3. Define `TransportInterface` so frame protocol can run over TCP now and KCP later.
    4. Add KCP-over-UDP transport behind the transport abstraction after the TCP framed path is stable.
  * Main risk:
    * KCP is not a socket by itself. It needs UDP I/O, timers, packet flush/update scheduling, MTU handling, retransmission tuning, and connection/session management.
    * PHP-only KCP may work as a prototype but is likely CPU-heavy under concurrency.
    * The cleanest production path is a PHP extension/FFI binding or a sidecar KCP transport process.
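
The transport abstraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: `FrameTransport`, `LoopbackTransport`, and `sendPing()` are hypothetical names, not the project's actual `TransportInterface`, and the in-memory loopback only stands in for a real TCP or KCP socket so that the example runs on its own.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: moves whole encoded frames, hides TCP/KCP details.
interface FrameTransport
{
    public function name(): string;

    public function send(string $frameBytes): void;

    /** @param callable(string): void $handler called once per received frame */
    public function onFrame(callable $handler): void;
}

// Stand-in implementation that delivers frames to a peer in memory.
final class LoopbackTransport implements FrameTransport
{
    private ?LoopbackTransport $peer = null;

    /** @var null|callable(string): void */
    private $handler = null;

    public function __construct(private string $name) {}

    /** @return array{0: LoopbackTransport, 1: LoopbackTransport} */
    public static function pair(string $name): array
    {
        $a = new self($name);
        $b = new self($name);
        $a->peer = $b;
        $b->peer = $a;
        return [$a, $b];
    }

    public function name(): string
    {
        return $this->name;
    }

    public function send(string $frameBytes): void
    {
        // Deliver directly to the peer's handler, if one is registered.
        if ($this->peer?->handler !== null) {
            ($this->peer->handler)($frameBytes);
        }
    }

    public function onFrame(callable $handler): void
    {
        $this->handler = $handler;
    }
}

// Session-level code depends only on the interface.
function sendPing(FrameTransport $transport, string $nodeId): void
{
    $transport->send(json_encode(['type' => 'PING', 'node_id' => $nodeId], JSON_THROW_ON_ERROR));
}

[$agentSide, $popSide] = LoopbackTransport::pair('tcp');
$received = [];
$popSide->onFrame(function (string $bytes) use (&$received): void {
    $received[] = json_decode($bytes, true, 512, JSON_THROW_ON_ERROR);
});
sendPing($agentSide, 'client-01');
```

Because `sendPing()` only sees `FrameTransport`, a KCP-backed class could replace the loopback without any change to session, policy, or routing code, which is the property the recommended implementation order relies on.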

Completed in this checkpoint:

* Added Composer PSR-4 autoload for `LayLink\\`.
* Added `.env.example`, `config/nodes.php`, `config/policies.php`, and `config/routes.php`.
* Added Workerman CLI entrypoints:
  * `bin/pop-server.php`
  * `bin/client-agent.php`
* Added length-prefixed JSON frame protocol:
  * `Frame`
  * `FrameType`
  * `FrameCodec`
  * `FrameParser`
* Added POP-side MVP services:
  * agent listener with node token auth, heartbeat handling, node registry, and framed session relay
* Added Agent-side MVP client:
  * outbound POP connection
  * AUTH frame
  * heartbeat
  * local allowlist enforcement
  * target TCP connection and DATA/CLOSE relay
* Added local JSONL audit logger at `runtime/audit.log`.
* Configured Workerman logs and pid files under `runtime/`.
* Added `readme.md` with node type descriptions, per-role `.env` requirements, config examples, and deployment checklist.
* Ran `composer dump-autoload`.
* Verified all non-vendor PHP files with `php -l`.
* Verified PSR-4 autoload by instantiating `LayLink\Protocol\Frame`.
* Verified POP Workerman startup with localhost high port outside the sandbox:
  * `POP_AGENT_LISTEN=127.0.0.1:19001`
  * worker reached `[OK]` and stopped cleanly via `timeout`.
* Added Agent-to-POP transport configuration:
  * `POP_ALLOWED_AGENT_TRANSPORTS`
  * `AGENT_TRANSPORT_PROTOCOL`
  * POP rejects disallowed Agent transport during node authentication with `transport_not_allowed`.
* Renamed the agent entrypoint and defaults:
  * Client Agent entrypoint is `bin/client-agent.php`
  * default `NODE_ID=client-01`
  * default `NODE_TYPE=client`
  * runtime pid file `runtime/client-agent.pid`
  * worker name `laylink-client-agent`
* Verified `bin/client-agent.php` starts under Workerman and reaches `[OK]` in a short smoke test.
* Reworked the MVP data path:
  * local client connects to one enabled Client Agent ingress listener
  * Client Agent sends `OPEN` to POP Server
  * POP Server authenticates client request and checks policy
  * POP Server opens the public target directly
  * `DATA` and `CLOSE` frames relay the stream between Client Agent and POP Server
* Added Client Agent local ingress protocols:
  * `socks5`: SOCKS5 `CONNECT`; default enabled on `127.0.0.1:1080`
  * `http-proxy`: HTTP `CONNECT` and ordinary HTTP absolute URL proxy requests; default disabled on `127.0.0.1:8080`
  * `raw-json`: newline JSON debug ingress; default disabled on `127.0.0.1:9000`
* Added per-ingress env switches, listen IPs, and listen ports:
  * `CLIENT_AGENT_SOCKS5_ENABLED`
  * `CLIENT_AGENT_SOCKS5_LISTEN_IP`
  * `CLIENT_AGENT_SOCKS5_LISTEN_PORT`
  * `CLIENT_AGENT_SOCKS5_AUTH_MODE`
  * `CLIENT_AGENT_SOCKS5_USERNAME`
  * `CLIENT_AGENT_SOCKS5_PASSWORD`
  * `CLIENT_AGENT_HTTP_PROXY_ENABLED`
  * `CLIENT_AGENT_HTTP_PROXY_LISTEN_IP`
  * `CLIENT_AGENT_HTTP_PROXY_LISTEN_PORT`
  * `CLIENT_AGENT_RAW_JSON_ENABLED`
  * `CLIENT_AGENT_RAW_JSON_LISTEN_IP`
  * `CLIENT_AGENT_RAW_JSON_LISTEN_PORT`
* Added Client Agent default identity for generated proxy requests:
  * `CLIENT_AGENT_AUTH_TOKEN`
  * `CLIENT_AGENT_USER_ID`
* Completed SOCKS5 TCP proxy protocol handling for the current MVP:
  * method negotiation
  * no-auth method
  * RFC1929 username/password method
  * IPv4/domain/IPv6 target address parsing
  * `CONNECT`
  * standard SOCKS5 failure replies
  * `BIND` returns command-not-supported
  * `UDP ASSOCIATE` returns a local UDP relay endpoint and uses LayLink `UDP_DATA` frames
* Added LayLink UDP datagram relay path:
  * Client Agent parses SOCKS5 UDP request packets
  * Client Agent sends `UDP_DATA` frames to POP Server
  * POP Server validates client auth and `protocol=udp` policy
  * POP Server sends datagrams to public UDP targets
  * POP Server returns UDP responses as `UDP_DATA`
* Added UDP egress sample policy `public-udp-egress` for ports `53`, `123`, and `443`.
* Added `FrameType::descriptions()` as the code-level frame type catalog.
* Verified Client Agent can start SOCKS5, HTTP proxy, and raw-json listeners together on localhost high ports.
* Verified Client Agent can start SOCKS5 TCP plus SOCKS5 UDP relay listeners together on localhost high ports.
* Added `scripts/verify-socks5.sh` to verify real SOCKS5 HTTPS requests:
  * `https://bing.com/` for connectivity and HTTPS support
  * `https://ip.sb/` for egress IP
* Reorganized `.env.example` into readable sections:
  * `[config]`
  * `[kcp]`
  * `[client-agent]`
  * `[pop-server]`
  * Section headers serve only as human-readable markers; the current Env loader ignores any line without `=`.
* Removed deprecated/compatibility-only surfaces:
  * `POP_CLIENT_LISTEN`
  * POP direct client listener
  * `src/Server/ClientListener.php`
  * `bin/client-gateway.php`
  * `src/Client/ClientGateway.php`
  * `bin/border-agent.php`
  * sample border node and border policy docs
* Verified POP now starts with only `laylink-pop-agent-listener`.
* Fixed SOCKS5 error behavior when POP is not connected:
  * SOCKS5 method negotiation no longer returns text errors.
  * POP connection failures during CONNECT now return standard SOCKS5 failure replies.
* Added Agent-to-POP Frame encryption:
  * `LAYLINK_FRAME_ENCRYPTION=none|chacha20`
  * `LAYLINK_FRAME_ENCRYPTION_KEY`
  * POP Server and Client Agent must use identical encryption settings.
  * `chacha20` currently uses libsodium XChaCha20 stream encryption with a random nonce per frame.
* Verified `none` and `chacha20` FrameCodec encode/decode round trips.
* Verified POP Server starts with `LAYLINK_FRAME_ENCRYPTION=chacha20`.
* Added port range matching for policy and agent allowlist ports:
  * `target_ports` supports exact ports such as `80` and string ranges such as `'8080-10080'`.
  * `allowed_ports` supports the same syntax.
* Allowed sample public TCP egress policy on port range `'8080-10080'` for HTTP-alt/speedtest style endpoints.
* Optimized TCP stream `DATA` frames:
  * Control frames remain JSON.
  * TCP `DATA` payloads now use binary frame encoding when both ends run the updated code.
  * This removes base64 expansion and JSON string encoding from the hot TCP data path.
* Verified binary TCP `DATA` frame encode/decode under both `none` and `chacha20`.
* Added POP-side target DNS pre-resolution:
  * Domain resolution failures return `OPEN_FAIL` with `dns_resolution_failed`.
  * POP no longer lets common target DNS failures bubble up as raw `stream_socket_client()` warnings.
* Added TCP backpressure for large transfers:
  * POP pauses target reads when the Agent connection send buffer crosses the high watermark.
  * POP resumes target reads when the Agent connection drains.
  * Client Agent pauses local client reads when the POP connection send buffer crosses the high watermark.
  * Client Agent pauses POP reads while a local client output buffer is full.
  * Send buffer limits default to 64 MiB with a 32 MiB backpressure high watermark.
  * Tuning envs:
    * `LAYLINK_DATA_CHUNK_BYTES`
    * `LAYLINK_MAX_SEND_BUFFER_BYTES`
    * `LAYLINK_BACKPRESSURE_HIGH_WATERMARK_BYTES`
* Fixed large-download truncation risk:
  * Client Agent now treats POP `CLOSE` as a graceful remote EOF and waits for the local client send buffer to drain before closing the local socket.
  * TCP `DATA` is split into configurable chunks, defaulting to 1 MiB, to reduce frame overhead while avoiding oversized frames.
  * POP refreshes Agent activity on any valid frame, not only `PING`, reducing heartbeat false positives during heavy traffic.
* Started Agent-to-POP transport abstraction:
  * Added `FrameClientTransport`.
  * Added `TcpFrameClientTransport` as the current TCP implementation.
  * `AgentClient` now sends and receives LayLink frames through the transport interface instead of directly owning `AsyncTcpConnection` and `FrameParser`.
  * This preserves current TCP behavior while preparing a `KcpFrameClientTransport` implementation.
* Added POP-side frame transport abstraction:
  * Added `FrameServerConnection`.
  * Added `TcpFrameServerConnection`.
  * Added `TcpFrameServerListener`.
  * `AgentListener`, `NodeConnection`, `NodeRegistry`, and `TunnelSession` now hold Agent connections through `FrameServerConnection`.
  * TCP listener decode/encode details are isolated from POP session, policy, heartbeat, and relay logic.
* Added transport factory/config selection:
  * `FrameClientTransportFactory` maps `AGENT_TRANSPORT_PROTOCOL=tcp` to `TcpFrameClientTransport`.
  * `FrameServerListenerFactory` maps the implemented POP transport `tcp` to `TcpFrameServerListener`.
  * `FrameClientTransportFactory` maps `AGENT_TRANSPORT_PROTOCOL=kcp` to `KcpFrameClientTransport`.
  * `FrameServerListenerFactory` maps POP transport `kcp` to `KcpFrameServerListener`.
  * `udp` still fails at factory boundaries with explicit not-implemented errors instead of leaking into business logic.
* Added experimental multi-connection Client Agent -> POP support:
  * `CLIENT_AGENT_POP_CONNECTIONS` controls how many parallel Agent-to-POP long connections a Client Agent opens.
  * New local TCP sessions are distributed round-robin across authenticated POP transports.
  * Each session stays bound to its selected POP transport for the whole session lifetime.
  * POP `NodeRegistry` now supports multiple live connections under the same `NODE_ID`.
  * Heartbeat activity and offline cleanup are tracked per Agent connection.
* KCP/UDP implementation decision:
  * Start with `kcp` before raw `udp` for Agent-to-POP frame transport.
  * Existing TCP tunnel sessions require ordered, reliable byte-stream semantics; raw UDP would need retransmission, ordering, MTU fragmentation, congestion/window handling, and session cleanup.
  * Implementing raw UDP as a general Frame transport would effectively recreate a weaker KCP.
  * Keep the existing SOCKS5 `UDP ASSOCIATE`/`UDP_DATA` feature separate: it is application datagram relay over the current reliable Agent-to-POP channel, not the Agent-to-POP transport itself.
  * Recommended KCP path is a transport implementation behind `FrameClientTransport` / `FrameServerConnection`, backed by a native extension, FFI binding, or sidecar process rather than pure PHP for production throughput.
* Added KCP Agent-to-POP transport:
  * `KcpPacketCodec` defines UDP packet types for `SYN`, `SYN_ACK`, `DATA`, `ACK`, and `CLOSE`.
  * `KcpFrameClientTransport` runs Client Agent frames over UDP while preserving the existing `FrameClientTransport` interface.
  * `KcpFrameServerListener` and `KcpFrameServerConnection` expose KCP/UDP sessions to POP as `FrameServerConnection`.
  * POP can now listen on both TCP and KCP when `POP_ALLOWED_AGENT_TRANSPORTS=tcp,kcp`.
  * `NativeKcpSession` uses PHP FFI to call native upstream `ikcp.c` through `native/kcp/liblaylink_kcp.so`.
  * `scripts/build-kcp-ffi.sh` builds the native shared library from vendored `native/kcp/ikcp.c`.
  * `LAYLINK_KCP_BACKEND=ffi` selects the native KCP backend; `LAYLINK_KCP_BACKEND=php` remains as a debugging fallback through `KcpReliableSession`.
  * `LAYLINK_KCP_FFI_LIB` can point to a custom native KCP library path.
* Added array-style env parsing:
  * `Env::csv()` accepts traditional comma-separated values such as `tcp,kcp`.
  * `Env::csv()` also accepts JSON arrays such as `["tcp","kcp"]`.
* Added KCP congestion and UDP EAGAIN controls:
  * `KcpUdpPacketSender` bypasses Workerman `UdpConnection::send()` for KCP packets and uses suppressed `stream_socket_sendto()` directly.
  * UDP `EAGAIN` / "Resource temporarily unavailable" no longer emits PHP warnings from the KCP transport path.
  * KCP packets that cannot be sent immediately are queued and retried on subsequent transport ticks.
  * Added KCP tuning envs:
    * `LAYLINK_KCP_NODELAY`
    * `LAYLINK_KCP_INTERVAL_MS`
    * `LAYLINK_KCP_FAST_RESEND`
    * `LAYLINK_KCP_NO_CONGESTION_CONTROL`
    * `LAYLINK_KCP_SEND_WINDOW`
    * `LAYLINK_KCP_RECV_WINDOW`
    * `LAYLINK_KCP_MTU_BYTES`
    * `LAYLINK_KCP_TICK_MS`
    * `LAYLINK_KCP_UDP_SEND_QUEUE_BYTES`
    * `LAYLINK_KCP_UDP_FLUSH_PACKETS`
    * `LAYLINK_KCP_OUTPUT_DRAIN_PACKETS`
* Added POP worker count configuration:
  * `POP_AGENT_TCP_WORKERS` controls TCP Agent listener worker count.
  * `POP_AGENT_KCP_WORKERS` is exposed but currently clamped to `1` in `bin/pop-server.php`.
  * KCP/UDP must remain single-worker in the current architecture because KCP session state is process-local and UDP packets for one conv can otherwise be handled by different workers.
  * Native KCP output draining is capped per tick by `LAYLINK_KCP_OUTPUT_DRAIN_PACKETS` to reduce single-flow event-loop monopolization during large downloads.

Known MVP limitations:

* The current sandbox cannot bind TCP sockets; startup smoke tests need escalation or a normal shell environment.
* The `raw-json` debug ingress uses newline-delimited JSON before switching to raw tunnel mode. Example:

```json
{"auth_token":"dev-token","user_id":"admin","target_host":"example.com","target_port":443,"protocol":"tcp"}
```

* No TLS yet.
* No production-grade client identity yet; `dev-token` is hardcoded for MVP development.
* No automated integration test harness yet.
* TCP stream forwarding can now use multiple Agent-to-POP connections per Client Agent, but each TCP session is still pinned to one POP transport. Binary `DATA` frames, chunking, graceful EOF, and backpressure reduce per-byte overhead and buffer blowups. KCP is experimental and still needs throughput and loss tuning. Multipath and per-session flow-control tuning remain future performance work.
* No explicit idle timeout or connect timeout enforcement yet.
* UDP relay is datagram-oriented and currently creates short-lived POP-side UDP sockets per outbound datagram; pooling and stronger timeout accounting are still future work.
* HTTP proxy supports `CONNECT` and ordinary absolute URL HTTP requests; advanced proxy auth and full HTTP/2 proxying are not implemented.

Next recommended tasks:

1. Add a local integration harness that starts POP, Client Agent, and a mock TCP echo target, then verifies authorized tunnel, policy denial, and agent local denial.
2. Add configurable client auth token or JWT-ready auth interface.
3. Add target connect timeout and session idle timeout.
4. Add more detailed buffer overflow audit reasons and metrics.
5. Add README quickstart with exact local commands.
6. Add a reproducible throughput benchmark script for direct-vs-LayLink comparisons.
7. Keep TCP tuning as an ongoing task:
   * benchmark `LAYLINK_DATA_CHUNK_BYTES` at `524288`, `1048576`, `2097152`, and `4194304`
   * benchmark buffer pairs such as `64MiB/32MiB` and `128MiB/64MiB`
   * record direct-vs-LayLink throughput, CPU, memory, and disconnect behavior
8. Benchmark and tune `CLIENT_AGENT_POP_CONNECTIONS` for 1, 2, 4, and 8 long connections under mixed single-download and multi-session workloads.
9. Benchmark native FFI `kcp` against `tcp` under latency, loss, and high-throughput workloads; tune KCP nodelay, window, MTU, resend, interval, UDP queue, and flush settings.
10. Design KCP horizontal scaling before allowing `POP_AGENT_KCP_WORKERS>1`; options include multiple POP ports/instances, reuseport five-tuple affinity, external session state, or a UDP dispatcher keyed by conv.
11. Add raw UDP Agent-to-POP transport only for explicitly datagram-oriented frame classes, or after a reliability/window design exists.
12. Add per-session flow-control windows to reduce head-of-line blocking on one Agent connection.
13. Optimize UDP relay with POP-side UDP socket pooling.
14. Add UDP association idle timeouts and cleanup.
15. Aggregate UDP audit records per association instead of per datagram.
16. Add UDP and per-user rate limiting.

## 0. Project Name

`LayLink`

This project implements a PHP Workerman-based reverse tunnel gateway.

The system allows a Client Agent to establish an outbound persistent framed connection to a POP Server. The POP Server authenticates clients, enforces access policy, selects a route, and forwards authorized TCP streams to public Internet targets or, in later versions, to restricted network zones.

This is **not** a full Layer-3 VPN. It is a policy-controlled Layer-4 reverse access gateway.

---

## 1. Core Architecture

### 1.1 Node Types

The MVP contains two core logical node types:

1. `POP Server`
2. `Client Agent`

### 1.2 Required Topology

```text
Client
  |
  v
POP Server
  |
  +--> Direct public egress
  |
  +--> Client Agent framed access
```

### 1.3 Network Constraints

The Client Agent is located on the client side.

The Client Agent:

* Accepts local or LAN client connections.
* Initiates outbound connections to `popserver1`, for example `10.1.0.2`.
* Wraps client requests and stream data in LayLink frames.

The POP Server:

* Accepts user/client access.
* Maintains persistent connections from Client Agents.
* Performs authentication, authorization, route selection, session management, and auditing.
* Can optionally connect directly to public Internet destinations.

---

## 2. Non-Negotiable Design Principles

### 2.1 POP Server Owns Policy

The POP Server is the only component allowed to make authorization decisions.

Agents must not accept arbitrary user-specified forwarding requests.

Agents only execute explicit `OPEN` instructions issued by the POP Server after authorization.

### 2.2 Agents Are Controlled Executors

Client Agents are controlled executors.

They may:

* Authenticate themselves to the POP Server.
* Maintain heartbeat.
* Accept local client connections on explicitly configured local proxy listeners.
* Send `OPEN` requests to the POP Server.
* Relay stream data.
* Close sessions.

They must not:

* Expose a public SOCKS5/HTTP proxy unless explicitly configured and protected.
* Make authorization decisions locally.
* Override POP policy.
* Route traffic outside POP authorization.

### 2.3 No Full VPN in MVP

The MVP must not implement TUN/TAP, virtual network interfaces, routing tables, or full Layer-3 VPN behavior.

The MVP only supports authorized TCP stream forwarding.

UDP support may be added later.

---

## 3. MVP Scope

The first implementation must support:

1. POP Server starts a TCP listener for clients.
2. POP Server starts a TCP listener for agents.
3. Client Agent connects outbound to POP Server.
4. Agent authenticates with `node_id` and `node_token`.
5. Client connects to POP Server and requests access to a target.
6. POP Server checks policy.
7. POP Server selects a route.
8. POP Server sends `OPEN` frame to selected Agent.
9. Agent connects to the target service.
10. POP Server relays bidirectional TCP data between client and agent.
11. The session closes cleanly when either side disconnects.
12. Audit log records the session.

MVP does not need:

* UDP relay.
* Web UI.
* Multi-POP clustering.
* Distributed HA.
* TLS certificate automation.
* SSH command audit.
* Database SQL audit.
* Complex identity provider integration.

---

## 4. Recommended Technology Stack

Language:

```text
PHP 8.2+
```

Framework:

```text
Workerman
```

Recommended packages:

```text
workerman/workerman
monolog/monolog
vlucas/phpdotenv
ramsey/uuid
```

Optional later:

```text
firebase/php-jwt
illuminate/database
react/promise
```

---

## 5. Directory Structure

The project should use the following structure:

```text
pop-tunnel-gateway/
  composer.json
  .env.example
  README.md
  CONTRACT.md

  bin/
    pop-server.php
    client-agent.php

  config/
    routes.php
    nodes.php
    policies.php

  src/
    Protocol/
      Frame.php
      FrameType.php
      FrameCodec.php
      FrameParser.php

    Server/
      PopServer.php
      AgentListener.php

    Agent/
      AgentClient.php
      TargetConnector.php

    Client/
      ClientGateway.php

    Session/
      TunnelSession.php
      SessionManager.php

    Node/
      NodeRegistry.php
      NodeConnection.php

    Auth/
      NodeAuthenticator.php
      ClientAuthenticator.php
      PolicyChecker.php

    Route/
      RouteResolver.php
      RouteDecision.php

    Audit/
      AuditLogger.php

    Util/
      BufferLimiter.php
      LoggerFactory.php
```

---

## 6. Frame Protocol

The system must use a framed protocol between POP Server and Agents.

Raw stream passthrough between POP and Agent is not allowed because the system needs multiplexing, session IDs, heartbeats, error handling, and auditability.

### 6.1 Frame Types

Required frame types:

| Type | Direction | Meaning |
| --- | --- | --- |
| `AUTH` | Client Agent -> POP | Agent authenticates itself with `node_id`, `node_type`, `node_token`, and `transport_protocol`. |
| `AUTH_OK` | POP -> Client Agent | Agent authentication accepted. |
| `AUTH_FAIL` | POP -> Client Agent | Agent authentication rejected; POP closes the connection after sending this frame. |
| `PING` | Client Agent -> POP | Agent heartbeat with active session count, load, and timestamp. |
| `PONG` | POP -> Client Agent | Heartbeat response. |
| `OPEN` | Client Agent -> POP | Client Agent requests POP to authorize and open a target stream. |
| `OPEN_OK` | POP -> Client Agent | POP has connected the target and the stream can begin. |
| `OPEN_FAIL` | POP -> Client Agent | POP rejected or failed the requested target stream. |
| `DATA` | Bidirectional | Stream bytes for one `session_id`; TCP stream payloads use binary frame encoding when both ends are updated. |
| `UDP_DATA` | Bidirectional | UDP datagram bytes for one UDP association; MVP payload uses base64 and includes target metadata. |
| `CLOSE` | Bidirectional | Close one stream session. |
| `ERROR` | Bidirectional | Explicit protocol or session error. |
| `WINDOW` | Bidirectional | Reserved flow-control window update for future backpressure. |

For the new MVP, `OPEN` always means:

```text
Client Agent asks POP Server to connect to the target.
```

It does not mean POP asks Agent to connect to an intranet target. That older direction is reserved for a later executor-agent mode.

### 6.2 Frame Fields

Each frame must contain:

```text
version
type
session_id
payload_length
payload
```

Control frames use JSON payloads. TCP stream `DATA` frames may use the binary DATA encoding below.

### 6.3 Frame Encoding

For control frames, use length-prefixed JSON frames.

Format:

```text
uint32_be length
json_payload
```

Example decoded control frame:

```json
{
  "version": 1,
  "type": "OPEN",
  "session_id": "018f6f4a-xxxx-xxxx",
  "payload": {
    "target_host": "example.com",
    "target_port": 443,
    "protocol": "tcp"
  }
}
```
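
A minimal sketch of the length-prefixed JSON control frame, assuming the `uint32_be` length counts only the JSON bytes that follow it. `encodeControlFrame()` and `FrameBuffer` are hypothetical names rather than the project's `FrameCodec` and `FrameParser`; the 16 MiB frame cap is an illustrative assumption, and encryption is left out.

```php
<?php
declare(strict_types=1);

function encodeControlFrame(array $frame): string
{
    $json = json_encode($frame, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
    // 'N' packs an unsigned 32-bit big-endian integer.
    return pack('N', strlen($json)) . $json;
}

// Accumulates stream bytes and returns complete frames, because a TCP read
// may deliver a partial frame or several frames at once.
final class FrameBuffer
{
    private string $buffer = '';

    // The size cap is an assumption of this sketch, not a documented limit.
    public function __construct(private int $maxFrameBytes = 16 * 1024 * 1024) {}

    /** @return list<array> frames that are complete so far */
    public function push(string $bytes): array
    {
        $this->buffer .= $bytes;
        $frames = [];
        while (strlen($this->buffer) >= 4) {
            $length = unpack('N', $this->buffer)[1];
            if ($length > $this->maxFrameBytes) {
                throw new LengthException('invalid_frame: frame too large');
            }
            if (strlen($this->buffer) < 4 + $length) {
                break; // wait for more bytes
            }
            $json = substr($this->buffer, 4, $length);
            $this->buffer = substr($this->buffer, 4 + $length);
            $frames[] = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        }
        return $frames;
    }
}

$bytes = encodeControlFrame([
    'version' => 1,
    'type' => 'OPEN',
    'session_id' => 's1',
    'payload' => ['target_host' => 'example.com', 'target_port' => 443, 'protocol' => 'tcp'],
]);
$parser = new FrameBuffer();
$partial = $parser->push(substr($bytes, 0, 5));     // incomplete: no frame yet
$frames = $parser->push(substr($bytes, 5) . $bytes); // rest of frame 1 plus frame 2
```

Keeping the buffering separate from the encoding mirrors the split between a codec and a parser: the codec is stateless, while the parser carries leftover bytes between reads.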

TCP stream `DATA` frames use a binary body before optional encryption:

```text
uint32_be encrypted_or_plain_body_length
"LLB1"
uint8 binary_type=1
uint16_be session_id_length
session_id bytes
raw TCP payload bytes
```

Legacy JSON/base64 `DATA` decoding remains accepted for compatibility, but updated senders should emit binary `DATA`.
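
The binary `DATA` layout above can be sketched as follows for the unencrypted case. The function names are hypothetical, and the sketch assumes the leading `uint32_be` counts every byte after the length prefix.

```php
<?php
declare(strict_types=1);

const BINARY_MAGIC = 'LLB1';
const BINARY_TYPE_DATA = 1;
const BINARY_HEADER_BYTES = 7; // magic (4) + type (1) + session_id length (2)

function encodeBinaryData(string $sessionId, string $payload): string
{
    if (strlen($sessionId) > 0xFFFF) {
        throw new InvalidArgumentException('session_id does not fit a uint16 length');
    }
    // C = uint8, n = uint16 big-endian, N = uint32 big-endian.
    $body = BINARY_MAGIC
        . pack('Cn', BINARY_TYPE_DATA, strlen($sessionId))
        . $sessionId
        . $payload;
    return pack('N', strlen($body)) . $body;
}

/** @return array{session_id: string, payload: string} */
function decodeBinaryData(string $frame): array
{
    if (strlen($frame) < 4 + BINARY_HEADER_BYTES) {
        throw new UnexpectedValueException('invalid_frame: too short');
    }
    $length = unpack('N', $frame)[1];
    $body = substr($frame, 4);
    if (strlen($body) !== $length || !str_starts_with($body, BINARY_MAGIC)) {
        throw new UnexpectedValueException('invalid_frame: bad length or magic');
    }
    // Read the type and session_id length that follow the 4-byte magic.
    $header = unpack('Ctype/nsid_len', $body, 4);
    if ($header['type'] !== BINARY_TYPE_DATA
        || strlen($body) < BINARY_HEADER_BYTES + $header['sid_len']) {
        throw new UnexpectedValueException('invalid_frame: bad type or session_id length');
    }
    return [
        'session_id' => substr($body, BINARY_HEADER_BYTES, $header['sid_len']),
        'payload' => substr($body, BINARY_HEADER_BYTES + $header['sid_len']),
    ];
}

$raw = "\x00\xffraw bytes";
$frame = encodeBinaryData('018f6f4a', $raw);
$decoded = decodeBinaryData($frame);
```

The payload travels as raw bytes, so a frame costs a fixed 11 bytes plus the session ID, with no base64 expansion.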

---

## 7. Agent Authentication

When an Agent connects to POP Server, it must immediately send an `AUTH` frame.

Example:

```json
{
  "version": 1,
  "type": "AUTH",
  "session_id": null,
  "payload": {
    "node_id": "client-01",
    "node_type": "client",
    "node_zone": "corp",
    "node_token": "CHANGE_ME",
    "transport_protocol": "tcp",
    "supported_protocols": ["tcp"]
  }
}
```

POP Server must verify:

```text
node_id exists
node_token matches
node_type matches config
node is not disabled
```

On success:

```json
{
  "version": 1,
  "type": "AUTH_OK",
  "session_id": null,
  "payload": {
    "node_id": "client-01",
    "heartbeat_interval": 10
  }
}
```

On failure:

```json
{
  "version": 1,
  "type": "AUTH_FAIL",
  "session_id": null,
  "payload": {
    "reason": "invalid_node_token"
  }
}
```

POP Server must close the connection after `AUTH_FAIL`.
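
The verification steps above can be sketched as a single check that returns a failure reason or `null`. This is an illustration, not the project's `NodeAuthenticator`: `checkNodeAuth()` is a hypothetical name, the reasons `node_disabled` and `node_type_mismatch` are invented for the sketch, and defaulting a missing `transport_protocol` to `tcp` is an assumption. The reasons `node_not_found`, `invalid_node_token`, and `transport_not_allowed` come from this document.

```php
<?php
declare(strict_types=1);

/**
 * Returns null when the AUTH payload is accepted, otherwise a failure reason.
 *
 * @param array<string, array> $nodes            entries shaped like config/nodes.php
 * @param array                $payload          AUTH frame payload
 * @param list<string>         $allowedTransports e.g. ['tcp', 'kcp']
 */
function checkNodeAuth(array $nodes, array $payload, array $allowedTransports): ?string
{
    $nodeId = (string) ($payload['node_id'] ?? '');
    $node = $nodes[$nodeId] ?? null;
    if ($node === null) {
        return 'node_not_found';
    }
    if (!($node['enabled'] ?? false)) {
        return 'node_disabled'; // hypothetical reason name
    }

    $expected = (string) ($node['token'] ?? '');
    $given = (string) ($payload['node_token'] ?? '');
    // hash_equals compares in constant time; an empty configured token never matches.
    if ($expected === '' || !hash_equals($expected, $given)) {
        return 'invalid_node_token';
    }
    if (($payload['node_type'] ?? null) !== ($node['node_type'] ?? null)) {
        return 'node_type_mismatch'; // hypothetical reason name
    }

    // Assumption: an AUTH frame without transport_protocol is treated as tcp.
    $transport = $payload['transport_protocol'] ?? 'tcp';
    if (!in_array($transport, $allowedTransports, true)) {
        return 'transport_not_allowed';
    }
    return null;
}

$nodes = ['client-01' => ['node_type' => 'client', 'token' => 'CHANGE_ME', 'enabled' => true]];
$auth = [
    'node_id' => 'client-01',
    'node_type' => 'client',
    'node_token' => 'CHANGE_ME',
    'transport_protocol' => 'kcp',
];
$reason = checkNodeAuth($nodes, $auth, ['tcp', 'kcp']);
```

A non-null result would become the `reason` of the `AUTH_FAIL` frame sent before POP closes the connection.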

---

## 8. Heartbeat

Agents must send `PING` every 10 seconds by default.

Example:

```json
{
  "version": 1,
  "type": "PING",
  "session_id": null,
  "payload": {
    "node_id": "client-01",
    "active_sessions": 12,
    "load": 0.35,
    "timestamp": 1710000000
  }
}
```

POP Server replies:

```json
{
  "version": 1,
  "type": "PONG",
  "session_id": null,
  "payload": {
    "timestamp": 1710000001
  }
}
```

If POP Server receives no heartbeat or other valid frame from an Agent connection for 30 seconds, it must mark that connection as offline and close all sessions routed through it.
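
A minimal sketch of the offline check, assuming activity is tracked per Agent connection and refreshed on any valid frame, as the Implementation Status section describes. `ActivityTracker` is a hypothetical name, and time is passed in explicitly so that the logic can run without a timer.

```php
<?php
declare(strict_types=1);

final class ActivityTracker
{
    /** @var array<string, int> connection id => last activity (unix seconds) */
    private array $lastSeen = [];

    public function __construct(private int $offlineAfterSeconds = 30) {}

    // Call on PING and on every other valid frame from the connection.
    public function touch(string $connectionId, int $now): void
    {
        $this->lastSeen[$connectionId] = $now;
    }

    /** @return list<string> connections to mark offline and clean up */
    public function sweep(int $now): array
    {
        $expired = [];
        foreach ($this->lastSeen as $id => $seenAt) {
            if ($now - $seenAt >= $this->offlineAfterSeconds) {
                $expired[] = (string) $id;
                unset($this->lastSeen[$id]);
            }
        }
        return $expired;
    }
}

$tracker = new ActivityTracker();
$tracker->touch('client-01#1', 1000);
$tracker->touch('client-01#2', 1020);
$stillOnline = $tracker->sweep(1029);
$expired = $tracker->sweep(1030);
```

In a running POP, `sweep()` would be driven by a periodic timer, and each returned connection would have its sessions closed and audited.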

---

## 9. Client Request Model

For MVP, the client may connect directly to a POP TCP listener and submit an initial JSON request.

Example:

```json
{
  "auth_token": "dev-token",
  "target_host": "192.168.10.20",
  "target_port": 22,
  "protocol": "tcp",
  "route_hint": "client-01"
}
```

After the initial request is accepted, the TCP stream becomes a bidirectional tunnel.

Later versions may implement:

```text
SOCKS5
HTTP CONNECT
WebSocket tunnel
mTLS client identity
JWT authentication
```

---

## 10. Route Selection

The POP Server must use `RouteResolver` to decide where traffic should go.

Possible route types:

```text
direct
agent
border
reject
```

Example route decision:

```php
[
    'allowed' => true,
    'route_type' => 'agent',
    'node_id' => 'client-01',
    'policy_id' => 'corp-ssh-admin',
]
```

Client `route_hint` is advisory only.

The POP Server may ignore, override, or reject the route hint.

---

## 11. Policy Rules

Policies must be defined in `config/policies.php`.

Example:

```php
return [
    [
        'policy_id' => 'corp-ssh-admin',
        'users' => ['admin', 'devops'],
        'target_hosts' => ['192.168.10.20', '192.168.10.21'],
        'target_ports' => [22],
        'route_type' => 'agent',
        'node_id' => 'client-01',
        'enabled' => true,
    ],
    [
        'policy_id' => 'public-web-egress',
        'users' => ['normal-user', 'admin'],
        'target_hosts' => ['*'],
        'target_ports' => [80, 443, '8080-10080'],
        'route_type' => 'direct',
        'enabled' => true,
    ],
];
```

Policy matching must consider:

```text
user identity
target host
target port
protocol
requested route
node availability
policy enabled/disabled state
```

Default behavior must be deny.
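
The matching rules can be sketched as a first-match lookup with default deny. This is an illustration, not the project's `PolicyChecker`: the function names are hypothetical, only the `*` host wildcard is handled, first-match-wins ordering is an assumption, and the protocol, requested-route, and node-availability checks are omitted for brevity.

```php
<?php
declare(strict_types=1);

// Matches an exact port (80) or a string range ('8080-10080').
function portMatches(int $port, int|string $rule): bool
{
    if (is_int($rule)) {
        return $port === $rule;
    }
    if (preg_match('/^(\d+)-(\d+)$/', $rule, $m) === 1) {
        return $port >= (int) $m[1] && $port <= (int) $m[2];
    }
    return ctype_digit($rule) && $port === (int) $rule;
}

function hostMatches(string $host, array $rules): bool
{
    return in_array('*', $rules, true) || in_array($host, $rules, true);
}

/** Returns the first enabled policy that matches, or null for deny. */
function matchPolicy(array $policies, string $user, string $host, int $port): ?array
{
    foreach ($policies as $policy) {
        if (!($policy['enabled'] ?? false)) {
            continue;
        }
        if (!in_array($user, $policy['users'] ?? [], true)) {
            continue;
        }
        if (!hostMatches($host, $policy['target_hosts'] ?? [])) {
            continue;
        }
        foreach ($policy['target_ports'] ?? [] as $rule) {
            if (portMatches($port, $rule)) {
                return $policy;
            }
        }
    }
    return null; // default deny
}

$policies = [
    [
        'policy_id' => 'corp-ssh-admin',
        'users' => ['admin', 'devops'],
        'target_hosts' => ['192.168.10.20', '192.168.10.21'],
        'target_ports' => [22],
        'enabled' => true,
    ],
    [
        'policy_id' => 'public-web-egress',
        'users' => ['normal-user', 'admin'],
        'target_hosts' => ['*'],
        'target_ports' => [80, 443, '8080-10080'],
        'enabled' => true,
    ],
];
$decision = matchPolicy($policies, 'normal-user', 'example.com', 9000);
```

Returning `null` rather than a permissive fallback keeps the default-deny rule in one place: any request that no enabled policy covers is rejected.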

---

## 12. Agent Local Allowlist

Each Agent must enforce its own local allowlist.

Example `config/nodes.php`:

```php
return [
    'client-01' => [
        'node_type' => 'client',
        'token' => 'CHANGE_ME',
        'allowed_cidrs' => [
            '192.168.0.0/16',
            '10.10.0.0/16',
        ],
        'allowed_ports' => [22, 80, 443, '8080-10080', 3306, 5432],
        'enabled' => true,
    ],
];
```

If an Agent receives an `OPEN` request outside its local allowlist, it must return `OPEN_FAIL`.

Example:

```json
{
  "version": 1,
  "type": "OPEN_FAIL",
  "session_id": "018f6f4a-xxxx",
  "payload": {
    "reason": "agent_local_policy_denied"
  }
}
```
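
A minimal sketch of the local allowlist check, assuming the target has already been resolved to an IPv4 address. All function names are hypothetical, and IPv6 and hostname targets are out of scope here. A `false` result corresponds to replying `OPEN_FAIL` with `agent_local_policy_denied`.

```php
<?php
declare(strict_types=1);

function ipv4InCidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32.
    [$network, $bits] = array_pad(explode('/', $cidr, 2), 2, '32');
    $ipLong = ip2long($ip);
    $netLong = ip2long($network);
    if ($ipLong === false || $netLong === false || !ctype_digit($bits) || (int) $bits > 32) {
        return false;
    }
    $bits = (int) $bits;
    // Build a 32-bit netmask with the top $bits bits set.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// Accepts an exact port (22) or a string range ('8080-10080').
function allowlistPortMatches(int $port, int|string $rule): bool
{
    if (is_int($rule)) {
        return $port === $rule;
    }
    if (preg_match('/^(\d+)-(\d+)$/', $rule, $m) === 1) {
        return $port >= (int) $m[1] && $port <= (int) $m[2];
    }
    return false;
}

function agentAllowsTarget(array $node, string $targetIp, int $port): bool
{
    $cidrOk = false;
    foreach ($node['allowed_cidrs'] ?? [] as $cidr) {
        if (ipv4InCidr($targetIp, $cidr)) {
            $cidrOk = true;
            break;
        }
    }
    if (!$cidrOk) {
        return false;
    }
    foreach ($node['allowed_ports'] ?? [] as $rule) {
        if (allowlistPortMatches($port, $rule)) {
            return true;
        }
    }
    return false;
}

$node = [
    'allowed_cidrs' => ['192.168.0.0/16', '10.10.0.0/16'],
    'allowed_ports' => [22, 80, 443, '8080-10080', 3306, 5432],
];
$allowed = agentAllowsTarget($node, '192.168.10.20', 22);
```

Both the address and the port must match, so an allowed port on an address outside every listed CIDR is still denied.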

---

## 13. Opening a Target Connection

POP Server sends:

```json
{
  "version": 1,
  "type": "OPEN",
  "session_id": "018f6f4a-xxxx",
  "payload": {
    "target_host": "192.168.10.20",
    "target_port": 22,
    "protocol": "tcp",
    "user_id": "admin",
    "policy_id": "corp-ssh-admin"
  }
}
```

Agent connects to the target.

On success:

```json
{
  "version": 1,
  "type": "OPEN_OK",
  "session_id": "018f6f4a-xxxx",
  "payload": {
    "target_host": "192.168.10.20",
    "target_port": 22
  }
}
```

On failure:

```json
{
  "version": 1,
  "type": "OPEN_FAIL",
  "session_id": "018f6f4a-xxxx",
  "payload": {
    "reason": "connection_refused"
  }
}
```

---

## 14. Data Forwarding

After `OPEN_OK`, data is exchanged with `DATA` frames.

Updated implementations encode TCP `DATA` as the binary frame described in section 6.3. Legacy JSON/base64 `DATA` frames may still be decoded during rolling upgrades.

Both POP Server and Agent must map `session_id` to the corresponding local TCP connection.

---

## 15. Session Lifecycle

Session states:

```text
NEW
OPENING
OPEN
CLOSING
CLOSED
FAILED
```

State transitions:

```text
NEW -> OPENING
OPENING -> OPEN
OPENING -> FAILED
OPEN -> CLOSING
CLOSING -> CLOSED
OPEN -> CLOSED
```

A session must be closed when:

```text
client disconnects
target disconnects
agent disconnects
policy check fails
OPEN_FAIL is received
send buffer exceeds hard limit
idle timeout is reached
```
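
The state transitions listed above can be encoded so that illegal moves are rejected in one place. A minimal sketch using a PHP enum; `SessionState` and its methods are hypothetical names, not the project's `TunnelSession` API.

```php
<?php
declare(strict_types=1);

enum SessionState: string
{
    case New = 'NEW';
    case Opening = 'OPENING';
    case Open = 'OPEN';
    case Closing = 'CLOSING';
    case Closed = 'CLOSED';
    case Failed = 'FAILED';

    /** @return list<SessionState> states reachable from this one */
    public function next(): array
    {
        return match ($this) {
            self::New => [self::Opening],
            self::Opening => [self::Open, self::Failed],
            self::Open => [self::Closing, self::Closed],
            self::Closing => [self::Closed],
            self::Closed, self::Failed => [], // terminal states
        };
    }

    public function canMoveTo(SessionState $to): bool
    {
        return in_array($to, $this->next(), true);
    }
}

$state = SessionState::New;
foreach ([SessionState::Opening, SessionState::Open, SessionState::Closed] as $target) {
    if (!$state->canMoveTo($target)) {
        throw new LogicException("illegal transition {$state->value} -> {$target->value}");
    }
    $state = $target;
}
```

Only the transitions from the list are encoded here; a session that fails its policy check before `OPENING` would need an extra transition that the list does not define.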

---

## 16. Timeouts

Required timeout defaults:

```text
agent heartbeat interval: 10 seconds
agent offline threshold: 30 seconds
target connect timeout: 5 seconds
session idle timeout: 300 seconds
client initial request timeout: 5 seconds
```

All timeout values should be configurable.

---

## 17. Buffer and Backpressure

The implementation must avoid unbounded memory growth.

Each Workerman connection should configure a maximum send buffer.

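Suggested default (the current implementation uses 64 MiB, configurable through `LAYLINK_MAX_SEND_BUFFER_BYTES`):

```php
// Hard cap on bytes queued for this connection before Workerman reports the buffer as full.
$connection->maxSendBufferSize = 64 * 1024 * 1024;
```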

The implementation must handle:

```text
onBufferFull
onBufferDrain
onClose
onError
```

If a session exceeds buffer limits and cannot recover, close the session and write an audit log entry.
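
The pause and resume decision can be sketched independently of Workerman. `BoundedSendQueue` is a hypothetical class that only models the byte accounting: a high watermark that pauses reads from the data source and a hard limit that fails the session. Resuming only once the queue is empty is an assumption of this sketch. In a Workerman-based implementation the pause and resume actions would typically map to `pauseRecv()` and `resumeRecv()` on the source connection.

```php
<?php
declare(strict_types=1);

final class BoundedSendQueue
{
    private int $buffered = 0;
    private bool $paused = false;

    public function __construct(
        private int $highWatermark,
        private int $hardLimit,
    ) {}

    /**
     * Account for bytes queued toward the slow peer.
     * Returns false when the hard limit would be exceeded; the caller should
     * then close the session and audit it as buffer_overflow.
     */
    public function enqueue(int $bytes): bool
    {
        if ($this->buffered + $bytes > $this->hardLimit) {
            return false;
        }
        $this->buffered += $bytes;
        if ($this->buffered >= $this->highWatermark) {
            $this->paused = true; // stop reading from the fast side
        }
        return true;
    }

    // Account for bytes that the peer has accepted.
    public function drained(int $bytes): void
    {
        $this->buffered = max(0, $this->buffered - $bytes);
        if ($this->buffered === 0) {
            $this->paused = false; // resume reads once the queue is empty
        }
    }

    public function shouldPauseReads(): bool
    {
        return $this->paused;
    }
}

$mib = 1024 * 1024;
$queue = new BoundedSendQueue(32 * $mib, 64 * $mib);
$queue->enqueue(16 * $mib);
$pausedEarly = $queue->shouldPauseReads(); // below the watermark
$queue->enqueue(16 * $mib);
$pausedAtMark = $queue->shouldPauseReads(); // watermark reached
$overflowOk = $queue->enqueue(40 * $mib);   // would exceed the hard limit
```

Keeping the watermark well below the hard limit leaves room for data that is already in flight when the pause takes effect.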

---

## 18. Audit Logging

Every session must produce an audit log.

Required fields:

```text
session_id
user_id
source_ip
target_host
target_port
protocol
route_type
node_id
policy_id
start_time
end_time
duration_ms
bytes_client_to_target
bytes_target_to_client
result
failure_reason
```

MVP may write JSON lines to a local file:

```text
runtime/audit.log
```

Example:

```json
{
  "session_id": "018f6f4a-xxxx",
  "user_id": "admin",
  "source_ip": "1.2.3.4",
  "target_host": "192.168.10.20",
  "target_port": 22,
  "protocol": "tcp",
  "route_type": "agent",
  "node_id": "client-01",
  "policy_id": "corp-ssh-admin",
  "start_time": "2026-05-28T10:00:00+08:00",
  "end_time": "2026-05-28T10:01:00+08:00",
  "duration_ms": 60000,
  "bytes_client_to_target": 1024,
  "bytes_target_to_client": 2048,
  "result": "closed",
  "failure_reason": null
}
```
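
A minimal sketch of an audit writer for the local file follows. It checks that every required field is present (a `null` value still counts as present) and emits one JSON object per line. The function names `formatAuditLine` and `appendAudit` are hypothetical.

```php
<?php
declare(strict_types=1);

// Field list copied from the required fields above.
const AUDIT_FIELDS = [
    'session_id', 'user_id', 'source_ip', 'target_host', 'target_port',
    'protocol', 'route_type', 'node_id', 'policy_id', 'start_time',
    'end_time', 'duration_ms', 'bytes_client_to_target',
    'bytes_target_to_client', 'result', 'failure_reason',
];

/** Encode one audit record as a single JSON line, rejecting incomplete records. */
function formatAuditLine(array $record): string
{
    $missing = array_diff(AUDIT_FIELDS, array_keys($record));
    if ($missing !== []) {
        throw new InvalidArgumentException('missing audit fields: ' . implode(', ', $missing));
    }
    return json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES) . "\n";
}

/** Append the record to the audit file, holding an exclusive lock while writing. */
function appendAudit(string $path, array $record): void
{
    file_put_contents($path, formatAuditLine($record), FILE_APPEND | LOCK_EX);
}
```

`file_put_contents()` blocks while it writes, which is probably acceptable for an MVP but may need revisiting once structured audit storage is introduced.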

---

## 19. Error Handling

Errors must be explicit: every failure must carry a machine-readable reason code.

Required error reasons include:

```text
invalid_frame
invalid_auth
node_not_found
node_offline
policy_denied
route_not_found
target_connect_timeout
target_connection_refused
agent_local_policy_denied
session_not_found
buffer_overflow
protocol_not_supported
internal_error
```

Never drop a session without logging the reason.
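
To keep reason codes consistent across POP Server and Agent, they can live in one place. The sketch below is one possible shape; the class name `ErrorReason` and the choice to map unknown values to `internal_error` are assumptions, not requirements of this contract.

```php
<?php
declare(strict_types=1);

// Reason codes copied from the list above. Class and method names are hypothetical.
final class ErrorReason
{
    public const ALL = [
        'invalid_frame', 'invalid_auth', 'node_not_found', 'node_offline',
        'policy_denied', 'route_not_found', 'target_connect_timeout',
        'target_connection_refused', 'agent_local_policy_denied',
        'session_not_found', 'buffer_overflow', 'protocol_not_supported',
        'internal_error',
    ];

    /** Map anything outside the known set to internal_error so logs stay consistent. */
    public static function normalize(string $reason): string
    {
        return in_array($reason, self::ALL, true) ? $reason : 'internal_error';
    }
}
```

Calling `ErrorReason::normalize()` just before a reason is written to the audit log would guard against typos in call sites.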

---

## 20. Security Requirements

### 20.1 Required for MVP

* Node token authentication.
* Default-deny policy.
* Agent local allowlist.
* Audit logging.
* Explicit route decision.
* No arbitrary target access.
* No unauthenticated Agent registration.
* No unauthenticated client requests in production mode.
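
For node token authentication, a constant-time comparison avoids leaking token contents through timing. The following is a sketch under stated assumptions: tokens are stored as SHA-256 hex digests keyed by node ID (a hypothetical storage shape), and `verifyNodeToken` is an illustrative name. `hash_equals()` and `hash()` are standard PHP functions.

```php
<?php
declare(strict_types=1);

/**
 * Check a presented node token against the stored digest.
 *
 * @param array<string, string> $tokenHashes node_id => sha256 hex digest (assumed shape)
 */
function verifyNodeToken(array $tokenHashes, string $nodeId, string $presentedToken): bool
{
    $known = array_key_exists($nodeId, $tokenHashes);
    // Compare against a dummy digest for unknown nodes so both paths do similar work.
    $expected = $known ? $tokenHashes[$nodeId] : str_repeat('0', 64);
    $matches = hash_equals($expected, hash('sha256', $presentedToken));

    return $known && $matches; // unknown node or wrong token: deny
}
```

A failed check would lead to `AUTH_FAIL` with reason `invalid_auth`, and the Agent would not be registered.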

### 20.2 Required Before Production

* TLS between Client and POP.
* TLS between Agent and POP.
* Strong node credentials.
* Rotatable node tokens.
* JWT or mTLS client authentication.
* Per-user policy.
* Rate limiting.
* Session concurrency limits.
* Structured audit storage.
* Log redaction for secrets.

---

## 21. Configuration

`.env.example`:

```env
APP_ENV=dev
POP_AGENT_LISTEN=0.0.0.0:9001
NODE_ID=client-01
NODE_TYPE=client
NODE_TOKEN=CHANGE_ME
POP_SERVER_ADDRESS=tcp://10.1.0.2:9001
AUDIT_LOG=runtime/audit.log
LOG_LEVEL=debug
```
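
The Agent-side values can be validated at startup so that misconfiguration fails fast. The sketch below is hypothetical: `loadAgentConfig` is an illustrative name, and refusing the `CHANGE_ME` placeholder outside `dev` is an assumed safety check rather than a stated requirement.

```php
<?php
declare(strict_types=1);

/**
 * Validate Agent settings taken from the environment.
 *
 * @param array<string, string> $env
 * @return array{node_id: string, token: string, scheme: string, host: string, port: int}
 */
function loadAgentConfig(array $env): array
{
    foreach (['NODE_ID', 'NODE_TOKEN', 'POP_SERVER_ADDRESS'] as $key) {
        if (($env[$key] ?? '') === '') {
            throw new InvalidArgumentException("missing required setting: {$key}");
        }
    }

    // Assumption: the placeholder token is tolerated only when APP_ENV=dev.
    if (($env['APP_ENV'] ?? 'dev') !== 'dev' && $env['NODE_TOKEN'] === 'CHANGE_ME') {
        throw new InvalidArgumentException('NODE_TOKEN must be changed outside dev');
    }

    // Expect an address such as tcp://10.1.0.2:9001.
    $parts = parse_url($env['POP_SERVER_ADDRESS']);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['port'])) {
        throw new InvalidArgumentException('POP_SERVER_ADDRESS must look like scheme://host:port');
    }

    return [
        'node_id' => $env['NODE_ID'],
        'token' => $env['NODE_TOKEN'],
        'scheme' => $parts['scheme'],
        'host' => $parts['host'],
        'port' => $parts['port'],
    ];
}
```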

---

## 22. CLI Entrypoints

### 22.1 Start POP Server

```bash
php bin/pop-server.php start
```

### 22.2 Start Client Agent

```bash
php bin/client-agent.php start
```

### 22.3 Stop Services

```bash
php bin/pop-server.php stop
php bin/client-agent.php stop
```

---

## 23. MVP Acceptance Tests

The implementation is acceptable only if the following tests pass.

### 23.1 Agent Registration

Given a valid node token, Client Agent connects to POP Server and becomes online.

Expected result:

```text
NodeRegistry contains client-01 as online.
```

### 23.2 Invalid Agent Rejected

Given an invalid node token, POP Server returns `AUTH_FAIL` and closes the connection.

Expected result:

```text
Node is not registered.
Audit/security log records invalid_auth.
```

### 23.3 Authorized TCP Tunnel

Given:

```text
Client requests 192.168.10.20:22
User is allowed by policy
client-01 is online
```

Expected result:

```text
POP sends OPEN to client-01.
Agent connects to 192.168.10.20:22.
Client can exchange TCP data with target.
Audit log records success.
```

### 23.4 Policy Denial

Given:

```text
Client requests 192.168.99.99:22
No policy allows this target
```

Expected result:

```text
POP rejects the request.
No OPEN frame is sent to Agent.
Audit log records policy_denied.
```
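
The default-deny behaviour exercised by this test can be sketched as a lookup that returns a policy ID only on an explicit match. The policy array shape and the function name `matchPolicy` are hypothetical; the real structure of `config/policies.php` may differ.

```php
<?php
declare(strict_types=1);

/**
 * Return the ID of the first policy that explicitly allows the request, or null to deny.
 *
 * @param list<array{id: string, user_id: string, host: string, port: int}> $policies assumed shape
 */
function matchPolicy(array $policies, string $userId, string $host, int $port): ?string
{
    foreach ($policies as $policy) {
        if ($policy['user_id'] === $userId
            && $policy['host'] === $host
            && $policy['port'] === $port
        ) {
            return $policy['id'];
        }
    }

    return null; // default deny: no policy matched
}
```

When the result is `null`, POP would reject the request with `policy_denied` before any `OPEN` frame is built.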

### 23.5 Agent Local Denial

Given:

```text
POP sends OPEN to a target outside Agent local allowlist
```

Expected result:

```text
Agent returns OPEN_FAIL with agent_local_policy_denied.
Session is closed.
Audit log records failure with reason agent_local_policy_denied.
```

### 23.6 Agent Offline

Given:

```text
client-01 is offline
Client requests route through client-01
```

Expected result:

```text
POP rejects the request with node_offline.
Audit log records node_offline.
```

### 23.7 Clean Close

Given:

```text
Client closes connection
```

Expected result:

```text
POP sends CLOSE to Agent.
Agent closes target connection.
SessionManager removes session.
Audit log records closed.
```

### 23.8 Target Connection Failure

Given:

```text
Target host or port is unreachable
```

Expected result:

```text
Agent sends OPEN_FAIL.
POP closes client connection.
Audit log records target_connect_timeout or target_connection_refused.
```

---

## 24. Implementation Priority

Implement in this order:

1. `Frame`, `FrameCodec`, `FrameParser`
2. `NodeAuthenticator`
3. `NodeRegistry`
4. Agent connection listener on POP Server
5. Client Agent outbound connection
6. Heartbeat
7. Client listener
8. Policy checker
9. Route resolver
10. Session manager
11. Agent target connector
12. DATA forwarding
13. CLOSE handling
14. Audit logger
15. Basic tests

Do not implement UDP transport, KCP transport, Web UI, or clustering before the MVP TCP-framed proxy path is stable.

---

## 25. Coding Rules

* Use strict types where possible.
* Keep protocol parsing separate from business logic.
* Do not mix policy checking with socket forwarding.
* Do not use global mutable arrays, except for Workerman bootstrap state where unavoidable.
* All session IDs must be unique.
* Every network error must be logged.
* Every rejected access must be auditable.
* All config values must be externalized.
* The system must run on Linux.
* The code must be readable and modular, not a single giant script.

---

## 26. Deliverables

Codex should generate:

```text
composer.json
.env.example
README.md
bin/pop-server.php
bin/client-agent.php
config/routes.php
config/nodes.php
config/policies.php
src/**/*.php
```

The generated code must be runnable with:

```bash
composer install
php bin/pop-server.php start
php bin/client-agent.php start
```

---

## 27. Out of Scope for First Version

The following features must not be implemented in the first version unless explicitly requested later:

```text
TUN/TAP VPN
Layer-3 routing
Kernel packet capture
UDP relay
QUIC
Web dashboard
Clustered POP Server
Redis-based session sharing
TLS certificate automation
SSH command recording
Database SQL auditing
Browser-based remote desktop
```

---

## 28. Final Goal

The final MVP should prove this flow:

```text
Client
  -> POP Server
  -> policy check
  -> Client Agent
  -> internal TCP service
```

The POP Server must remain the only policy authority.

Agents must remain controlled executors.

Default access must always be denied unless a policy explicitly allows it.