Ziki Shay 2026-05-08 00:05:51 +08:00
parent 549b706fcc
commit ed70a140a2
40 changed files with 5590 additions and 36 deletions

3
.env

@ -7,6 +7,7 @@ LLM_METADATA_ENABLED="true"
LLM_METADATA_MODEL="glm-4.7-flash"
LLM_METADATA_MAX_TOKENS=2480
LLM_METADATA_TEMPERATURE=0.1
-OPENSEARCH_HOST="http://localhost:9200"
+OPENSEARCH_HOST="https://localhost:9200"
OPENSEARCH_USERNAME="admin"
OPENSEARCH_PASSWORD="proofdb"
+ARCHIVE_CASK_URL="https://archive-cask.example.com"

1
.version Normal file

@ -0,0 +1 @@
0.1.0

49
apidoc/README.md Normal file

@ -0,0 +1,49 @@
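# API Documentation Overview
The documents in `apidoc/` are split by API domain:
- [importapi.md](/www/proofdb/apidoc/importapi.md): archive import API
- [adminapi.md](/www/proofdb/apidoc/adminapi.md): admin authentication and back-office maintenance API
- [searchapi.md](/www/proofdb/apidoc/searchapi.md): full-text, vector, and hybrid search API
- [evidenceapi.md](/www/proofdb/apidoc/evidenceapi.md): chunk detail and evidence API
## Currently Implemented Endpoints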
```http
POST /api/articles/import
POST /api/admin/login
POST /api/admin/logout
GET /api/admin/me
GET /api/admin/archives
GET /api/admin/archives/{archive_uid}
PATCH /api/admin/archives/{archive_uid}
DELETE /api/admin/archives/{archive_uid}
GET /api/admin/opensearch/status
GET /api/admin/opensearch/documents
GET /api/admin/users
POST /api/admin/users
PATCH /api/admin/users/{id}
GET /api/admin/docs
GET /api/admin/docs/{name}
GET /api/admin/scripts
GET /api/admin/scripts/{name}
POST /api/admin/scripts/run
POST /api/search/fulltext
POST /api/search/vector
POST /api/search/hybrid
GET /api/chunks/{chunk_uid}
GET /api/evidence/{chunk_uid}
```
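## Current API Layers
- Import layer: parses Markdown archives into archive / chunk records and writes them to PostgreSQL.
- Admin layer: admin login, session lookup, `archives` table management, OpenSearch status, user management, document viewing, and maintenance script execution.
- Search layer: runs BM25, vector, and hybrid retrieval against OpenSearch.
- Evidence layer: resolves a `chunk_uid` to its citation, page numbers, and evidence text.
## Notes
- In the search endpoints, `hits` always means "the array of candidate results returned for the current request", not a full export of the database.
- `fulltext`, `vector`, and `hybrid` all support `limit`.
- For `hybrid`, `total` is the number of candidates after fusion; finer-grained per-source counts are in the `sources` field.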

355
apidoc/adminapi.md Normal file

@ -0,0 +1,355 @@
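# Admin Console API
## Overview
These endpoints serve the Proof DB admin maintenance panel and cover:
- Admin login and session lookup
- `archives` table management
- OpenSearch status inspection
- Admin user management
- APIDOC document viewing
- Maintenance script execution
The admin web entry points remain: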
- `GET /`
- `GET /admin/login`
- `GET /admin`
## Admin Authentication
### Admin Login
```http
POST /api/admin/login
```
`Content-Type: application/json`
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `username` | string | Yes | Admin username |
| `password` | string | Yes | Admin password |
### Admin Logout
```http
POST /api/admin/logout
```
### Current Admin Session
```http
GET /api/admin/me
```
When there is no admin session, the endpoint returns:
```json
{
"code": 401,
"message": "Admin session not found."
}
```
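A minimal client-side sketch of the login call in Python. It only builds the request and sends nothing. The host `proofdb.example.test` and both helper names are illustrative assumptions, and how the session is carried after login (for example, a cookie) is not stated in this document.

```python
import json
import urllib.request


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the JSON login request for POST /api/admin/login."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_logged_out(status: int, payload: dict) -> bool:
    """True when GET /api/admin/me reports a missing session (HTTP 401, code 401)."""
    return status == 401 and payload.get("code") == 401


# Hypothetical host, used only for illustration.
request = build_login_request("https://proofdb.example.test/", "admin", "secret")
```

A caller would pass `request` to an opener that keeps session state, then use `is_logged_out` on the `/api/admin/me` response to decide whether to show the login page.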
## archives Table Management
### List Archives
```http
GET /api/admin/archives
```
### Query Parameters
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | No | Fuzzy search over `archive_uid`, `title`, `summary`, `author`, `source`, and `series` |
| `page` | integer | No | Page number, default `1` |
| `page_size` | integer | No | Items per page, default `20`, maximum `100` |
### Request Example
```bash
curl '<APIdomain>/api/admin/archives?query=iraq&page=1&page_size=20'
```
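The query parameters above can be assembled on the client as in this Python sketch. The helper name and the host are assumptions. Clamping `page` and `page_size` on the client simply mirrors the documented defaults and maximum; what the server does with out-of-range values is not specified here.

```python
from urllib.parse import urlencode


def archive_list_url(base_url: str, query: str = "", page: int = 1, page_size: int = 20) -> str:
    """Build the GET /api/admin/archives URL with the documented defaults and limits."""
    page = max(1, page)                      # pages start at 1
    page_size = min(max(1, page_size), 100)  # documented maximum is 100
    params = {"page": page, "page_size": page_size}
    if query.strip():
        # `query` is optional, so it is only sent when non-empty.
        params = {"query": query.strip(), **params}
    return base_url.rstrip("/") + "/api/admin/archives?" + urlencode(params)


# Hypothetical host, used only for illustration.
url = archive_list_url("https://proofdb.example.test", "iraq")
```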
### Success Response
```json
{
"code": 0,
"message": "Archive list loaded.",
"data": {
"items": [
{
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": "....",
"year": 1991,
"author": "....",
"source": "....",
"series": null,
"tags": ["Iraq", "Kuwait"],
"chunk_count": 14,
"created_time": "2026-05-07 12:00:00+00",
"updated_time": "2026-05-07 12:10:00+00"
}
],
"total": 1,
"page": 1,
"page_size": 20
}
}
```
### Get a Single Archive
```http
GET /api/admin/archives/{archive_uid}
```
### Update a Single Archive
```http
PATCH /api/admin/archives/{archive_uid}
```
`Content-Type: application/json`
Updatable fields:
- `title`
- `summary`
- `year`
- `author`
- `source`
- `series`
- `tags`
- `metadata`
- `content`
- `raw`
Where:
- `tags` may be a string or an array; a string is split on commas or newlines
- `metadata` may be a JSON object or a JSON string
- an empty `year` is written back as `null`
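The three input rules above can be sketched in Python as follows. This is an illustration of the documented behavior, not the server's code: the function name is hypothetical, and trimming whitespace and dropping empty tag entries are assumptions.

```python
import json
import re


def normalize_archive_update(fields: dict) -> dict:
    """Normalize `tags`, `metadata`, and `year` the way the PATCH rules describe."""
    out = dict(fields)
    if isinstance(out.get("tags"), str):
        # A string is split on commas or newlines (empty pieces dropped: an assumption).
        out["tags"] = [t.strip() for t in re.split(r"[,\n]", out["tags"]) if t.strip()]
    if isinstance(out.get("metadata"), str):
        decoded = json.loads(out["metadata"])
        if not isinstance(decoded, dict):
            raise ValueError("metadata must decode to a JSON object")
        out["metadata"] = decoded
    if "year" in out and out["year"] in ("", None):
        out["year"] = None  # an empty year is written back as null
    return out


normalized = normalize_archive_update(
    {"tags": "Iraq, Kuwait\nGulf War", "metadata": '{"reviewed_by": "admin"}', "year": ""}
)
```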
### Update Request Example
```bash
curl -X PATCH <APIdomain>/api/admin/archives/01KQHVREB6XPYF604RVZAP9NNY \
-H 'Content-Type: application/json' \
--data '{
"title": "Updated Title",
"summary": "Updated summary",
"year": 1991,
"tags": ["Iraq", "Kuwait"],
"metadata": {
"reviewed_by": "admin"
}
}'
```
### Delete a Single Archive
```http
DELETE /api/admin/archives/{archive_uid}
```
Deleting an archive also removes its `chunks` rows, which are cascade-deleted through the foreign key constraint.
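The cascade can be demonstrated with a small, self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the two-column tables are a hypothetical reduction of the real schema, which is not shown in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE archives (archive_uid TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE chunks (
        chunk_uid   TEXT PRIMARY KEY,
        archive_uid TEXT NOT NULL
            REFERENCES archives(archive_uid) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO archives VALUES ('A1')")
conn.executemany("INSERT INTO chunks VALUES (?, ?)", [("A1_1", "A1"), ("A1_2", "A1")])

# Deleting the parent row removes the dependent chunk rows as well.
conn.execute("DELETE FROM archives WHERE archive_uid = 'A1'")
remaining_chunks = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

Note that a cascade of this kind only covers the relational side; whether the corresponding OpenSearch documents are removed at the same time is not stated here.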
## OpenSearch Status
### Get OpenSearch Admin Status
```http
GET /api/admin/opensearch/status
```
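### Success Response Highlights
The response returns all of the following together:
- A summary of the OpenSearch connection configuration
- `archives` / `chunks` row counts on the PostgreSQL side
- `embedded_chunks`
- `indexed_chunks`
- Whether the current index exists
- `docs_count`
- Cluster health status
- The list of mapping fields
If OpenSearch is unreachable, the database statistics are still returned, and `opensearch.error` carries the error message.
### Browse OpenSearch Index Documents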
```http
GET /api/admin/opensearch/documents
```
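### Query Parameters
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | No | Rough search over fields such as `title`, `summary`, `source`, `author`, and `text` |
| `size` | integer | No | Number of results to return, default `20`, maximum `50` |
Notes:
- This is an index browsing endpoint; it does not return the vector field itself.
- Results include `text_preview`, which lets the admin console quickly check that content was indexed into OpenSearch correctly.
## Admin User Management
### List Admin Users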
```http
GET /api/admin/users
```
### Create an Admin User
```http
POST /api/admin/users
```
`Content-Type: application/json`
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `username` | string | Yes | Username for the new admin |
| `password` | string | Yes | Password for the new admin |
| `display_name` | string | No | Display name |
### Update an Admin User
```http
PATCH /api/admin/users/{id}
```
`Content-Type: application/json`
Updatable fields:
- `display_name`
- `password`
- `is_active`
Notes:
- An empty `password` means the password is left unchanged
- After `is_active=false`, the account can no longer log in
## APIDOC Viewer
### List Documents
```http
GET /api/admin/docs
```
Returns the list of `.md` documents currently viewable under the `/apidoc` directory.
### Get a Single Document
```http
GET /api/admin/docs/{name}
```
For example:
```bash
curl <APIdomain>/api/admin/docs/searchapi.md
```
The response includes:
- `name`
- `title`
- `content`
- `html`
Where:
- `content` is the raw Markdown text
- `html` is HTML that the admin panel can render directly
## Maintenance Script Pseudo-Terminal
### List Whitelisted Scripts
```http
GET /api/admin/scripts
```
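The list contains only the whitelisted `scripts/*.php` entries that may be run from the admin panel; it is not an arbitrary filesystem scan.
If a script has a same-named document in `/scriptdoc`, the list endpoint also returns: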
- `doc_title`
- `doc_html`
- `doc_content`
### Get a Single Maintenance Script
```http
GET /api/admin/scripts/{name}
```
This endpoint returns a single script's description, argument hints, and any available script documentation.
### Run a Maintenance Script
```http
POST /api/admin/scripts/run
```
`Content-Type: application/json`
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `script_name` | string | Yes | Whitelisted script name, e.g. `reindex_opensearch` |
| `args` | string[] | No | Argument array; only `--key=value` style is allowed |
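The two constraints in the table (a whitelisted script name and `--key=value` arguments) can be sketched in Python as below. This is not the server's validation code: the whitelist contents, the exact accepted character set, and the function name are assumptions. The returned command shape follows the `command` array shown in the success response of this endpoint.

```python
import re

# Hypothetical whitelist; `reindex_opensearch` is the only script named in this document.
ALLOWED_SCRIPTS = {"reindex_opensearch"}

# `--key=value` with no whitespace; the precise character set is an assumption.
ARG_PATTERN = re.compile(r"--[A-Za-z0-9][A-Za-z0-9_-]*=\S*")


def validate_script_call(script_name: str, args: list) -> list:
    """Return the command that would be run, or raise ValueError for invalid input."""
    if script_name not in ALLOWED_SCRIPTS:
        raise ValueError(f"script is not whitelisted: {script_name!r}")
    if not isinstance(args, list):
        raise ValueError("args must be an array")
    for arg in args:
        if not isinstance(arg, str) or not ARG_PATTERN.fullmatch(arg):
            raise ValueError(f"only --key=value arguments are allowed: {arg!r}")
    return ["php", f"scripts/{script_name}.php", *args]


command = validate_script_call(
    "reindex_opensearch", ["--archive_uid=01KQHVREB6XPYF604RVZAP9NNY"]
)
```

Checking the name against a fixed set, rather than resolving it to a path first, keeps values such as `../other` from ever reaching the filesystem.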
### Request Example
```bash
curl -X POST <APIdomain>/api/admin/scripts/run \
-H 'Content-Type: application/json' \
--data '{
"script_name": "reindex_opensearch",
"args": ["--archive_uid=01KQHVREB6XPYF604RVZAP9NNY"]
}'
```
### Success Response
```json
{
"code": 0,
"message": "Maintenance script finished.",
"data": {
"script_name": "reindex_opensearch",
"command": [
"php",
"scripts/reindex_opensearch.php",
"--archive_uid=01KQHVREB6XPYF604RVZAP9NNY"
],
"exit_code": 0,
"stdout": "....",
"stderr": "",
"ok": true
}
}
```
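## Permissions and Error Semantics
- Except for `POST /api/admin/login`, every endpoint in this file requires an existing admin session.
- Requests without a session uniformly return `401`
- Invalid parameters usually return `422`
- Malformed JSON returns `400`
- Backend exceptions return `500`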
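A client can turn the status codes listed above into coarse outcomes, as in this Python sketch. The outcome labels and the function name are hypothetical; only the status codes and the `code: 0` success convention come from this document.

```python
def classify_admin_response(status: int, payload: dict) -> str:
    """Map an admin API response to a coarse outcome label."""
    if status == 200 and payload.get("code") == 0:
        return "ok"
    outcomes = {
        400: "bad_json",        # malformed JSON body
        401: "login_required",  # no admin session
        422: "invalid_params",  # validation failed
        500: "server_error",    # backend exception
    }
    return outcomes.get(status, "unexpected")


outcome = classify_admin_response(401, {"code": 401, "message": "Admin session not found."})
```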

382
apidoc/evidenceapi.md Normal file

@ -0,0 +1,382 @@
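# Chunk and Evidence API
## Overview
These endpoints resolve search results into readable evidence objects.
- `GET /api/archives/{archive_uid}` returns archive-level details.
- `GET /api/archives/{archive_uid}/chunks` returns the list of chunks under that archive.
- `GET /api/archives/{archive_uid}/evidence` returns the archive's evidence list, suited to citation and AI consumption.
- `GET /api/chunks/{chunk_uid}` is lower-level and returns chunk details plus the owning archive's information.
- `GET /api/evidence/{chunk_uid}` is oriented toward citation and display and returns the citation, page label, and evidence text.
The archive endpoints are keyed by `archive_uid`; the other two are keyed by `chunk_uid`.
## Archive Details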
```http
GET /api/archives/{archive_uid}
```
### Request Example
```bash
curl <APIdomain>/api/archives/01KQHVREB6XPYF604RVZAP9NNY
```
### Success Response
Status code:
```http
200 OK
```
Response example:
```json
{
"code": 0,
"message": "Archive loaded.",
"data": {
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": "This directive, signed by Brent Scowcroft ...",
"year": 1992,
"author": "Brent Scowcroft",
"source": "test/1.test.md",
"series": null,
"tags": ["National Security", "Policy"],
"metadata": {
"ai_enrichment": {
"provider": "bigmodel"
}
},
"content": "full normalized archive content ...",
"raw": "# 1.test ...",
"chunks": [
"01KQHVREB6XPYF604RVZAP9NNY_1_39003",
"01KQHVREB6XPYF604RVZAP9NNY_2_12345"
],
"chunk_count": 14
}
}
```
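Notes:
- `content` is the normalized archive body.
- `raw` is the original Markdown saved at import time.
- `chunks` is the list of `chunk_uid` values associated with this archive.
- `chunk_count` lets callers gauge the archive's size without counting the array themselves.
### Error Responses
#### Archive Not Found
Status code: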
```http
404 Not Found
```
```json
{
"code": 404,
"message": "Archive not found.",
"errors": {
"archive_uid": "missing_archive_uid"
}
}
```
## Chunk List for an Archive
```http
GET /api/archives/{archive_uid}/chunks
```
### Request Example
```bash
curl <APIdomain>/api/archives/01KQHVREB6XPYF604RVZAP9NNY/chunks
```
### Success Response
Status code:
```http
200 OK
```
Response example:
```json
{
"code": 0,
"message": "Archive chunks loaded.",
"data": {
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": "This directive, signed by Brent Scowcroft ...",
"source": "test/1.test.md",
"author": "Brent Scowcroft",
"year": 1992,
"series": null,
"tags": ["National Security", "Policy"],
"chunk_count": 14,
"chunks": [
{
"chunk_uid": "01KQHVREB6XPYF604RVZAP9NNY_1_39003",
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"chunk_index": 1,
"page_start": 1,
"page_end": 1,
"pages": [1],
"text": "chunk text...",
"length": 300,
"embedding_status": 3,
"embedding_ref": {
"provider": "bigmodel",
"model": "embedding-3",
"dimensions": 2048
},
"embedding_model": "embedding-3",
"embedding_error": null,
"search_index_status": 3,
"search_index_error": null,
"archive": {
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": "This directive, signed by Brent Scowcroft ...",
"year": 1992,
"author": "Brent Scowcroft",
"source": "test/1.test.md",
"series": null,
"tags": ["National Security", "Policy"],
"metadata": {}
}
}
]
}
}
```
Notes:
- This endpoint is lower-level and suited to bulk-reading complete chunk data for an archive.
- `chunks` is returned in ascending `chunk_index` order.
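Because the list is ordered by `chunk_index`, a caller can stitch chunk text back together as in the Python sketch below. The helper name and the separator are assumptions, and the sketch sorts defensively rather than relying on the response order. Whether adjacent chunks overlap is not stated here, so the joined text is not guaranteed to equal the archive's `content`.

```python
def join_chunk_text(chunks: list, separator: str = "\n\n") -> str:
    """Concatenate chunk `text` values in ascending `chunk_index` order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["chunk_index"])
    return separator.join(chunk["text"] for chunk in ordered)


# Minimal hypothetical chunk objects carrying only the fields this helper reads.
sample_chunks = [
    {"chunk_index": 2, "text": "second"},
    {"chunk_index": 1, "text": "first"},
]
joined = join_chunk_text(sample_chunks)
```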
## Archive-Level Evidence List
```http
GET /api/archives/{archive_uid}/evidence
```
### Request Example
```bash
curl <APIdomain>/api/archives/01KQHVREB6XPYF604RVZAP9NNY/evidence
```
### Success Response
Status code:
```http
200 OK
```
Response example:
```json
{
"code": 0,
"message": "Archive evidence loaded.",
"data": {
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": "This directive, signed by Brent Scowcroft ...",
"source": "test/1.test.md",
"author": "Brent Scowcroft",
"year": 1992,
"series": null,
"tags": ["National Security", "Policy"],
"chunk_count": 14,
"evidence": [
{
"chunk_uid": "01KQHVREB6XPYF604RVZAP9NNY_1_39003",
"chunk_index": 1,
"page_start": 1,
"page_end": 1,
"pages": [1],
"page_label": "p. 1",
"citation": "1.test | Brent Scowcroft | 1992 | p. 1 | test/1.test.md",
"quote": "chunk text...",
"length": 300,
"embedding_model": "embedding-3",
"embedding_status": 3,
"search_index_status": 3
}
]
}
}
```
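Notes:
- This endpoint is higher-level and suited to AI, RAG, citation building, and front-end evidence lists.
- Each item in `evidence` keeps the page numbers and quoted text needed for a citation.
## Chunk Details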
```http
GET /api/chunks/{chunk_uid}
```
### Request Example
```bash
curl <APIdomain>/api/chunks/01KQHVREB6XPYF604RVZAP9NNY_14_97554
```
### Success Response
Status code:
```http
200 OK
```
Response example:
```json
{
"code": 0,
"message": "Chunk loaded.",
"data": {
"chunk_uid": "01KQHVREB6XPYF604RVZAP9NNY_14_97554",
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"chunk_index": 14,
"page_start": 8,
"page_end": 8,
"pages": [8],
"text": "NSD 45 20 AUG 90 U.S. Policy in Response to the Iraqi Invasion of Kuwait ...",
"length": 148,
"embedding_status": 3,
"embedding_ref": {
"provider": "bigmodel",
"model": "embedding-3",
"dimensions": 2048
},
"embedding_model": "embedding-3",
"embedding_error": null,
"search_index_status": 3,
"search_index_error": null,
"archive": {
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"summary": null,
"year": 1992,
"author": "Brent Scowcroft",
"source": "test/1.test.md",
"series": null,
"tags": [],
"metadata": {}
}
}
}
```
### Error Responses
#### Chunk Not Found
Status code:
```http
404 Not Found
```
```json
{
"code": 404,
"message": "Chunk not found.",
"errors": {
"chunk_uid": "missing_chunk_uid"
}
}
```
## Evidence Details
```http
GET /api/evidence/{chunk_uid}
```
### Request Example
```bash
curl <APIdomain>/api/evidence/01KQHVREB6XPYF604RVZAP9NNY_14_97554
```
### Success Response
Status code:
```http
200 OK
```
Response example:
```json
{
"code": 0,
"message": "Evidence loaded.",
"data": {
"chunk_uid": "01KQHVREB6XPYF604RVZAP9NNY_14_97554",
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"title": "1.test",
"source": "test/1.test.md",
"author": "Brent Scowcroft",
"year": 1992,
"series": null,
"tags": [],
"page_start": 8,
"page_end": 8,
"pages": [8],
"page_label": "p. 8",
"citation": "1.test | Brent Scowcroft | 1992 | p. 8 | test/1.test.md",
"quote": "NSD 45 20 AUG 90 U.S. Policy in Response to the Iraqi Invasion of Kuwait ...",
"chunk": {
"chunk_index": 14,
"length": 148,
"embedding_model": "embedding-3",
"embedding_status": 3,
"search_index_status": 3
}
}
}
```
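The `citation` and `page_label` values in the example above appear to follow a `title | author | year | page label | source` layout. The Python sketch below reproduces that layout from the individual fields. It is inferred from the sample values only: the function names, the `pp. a-b` form for multi-page chunks, and the skipping of empty parts are assumptions, not documented server behavior.

```python
def page_label(page_start: int, page_end: int) -> str:
    """Single pages render as "p. N", as in the example response."""
    if page_start == page_end:
        return f"p. {page_start}"
    return f"pp. {page_start}-{page_end}"  # assumption: range format is not documented


def build_citation(title, author, year, label, source) -> str:
    """Join the non-empty citation parts with " | "."""
    parts = [title, author, str(year) if year is not None else "", label, source]
    return " | ".join(part for part in parts if part)


citation = build_citation(
    "1.test", "Brent Scowcroft", 1992, page_label(8, 8), "test/1.test.md"
)
```

In practice a client would normally display the `citation` returned by the API; a helper like this is only useful for checking or re-deriving it.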
### Error Responses
#### Evidence Not Found
Status code:
```http
404 Not Found
```
```json
{
"code": 404,
"message": "Evidence not found.",
"errors": {
"chunk_uid": "missing_chunk_uid"
}
}
```

View File

@ -90,7 +90,7 @@ POST /api/articles/import
## Request Example
```bash
-curl -X POST http://127.0.0.1:8787/api/articles/import \
+curl -X POST <APIdomain>/api/articles/import \
-F 'title=NSD 76 Disposition of NSC Policy Documents' \
-F 'source=archive://nsc/nsd-76' \
-F 'chunk_size=800' \
@ -101,7 +101,7 @@ curl -X POST http://127.0.0.1:8787/api/articles/import \
The raw Markdown can also be sent directly:
```bash
-curl -X POST 'http://127.0.0.1:8787/api/articles/import?title=NSD%2076&source=archive://nsc/nsd-76' \
+curl -X POST '<APIdomain>/api/articles/import?title=NSD%2076&source=archive://nsc/nsd-76' \
-H 'Content-Type: text/markdown' \
--data-binary '@test/1.test.md'
```
@ -109,7 +109,7 @@ curl -X POST 'http://127.0.0.1:8787/api/articles/import?title=NSD%2076&source=ar
JSON call example:
```bash
-curl -X POST http://127.0.0.1:8787/api/articles/import \
+curl -X POST <APIdomain>/api/articles/import \
-H 'Content-Type: application/json' \
--data '{
"title": "NSD 76 Disposition of NSC Policy Documents",

View File

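@ -7,6 +7,7 @@ Proof DB's search endpoints are built on the OpenSearch `proofdb_chunks` index. The current version
Each chunk document in OpenSearch contains:
- Full-text fields such as `text`, used for BM25 retrieval.
+- `summary`, the archive summary field, which takes part in full-text retrieval and is returned with search results.
- `embedding`, a 2048-dimension vector field, used for subsequent vector / hybrid retrieval.
## Full-Text Search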
@ -35,7 +36,7 @@ POST /api/search/fulltext
### Request Example
```bash
-curl -X POST http://127.0.0.1:8787/api/search/fulltext \
+curl -X POST <APIdomain>/api/search/fulltext \
-H 'Content-Type: application/json' \
--data '{
"query": "policy documents",
@ -46,7 +47,7 @@ curl -X POST http://127.0.0.1:8787/api/search/fulltext \
With filter conditions:
```bash
-curl -X POST http://127.0.0.1:8787/api/search/fulltext \
+curl -X POST <APIdomain>/api/search/fulltext \
-H 'Content-Type: application/json' \
--data '{
"query": "Iraq Kuwait",
@ -87,6 +88,7 @@ curl -X POST http://127.0.0.1:8787/api/search/fulltext \
"page_start": 1,
"page_end": 1,
"title": "NSD 76 Disposition of NSC Policy Documents",
+"summary": "Summary text...",
"source": "archive://nsc/nsd-76",
"author": "Brent Scowcroft",
"year": 1992,
@ -101,6 +103,12 @@ curl -X POST http://127.0.0.1:8787/api/search/fulltext \
}
```
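+Notes:
+- `hits` is the array of results returned for this request.
+- `total` is the total number of matches for the current full-text query.
+- Full-text search currently matches across `text`, `title`, `summary`, `source`, `author`, `series`, and `tags`.
### Error Responses
#### Malformed JSON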
@ -157,8 +165,6 @@ curl -X POST http://127.0.0.1:8787/api/search/fulltext \
}
```
-## Future Endpoints
## Vector Search
```http
@ -179,7 +185,7 @@ POST /api/search/vector
### Request Example
```bash
-curl -X POST http://127.0.0.1:8787/api/search/vector \
+curl -X POST <APIdomain>/api/search/vector \
-H 'Content-Type: application/json' \
--data '{
"query": "Iraq invasion and Desert Storm",
@ -191,7 +197,7 @@ curl -X POST http://127.0.0.1:8787/api/search/vector \
A Chinese query can also be submitted to vector search:
```bash
-curl -X POST http://127.0.0.1:8787/api/search/vector \
+curl -X POST <APIdomain>/api/search/vector \
-H 'Content-Type: application/json' \
--data '{
"query": "伊拉克入侵科威特与沙漠风暴",
@ -231,6 +237,7 @@ curl -X POST http://127.0.0.1:8787/api/search/vector \
"page_start": 8,
"page_end": 8,
"title": "NSD 76 Disposition of NSC Policy Documents",
+"summary": "Summary text...",
"source": "archive://nsc/nsd-76",
"author": "Brent Scowcroft",
"year": 1992,
@ -246,6 +253,12 @@ curl -X POST http://127.0.0.1:8787/api/search/vector \
```
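+Notes:
+- `hits` is the array of results returned for this request.
+- `total` is the number of candidates returned by the current vector query.
+- `embedding_dimensions` is the dimensionality of this request's query embedding, not an index-wide dimension statistic.
### Error Responses
The error response format matches full-text search. Common errors include: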
@ -254,8 +267,6 @@ curl -X POST http://127.0.0.1:8787/api/search/vector \
- Missing `query`: `422 Unprocessable Entity`
- embedding API or OpenSearch query failure: `500 Internal Server Error`
-## Future Endpoints
## Hybrid Search
```http
@ -290,7 +301,7 @@ POST /api/search/hybrid
### 请求示例
```bash
curl -X POST http://127.0.0.1:8787/api/search/hybrid \
curl -X POST <APIdomain>/api/search/hybrid \
-H 'Content-Type: application/json' \
--data '{
"query": "Iraq invasion and Desert Storm",
@ -302,7 +313,7 @@ curl -X POST http://127.0.0.1:8787/api/search/hybrid \
Chinese query:
```bash
-curl -X POST http://127.0.0.1:8787/api/search/hybrid \
+curl -X POST <APIdomain>/api/search/hybrid \
-H 'Content-Type: application/json' \
--data '{
"query": "伊拉克入侵科威特与沙漠风暴",
@ -370,6 +381,7 @@ curl -X POST http://127.0.0.1:8787/api/search/hybrid \
"archive_uid": "01KQHVREB6XPYF604RVZAP9NNY",
"page_start": 8,
"page_end": 8,
+"summary": "Summary text...",
"text": "chunk text..."
}
]
@ -377,6 +389,14 @@ curl -X POST http://127.0.0.1:8787/api/search/hybrid \
}
```
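+Notes:
+- `hits` is the result array after fused ranking.
+- `total` is the number of candidates after fusion.
+- `sources.fulltext_total` and `sources.vector_total` are the raw counts from the two retrieval paths.
+- `rank_sources` describes a result's rank in the fulltext / vector paths and its RRF contribution.
+- `summary` comes from archive-level summary metadata; it is not a summary generated for the individual chunk.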
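For readers unfamiliar with RRF (reciprocal rank fusion), the Python sketch below shows the general technique: each result receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. This is a generic illustration, not Proof DB's implementation. The constant `k = 60`, the function name, and the shape of the per-result `rank_sources` dictionary are assumptions.

```python
def rrf_fuse(fulltext_ids: list, vector_ids: list, k: int = 60, limit: int = 10) -> list:
    """Fuse two ranked lists of chunk_uid values with reciprocal rank fusion."""
    scores = {}
    rank_sources = {}
    for source_name, ids in (("fulltext", fulltext_ids), ("vector", vector_ids)):
        for rank, chunk_uid in enumerate(ids, start=1):
            contribution = 1.0 / (k + rank)
            scores[chunk_uid] = scores.get(chunk_uid, 0.0) + contribution
            # Record where the result ranked in this path and what it contributed.
            rank_sources.setdefault(chunk_uid, {})[source_name] = {
                "rank": rank,
                "rrf": contribution,
            }
    ordered = sorted(scores, key=lambda uid: scores[uid], reverse=True)
    return [
        {"chunk_uid": uid, "score": scores[uid], "rank_sources": rank_sources[uid]}
        for uid in ordered[:limit]
    ]


fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
```

A result found by both paths (here `b`) accumulates two contributions, which is why it can outrank a result that is first in only one path.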
### Error Responses
The error response format matches full-text search. Common errors include:
@ -385,11 +405,8 @@ curl -X POST http://127.0.0.1:8787/api/search/hybrid \
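- Missing `query`: `422 Unprocessable Entity`
- embedding API, full-text search, or vector search failure: `500 Internal Server Error`
-## Future Endpoints
+## Related Endpoints
-The following capabilities are not yet implemented:
+The evidence endpoints that accompany search results are described in:
-```http
-GET /api/chunks/{chunk_uid}
-GET /api/evidence/{chunk_uid}
-```
+- [evidenceapi.md](/www/proofdb/apidoc/evidenceapi.md)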

View File

@ -0,0 +1,64 @@
<?php

namespace app\controller;

use app\service\AdminAuthService;
use support\Request;
use support\Response;

class AdminController
{
    public function landing(Request $request): Response
    {
        if ((new AdminAuthService())->current($request) !== null) {
            return $this->redirect('/admin');
        }

        return view('admin/landing', [
            'archiveCaskUrl' => config('admin.archive_cask_url', ''),
            'version' => $this->version(),
        ]);
    }

    public function login(Request $request): Response
    {
        if ((new AdminAuthService())->current($request) !== null) {
            return $this->redirect('/admin');
        }

        return view('admin/login', [
            'archiveCaskUrl' => config('admin.archive_cask_url', ''),
            'version' => $this->version(),
        ]);
    }

    public function dashboard(Request $request): Response
    {
        $admin = (new AdminAuthService())->current($request);
        if ($admin === null) {
            return $this->redirect('/admin/login');
        }

        return view('admin/dashboard', [
            'archiveCaskUrl' => config('admin.archive_cask_url', ''),
            'admin' => $admin,
            'version' => $this->version(),
        ]);
    }

    private function redirect(string $location): Response
    {
        return response('', 302, ['Location' => $location]);
    }

    private function version(): string
    {
        $path = base_path('.version');
        if (!is_file($path)) {
            return 'unknown';
        }

        $value = trim((string) file_get_contents($path));

        return $value !== '' ? $value : 'unknown';
    }
}

View File

@ -0,0 +1,129 @@
<?php

namespace app\controller\Api;

use app\service\AdminAuthService;
use InvalidArgumentException;
use JsonException;
use support\Request;
use support\Response;
use Throwable;

class AdminAuthController
{
    public function login(Request $request): Response
    {
        try {
            $payload = $this->payload($request);
            $username = trim((string) ($payload['username'] ?? ''));
            $password = (string) ($payload['password'] ?? '');
            if ($username === '' || $password === '') {
                throw new InvalidArgumentException('username and password are required.');
            }

            $auth = new AdminAuthService();
            $user = $auth->authenticate($username, $password);
            if ($user === null) {
                return $this->jsonResponse([
                    'code' => 401,
                    'message' => 'Admin login failed.',
                    'errors' => ['auth' => 'invalid username or password.'],
                ], 401);
            }

            $auth->login($request, $user);
        } catch (JsonException $exception) {
            return $this->jsonResponse([
                'code' => 400,
                'message' => 'Invalid JSON body.',
                'errors' => ['body' => $exception->getMessage()],
            ], 400);
        } catch (InvalidArgumentException $exception) {
            return $this->jsonResponse([
                'code' => 422,
                'message' => 'Admin login validation failed.',
                'errors' => ['auth' => $exception->getMessage()],
            ], 422);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Admin login failed.',
                'errors' => ['auth' => $exception->getMessage()],
            ], 500);
        }

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Admin login completed.',
            'data' => ['admin' => $user],
        ], 200);
    }

    public function logout(Request $request): Response
    {
        try {
            (new AdminAuthService())->logout($request);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Admin logout failed.',
                'errors' => ['auth' => $exception->getMessage()],
            ], 500);
        }

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Admin logout completed.',
        ], 200);
    }

    public function me(Request $request): Response
    {
        try {
            $admin = (new AdminAuthService())->current($request);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Admin session lookup failed.',
                'errors' => ['auth' => $exception->getMessage()],
            ], 500);
        }

        if ($admin === null) {
            return $this->jsonResponse([
                'code' => 401,
                'message' => 'Admin session not found.',
            ], 401);
        }

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Admin session loaded.',
            'data' => ['admin' => $admin],
        ], 200);
    }

    /**
     * @throws JsonException
     */
    private function payload(Request $request): array
    {
        $rawBody = trim($request->rawBody());
        if ($rawBody === '') {
            return $request->post();
        }

        $payload = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);

        return is_array($payload) ? $payload : [];
    }

    private function jsonResponse(array $data, int $status): Response
    {
        return response(
            json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR),
            $status,
            ['Content-Type' => 'application/json']
        );
    }
}

View File

@ -0,0 +1,351 @@
<?php

namespace app\controller\Api;

use app\service\AdminAuthService;
use app\service\AdminConsole\AdminDocService;
use app\service\AdminConsole\ArchiveAdminService;
use app\service\AdminConsole\MaintenanceScriptService;
use app\service\AdminConsole\OpenSearchAdminService;
use app\service\AdminUserRepository;
use InvalidArgumentException;
use JsonException;
use support\Request;
use support\Response;
use Throwable;

class AdminConsoleController
{
    public function archives(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $data = (new ArchiveAdminService())->list(
                trim((string) $request->get('query', '')),
                (int) $request->get('page', 1),
                (int) $request->get('page_size', 20),
            );
        } catch (Throwable $exception) {
            return $this->error(500, 'Archive list lookup failed.', ['archives' => $exception->getMessage()]);
        }

        return $this->ok('Archive list loaded.', $data);
    }

    public function archive(Request $request, string $archiveUid): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $archive = (new ArchiveAdminService())->detail($archiveUid);
        } catch (Throwable $exception) {
            return $this->error(500, 'Archive lookup failed.', ['archive' => $exception->getMessage()]);
        }

        if ($archive === null) {
            return $this->error(404, 'Archive not found.', ['archive_uid' => $archiveUid], 404);
        }

        return $this->ok('Archive loaded.', $archive);
    }

    public function updateArchive(Request $request, string $archiveUid): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $archive = (new ArchiveAdminService())->update($archiveUid, $this->payload($request));
        } catch (JsonException $exception) {
            return $this->error(400, 'Invalid JSON body.', ['body' => $exception->getMessage()], 400);
        } catch (InvalidArgumentException $exception) {
            return $this->error(422, 'Archive update validation failed.', ['archive' => $exception->getMessage()], 422);
        } catch (Throwable $exception) {
            return $this->error(500, 'Archive update failed.', ['archive' => $exception->getMessage()]);
        }

        if ($archive === null) {
            return $this->error(404, 'Archive not found.', ['archive_uid' => $archiveUid], 404);
        }

        return $this->ok('Archive updated.', $archive);
    }

    public function deleteArchive(Request $request, string $archiveUid): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $deleted = (new ArchiveAdminService())->delete($archiveUid);
        } catch (Throwable $exception) {
            return $this->error(500, 'Archive delete failed.', ['archive' => $exception->getMessage()]);
        }

        if (!$deleted) {
            return $this->error(404, 'Archive not found.', ['archive_uid' => $archiveUid], 404);
        }

        return $this->ok('Archive deleted.', ['archive_uid' => $archiveUid]);
    }

    public function openSearchStatus(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $status = (new OpenSearchAdminService())->status();
        } catch (Throwable $exception) {
            return $this->error(500, 'OpenSearch status lookup failed.', ['opensearch' => $exception->getMessage()]);
        }

        return $this->ok('OpenSearch status loaded.', $status);
    }

    public function openSearchDocuments(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $documents = (new OpenSearchAdminService())->documents(
                trim((string) $request->get('query', '')),
                (int) $request->get('size', 20),
            );
        } catch (Throwable $exception) {
            return $this->error(500, 'OpenSearch document lookup failed.', ['opensearch' => $exception->getMessage()]);
        }

        return $this->ok('OpenSearch documents loaded.', $documents);
    }

    public function users(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $users = (new AdminUserRepository())->listAll();
        } catch (Throwable $exception) {
            return $this->error(500, 'Admin users lookup failed.', ['users' => $exception->getMessage()]);
        }

        return $this->ok('Admin users loaded.', ['items' => array_map(fn (array $user): array => $this->sanitizeUser($user), $users)]);
    }

    public function createUser(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $payload = $this->payload($request);
            $username = trim((string) ($payload['username'] ?? ''));
            $password = trim((string) ($payload['password'] ?? ''));
            $displayName = trim((string) ($payload['display_name'] ?? ''));
            if ($username === '' || $password === '') {
                throw new InvalidArgumentException('username and password are required.');
            }

            $repository = new AdminUserRepository();
            if ($repository->findAnyByUsername($username)) {
                throw new InvalidArgumentException('username already exists.');
            }

            $user = $repository->create($username, $password, $displayName !== '' ? $displayName : null);
        } catch (JsonException $exception) {
            return $this->error(400, 'Invalid JSON body.', ['body' => $exception->getMessage()], 400);
        } catch (InvalidArgumentException $exception) {
            return $this->error(422, 'Admin user creation validation failed.', ['user' => $exception->getMessage()], 422);
        } catch (Throwable $exception) {
            return $this->error(500, 'Admin user creation failed.', ['user' => $exception->getMessage()]);
        }

        return $this->ok('Admin user created.', ['user' => $this->sanitizeUser($user)]);
    }

    public function updateUser(Request $request, int $id): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $payload = $this->payload($request);
            $repository = new AdminUserRepository();
            if ($repository->findAnyById($id) === null) {
                return $this->error(404, 'Admin user not found.', ['id' => $id], 404);
            }

            $updates = [];
            if (array_key_exists('display_name', $payload)) {
                $updates['display_name'] = $payload['display_name'];
            }
            if (array_key_exists('password', $payload)) {
                $updates['password'] = $payload['password'];
            }
            if (array_key_exists('is_active', $payload)) {
                $updates['is_active'] = (bool) $payload['is_active'];
            }

            $user = $repository->updateUser($id, $updates);
        } catch (JsonException $exception) {
            return $this->error(400, 'Invalid JSON body.', ['body' => $exception->getMessage()], 400);
        } catch (Throwable $exception) {
            return $this->error(500, 'Admin user update failed.', ['user' => $exception->getMessage()]);
        }

        return $this->ok('Admin user updated.', ['user' => $this->sanitizeUser($user ?? [])]);
    }

    public function docs(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $docs = (new AdminDocService())->list();
        } catch (Throwable $exception) {
            return $this->error(500, 'API docs lookup failed.', ['docs' => $exception->getMessage()]);
        }

        return $this->ok('API docs loaded.', ['items' => $docs]);
    }

    public function doc(Request $request, string $name): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $doc = (new AdminDocService())->read($name);
        } catch (Throwable $exception) {
            return $this->error(404, 'API doc not found.', ['doc' => $exception->getMessage()], 404);
        }

        return $this->ok('API doc loaded.', $doc);
    }

    public function scripts(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        return $this->ok('Maintenance scripts loaded.', ['items' => (new MaintenanceScriptService())->list()]);
    }

    public function script(Request $request, string $name): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $script = (new MaintenanceScriptService())->describe($name);
        } catch (Throwable $exception) {
            return $this->error(404, 'Maintenance script not found.', ['script' => $exception->getMessage()], 404);
        }

        return $this->ok('Maintenance script loaded.', $script);
    }

    public function runScript(Request $request): Response
    {
        if ($guard = $this->guard($request)) {
            return $guard;
        }

        try {
            $payload = $this->payload($request);
            $scriptName = trim((string) ($payload['script_name'] ?? ''));
            $args = $payload['args'] ?? [];
            if ($scriptName === '') {
                throw new InvalidArgumentException('script_name is required.');
            }
            if (!is_array($args)) {
                throw new InvalidArgumentException('args must be an array.');
            }

            $result = (new MaintenanceScriptService())->run($scriptName, $args);
        } catch (JsonException $exception) {
            return $this->error(400, 'Invalid JSON body.', ['body' => $exception->getMessage()], 400);
        } catch (InvalidArgumentException $exception) {
            return $this->error(422, 'Script execution validation failed.', ['script' => $exception->getMessage()], 422);
        } catch (Throwable $exception) {
            return $this->error(500, 'Script execution failed.', ['script' => $exception->getMessage()]);
        }

        return $this->ok('Maintenance script finished.', $result);
    }

    private function guard(Request $request): ?Response
    {
        return (new AdminAuthService())->current($request) === null
            ? $this->error(401, 'Admin session not found.', [], 401)
            : null;
    }

    /**
     * @throws JsonException
     */
    private function payload(Request $request): array
    {
        $rawBody = trim($request->rawBody());
        if ($rawBody === '') {
            return $request->post();
        }

        $payload = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);

        return is_array($payload) ? $payload : [];
    }

    private function ok(string $message, array $data): Response
    {
        return $this->jsonResponse([
            'code' => 0,
            'message' => $message,
            'data' => $data,
        ], 200);
    }

    private function error(int $code, string $message, array $errors = [], int $status = 500): Response
    {
        return $this->jsonResponse([
            'code' => $code,
            'message' => $message,
            'errors' => $errors,
        ], $status);
    }

    private function sanitizeUser(array $user): array
    {
        unset($user['password_hash']);

        return $user;
    }

    private function jsonResponse(array $data, int $status): Response
    {
        return response(
            json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR),
            $status,
            ['Content-Type' => 'application/json']
        );
    }
}

View File

@ -0,0 +1,255 @@
<?php

namespace app\controller\Api;

use app\service\ArchiveRepository;
use support\Response;
use Throwable;

class EvidenceController
{
    public function archive(string $archiveUid): Response
    {
        try {
            $archive = (new ArchiveRepository())->findArchive($archiveUid);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Archive lookup failed.',
                'errors' => ['archive' => $exception->getMessage()],
            ], 500);
        }

        if ($archive === null) {
            return $this->jsonResponse([
                'code' => 404,
                'message' => 'Archive not found.',
                'errors' => ['archive_uid' => $archiveUid],
            ], 404);
        }

        $archive['chunk_count'] = is_array($archive['chunks'] ?? null) ? count($archive['chunks']) : 0;

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Archive loaded.',
            'data' => $archive,
        ], 200);
    }

    public function chunk(string $chunkUid): Response
    {
        try {
            $chunk = (new ArchiveRepository())->findChunk($chunkUid);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Chunk lookup failed.',
                'errors' => ['chunk' => $exception->getMessage()],
            ], 500);
        }

        if ($chunk === null) {
            return $this->jsonResponse([
                'code' => 404,
                'message' => 'Chunk not found.',
                'errors' => ['chunk_uid' => $chunkUid],
            ], 404);
        }

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Chunk loaded.',
            'data' => $chunk,
        ], 200);
    }

    public function archiveChunks(string $archiveUid): Response
    {
        try {
            $repository = new ArchiveRepository();
            $archive = $repository->findArchive($archiveUid);
            $chunks = $archive === null ? [] : $repository->findArchiveChunks($archiveUid);
        } catch (Throwable $exception) {
            return $this->jsonResponse([
                'code' => 500,
                'message' => 'Archive chunks lookup failed.',
                'errors' => ['archive_chunks' => $exception->getMessage()],
            ], 500);
        }

        if ($archive === null) {
            return $this->jsonResponse([
                'code' => 404,
                'message' => 'Archive not found.',
                'errors' => ['archive_uid' => $archiveUid],
            ], 404);
        }

        return $this->jsonResponse([
            'code' => 0,
            'message' => 'Archive chunks loaded.',
            'data' => [
                'archive_uid' => $archive['archive_uid'],
                'title' => $archive['title'],
                'summary' => $archive['summary'],
                'source' => $archive['source'],
                'author' => $archive['author'],
                'year' => $archive['year'],
                'series' => $archive['series'],
                'tags' => $archive['tags'],
                'chunk_count' => count($chunks),
                'chunks' => $chunks,
            ],
        ], 200);
    }

    public function evidence(string $chunkUid): Response
{
try {
$chunk = (new ArchiveRepository())->findChunk($chunkUid);
} catch (Throwable $exception) {
return $this->jsonResponse([
'code' => 500,
'message' => 'Evidence lookup failed.',
'errors' => ['evidence' => $exception->getMessage()],
], 500);
}
if ($chunk === null) {
return $this->jsonResponse([
'code' => 404,
'message' => 'Evidence not found.',
'errors' => ['chunk_uid' => $chunkUid],
], 404);
}
$archive = $chunk['archive'];
$pages = $chunk['pages'];
$pageLabel = $this->pageLabel($pages);
return $this->jsonResponse([
'code' => 0,
'message' => 'Evidence loaded.',
'data' => [
'chunk_uid' => $chunk['chunk_uid'],
'archive_uid' => $chunk['archive_uid'],
'title' => $archive['title'] ?? null,
'source' => $archive['source'] ?? null,
'author' => $archive['author'] ?? null,
'year' => $archive['year'] ?? null,
'series' => $archive['series'] ?? null,
'tags' => $archive['tags'] ?? [],
'page_start' => $chunk['page_start'],
'page_end' => $chunk['page_end'],
'pages' => $pages,
'page_label' => $pageLabel,
'citation' => $this->citation($archive, $pageLabel),
'quote' => $chunk['text'],
'chunk' => [
'chunk_index' => $chunk['chunk_index'],
'length' => $chunk['length'],
'embedding_model' => $chunk['embedding_model'],
'embedding_status' => $chunk['embedding_status'],
'search_index_status' => $chunk['search_index_status'],
],
],
], 200);
}
public function archiveEvidence(string $archiveUid): Response
{
try {
$repository = new ArchiveRepository();
$archive = $repository->findArchive($archiveUid);
$chunks = $archive === null ? [] : $repository->findArchiveChunks($archiveUid);
} catch (Throwable $exception) {
return $this->jsonResponse([
'code' => 500,
'message' => 'Archive evidence lookup failed.',
'errors' => ['archive_evidence' => $exception->getMessage()],
], 500);
}
if ($archive === null) {
return $this->jsonResponse([
'code' => 404,
'message' => 'Archive not found.',
'errors' => ['archive_uid' => $archiveUid],
], 404);
}
$evidence = array_map(function (array $chunk): array {
$archive = $chunk['archive'];
$pages = $chunk['pages'];
$pageLabel = $this->pageLabel($pages);
return [
'chunk_uid' => $chunk['chunk_uid'],
'chunk_index' => $chunk['chunk_index'],
'page_start' => $chunk['page_start'],
'page_end' => $chunk['page_end'],
'pages' => $pages,
'page_label' => $pageLabel,
'citation' => $this->citation($archive, $pageLabel),
'quote' => $chunk['text'],
'length' => $chunk['length'],
'embedding_model' => $chunk['embedding_model'],
'embedding_status' => $chunk['embedding_status'],
'search_index_status' => $chunk['search_index_status'],
];
}, $chunks);
return $this->jsonResponse([
'code' => 0,
'message' => 'Archive evidence loaded.',
'data' => [
'archive_uid' => $archive['archive_uid'],
'title' => $archive['title'],
'summary' => $archive['summary'],
'source' => $archive['source'],
'author' => $archive['author'],
'year' => $archive['year'],
'series' => $archive['series'],
'tags' => $archive['tags'],
'chunk_count' => count($evidence),
'evidence' => $evidence,
],
], 200);
}
// Joins the non-empty parts (title, author, year, page label, source) with " | ".
private function citation(array $archive, string $pageLabel): string
{
$parts = array_values(array_filter([
$archive['title'] ?? null,
$archive['author'] ?? null,
isset($archive['year']) ? (string) $archive['year'] : null,
$pageLabel === '' ? null : $pageLabel,
$archive['source'] ?? null,
], static fn ($value): bool => $value !== null && trim((string) $value) !== ''));
return implode(' | ', $parts);
}
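// Formats the page list as "p. N" for a single page, "pp. N-M" for a range, or "" when no pages are known.
private function pageLabel(array $pages): string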
{
if ($pages === []) {
return '';
}
if (count($pages) === 1) {
return 'p. ' . (string) $pages[0];
}
return 'pp. ' . (string) $pages[0] . '-' . (string) $pages[count($pages) - 1];
}
private function jsonResponse(array $data, int $status): Response
{
return response(
json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR),
$status,
['Content-Type' => 'application/json']
);
}
}


@ -0,0 +1,63 @@
<?php
namespace app\service;
use support\Request;
class AdminAuthService
{
private const SESSION_KEY = 'proofdb_admin_user_id';
public function __construct(private readonly ?AdminUserRepository $users = null)
{
}
public function authenticate(string $username, string $password): ?array
{
$username = trim($username);
if ($username === '' || $password === '') {
return null;
}
$user = $this->users()->findByUsername($username);
if ($user === null || !password_verify($password, $user['password_hash'])) {
return null;
}
unset($user['password_hash']);
return $user;
}
public function login(Request $request, array $user): void
{
$request->session()->set(self::SESSION_KEY, (int) $user['id']);
$this->users()->touchLastLogin((int) $user['id']);
}
public function logout(Request $request): void
{
$request->session()->delete(self::SESSION_KEY);
}
public function current(Request $request): ?array
{
$id = (int) $request->session()->get(self::SESSION_KEY, 0);
if ($id <= 0) {
return null;
}
$user = $this->users()->findById($id);
if ($user === null) {
$request->session()->delete(self::SESSION_KEY);
return null;
}
unset($user['password_hash']);
return $user;
}
private function users(): AdminUserRepository
{
return $this->users ?? new AdminUserRepository();
}
}


@ -0,0 +1,76 @@
<?php
namespace app\service\AdminConsole;
use RuntimeException;
class AdminDocService
{
public function __construct(private readonly ?MarkdownRenderer $renderer = null)
{
}
public function list(): array
{
$items = [];
foreach (glob(base_path('apidoc/*.md')) ?: [] as $path) {
$name = basename($path);
$content = (string) file_get_contents($path);
$items[] = [
'name' => $name,
'title' => $this->title($content, $name),
];
}
usort($items, fn (array $a, array $b): int => strcmp($a['name'], $b['name']));
return $items;
}
public function read(string $name): array
{
// basename() drops any directory components, so a name like "../../.env" cannot escape apidoc/.
$safeName = basename($name);
$path = base_path('apidoc/' . $safeName);
if (!is_file($path) || pathinfo($path, PATHINFO_EXTENSION) !== 'md') {
throw new RuntimeException('API doc not found.');
}
$content = (string) file_get_contents($path);
return [
'name' => $safeName,
'title' => $this->title($content, $safeName),
'content' => $content,
'html' => $this->renderer()->render($content),
];
}
public function readScriptDoc(string $name): array
{
$safeName = basename($name);
$path = base_path('scriptdoc/' . $safeName);
if (!is_file($path) || pathinfo($path, PATHINFO_EXTENSION) !== 'md') {
throw new RuntimeException('Script doc not found.');
}
$content = (string) file_get_contents($path);
return [
'name' => $safeName,
'title' => $this->title($content, $safeName),
'content' => $content,
'html' => $this->renderer()->render($content),
];
}
private function title(string $content, string $fallback): string
{
if (preg_match('/^#\s+(.+)$/m', $content, $matches)) {
return trim($matches[1]);
}
return $fallback;
}
private function renderer(): MarkdownRenderer
{
return $this->renderer ?? new MarkdownRenderer();
}
}


@ -0,0 +1,205 @@
<?php
namespace app\service\AdminConsole;
use InvalidArgumentException;
use support\Db;
class ArchiveAdminService
{
public function list(string $query = '', int $page = 1, int $pageSize = 20): array
{
$page = max(1, $page);
$pageSize = min(100, max(1, $pageSize));
$builder = Db::table('archives');
$query = trim($query);
if ($query !== '') {
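// Escape LIKE wildcards so a literal "%" or "_" in the search box is matched as text, not as a pattern.
$like = '%' . addcslashes($query, '%_\\') . '%';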
$builder->where(function ($subQuery) use ($like): void {
$subQuery
->orWhere('archive_uid', 'like', $like)
->orWhere('title', 'like', $like)
->orWhere('summary', 'like', $like)
->orWhere('author', 'like', $like)
->orWhere('source', 'like', $like)
->orWhere('series', 'like', $like);
});
}
$total = (clone $builder)->count();
$rows = $builder
->orderByDesc('updated_time')
->offset(($page - 1) * $pageSize)
->limit($pageSize)
->get([
'archive_uid',
'title',
'summary',
'year',
'author',
'source',
'series',
'tags',
'created_time',
'updated_time',
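// jsonb_array_length() raises an error on non-array values, so guard on the JSON type first.
Db::raw("CASE WHEN jsonb_typeof(chunks) = 'array' THEN jsonb_array_length(chunks) ELSE 0 END as chunk_count"),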
])
->all();
return [
'items' => array_map(fn (object $row): array => $this->listItem($row), $rows),
'total' => (int) $total,
'page' => $page,
'page_size' => $pageSize,
];
}
public function detail(string $archiveUid): ?array
{
$row = Db::table('archives')->where('archive_uid', $archiveUid)->first();
if (!$row) {
return null;
}
return $this->detailItem($row);
}
public function update(string $archiveUid, array $payload): ?array
{
if (!$this->detail($archiveUid)) {
return null;
}
$updates = [];
foreach (['title', 'summary', 'author', 'source', 'series', 'content', 'raw'] as $field) {
if (array_key_exists($field, $payload)) {
$updates[$field] = $this->nullableText($payload[$field]);
}
}
if (array_key_exists('year', $payload)) {
$year = trim((string) ($payload['year'] ?? ''));
if ($year === '') {
$updates['year'] = null;
} elseif (!preg_match('/^\d{1,4}$/', $year)) {
throw new InvalidArgumentException('year must be empty or a 1-4 digit number.');
} else {
$updates['year'] = (int) $year;
}
}
if (array_key_exists('tags', $payload)) {
$updates['tags'] = json_encode($this->normalizeTags($payload['tags']), JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}
if (array_key_exists('metadata', $payload)) {
$updates['metadata'] = json_encode($this->normalizeMetadata($payload['metadata']), JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}
if ($updates !== []) {
Db::table('archives')->where('archive_uid', $archiveUid)->update($updates);
}
return $this->detail($archiveUid);
}
public function delete(string $archiveUid): bool
{
return (int) Db::table('archives')->where('archive_uid', $archiveUid)->delete() > 0;
}
private function listItem(object $row): array
{
$chunks = $this->decodeJson($row->chunks ?? null, []);
return [
'archive_uid' => (string) $row->archive_uid,
'title' => $row->title,
'summary' => $row->summary,
'year' => $row->year === null ? null : (int) $row->year,
'author' => $row->author,
'source' => $row->source,
'series' => $row->series,
'tags' => $this->decodeJson($row->tags ?? null, []),
'chunk_count' => property_exists($row, 'chunk_count')
? ($row->chunk_count === null ? 0 : (int) $row->chunk_count)
: count(is_array($chunks) ? $chunks : []),
'created_time' => $row->created_time,
'updated_time' => $row->updated_time,
];
}
private function detailItem(object $row): array
{
$data = $this->listItem($row);
$data['metadata'] = $this->decodeJson($row->metadata ?? null, []);
$data['content'] = $row->content;
$data['raw'] = $row->raw;
$data['chunks'] = $this->decodeJson($row->chunks ?? null, []);
return $data;
}
private function normalizeTags(mixed $value): array
{
if (is_array($value)) {
$items = $value;
} else {
$text = trim((string) $value);
if ($text === '') {
return [];
}
$items = preg_split('/[\r\n,]+/', $text) ?: [];
}
$tags = [];
foreach ($items as $item) {
$tag = trim((string) $item);
if ($tag !== '') {
$tags[] = $tag;
}
}
return array_values(array_unique($tags));
}
private function normalizeMetadata(mixed $value): array
{
if (is_array($value)) {
return $value;
}
$text = trim((string) $value);
if ($text === '') {
return [];
}
$decoded = json_decode($text, true);
if (!is_array($decoded)) {
throw new InvalidArgumentException('metadata must be a JSON object or array.');
}
return $decoded;
}
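private function nullableText(mixed $value): ?string
{
// Reject arrays and objects up front: casting them to string would raise a warning or an error.
if ($value !== null && !is_scalar($value)) {
throw new InvalidArgumentException('Text fields must be strings.');
}
$text = trim((string) $value);
return $text === '' ? null : $text;
}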
private function decodeJson(mixed $value, mixed $fallback): mixed
{
if ($value === null) {
return $fallback;
}
if (is_array($value)) {
return $value;
}
// json_decode() returns null both on failure and for a stored JSON null; use the fallback in either case.
$decoded = json_decode((string) $value, true);
return $decoded ?? $fallback;
}
}


@ -0,0 +1,148 @@
<?php
namespace app\service\AdminConsole;
use RuntimeException;
class MaintenanceScriptService
{
// Accepts "--flag" or "--key=value"; values may contain only letters, digits and . _ : @ / -
private const ARG_PATTERN = '/^--[A-Za-z0-9_]+(?:=[A-Za-z0-9._:@\/-]+)?$/';
public function list(): array
{
$docs = new AdminDocService();
$items = [];
foreach ($this->definitions() as $definition) {
$item = $definition;
try {
$doc = $docs->readScriptDoc($definition['doc_name']);
$item['doc_title'] = $doc['title'];
$item['doc_html'] = $doc['html'];
$item['doc_content'] = $doc['content'];
} catch (RuntimeException) {
$item['doc_title'] = null;
$item['doc_html'] = null;
$item['doc_content'] = null;
}
$items[] = $item;
}
return $items;
}
public function describe(string $name): array
{
$definitions = $this->definitions();
if (!isset($definitions[$name])) {
throw new RuntimeException('Script is not allowed.');
}
foreach ($this->list() as $item) {
if ($item['name'] === $name) {
return $item;
}
}
throw new RuntimeException('Script metadata not found.');
}
public function run(string $name, array $args = []): array
{
$definitions = $this->definitions();
if (!isset($definitions[$name])) {
throw new RuntimeException('Script is not allowed.');
}
$script = $definitions[$name];
$scriptPath = base_path('scripts/' . $script['file']);
if (!is_file($scriptPath)) {
throw new RuntimeException('Script file not found.');
}
$safeArgs = [];
foreach ($args as $arg) {
$arg = trim((string) $arg);
if ($arg === '') {
continue;
}
if (!preg_match(self::ARG_PATTERN, $arg)) {
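// A rejected argument is a validation failure, so raise it as one rather than as a generic runtime error.
throw new \InvalidArgumentException('Only --flag or --key=value style arguments are allowed.');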
}
$safeArgs[] = $arg;
}
$command = array_merge([PHP_BINARY, $scriptPath], $safeArgs);
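// Send stderr to a temp file instead of a second pipe: draining two pipes one after
// the other can deadlock once the child fills the pipe that is not being read yet.
$stderrFile = tmpfile();
if ($stderrFile === false) {
throw new RuntimeException('Failed to allocate a buffer for script stderr.');
}
$descriptors = [
0 => ['pipe', 'r'],
1 => ['pipe', 'w'],
2 => $stderrFile,
];
$process = proc_open($command, $descriptors, $pipes, base_path());
if (!is_resource($process)) {
fclose($stderrFile);
throw new RuntimeException('Failed to start script process.');
}
fclose($pipes[0]);
$stdout = (string) stream_get_contents($pipes[1]);
fclose($pipes[1]);
$exitCode = proc_close($process);
// The child advanced the shared file offset, so rewind before reading what it wrote.
rewind($stderrFile);
$stderr = (string) stream_get_contents($stderrFile);
fclose($stderrFile);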
return [
'script_name' => $name,
'command' => array_merge(['php', 'scripts/' . $script['file']], $safeArgs),
'exit_code' => $exitCode,
'stdout' => $stdout,
'stderr' => $stderr,
'ok' => $exitCode === 0,
];
}
private function definitions(): array
{
return [
'setup_database' => [
'name' => 'setup_database',
'file' => 'setup_database.php',
'label' => 'Initialize database',
'description' => 'Creates or completes the tables and indexes for archives and chunks.',
'doc_name' => 'setup_database.md',
'args_hint' => 'No arguments',
],
'setup_opensearch' => [
'name' => 'setup_opensearch',
'file' => 'setup_opensearch.php',
'label' => 'Initialize OpenSearch',
'description' => 'Creates or completes the proofdb_chunks index and its mapping.',
'doc_name' => 'setup_opensearch.md',
'args_hint' => 'No arguments',
],
'reindex_opensearch' => [
'name' => 'reindex_opensearch',
'file' => 'reindex_opensearch.php',
'label' => 'Rebuild OpenSearch index',
'description' => 'Rewrites the already-embedded data from PostgreSQL into OpenSearch.',
'doc_name' => 'reindex_opensearch.md',
'args_hint' => '--archive_uid=01...',
],
'backfill_archive_content' => [
'name' => 'backfill_archive_content',
'file' => 'backfill_archive_content.php',
'label' => 'Backfill archive content',
'description' => 'Backfills archives.content from raw or chunks.',
'doc_name' => 'backfill_archive_content.md',
'args_hint' => '--archive_uid=01...',
],
'setup_admin_users' => [
'name' => 'setup_admin_users',
'file' => 'setup_admin_users.php',
'label' => 'Initialize admin users',
'description' => 'Creates the admin_users table and inserts or updates the admin account.',
'doc_name' => 'setup_admin_users.md',
'args_hint' => '--username=admin --password=secret',
],
];
}
}


@ -0,0 +1,185 @@
<?php
namespace app\service\AdminConsole;
class MarkdownRenderer
{
public function render(string $markdown): string
{
$lines = preg_split('/\r\n|\n|\r/', $markdown) ?: [];
$html = [];
$paragraph = [];
$listType = null;
$table = null;
$inCodeBlock = false;
$codeLines = [];
$flushParagraph = function () use (&$paragraph, &$html): void {
if ($paragraph === []) {
return;
}
$text = implode(' ', array_map('trim', $paragraph));
$html[] = '<p>' . $this->renderInline($text) . '</p>';
$paragraph = [];
};
$flushList = function () use (&$listType, &$html): void {
if ($listType !== null) {
$html[] = '</' . $listType . '>';
$listType = null;
}
};
$flushTable = function () use (&$table, &$html): void {
if ($table === null) {
return;
}
$html[] = '<table class="markdown-table"><thead><tr>' .
implode('', array_map(fn (string $cell): string => '<th>' . $this->renderInline($cell) . '</th>', $table['headers'])) .
'</tr></thead><tbody>';
foreach ($table['rows'] as $row) {
$html[] = '<tr>' .
implode('', array_map(fn (string $cell): string => '<td>' . $this->renderInline($cell) . '</td>', $row)) .
'</tr>';
}
$html[] = '</tbody></table>';
$table = null;
};
foreach ($lines as $line) {
if (preg_match('/^```/', $line)) {
$flushParagraph();
$flushList();
$flushTable();
if ($inCodeBlock) {
$html[] = '<pre class="markdown-pre"><code>' . htmlspecialchars(implode("\n", $codeLines), ENT_QUOTES, 'UTF-8') . '</code></pre>';
$codeLines = [];
$inCodeBlock = false;
} else {
$inCodeBlock = true;
}
continue;
}
if ($inCodeBlock) {
$codeLines[] = $line;
continue;
}
$trimmed = trim($line);
if ($trimmed === '') {
$flushParagraph();
$flushList();
$flushTable();
continue;
}
if (preg_match('/^(#{1,6})\s+(.+)$/', $trimmed, $matches)) {
$flushParagraph();
$flushList();
$flushTable();
$level = strlen($matches[1]);
$html[] = sprintf('<h%d>%s</h%d>', $level, $this->renderInline($matches[2]), $level);
continue;
}
if (preg_match('/^>\s?(.+)$/', $trimmed, $matches)) {
$flushParagraph();
$flushList();
$flushTable();
$html[] = '<blockquote>' . $this->renderInline($matches[1]) . '</blockquote>';
continue;
}
if (preg_match('/^---+$/', $trimmed)) {
$flushParagraph();
$flushList();
$flushTable();
$html[] = '<hr>';
continue;
}
if ($this->isTableDelimiter($trimmed) && $table !== null) {
continue;
}
if (str_contains($trimmed, '|')) {
$cells = $this->tableCells($trimmed);
if (count($cells) >= 2) {
$flushParagraph();
$flushList();
if ($table === null) {
$table = ['headers' => $cells, 'rows' => []];
} else {
$table['rows'][] = $cells;
}
continue;
}
}
if (preg_match('/^[-*]\s+(.+)$/', $trimmed, $matches)) {
$flushParagraph();
$flushTable();
if ($listType !== 'ul') {
$flushList();
$listType = 'ul';
$html[] = '<ul>';
}
$html[] = '<li>' . $this->renderInline($matches[1]) . '</li>';
continue;
}
if (preg_match('/^\d+\.\s+(.+)$/', $trimmed, $matches)) {
$flushParagraph();
$flushTable();
if ($listType !== 'ol') {
$flushList();
$listType = 'ol';
$html[] = '<ol>';
}
$html[] = '<li>' . $this->renderInline($matches[1]) . '</li>';
continue;
}
$flushList();
$flushTable();
$paragraph[] = $trimmed;
}
if ($inCodeBlock) {
$html[] = '<pre class="markdown-pre"><code>' . htmlspecialchars(implode("\n", $codeLines), ENT_QUOTES, 'UTF-8') . '</code></pre>';
}
$flushParagraph();
$flushList();
$flushTable();
return implode("\n", $html);
}
private function renderInline(string $text): string
{
$text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$text = preg_replace('/`([^`]+)`/', '<code>$1</code>', $text) ?? $text;
$text = preg_replace('/\*\*([^*]+)\*\*/', '<strong>$1</strong>', $text) ?? $text;
$text = preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $text) ?? $text;
$text = preg_replace('/\[(.+?)\]\((.+?)\)/', '<a href="$2" target="_blank" rel="noreferrer">$1</a>', $text) ?? $text;
return $text;
}
private function isTableDelimiter(string $line): bool
{
return (bool) preg_match('/^\|?[\s:-]+\|[\s|:-]*$/', $line);
}
private function tableCells(string $line): array
{
$line = trim($line);
$line = trim($line, '|');
return array_map(static fn (string $cell): string => trim($cell), explode('|', $line));
}
}


@ -0,0 +1,153 @@
<?php
namespace app\service\AdminConsole;
use app\service\Search\OpenSearchClientFactory;
use support\Db;
use Throwable;
class OpenSearchAdminService
{
public function status(): array
{
$config = config('opensearch.default', []);
$indexName = config('opensearch.indices.chunks', 'proofdb_chunks');
$status = [
'config' => [
'hosts' => $config['hosts'] ?? [],
'ssl_verify' => (bool) ($config['ssl_verify'] ?? true),
'index_name' => $indexName,
],
'database' => [
'archives_total' => (int) Db::table('archives')->count(),
'chunks_total' => (int) Db::table('chunks')->count(),
'embedded_chunks' => (int) Db::table('chunks')->where('embedding_status', 3)->count(),
'indexed_chunks' => (int) Db::table('chunks')->where('search_index_status', 3)->count(),
],
'opensearch' => [
'reachable' => false,
'index_exists' => false,
'cluster_name' => null,
'health' => null,
'docs_count' => 0,
'mapping_fields' => [],
'error' => null,
],
];
try {
$client = (new OpenSearchClientFactory())->make();
$health = $client->cluster()->health();
$status['opensearch']['reachable'] = true;
$status['opensearch']['cluster_name'] = $health['cluster_name'] ?? null;
$status['opensearch']['health'] = $health['status'] ?? null;
$exists = (bool) $client->indices()->exists(['index' => $indexName]);
$status['opensearch']['index_exists'] = $exists;
if ($exists) {
$stats = $client->indices()->stats(['index' => $indexName]);
$mapping = $client->indices()->getMapping(['index' => $indexName]);
$status['opensearch']['docs_count'] = (int) (($stats['_all']['primaries']['docs']['count'] ?? 0));
$status['opensearch']['mapping_fields'] = array_keys($mapping[$indexName]['mappings']['properties'] ?? []);
}
} catch (Throwable $exception) {
$status['opensearch']['error'] = $exception->getMessage();
}
return $status;
}
public function documents(string $query = '', int $size = 20): array
{
$size = min(50, max(1, $size));
$indexName = config('opensearch.indices.chunks', 'proofdb_chunks');
$client = (new OpenSearchClientFactory())->make();
if (!(bool) $client->indices()->exists(['index' => $indexName])) {
return [
'index_name' => $indexName,
'items' => [],
'total' => 0,
];
}
$body = [
'_source' => [
'includes' => [
'chunk_uid',
'archive_uid',
'chunk_index',
'page_start',
'page_end',
'title',
'summary',
'source',
'author',
'year',
'series',
'tags',
'text',
'embedding_model',
'embedding_dimensions',
'created_time',
'updated_time',
],
],
'size' => $size,
'sort' => [
['updated_time' => ['order' => 'desc']],
],
];
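$query = trim($query);
if ($query === '') {
$body['query'] = ['match_all' => (object) []];
} else {
$body['query'] = [
'multi_match' => [
'query' => $query,
'fields' => ['text^3', 'title^2', 'summary^2', 'source', 'author', 'tags'],
'type' => 'best_fields',
],
];
// Rank keyword searches by relevance first. Sorting on updated_time alone makes
// OpenSearch skip scoring, so every hit would come back with a null _score.
$body['sort'] = [
['_score' => ['order' => 'desc']],
['updated_time' => ['order' => 'desc']],
];
}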
$response = $client->search([
'index' => $indexName,
'body' => $body,
]);
$hits = $response['hits']['hits'] ?? [];
return [
'index_name' => $indexName,
'total' => (int) (($response['hits']['total']['value'] ?? 0)),
'items' => array_map(function (array $hit): array {
$source = $hit['_source'] ?? [];
$text = trim((string) ($source['text'] ?? ''));
return [
'score' => $hit['_score'] ?? null,
'chunk_uid' => $source['chunk_uid'] ?? ($hit['_id'] ?? null),
'archive_uid' => $source['archive_uid'] ?? null,
'chunk_index' => $source['chunk_index'] ?? null,
'page_start' => $source['page_start'] ?? null,
'page_end' => $source['page_end'] ?? null,
'title' => $source['title'] ?? null,
'summary' => $source['summary'] ?? null,
'source' => $source['source'] ?? null,
'author' => $source['author'] ?? null,
'year' => $source['year'] ?? null,
'series' => $source['series'] ?? null,
'tags' => $source['tags'] ?? [],
'text_preview' => mb_substr($text, 0, 320),
'embedding_model' => $source['embedding_model'] ?? null,
'embedding_dimensions' => $source['embedding_dimensions'] ?? null,
'created_time' => $source['created_time'] ?? null,
'updated_time' => $source['updated_time'] ?? null,
];
}, $hits),
];
}
}


@ -0,0 +1,108 @@
<?php
namespace app\service;
use support\Db;
class AdminUserRepository
{
public function listAll(): array
{
$rows = Db::table('admin_users')
->orderByDesc('id')
->get()
->all();
return array_map(fn (object $row): array => $this->toArray($row), $rows);
}
public function findByUsername(string $username): ?array
{
$row = Db::table('admin_users')
->where('username', $username)
->where('is_active', true)
->first();
return $row ? $this->toArray($row) : null;
}
public function findById(int $id): ?array
{
$row = Db::table('admin_users')
->where('id', $id)
->where('is_active', true)
->first();
return $row ? $this->toArray($row) : null;
}
public function findAnyById(int $id): ?array
{
$row = Db::table('admin_users')->where('id', $id)->first();
return $row ? $this->toArray($row) : null;
}
public function findAnyByUsername(string $username): ?array
{
$row = Db::table('admin_users')->where('username', $username)->first();
return $row ? $this->toArray($row) : null;
}
public function touchLastLogin(int $id): void
{
Db::table('admin_users')
->where('id', $id)
->update(['last_login_at' => Db::raw('CURRENT_TIMESTAMP')]);
}
public function create(string $username, string $password, ?string $displayName = null): array
{
$id = Db::table('admin_users')->insertGetId([
'username' => $username,
'display_name' => $displayName,
'password_hash' => password_hash($password, PASSWORD_DEFAULT),
'is_active' => true,
'last_login_at' => null,
]);
return $this->findAnyById((int) $id) ?? [];
}
public function updateUser(int $id, array $fields): ?array
{
$updates = [];
if (array_key_exists('display_name', $fields)) {
$displayName = $fields['display_name'];
$updates['display_name'] = $displayName === null ? null : trim((string) $displayName);
}
if (array_key_exists('password', $fields) && trim((string) $fields['password']) !== '') {
$updates['password_hash'] = password_hash((string) $fields['password'], PASSWORD_DEFAULT);
}
if (array_key_exists('is_active', $fields)) {
$updates['is_active'] = (bool) $fields['is_active'];
}
if ($updates !== []) {
Db::table('admin_users')->where('id', $id)->update($updates);
}
return $this->findAnyById($id);
}
private function toArray(object $row): array
{
return [
'id' => (int) $row->id,
'username' => (string) $row->username,
'display_name' => $row->display_name,
'password_hash' => (string) $row->password_hash,
'is_active' => (bool) $row->is_active,
'last_login_at' => $row->last_login_at,
'created_time' => $row->created_time,
'updated_time' => $row->updated_time,
];
}
}


@ -75,6 +75,102 @@ class ArchiveRepository
return implode("\n\n", array_map(fn ($chunk): string => (string) $chunk->text, $chunks));
}
public function findChunk(string $chunkUid): ?array
{
$row = Db::table('chunks')
->join('archives', 'chunks.archive_uid', '=', 'archives.archive_uid')
->where('chunks.chunk_uid', $chunkUid)
->first([
'chunks.chunk_uid',
'chunks.archive_uid',
'chunks.chunk_index',
'chunks.page_start',
'chunks.page_end',
'chunks.text',
'chunks.length',
'chunks.embedding_status',
'chunks.embedding_ref',
'chunks.embedding_model',
'chunks.embedding_error',
'chunks.search_index_status',
'chunks.search_index_error',
'archives.title',
'archives.summary',
'archives.year',
'archives.author',
'archives.source',
'archives.series',
'archives.tags',
'archives.metadata',
]);
if (!$row) {
return null;
}
// Same shape as the rows returned by findArchiveChunks(); reuse the shared mapper.
return $this->chunkRowToArray($row);
}
public function findArchiveChunks(string $archiveUid): array
{
$rows = Db::table('chunks')
->join('archives', 'chunks.archive_uid', '=', 'archives.archive_uid')
->where('chunks.archive_uid', $archiveUid)
->orderBy('chunks.chunk_index')
->get([
'chunks.chunk_uid',
'chunks.archive_uid',
'chunks.chunk_index',
'chunks.page_start',
'chunks.page_end',
'chunks.text',
'chunks.length',
'chunks.embedding_status',
'chunks.embedding_ref',
'chunks.embedding_model',
'chunks.embedding_error',
'chunks.search_index_status',
'chunks.search_index_error',
'archives.title',
'archives.summary',
'archives.year',
'archives.author',
'archives.source',
'archives.series',
'archives.tags',
'archives.metadata',
])
->all();
return array_map(fn (object $row): array => $this->chunkRowToArray($row), $rows);
}
public function updateMetadata(string $archiveUid, array $fields, array $aiMeta): void
{
$archive = $this->findArchive($archiveUid);
@ -136,4 +232,68 @@ class ArchiveRepository
'chunks' => json_decode($archive->chunks ?? '[]', true) ?: [],
];
}
private function chunkRowToArray(object $row): array
{
return [
'chunk_uid' => (string) $row->chunk_uid,
'archive_uid' => (string) $row->archive_uid,
'chunk_index' => (int) $row->chunk_index,
'page_start' => $row->page_start === null ? null : (int) $row->page_start,
'page_end' => $row->page_end === null ? null : (int) $row->page_end,
'pages' => $this->pages($row->page_start, $row->page_end),
'text' => (string) $row->text,
'length' => $row->length === null ? null : (int) $row->length,
'embedding_status' => (int) $row->embedding_status,
'embedding_ref' => $this->decodeJson($row->embedding_ref ?? null, null),
'embedding_model' => $row->embedding_model,
'embedding_error' => $row->embedding_error,
'search_index_status' => (int) $row->search_index_status,
'search_index_error' => $row->search_index_error,
'archive' => [
'archive_uid' => (string) $row->archive_uid,
'title' => $row->title,
'summary' => $row->summary,
'year' => $row->year === null ? null : (int) $row->year,
'author' => $row->author,
'source' => $row->source,
'series' => $row->series,
'tags' => $this->decodeJson($row->tags ?? null, []),
'metadata' => $this->decodeJson($row->metadata ?? null, []),
],
];
}
private function decodeJson(mixed $value, mixed $fallback): mixed
{
if ($value === null) {
return $fallback;
}
if (is_array($value)) {
return $value;
}
if (!is_string($value) || trim($value) === '') {
return $fallback;
}
$decoded = json_decode($value, true);
return $decoded === null && json_last_error() !== JSON_ERROR_NONE ? $fallback : $decoded;
}
private function pages(mixed $pageStart, mixed $pageEnd): array
{
if (!is_numeric($pageStart) || !is_numeric($pageEnd)) {
return array_values(array_filter([$pageStart, $pageEnd], static fn ($value): bool => $value !== null && $value !== ''));
}
$start = (int) $pageStart;
$end = (int) $pageEnd;
if ($end < $start) {
$end = $start;
}
return range($start, $end);
}
}
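For reference, the page-range expansion done by `pages()` above can be sketched in JavaScript, the language of the admin console script later in this commit. This is an illustrative port under stated assumptions, not code from the commit: the names `isNumericLike` and `expandPages` are hypothetical, and `isNumericLike` only approximates PHP's `is_numeric()`.

```javascript
// Illustrative port of the PHP pages() helper; not part of the commit.
// isNumericLike is a rough stand-in for PHP's is_numeric().
function isNumericLike(value) {
  if (typeof value === 'number') {
    return Number.isFinite(value);
  }
  return typeof value === 'string' && value.trim() !== '' && Number.isFinite(Number(value));
}

function expandPages(pageStart, pageEnd) {
  if (!isNumericLike(pageStart) || !isNumericLike(pageEnd)) {
    // Mirror the PHP fallback: keep whichever bounds are present.
    return [pageStart, pageEnd].filter(
      (value) => value !== null && value !== undefined && value !== ''
    );
  }
  const start = Math.trunc(Number(pageStart));
  // An inverted range collapses to the start page, as in the PHP code.
  const end = Math.max(start, Math.trunc(Number(pageEnd)));
  return Array.from({ length: end - start + 1 }, (_, offset) => start + offset);
}

console.log(expandPages(3, 5)); // pages 3, 4 and 5
```

A chunk that spans pages 3 to 5 therefore reports every page in between, while a chunk with only one known bound reports just that bound.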


@ -70,6 +70,16 @@ class ArticleImportService
}
}
public function normalizeArchiveContentString(string $content): ?string
{
return $this->nullableClean($this->cleanMarkdownPage($content));
}
public function normalizeArchiveRawString(string $content): ?string
{
return $this->nullableClean($content);
}
private function validate(array $payload): array
{
$errors = [];
@ -182,8 +192,8 @@ class ArticleImportService
'tags' => is_array($payload['tags'] ?? null) ? array_values($payload['tags']) : [],
'summary' => $this->nullableClean($payload['summary'] ?? null),
'metadata' => $payload['metadata'] ?? [],
'content' => $this->nullableClean($payload['content_url'] ?? $payload['content_path'] ?? null),
'raw' => $this->nullableClean($payload['raw_url'] ?? $payload['raw_path'] ?? null),
'content' => $this->normalizedArchiveContent($payload),
'raw' => $this->rawArchiveContent($payload),
];
}
@ -200,6 +210,57 @@ class ArticleImportService
return $this->pageBlocksFromItems($payload, preg_split('/\R{2,}/u', $payload['content']));
}
private function normalizedArchiveContent(array $payload): ?string
{
if (isset($payload['pages']) && is_array($payload['pages'])) {
$parts = [];
foreach ($payload['pages'] as $page) {
if (!is_array($page) || !isset($page['content']) || !is_string($page['content'])) {
continue;
}
$content = $this->cleanMarkdownPage($page['content']);
if ($content !== '') {
$parts[] = $content;
}
}
return $this->nullableClean(implode("\n\n", $parts));
}
if (isset($payload['paragraphs']) && is_array($payload['paragraphs'])) {
$parts = [];
foreach ($payload['paragraphs'] as $paragraph) {
$content = is_array($paragraph) ? ($paragraph['content'] ?? '') : $paragraph;
if (!is_string($content)) {
continue;
}
$content = $this->clean($content);
if ($content !== '') {
$parts[] = $content;
}
}
return $this->nullableClean(implode("\n\n", $parts));
}
if (isset($payload['content']) && is_string($payload['content'])) {
return $this->normalizeArchiveContentString($payload['content']);
}
return null;
}
private function rawArchiveContent(array $payload): ?string
{
if (isset($payload['content']) && is_string($payload['content'])) {
return $this->normalizeArchiveRawString($payload['content']);
}
return null;
}
private function pageBlocksFromPages(array $payload): array
{
$pageBlocks = [];
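The precedence applied by `normalizedArchiveContent()` in this hunk (`pages` first, then `paragraphs`, then the plain `content` string) can be sketched as follows. This is a JavaScript illustration, not code from the commit: `cleanText` and `nullIfEmpty` are hypothetical placeholders, because the real `clean()`, `cleanMarkdownPage()` and `nullableClean()` helpers are not shown in the diff.

```javascript
// Illustrative sketch of the precedence in normalizedArchiveContent(); not code from the commit.
// cleanText is a placeholder for the PHP cleaning helpers, which this diff does not show.
const cleanText = (text) => text.trim();
const nullIfEmpty = (text) => (text === '' ? null : text);

function normalizedContent(payload) {
  if (Array.isArray(payload.pages)) {
    const parts = payload.pages
      .filter((page) => page !== null && typeof page === 'object' && typeof page.content === 'string')
      .map((page) => cleanText(page.content))
      .filter((content) => content !== '');
    // Once pages is present it wins, even if it yields nothing.
    return nullIfEmpty(parts.join('\n\n'));
  }
  if (Array.isArray(payload.paragraphs)) {
    const parts = payload.paragraphs
      .map((paragraph) =>
        paragraph !== null && typeof paragraph === 'object' ? paragraph.content ?? '' : paragraph
      )
      .filter((content) => typeof content === 'string')
      .map(cleanText)
      .filter((content) => content !== '');
    return nullIfEmpty(parts.join('\n\n'));
  }
  return typeof payload.content === 'string' ? nullIfEmpty(cleanText(payload.content)) : null;
}
```

Note the design consequence visible in the PHP code: an empty `pages` array does not fall through to `paragraphs` or `content`; the result is `null`.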


@ -7,6 +7,22 @@ use support\Db;
class ChunkSearchIndexRepository
{
public function resetEmbeddedChunksToPending(?string $archiveUid = null): int
{
$query = Db::table('chunks')
->where('embedding_status', EmbeddingStatus::EMBEDDED);
if ($archiveUid !== null && trim($archiveUid) !== '') {
$query->where('archive_uid', trim($archiveUid));
}
return $query->update([
'search_index_status' => SearchIndexStatus::PENDING,
'search_index_error' => null,
'search_index_updated_at' => null,
]);
}
public function queuePendingArchiveTasks(int $limit): array
{
$statuses = [
@ -63,6 +79,7 @@ class ChunkSearchIndexRepository
'chunks.created_time',
'chunks.updated_time',
'archives.title',
'archives.summary',
'archives.source',
'archives.author',
'archives.year',
@ -105,6 +122,7 @@ class ChunkSearchIndexRepository
'page_start' => $row->page_start === null ? null : (int) $row->page_start,
'page_end' => $row->page_end === null ? null : (int) $row->page_end,
'title' => $row->title,
'summary' => $row->summary,
'source' => $row->source,
'author' => $row->author,
'year' => $row->year === null ? null : (int) $row->year,


@ -16,6 +16,7 @@ class OpenSearchChunkIndex
$index = $this->indexName();
if ($client->indices()->exists(['index' => $index])) {
$this->ensureProperties($client, $index);
return;
}
@ -64,6 +65,7 @@ class OpenSearchChunkIndex
'page_start' => ['type' => 'integer'],
'page_end' => ['type' => 'integer'],
'title' => $this->textWithKeyword(),
'summary' => ['type' => 'text'],
'source' => $this->textWithKeyword(),
'author' => $this->textWithKeyword(),
'year' => ['type' => 'integer'],
@ -93,6 +95,31 @@ class OpenSearchChunkIndex
return $this->client ?? (new OpenSearchClientFactory())->make();
}
private function ensureProperties(Client $client, string $index): void
{
$mapping = $client->indices()->getMapping(['index' => $index]);
$existing = $mapping[$index]['mappings']['properties'] ?? [];
$desired = $this->mapping()['mappings']['properties'] ?? [];
$missing = [];
foreach ($desired as $field => $definition) {
if (!array_key_exists($field, $existing)) {
$missing[$field] = $definition;
}
}
if ($missing === []) {
return;
}
$client->indices()->putMapping([
'index' => $index,
'body' => [
'properties' => $missing,
],
]);
}
private function indexName(): string
{
return config('opensearch.indices.chunks', 'proofdb_chunks');
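The field diff performed by `ensureProperties()` above can be sketched as a pure function. This is a JavaScript illustration with a hypothetical name (`missingProperties`), not code from the commit. It shows that only fields absent from the live mapping are sent; a field whose definition differs is left untouched.

```javascript
// Illustrative sketch of the field diff in ensureProperties(); not code from the commit.
function missingProperties(existing, desired) {
  const missing = {};
  for (const [field, definition] of Object.entries(desired)) {
    // Only the presence of the field name is checked, not its definition.
    if (!Object.prototype.hasOwnProperty.call(existing, field)) {
      missing[field] = definition;
    }
  }
  return missing;
}

const existing = { title: { type: 'text' } };
const desired = { title: { type: 'text' }, summary: { type: 'text' } };

// Shape of the put-mapping request the PHP code builds when something is missing.
const request = { index: 'proofdb_chunks', body: { properties: missingProperties(existing, desired) } };
console.log(JSON.stringify(request));
```

Because existing fields are skipped, changing the type of an already indexed field would still need a new index and a reindex.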


@ -39,6 +39,7 @@ class OpenSearchSearchService
'fields' => [
'text^4',
'title^3',
'summary^2',
'source^2',
'author^2',
'series^2',
@ -219,6 +220,7 @@ class OpenSearchSearchService
'page_start' => $source['page_start'] ?? null,
'page_end' => $source['page_end'] ?? null,
'title' => $source['title'] ?? null,
'summary' => $source['summary'] ?? null,
'source' => $source['source'] ?? null,
'author' => $source['author'] ?? null,
'year' => $source['year'] ?? null,
@ -322,6 +324,7 @@ class OpenSearchSearchService
'page_start',
'page_end',
'title',
'summary',
'source',
'author',
'year',
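The hunks above add `summary` to the boosted field list with weight 2. As a rough sketch, assuming that list feeds a `multi_match` clause (the enclosing query is not visible in this diff), the request body would look like this. `buildFulltextQuery` is a hypothetical name, and only the fields shown in the hunk are listed.

```javascript
// Illustrative only: assumes the boosted field list is used in a multi_match clause.
// The field list is limited to the entries visible in the diff hunk.
const BOOSTED_FIELDS = ['text^4', 'title^3', 'summary^2', 'source^2', 'author^2', 'series^2'];

function buildFulltextQuery(queryText, size = 10) {
  return {
    size,
    query: {
      multi_match: {
        query: queryText,
        // "field^N" multiplies that field's score contribution by N.
        fields: BOOSTED_FIELDS,
      },
    },
  };
}

console.log(JSON.stringify(buildFulltextQuery('land deed', 5), null, 2));
```

With these weights a match in the chunk text counts twice as much as the same match in the archive summary.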


@ -0,0 +1,835 @@
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Proof DB 管理面板</title>
<link rel="stylesheet" href="/admin.css">
</head>
<body class="admin-dashboard">
<main class="admin-console page-shell-wide">
<section class="admin-console-shell panel">
<aside class="admin-console-sidebar">
<div class="admin-console-brand">
<div class="eyebrow">Maintenance Console / v<?=htmlspecialchars($version)?></div>
<h1 class="admin-console-title">Proof DB</h1>
<div class="admin-console-subtitle">管理员工作台</div>
</div>
<nav class="admin-console-nav">
<button class="admin-console-nav-item is-active" type="button" data-target="overview">总览</button>
<button class="admin-console-nav-item" type="button" data-target="archives">档案数据库</button>
<button class="admin-console-nav-item" type="button" data-target="opensearch">OpenSearch</button>
<button class="admin-console-nav-item" type="button" data-target="users">用户管理</button>
<button class="admin-console-nav-item" type="button" data-target="apidoc">API 文档</button>
<button class="admin-console-nav-item" type="button" data-target="scripts">维护脚本</button>
</nav>
<div class="admin-console-sidebar-foot">
<div class="metric-label">当前会话</div>
<div class="admin-console-identity"><?=htmlspecialchars($admin['display_name'] ?: $admin['username'])?></div>
<div class="admin-console-identity-sub">@<?=htmlspecialchars($admin['username'])?></div>
</div>
</aside>
<section class="admin-console-main">
<header class="admin-console-header">
<div>
<div class="eyebrow">Administrative Entry</div>
<h2 class="admin-console-header-title">Proof DB 管理面板</h2>
<p class="admin-console-header-copy">在这里维护 archives 表、OpenSearch 状态、管理员账号、API 文档,以及脚本级运维动作。</p>
</div>
<div class="admin-dashboard-actions">
<?php if ($archiveCaskUrl !== ''): ?>
<a class="button" href="<?=htmlspecialchars($archiveCaskUrl)?>">返回 Archive Cask</a>
<?php endif; ?>
<button class="button" id="logout-button" type="button">退出登录</button>
</div>
</header>
<div id="global-message" class="console-message" hidden></div>
<section class="admin-pane is-active" data-pane="overview">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">系统总览</h3>
<div class="admin-dashboard-section-note">集中查看数据库、OpenSearch 和当前版本状态。</div>
</div>
<button class="button" type="button" id="refresh-overview">刷新总览</button>
</div>
<div class="metric-grid admin-console-overview-grid" id="overview-metrics"></div>
<div class="admin-console-two-column">
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">OpenSearch 摘要</h4>
<div class="admin-dashboard-section-note">来自管理员状态 API</div>
</div>
<div class="terminal-block" id="overview-opensearch-terminal">等待加载...</div>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">快速入口</h4>
<div class="admin-dashboard-section-note">直接跳到主要维护区块</div>
</div>
<div class="admin-console-quick-actions">
<button class="button" type="button" data-open-pane="archives">管理 archives</button>
<button class="button" type="button" data-open-pane="opensearch">查看 OpenSearch</button>
<button class="button" type="button" data-open-pane="users">管理用户</button>
<button class="button" type="button" data-open-pane="apidoc">查看 APIDOC</button>
<button class="button" type="button" data-open-pane="scripts">执行维护脚本</button>
</div>
</section>
</div>
</section>
<section class="admin-pane" data-pane="archives">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">archives 表管理</h3>
<div class="admin-dashboard-section-note">搜索、查看、编辑和删除档案记录。这里只操作 archives 表本身。</div>
</div>
</div>
<div class="admin-console-workbench">
<section class="admin-dashboard-section panel-soft">
<div class="admin-toolbar">
<input class="text-input admin-toolbar-input" id="archives-query" placeholder="按 archive_uid / title / summary / author / source 搜索">
<button class="button" type="button" id="archives-search">搜索</button>
<button class="button" type="button" id="archives-reload">刷新</button>
</div>
<div class="admin-list" id="archives-list"></div>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">档案编辑器</h4>
<div class="admin-dashboard-section-note">选择左侧档案后可编辑。</div>
</div>
<form id="archive-form" class="admin-form-grid">
<label class="field-label" for="archive-uid">archive_uid</label>
<input class="text-input" id="archive-uid" name="archive_uid" readonly>
<label class="field-label" for="archive-title">title</label>
<input class="text-input" id="archive-title" name="title">
<label class="field-label" for="archive-year">year</label>
<input class="text-input" id="archive-year" name="year">
<label class="field-label" for="archive-author">author</label>
<input class="text-input" id="archive-author" name="author">
<label class="field-label" for="archive-source">source</label>
<input class="text-input" id="archive-source" name="source">
<label class="field-label" for="archive-series">series</label>
<input class="text-input" id="archive-series" name="series">
<label class="field-label" for="archive-tags">tags</label>
<textarea class="text-area" id="archive-tags" name="tags" rows="2" placeholder="逗号或换行分隔"></textarea>
<label class="field-label" for="archive-summary">summary</label>
<textarea class="text-area" id="archive-summary" name="summary" rows="5"></textarea>
<label class="field-label" for="archive-metadata">metadata</label>
<textarea class="text-area admin-code-area" id="archive-metadata" name="metadata" rows="8" placeholder='{"key":"value"}'></textarea>
<label class="field-label" for="archive-content">content</label>
<textarea class="text-area admin-code-area" id="archive-content" name="content" rows="10"></textarea>
<label class="field-label" for="archive-raw">raw</label>
<textarea class="text-area admin-code-area" id="archive-raw" name="raw" rows="10"></textarea>
<div class="admin-form-actions">
<button class="button primary" type="submit">保存档案</button>
<button class="button" type="button" id="archive-delete">删除档案</button>
</div>
</form>
</section>
</div>
</section>
<section class="admin-pane" data-pane="opensearch">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">OpenSearch 管理</h3>
<div class="admin-dashboard-section-note">查看集群、索引、数据库侧索引状态,以及索引中的文档粗览。</div>
</div>
<button class="button" type="button" id="opensearch-refresh">刷新状态</button>
</div>
<div class="metric-grid admin-console-opensearch-grid" id="opensearch-metrics"></div>
<section class="admin-dashboard-section panel-soft">
<div class="admin-toolbar">
<input class="text-input admin-toolbar-input" id="opensearch-query" placeholder="按 title / summary / source / author / text 搜索索引文档">
<button class="button" type="button" id="opensearch-search">搜索索引</button>
<button class="button" type="button" id="opensearch-reload-docs">刷新文档</button>
</div>
<div class="admin-list" id="opensearch-documents"></div>
</section>
<div class="admin-console-two-column">
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">OpenSearch 详情</h4>
<div class="admin-dashboard-section-note">主机、索引、mapping 字段等。</div>
</div>
<div class="terminal-block" id="opensearch-terminal">等待加载...</div>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">建议动作</h4>
<div class="admin-dashboard-section-note">跳转到脚本面板执行维护脚本。</div>
</div>
<div class="admin-console-quick-actions">
<button class="button" type="button" data-script="setup_opensearch">执行 setup_opensearch</button>
<button class="button" type="button" data-script="reindex_opensearch">执行 reindex_opensearch</button>
</div>
</section>
</div>
</section>
<section class="admin-pane" data-pane="users">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">管理员用户管理</h3>
<div class="admin-dashboard-section-note">创建管理员账号,修改显示名、密码与启用状态。</div>
</div>
</div>
<div class="admin-console-two-column">
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">创建新管理员</h4>
<div class="admin-dashboard-section-note">账号创建后即默认启用。</div>
</div>
<form id="create-user-form" class="admin-form-grid compact">
<label class="field-label" for="new-username">username</label>
<input class="text-input" id="new-username" name="username" required>
<label class="field-label" for="new-display-name">display_name</label>
<input class="text-input" id="new-display-name" name="display_name">
<label class="field-label" for="new-password">password</label>
<input class="text-input" id="new-password" name="password" type="password" required>
<div class="admin-form-actions">
<button class="button primary" type="submit">创建用户</button>
</div>
</form>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">现有管理员</h4>
<div class="admin-dashboard-section-note">留空密码表示不修改。</div>
</div>
<div class="admin-user-list" id="users-list"></div>
</section>
</div>
</section>
<section class="admin-pane" data-pane="apidoc">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">APIDOC 查看</h3>
<div class="admin-dashboard-section-note">浏览 `/apidoc` 中的接口文档。</div>
</div>
</div>
<div class="admin-console-workbench">
<section class="admin-dashboard-section panel-soft">
<div class="admin-list" id="docs-list"></div>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title" id="doc-title">文档内容</h4>
<div class="admin-dashboard-section-note" id="doc-name">请选择一份文档。</div>
</div>
<div class="admin-markdown-viewer" id="doc-content">等待加载...</div>
</section>
</div>
</section>
<section class="admin-pane" data-pane="scripts">
<div class="admin-pane-head">
<div>
<h3 class="admin-dashboard-section-title">维护脚本伪终端</h3>
<div class="admin-dashboard-section-note">仅允许执行白名单中的 `scripts/*.php` 维护脚本。</div>
</div>
</div>
<div class="admin-console-two-column">
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">可执行脚本</h4>
<div class="admin-dashboard-section-note">支持 `--key=value` 参数格式。</div>
</div>
<div class="admin-script-list" id="scripts-list"></div>
</section>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title">执行终端</h4>
<div class="admin-dashboard-section-note">脚本 stdout / stderr 会显示在下方。</div>
</div>
<form id="script-form" class="admin-form-grid compact">
<label class="field-label" for="script-select">script_name</label>
<select class="text-input" id="script-select" name="script_name"></select>
<label class="field-label" for="script-args">args</label>
<input class="text-input" id="script-args" name="args" placeholder="--archive_uid=01...">
<div class="admin-form-actions">
<button class="button primary" type="submit">执行脚本</button>
</div>
</form>
<div class="terminal-block" id="script-terminal"><span class="prompt">proofdb-admin$</span> 等待命令...</div>
</section>
</div>
<section class="admin-dashboard-section panel-soft">
<div class="admin-dashboard-section-head">
<h4 class="admin-dashboard-section-title" id="script-doc-title">脚本文档</h4>
<div class="admin-dashboard-section-note" id="script-doc-name">如果该脚本有文档,会显示在这里。</div>
</div>
<div class="admin-markdown-viewer" id="script-doc-content">等待加载...</div>
</section>
</section>
</section>
</section>
</main>
<script>
const state = {
archiveUid: null,
docName: null,
docHtml: null,
scripts: [],
opensearch: null,
selectedScript: null
};
const els = {
message: document.getElementById('global-message'),
overviewMetrics: document.getElementById('overview-metrics'),
overviewTerminal: document.getElementById('overview-opensearch-terminal'),
archivesList: document.getElementById('archives-list'),
archiveForm: document.getElementById('archive-form'),
archiveDelete: document.getElementById('archive-delete'),
opensearchMetrics: document.getElementById('opensearch-metrics'),
opensearchTerminal: document.getElementById('opensearch-terminal'),
opensearchDocuments: document.getElementById('opensearch-documents'),
usersList: document.getElementById('users-list'),
createUserForm: document.getElementById('create-user-form'),
docsList: document.getElementById('docs-list'),
docTitle: document.getElementById('doc-title'),
docName: document.getElementById('doc-name'),
docContent: document.getElementById('doc-content'),
scriptsList: document.getElementById('scripts-list'),
scriptForm: document.getElementById('script-form'),
scriptSelect: document.getElementById('script-select'),
scriptArgs: document.getElementById('script-args'),
scriptTerminal: document.getElementById('script-terminal'),
scriptDocTitle: document.getElementById('script-doc-title'),
scriptDocName: document.getElementById('script-doc-name'),
scriptDocContent: document.getElementById('script-doc-content')
};
function field(form, name) {
return form.elements.namedItem(name);
}
function showMessage(text, kind = 'info') {
if (!text) {
els.message.hidden = true;
els.message.textContent = '';
els.message.className = 'console-message';
return;
}
els.message.hidden = false;
els.message.textContent = text;
els.message.className = `console-message is-${kind}`;
}
function escapeHtml(value) {
return String(value ?? '')
.replaceAll('&', '&amp;')
.replaceAll('<', '&lt;')
.replaceAll('>', '&gt;')
.replaceAll('"', '&quot;')
.replaceAll("'", '&#39;');
}
async function api(url, options = {}) {
const response = await fetch(url, {
...options,
headers: {
'Content-Type': 'application/json',
...(options.headers || {})
}
});
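let data;
try {
data = await response.json();
} catch (error) {
// Non-JSON bodies (for example a proxy error page) should surface as a readable failure.
throw new Error(`请求失败 (HTTP ${response.status})`);
}
if (!response.ok || data.code !== 0) {
throw new Error(data.errors ? Object.values(data.errors)[0] : data.message || '请求失败');
}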
return data.data;
}
function activatePane(name) {
document.querySelectorAll('.admin-console-nav-item').forEach((button) => {
button.classList.toggle('is-active', button.dataset.target === name);
});
document.querySelectorAll('.admin-pane').forEach((pane) => {
pane.classList.toggle('is-active', pane.dataset.pane === name);
});
}
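function tokeniseArgs(text) {
// A token is a run of bare characters and quoted segments, so --key="a b" stays one argument.
const tokens = text.match(/(?:[^\s"']+|"[^"]*"|'[^']*'|["'])+/g) || [];
// Drop paired quote characters but keep the quoted content.
return tokens.map((token) => token.replace(/"([^"]*)"|'([^']*)'/g, (match, dq, sq) => dq ?? sq));
}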
function renderMetricCards(metrics) {
return metrics.map((metric) => `
<div class="metric-card">
<div class="metric-label">${escapeHtml(metric.label)}</div>
<div class="metric-value">${escapeHtml(metric.value)}</div>
<div class="metric-subvalue">${escapeHtml(metric.note || '')}</div>
</div>
`).join('');
}
async function loadOpenSearchStatus() {
const data = await api('/api/admin/opensearch/status');
state.opensearch = data;
const metrics = [
{label: 'archives', value: data.database.archives_total, note: 'archives 表记录数'},
{label: 'chunks', value: data.database.chunks_total, note: 'chunks 表记录数'},
{label: 'embedded', value: data.database.embedded_chunks, note: 'embedding_status = 3'},
{label: 'indexed', value: data.database.indexed_chunks, note: 'search_index_status = 3'},
{label: 'index', value: data.config.index_name, note: data.opensearch.index_exists ? '索引已存在' : '索引不存在'},
{label: 'docs.count', value: data.opensearch.docs_count, note: data.opensearch.health || '未获取健康状态'}
];
els.overviewMetrics.innerHTML = renderMetricCards(metrics);
els.opensearchMetrics.innerHTML = renderMetricCards(metrics);
const terminal = [
`hosts: ${(data.config.hosts || []).join(', ') || '[]'}`,
`ssl_verify: ${String(data.config.ssl_verify)}`,
`cluster_name: ${data.opensearch.cluster_name || '-'}`,
`reachable: ${String(data.opensearch.reachable)}`,
`index_exists: ${String(data.opensearch.index_exists)}`,
`health: ${data.opensearch.health || '-'}`,
`mapping_fields: ${(data.opensearch.mapping_fields || []).join(', ') || '-'}`,
data.opensearch.error ? `error: ${data.opensearch.error}` : ''
].filter(Boolean).join('\n');
els.overviewTerminal.textContent = terminal;
els.opensearchTerminal.textContent = terminal;
}
async function loadOpenSearchDocuments() {
const query = document.getElementById('opensearch-query').value.trim();
const data = await api(`/api/admin/opensearch/documents?query=${encodeURIComponent(query)}&size=20`);
if ((data.items || []).length === 0) {
els.opensearchDocuments.innerHTML = '<div class="admin-list-empty">当前没有可展示的索引文档。</div>';
return;
}
els.opensearchDocuments.innerHTML = data.items.map((item) => `
<article class="admin-list-item no-click">
<div class="admin-list-item-head">
<strong>${escapeHtml(item.title || item.chunk_uid)}</strong>
<span>${escapeHtml(item.chunk_uid || '-')}</span>
</div>
<div class="admin-list-item-copy">${escapeHtml(item.text_preview || item.summary || '无预览文本')}</div>
<div class="admin-list-item-meta">
<span>${escapeHtml(item.archive_uid || '-')}</span>
<span>p.${escapeHtml(item.page_start || '-')} - ${escapeHtml(item.page_end || '-')}</span>
<span>${escapeHtml(item.source || '-')}</span>
<span>${escapeHtml(item.embedding_model || '-')}</span>
</div>
</article>
`).join('');
}
async function loadArchives() {
const query = document.getElementById('archives-query').value.trim();
const data = await api(`/api/admin/archives?query=${encodeURIComponent(query)}`);
if (data.items.length === 0) {
els.archivesList.innerHTML = '<div class="admin-list-empty">没有找到档案记录。</div>';
return;
}
els.archivesList.innerHTML = data.items.map((item) => `
<button class="admin-list-item ${state.archiveUid === item.archive_uid ? 'is-active' : ''}" type="button" data-archive="${escapeHtml(item.archive_uid)}">
<div class="admin-list-item-head">
<strong>${escapeHtml(item.title || item.archive_uid)}</strong>
<span>${escapeHtml(item.archive_uid)}</span>
</div>
<div class="admin-list-item-copy">${escapeHtml(item.summary || '无 summary')}</div>
<div class="admin-list-item-meta">
<span>${escapeHtml(item.source || '-')}</span>
<span>${escapeHtml(item.year || '-')}</span>
<span>${escapeHtml(item.chunk_count)} chunks</span>
</div>
</button>
`).join('');
els.archivesList.querySelectorAll('[data-archive]').forEach((button) => {
button.addEventListener('click', async () => {
try {
await loadArchiveDetail(button.dataset.archive);
} catch (error) {
showMessage(error.message || '加载档案详情失败。', 'error');
}
});
});
if (!state.archiveUid && data.items[0]) {
await loadArchiveDetail(data.items[0].archive_uid);
}
}
async function loadArchiveDetail(archiveUid) {
const data = await api(`/api/admin/archives/${encodeURIComponent(archiveUid)}`);
state.archiveUid = archiveUid;
document.querySelectorAll('#archives-list [data-archive]').forEach((item) => {
item.classList.toggle('is-active', item.dataset.archive === archiveUid);
});
field(els.archiveForm, 'archive_uid').value = data.archive_uid || '';
field(els.archiveForm, 'title').value = data.title || '';
field(els.archiveForm, 'year').value = data.year || '';
field(els.archiveForm, 'author').value = data.author || '';
field(els.archiveForm, 'source').value = data.source || '';
field(els.archiveForm, 'series').value = data.series || '';
field(els.archiveForm, 'tags').value = (data.tags || []).join(', ');
field(els.archiveForm, 'summary').value = data.summary || '';
field(els.archiveForm, 'metadata').value = JSON.stringify(data.metadata || {}, null, 2);
field(els.archiveForm, 'content').value = data.content || '';
field(els.archiveForm, 'raw').value = data.raw || '';
}
async function saveArchive(event) {
event.preventDefault();
if (!state.archiveUid) {
showMessage('请先选择一条档案记录。', 'error');
return;
}
let metadata;
try {
const metadataField = field(els.archiveForm, 'metadata');
metadata = metadataField.value.trim() === '' ? {} : JSON.parse(metadataField.value);
} catch (error) {
showMessage('metadata 不是合法 JSON。', 'error');
return;
}
const payload = {
title: field(els.archiveForm, 'title').value,
year: field(els.archiveForm, 'year').value,
author: field(els.archiveForm, 'author').value,
source: field(els.archiveForm, 'source').value,
series: field(els.archiveForm, 'series').value,
tags: field(els.archiveForm, 'tags').value,
summary: field(els.archiveForm, 'summary').value,
metadata,
content: field(els.archiveForm, 'content').value,
raw: field(els.archiveForm, 'raw').value
};
await api(`/api/admin/archives/${encodeURIComponent(state.archiveUid)}`, {
method: 'PATCH',
body: JSON.stringify(payload)
});
showMessage('档案已更新。', 'success');
await loadArchiveDetail(state.archiveUid);
}
async function deleteArchive() {
if (!state.archiveUid) {
showMessage('请先选择一条档案记录。', 'error');
return;
}
if (!window.confirm(`确认删除档案 ${state.archiveUid} 吗?这会级联删除相关 chunks。`)) {
return;
}
await api(`/api/admin/archives/${encodeURIComponent(state.archiveUid)}`, {method: 'DELETE'});
showMessage('档案已删除。', 'success');
state.archiveUid = null;
els.archiveForm.reset();
await loadArchives();
await loadOpenSearchStatus();
}
function renderUsers(users) {
els.usersList.innerHTML = users.map((user) => `
<form class="admin-user-card" data-user-id="${escapeHtml(user.id)}">
<div class="admin-user-card-head">
<strong>${escapeHtml(user.username)}</strong>
<span>${user.is_active ? '启用中' : '已停用'}</span>
</div>
<label class="field-label">display_name</label>
<input class="text-input" name="display_name" value="${escapeHtml(user.display_name || '')}">
<label class="field-label">new_password</label>
<input class="text-input" type="password" name="password" placeholder="留空则不修改">
<label class="admin-inline-check">
<input type="checkbox" name="is_active" ${user.is_active ? 'checked' : ''}>
<span>is_active</span>
</label>
<div class="admin-form-actions">
<button class="button" type="submit">保存用户</button>
</div>
</form>
`).join('');
els.usersList.querySelectorAll('.admin-user-card').forEach((form) => {
form.addEventListener('submit', async (event) => {
event.preventDefault();
const id = form.dataset.userId;
// Report failures in the message bar instead of leaving an unhandled rejection.
try {
await api(`/api/admin/users/${id}`, {
method: 'PATCH',
body: JSON.stringify({
display_name: field(form, 'display_name').value,
password: field(form, 'password').value,
is_active: field(form, 'is_active').checked
})
});
showMessage(`用户 ${id} 已更新。`, 'success');
await loadUsers();
} catch (error) {
showMessage(error.message || '请求失败', 'error');
}
});
});
}
async function loadUsers() {
const data = await api('/api/admin/users');
renderUsers(data.items || []);
}
async function createUser(event) {
event.preventDefault();
await api('/api/admin/users', {
method: 'POST',
body: JSON.stringify({
username: field(els.createUserForm, 'username').value.trim(),
display_name: field(els.createUserForm, 'display_name').value.trim(),
password: field(els.createUserForm, 'password').value
})
});
els.createUserForm.reset();
showMessage('管理员用户已创建。', 'success');
await loadUsers();
}
async function loadDocs() {
const data = await api('/api/admin/docs');
els.docsList.innerHTML = data.items.map((doc) => `
<button class="admin-list-item ${state.docName === doc.name ? 'is-active' : ''}" type="button" data-doc="${escapeHtml(doc.name)}">
<div class="admin-list-item-head">
<strong>${escapeHtml(doc.title)}</strong>
<span>${escapeHtml(doc.name)}</span>
</div>
</button>
`).join('');
els.docsList.querySelectorAll('[data-doc]').forEach((button) => {
button.addEventListener('click', () => loadDoc(button.dataset.doc));
});
if (!state.docName && data.items[0]) {
await loadDoc(data.items[0].name);
}
}
async function loadDoc(name) {
const data = await api(`/api/admin/docs/${encodeURIComponent(name)}`);
state.docName = name;
els.docTitle.textContent = data.title;
els.docName.textContent = data.name;
els.docContent.innerHTML = data.html || '';
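// Update the active marker in place instead of refetching the whole document list.
document.querySelectorAll('#docs-list [data-doc]').forEach((item) => {
item.classList.toggle('is-active', item.dataset.doc === name);
});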
}
async function loadScripts() {
const data = await api('/api/admin/scripts');
state.scripts = data.items || [];
els.scriptSelect.innerHTML = state.scripts.map((script) => `
<option value="${escapeHtml(script.name)}">${escapeHtml(script.label)} (${escapeHtml(script.name)})</option>
`).join('');
els.scriptsList.innerHTML = state.scripts.map((script) => `
<button class="admin-script-card" type="button" data-script-select="${escapeHtml(script.name)}">
<div class="admin-script-card-head">
<strong>${escapeHtml(script.label)}</strong>
<span>${escapeHtml(script.name)}</span>
</div>
<div class="admin-script-card-copy">${escapeHtml(script.description)}</div>
<div class="admin-script-card-meta">
<span>${escapeHtml(script.args_hint)}</span>
<span>${escapeHtml(script.doc_name)}</span>
</div>
</button>
`).join('');
els.scriptsList.querySelectorAll('[data-script-select]').forEach((button) => {
button.addEventListener('click', async () => {
els.scriptSelect.value = button.dataset.scriptSelect;
await loadScriptDoc(button.dataset.scriptSelect);
});
});
if (!state.selectedScript && state.scripts[0]) {
els.scriptSelect.value = state.scripts[0].name;
await loadScriptDoc(state.scripts[0].name);
}
}
async function loadScriptDoc(name) {
const data = await api(`/api/admin/scripts/${encodeURIComponent(name)}`);
state.selectedScript = name;
els.scriptDocTitle.textContent = data.doc_title || data.label || data.name;
els.scriptDocName.textContent = data.doc_name || '暂无文档';
els.scriptDocContent.innerHTML = data.doc_html || '<p>该脚本暂时没有文档。</p>';
}
async function runScript(event) {
event.preventDefault();
const scriptName = els.scriptSelect.value;
const args = tokeniseArgs(els.scriptArgs.value.trim());
const result = await api('/api/admin/scripts/run', {
method: 'POST',
body: JSON.stringify({
script_name: scriptName,
args
})
});
els.scriptTerminal.textContent = [
`$ ${result.command.join(' ')}`,
'',
result.stdout ? `[stdout]\n${result.stdout}` : '',
result.stderr ? `[stderr]\n${result.stderr}` : '',
`exit_code: ${result.exit_code}`
].filter(Boolean).join('\n');
showMessage(`脚本 ${scriptName} 执行完成。`, result.ok ? 'success' : 'error');
await loadOpenSearchStatus();
}
document.querySelectorAll('.admin-console-nav-item').forEach((button) => {
button.addEventListener('click', async () => {
activatePane(button.dataset.target);
showMessage('');
if (button.dataset.target === 'archives') {
await loadArchives();
} else if (button.dataset.target === 'opensearch' || button.dataset.target === 'overview') {
await loadOpenSearchStatus();
if (button.dataset.target === 'opensearch') {
await loadOpenSearchDocuments();
}
} else if (button.dataset.target === 'users') {
await loadUsers();
} else if (button.dataset.target === 'apidoc') {
await loadDocs();
} else if (button.dataset.target === 'scripts') {
await loadScripts();
}
});
});
document.querySelectorAll('[data-open-pane]').forEach((button) => {
button.addEventListener('click', () => {
document.querySelector(`.admin-console-nav-item[data-target="${button.dataset.openPane}"]`)?.click();
});
});
document.querySelectorAll('[data-script]').forEach((button) => {
button.addEventListener('click', async () => {
activatePane('scripts');
await loadScripts();
els.scriptSelect.value = button.dataset.script;
showMessage(`Switched to maintenance scripts and selected ${button.dataset.script}.`);
});
});
document.getElementById('logout-button').addEventListener('click', async () => {
await fetch('/api/admin/logout', {method: 'POST'});
window.location.href = '/';
});
document.getElementById('refresh-overview').addEventListener('click', loadOpenSearchStatus);
document.getElementById('opensearch-refresh').addEventListener('click', loadOpenSearchStatus);
document.getElementById('opensearch-search').addEventListener('click', loadOpenSearchDocuments);
document.getElementById('opensearch-reload-docs').addEventListener('click', loadOpenSearchDocuments);
document.getElementById('archives-search').addEventListener('click', loadArchives);
document.getElementById('archives-reload').addEventListener('click', loadArchives);
els.scriptSelect.addEventListener('change', async () => {
try {
await loadScriptDoc(els.scriptSelect.value);
} catch (error) {
showMessage(error.message || 'Failed to load script documentation.', 'error');
}
});
els.archiveForm.addEventListener('submit', async (event) => {
try {
await saveArchive(event);
} catch (error) {
showMessage(error.message || 'Failed to save archive.', 'error');
}
});
els.archiveDelete.addEventListener('click', async () => {
try {
await deleteArchive();
} catch (error) {
showMessage(error.message || 'Failed to delete archive.', 'error');
}
});
els.createUserForm.addEventListener('submit', async (event) => {
try {
await createUser(event);
} catch (error) {
showMessage(error.message || 'Failed to create user.', 'error');
}
});
els.scriptForm.addEventListener('submit', async (event) => {
try {
await runScript(event);
} catch (error) {
showMessage(error.message || 'Script execution failed.', 'error');
}
});
(async function bootstrap() {
try {
await loadOpenSearchStatus();
} catch (error) {
showMessage(error.message || 'Failed to load overview.', 'error');
}
})();
</script>
</body>
</html>


@ -0,0 +1,83 @@
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Proof DB Admin</title>
<link rel="stylesheet" href="/admin.css">
</head>
<body class="admin-landing">
<main class="admin-landing-page">
<header class="topbar page-shell">
<div class="brand">
<span class="brand-mark" aria-hidden="true"></span>
<span>Proof DB Admin Console</span>
</div>
<div>Version <?=htmlspecialchars($version)?></div>
</header>
<section class="admin-landing-hero page-shell">
<section class="admin-landing-intro panel">
<div>
<div class="eyebrow">Administrative Entry</div>
<h1 class="admin-landing-title">Proof DB</h1><br><h2>Archive Data Center</h2>
<p class="lead admin-landing-lead">
Archive storage, tag processing, vector storage, full-text search
</p>
</div>
<div class="admin-landing-meta">
<div>
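<div class="metric-label">About Proof DB</div>
<div class="admin-landing-meta-value">Proof DB is a professional-grade backend data center for historical archives, integrating an archive database, a full-text search engine, and a RAG vector engine.</div>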
</div>
</div>
</section>
<section class="admin-landing-portal panel-soft">
<div class="admin-landing-portal-head">
<span>Access Control</span>
<span>v1</span>
</div>
<div>
<p class="admin-landing-portal-title">Choose an entry path</p>
<p class="admin-landing-portal-copy">Tip: the "Proof" in Proof DB refers to alcohol proof.</p>
</div>
<div class="admin-landing-button-stack">
<a class="button primary <?=$archiveCaskUrl === '' ? 'disabled' : ''?>" href="<?=htmlspecialchars($archiveCaskUrl ?: '#')?>">
<span>Back to Archive Cask</span>
<span class="button-key">BACK</span>
</a>
<a class="button" href="/admin/login">
<span>Admin login</span>
<span class="button-key">Login</span>
</a>
</div>
<div class="admin-landing-status">
<?php if ($archiveCaskUrl === ''): ?>
<strong>Archive Cask is not configured.</strong> Set `ARCHIVE_CASK_URL` in `.env`.
<?php else: ?>
<strong>Archive Cask is configured.</strong>
<?php endif; ?>
</div>
</section>
</section>
<footer class="footer page-shell">
<style>
lower {
text-transform: lowercase;
}
</style>
<span>
Proof DB
<lower>by</lower>
<a href="https://laysense.cn/" target="_blank">Shanghai Laysense Information Technology Co. Ltd.</a>
</span>
<span>Admin Surface / v<?=htmlspecialchars($version)?></span>
</footer>
</main>
</body>
</html>

87
app/view/admin/login.html Normal file

@ -0,0 +1,87 @@
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Admin Login</title>
<link rel="stylesheet" href="/admin.css">
</head>
<body class="admin-login">
<main class="admin-login-layout page-shell-wide">
<section class="admin-login-intro panel">
<div>
<div class="eyebrow">Protected Access / v<?=htmlspecialchars($version)?></div>
<h1 class="admin-login-title">Admin<br />Login</h1>
<p class="lead admin-login-lead">Sign in to the Proof DB admin console.</p>
</div>
<div class="admin-login-facts">
<div>
<div class="metric-label">Authentication</div>
<div class="admin-landing-meta-value">Internal username and password</div>
</div>
</div>
</section>
<section class="admin-login-form-shell panel-soft">
<div class="admin-login-form-head">
<p class="admin-landing-portal-title">Enter the maintenance panel</p>
<p class="admin-login-form-copy">Sign in with your username and password.</p>
</div>
<div id="error" class="error-box"></div>
<form id="login-form">
<label class="field-label" for="username">Username</label>
<input class="text-input" id="username" name="username" autocomplete="username" required>
<label class="field-label" for="password">Password</label>
<input class="text-input" id="password" name="password" type="password" autocomplete="current-password" required>
<div class="admin-login-actions">
<button class="button primary" type="submit">Sign in</button>
<a class="button" href="/">Back to entry page</a>
<?php if ($archiveCaskUrl !== ''): ?>
<a class="button" href="<?=htmlspecialchars($archiveCaskUrl)?>">Back to Archive Cask</a>
<?php endif; ?>
</div>
</form>
<div class="admin-login-footnote">If you forget your password, contact the database administrator.</div>
</section>
</main>
<script>
const form = document.getElementById('login-form');
const errorBox = document.getElementById('error');
form.addEventListener('submit', async (event) => {
event.preventDefault();
errorBox.style.display = 'none';
errorBox.textContent = '';
const payload = {
username: form.username.value.trim(),
password: form.password.value
};
try {
const response = await fetch('/api/admin/login', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify(payload)
});
const data = await response.json();
if (!response.ok || data.code !== 0) {
throw new Error(data.errors?.auth || data.message || 'Login failed.');
}
window.location.href = '/admin';
} catch (error) {
errorBox.textContent = error.message || 'Login failed.';
errorBox.style.display = 'block';
}
});
</script>
</body>
</html>

5
config/admin.php Normal file

@ -0,0 +1,5 @@
<?php
return [
'archive_cask_url' => getenv('ARCHIVE_CASK_URL') ?: '',
];


@ -20,14 +20,14 @@ return [
* - OPENSEARCH_HOSTS=http://127.0.0.1:9200
* - OPENSEARCH_USERNAME=admin
* - OPENSEARCH_PASSWORD=...
* - OPENSEARCH_SSL_VERIFY=true
* - OPENSEARCH_SSL_VERIFY=false
* - OPENSEARCH_INDEX_CHUNKS=proofdb_chunks
*/
'default' => [
'hosts' => $hosts,
'username' => getenv('OPENSEARCH_USERNAME') ?: null,
'password' => getenv('OPENSEARCH_PASSWORD') ?: null,
'ssl_verify' => $bool('OPENSEARCH_SSL_VERIFY', true),
'ssl_verify' => $bool('OPENSEARCH_SSL_VERIFY', false),
'timeout' => (float) (getenv('OPENSEARCH_TIMEOUT') ?: 30),
'connect_timeout' => (float) (getenv('OPENSEARCH_CONNECT_TIMEOUT') ?: 5),
],


@ -14,8 +14,33 @@
use Webman\Route;
Route::get('/', [app\controller\AdminController::class, 'landing']);
Route::get('/admin/login', [app\controller\AdminController::class, 'login']);
Route::get('/admin', [app\controller\AdminController::class, 'dashboard']);
Route::post('/api/articles/import', [app\controller\Api\ArticleImportController::class, 'import']);
Route::post('/api/admin/login', [app\controller\Api\AdminAuthController::class, 'login']);
Route::post('/api/admin/logout', [app\controller\Api\AdminAuthController::class, 'logout']);
Route::get('/api/admin/me', [app\controller\Api\AdminAuthController::class, 'me']);
Route::get('/api/admin/archives', [app\controller\Api\AdminConsoleController::class, 'archives']);
Route::get('/api/admin/archives/{archiveUid}', [app\controller\Api\AdminConsoleController::class, 'archive']);
Route::patch('/api/admin/archives/{archiveUid}', [app\controller\Api\AdminConsoleController::class, 'updateArchive']);
Route::delete('/api/admin/archives/{archiveUid}', [app\controller\Api\AdminConsoleController::class, 'deleteArchive']);
Route::get('/api/admin/opensearch/status', [app\controller\Api\AdminConsoleController::class, 'openSearchStatus']);
Route::get('/api/admin/opensearch/documents', [app\controller\Api\AdminConsoleController::class, 'openSearchDocuments']);
Route::get('/api/admin/users', [app\controller\Api\AdminConsoleController::class, 'users']);
Route::post('/api/admin/users', [app\controller\Api\AdminConsoleController::class, 'createUser']);
Route::patch('/api/admin/users/{id}', [app\controller\Api\AdminConsoleController::class, 'updateUser']);
Route::get('/api/admin/docs', [app\controller\Api\AdminConsoleController::class, 'docs']);
Route::get('/api/admin/docs/{name}', [app\controller\Api\AdminConsoleController::class, 'doc']);
Route::get('/api/admin/scripts', [app\controller\Api\AdminConsoleController::class, 'scripts']);
Route::get('/api/admin/scripts/{name}', [app\controller\Api\AdminConsoleController::class, 'script']);
Route::post('/api/admin/scripts/run', [app\controller\Api\AdminConsoleController::class, 'runScript']);
Route::post('/api/search/fulltext', [app\controller\Api\SearchController::class, 'fulltext']);
Route::post('/api/search/vector', [app\controller\Api\SearchController::class, 'vector']);
Route::post('/api/search/hybrid', [app\controller\Api\SearchController::class, 'hybrid']);
Route::get('/api/archives/{archive_uid}', [app\controller\Api\EvidenceController::class, 'archive']);
Route::get('/api/archives/{archive_uid}/chunks', [app\controller\Api\EvidenceController::class, 'archiveChunks']);
Route::get('/api/archives/{archive_uid}/evidence', [app\controller\Api\EvidenceController::class, 'archiveEvidence']);
Route::get('/api/chunks/{chunk_uid}', [app\controller\Api\EvidenceController::class, 'chunk']);
Route::get('/api/evidence/{chunk_uid}', [app\controller\Api\EvidenceController::class, 'evidence']);

1054
public/admin.css Normal file

File diff suppressed because it is too large.


@ -233,8 +233,11 @@ GET /api/evidence/{chunk_uid}
- [x] `archive_uid` uses ULID and `chunk_uid` follows `{archive_uid}_{chunk_index}_{short_uid}`.
- [x] Runtime import snapshot writing is implemented under `runtime/proofdb/imports/{import_uid}.json`.
- [x] Relational persistence is implemented through `ArchiveRepository::saveImport()`, including `archives` and `chunks` writes.
- [x] Minimal admin entry frontend exists: landing page with Archive Cask redirect and admin login, plus a session-backed admin dashboard shell.
- [x] Admin dashboard now includes archives-table management, OpenSearch status, admin-user management, APIDOC viewing, and a whitelist-based maintenance-script terminal.
- [x] PostgreSQL is the selected relational database, matching current `pgsql`, JSONB, `BIGSERIAL`, and `TIMESTAMPTZ` implementation.
- [x] PostgreSQL setup script exists for creating `archives` and `chunks` tables plus indexes.
- [x] Admin user bootstrap script exists for creating `admin_users` and seeding/updating an admin account.
- [x] Async AI metadata queue exists on Redis with pending, delayed, failed, retry, and error keys.
- [x] `ai_metadata` Workerman process is registered and can consume Redis jobs.
- [x] OpenAI-compatible chat client exists for metadata enrichment.
@ -246,6 +249,7 @@ GET /api/evidence/{chunk_uid}
- [x] OpenSearch client factory is implemented and supports passwordless local OpenSearch when security is disabled.
- [x] OpenSearch `proofdb_chunks` hybrid index mapping exists with BM25 text fields and a 2048-dimension `knn_vector` embedding field.
- [x] OpenSearch search-index task handler is implemented and writes embedded chunks through bulk upsert.
- [x] Archive-level `summary` metadata is written into OpenSearch chunk documents and participates in BM25 search alongside `text`, `title`, and other metadata fields.
- [x] End-to-end embedding-to-OpenSearch smoke test passed for 14 chunks: all are `embedding_status=embedded`, `search_index_status=indexed`, and OpenSearch documents contain 2048-dimension vectors.
- [x] Full-text search service, route, controller, and external API documentation are implemented for `POST /api/search/fulltext`.
- [x] Full-text OpenSearch smoke test passed with `query="policy documents"`, returning 12 total hits from indexed chunks.
@ -254,13 +258,19 @@ GET /api/evidence/{chunk_uid}
- [x] Hybrid search service, route, controller, and external API documentation are implemented for `POST /api/search/hybrid` using Reciprocal Rank Fusion over full-text and vector candidates.
- [x] Hybrid smoke tests passed: English query combines fulltext/vector ranks, and Chinese query falls back to vector recall with the Iraq/Kuwait/Desert Storm chunk as top hit.
- [x] Hybrid search supports `ai=true`: the original query is used for vector search, while the full-text query is rewritten into BM25 keywords through the existing OpenAI-compatible LLM chat path. Keyword generation has a shorter timeout and falls back to the original query on failure.
- [x] Chunk detail API and evidence API are implemented with external documentation: `GET /api/chunks/{chunk_uid}` and `GET /api/evidence/{chunk_uid}`.
- [x] Archive detail API is implemented with external documentation: `GET /api/archives/{archive_uid}`.
- [x] Archive chunk-list and archive evidence-list APIs are implemented with external documentation: `GET /api/archives/{archive_uid}/chunks` and `GET /api/archives/{archive_uid}/evidence`.
- [x] Evidence smoke test passed for `01KQHVREB6XPYF604RVZAP9NNY_1_39003`, returning page label, citation string, and chunk quote.
- [x] Historical `archives.content` can now be repaired with `php scripts/backfill_archive_content.php`, using normalized `raw` when available and ordered chunk text as fallback.
- [x] OpenSearch repair/reindex maintenance script exists: `php scripts/reindex_opensearch.php`, with optional `--archive_uid=...` targeting.
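The hybrid search item above fuses full-text and vector candidates with Reciprocal Rank Fusion. As a rough illustration of the general technique only, here is a minimal sketch in JavaScript. The function name `fuseByRrf`, the constant `k = 60`, and the sample chunk IDs are assumptions made for this example; none of them are taken from the Proof DB search service.

```javascript
// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) per document,
// where rank starts at 1. k = 60 is a commonly used default (assumed here).
function fuseByRrf(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((chunkUid, index) => {
      const rank = index + 1;
      scores.set(chunkUid, (scores.get(chunkUid) || 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([chunkUid, score]) => ({ chunk_uid: chunkUid, score }));
}

// Hypothetical candidate lists, best match first.
const fulltextHits = ['chunk_a', 'chunk_b', 'chunk_c'];
const vectorHits = ['chunk_b', 'chunk_d', 'chunk_a'];
const fused = fuseByRrf([fulltextHits, vectorHits]);
console.log(fused.map((hit) => hit.chunk_uid));
```

A chunk that appears in both lists accumulates two contributions, so it tends to outrank a chunk that appears in only one list.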
### Partially Done
- [ ] Archive/Page/Chunk model is partly persisted: `archives` and `chunks` tables exist, but pages/page blocks are only summarized in import output and snapshots, not stored as first-class relational tables.
- [ ] `embedding_status`, `embedding_ref`, `embedding_model`, `embedding_error`, and `embedding_updated_at` fields exist; embedding generation into PostgreSQL JSONB and OpenSearch vector indexing are implemented, but vector retrieval API is not implemented yet.
- [x] `embedding_status`, `embedding_ref`, `embedding_model`, `embedding_error`, and `embedding_updated_at` fields exist; embedding generation into PostgreSQL JSONB, OpenSearch vector indexing, and vector retrieval API are all implemented.
- [ ] `search_index_status`, `search_index_error`, and `search_index_updated_at` fields exist and are used by the generic task dispatcher/worker.
- [ ] Import response exposes page summaries and chunk IDs, but there is no read API yet to fetch archive, page, or chunk records after import.
- [ ] Import response exposes page summaries and chunk IDs. Archive-level and chunk-level read APIs now exist, but there is still no first-class page record API because pages are not stored as relational rows yet.
- [ ] AI metadata enrichment updates the archive row, but import-time response only reports the queue state; clients need a follow-up API or polling path to observe completed enrichment.
- [ ] Database and Redis credentials are hard-coded in config files; move them to environment variables before production use.
@ -300,16 +310,13 @@ Redis tasks may be duplicated or lost; PostgreSQL status is the recovery source
### Not Done
- [ ] Evidence reconstruction API is not implemented: `GET /api/evidence/{chunk_uid}`.
- [ ] Chunk detail API is not implemented: `GET /api/chunks/{chunk_uid}`.
- [ ] Page-level citation reconstruction is not implemented beyond storing `page_start` and `page_end` on chunks.
- [ ] Reindex/re-embed maintenance commands are not present.
- [ ] Reindex maintenance should detect/recover OpenSearch index loss or stale `search_index_status=indexed` rows when the index has been recreated.
- [ ] Re-embed maintenance command is not present.
- [ ] Request validation is handwritten in the service; no dedicated validator classes or reusable validation layer are present.
- [ ] Automated tests for Markdown parsing, chunking, import persistence, queue behavior, and metadata enrichment are not present.
- [ ] API authentication, rate limiting, and admin controls are not present.
- [ ] Observability for import/search/enrichment jobs is minimal; no structured job metrics or admin status endpoints are present.
- [ ] Default index page/view still uses Webman starter content and is not Proof DB specific.
- [ ] Public API authentication and rate limiting are not present. Minimal admin login/session controls are now present for the maintenance frontend.
- [ ] Observability for import/search/enrichment jobs is still minimal; the admin panel now exposes coarse status endpoints, but there are no historical metrics, tracing, or alerting pipelines yet.
- [x] Default landing page is replaced with a Proof DB-specific admin entry surface instead of the Webman starter content.
### Future Optimizations
@ -321,4 +328,4 @@ Redis tasks may be duplicated or lost; PostgreSQL status is the recovery source
2. Add read APIs for archives/chunks/evidence so imported data can be verified without reading snapshots or the database directly.
3. Add focused tests for DOCMASTER page parsing, noise filtering, comment coalescing, chunk UID stability, and repository persistence.
4. Add async task foundation: task statuses, Redis task payload format, generic DB dispatcher process, and generic Redis worker process. (Done for embedding and OpenSearch indexing)
5. Add chunk detail API and evidence reconstruction API.
5. Improve page-level citation reconstruction beyond chunk page range metadata.

35
scriptdoc/README.md Normal file

@ -0,0 +1,35 @@
# Script Documentation Overview
The documents in `scriptdoc/` are split by script:
- [setup_database.md](/www/proofdb/scriptdoc/setup_database.md): PostgreSQL schema initialization and upgrades
- [setup_admin_users.md](/www/proofdb/scriptdoc/setup_admin_users.md): Initialization of the admin users table and the first admin account
- [setup_opensearch.md](/www/proofdb/scriptdoc/setup_opensearch.md): OpenSearch index initialization
- [reindex_opensearch.md](/www/proofdb/scriptdoc/reindex_opensearch.md): OpenSearch reindexing and re-population
- [backfill_archive_content.md](/www/proofdb/scriptdoc/backfill_archive_content.md): Backfill of the `content` body field for historical archives
## Current Maintenance Scripts
```text
scripts/setup_database.php
scripts/setup_admin_users.php
scripts/setup_opensearch.php
scripts/reindex_opensearch.php
scripts/backfill_archive_content.php
```
## Recommended Order
For a first-time setup or a fairly complete repair, the scripts are usually run in this order:
```bash
php scripts/setup_database.php
php scripts/setup_opensearch.php
php scripts/reindex_opensearch.php
```
If the local OpenSearch uses HTTPS with a self-signed certificate, temporarily prefix the relevant script command with:
```bash
OPENSEARCH_SSL_VERIFY=false
```


@ -0,0 +1,76 @@
# Archive Content Backfill Script
## Script Path
```text
scripts/backfill_archive_content.php
```
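## What the Script Does
Backfills the `archives.content` field for historical data.
This script mainly repairs old data where `content` is empty. It tries to generate `content` in the following order:
1. If the archive has `raw`, normalize the original Markdown into body text using the current import rules.
2. If `raw` is empty, concatenate the `text` of the existing chunks in `chunk_index` order as the fallback body.
The script never fabricates `raw`. If `raw` is missing from historical data, the script only does its best to fill in `content`.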
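The fallback order above can be sketched as follows. This is a minimal illustration in JavaScript, not the script itself: `resolveContent` is a hypothetical name, and `normalizeRaw` is only a stand-in for the import-time Markdown normalization, which is not reproduced here.

```javascript
// Hypothetical helper: choose the body text for one archive.
// `normalizeRaw` stands in for the real import-time normalizer (assumption).
function resolveContent(archive, chunkTexts, normalizeRaw = (raw) => raw.trim()) {
  if (typeof archive.raw === 'string' && archive.raw.trim() !== '') {
    return { source: 'raw', content: normalizeRaw(archive.raw) };
  }
  // Fallback: join non-empty chunk texts, assumed to be ordered by chunk_index.
  const parts = chunkTexts
    .map((text) => String(text).trim())
    .filter((text) => text !== '');
  if (parts.length > 0) {
    return { source: 'chunks', content: parts.join('\n\n') };
  }
  // Neither raw nor chunks: nothing to backfill, so the archive is skipped.
  return { source: 'none', content: null };
}

console.log(resolveContent({ raw: '' }, ['First chunk.', '  ', 'Second chunk.']));
```

The returned `source` value corresponds to the `source=raw` or `source=chunks` marker shown in the script's per-archive output lines.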
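## Prerequisites
- The PostgreSQL configuration in the current environment is usable.
- Project dependencies are installed.
- Commands are run from the project root.
## Commands
By default, only archives with empty `content` are processed: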
```bash
php scripts/backfill_archive_content.php
```
Process a single archive:
```bash
php scripts/backfill_archive_content.php --archive_uid=01KQHVREB6XPYF604RVZAP9NNY
```
Force recomputation even if `content` already has a value:
```bash
php scripts/backfill_archive_content.php --force
```
Preview only, without writing to the database:
```bash
php scripts/backfill_archive_content.php --dry-run
```
## Example Successful Output
```text
[updated] 01KQHVREB6XPYF604RVZAP9NNY source=chunks content_length=6375
Archive content backfill completed.
Archive filter: auto
Force mode: no
Dry run: no
Scanned: 1
Updated: 1
From raw: 0
From chunks: 1
Skipped: 0
```
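## When to Use
- Repairing empty `archives.content` values left behind by older import versions.
- Recomputing the normalized body text after the import logic changes.
- Filling in the body field for later AI / RAG / archive-level reads.
## Important Limitations
- If historical data has neither `raw` nor chunks, the script skips that archive.
- When backfilling from chunks, the result is concatenated body text; the original Markdown structure is not restored.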


@ -0,0 +1,93 @@
# OpenSearch Reindex Script
## Script Path
```text
scripts/reindex_opensearch.php
```
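## What the Script Does
Rebuilds the contents of the `proofdb_chunks` index in OpenSearch from the chunks in PostgreSQL that have already been embedded, and refreshes the derived search fields of each chunk.
The script does the following:
1. Ensures the OpenSearch index exists.
2. Resets the `search_index_status` of embedded chunks to pending.
3. Re-queues indexing tasks in batches, grouped by archive.
4. Calls the existing OpenSearch indexing handler to bulk-write chunk documents.
5. Prints the reindex statistics.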
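Steps 2 to 4 amount to a reset followed by a drain loop. The sketch below illustrates that control flow against an in-memory stand-in. `createMemoryRepository`, `resetToPending`, `takePendingArchives`, `indexArchive`, and `reindexAll` are hypothetical names for this example and do not correspond to the PHP classes the script uses.

```javascript
// In-memory stand-in for the chunks table (assumption for illustration).
function createMemoryRepository(chunks) {
  return {
    // Mark every embedded chunk as pending and report how many were reset.
    resetToPending() {
      let count = 0;
      for (const chunk of chunks) {
        if (chunk.embedded) {
          chunk.status = 'pending';
          count++;
        }
      }
      return count;
    },
    // Return up to `limit` distinct archive UIDs that still have pending chunks.
    takePendingArchives(limit) {
      const pending = chunks.filter((chunk) => chunk.status === 'pending');
      return [...new Set(pending.map((chunk) => chunk.archive_uid))].slice(0, limit);
    },
    // Pretend to bulk-write the archive's chunks, then mark them indexed.
    indexArchive(uid) {
      for (const chunk of chunks) {
        if (chunk.archive_uid === uid && chunk.status === 'pending') {
          chunk.status = 'indexed';
        }
      }
    },
  };
}

function reindexAll(repository, batchSize = 100) {
  const resetCount = repository.resetToPending();
  const indexedArchives = [];
  // Drain pending archives batch by batch until none remain.
  for (;;) {
    const batch = repository.takePendingArchives(batchSize);
    if (batch.length === 0) break;
    for (const uid of batch) {
      repository.indexArchive(uid);
      indexedArchives.push(uid);
    }
  }
  return { resetCount, indexedArchives };
}

const repository = createMemoryRepository([
  { archive_uid: 'A', embedded: true, status: 'indexed' },
  { archive_uid: 'A', embedded: true, status: 'indexed' },
  { archive_uid: 'B', embedded: false, status: 'none' },
]);
console.log(reindexAll(repository));
```

In this sketch the loop ends only because `indexArchive` always moves chunks out of the pending state; a real handler that can fail would need its own way to stop retrying.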
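## Prerequisites
- PostgreSQL is reachable.
- OpenSearch is reachable.
- The `embedding_status` of the target chunks is already `embedded`.
- Project dependencies are installed.
- Commands are run from the project root.
If the local OpenSearch uses HTTPS with a self-signed certificate: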
```bash
OPENSEARCH_SSL_VERIFY=false php scripts/reindex_opensearch.php
```
## Commands
Full reindex:
```bash
php scripts/reindex_opensearch.php
```
Reindex a single archive:
```bash
php scripts/reindex_opensearch.php --archive_uid=01KQHVREB6XPYF604RVZAP9NNY
```
## Example Successful Output
```text
OpenSearch reindex completed.
Index: proofdb_chunks
Archive filter: (all embedded archives)
Reset chunks: 14
Indexed archives: 1
Indexed chunk rows now marked indexed: 14
Archives: 01KQHVREB6XPYF604RVZAP9NNY
```
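## When to Use
- Recovering after `proofdb_chunks` was deleted by mistake.
- The database shows `search_index_status=3` (indexed), but OpenSearch has no corresponding documents.
- After the index mapping has been rebuilt, data that has finished embedding needs to be loaded back into OpenSearch.
- Search metadata of an archive, such as `summary`, `title`, or `tags`, has been updated and needs to be refreshed in OpenSearch.
## Important Limitations
This script only processes chunks that have finished embedding.
It does not:
- Regenerate embeddings.
- Repair data whose embedding failed.
- Repair archives or chunks that are missing from PostgreSQL.
## Recommended Usage
If the entire OpenSearch index has been lost, the usual order is: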
```bash
php scripts/setup_opensearch.php
php scripts/reindex_opensearch.php
```
If the database schema has also changed, update the database first:
```bash
php scripts/setup_database.php
php scripts/setup_opensearch.php
php scripts/reindex_opensearch.php
```


@ -0,0 +1,56 @@
# Admin User Setup Script
## Script Path
```text
scripts/setup_admin_users.php
```
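## What the Script Does
Initializes the `admin_users` table used for admin login and writes one admin account.
The current version ensures that:
- The `admin_users` table exists.
- The unique index on `username` exists.
- The trigger that automatically updates `updated_time` exists.
- The specified username is created; if it already exists, its display name and password hash are updated and the account is marked active.
## Prerequisites
- The PostgreSQL configuration in the current environment is usable.
- Project dependencies are installed.
- Commands are run from the project root.
## Commands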
```bash
php scripts/setup_admin_users.php --username=admin --password='your-password' --display_name='Proof DB Admin'
```
Where:
- `--username` is required
- `--password` is required
- `--display_name` is optional
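The `--name=value` convention above can be illustrated with a small parser. This is a sketch in JavaScript under the assumption that only the three listed options matter; `parseOptions` and `resolveAdminOptions` are hypothetical names, and the script's real parsing is done in PHP.

```javascript
// Parse `--name=value` arguments into an object; other arguments are ignored.
function parseOptions(argv, allowed = ['username', 'password', 'display_name']) {
  const options = {};
  for (const argument of argv) {
    for (const name of allowed) {
      const prefix = `--${name}=`;
      if (argument.startsWith(prefix)) {
        // Keep everything after the first `=`, so values may contain `=`.
        options[name] = argument.slice(prefix.length);
      }
    }
  }
  return options;
}

// Validate the required options and fall back to the username as display name.
function resolveAdminOptions(argv) {
  const options = parseOptions(argv);
  const username = (options.username || '').trim();
  const password = options.password || '';
  if (username === '' || password === '') {
    throw new Error('--username and --password are required');
  }
  const displayName = (options.display_name || '').trim() || username;
  return { username, password, displayName };
}

const example = resolveAdminOptions(['--username=admin', '--password=example-password']);
console.log(example.username, example.displayName);
```

When `--display_name` is omitted, the display name falls back to the username.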
## Example Successful Output
```text
Admin users table initialized.
Seeded username: admin
Display name: Proof DB Admin
```
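## When to Use
- Enabling admin login for the first time.
- Creating the first admin user.
- Resetting the password of an existing admin.
## Important Notes
- The script does not print the plaintext password.
- Running it again with the same username updates the password hash.
- Run it in a secure environment, and do not write plaintext passwords into repository files.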


@ -0,0 +1,51 @@
# Database Setup Script
## Script Path
```text
scripts/setup_database.php
```
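## What the Script Does
Initializes or upgrades the PostgreSQL schema used by Proof DB.
The current version ensures that:
- The `archives` table exists.
- The `chunks` table exists.
- The commonly used indexes for archives and chunks exist.
- The status fields for embedding and search indexing exist.
- The trigger that automatically updates `updated_time` exists.
## Prerequisites
- The PostgreSQL configuration in the current environment is usable.
- Project dependencies are installed.
- Commands are run from the project root.
## Commands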
```bash
php scripts/setup_database.php
```
## Example Successful Output
```text
Database connection ok: postgre
Tables initialized: archives, chunks
```
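## When to Use
- Deploying an environment for the first time.
- Syncing the schema after pulling code that changes the database structure.
- Bringing an existing database up to date after new status fields, indexes, or triggers are added.
## Common Failure Signals
- `PDOException`
Indicates a problem with the database address, credentials, network, or DNS.
- SQL execution error
Indicates insufficient privileges, or that the existing schema does not match what the code expects.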


@ -0,0 +1,59 @@
# OpenSearch Index Setup Script
## Script Path
```text
scripts/setup_opensearch.php
```
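## What the Script Does
Creates or verifies the OpenSearch chunk index used by Proof DB, and syncs any mapping fields that were added later and are still missing.
The current version ensures that:
- The `proofdb_chunks` index exists.
- The BM25 full-text field mappings are in place.
- Missing field mappings on an existing index are added, for example the later-added `summary`.
- The `embedding` field is a `knn_vector`.
- The vector dimension matches the current configuration.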
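For orientation, an index body that satisfies the checks above might look roughly like the following. Only the `embedding` field of type `knn_vector`, the 2048 dimension, and the `text`, `title`, and `summary` text fields come from this repository's notes. The `index.knn` setting is the standard OpenSearch switch for k-NN indexes; the keyword fields and the helper `missingFields` are assumptions made for the example.

```javascript
// Hypothetical index body; the real mapping is defined in the PHP index class.
const VECTOR_DIMENSIONS = 2048;

const chunkIndexBody = {
  settings: { index: { knn: true } },
  mappings: {
    properties: {
      chunk_uid: { type: 'keyword' },   // assumed identifier field
      archive_uid: { type: 'keyword' }, // assumed identifier field
      title: { type: 'text' },
      summary: { type: 'text' },
      text: { type: 'text' },
      embedding: { type: 'knn_vector', dimension: VECTOR_DIMENSIONS },
    },
  },
};

// Report which expected fields are missing from an existing mapping, which is
// the information needed to add mappings incrementally.
function missingFields(existingProperties, expectedProperties) {
  return Object.keys(expectedProperties).filter((name) => !(name in existingProperties));
}

console.log(missingFields({ text: { type: 'text' } }, chunkIndexBody.mappings.properties));
```

Adding a new field to an existing index is usually an additive mapping update, whereas changing the dimension of an existing `knn_vector` field generally means recreating the index and reindexing.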
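## Prerequisites
- The OpenSearch service is running.
- The OpenSearch configuration in the current environment is usable.
- Project dependencies are installed.
- Commands are run from the project root.
If the local OpenSearch uses HTTPS with a self-signed certificate: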
```bash
OPENSEARCH_SSL_VERIFY=false php scripts/setup_opensearch.php
```
## Commands
```bash
php scripts/setup_opensearch.php
```
## Example Successful Output
```text
OpenSearch chunk index initialized: proofdb_chunks
Vector dimensions: 2048
```
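## When to Use
- Setting up OpenSearch for the first time.
- Recreating `proofdb_chunks` after it was deleted.
- New document fields were added, for example `summary`.
- Preparing the index again after the index mapping or vector dimension was changed.
## Common Failure Signals
- `NoNodesAvailableException`
Indicates that the host, protocol, port, SSL verification, or service state is wrong.
- Authentication failure
Indicates that `OPENSEARCH_USERNAME` / `OPENSEARCH_PASSWORD` is incorrect.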


@ -0,0 +1,117 @@
#!/usr/bin/env php
<?php
use app\service\ArticleImportService;
use support\Db;
require __DIR__ . '/../vendor/autoload.php';
require __DIR__ . '/../support/bootstrap.php';
require __DIR__ . '/../vendor/webman/database/src/support/Db.php';
$archiveUid = null;
$force = false;
$dryRun = false;
foreach (array_slice($argv, 1) as $argument) {
if (str_starts_with($argument, '--archive_uid=')) {
$archiveUid = substr($argument, strlen('--archive_uid='));
continue;
}
if ($argument === '--force') {
$force = true;
continue;
}
if ($argument === '--dry-run') {
$dryRun = true;
}
}
$query = Db::table('archives')->orderBy('id');
if ($archiveUid !== null && trim($archiveUid) !== '') {
$query->where('archive_uid', trim($archiveUid));
}
if (!$force) {
$query->where(function ($builder) {
$builder->whereNull('content')->orWhere('content', '');
});
}
$archives = $query->get(['archive_uid', 'title', 'content', 'raw'])->all();
$normalizer = new ArticleImportService();
$scanned = 0;
$updated = 0;
$fromRaw = 0;
$fromChunks = 0;
$skipped = 0;
foreach ($archives as $archive) {
$scanned++;
$archiveUidValue = (string) $archive->archive_uid;
$raw = is_string($archive->raw ?? null) ? $archive->raw : null;
$content = null;
$source = 'none';
if (is_string($raw) && trim($raw) !== '') {
$content = $normalizer->normalizeArchiveContentString($raw);
$source = 'raw';
} else {
$chunks = Db::table('chunks')
->where('archive_uid', $archiveUidValue)
->orderBy('chunk_index')
->pluck('text')
->all();
$chunks = array_values(array_filter(array_map(
static fn ($value): string => trim((string) $value),
$chunks
), static fn (string $value): bool => $value !== ''));
if ($chunks !== []) {
$content = trim(implode("\n\n", $chunks));
$source = 'chunks';
}
}
if ($content === null || $content === '') {
$skipped++;
echo "[skip] {$archiveUidValue} no usable raw/chunks" . PHP_EOL;
continue;
}
if ($dryRun) {
echo "[dry-run] {$archiveUidValue} source={$source} content_length=" . mb_strlen($content) . PHP_EOL;
if ($source === 'raw') {
$fromRaw++;
} else {
$fromChunks++;
}
continue;
}
Db::table('archives')
->where('archive_uid', $archiveUidValue)
->update(['content' => $content]);
$updated++;
if ($source === 'raw') {
$fromRaw++;
} else {
$fromChunks++;
}
echo "[updated] {$archiveUidValue} source={$source} content_length=" . mb_strlen($content) . PHP_EOL;
}
echo 'Archive content backfill completed.' . PHP_EOL;
echo 'Archive filter: ' . ($archiveUid ?: 'auto') . PHP_EOL;
echo 'Force mode: ' . ($force ? 'yes' : 'no') . PHP_EOL;
echo 'Dry run: ' . ($dryRun ? 'yes' : 'no') . PHP_EOL;
echo 'Scanned: ' . $scanned . PHP_EOL;
echo 'Updated: ' . $updated . PHP_EOL;
echo 'From raw: ' . $fromRaw . PHP_EOL;
echo 'From chunks: ' . $fromChunks . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;


@ -0,0 +1,68 @@
#!/usr/bin/env php
<?php
use app\service\Search\ChunkSearchIndexHandler;
use app\service\Search\ChunkSearchIndexRepository;
use app\service\Search\OpenSearchChunkIndex;
use support\Db;
require __DIR__ . '/../vendor/autoload.php';
require __DIR__ . '/../support/bootstrap.php';
require __DIR__ . '/../vendor/webman/database/src/support/Db.php';
$archiveUid = null;
foreach (array_slice($argv, 1) as $argument) {
if (str_starts_with($argument, '--archive_uid=')) {
$archiveUid = substr($argument, strlen('--archive_uid='));
}
}
$repository = new ChunkSearchIndexRepository();
$handler = new ChunkSearchIndexHandler();
$index = new OpenSearchChunkIndex();
try {
$index->ensureExists();
$resetCount = $repository->resetEmbeddedChunksToPending($archiveUid);
$archiveCount = 0;
$indexedArchives = [];
$indexedChunks = 0;
while (true) {
$archiveUids = $repository->queuePendingArchiveTasks(100);
if ($archiveUids === []) {
break;
}
foreach ($archiveUids as $uid) {
$handler->handle([
'task_type' => 'search_index',
'target_type' => 'archive',
'target_uid' => $uid,
'attempt' => 1,
]);
$archiveCount++;
$indexedArchives[] = $uid;
}
}
$indexedChunksQuery = Db::table('chunks')->where('search_index_status', 3);
if ($archiveUid !== null && trim($archiveUid) !== '') {
$indexedChunksQuery->where('archive_uid', trim($archiveUid));
}
$indexedChunks = (int) $indexedChunksQuery->count();
echo 'OpenSearch reindex completed.' . PHP_EOL;
echo 'Index: ' . config('opensearch.indices.chunks', 'proofdb_chunks') . PHP_EOL;
echo 'Archive filter: ' . ($archiveUid ?: '(all embedded archives)') . PHP_EOL;
echo 'Reset chunks: ' . $resetCount . PHP_EOL;
echo 'Indexed archives: ' . $archiveCount . PHP_EOL;
echo 'Indexed chunk rows now marked indexed: ' . $indexedChunks . PHP_EOL;
if ($indexedArchives !== []) {
echo 'Archives: ' . implode(', ', $indexedArchives) . PHP_EOL;
}
} catch (Throwable $exception) {
fwrite(STDERR, $exception::class . ': ' . $exception->getMessage() . PHP_EOL);
exit(1);
}


@ -0,0 +1,92 @@
#!/usr/bin/env php
<?php
use support\Db;
require __DIR__ . '/../vendor/autoload.php';
require __DIR__ . '/../support/bootstrap.php';
require __DIR__ . '/../vendor/webman/database/src/support/Db.php';
$username = null;
$password = null;
$displayName = null;
foreach (array_slice($argv, 1) as $argument) {
if (str_starts_with($argument, '--username=')) {
$username = substr($argument, strlen('--username='));
continue;
}
if (str_starts_with($argument, '--password=')) {
$password = substr($argument, strlen('--password='));
continue;
}
if (str_starts_with($argument, '--display_name=')) {
$displayName = substr($argument, strlen('--display_name='));
}
}
if (!is_string($username) || trim($username) === '' || !is_string($password) || $password === '') {
fwrite(STDERR, "Usage: php scripts/setup_admin_users.php --username=<username> --password=<password> [--display_name=<name>]" . PHP_EOL);
exit(1);
}
$username = trim($username);
$displayName = is_string($displayName) && trim($displayName) !== '' ? trim($displayName) : $username;
$passwordHash = password_hash($password, PASSWORD_DEFAULT);
$statements = [
<<<SQL
CREATE TABLE IF NOT EXISTS admin_users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(120) NOT NULL UNIQUE,
display_name TEXT,
password_hash TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
last_login_at TIMESTAMPTZ,
created_time TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_time TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
)
SQL,
'CREATE INDEX IF NOT EXISTS admin_users_is_active_index ON admin_users (is_active)',
<<<'SQL'
CREATE OR REPLACE FUNCTION set_updated_time()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_time = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql
SQL,
'DROP TRIGGER IF EXISTS admin_users_set_updated_time ON admin_users',
<<<SQL
CREATE TRIGGER admin_users_set_updated_time
BEFORE UPDATE ON admin_users
FOR EACH ROW
EXECUTE FUNCTION set_updated_time()
SQL,
];
try {
Db::connection()->getPdo();
foreach ($statements as $statement) {
Db::statement($statement);
}
Db::table('admin_users')->updateOrInsert(
['username' => $username],
[
'display_name' => $displayName,
'password_hash' => $passwordHash,
'is_active' => true,
]
);
echo 'Admin users table initialized.' . PHP_EOL;
echo 'Seeded username: ' . $username . PHP_EOL;
echo 'Display name: ' . $displayName . PHP_EOL;
} catch (Throwable $exception) {
fwrite(STDERR, $exception::class . ': ' . $exception->getMessage() . PHP_EOL);
exit(1);
}