- Updated README.md to include detailed descriptions of new features, including Zep mixed search functionality and detailed persona generation for individual and group entities. - Implemented a robust mechanism for checking simulation preparation status to avoid redundant profile generation. - Added support for parallel profile generation, improving efficiency in creating OASIS Agent Profiles. - Enhanced the simulation configuration generator to adopt a stepwise approach, ensuring better handling of complex configurations. - Introduced error handling and retry mechanisms for LLM calls, improving the reliability of profile generation. - Updated simulation management to support new API parameters for controlling profile generation behavior.
1838 lines
52 KiB
Markdown
1838 lines
52 KiB
Markdown
# MiroFish Backend
|
||
|
||
社会舆论模拟系统后端服务,基于Flask框架。
|
||
|
||
## 项目结构
|
||
|
||
```
|
||
backend/
|
||
├── app/
|
||
│ ├── __init__.py # Flask应用工厂
|
||
│ ├── config.py # 配置管理
|
||
│ ├── api/ # API路由
|
||
│ │ ├── __init__.py # Blueprint注册
|
||
│ │ ├── graph.py # Step1: 图谱相关接口
|
||
│ │ └── simulation.py # Step2: 模拟相关接口
|
||
│ ├── services/ # 业务逻辑层
|
||
│ │ ├── __init__.py # 服务模块导出
|
||
│ │ ├── ontology_generator.py # 本体生成服务
|
||
│ │ ├── graph_builder.py # 图谱构建服务
|
||
│ │ ├── text_processor.py # 文本处理服务
|
||
│ │ ├── zep_entity_reader.py # Zep实体读取与过滤
|
||
│ │ ├── oasis_profile_generator.py # Agent Profile生成器
|
||
│ │ ├── simulation_config_generator.py # LLM智能配置生成器(核心)
|
||
│ │ └── simulation_manager.py # 模拟管理器
|
||
│ ├── models/ # 数据模型
|
||
│ │ ├── task.py # 任务状态管理
|
||
│ │ └── project.py # 项目上下文管理
|
||
│ └── utils/ # 工具模块
|
||
│ ├── file_parser.py # 文件解析
|
||
│ ├── llm_client.py # LLM客户端
|
||
│ └── logger.py # 日志工具
|
||
├── scripts/ # 预设模拟脚本
|
||
│ ├── run_twitter_simulation.py # Twitter模拟脚本
|
||
│ ├── run_reddit_simulation.py # Reddit模拟脚本
|
||
│ └── run_parallel_simulation.py # 双平台并行脚本
|
||
├── uploads/ # 上传文件存储
|
||
│ ├── projects/ # 项目文件
|
||
│ └── simulations/ # 模拟数据(含配置和脚本副本)
|
||
├── requirements.txt
|
||
└── run.py # 启动入口
|
||
```
|
||
|
||
## 安装
|
||
|
||
```bash
|
||
conda activate MiroFish
|
||
cd backend
|
||
pip install -r requirements.txt
|
||
```
|
||
|
||
## 配置
|
||
|
||
在项目根目录 `MiroFish/.env` 中配置:
|
||
|
||
```bash
|
||
# LLM配置(统一使用OpenAI格式)
|
||
LLM_API_KEY=your-llm-api-key
|
||
LLM_BASE_URL=https://openrouter.ai/api/v1
|
||
LLM_MODEL_NAME=gpt-4o-mini
|
||
|
||
# Zep配置
|
||
ZEP_API_KEY=your-zep-api-key
|
||
|
||
# OASIS模拟配置(可选)
|
||
OASIS_DEFAULT_MAX_ROUNDS=10
|
||
```
|
||
|
||
## 启动服务
|
||
|
||
```bash
|
||
python run.py
|
||
```
|
||
|
||
服务默认运行在 http://localhost:5001
|
||
|
||
---
|
||
|
||
# 系统架构
|
||
|
||
## 完整工作流程
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ Step 1: 图谱构建 │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ 上传文档 ──→ 生成本体定义 ──→ 构建Zep图谱 ──→ 图谱数据 │
|
||
│ (PDF/MD/TXT) (LLM分析) (异步任务) (节点/边) │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ Step 2: 实体读取与模拟准备 │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ 读取图谱节点 ──→ 过滤符合条件实体 ──→ 生成Agent Profile ──→ 生成脚本 │
|
||
│ (Zep API) (按Labels筛选) (LLM生成人设) (OASIS启动) │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ Step 3: 双平台并行模拟 │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌─────────────────┐ ┌─────────────────┐ │
|
||
│ │ Twitter模拟 │ │ Reddit模拟 │ │
|
||
│ │ (短平快交互) │ 并行运行 │ (深度话题讨论) │ │
|
||
│ └─────────────────┘ └─────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ 同一批智能体,模拟真实社交环境 │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
# Step 1: 图谱构建 API
|
||
|
||
## 核心工作流程
|
||
|
||
```
|
||
1. 上传文件 + 生成本体
|
||
POST /api/graph/ontology/generate
|
||
→ 返回 project_id
|
||
|
||
2. 构建图谱
|
||
POST /api/graph/build
|
||
→ 返回 task_id
|
||
|
||
3. 查询任务进度
|
||
GET /api/graph/task/{task_id}
|
||
|
||
4. 获取图谱数据
|
||
GET /api/graph/data/{graph_id}
|
||
```
|
||
|
||
---
|
||
|
||
### 接口1:生成本体定义
|
||
|
||
**POST** `/api/graph/ontology/generate`
|
||
|
||
上传文档,分析生成适合社会模拟的实体和关系类型定义。
|
||
|
||
**请求(form-data):**
|
||
|
||
| 字段 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `files` | File | 是 | PDF/MD/TXT文件,可多个 |
|
||
| `simulation_requirement` | Text | 是 | 模拟需求描述 |
|
||
| `project_name` | Text | 否 | 项目名称 |
|
||
| `additional_context` | Text | 否 | 额外说明 |
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"project_id": "proj_abc123def456",
|
||
"project_name": "武汉大学舆情分析",
|
||
"ontology": {
|
||
"entity_types": [
|
||
{
|
||
"name": "Student",
|
||
"description": "Students enrolled in educational institutions",
|
||
"attributes": [
|
||
{"name": "student_id", "type": "text", "description": "Unique identifier"},
|
||
{"name": "major", "type": "text", "description": "Field of study"}
|
||
]
|
||
}
|
||
],
|
||
"edge_types": [
|
||
{
|
||
"name": "AFFILIATED_WITH",
|
||
"description": "Indicates affiliation between entities",
|
||
"source_targets": [
|
||
{"source": "Student", "target": "University"}
|
||
]
|
||
}
|
||
]
|
||
},
|
||
"analysis_summary": "分析说明...",
|
||
"files": [{"filename": "报告.pdf", "size": 123456}],
|
||
"total_text_length": 20833
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 接口2:构建图谱
|
||
|
||
**POST** `/api/graph/build`
|
||
|
||
根据 `project_id` 构建Zep知识图谱(异步任务)。
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"project_id": "proj_abc123def456",
|
||
"graph_name": "图谱名称",
|
||
"chunk_size": 500,
|
||
"chunk_overlap": 50
|
||
}
|
||
```
|
||
|
||
| 字段 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `project_id` | string | 是 | 来自接口1的返回 |
|
||
| `graph_name` | string | 否 | 图谱名称 |
|
||
| `chunk_size` | int | 否 | 文本块大小,默认500 |
|
||
| `chunk_overlap` | int | 否 | 块重叠字符,默认50 |
|
||
|
||
**响应:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"project_id": "proj_abc123def456",
|
||
"task_id": "task_xyz789",
|
||
"message": "图谱构建任务已启动"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 任务状态查询
|
||
|
||
**GET** `/api/graph/task/{task_id}`
|
||
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"task_id": "task_xyz789",
|
||
"status": "processing",
|
||
"progress": 45,
|
||
"message": "Zep处理中... 15/30 完成",
|
||
"result": null
|
||
}
|
||
}
|
||
```
|
||
|
||
**状态值:**
|
||
- `pending` - 等待中
|
||
- `processing` - 处理中
|
||
- `completed` - 已完成
|
||
- `failed` - 失败
|
||
|
||
---
|
||
|
||
### 项目管理接口
|
||
|
||
| 方法 | 路径 | 说明 |
|
||
|------|------|------|
|
||
| GET | `/api/graph/project/{project_id}` | 获取项目详情 |
|
||
| GET | `/api/graph/project/list` | 列出所有项目 |
|
||
| DELETE | `/api/graph/project/{project_id}` | 删除项目 |
|
||
|
||
---
|
||
|
||
### 图谱数据接口
|
||
|
||
| 方法 | 路径 | 说明 |
|
||
|------|------|------|
|
||
| GET | `/api/graph/data/{graph_id}` | 获取图谱节点和边 |
|
||
| DELETE | `/api/graph/delete/{graph_id}` | 删除Zep图谱 |
|
||
|
||
---
|
||
|
||
# Step 2: 实体读取与模拟运行 API
|
||
|
||
## 核心设计理念
|
||
|
||
**全程自动化,无需人工设置参数:**
|
||
- 脚本是**预设的**,不是动态生成
|
||
- 所有模拟参数由**LLM智能生成**
|
||
- LLM读取模拟需求+文档+图谱信息,自动设置最佳参数
|
||
- **通过API接口启动和监控模拟**,前端可实时展示
|
||
|
||
## 核心工作流程
|
||
|
||
```
|
||
1. 创建模拟
|
||
POST /api/simulation/create
|
||
→ 返回 simulation_id
|
||
|
||
2. 准备模拟环境(异步任务)
|
||
POST /api/simulation/prepare
|
||
Body: { "simulation_id": "sim_xxxx" }
|
||
→ 返回 task_id(立即响应)
|
||
|
||
查询进度:
|
||
POST /api/simulation/prepare/status
|
||
Body: { "task_id": "task_xxxx" }
|
||
→ 返回 status, progress, result
|
||
|
||
3. 开始模拟
|
||
POST /api/simulation/start
|
||
Body: { "simulation_id": "sim_xxxx", "platform": "parallel" }
|
||
→ 在后台启动OASIS模拟进程
|
||
→ 返回运行状态
|
||
|
||
4. 实时监控(前端轮询)
|
||
GET /api/simulation/{simulation_id}/run-status/detail
|
||
→ 返回当前进度、最近Agent动作
|
||
|
||
5. 停止模拟(可选)
|
||
POST /api/simulation/stop
|
||
Body: { "simulation_id": "sim_xxxx" }
|
||
```
|
||
|
||
---
|
||
|
||
## 实体读取接口
|
||
|
||
### 获取图谱实体(已过滤)
|
||
|
||
**GET** `/api/simulation/entities/{graph_id}`
|
||
|
||
获取图谱中符合预定义实体类型的节点。
|
||
|
||
**实体过滤逻辑:**
|
||
- Zep对符合预定义类型的实体,Labels为 `["Entity", "Student"]`
|
||
- 对不符合预定义类型的实体,Labels仅为 `["Entity"]`
|
||
- **筛选规则**:只保留Labels中包含除"Entity"和"Node"之外标签的节点
|
||
|
||
**Query参数:**
|
||
|
||
| 参数 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `entity_types` | string | 否 | 逗号分隔的实体类型,用于进一步过滤 |
|
||
| `enrich` | boolean | 否 | 是否获取相关边信息,默认true |
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"entities": [
|
||
{
|
||
"uuid": "node_uuid_123",
|
||
"name": "杨景媛",
|
||
"labels": ["Entity", "Student"],
|
||
"summary": "武汉大学学生,图书馆事件当事人",
|
||
"attributes": {
|
||
"student_id": "2021001",
|
||
"major": "计算机科学"
|
||
},
|
||
"related_edges": [
|
||
{
|
||
"direction": "outgoing",
|
||
"edge_name": "AFFILIATED_WITH",
|
||
"fact": "杨景媛是武汉大学的学生",
|
||
"target_node_uuid": "node_uuid_456"
|
||
}
|
||
],
|
||
"related_nodes": [
|
||
{
|
||
"uuid": "node_uuid_456",
|
||
"name": "武汉大学",
|
||
"labels": ["Entity", "University"],
|
||
"summary": "中国著名高等学府"
|
||
}
|
||
]
|
||
}
|
||
],
|
||
"entity_types": ["Student", "University", "PublicFigure"],
|
||
"total_count": 100,
|
||
"filtered_count": 45
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 获取单个实体详情
|
||
|
||
**GET** `/api/simulation/entities/{graph_id}/{entity_uuid}`
|
||
|
||
获取单个实体的完整信息,包含所有相关边和关联节点。
|
||
|
||
---
|
||
|
||
### 按类型获取实体
|
||
|
||
**GET** `/api/simulation/entities/{graph_id}/by-type/{entity_type}`
|
||
|
||
获取指定类型(如Student、PublicFigure)的所有实体。
|
||
|
||
---
|
||
|
||
## 模拟管理接口
|
||
|
||
### 创建模拟
|
||
|
||
**POST** `/api/simulation/create`
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"project_id": "proj_abc123def456",
|
||
"graph_id": "mirofish_xxxx",
|
||
"enable_twitter": true,
|
||
"enable_reddit": true,
|
||
"max_rounds": 10,
|
||
"agents_per_round": -1
|
||
}
|
||
```
|
||
|
||
| 字段 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `project_id` | string | 是 | 项目ID |
|
||
| `graph_id` | string | 否 | 图谱ID,不提供则从project获取 |
|
||
| `enable_twitter` | boolean | 否 | 启用Twitter模拟,默认true |
|
||
| `enable_reddit` | boolean | 否 | 启用Reddit模拟,默认true |
|
||
| `max_rounds` | int | 否 | 最大模拟轮数,默认10 |
|
||
| `agents_per_round` | int | 否 | 每轮激活智能体数,-1表示全部 |
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_abc123def456",
|
||
"config": {
|
||
"project_id": "proj_xxxx",
|
||
"graph_id": "mirofish_xxxx",
|
||
"enable_twitter": true,
|
||
"enable_reddit": true,
|
||
"max_rounds": 10
|
||
},
|
||
"status": "created",
|
||
"created_at": "2025-12-01T10:00:00"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 准备模拟环境(异步任务)
|
||
|
||
**POST** `/api/simulation/prepare`
|
||
|
||
**异步接口**:这是一个耗时操作,接口会立即返回`task_id`,通过`/prepare/status`查询进度。
|
||
|
||
执行模拟准备流程(LLM智能生成所有参数,带自动重试机制):
|
||
1. 从Zep图谱读取并过滤实体
|
||
2. 为每个实体生成OASIS Agent Profile(带重试)
|
||
3. LLM智能生成模拟配置(带重试)
|
||
4. 保存配置文件和复制预设脚本
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"simulation_id": "sim_xxxx",
|
||
"entity_types": ["Student", "PublicFigure"],
|
||
"use_llm_for_profiles": true
|
||
}
|
||
```
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxxx",
|
||
"task_id": "task_xxxx",
|
||
"status": "preparing",
|
||
"message": "准备任务已启动,请通过 /api/simulation/prepare/status 查询进度"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 查询准备进度
|
||
|
||
**POST** `/api/simulation/prepare/status`
|
||
|
||
查询准备任务的执行进度。
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"task_id": "task_xxxx"
|
||
}
|
||
```
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"task_id": "task_xxxx",
|
||
"task_type": "simulation_prepare",
|
||
"status": "processing",
|
||
"progress": 45,
|
||
"message": "[2/4] 生成Agent人设: 35/93 - 生成 教授张三 的人设...",
|
||
"progress_detail": {
|
||
"current_stage": "generating_profiles",
|
||
"current_stage_name": "生成Agent人设",
|
||
"stage_index": 2,
|
||
"total_stages": 4,
|
||
"stage_progress": 38,
|
||
"current_item": 35,
|
||
"total_items": 93,
|
||
"item_description": "生成 教授张三 的人设..."
|
||
},
|
||
"result": null,
|
||
"error": null,
|
||
"metadata": {
|
||
"project_id": "proj_xxxx",
|
||
"simulation_id": "sim_xxxx"
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**进度详情字段(progress_detail):**
|
||
|
||
| 字段 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `current_stage` | string | 当前阶段标识 (reading/generating_profiles/generating_config/copying_scripts) |
|
||
| `current_stage_name` | string | 当前阶段中文名称 |
|
||
| `stage_index` | int | 当前阶段序号 (1-4) |
|
||
| `total_stages` | int | 总阶段数 (4) |
|
||
| `stage_progress` | int | 当前阶段内进度 (0-100) |
|
||
| `current_item` | int | 当前处理的项目序号 |
|
||
| `total_items` | int | 当前阶段总项目数 |
|
||
| `item_description` | string | 当前项目描述 |
|
||
|
||
**阶段说明:**
|
||
|
||
| 阶段 | 名称 | 权重 | 说明 |
|
||
|------|------|------|------|
|
||
| 1 | 读取图谱实体 | 0-20% | 从Zep读取并过滤实体 |
|
||
| 2 | 生成Agent人设 | 20-70% | 为每个实体生成OASIS Profile |
|
||
| 3 | 生成模拟配置 | 70-90% | LLM智能生成模拟参数 |
|
||
| 4 | 准备模拟脚本 | 90-100% | 复制预设脚本到模拟目录 |
|
||
|
||
**状态值(status):**
|
||
- `pending` - 等待中
|
||
- `processing` - 处理中
|
||
- `completed` - 已完成(此时result包含结果)
|
||
- `failed` - 失败(此时error包含错误信息)
|
||
|
||
**完成后的响应:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"task_id": "task_xxxx",
|
||
"status": "completed",
|
||
"progress": 100,
|
||
"message": "任务完成",
|
||
"result": {
|
||
"simulation_id": "sim_xxxx",
|
||
"project_id": "proj_xxxx",
|
||
"graph_id": "mirofish_xxxx",
|
||
"status": "ready",
|
||
"entities_count": 93,
|
||
"profiles_count": 93,
|
||
"entity_types": ["University", "Student", ...],
|
||
"config_generated": true,
|
||
"error": null
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
| 字段 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `entity_types` | array | 否 | 指定实体类型进行过滤 |
|
||
| `use_llm_for_profiles` | boolean | 否 | 是否使用LLM生成人设,默认true |
|
||
|
||
**注意**:`simulation_requirement`和`document_text`自动从项目中获取
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_abc123def456",
|
||
"status": "ready",
|
||
"entities_count": 45,
|
||
"profiles_count": 45,
|
||
"entity_types": ["Student", "PublicFigure", "University"],
|
||
"config_generated": true,
|
||
"config_reasoning": "根据武汉大学图书馆事件的特点,设置72小时模拟时长...",
|
||
"run_instructions": {
|
||
"simulation_dir": "/path/to/sim_xxx",
|
||
"commands": {...},
|
||
"instructions": "..."
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 获取模拟状态
|
||
|
||
**GET** `/api/simulation/{simulation_id}`
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_abc123def456",
|
||
"status": "ready",
|
||
"entities_count": 45,
|
||
"profiles_count": 45,
|
||
"entity_types": ["Student", "PublicFigure"],
|
||
"current_round": 0,
|
||
"twitter_status": "not_started",
|
||
"reddit_status": "not_started"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 列出所有模拟
|
||
|
||
**GET** `/api/simulation/list`
|
||
|
||
| Query参数 | 类型 | 说明 |
|
||
|-----------|------|------|
|
||
| `project_id` | string | 按项目ID过滤(可选) |
|
||
|
||
---
|
||
|
||
### 获取Agent Profile
|
||
|
||
**GET** `/api/simulation/{simulation_id}/profiles`
|
||
|
||
| Query参数 | 类型 | 说明 |
|
||
|-----------|------|------|
|
||
| `platform` | string | 平台类型:reddit 或 twitter |
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"platform": "reddit",
|
||
"count": 45,
|
||
"profiles": [
|
||
{
|
||
"user_id": 0,
|
||
"user_name": "yangjingyuan_123",
|
||
"name": "杨景媛",
|
||
"bio": "武汉大学学生,关注教育公平与学生权益",
|
||
"persona": "杨景媛是一名积极参与社会讨论的大学生,性格内敛但观点鲜明...",
|
||
"karma": 1500,
|
||
"age": 22,
|
||
"gender": "female",
|
||
"mbti": "INFJ",
|
||
"country": "China",
|
||
"profession": "Student",
|
||
"interested_topics": ["Education", "Social Issues"]
|
||
}
|
||
]
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 获取模拟配置
|
||
|
||
**GET** `/api/simulation/{simulation_id}/config`
|
||
|
||
获取LLM智能生成的完整配置,包含:
|
||
- `time_config`: 时间配置
|
||
- `agent_configs`: 每个Agent的活动配置
|
||
- `event_config`: 事件配置
|
||
- `generation_reasoning`: LLM的配置推理说明
|
||
|
||
---
|
||
|
||
### 下载文件
|
||
|
||
| 接口 | 说明 |
|
||
|------|------|
|
||
| GET `/api/simulation/{id}/config/download` | 下载配置文件 |
|
||
| GET `/api/simulation/{id}/script/{script_name}/download` | 下载脚本文件 |
|
||
|
||
**脚本名称:**
|
||
- `run_twitter_simulation.py`
|
||
- `run_reddit_simulation.py`
|
||
- `run_parallel_simulation.py`
|
||
|
||
---
|
||
|
||
### 直接生成Profile
|
||
|
||
**POST** `/api/simulation/generate-profiles`
|
||
|
||
不创建模拟,直接从图谱生成Agent Profile。
|
||
|
||
```json
|
||
{
|
||
"graph_id": "mirofish_xxxx",
|
||
"entity_types": ["Student", "PublicFigure"],
|
||
"use_llm": true,
|
||
"platform": "reddit"
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 模拟运行控制接口
|
||
|
||
### 开始模拟
|
||
|
||
**POST** `/api/simulation/start`
|
||
|
||
启动OASIS模拟,在后台运行。
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"simulation_id": "sim_xxxx", // 必填
|
||
"platform": "parallel" // 可选: twitter / reddit / parallel (默认)
|
||
}
|
||
```
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxxx",
|
||
"runner_status": "running",
|
||
"process_pid": 12345,
|
||
"twitter_running": true,
|
||
"reddit_running": true,
|
||
"total_rounds": 144,
|
||
"total_simulation_hours": 72,
|
||
"started_at": "2025-12-01T10:00:00"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 停止模拟
|
||
|
||
**POST** `/api/simulation/stop`
|
||
|
||
停止正在运行的模拟。
|
||
|
||
**请求(JSON):**
|
||
```json
|
||
{
|
||
"simulation_id": "sim_xxxx" // 必填
|
||
}
|
||
```
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxxx",
|
||
"runner_status": "stopped",
|
||
"completed_at": "2025-12-01T12:00:00",
|
||
"twitter_actions_count": 500,
|
||
"reddit_actions_count": 650
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 实时状态监控接口
|
||
|
||
### 获取运行状态(基础)
|
||
|
||
**GET** `/api/simulation/{simulation_id}/run-status`
|
||
|
||
获取模拟运行的实时状态,用于前端轮询。
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxxx",
|
||
"runner_status": "running",
|
||
"current_round": 25,
|
||
"total_rounds": 144,
|
||
"progress_percent": 17.4,
|
||
"simulated_hours": 12,
|
||
"total_simulation_hours": 72,
|
||
"twitter_running": true,
|
||
"reddit_running": true,
|
||
"twitter_actions_count": 150,
|
||
"reddit_actions_count": 200,
|
||
"total_actions_count": 350,
|
||
"started_at": "2025-12-01T10:00:00",
|
||
"updated_at": "2025-12-01T10:30:00"
|
||
}
|
||
}
|
||
```
|
||
|
||
**运行状态值(runner_status):**
|
||
- `idle` - 未运行
|
||
- `starting` - 启动中
|
||
- `running` - 运行中
|
||
- `paused` - 已暂停
|
||
- `stopping` - 停止中
|
||
- `stopped` - 已停止
|
||
- `completed` - 已完成
|
||
- `failed` - 失败
|
||
|
||
---
|
||
|
||
### 获取运行状态(详细,含最近动作)
|
||
|
||
**GET** `/api/simulation/{simulation_id}/run-status/detail`
|
||
|
||
获取详细运行状态,包含最近的Agent动作列表,**用于前端实时展示动态**。
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxxx",
|
||
"runner_status": "running",
|
||
"current_round": 25,
|
||
"progress_percent": 17.4,
|
||
"recent_actions": [
|
||
{
|
||
"round_num": 25,
|
||
"timestamp": "2025-12-01T10:30:00",
|
||
"platform": "twitter",
|
||
"agent_id": 3,
|
||
"agent_name": "Entity Name",
|
||
"action_type": "CREATE_POST",
|
||
"action_args": {"content": "Post content..."},
|
||
"result": null,
|
||
"success": true
|
||
},
|
||
{
|
||
"round_num": 25,
|
||
"timestamp": "2025-12-01T10:29:55",
|
||
"platform": "reddit",
|
||
"agent_id": 7,
|
||
"agent_name": "Another Entity",
|
||
"action_type": "LIKE_POST",
|
||
"action_args": {"post_id": 5},
|
||
"success": true
|
||
}
|
||
]
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 获取动作历史
|
||
|
||
**GET** `/api/simulation/{simulation_id}/actions`
|
||
|
||
获取完整的Agent动作历史记录。
|
||
|
||
**Query参数:**
|
||
|
||
| 参数 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `limit` | int | 返回数量(默认100) |
|
||
| `offset` | int | 偏移量(默认0) |
|
||
| `platform` | string | 过滤平台(twitter/reddit) |
|
||
| `agent_id` | int | 过滤Agent ID |
|
||
| `round_num` | int | 过滤轮次 |
|
||
|
||
---
|
||
|
||
### 获取时间线
|
||
|
||
**GET** `/api/simulation/{simulation_id}/timeline`
|
||
|
||
获取按轮次汇总的时间线,用于前端展示进度条和时间线视图。
|
||
|
||
**Query参数:**
|
||
|
||
| 参数 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `start_round` | int | 起始轮次(默认0) |
|
||
| `end_round` | int | 结束轮次(默认全部) |
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"rounds_count": 25,
|
||
"timeline": [
|
||
{
|
||
"round_num": 1,
|
||
"twitter_actions": 10,
|
||
"reddit_actions": 15,
|
||
"total_actions": 25,
|
||
"active_agents_count": 8,
|
||
"active_agents": [0, 1, 3, 5, 7, 10, 12, 15],
|
||
"action_types": {"CREATE_POST": 5, "LIKE_POST": 10, "LLM_ACTION": 10},
|
||
"first_action_time": "2025-12-01T10:00:00",
|
||
"last_action_time": "2025-12-01T10:05:00"
|
||
}
|
||
]
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 获取Agent统计
|
||
|
||
**GET** `/api/simulation/{simulation_id}/agent-stats`
|
||
|
||
获取每个Agent的活跃度统计,用于展示排行榜。
|
||
|
||
**响应示例:**
|
||
```json
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"agents_count": 45,
|
||
"stats": [
|
||
{
|
||
"agent_id": 3,
|
||
"agent_name": "Active Agent",
|
||
"total_actions": 50,
|
||
"twitter_actions": 30,
|
||
"reddit_actions": 20,
|
||
"action_types": {"CREATE_POST": 10, "LIKE_POST": 25, "REPOST": 15},
|
||
"first_action_time": "2025-12-01T10:00:00",
|
||
"last_action_time": "2025-12-01T12:30:00"
|
||
}
|
||
]
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 数据库查询接口
|
||
|
||
### 获取帖子
|
||
|
||
**GET** `/api/simulation/{simulation_id}/posts`
|
||
|
||
从模拟数据库获取帖子列表。
|
||
|
||
**Query参数:**
|
||
|
||
| 参数 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `platform` | string | 平台类型(twitter/reddit,默认reddit) |
|
||
| `limit` | int | 返回数量(默认50) |
|
||
| `offset` | int | 偏移量 |
|
||
|
||
---
|
||
|
||
### 获取评论
|
||
|
||
**GET** `/api/simulation/{simulation_id}/comments`
|
||
|
||
从Reddit模拟数据库获取评论列表。
|
||
|
||
**Query参数:**
|
||
|
||
| 参数 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `post_id` | string | 过滤帖子ID(可选) |
|
||
| `limit` | int | 返回数量(默认50) |
|
||
| `offset` | int | 偏移量 |
|
||
|
||
---
|
||
|
||
# 服务层实现细节
|
||
|
||
## 1. ZepEntityReader(Zep实体读取服务)
|
||
|
||
**文件:** `app/services/zep_entity_reader.py`
|
||
|
||
### 核心功能
|
||
|
||
| 方法 | 说明 |
|
||
|------|------|
|
||
| `get_all_nodes(graph_id)` | 获取图谱所有节点 |
|
||
| `get_all_edges(graph_id)` | 获取图谱所有边 |
|
||
| `filter_defined_entities(graph_id, ...)` | 筛选符合条件的实体 |
|
||
| `get_entity_with_context(graph_id, uuid)` | 获取实体完整上下文 |
|
||
| `get_entities_by_type(graph_id, type)` | 按类型获取实体 |
|
||
|
||
### 数据结构
|
||
|
||
```python
|
||
@dataclass
|
||
class EntityNode:
|
||
uuid: str # 节点UUID
|
||
name: str # 实体名称
|
||
labels: List[str] # 标签列表 ["Entity", "Student"]
|
||
summary: str # 实体摘要
|
||
attributes: Dict[str, Any] # 属性字典
|
||
related_edges: List[Dict] # 相关边信息
|
||
related_nodes: List[Dict] # 关联节点信息
|
||
|
||
def get_entity_type(self) -> Optional[str]:
|
||
"""获取实体类型(排除默认Entity标签)"""
|
||
|
||
@dataclass
|
||
class FilteredEntities:
|
||
entities: List[EntityNode] # 实体列表
|
||
entity_types: Set[str] # 发现的实体类型
|
||
total_count: int # 总节点数
|
||
filtered_count: int # 过滤后数量
|
||
```
|
||
|
||
### 过滤逻辑示例
|
||
|
||
```python
|
||
# Zep返回的节点Labels示例:
|
||
# 符合预定义类型: ["Entity", "Student"]
|
||
# 不符合预定义类型: ["Entity"]
|
||
|
||
for node in all_nodes:
|
||
labels = node.get("labels", [])
|
||
custom_labels = [l for l in labels if l not in ["Entity", "Node"]]
|
||
|
||
if not custom_labels:
|
||
# 只有默认标签,跳过
|
||
continue
|
||
|
||
# 保留符合条件的实体
|
||
entity_type = custom_labels[0]
|
||
filtered_entities.append(node)
|
||
```
|
||
|
||
---
|
||
|
||
## 2. OasisProfileGenerator(Agent Profile生成器)
|
||
|
||
**文件:** `app/services/oasis_profile_generator.py`
|
||
|
||
### 核心功能
|
||
|
||
| 方法 | 说明 |
|
||
|------|------|
|
||
| `generate_profile_from_entity(entity, user_id)` | 从实体生成单个Profile(带详细人设) |
|
||
| `generate_profiles_from_entities(entities, graph_id)` | 批量生成Profile |
|
||
| `save_profiles(profiles, path, platform)` | 保存Profile文件 |
|
||
| `_search_zep_for_entity(entity_name)` | 调用Zep检索获取额外上下文 |
|
||
|
||
### 优化特性(v2.0)
|
||
|
||
1. **Zep混合搜索功能**:使用多种查询策略获取丰富的实体信息
|
||
2. **区分实体类型**:个人实体 vs 群体/机构实体,使用不同的提示词
|
||
3. **详细人设生成**:生成500字以上的详细人设描述
|
||
|
||
### Zep混合搜索策略
|
||
|
||
`_search_zep_for_entity()` 方法采用多种搜索策略获取丰富信息:
|
||
|
||
**查询策略:**
|
||
```python
|
||
queries = [
|
||
f"总结{entity_name}的全部活动、事件和行为",
|
||
f"{entity_name}与其他实体的关系和互动",
|
||
f"{entity_name}的背景、历史和重要信息",
|
||
f"关于{entity_name}的所有事实和描述",
|
||
]
|
||
```
|
||
|
||
**说明:** Zep没有内置的混合搜索接口,需要分别搜索edges和nodes。我们使用**并行请求**同时执行两个搜索:
|
||
|
||
```python
|
||
# 并行执行edges和nodes搜索
|
||
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
|
||
edge_future = executor.submit(search_edges) # scope="edges"
|
||
node_future = executor.submit(search_nodes) # scope="nodes"
|
||
|
||
edge_result = edge_future.result(timeout=30)
|
||
node_result = node_future.result(timeout=30)
|
||
```
|
||
|
||
**搜索参数:**
|
||
|
||
| 搜索类型 | scope | limit | 说明 |
|
||
|----------|-------|-------|------|
|
||
| 边搜索 | edges | 30 | 获取事实/关系信息 |
|
||
| 节点搜索 | nodes | 20 | 获取相关实体摘要 |
|
||
|
||
**关键参数:**
|
||
- 必须传递 `graph_id` 参数,否则Zep API会返回400错误
|
||
- 使用 `rrf` (Reciprocal Rank Fusion) reranker,稳定可靠
|
||
- 使用线程池并行执行,提高效率
|
||
|
||
**返回数据结构:**
|
||
```python
|
||
{
|
||
"facts": [...], # 事实列表(来自edges)
|
||
"node_summaries": [...], # 相关节点摘要(来自nodes)
|
||
"context": "..." # 综合上下文文本
|
||
}
|
||
```
|
||
|
||
### LLM生成与JSON修复
|
||
|
||
为了避免LLM生成的JSON解析失败,实现了以下优化:
|
||
|
||
1. **不限制max_tokens**:让LLM自由发挥,充分利用模型的上下文能力
|
||
2. **多次重试机制**:最多3次尝试,每次降低temperature
|
||
3. **截断检测与修复**:检测`finish_reason='length'`,自动闭合JSON
|
||
4. **完善JSON修复机制**:
|
||
- `_fix_truncated_json()`: 修复被截断的JSON(闭合括号和字符串)
|
||
- `_try_fix_json()`: 多级修复策略
|
||
- 提取JSON部分
|
||
- 替换字符串中的换行符
|
||
- 移除控制字符
|
||
- 从损坏JSON中提取部分信息
|
||
5. **字段验证**:确保必需字段存在,缺失时使用entity_summary填充
|
||
|
||
**错误处理流程**:
|
||
```
|
||
LLM调用 → 检查截断 → JSON解析 → 修复尝试 → 部分提取 → 规则生成
|
||
```
|
||
|
||
### 并行生成与实时输出
|
||
|
||
支持并行生成Agent人设,提高生成效率:
|
||
|
||
```python
|
||
profiles = generator.generate_profiles_from_entities(
|
||
entities=filtered.entities,
|
||
use_llm=True,
|
||
graph_id="mirofish_xxx",
|
||
parallel_count=5 # 并行生成数量,默认5
|
||
)
|
||
```
|
||
|
||
**API参数**:
|
||
```json
|
||
POST /api/simulation/prepare
|
||
{
|
||
"simulation_id": "sim_xxx",
|
||
"parallel_profile_count": 5, // 可选,并行生成人设数量,默认5
|
||
"force_regenerate": false // 可选,强制重新生成,默认false
|
||
}
|
||
```
|
||
|
||
**实时输出**:
|
||
- 每生成一个人设,立即输出到控制台(完整内容不截断)
|
||
- 包含用户名、简介、详细人设、年龄、性别、MBTI等信息
|
||
- 方便实时监控生成进度和质量
|
||
|
||
### 避免重复生成
|
||
|
||
系统会自动检测已完成的准备工作,避免重复生成:
|
||
|
||
**检测条件**:
|
||
1. `state.json` 存在且 `config_generated=true`
|
||
2. 必要文件存在:`reddit_profiles.json`, `twitter_profiles.csv`, `simulation_config.json`
|
||
|
||
**API响应**:
|
||
```json
|
||
// 已准备完成时
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"simulation_id": "sim_xxx",
|
||
"status": "ready",
|
||
"message": "已有完成的准备工作,无需重复生成",
|
||
"already_prepared": true,
|
||
"prepare_info": {
|
||
"entities_count": 93,
|
||
"profiles_count": 93,
|
||
"entity_types": ["Student", "Professor", ...],
|
||
"existing_files": [...]
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**强制重新生成**:
|
||
```json
|
||
POST /api/simulation/prepare
|
||
{
|
||
"simulation_id": "sim_xxx",
|
||
"force_regenerate": true // 忽略已有准备,强制重新生成
|
||
}
|
||
```
|
||
|
||
### 实体类型分类
|
||
|
||
```python
|
||
# 个人类型实体 - 生成具体人物设定
|
||
INDIVIDUAL_ENTITY_TYPES = [
|
||
"student", "alumni", "professor", "person", "publicfigure",
|
||
"expert", "faculty", "official", "journalist", "activist"
|
||
]
|
||
|
||
# 群体/机构类型实体 - 生成官方账号设定
|
||
GROUP_ENTITY_TYPES = [
|
||
"university", "governmentagency", "organization", "ngo",
|
||
"mediaoutlet", "company", "institution", "group", "community"
|
||
]
|
||
```
|
||
|
||
### Profile数据结构
|
||
|
||
```python
|
||
@dataclass
|
||
class OasisAgentProfile:
|
||
# 基础字段
|
||
user_id: int # 用户ID
|
||
user_name: str # 用户名
|
||
name: str # 显示名称
|
||
bio: str # 简介(max 150字符)
|
||
persona: str # 详细人设描述(500字以上)
|
||
|
||
# Reddit字段
|
||
karma: int = 1000
|
||
|
||
# Twitter字段
|
||
friend_count: int = 100
|
||
follower_count: int = 150
|
||
statuses_count: int = 500
|
||
|
||
# 人设详情
|
||
age: Optional[int] = None
|
||
gender: Optional[str] = None
|
||
mbti: Optional[str] = None # INTJ, ENFP等
|
||
country: Optional[str] = None
|
||
profession: Optional[str] = None
|
||
interested_topics: List[str] = []
|
||
|
||
# 来源信息
|
||
source_entity_uuid: Optional[str] = None
|
||
source_entity_type: Optional[str] = None
|
||
```
|
||
|
||
### 详细人设生成示例
|
||
|
||
**个人实体人设结构:**
|
||
```markdown
|
||
## 一、基本信息
|
||
- 姓名/称呼、年龄、职业/身份
|
||
- 教育背景、所在地
|
||
|
||
## 二、人物背景
|
||
- 过去的重要经历
|
||
- 与事件的关联
|
||
- 社会关系网络
|
||
|
||
## 三、性格特征
|
||
- MBTI类型及表现
|
||
- 核心性格特点
|
||
- 情绪表达方式
|
||
|
||
## 四、社交媒体行为模式
|
||
- 发帖频率和时间
|
||
- 内容偏好类型
|
||
- 语言风格特点
|
||
|
||
## 五、立场与观点
|
||
- 对核心话题的态度
|
||
- 可能被激怒/感动的内容
|
||
|
||
## 六、独特特征
|
||
- 口头禅、个人爱好等
|
||
```
|
||
|
||
### Profile生成策略
|
||
|
||
**1. LLM生成(默认)**
|
||
|
||
使用LLM根据实体信息生成详细人设:
|
||
|
||
```python
|
||
prompt = f"""
|
||
Entity: {entity_name} ({entity_type})
|
||
Summary: {entity_summary}
|
||
Context: {related_edges_and_nodes}
|
||
|
||
Generate a social media user profile with:
|
||
- bio (max 150 chars)
|
||
- persona (detailed description)
|
||
- age, gender, mbti, country
|
||
- profession, interested_topics
|
||
"""
|
||
```
|
||
|
||
**2. 规则生成(Fallback)**
|
||
|
||
根据实体类型使用预定义模板:
|
||
|
||
| 实体类型 | 生成策略 |
|
||
|----------|----------|
|
||
| Student/Alumni | 年龄18-30,学生身份,关注教育话题 |
|
||
| PublicFigure/Expert | 年龄35-60,专业人士,政治经济话题 |
|
||
| MediaOutlet | 媒体官方账号,新闻时事话题 |
|
||
| University/GovernmentAgency | 机构官方账号,政策公告话题 |
|
||
|
||
---
|
||
|
||
## 3. SimulationConfigGenerator(模拟配置智能生成器)
|
||
|
||
**文件:** `app/services/simulation_config_generator.py`
|
||
|
||
### 核心功能
|
||
|
||
使用LLM分析模拟需求、文档内容、图谱实体信息,自动生成最佳的模拟参数配置。
|
||
|
||
**采用分步生成策略**(避免一次性生成过长内容导致失败):
|
||
1. 生成时间配置(轻量级)
|
||
2. 生成事件配置和热点话题
|
||
3. 分批生成Agent配置(**每批5个**,保证生成质量)
|
||
4. 生成平台配置
|
||
|
||
| 方法 | 说明 |
|
||
|------|------|
|
||
| `generate_config(...)` | 智能生成完整模拟配置(分步) |
|
||
| `_generate_time_config(...)` | 生成时间配置 |
|
||
| `_generate_event_config(...)` | 生成事件配置 |
|
||
| `_generate_agent_configs_batch(...)` | 分批生成Agent配置 |
|
||
| `_generate_agent_config_by_rule(...)` | 规则生成(LLM失败时) |
|
||
|
||
### 中国人作息时间配置
|
||
|
||
系统针对中国用户群体,采用符合北京时间的作息习惯:
|
||
|
||
| 时段 | 时间范围 | 活跃度系数 | 说明 |
|
||
|------|----------|------------|------|
|
||
| 深夜 | 0:00-5:59 | 0.05 | 几乎无人活动 |
|
||
| 早间 | 6:00-8:59 | 0.4 | 逐渐醒来 |
|
||
| 工作 | 9:00-18:59 | 0.7 | 工作时段中等活跃 |
|
||
| 高峰 | 19:00-22:59 | 1.5 | 晚间最活跃 |
|
||
| 夜间 | 23:00-23:59 | 0.5 | 活跃度下降 |
|
||
|
||
### LLM智能生成的配置内容
|
||
|
||
**1. TimeSimulationConfig(时间配置)**
|
||
```python
|
||
@dataclass
|
||
class TimeSimulationConfig:
|
||
total_simulation_hours: int = 72 # 模拟总时长(小时)
|
||
minutes_per_round: int = 30 # 每轮代表的时间(分钟)
|
||
agents_per_hour_min: int = 5 # 每小时激活Agent数量(最小)
|
||
agents_per_hour_max: int = 20 # 每小时激活Agent数量(最大)
|
||
peak_hours: List[int] = [19,20,21,22] # 高峰时段(晚间)
|
||
off_peak_hours: List[int] = [0,1,2,3,4,5] # 低谷时段(凌晨)
|
||
peak_activity_multiplier: float = 1.5 # 高峰活跃度乘数
|
||
off_peak_activity_multiplier: float = 0.05 # 凌晨活跃度极低
|
||
morning_hours: List[int] = [6,7,8] # 早间时段
|
||
morning_activity_multiplier: float = 0.4
|
||
work_hours: List[int] = [9-18] # 工作时段
|
||
work_activity_multiplier: float = 0.7
|
||
```
|
||
|
||
**2. AgentActivityConfig(每个Agent的活动配置)**
|
||
```python
|
||
@dataclass
|
||
class AgentActivityConfig:
|
||
agent_id: int
|
||
entity_uuid: str
|
||
entity_name: str
|
||
entity_type: str
|
||
|
||
activity_level: float = 0.5 # 整体活跃度 (0.0-1.0)
|
||
posts_per_hour: float = 1.0 # 每小时发帖频率
|
||
comments_per_hour: float = 2.0 # 每小时评论频率
|
||
active_hours: List[int] # 活跃时间段 (0-23)
|
||
response_delay_min: int = 5 # 响应延迟最小值(分钟)
|
||
response_delay_max: int = 60 # 响应延迟最大值(分钟)
|
||
sentiment_bias: float = 0.0 # 情感倾向 (-1到1)
|
||
stance: str = "neutral" # 立场 (supportive/opposing/neutral/observer)
|
||
influence_weight: float = 1.0 # 影响力权重
|
||
```
|
||
|
||
**3. 不同实体类型的默认参数差异(符合中国人作息)**
|
||
|
||
| 实体类型 | 活跃度 | 发帖频率 | 活跃时段 | 响应延迟 | 影响力 |
|
||
|----------|--------|----------|----------|----------|--------|
|
||
| University/GovernmentAgency | 0.2 | 0.1/小时 | 9:00-17:59(工作时间) | 60-240分钟 | 3.0 |
|
||
| MediaOutlet | 0.5 | 0.8/小时 | 7:00-23:59(全天) | 5-30分钟 | 2.5 |
|
||
| Professor/Expert | 0.4 | 0.3/小时 | 8:00-21:59(工作+晚间) | 15-90分钟 | 2.0 |
|
||
| Student | 0.8 | 0.6/小时 | 8-13, 18-23(上午+晚间) | 1-15分钟 | 0.8 |
|
||
| Alumni | 0.6 | 0.4/小时 | 12-13, 19-23(午休+晚间) | 5-30分钟 | 1.0 |
|
||
| Person(普通人) | 0.7 | 0.5/小时 | 9-13, 18-23(白天+晚间) | 2-20分钟 | 1.0 |
|
||
|
||
**注意**:凌晨0-5点所有实体类型都几乎不活动(符合中国人作息习惯)
|
||
|
||
---
|
||
|
||
## 4. SimulationManager(模拟管理器)
|
||
|
||
**文件:** `app/services/simulation_manager.py`
|
||
|
||
### 核心功能
|
||
|
||
| 方法 | 说明 |
|
||
|------|------|
|
||
| `create_simulation(project_id, graph_id, ...)` | 创建模拟 |
|
||
| `prepare_simulation(simulation_id, ...)` | 准备模拟环境(调用配置生成器) |
|
||
| `get_simulation(simulation_id)` | 获取模拟状态 |
|
||
| `get_profiles(simulation_id, platform)` | 获取Profile |
|
||
| `get_simulation_config(simulation_id)` | 获取模拟配置 |
|
||
| `get_run_instructions(simulation_id)` | 获取运行说明 |
|
||
|
||
### 模拟状态流转
|
||
|
||
```
|
||
created → preparing → ready → running → completed
|
||
↓ ↓
|
||
failed paused
|
||
```
|
||
|
||
### 生成的文件结构
|
||
|
||
```
|
||
uploads/simulations/sim_xxxx/
|
||
├── state.json # 模拟状态
|
||
├── simulation_config.json # LLM生成的模拟配置(核心文件)
|
||
├── reddit_profiles.json # Reddit Agent Profile(JSON格式)
|
||
├── twitter_profiles.csv # Twitter Agent Profile(CSV格式)
|
||
├── run_reddit_simulation.py # 预设Reddit模拟脚本
|
||
├── run_twitter_simulation.py # 预设Twitter模拟脚本
|
||
├── run_parallel_simulation.py # 预设双平台并行脚本
|
||
├── reddit_simulation.db # Reddit数据库(运行后生成)
|
||
└── twitter_simulation.db # Twitter数据库(运行后生成)
|
||
```
|
||
|
||
**重要:OASIS平台的Profile格式要求不同:**
|
||
|
||
**Twitter CSV格式**(符合OASIS官方要求):
|
||
```csv
|
||
user_id,name,username,user_char,description
|
||
0,张教授,professor_zhang,"完整人设描述(LLM内部使用)","简短简介(外部显示)"
|
||
```
|
||
- `user_id`: 从0开始的顺序ID
|
||
- `name`: 真实姓名
|
||
- `username`: 系统用户名
|
||
- `user_char`: 完整人设(bio + persona),注入LLM系统提示,指导Agent行为
|
||
- `description`: 简短简介,显示在用户资料页面
|
||
|
||
**Reddit JSON格式**:
|
||
```json
|
||
[
|
||
{
|
||
"realname": "张教授",
|
||
"username": "professor_zhang",
|
||
"bio": "简短简介",
|
||
"persona": "详细人设描述",
|
||
"age": 42,
|
||
"gender": "男",
|
||
"mbti": "INTJ",
|
||
"country": "中国",
|
||
"profession": "教授",
|
||
"interested_topics": ["高等教育", "学术诚信"]
|
||
}
|
||
]
|
||
```
|
||
|
||
**user_char vs description 区别**:
|
||
| 字段 | 用途 | 可见性 |
|
||
|------|------|--------|
|
||
| user_char | LLM系统提示,决定Agent如何思考和行动 | 内部使用 |
|
||
| description | 用户资料页面的简介 | 其他用户可见 |
|
||
|
||
### 配置文件示例 (simulation_config.json)
|
||
|
||
```json
|
||
{
|
||
"simulation_id": "sim_abc123",
|
||
"project_id": "proj_xxx",
|
||
"graph_id": "mirofish_xxx",
|
||
"simulation_requirement": "分析武汉大学图书馆事件舆论传播",
|
||
|
||
"time_config": {
|
||
"total_simulation_hours": 72,
|
||
"minutes_per_round": 30,
|
||
"agents_per_hour_min": 5,
|
||
"agents_per_hour_max": 15,
|
||
"peak_hours": [9, 10, 11, 14, 15, 20, 21, 22],
|
||
"off_peak_hours": [0, 1, 2, 3, 4, 5],
|
||
"peak_activity_multiplier": 1.5,
|
||
"off_peak_activity_multiplier": 0.3
|
||
},
|
||
|
||
"agent_configs": [
|
||
{
|
||
"agent_id": 0,
|
||
"entity_name": "武汉大学",
|
||
"entity_type": "University",
|
||
"activity_level": 0.15,
|
||
"posts_per_hour": 0.08,
|
||
"comments_per_hour": 0.02,
|
||
"active_hours": [9, 10, 11, 14, 15, 16, 17],
|
||
"response_delay_min": 120,
|
||
"response_delay_max": 360,
|
||
"sentiment_bias": 0.1,
|
||
"stance": "neutral",
|
||
"influence_weight": 4.0
|
||
},
|
||
{
|
||
"agent_id": 1,
|
||
"entity_name": "杨景媛",
|
||
"entity_type": "Student",
|
||
"activity_level": 0.8,
|
||
"posts_per_hour": 0.5,
|
||
"comments_per_hour": 2.0,
|
||
"active_hours": [7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
|
||
"response_delay_min": 1,
|
||
"response_delay_max": 15,
|
||
"sentiment_bias": -0.3,
|
||
"stance": "opposing",
|
||
"influence_weight": 1.5
|
||
}
|
||
],
|
||
|
||
"event_config": {
|
||
"initial_posts": [
|
||
{
|
||
"poster_agent_id": 1,
|
||
"content": "今天在图书馆发生的事情让我非常失望..."
|
||
}
|
||
],
|
||
"hot_topics": ["图书馆事件", "学生权益", "校方回应"],
|
||
"narrative_direction": "事件发酵后各方反应的模拟"
|
||
},
|
||
|
||
"generation_reasoning": "根据武汉大学图书馆事件的特点:1)涉及学生与校方的冲突,设置学生高活跃度、校方低频但高影响力;2)事件性质属于短期热点,设置72小时模拟时长;3)主要当事人杨景媛设置为高活跃度且持opposing立场..."
|
||
}
|
||
|
||
---
|
||
|
||
## 5. 预设模拟脚本
|
||
|
||
**目录:** `backend/scripts/`
|
||
|
||
脚本是**预设的**,不是动态生成。每次准备模拟时,脚本会被复制到模拟目录。
|
||
|
||
### 脚本说明
|
||
|
||
| 脚本 | 说明 |
|
||
|------|------|
|
||
| `run_twitter_simulation.py` | Twitter单平台模拟 |
|
||
| `run_reddit_simulation.py` | Reddit单平台模拟 |
|
||
| `run_parallel_simulation.py` | 双平台并行模拟(推荐) |
|
||
|
||
### 脚本工作原理
|
||
|
||
```python
|
||
# 脚本读取配置文件,自动设置所有参数
|
||
class TwitterSimulationRunner:
|
||
def __init__(self, config_path: str):
|
||
self.config = self._load_config() # 读取simulation_config.json
|
||
|
||
def _get_active_agents_for_round(self, env, current_hour, round_num):
|
||
"""根据时间和配置决定本轮激活哪些Agent"""
|
||
time_config = self.config.get("time_config", {})
|
||
agent_configs = self.config.get("agent_configs", [])
|
||
|
||
# 1. 检查是否高峰/低谷时段,调整激活数量
|
||
# 2. 遍历每个Agent配置,检查是否在活跃时间
|
||
# 3. 根据activity_level计算激活概率
|
||
# 4. 返回本轮应激活的Agent列表
|
||
...
|
||
|
||
async def run(self):
|
||
# 1. 创建LLM模型
|
||
# 2. 加载Agent图
|
||
# 3. 执行初始事件(从event_config读取)
|
||
# 4. 主循环:根据配置激活不同Agent
|
||
...
|
||
```
|
||
|
||
### 使用方式
|
||
|
||
```bash
|
||
# 进入模拟目录
|
||
cd backend/uploads/simulations/sim_xxxx/
|
||
|
||
# 运行模拟
|
||
python run_parallel_simulation.py --config simulation_config.json
|
||
|
||
# 其他选项
|
||
python run_parallel_simulation.py --config simulation_config.json --twitter-only
|
||
python run_parallel_simulation.py --config simulation_config.json --reddit-only
|
||
```
|
||
|
||
---
|
||
|
||
## 6. Profile文件格式说明
|
||
|
||
**OASIS对两个平台的Profile格式有不同要求:**
|
||
|
||
### Twitter Profile (CSV格式)
|
||
|
||
```csv
|
||
user_id,user_name,name,bio,friend_count,follower_count,statuses_count,created_at
|
||
0,user0,User Zero,I am user zero with interests in technology.,100,150,500,2023-01-01
|
||
1,user1,User One,Tech enthusiast and coffee lover.,200,250,1000,2023-01-02
|
||
```
|
||
|
||
| 字段 | 类型 | 说明 |
|
||
|------|------|------|
|
||
| `user_id` | int | 用户ID |
|
||
| `user_name` | string | 用户名 |
|
||
| `name` | string | 显示名称 |
|
||
| `bio` | string | 简介 |
|
||
| `friend_count` | int | 关注数 |
|
||
| `follower_count` | int | 粉丝数 |
|
||
| `statuses_count` | int | 发帖数 |
|
||
| `created_at` | string | 创建日期 |
|
||
|
||
### Reddit Profile (JSON详细格式)
|
||
|
||
```json
|
||
[
|
||
{
|
||
"realname": "Test User",
|
||
"username": "test_user_123",
|
||
"bio": "A test user for validation",
|
||
"persona": "Test User is an enthusiastic participant in social discussions.",
|
||
"age": 25,
|
||
"gender": "male",
|
||
"mbti": "INTJ",
|
||
"country": "China",
|
||
"profession": "Student",
|
||
"interested_topics": ["Technology", "Education"]
|
||
}
|
||
]
|
||
```
|
||
|
||
| 字段 | 类型 | 必填 | 说明 |
|
||
|------|------|------|------|
|
||
| `realname` | string | 是 | 真实姓名 |
|
||
| `username` | string | 是 | 用户名 |
|
||
| `bio` | string | 是 | 简介(最大150字符) |
|
||
| `persona` | string | 是 | 详细人设描述 |
|
||
| `age` | int | 否 | 年龄 |
|
||
| `gender` | string | 否 | 性别 |
|
||
| `mbti` | string | 否 | MBTI人格类型 |
|
||
| `country` | string | 否 | 国家 |
|
||
| `profession` | string | 否 | 职业 |
|
||
| `interested_topics` | array | 否 | 感兴趣话题列表 |
|
||
|
||
---
|
||
|
||
## 7. OASIS平台动作类型
|
||
|
||
### Twitter可用动作
|
||
|
||
| 动作 | 说明 |
|
||
|------|------|
|
||
| `CREATE_POST` | 发布推文 |
|
||
| `LIKE_POST` | 点赞推文 |
|
||
| `REPOST` | 转发推文 |
|
||
| `FOLLOW` | 关注用户 |
|
||
| `QUOTE_POST` | 引用转发 |
|
||
| `DO_NOTHING` | 不执行动作 |
|
||
|
||
### Reddit可用动作
|
||
|
||
| 动作 | 说明 |
|
||
|------|------|
|
||
| `CREATE_POST` | 发布帖子 |
|
||
| `CREATE_COMMENT` | 发表评论 |
|
||
| `LIKE_POST` | 点赞帖子 |
|
||
| `DISLIKE_POST` | 踩帖子 |
|
||
| `LIKE_COMMENT` | 点赞评论 |
|
||
| `DISLIKE_COMMENT` | 踩评论 |
|
||
| `SEARCH_POSTS` | 搜索帖子 |
|
||
| `SEARCH_USER` | 搜索用户 |
|
||
| `TREND` | 查看热门 |
|
||
| `REFRESH` | 刷新推荐 |
|
||
| `FOLLOW` | 关注用户 |
|
||
| `MUTE` | 屏蔽用户 |
|
||
| `DO_NOTHING` | 不执行动作 |
|
||
|
||
---
|
||
|
||
# 实体设计原则
|
||
|
||
本系统专为社会舆论模拟设计,实体必须是:
|
||
|
||
**可以是:**
|
||
- 具体的个人(有名有姓)
|
||
- 注册的公司、组织、机构
|
||
- 媒体机构
|
||
- 政府部门
|
||
- 高校、NGO等
|
||
|
||
**不可以是:**
|
||
- 抽象概念(如"技术"、"创新")
|
||
- 情绪、观点、趋势
|
||
- 泛指的群体(如"用户"、"消费者")
|
||
|
||
这是因为后续需要模拟各实体对舆论的反应和传播,抽象概念无法参与这种模拟。
|
||
|
||
---
|
||
|
||
# 项目状态流转
|
||
|
||
```
|
||
created → ontology_generated → graph_building → graph_completed
|
||
↓
|
||
failed
|
||
```
|
||
|
||
---
|
||
|
||
# 运行模拟
|
||
|
||
准备完成后,进入模拟数据目录运行预设脚本:
|
||
|
||
```bash
|
||
# 激活conda环境
|
||
conda activate MiroFish
|
||
|
||
# 进入模拟目录
|
||
cd backend/uploads/simulations/sim_xxxx/
|
||
|
||
# 运行单平台模拟
|
||
python run_reddit_simulation.py --config simulation_config.json
|
||
# 或
|
||
python run_twitter_simulation.py --config simulation_config.json
|
||
|
||
# 运行双平台并行模拟(推荐)
|
||
python run_parallel_simulation.py --config simulation_config.json
|
||
```
|
||
|
||
### 脚本参数
|
||
|
||
| 参数 | 说明 |
|
||
|------|------|
|
||
| `--config` | 配置文件路径(必填) |
|
||
| `--twitter-only` | 只运行Twitter模拟(仅parallel脚本) |
|
||
| `--reddit-only` | 只运行Reddit模拟(仅parallel脚本) |
|
||
|
||
### 输出文件
|
||
|
||
模拟运行后会生成:
|
||
- `twitter_simulation.db` - Twitter模拟数据库
|
||
- `reddit_simulation.db` - Reddit模拟数据库
|
||
|
||
可使用SQLite工具查看模拟结果(帖子、评论、点赞等)
|
||
|
||
---
|
||
|
||
# API调用重试机制
|
||
|
||
**文件:** `app/utils/retry.py`
|
||
|
||
为LLM等外部API调用提供自动重试功能,提高系统稳定性。
|
||
|
||
## 重试策略
|
||
|
||
- **最大重试次数**:3次
|
||
- **退避策略**:指数退避(1s → 2s → 4s)
|
||
- **最大延迟**:30秒
|
||
- **随机抖动**:避免请求堆积
|
||
|
||
## 使用方式
|
||
|
||
**装饰器方式:**
|
||
```python
|
||
from app.utils.retry import retry_with_backoff
|
||
|
||
@retry_with_backoff(max_retries=3)
|
||
def call_llm_api():
|
||
return client.chat.completions.create(...)
|
||
```
|
||
|
||
**客户端方式:**
|
||
```python
|
||
from app.utils.retry import RetryableAPIClient
|
||
|
||
retry_client = RetryableAPIClient(max_retries=3)
|
||
result = retry_client.call_with_retry(some_function, arg1, arg2)
|
||
```
|
||
|
||
**批量处理(单项失败不影响其他):**
|
||
```python
|
||
results, failures = retry_client.call_batch_with_retry(
|
||
items=entities,
|
||
process_func=generate_profile,
|
||
continue_on_failure=True
|
||
)
|
||
```
|
||
|
||
## 已应用重试机制的模块
|
||
|
||
| 模块 | 说明 |
|
||
|------|------|
|
||
| `OasisProfileGenerator` | LLM生成Agent人设 |
|
||
| `SimulationConfigGenerator` | LLM生成模拟配置 |
|
||
|
||
---
|
||
|
||
# 依赖说明
|
||
|
||
```
|
||
# Flask框架
|
||
flask>=3.0.0
|
||
flask-cors>=4.0.0
|
||
|
||
# Zep Cloud SDK
|
||
zep-cloud>=2.0.0
|
||
|
||
# OpenAI SDK(LLM调用)
|
||
openai>=1.0.0
|
||
|
||
# PDF处理
|
||
PyMuPDF>=1.24.0
|
||
|
||
# 环境变量
|
||
python-dotenv>=1.0.0
|
||
|
||
# 数据验证
|
||
pydantic>=2.0.0
|
||
|
||
# OASIS社交媒体模拟
|
||
oasis-ai>=0.1.0
|
||
camel-ai>=0.2.0
|
||
```
|