Enhance OASIS simulation capabilities and profile generation
- Updated README.md to include detailed descriptions of new features, including Zep mixed search functionality and detailed persona generation for individual and group entities.
- Implemented a robust mechanism for checking simulation preparation status to avoid redundant profile generation.
- Added support for parallel profile generation, improving efficiency in creating OASIS Agent Profiles.
- Enhanced the simulation configuration generator to adopt a stepwise approach, ensuring better handling of complex configurations.
- Introduced error handling and retry mechanisms for LLM calls, improving the reliability of profile generation.
- Updated simulation management to support new API parameters for controlling profile generation behavior.
This commit is contained in:
parent
5f159f6d88
commit
af5c235695
5 changed files with 1602 additions and 408 deletions
@ -1057,9 +1057,165 @@ for node in all_nodes:

| Method | Description |
|------|------|
| `generate_profile_from_entity(entity, user_id)` | Generate a single profile from an entity (with a detailed persona) |
| `generate_profiles_from_entities(entities, graph_id)` | Batch-generate profiles |
| `save_profiles(profiles, path, platform)` | Save profile files |
| `_search_zep_for_entity(entity_name)` | Query Zep retrieval for additional context |

### Optimized Features (v2.0)

1. **Zep hybrid search**: uses multiple query strategies to gather rich entity information
2. **Entity-type differentiation**: individual entities vs. group/institution entities, each with its own prompt
3. **Detailed persona generation**: produces persona descriptions of 500+ characters

### Zep Hybrid Search Strategy

The `_search_zep_for_entity()` method uses several search strategies to gather rich information:

**Query strategies:**

```python
queries = [
    f"总结{entity_name}的全部活动、事件和行为",
    f"{entity_name}与其他实体的关系和互动",
    f"{entity_name}的背景、历史和重要信息",
    f"关于{entity_name}的所有事实和描述",
]
```

**Note:** Zep has no built-in hybrid-search endpoint, so edges and nodes must be searched separately. We use **parallel requests** to run both searches at the same time:

```python
# Run the edges and nodes searches in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    edge_future = executor.submit(search_edges)  # scope="edges"
    node_future = executor.submit(search_nodes)  # scope="nodes"

    edge_result = edge_future.result(timeout=30)
    node_result = node_future.result(timeout=30)
```

**Search parameters:**

| Search type | scope | limit | Description |
|------|-------|-------|------|
| Edge search | edges | 30 | Retrieves facts/relationship information |
| Node search | nodes | 20 | Retrieves summaries of related entities |

**Key parameters:**
- The `graph_id` parameter is required; without it the Zep API returns a 400 error
- Uses the `rrf` (Reciprocal Rank Fusion) reranker, which is stable and reliable
- Uses a thread pool to run both searches in parallel for efficiency

**Returned data structure:**

```python
{
    "facts": [...],           # list of facts (from edges)
    "node_summaries": [...],  # summaries of related nodes (from nodes)
    "context": "..."          # combined context text
}
```

### LLM Generation and JSON Repair

To avoid parse failures on LLM-generated JSON, the following optimizations were implemented:

1. **No max_tokens limit**: lets the LLM generate freely, making full use of the model's context window
2. **Multi-attempt retry**: up to 3 attempts, lowering the temperature on each retry
3. **Truncation detection and repair**: detects `finish_reason='length'` and automatically closes the JSON
4. **Comprehensive JSON repair**:
   - `_fix_truncated_json()`: repairs truncated JSON (closes brackets and strings)
   - `_try_fix_json()`: multi-level repair strategy
     - extract the JSON portion
     - replace newlines inside strings
     - strip control characters
     - extract partial information from broken JSON
5. **Field validation**: ensures required fields exist, filling missing ones from entity_summary

**Error-handling flow**:

```
LLM call → truncation check → JSON parse → repair attempts → partial extraction → rule-based generation
```
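The truncation-repair idea can be sketched as follows. This is a minimal illustration of the technique, not the project's actual `_fix_truncated_json()` implementation; the helper name and behavior here are assumptions:

```python
import json

def fix_truncated_json(text: str) -> str:
    """Close an unterminated string and any unclosed brackets in truncated JSON."""
    in_string = False
    escape = False
    stack = []  # closing brackets still owed, in opening order
    for ch in text:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'  # close the dangling string first
    # Drop a trailing comma left by truncation, then close open brackets
    text = text.rstrip().rstrip(",")
    return text + "".join(reversed(stack))

repaired = fix_truncated_json('{"name": "张三", "tags": ["a", "b')
print(json.loads(repaired))  # {'name': '张三', 'tags': ['a', 'b']}
```

Partial-field extraction and control-character stripping would sit on top of this as the later repair levels.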

### Parallel Generation with Real-Time Output

Agent personas can be generated in parallel to speed things up:

```python
profiles = generator.generate_profiles_from_entities(
    entities=filtered.entities,
    use_llm=True,
    graph_id="mirofish_xxx",
    parallel_count=5  # number of parallel generations, default 5
)
```

**API parameters**:

```json
POST /api/simulation/prepare
{
    "simulation_id": "sim_xxx",
    "parallel_profile_count": 5,   // optional, number of personas generated in parallel, default 5
    "force_regenerate": false      // optional, force regeneration, default false
}
```

**Real-time output**:
- Each persona is printed to the console as soon as it is generated (full content, never truncated)
- Includes username, bio, detailed persona, age, gender, MBTI, and more
- Makes it easy to monitor generation progress and quality in real time
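Internally, this kind of bounded parallel generation can be sketched with a thread pool. This is an illustrative stand-in (the `generate_one` callback and function name are assumptions, not the project's real method):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_profiles_parallel(entities, generate_one, parallel_count=5):
    """Generate one profile per entity, parallel_count at a time, printing as each finishes."""
    profiles = [None] * len(entities)
    with ThreadPoolExecutor(max_workers=parallel_count) as executor:
        # Map each future back to its entity's index so output order is preserved
        futures = {executor.submit(generate_one, e): i for i, e in enumerate(entities)}
        for future in as_completed(futures):
            i = futures[future]
            profiles[i] = future.result()
            print(f"[{i + 1}/{len(entities)}] generated profile: {profiles[i]}")
    return profiles

# Example with a trivial stand-in generator (print order follows completion order)
result = generate_profiles_parallel(["张三", "李四"], lambda name: {"username": name})
```

The index mapping keeps the returned list aligned with the input entities even though completions arrive out of order.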

### Avoiding Redundant Generation

The system automatically detects completed preparation work to avoid regenerating it:

**Detection conditions**:
1. `state.json` exists and `config_generated=true`
2. Required files exist: `reddit_profiles.json`, `twitter_profiles.csv`, `simulation_config.json`

**API response**:

```json
// When preparation is already complete
{
    "success": true,
    "data": {
        "simulation_id": "sim_xxx",
        "status": "ready",
        "message": "已有完成的准备工作,无需重复生成",
        "already_prepared": true,
        "prepare_info": {
            "entities_count": 93,
            "profiles_count": 93,
            "entity_types": ["Student", "Professor", ...],
            "existing_files": [...]
        }
    }
}
```

**Forcing regeneration**:

```json
POST /api/simulation/prepare
{
    "simulation_id": "sim_xxx",
    "force_regenerate": true   // ignore existing preparation and regenerate
}
```
### Entity Type Classification

```python
# Individual entities - generate a concrete person persona
INDIVIDUAL_ENTITY_TYPES = [
    "student", "alumni", "professor", "person", "publicfigure",
    "expert", "faculty", "official", "journalist", "activist"
]

# Group/institution entities - generate an official-account persona
GROUP_ENTITY_TYPES = [
    "university", "governmentagency", "organization", "ngo",
    "mediaoutlet", "company", "institution", "group", "community"
]
```
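A small helper for routing a raw entity type label to the right prompt family might look like this (an illustrative sketch; the `classify_entity_type` name and the actual dispatch logic are assumptions):

```python
INDIVIDUAL_ENTITY_TYPES = [
    "student", "alumni", "professor", "person", "publicfigure",
    "expert", "faculty", "official", "journalist", "activist"
]
GROUP_ENTITY_TYPES = [
    "university", "governmentagency", "organization", "ngo",
    "mediaoutlet", "company", "institution", "group", "community"
]

def classify_entity_type(entity_type: str) -> str:
    """Return 'individual', 'group', or 'unknown' for a raw entity type label."""
    t = entity_type.strip().lower()  # labels arrive CamelCased, lists are lowercase
    if t in INDIVIDUAL_ENTITY_TYPES:
        return "individual"
    if t in GROUP_ENTITY_TYPES:
        return "group"
    return "unknown"

print(classify_entity_type("Student"))      # individual
print(classify_entity_type("MediaOutlet"))  # group
```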
### Profile Data Structure

@ -1071,7 +1227,7 @@ class OasisAgentProfile:
    user_name: str   # username
    name: str        # display name
    bio: str         # bio (max 150 characters)
    persona: str     # detailed persona description (500+ characters)

    # Reddit fields
    karma: int = 1000

@ -1094,6 +1250,37 @@ class OasisAgentProfile:
    source_entity_type: Optional[str] = None
```
### 详细人设生成示例
|
||||||
|
|
||||||
|
**个人实体人设结构:**
|
||||||
|
```markdown
|
||||||
|
## 一、基本信息
|
||||||
|
- 姓名/称呼、年龄、职业/身份
|
||||||
|
- 教育背景、所在地
|
||||||
|
|
||||||
|
## 二、人物背景
|
||||||
|
- 过去的重要经历
|
||||||
|
- 与事件的关联
|
||||||
|
- 社会关系网络
|
||||||
|
|
||||||
|
## 三、性格特征
|
||||||
|
- MBTI类型及表现
|
||||||
|
- 核心性格特点
|
||||||
|
- 情绪表达方式
|
||||||
|
|
||||||
|
## 四、社交媒体行为模式
|
||||||
|
- 发帖频率和时间
|
||||||
|
- 内容偏好类型
|
||||||
|
- 语言风格特点
|
||||||
|
|
||||||
|
## 五、立场与观点
|
||||||
|
- 对核心话题的态度
|
||||||
|
- 可能被激怒/感动的内容
|
||||||
|
|
||||||
|
## 六、独特特征
|
||||||
|
- 口头禅、个人爱好等
|
||||||
|
```
|
||||||
|
|
||||||
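The six sections can be stitched into a generation prompt mechanically. This is an illustrative sketch only; the project's real prompt text and any helper named `build_persona_prompt` are assumptions:

```python
PERSONA_SECTIONS = [
    "一、基本信息", "二、人物背景", "三、性格特征",
    "四、社交媒体行为模式", "五、立场与观点", "六、独特特征",
]

def build_persona_prompt(entity_name: str, context: str) -> str:
    """Assemble a prompt asking the LLM for a 500+ character persona covering all six sections."""
    outline = "\n".join(f"## {s}" for s in PERSONA_SECTIONS)
    return (
        f"基于以下背景信息,为“{entity_name}”生成一份500字以上的详细人设,"
        f"按下列结构输出:\n{outline}\n\n背景信息:\n{context}"
    )

prompt = build_persona_prompt("张三", "示例背景")
```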
### Profile Generation Strategies

**1. LLM generation (default)**

@ -1135,12 +1322,31 @@ Generate a social media user profile with:

An LLM analyzes the simulation requirements, document content, and graph entity information to automatically produce the best simulation parameter configuration.

**A stepwise generation strategy is used** (to avoid failures from generating overly long content in one shot):
1. Generate the time configuration (lightweight)
2. Generate the event configuration and trending topics
3. Generate agent configurations in batches (**5 per batch**, to preserve generation quality)
4. Generate the platform configuration

| Method | Description |
|------|------|
| `generate_config(...)` | Intelligently generate the full simulation configuration (stepwise) |
| `_generate_time_config(...)` | Generate the time configuration |
| `_generate_event_config(...)` | Generate the event configuration |
| `_generate_agent_configs_batch(...)` | Generate agent configurations in batches |
| `_generate_agent_config_by_rule(...)` | Rule-based generation (fallback when the LLM fails) |
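The stepwise flow can be sketched as follows. Method names mirror the table above, but the orchestration function and the stand-in generator here are illustrative assumptions, not the project's code:

```python
def generate_config_stepwise(generator, entities, batch_size=5):
    """Assemble the full simulation config in small, independently generated steps."""
    config = {}
    config["time"] = generator._generate_time_config()        # step 1: lightweight
    config["events"] = generator._generate_event_config()     # step 2: events + trending topics
    config["agents"] = []
    for i in range(0, len(entities), batch_size):             # step 3: 5 agents per batch
        batch = entities[i:i + batch_size]
        config["agents"].extend(generator._generate_agent_configs_batch(batch))
    config["platform"] = {"name": "oasis"}                    # step 4: platform config (stand-in)
    return config

# Minimal stand-in generator to demonstrate the flow
class StubGenerator:
    def _generate_time_config(self):
        return {"minutes_per_round": 30}
    def _generate_event_config(self):
        return {"hot_topics": []}
    def _generate_agent_configs_batch(self, batch):
        return [{"entity": e} for e in batch]

cfg = generate_config_stepwise(StubGenerator(), list(range(12)))
print(len(cfg["agents"]))  # 12
```

Keeping each step small means a single LLM failure only invalidates one step or one batch, not the whole configuration.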

### Chinese Daily-Rhythm Time Configuration

The system targets Chinese user groups and models activity patterns on Beijing time:

| Period | Time range | Activity multiplier | Description |
|------|----------|------------|------|
| Late night | 0:00-5:59 | 0.05 | Almost no activity |
| Morning | 6:00-8:59 | 0.4 | Gradually waking up |
| Work hours | 9:00-18:59 | 0.7 | Moderately active during work |
| Peak | 19:00-22:59 | 1.5 | Most active in the evening |
| Night | 23:00-23:59 | 0.5 | Activity winding down |
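The table maps directly onto a lookup helper (a minimal sketch; in practice these values may live in `TimeSimulationConfig` rather than being hard-coded):

```python
def activity_multiplier(hour: int) -> float:
    """Return the activity multiplier for an hour of day (Beijing time), per the table above."""
    if 0 <= hour <= 5:
        return 0.05   # late night: almost no activity
    if 6 <= hour <= 8:
        return 0.4    # morning: gradually waking up
    if 9 <= hour <= 18:
        return 0.7    # work hours: moderately active
    if 19 <= hour <= 22:
        return 1.5    # evening peak: most active
    return 0.5        # 23:00-23:59: winding down

print(activity_multiplier(21))  # 1.5
```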
### Configuration Content Generated by the LLM

@ -1152,10 +1358,14 @@ class TimeSimulationConfig:
    minutes_per_round: int = 30                      # simulated minutes per round
    agents_per_hour_min: int = 5                     # min agents activated per hour
    agents_per_hour_max: int = 20                    # max agents activated per hour
    peak_hours: List[int] = [19, 20, 21, 22]         # peak hours (evening)
    off_peak_hours: List[int] = [0, 1, 2, 3, 4, 5]   # off-peak hours (early morning)
    peak_activity_multiplier: float = 1.5            # peak activity multiplier
    off_peak_activity_multiplier: float = 0.05       # extremely low early-morning activity
    morning_hours: List[int] = [6, 7, 8]             # morning hours
    morning_activity_multiplier: float = 0.4
    work_hours: List[int] = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]  # work hours
    work_activity_multiplier: float = 0.7
```
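One plausible way these fields combine is to scale the base hourly activation count by the current hour's multiplier and prorate by round length. This is an illustrative sketch under that assumption; the actual scheduler may work differently:

```python
import random

def agents_to_activate(hour, minutes_per_round=30, low=5, high=20):
    """Pick how many agents to activate this round, scaled by hour-of-day activity."""
    if hour in [19, 20, 21, 22]:
        multiplier = 1.5       # peak_activity_multiplier
    elif hour in [0, 1, 2, 3, 4, 5]:
        multiplier = 0.05      # off_peak_activity_multiplier
    elif hour in [6, 7, 8]:
        multiplier = 0.4       # morning_activity_multiplier
    elif 9 <= hour <= 18:
        multiplier = 0.7       # work_activity_multiplier
    else:
        multiplier = 0.5       # 23:00-23:59
    base = random.randint(low, high)  # agents_per_hour_min..agents_per_hour_max
    # Prorate the hourly count down to one round's share of the hour
    return max(0, round(base * multiplier * minutes_per_round / 60))

print(agents_to_activate(20))  # e.g. somewhere between 4 and 15 for a 30-minute round
```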
**2. AgentActivityConfig (per-agent activity configuration)**

@ -1178,14 +1388,18 @@ class AgentActivityConfig:
    influence_weight: float = 1.0   # influence weight
```

**3. Default parameter differences by entity type (matching Chinese daily rhythms)**

| Entity type | Activity | Posting frequency | Active hours | Response delay | Influence |
|----------|--------|----------|----------|----------|--------|
| University/GovernmentAgency | 0.2 | 0.1/hour | 9:00-17:59 (work hours) | 60-240 min | 3.0 |
| MediaOutlet | 0.5 | 0.8/hour | 7:00-23:59 (all day) | 5-30 min | 2.5 |
| Professor/Expert | 0.4 | 0.3/hour | 8:00-21:59 (work + evening) | 15-90 min | 2.0 |
| Student | 0.8 | 0.6/hour | 8-13, 18-23 (morning + evening) | 1-15 min | 0.8 |
| Alumni | 0.6 | 0.4/hour | 12-13, 19-23 (lunch break + evening) | 5-30 min | 1.0 |
| Person (general public) | 0.7 | 0.5/hour | 9-13, 18-23 (daytime + evening) | 2-20 min | 1.0 |

**Note**: Between 0:00 and 5:00 virtually no entity type is active (matching typical Chinese sleep patterns)

---
@ -1228,8 +1442,41 @@ uploads/simulations/sim_xxxx/
```

**Important: the OASIS platforms require different profile formats:**

**Twitter CSV format** (per the official OASIS requirements):

```csv
user_id,name,username,user_char,description
0,张教授,professor_zhang,"完整人设描述(LLM内部使用)","简短简介(外部显示)"
```

- `user_id`: sequential ID starting from 0
- `name`: real name
- `username`: system username
- `user_char`: full persona (bio + persona), injected into the LLM system prompt to steer the agent's behavior
- `description`: short bio shown on the user's profile page

**Reddit JSON format**:

```json
[
    {
        "realname": "张教授",
        "username": "professor_zhang",
        "bio": "简短简介",
        "persona": "详细人设描述",
        "age": 42,
        "gender": "男",
        "mbti": "INTJ",
        "country": "中国",
        "profession": "教授",
        "interested_topics": ["高等教育", "学术诚信"]
    }
]
```

**user_char vs. description**:

| Field | Purpose | Visibility |
|------|------|--------|
| user_char | LLM system prompt; determines how the agent thinks and acts | internal only |
| description | Bio shown on the user's profile page | visible to other users |
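Because `user_char` can contain commas and newlines, writing the Twitter CSV needs proper quoting. A sketch using Python's csv module (illustrative, assuming a list of profile dicts with the field names shown above):

```python
import csv
import io

profiles = [
    {"name": "张教授", "username": "professor_zhang",
     "user_char": "完整人设描述, 含逗号", "description": "简短简介"},
]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)  # quotes only fields that need it
writer.writerow(["user_id", "name", "username", "user_char", "description"])
for i, p in enumerate(profiles):  # user_id is the sequential index, starting at 0
    writer.writerow([i, p["name"], p["username"], p["user_char"], p["description"]])

print(buf.getvalue())
```

In production the same rows would go to `twitter_profiles.csv` with `encoding='utf-8'` and `newline=''`.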

### Configuration File Example (simulation_config.json)
@ -213,6 +213,112 @@ def create_simulation():
        }), 500


def _check_simulation_prepared(simulation_id: str) -> tuple:
    """
    Check whether a simulation's preparation work is already complete.

    Conditions:
    1. state.json exists and its status is "ready"
    2. Required files exist: reddit_profiles.json, twitter_profiles.csv, simulation_config.json

    Args:
        simulation_id: simulation ID

    Returns:
        (is_prepared: bool, info: dict)
    """
    import json
    import os
    from datetime import datetime

    from ..config import Config

    simulation_dir = os.path.join(Config.OASIS_SIMULATION_DATA_DIR, simulation_id)

    # The simulation directory must exist
    if not os.path.exists(simulation_dir):
        return False, {"reason": "模拟目录不存在"}

    # Required files
    required_files = [
        "state.json",
        "simulation_config.json",
        "reddit_profiles.json",
        "twitter_profiles.csv",
        "run_reddit_simulation.py",
        "run_twitter_simulation.py",
        "run_parallel_simulation.py"
    ]

    # Determine which files exist
    existing_files = []
    missing_files = []
    for f in required_files:
        file_path = os.path.join(simulation_dir, f)
        if os.path.exists(file_path):
            existing_files.append(f)
        else:
            missing_files.append(f)

    if missing_files:
        return False, {
            "reason": "缺少必要文件",
            "missing_files": missing_files,
            "existing_files": existing_files
        }

    # Check the status recorded in state.json
    state_file = os.path.join(simulation_dir, "state.json")
    try:
        with open(state_file, 'r', encoding='utf-8') as f:
            state_data = json.load(f)

        status = state_data.get("status", "")

        # Treat "ready", or "preparing" with all files present, as complete
        if status in ["ready", "preparing"] and state_data.get("config_generated"):
            # Gather file statistics
            profiles_file = os.path.join(simulation_dir, "reddit_profiles.json")

            profiles_count = 0
            if os.path.exists(profiles_file):
                with open(profiles_file, 'r', encoding='utf-8') as f:
                    profiles_data = json.load(f)
                profiles_count = len(profiles_data) if isinstance(profiles_data, list) else 0

            # Status is still "preparing" but the files are complete: promote it to "ready"
            if status == "preparing":
                try:
                    state_data["status"] = "ready"
                    state_data["updated_at"] = datetime.now().isoformat()
                    with open(state_file, 'w', encoding='utf-8') as f:
                        json.dump(state_data, f, ensure_ascii=False, indent=2)
                    logger.info(f"自动更新模拟状态: {simulation_id} preparing -> ready")
                    status = "ready"
                except Exception as e:
                    logger.warning(f"自动更新状态失败: {e}")

            return True, {
                "status": status,
                "entities_count": state_data.get("entities_count", 0),
                "profiles_count": profiles_count,
                "entity_types": state_data.get("entity_types", []),
                "config_generated": state_data.get("config_generated", False),
                "created_at": state_data.get("created_at"),
                "updated_at": state_data.get("updated_at"),
                "existing_files": existing_files
            }
        else:
            return False, {
                "reason": f"状态不是ready: {status}",
                "status": status,
                "config_generated": state_data.get("config_generated", False)
            }

    except Exception as e:
        return False, {"reason": f"读取状态文件失败: {str(e)}"}

@simulation_bp.route('/prepare', methods=['POST'])
def prepare_simulation():
    """
@ -221,17 +327,25 @@ def prepare_simulation():
    This is a long-running operation; the endpoint returns a task_id immediately.
    Poll GET /api/simulation/prepare/status for progress.

    Features:
    - Automatically detects completed preparation work to avoid regenerating it
    - Returns the existing result directly if preparation is already complete
    - Supports forced regeneration (force_regenerate=true)

    Steps:
    1. Check whether completed preparation work already exists
    2. Read and filter entities from the Zep graph
    3. Generate an OASIS Agent Profile for each entity (with retries)
    4. Generate the simulation configuration with the LLM (with retries)
    5. Save configuration files and preset scripts

    Request (JSON):
    {
        "simulation_id": "sim_xxxx",                  // required, simulation ID
        "entity_types": ["Student", "PublicFigure"],  // optional, restrict entity types
        "use_llm_for_profiles": true,                 // optional, use the LLM for personas
        "parallel_profile_count": 5,                  // optional, parallel persona count, default 5
        "force_regenerate": false                     // optional, force regeneration, default false
    }

    Response:
@ -239,14 +353,17 @@ def prepare_simulation():
    {
        "success": true,
        "data": {
            "simulation_id": "sim_xxxx",
            "task_id": "task_xxxx",            // returned for a new task
            "status": "preparing|ready",
            "message": "准备任务已启动|已有完成的准备工作",
            "already_prepared": true|false     // whether preparation was already complete
        }
    }
    """
    import os
    import threading

    from ..config import Config
    from ..models.task import TaskManager, TaskStatus

    try:
        data = request.get_json() or {}
@ -267,6 +384,25 @@ def prepare_simulation():
                "error": f"模拟不存在: {simulation_id}"
            }), 404

        # Check whether forced regeneration was requested
        force_regenerate = data.get('force_regenerate', False)

        # Skip work that is already complete (avoid redundant generation)
        if not force_regenerate:
            is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
            if is_prepared:
                logger.info(f"模拟 {simulation_id} 已准备完成,跳过重复生成")
                return jsonify({
                    "success": True,
                    "data": {
                        "simulation_id": simulation_id,
                        "status": "ready",
                        "message": "已有完成的准备工作,无需重复生成",
                        "already_prepared": True,
                        "prepare_info": prepare_info
                    }
                })

        # Fetch required info from the project
        project = ProjectManager.get_project(state.project_id)
        if not project:
@ -288,6 +424,7 @@ def prepare_simulation():
        entity_types_list = data.get('entity_types')
        use_llm_for_profiles = data.get('use_llm_for_profiles', True)
        parallel_profile_count = data.get('parallel_profile_count', 5)

        # Create the async task
        task_manager = TaskManager()
@ -384,7 +521,8 @@ def prepare_simulation():
                document_text=document_text,
                defined_entity_types=entity_types_list,
                use_llm_for_profiles=use_llm_for_profiles,
                progress_callback=progress_callback,
                parallel_profile_count=parallel_profile_count
            )

            # Task complete
@ -414,7 +552,8 @@ def prepare_simulation():
                "simulation_id": simulation_id,
                "task_id": task_id,
                "status": "preparing",
                "message": "准备任务已启动,请通过 /api/simulation/prepare/status 查询进度",
                "already_prepared": False
            }
        })
@ -438,9 +577,14 @@ def get_prepare_status():
    """
    Query preparation task progress.

    Two query modes are supported:
    1. Query the progress of an in-flight task by task_id
    2. Check for completed preparation work by simulation_id

    Request (JSON):
    {
        "task_id": "task_xxxx",      // optional, the task_id returned by prepare
        "simulation_id": "sim_xxxx"  // optional, simulation ID (checks for completed preparation)
    }

    Response:
@ -448,21 +592,11 @@ def get_prepare_status():
    {
        "success": true,
        "data": {
            "task_id": "task_xxxx",
            "status": "processing|completed|ready",
            "progress": 45,
            "message": "...",
            "already_prepared": true|false,  // whether completed preparation already exists
            "prepare_info": {...}            // details when preparation is complete
        }
    }
    """
@ -472,24 +606,75 @@ def get_prepare_status():
        data = request.get_json() or {}

        task_id = data.get('task_id')
        simulation_id = data.get('simulation_id')

        # If simulation_id was provided, first check for completed preparation
        if simulation_id:
            is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
            if is_prepared:
                return jsonify({
                    "success": True,
                    "data": {
                        "simulation_id": simulation_id,
                        "status": "ready",
                        "progress": 100,
                        "message": "已有完成的准备工作",
                        "already_prepared": True,
                        "prepare_info": prepare_info
                    }
                })

        # Without a task_id, task progress cannot be queried
        if not task_id:
            if simulation_id:
                # simulation_id was given but preparation is not complete
                return jsonify({
                    "success": True,
                    "data": {
                        "simulation_id": simulation_id,
                        "status": "not_started",
                        "progress": 0,
                        "message": "尚未开始准备,请调用 /api/simulation/prepare 开始",
                        "already_prepared": False
                    }
                })
            return jsonify({
                "success": False,
                "error": "请提供 task_id 或 simulation_id"
            }), 400

        task_manager = TaskManager()
        task = task_manager.get_task(task_id)

        if not task:
            # Task not found; if a simulation_id is present, check for completed preparation
            if simulation_id:
                is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
                if is_prepared:
                    return jsonify({
                        "success": True,
                        "data": {
                            "simulation_id": simulation_id,
                            "task_id": task_id,
                            "status": "ready",
                            "progress": 100,
                            "message": "任务已完成(准备工作已存在)",
                            "already_prepared": True,
                            "prepare_info": prepare_info
                        }
                    })

            return jsonify({
                "success": False,
                "error": f"任务不存在: {task_id}"
            }), 404

        task_dict = task.to_dict()
        task_dict["already_prepared"] = False

        return jsonify({
            "success": True,
            "data": task_dict
        })

    except Exception as e:
@ -1,6 +1,11 @@
"""
OASIS Agent Profile Generator
Converts entities from the Zep graph into the Agent Profile format required by the OASIS simulation platform.

Improvements:
1. Calls Zep retrieval to further enrich node information
2. Optimized prompts that generate very detailed personas
3. Distinguishes individual entities from abstract group entities
"""

import json
@ -10,6 +15,7 @@ from dataclasses import dataclass, field
from datetime import datetime

from openai import OpenAI
from zep_cloud.client import Zep

from ..config import Config
from ..utils.logger import get_logger
@ -137,6 +143,11 @@ class OasisProfileGenerator:
    OASIS Profile generator

    Converts entities from the Zep graph into the Agent Profiles needed for OASIS simulation.

    Features:
    1. Calls Zep graph retrieval to gather richer context
    2. Generates very detailed personas (basic info, career history, personality traits, social-media behavior, etc.)
    3. Distinguishes individual entities from abstract group entities
    """

    # MBTI types
@ -153,11 +164,25 @@ class OasisProfileGenerator:
        "Canada", "Australia", "Brazil", "India", "South Korea"
    ]

    # Individual entities (get a concrete person persona)
    INDIVIDUAL_ENTITY_TYPES = [
        "student", "alumni", "professor", "person", "publicfigure",
        "expert", "faculty", "official", "journalist", "activist"
    ]

    # Group/institution entities (get a group-representative persona)
    GROUP_ENTITY_TYPES = [
        "university", "governmentagency", "organization", "ngo",
        "mediaoutlet", "company", "institution", "group", "community"
    ]

    def __init__(
        self,
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        model_name: Optional[str] = None,
        zep_api_key: Optional[str] = None,
        graph_id: Optional[str] = None
    ):
        self.api_key = api_key or Config.LLM_API_KEY
        self.base_url = base_url or Config.LLM_BASE_URL
@ -170,6 +195,17 @@ class OasisProfileGenerator:
            api_key=self.api_key,
            base_url=self.base_url
        )

        # Zep client, used to retrieve enriching context
        self.zep_api_key = zep_api_key or Config.ZEP_API_KEY
        self.zep_client = None
        self.graph_id = graph_id

        if self.zep_api_key:
            try:
                self.zep_client = Zep(api_key=self.zep_api_key)
            except Exception as e:
                logger.warning(f"Zep客户端初始化失败: {e}")

    def generate_profile_from_entity(
        self,
@ -245,28 +281,195 @@ class OasisProfileGenerator:
|
||||||
suffix = random.randint(100, 999)
|
suffix = random.randint(100, 999)
|
||||||
return f"{username}_{suffix}"
|
return f"{username}_{suffix}"
|
||||||
|
|
||||||
|
def _search_zep_for_entity(self, entity: EntityNode) -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
使用Zep图谱混合搜索功能获取实体相关的丰富信息
|
||||||
|
|
||||||
|
Zep没有内置混合搜索接口,需要分别搜索edges和nodes然后合并结果。
|
||||||
|
使用并行请求同时搜索,提高效率。
|
||||||
|
|
||||||
|
Args:
|
||||||
|
entity: 实体节点对象
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
包含facts, node_summaries, context的字典
|
||||||
|
"""
|
||||||
|
import concurrent.futures
|
||||||
|
|
||||||
|
if not self.zep_client:
|
||||||
|
return {"facts": [], "node_summaries": [], "context": ""}
|
||||||
|
|
||||||
|
entity_name = entity.name
|
||||||
|
|
||||||
|
results = {
|
||||||
|
"facts": [],
|
||||||
|
"node_summaries": [],
|
||||||
|
"context": ""
|
||||||
|
}
|
||||||
|
|
||||||
|
# 必须有graph_id才能进行搜索
|
||||||
|
if not self.graph_id:
|
||||||
|
            logger.debug(f"跳过Zep检索:未设置graph_id")
            return results

        comprehensive_query = f"关于{entity_name}的所有信息、活动、事件、关系和背景"

        def search_edges():
            """搜索边(事实/关系)"""
            try:
                return self.zep_client.graph.search(
                    query=comprehensive_query,
                    graph_id=self.graph_id,
                    limit=30,
                    scope="edges",
                    reranker="rrf"
                )
            except Exception as e:
                logger.debug(f"Zep边搜索失败: {e}")
                return None

        def search_nodes():
            """搜索节点(实体摘要)"""
            try:
                return self.zep_client.graph.search(
                    query=comprehensive_query,
                    graph_id=self.graph_id,
                    limit=20,
                    scope="nodes",
                    reranker="rrf"
                )
            except Exception as e:
                logger.debug(f"Zep节点搜索失败: {e}")
                return None

        try:
            # 并行执行edges和nodes搜索
            with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
                edge_future = executor.submit(search_edges)
                node_future = executor.submit(search_nodes)

                # 获取结果
                edge_result = edge_future.result(timeout=30)
                node_result = node_future.result(timeout=30)

            # 处理边搜索结果
            all_facts = set()
            if edge_result and hasattr(edge_result, 'edges') and edge_result.edges:
                for edge in edge_result.edges:
                    if hasattr(edge, 'fact') and edge.fact:
                        all_facts.add(edge.fact)
            results["facts"] = list(all_facts)

            # 处理节点搜索结果
            all_summaries = set()
            if node_result and hasattr(node_result, 'nodes') and node_result.nodes:
                for node in node_result.nodes:
                    if hasattr(node, 'summary') and node.summary:
                        all_summaries.add(node.summary)
                    if hasattr(node, 'name') and node.name and node.name != entity_name:
                        all_summaries.add(f"相关实体: {node.name}")
            results["node_summaries"] = list(all_summaries)

            # 构建综合上下文
            context_parts = []
            if results["facts"]:
                context_parts.append("事实信息:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
            if results["node_summaries"]:
                context_parts.append("相关实体:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
            results["context"] = "\n\n".join(context_parts)

            logger.info(f"Zep混合检索完成: {entity_name}, 获取 {len(results['facts'])} 条事实, {len(results['node_summaries'])} 个相关节点")

        except concurrent.futures.TimeoutError:
            logger.warning(f"Zep检索超时 ({entity_name})")
        except Exception as e:
            logger.warning(f"Zep检索失败 ({entity_name}): {e}")

        return results
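For reference, the edges/nodes fan-out above reduces to a small standalone pattern: submit two independent searches to a thread pool, wait on both futures with a timeout, and deduplicate through sets. The two stand-in search functions below are hypothetical placeholders for the real `self.zep_client.graph.search` calls:

```python
import concurrent.futures

def fan_out_search(entity_name: str) -> dict:
    """Run two independent searches in parallel and merge their results via sets."""
    def search_edges():
        # stand-in for a scope="edges" search; the duplicate fact mimics overlapping results
        return [f"{entity_name} attended event A", f"{entity_name} attended event A"]

    def search_nodes():
        # stand-in for a scope="nodes" search
        return [f"related: {entity_name}-org"]

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        edge_future = executor.submit(search_edges)
        node_future = executor.submit(search_nodes)
        facts = set(edge_future.result(timeout=30))      # set() drops the repeated fact
        summaries = set(node_future.result(timeout=30))

    return {"facts": sorted(facts), "node_summaries": sorted(summaries)}
```

As in the method above, a `timeout` on `Future.result` bounds how long a slow backend can stall profile generation.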
    def _build_entity_context(self, entity: EntityNode) -> str:
        """
        构建实体的完整上下文信息

        包括:
        1. 实体本身的边信息(事实)
        2. 关联节点的详细信息
        3. Zep混合检索到的丰富信息
        """
        context_parts = []
        # 1. 添加实体属性信息
        if entity.attributes:
            attrs = []
            for key, value in entity.attributes.items():
                if value and str(value).strip():
                    attrs.append(f"- {key}: {value}")
            if attrs:
                context_parts.append("### 实体属性\n" + "\n".join(attrs))

        # 2. 添加相关边信息(事实/关系)
        existing_facts = set()
        if entity.related_edges:
            relationships = []
            for edge in entity.related_edges:  # 不限制数量
                fact = edge.get("fact", "")
                edge_name = edge.get("edge_name", "")
                direction = edge.get("direction", "")

                if fact:
                    relationships.append(f"- {fact}")
                    existing_facts.add(fact)
                elif edge_name:
                    if direction == "outgoing":
                        relationships.append(f"- {entity.name} --[{edge_name}]--> (相关实体)")
                    else:
                        relationships.append(f"- (相关实体) --[{edge_name}]--> {entity.name}")

            if relationships:
                context_parts.append("### 相关事实和关系\n" + "\n".join(relationships))

        # 3. 添加关联节点的详细信息
        if entity.related_nodes:
            related_info = []
            for node in entity.related_nodes:  # 不限制数量
                node_name = node.get("name", "")
                node_labels = node.get("labels", [])
                node_summary = node.get("summary", "")

                # 过滤掉默认标签
                custom_labels = [l for l in node_labels if l not in ["Entity", "Node"]]
                label_str = f" ({', '.join(custom_labels)})" if custom_labels else ""

                if node_summary:
                    related_info.append(f"- **{node_name}**{label_str}: {node_summary}")
                else:
                    related_info.append(f"- **{node_name}**{label_str}")

            if related_info:
                context_parts.append("### 关联实体信息\n" + "\n".join(related_info))

        # 4. 使用Zep混合检索获取更丰富的信息
        zep_results = self._search_zep_for_entity(entity)

        if zep_results.get("facts"):
            # 去重:排除已存在的事实
            new_facts = [f for f in zep_results["facts"] if f not in existing_facts]
            if new_facts:
                context_parts.append("### Zep检索到的事实信息\n" + "\n".join(f"- {f}" for f in new_facts[:15]))

        if zep_results.get("node_summaries"):
            context_parts.append("### Zep检索到的相关节点\n" + "\n".join(f"- {s}" for s in zep_results["node_summaries"][:10]))

        return "\n\n".join(context_parts)

    def _is_individual_entity(self, entity_type: str) -> bool:
        """判断是否是个人类型实体"""
        return entity_type.lower() in self.INDIVIDUAL_ENTITY_TYPES

    def _is_group_entity(self, entity_type: str) -> bool:
        """判断是否是群体/机构类型实体"""
        return entity_type.lower() in self.GROUP_ENTITY_TYPES
    def _generate_profile_with_llm(
        self,
        entity_name: str,

@@ -275,63 +478,271 @@ class OasisProfileGenerator:

        entity_attributes: Dict[str, Any],
        context: str
    ) -> Dict[str, Any]:
        """
        使用LLM生成非常详细的人设

        根据实体类型区分:
        - 个人实体:生成具体的人物设定
        - 群体/机构实体:生成代表性账号设定
        """

        is_individual = self._is_individual_entity(entity_type)

        if is_individual:
            prompt = self._build_individual_persona_prompt(
                entity_name, entity_type, entity_summary, entity_attributes, context
            )
        else:
            prompt = self._build_group_persona_prompt(
                entity_name, entity_type, entity_summary, entity_attributes, context
            )

        # 尝试多次生成,直到成功或达到最大重试次数
        max_attempts = 3
        last_error = None

        for attempt in range(max_attempts):
            try:
                response = self.client.chat.completions.create(
                    model=self.model_name,
                    messages=[
                        {"role": "system", "content": self._get_system_prompt(is_individual)},
                        {"role": "user", "content": prompt}
                    ],
                    response_format={"type": "json_object"},
                    temperature=0.7 - (attempt * 0.1)  # 每次重试降低温度
                    # 不设置max_tokens,让LLM自由发挥
                )

                content = response.choices[0].message.content

                # 检查是否被截断(finish_reason不是'stop')
                finish_reason = response.choices[0].finish_reason
                if finish_reason == 'length':
                    logger.warning(f"LLM输出被截断 (attempt {attempt+1}), 尝试修复...")
                    content = self._fix_truncated_json(content)

                # 尝试解析JSON
                try:
                    result = json.loads(content)

                    # 验证必需字段
                    if "bio" not in result or not result["bio"]:
                        result["bio"] = entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}"
                    if "persona" not in result or not result["persona"]:
                        result["persona"] = entity_summary or f"{entity_name}是一个{entity_type}。"

                    return result

                except json.JSONDecodeError as je:
                    logger.warning(f"JSON解析失败 (attempt {attempt+1}): {str(je)[:80]}")

                    # 尝试修复JSON
                    result = self._try_fix_json(content, entity_name, entity_type, entity_summary)
                    if result.get("_fixed"):
                        del result["_fixed"]
                        return result

                    last_error = je

            except Exception as e:
                logger.warning(f"LLM调用失败 (attempt {attempt+1}): {str(e)[:80]}")
                last_error = e
                import time
                time.sleep(1 * (attempt + 1))  # 线性退避,等待时间逐次递增

        logger.warning(f"LLM生成人设失败({max_attempts}次尝试): {last_error}, 使用规则生成")
        return self._generate_profile_rule_based(
            entity_name, entity_type, entity_summary, entity_attributes
        )
    def _fix_truncated_json(self, content: str) -> str:
        """修复被截断的JSON(输出被max_tokens限制截断)"""
        # 如果JSON被截断,尝试闭合它
        content = content.strip()

        # 计算未闭合的括号
        open_braces = content.count('{') - content.count('}')
        open_brackets = content.count('[') - content.count(']')

        # 检查是否有未闭合的字符串
        # 简单检查:如果最后一个字符不是引号、逗号或闭合括号,可能是字符串被截断
        if content and content[-1] not in '",}]':
            # 尝试闭合字符串
            content += '"'

        # 闭合括号
        content += ']' * open_brackets
        content += '}' * open_braces

        return content
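The closing logic above can be exercised on its own. This sketch replicates it as a free function and repairs a response cut off mid-array; the sample input is made up:

```python
import json

def fix_truncated_json(content: str) -> str:
    """Close an LLM JSON response that was cut off by a token limit."""
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'          # close a string value truncated mid-word
    content += ']' * open_brackets
    content += '}' * open_braces
    return content

truncated = '{"bio": "记者", "interested_topics": ["时政", "社会'
repaired = fix_truncated_json(truncated)
print(json.loads(repaired))  # {'bio': '记者', 'interested_topics': ['时政', '社会']}
```

Note the heuristic is deliberately crude: it assumes the truncation happened inside a string or right after a value, which is the common case for `finish_reason == 'length'`, and it makes no attempt to repair JSON that was malformed for other reasons.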
    def _try_fix_json(self, content: str, entity_name: str, entity_type: str, entity_summary: str = "") -> Dict[str, Any]:
        """尝试修复损坏的JSON"""
        import re

        # 1. 首先尝试修复被截断的情况
        content = self._fix_truncated_json(content)

        # 2. 尝试提取JSON部分
        json_match = re.search(r'\{[\s\S]*\}', content)
        if json_match:
            json_str = json_match.group()

            # 3. 处理字符串中的换行符问题
            # 找到所有字符串值并替换其中的换行符
            def fix_string_newlines(match):
                s = match.group(0)
                # 替换字符串内的实际换行符为空格
                s = s.replace('\n', ' ').replace('\r', ' ')
                # 替换多余空格
                s = re.sub(r'\s+', ' ', s)
                return s

            # 匹配JSON字符串值
            json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string_newlines, json_str)

            # 4. 尝试解析
            try:
                result = json.loads(json_str)
                result["_fixed"] = True
                return result
            except json.JSONDecodeError:
                # 5. 如果还是失败,尝试更激进的修复
                try:
                    # 移除所有控制字符
                    json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
                    # 替换所有连续空白
                    json_str = re.sub(r'\s+', ' ', json_str)
                    result = json.loads(json_str)
                    result["_fixed"] = True
                    return result
                except Exception:
                    pass

        # 6. 尝试从内容中提取部分信息
        bio_match = re.search(r'"bio"\s*:\s*"([^"]*)"', content)
        persona_match = re.search(r'"persona"\s*:\s*"([^"]*)', content)  # 可能被截断

        bio = bio_match.group(1) if bio_match else (entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}")
        persona = persona_match.group(1) if persona_match else (entity_summary or f"{entity_name}是一个{entity_type}。")

        # 如果提取到了有意义的内容,标记为已修复
        if bio_match or persona_match:
            logger.info("从损坏的JSON中提取了部分信息")
            return {
                "bio": bio,
                "persona": persona,
                "_fixed": True
            }

        # 7. 完全失败,返回基础结构
        logger.warning("JSON修复失败,返回基础结构")
        return {
            "bio": entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}",
            "persona": entity_summary or f"{entity_name}是一个{entity_type}。"
        }
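Step 3 above hinges on one regex, `"[^"\\]*(?:\\.[^"\\]*)*"`, which matches a whole JSON string literal while honoring backslash escapes, so raw newlines get collapsed only inside string values and never between tokens. A standalone sketch of that repair:

```python
import json
import re

def collapse_newlines_in_strings(json_str: str) -> str:
    """Replace raw newlines inside JSON string literals with single spaces."""
    def fix(match):
        s = match.group(0).replace('\n', ' ').replace('\r', ' ')
        return re.sub(r'\s+', ' ', s)
    # matches one JSON string literal, including escaped quotes like \"
    return re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix, json_str)

# a raw newline inside a string value is illegal JSON and breaks json.loads
broken = '{"persona": "第一行\n第二行"}'
print(json.loads(collapse_newlines_in_strings(broken)))  # {'persona': '第一行 第二行'}
```

Keys are also string literals and pass through the same substitution, but since they contain no newlines they come out unchanged.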
    def _get_system_prompt(self, is_individual: bool) -> str:
        """获取系统提示词"""
        base_prompt = "你是社交媒体用户画像生成专家。生成详细、真实的人设用于舆论模拟,最大程度还原已有现实情况。必须返回有效的JSON格式,所有字符串值不能包含未转义的换行符。使用中文。"
        return base_prompt
    def _build_individual_persona_prompt(
        self,
        entity_name: str,
        entity_type: str,
        entity_summary: str,
        entity_attributes: Dict[str, Any],
        context: str
    ) -> str:
        """构建个人实体的详细人设提示词"""

        attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else "无"
        context_str = context[:3000] if context else "无额外上下文"

        return f"""为实体生成详细的社交媒体用户人设,最大程度还原已有现实情况。

实体名称: {entity_name}
实体类型: {entity_type}
实体摘要: {entity_summary}
实体属性: {attrs_str}

上下文信息:
{context_str}

请生成JSON,包含以下字段:

1. bio: 社交媒体简介,200字
2. persona: 详细人设描述(2000字的纯文本),需包含:
   - 基本信息(年龄、职业、教育背景、所在地)
   - 人物背景(重要经历、与事件的关联、社会关系)
   - 性格特征(MBTI类型、核心性格、情绪表达方式)
   - 社交媒体行为(发帖频率、内容偏好、互动风格、语言特点)
   - 立场观点(对话题的态度、可能被激怒/感动的内容)
   - 独特特征(口头禅、特殊经历、个人爱好)
   - 个人记忆(人设的重要部分,要介绍这个个体与事件的关联,以及这个个体在事件中的已有动作与反应)
3. age: 年龄数字
4. gender: 性别(男/女)
5. mbti: MBTI类型
6. country: 国家
7. profession: 职业
8. interested_topics: 感兴趣话题数组

重要:
- 所有字段值必须是字符串或数字,不要使用换行符
- persona必须是一段连贯的文字描述
- 使用中文
- 内容要与实体信息保持一致"""
    def _build_group_persona_prompt(
        self,
        entity_name: str,
        entity_type: str,
        entity_summary: str,
        entity_attributes: Dict[str, Any],
        context: str
    ) -> str:
        """构建群体/机构实体的详细人设提示词"""

        attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else "无"
        context_str = context[:3000] if context else "无额外上下文"

        return f"""为机构/群体实体生成详细的社交媒体账号设定,最大程度还原已有现实情况。

实体名称: {entity_name}
实体类型: {entity_type}
实体摘要: {entity_summary}
实体属性: {attrs_str}

上下文信息:
{context_str}

请生成JSON,包含以下字段:

1. bio: 官方账号简介,200字,专业得体
2. persona: 详细账号设定描述(2000字的纯文本),需包含:
   - 机构基本信息(正式名称、机构性质、成立背景、主要职能)
   - 账号定位(账号类型、目标受众、核心功能)
   - 发言风格(语言特点、常用表达、禁忌话题)
   - 发布内容特点(内容类型、发布频率、活跃时间段)
   - 立场态度(对核心话题的官方立场、面对争议的处理方式)
   - 特殊说明(代表的群体画像、运营习惯)
   - 机构记忆(机构人设的重要部分,要介绍这个机构与事件的关联,以及这个机构在事件中的已有动作与反应)
3. age: null(机构不适用)
4. gender: null(机构不适用)
5. mbti: 可选,用于描述账号风格,如ISTJ代表严谨保守
6. country: 国家
7. profession: 机构职能描述
8. interested_topics: 关注领域数组

重要:
- 所有字段值必须是字符串、数字或null
- persona必须是一段连贯的文字描述,不要使用换行符
- 使用中文
- 机构账号发言要符合其身份定位"""
    def _generate_profile_rule_based(
        self,

@@ -398,29 +809,46 @@ Important:

        "interested_topics": ["General", "Social Issues"],
        }

    def set_graph_id(self, graph_id: str):
        """设置图谱ID用于Zep检索"""
        self.graph_id = graph_id

    def generate_profiles_from_entities(
        self,
        entities: List[EntityNode],
        use_llm: bool = True,
        progress_callback: Optional[callable] = None,
        graph_id: Optional[str] = None,
        parallel_count: int = 5
    ) -> List[OasisAgentProfile]:
        """
        批量从实体生成Agent Profile(支持并行生成)

        Args:
            entities: 实体列表
            use_llm: 是否使用LLM生成详细人设
            progress_callback: 进度回调函数 (current, total, message)
            graph_id: 图谱ID,用于Zep检索获取更丰富上下文
            parallel_count: 并行生成数量,默认5

        Returns:
            Agent Profile列表
        """
        import concurrent.futures
        from threading import Lock

        # 设置graph_id用于Zep检索
        if graph_id:
            self.graph_id = graph_id

        total = len(entities)
        profiles = [None] * total  # 预分配列表保持顺序
        completed_count = [0]  # 使用列表以便在闭包中修改
        lock = Lock()

        def generate_single_profile(idx: int, entity: EntityNode) -> tuple:
            """生成单个profile的工作函数"""
            entity_type = entity.get_entity_type() or "Entity"

            try:
                profile = self.generate_profile_from_entity(

@@ -428,23 +856,115 @@

                    user_id=idx,
                    use_llm=use_llm
                )

                # 实时输出生成的人设到控制台和日志
                self._print_generated_profile(entity.name, entity_type, profile)

                return idx, profile, None

            except Exception as e:
                logger.error(f"生成实体 {entity.name} 的人设失败: {str(e)}")
                # 创建一个基础profile
                fallback_profile = OasisAgentProfile(
                    user_id=idx,
                    user_name=self._generate_username(entity.name),
                    name=entity.name,
                    bio=f"{entity_type}: {entity.name}",
                    persona=entity.summary or "A participant in social discussions.",
                    source_entity_uuid=entity.uuid,
                    source_entity_type=entity_type,
                )
                return idx, fallback_profile, str(e)

        logger.info(f"开始并行生成 {total} 个Agent人设(并行数: {parallel_count})...")
        print(f"\n{'='*60}")
        print(f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}")
        print(f"{'='*60}\n")

        # 使用线程池并行执行
        with concurrent.futures.ThreadPoolExecutor(max_workers=parallel_count) as executor:
            # 提交所有任务
            future_to_entity = {
                executor.submit(generate_single_profile, idx, entity): (idx, entity)
                for idx, entity in enumerate(entities)
            }

            # 收集结果
            for future in concurrent.futures.as_completed(future_to_entity):
                idx, entity = future_to_entity[future]
                entity_type = entity.get_entity_type() or "Entity"

                try:
                    result_idx, profile, error = future.result()
                    profiles[result_idx] = profile

                    with lock:
                        completed_count[0] += 1
                        current = completed_count[0]

                    if progress_callback:
                        progress_callback(
                            current,
                            total,
                            f"已完成 {current}/{total}: {entity.name}({entity_type})"
                        )

                    if error:
                        logger.warning(f"[{current}/{total}] {entity.name} 使用备用人设: {error}")
                    else:
                        logger.info(f"[{current}/{total}] 成功生成人设: {entity.name} ({entity_type})")

                except Exception as e:
                    logger.error(f"处理实体 {entity.name} 时发生异常: {str(e)}")
                    with lock:
                        completed_count[0] += 1
                    profiles[idx] = OasisAgentProfile(
                        user_id=idx,
                        user_name=self._generate_username(entity.name),
                        name=entity.name,
                        bio=f"{entity_type}: {entity.name}",
                        persona=entity.summary or "A participant in social discussions.",
                        source_entity_uuid=entity.uuid,
                        source_entity_type=entity_type,
                    )

        print(f"\n{'='*60}")
        print(f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent")
        print(f"{'='*60}\n")

        return profiles
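The method above keeps results in input order despite out-of-order completion: it pre-allocates `profiles = [None] * total` and writes each result back at the index it was submitted with, rather than appending in completion order. A toy reduction of that pattern, with `str.upper` standing in for the real profile-generation call:

```python
import concurrent.futures

def run_ordered(items, worker, parallel_count=5):
    """Run worker over items in parallel, preserving input order in the result list."""
    results = [None] * len(items)
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel_count) as executor:
        # remember each future's submission index
        future_to_idx = {executor.submit(worker, item): idx
                         for idx, item in enumerate(items)}
        for future in concurrent.futures.as_completed(future_to_idx):
            # write back at the submitted index, not in completion order
            results[future_to_idx[future]] = future.result()
    return results

print(run_ordered(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```

This is why `user_id=idx` stays stable regardless of which LLM call finishes first.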

    def _print_generated_profile(self, entity_name: str, entity_type: str, profile: OasisAgentProfile):
        """实时输出生成的人设到控制台(完整内容,不截断)"""
        separator = "-" * 70

        # 构建完整输出内容(不截断)
        topics_str = ', '.join(profile.interested_topics) if profile.interested_topics else '无'

        output_lines = [
            f"\n{separator}",
            f"[已生成] {entity_name} ({entity_type})",
            separator,
            f"用户名: {profile.user_name}",
            "",
            "【简介】",
            f"{profile.bio}",
            "",
            "【详细人设】",
            f"{profile.persona}",
            "",
            "【基本属性】",
            f"年龄: {profile.age} | 性别: {profile.gender} | MBTI: {profile.mbti}",
            f"职业: {profile.profession} | 国家: {profile.country}",
            f"兴趣话题: {topics_str}",
            separator
        ]

        output = "\n".join(output_lines)

        # 只输出到控制台(避免重复,logger不再输出完整内容)
        print(output)
    def save_profiles(
        self,
        profiles: List[OasisAgentProfile],

@@ -470,10 +990,18 @@

    def _save_twitter_csv(self, profiles: List[OasisAgentProfile], file_path: str):
        """
        保存Twitter Profile为CSV格式(符合OASIS官方要求)

        OASIS Twitter要求的CSV字段:
        - user_id: 用户ID(根据CSV顺序从0开始)
        - name: 用户真实姓名
        - username: 系统中的用户名
        - user_char: 详细人设描述(注入到LLM系统提示中,指导Agent行为)
        - description: 简短的公开简介(显示在用户资料页面)

        user_char vs description 区别:
        - user_char: 内部使用,LLM系统提示,决定Agent如何思考和行动
        - description: 外部显示,其他用户可见的简介
        """
        import csv

@@ -484,28 +1012,32 @@
        with open(file_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)

            # 写入OASIS要求的表头
            headers = ['user_id', 'name', 'username', 'user_char', 'description']
            writer.writerow(headers)

            # 写入数据行
            for idx, profile in enumerate(profiles):
                # user_char: 完整人设(bio + persona),用于LLM系统提示
                user_char = profile.bio
                if profile.persona and profile.persona != profile.bio:
                    user_char = f"{profile.bio} {profile.persona}"
                # 处理换行符(CSV中用空格替代)
                user_char = user_char.replace('\n', ' ').replace('\r', ' ')

                # description: 简短简介,用于外部显示
                description = profile.bio.replace('\n', ' ').replace('\r', ' ')

                row = [
                    idx,  # user_id: 从0开始的顺序ID
                    profile.name,  # name: 真实姓名
                    profile.user_name,  # username: 用户名
                    user_char,  # user_char: 完整人设(内部LLM使用)
                    description  # description: 简短简介(外部显示)
                ]
                writer.writerow(row)

        logger.info(f"已保存 {len(profiles)} 个Twitter Profile到 {file_path} (OASIS CSV格式)")
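The five-column layout written above can be sanity-checked with a short in-memory round-trip. The profile values here are invented for illustration; note that `csv.reader` yields strings, so `user_id` comes back as `'0'`:

```python
import csv
import io

headers = ['user_id', 'name', 'username', 'user_char', 'description']
bio = "独立记者"
persona = "长期关注社会议题,\n发帖频率高。"

# user_char = bio + persona feeds the agent's LLM system prompt;
# raw newlines are flattened to spaces before hitting the CSV row
user_char = f"{bio} {persona}".replace('\n', ' ').replace('\r', ' ')

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)
writer.writerow([0, "张三", "zhangsan_01", user_char, bio])

rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[0])  # ['user_id', 'name', 'username', 'user_char', 'description']
```

Flattening newlines is not strictly required by the CSV format (quoted fields may contain them), but it keeps the file friendly to line-oriented tooling downstream.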
    def _save_reddit_json(self, profiles: List[OasisAgentProfile], file_path: str):
        """
@@ -2,10 +2,17 @@

模拟配置智能生成器
使用LLM根据模拟需求、文档内容、图谱信息自动生成细致的模拟参数
实现全程自动化,无需人工设置参数

采用分步生成策略,避免一次性生成过长内容导致失败:
1. 生成时间配置
2. 生成事件配置
3. 分批生成Agent配置
4. 生成平台配置
"""

import json
import math
from typing import Dict, Any, List, Optional, Callable
from dataclasses import dataclass, field, asdict
from datetime import datetime

@@ -17,6 +24,28 @@ from .zep_entity_reader import EntityNode, ZepEntityReader

logger = get_logger('mirofish.simulation_config')

# 中国作息时间配置(北京时间)
CHINA_TIMEZONE_CONFIG = {
    # 深夜时段(几乎无人活动)
    "dead_hours": [0, 1, 2, 3, 4, 5],
    # 早间时段(逐渐醒来)
    "morning_hours": [6, 7, 8],
    # 工作时段
    "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
    # 晚间高峰(最活跃)
    "peak_hours": [19, 20, 21, 22],
    # 夜间时段(活跃度下降)
    "night_hours": [23],
    # 活跃度系数
    "activity_multipliers": {
        "dead": 0.05,    # 凌晨几乎无人
        "morning": 0.4,  # 早间逐渐活跃
        "work": 0.7,     # 工作时段中等
        "peak": 1.5,     # 晚间高峰
        "night": 0.5     # 深夜下降
    }
}
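`CHINA_TIMEZONE_CONFIG` partitions the 24 Beijing-time hours into five activity bands with one multiplier each. A small lookup helper over the same table (the helper function is ours, not part of the module):

```python
CHINA_TIMEZONE_CONFIG = {
    "dead_hours": [0, 1, 2, 3, 4, 5],
    "morning_hours": [6, 7, 8],
    "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
    "peak_hours": [19, 20, 21, 22],
    "night_hours": [23],
    "activity_multipliers": {"dead": 0.05, "morning": 0.4, "work": 0.7,
                             "peak": 1.5, "night": 0.5},
}

def activity_multiplier(hour: int, cfg=CHINA_TIMEZONE_CONFIG) -> float:
    """Map an hour (0-23, Beijing time) to its activity multiplier."""
    for band in ("dead", "morning", "work", "peak", "night"):
        if hour in cfg[f"{band}_hours"]:
            return cfg["activity_multipliers"][band]
    raise ValueError(f"hour out of range: {hour}")

print(activity_multiplier(3), activity_multiplier(20))  # 0.05 1.5
```

Because the five hour lists cover 0-23 exactly once, the lookup is total over valid hours and the `ValueError` fires only on out-of-range input.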
@dataclass
class AgentActivityConfig:

@@ -52,7 +81,7 @@ class AgentActivityConfig:

@dataclass
class TimeSimulationConfig:
    """时间模拟配置(基于中国人作息习惯)"""
    # 模拟总时长(模拟小时数)
    total_simulation_hours: int = 72  # 默认模拟72小时(3天)

@@ -63,13 +92,21 @@ class TimeSimulationConfig:

    agents_per_hour_min: int = 5
    agents_per_hour_max: int = 20

    # 高峰时段(晚间19-22点,中国人最活跃的时间)
    peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
    peak_activity_multiplier: float = 1.5

    # 低谷时段(凌晨0-5点,几乎无人活动)
    off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
    off_peak_activity_multiplier: float = 0.05  # 凌晨活跃度极低

    # 早间时段
    morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
    morning_activity_multiplier: float = 0.4

    # 工作时段
    work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
    work_activity_multiplier: float = 0.7
@dataclass

@@ -137,12 +174,13 @@ class SimulationParameters:

    def to_dict(self) -> Dict[str, Any]:
        """转换为字典"""
        time_dict = asdict(self.time_config)
        return {
            "simulation_id": self.simulation_id,
            "project_id": self.project_id,
            "graph_id": self.graph_id,
            "simulation_requirement": self.simulation_requirement,
            "time_config": time_dict,
            "agent_configs": [asdict(a) for a in self.agent_configs],
            "event_config": asdict(self.event_config),
            "twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
@@ -164,10 +202,17 @@ class SimulationConfigGenerator:

    使用LLM分析模拟需求、文档内容、图谱实体信息,
    自动生成最佳的模拟参数配置

    采用分步生成策略:
    1. 生成时间配置和事件配置(轻量级)
    2. 分批生成Agent配置(每批10-15个)
    3. 生成平台配置
    """

    # 上下文最大字符数
    MAX_CONTEXT_LENGTH = 50000
    # 每批生成的Agent数量
    AGENTS_PER_BATCH = 15

    def __init__(
        self,
@ -197,9 +242,10 @@ class SimulationConfigGenerator:
|
||||||
entities: List[EntityNode],
|
entities: List[EntityNode],
|
||||||
enable_twitter: bool = True,
|
enable_twitter: bool = True,
|
||||||
enable_reddit: bool = True,
|
enable_reddit: bool = True,
|
||||||
|
progress_callback: Optional[Callable[[int, int, str], None]] = None,
|
||||||
) -> SimulationParameters:
|
) -> SimulationParameters:
|
||||||
"""
|
"""
|
||||||
智能生成完整的模拟配置
|
智能生成完整的模拟配置(分步生成)
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
simulation_id: 模拟ID
|
simulation_id: 模拟ID
|
||||||
|
|
@@ -210,37 +256,107 @@ class SimulationConfigGenerator:
             entities: 过滤后的实体列表
             enable_twitter: 是否启用Twitter
             enable_reddit: 是否启用Reddit
+            progress_callback: 进度回调函数(current_step, total_steps, message)

         Returns:
             SimulationParameters: 完整的模拟参数
         """
-        logger.info(f"开始智能生成模拟配置: simulation_id={simulation_id}")
+        logger.info(f"开始智能生成模拟配置: simulation_id={simulation_id}, 实体数={len(entities)}")

-        # 1. 构建上下文信息(截断到50000字符)
+        # 计算总步骤数
+        num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
+        total_steps = 3 + num_batches  # 时间配置 + 事件配置 + N批Agent + 平台配置
+        current_step = 0
+
+        def report_progress(step: int, message: str):
+            nonlocal current_step
+            current_step = step
+            if progress_callback:
+                progress_callback(step, total_steps, message)
+            logger.info(f"[{step}/{total_steps}] {message}")
+
+        # 1. 构建基础上下文信息
         context = self._build_context(
             simulation_requirement=simulation_requirement,
             document_text=document_text,
            entities=entities
         )

-        # 2. 调用LLM生成配置
-        llm_result = self._generate_config_with_llm(
-            context=context,
-            entities=entities,
-            enable_twitter=enable_twitter,
-            enable_reddit=enable_reddit
-        )
+        reasoning_parts = []

-        # 3. 构建SimulationParameters对象
-        params = self._build_parameters(
+        # ========== 步骤1: 生成时间配置 ==========
+        report_progress(1, "生成时间配置...")
+        time_config_result = self._generate_time_config(context, len(entities))
+        time_config = self._parse_time_config(time_config_result)
+        reasoning_parts.append(f"时间配置: {time_config_result.get('reasoning', '成功')}")
+
+        # ========== 步骤2: 生成事件配置 ==========
+        report_progress(2, "生成事件配置和热点话题...")
+        event_config_result = self._generate_event_config(context, simulation_requirement)
+        event_config = self._parse_event_config(event_config_result)
+        reasoning_parts.append(f"事件配置: {event_config_result.get('reasoning', '成功')}")
+
+        # ========== 步骤3-N: 分批生成Agent配置 ==========
+        all_agent_configs = []
+        for batch_idx in range(num_batches):
+            start_idx = batch_idx * self.AGENTS_PER_BATCH
+            end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
+            batch_entities = entities[start_idx:end_idx]
+
+            report_progress(
+                3 + batch_idx,
+                f"生成Agent配置 ({start_idx + 1}-{end_idx}/{len(entities)})..."
+            )
+
+            batch_configs = self._generate_agent_configs_batch(
+                context=context,
+                entities=batch_entities,
+                start_idx=start_idx,
+                simulation_requirement=simulation_requirement
+            )
+            all_agent_configs.extend(batch_configs)
+
+        reasoning_parts.append(f"Agent配置: 成功生成 {len(all_agent_configs)} 个")
+
+        # ========== 最后一步: 生成平台配置 ==========
+        report_progress(total_steps, "生成平台配置...")
+        twitter_config = None
+        reddit_config = None
+
+        if enable_twitter:
+            twitter_config = PlatformConfig(
+                platform="twitter",
+                recency_weight=0.4,
+                popularity_weight=0.3,
+                relevance_weight=0.3,
+                viral_threshold=10,
+                echo_chamber_strength=0.5
+            )
+
+        if enable_reddit:
+            reddit_config = PlatformConfig(
+                platform="reddit",
+                recency_weight=0.3,
+                popularity_weight=0.4,
+                relevance_weight=0.3,
+                viral_threshold=15,
+                echo_chamber_strength=0.6
+            )
+
+        # 构建最终参数
+        params = SimulationParameters(
             simulation_id=simulation_id,
             project_id=project_id,
             graph_id=graph_id,
             simulation_requirement=simulation_requirement,
-            entities=entities,
-            llm_result=llm_result,
-            enable_twitter=enable_twitter,
-            enable_reddit=enable_reddit
+            time_config=time_config,
+            agent_configs=all_agent_configs,
+            event_config=event_config,
+            twitter_config=twitter_config,
+            reddit_config=reddit_config,
+            llm_model=self.model_name,
+            llm_base_url=self.base_url,
+            generation_reasoning=" | ".join(reasoning_parts)
         )

         logger.info(f"模拟配置生成完成: {len(params.agent_configs)} 个Agent配置")
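The step accounting in the hunk above (three fixed steps plus one per agent batch) can be sketched in isolation. This standalone helper mirrors the `math.ceil` and slicing arithmetic of the stepwise generator; it is an illustration, not code from the commit:

```python
import math

AGENTS_PER_BATCH = 15  # mirrors the class constant introduced in this commit

def batch_ranges(num_entities: int, batch_size: int = AGENTS_PER_BATCH):
    """Yield (start_idx, end_idx) slices the same way the generator batches entities."""
    num_batches = math.ceil(num_entities / batch_size)
    for batch_idx in range(num_batches):
        start_idx = batch_idx * batch_size
        end_idx = min(start_idx + batch_size, num_entities)
        yield start_idx, end_idx

def total_steps(num_entities: int, batch_size: int = AGENTS_PER_BATCH) -> int:
    """Time config + event config + N agent batches + platform config."""
    return 3 + math.ceil(num_entities / batch_size)
```

For example, 32 entities split into batches of 15 produce three slices and six progress steps in total.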
@@ -297,288 +413,397 @@ class SimulationConfigGenerator:

         return "\n".join(lines)

-    def _generate_config_with_llm(
+    def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
+        """带重试的LLM调用,包含JSON修复逻辑"""
+        import re
+
+        max_attempts = 3
+        last_error = None
+
+        for attempt in range(max_attempts):
+            try:
+                response = self.client.chat.completions.create(
+                    model=self.model_name,
+                    messages=[
+                        {"role": "system", "content": system_prompt},
+                        {"role": "user", "content": prompt}
+                    ],
+                    response_format={"type": "json_object"},
+                    temperature=0.7 - (attempt * 0.1)  # 每次重试降低温度
+                    # 不设置max_tokens,让LLM自由发挥
+                )
+
+                content = response.choices[0].message.content
+                finish_reason = response.choices[0].finish_reason
+
+                # 检查是否被截断
+                if finish_reason == 'length':
+                    logger.warning(f"LLM输出被截断 (attempt {attempt+1})")
+                    content = self._fix_truncated_json(content)
+
+                # 尝试解析JSON
+                try:
+                    return json.loads(content)
+                except json.JSONDecodeError as e:
+                    logger.warning(f"JSON解析失败 (attempt {attempt+1}): {str(e)[:80]}")
+
+                    # 尝试修复JSON
+                    fixed = self._try_fix_config_json(content)
+                    if fixed:
+                        return fixed
+
+                    last_error = e
+
+            except Exception as e:
+                logger.warning(f"LLM调用失败 (attempt {attempt+1}): {str(e)[:80]}")
+                last_error = e
+                import time
+                time.sleep(2 * (attempt + 1))
+
+        raise last_error or Exception("LLM调用失败")
+
+    def _fix_truncated_json(self, content: str) -> str:
+        """修复被截断的JSON"""
+        content = content.strip()
+
+        # 计算未闭合的括号
+        open_braces = content.count('{') - content.count('}')
+        open_brackets = content.count('[') - content.count(']')
+
+        # 检查是否有未闭合的字符串
+        if content and content[-1] not in '",}]':
+            content += '"'
+
+        # 闭合括号
+        content += ']' * open_brackets
+        content += '}' * open_braces
+
+        return content
+
+    def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
+        """尝试修复配置JSON"""
+        import re
+
+        # 修复被截断的情况
+        content = self._fix_truncated_json(content)
+
+        # 提取JSON部分
+        json_match = re.search(r'\{[\s\S]*\}', content)
+        if json_match:
+            json_str = json_match.group()
+
+            # 移除字符串中的换行符
+            def fix_string(match):
+                s = match.group(0)
+                s = s.replace('\n', ' ').replace('\r', ' ')
+                s = re.sub(r'\s+', ' ', s)
+                return s
+
+            json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)
+
+            try:
+                return json.loads(json_str)
+            except:
+                # 尝试移除所有控制字符
+                json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
+                json_str = re.sub(r'\s+', ' ', json_str)
+                try:
+                    return json.loads(json_str)
+                except:
+                    pass
+
+        return None
+
+    def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
+        """生成时间配置"""
+        prompt = f"""基于以下模拟需求,生成时间模拟配置。
+
+{context[:5000]}
+
+## 任务
+请生成时间配置JSON,注意:
+- 用户群体为中国人,需符合北京时间作息习惯
+- 凌晨0-5点几乎无人活动(活跃度系数0.05)
+- 早上6-8点逐渐活跃(活跃度系数0.4)
+- 工作时间9-18点中等活跃(活跃度系数0.7)
+- 晚间19-22点是高峰期(活跃度系数1.5)
+- 23点后活跃度下降(活跃度系数0.5)
+
+当前实体数量: {num_entities}
+
+返回JSON格式(不要markdown):
+{{
+  "total_simulation_hours": <72-168,根据事件性质决定>,
+  "minutes_per_round": <15-60>,
+  "agents_per_hour_min": <每小时最少激活Agent数>,
+  "agents_per_hour_max": <每小时最多激活Agent数>,
+  "peak_hours": [19, 20, 21, 22],
+  "off_peak_hours": [0, 1, 2, 3, 4, 5],
+  "morning_hours": [6, 7, 8],
+  "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
+  "reasoning": "<简要说明>"
+}}"""
+
+        system_prompt = "你是社交媒体模拟专家。返回纯JSON格式,时间配置需符合中国人作息习惯。"
+
+        try:
+            return self._call_llm_with_retry(prompt, system_prompt)
+        except Exception as e:
+            logger.warning(f"时间配置LLM生成失败: {e}, 使用默认配置")
+            return self._get_default_time_config(num_entities)
+
+    def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
+        """获取默认时间配置(中国人作息)"""
+        return {
+            "total_simulation_hours": 72,
+            "minutes_per_round": 30,
+            "agents_per_hour_min": max(1, num_entities // 15),
+            "agents_per_hour_max": max(5, num_entities // 5),
+            "peak_hours": [19, 20, 21, 22],
+            "off_peak_hours": [0, 1, 2, 3, 4, 5],
+            "morning_hours": [6, 7, 8],
+            "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
+            "reasoning": "使用默认中国人作息配置"
+        }
+
+    def _parse_time_config(self, result: Dict[str, Any]) -> TimeSimulationConfig:
+        """解析时间配置结果"""
+        return TimeSimulationConfig(
+            total_simulation_hours=result.get("total_simulation_hours", 72),
+            minutes_per_round=result.get("minutes_per_round", 30),
+            agents_per_hour_min=result.get("agents_per_hour_min", 5),
+            agents_per_hour_max=result.get("agents_per_hour_max", 20),
+            peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
+            off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
+            off_peak_activity_multiplier=0.05,  # 凌晨几乎无人
+            morning_hours=result.get("morning_hours", [6, 7, 8]),
+            morning_activity_multiplier=0.4,
+            work_hours=result.get("work_hours", list(range(9, 19))),
+            work_activity_multiplier=0.7,
+            peak_activity_multiplier=1.5
+        )
+
+    def _generate_event_config(self, context: str, simulation_requirement: str) -> Dict[str, Any]:
+        """生成事件配置"""
+        prompt = f"""基于以下模拟需求,生成事件配置。
+
+模拟需求: {simulation_requirement}
+
+{context[:3000]}
+
+## 任务
+请生成事件配置JSON:
+- 提取热点话题关键词
+- 描述舆论发展方向
+- 设计初始帖子内容
+
+返回JSON格式(不要markdown):
+{{
+  "hot_topics": ["关键词1", "关键词2", ...],
+  "narrative_direction": "<舆论发展方向描述>",
+  "initial_posts": [
+    {{"content": "帖子内容", "poster_type": "MediaOutlet"}},
+    ...
+  ],
+  "reasoning": "<简要说明>"
+}}"""
+
+        system_prompt = "你是舆论分析专家。返回纯JSON格式。"
+
+        try:
+            return self._call_llm_with_retry(prompt, system_prompt)
+        except Exception as e:
+            logger.warning(f"事件配置LLM生成失败: {e}, 使用默认配置")
+            return {
+                "hot_topics": [],
+                "narrative_direction": "",
+                "initial_posts": [],
+                "reasoning": "使用默认配置"
+            }
+
+    def _parse_event_config(self, result: Dict[str, Any]) -> EventConfig:
+        """解析事件配置结果"""
+        return EventConfig(
+            initial_posts=result.get("initial_posts", []),
+            scheduled_events=[],
+            hot_topics=result.get("hot_topics", []),
+            narrative_direction=result.get("narrative_direction", "")
+        )
+
+    def _generate_agent_configs_batch(
         self,
         context: str,
         entities: List[EntityNode],
-        enable_twitter: bool,
-        enable_reddit: bool
-    ) -> Dict[str, Any]:
-        """调用LLM生成配置"""
+        start_idx: int,
+        simulation_requirement: str
+    ) -> List[AgentActivityConfig]:
+        """分批生成Agent配置"""

-        # 构建实体列表用于Agent配置
+        # 构建实体信息
         entity_list = []
         for i, e in enumerate(entities):
             entity_list.append({
-                "agent_id": i,
-                "entity_uuid": e.uuid,
+                "agent_id": start_idx + i,
                 "entity_name": e.name,
                 "entity_type": e.get_entity_type() or "Unknown",
-                "summary": e.summary[:200] if e.summary else ""
+                "summary": e.summary[:150] if e.summary else ""
             })

-        prompt = f"""你是一个社交媒体舆论模拟专家。请根据以下信息,生成详细的模拟参数配置。
+        prompt = f"""基于以下信息,为每个实体生成社交媒体活动配置。

-{context}
+模拟需求: {simulation_requirement}

-## 实体列表(需要为每个实体生成活动配置)
+## 实体列表
 ```json
 {json.dumps(entity_list, ensure_ascii=False, indent=2)}
 ```

 ## 任务
-请生成一个JSON配置,包含以下部分:
-
-1. **time_config** - 时间模拟配置
-   - total_simulation_hours: 模拟总时长(小时),根据事件性质决定(短期热点24-72小时,长期舆论168-336小时)
-   - minutes_per_round: 每轮代表的时间(分钟),建议15-60
-   - agents_per_hour_min/max: 每小时激活的Agent数量范围
-   - peak_hours: 高峰时段列表(0-23)
-   - off_peak_hours: 低谷时段列表
-
-2. **agent_configs** - 每个Agent的活动配置(必须为每个实体生成)
-   对于每个agent_id,设置:
-   - activity_level: 活跃度(0.0-1.0),官方机构通常0.1-0.3,媒体0.3-0.5,个人0.5-0.9
-   - posts_per_hour: 每小时发帖频率,官方机构0.05-0.2,媒体0.5-2,个人0.1-1
-   - comments_per_hour: 每小时评论频率
-   - active_hours: 活跃时间段列表,官方通常工作时间,个人更分散
-   - response_delay_min/max: 响应延迟(模拟分钟),官方较慢(30-180),个人较快(1-30)
-   - sentiment_bias: 情感倾向(-1到1),根据实体立场设置
-   - stance: 立场(supportive/opposing/neutral/observer)
-   - influence_weight: 影响力权重,知名人物和媒体较高
-
-3. **event_config** - 事件配置
-   - initial_posts: 初始帖子列表,包含content和poster_agent_id
-   - hot_topics: 热点话题关键词列表
-   - narrative_direction: 舆论发展方向描述
-
-4. **platform_configs** - 平台配置(如果启用)
-   - viral_threshold: 病毒传播阈值
-   - echo_chamber_strength: 回声室效应强度(0-1)
-
-5. **reasoning** - 你的推理说明,解释为什么这样设置参数
-
-## 重要原则
-- 官方机构(University、GovernmentAgency)发言频率低但影响力大
-- 媒体(MediaOutlet)发言频率中等,传播速度快
-- 个人(Student、PublicFigure)发言频率高但影响力分散
-- 根据模拟需求判断各实体的立场和情感倾向
-- 时间配置要符合真实社交媒体的使用规律
-
-请返回JSON格式,不要包含markdown代码块标记。"""
+为每个实体生成活动配置,注意:
+- **时间符合中国人作息**:凌晨0-5点几乎不活动,晚间19-22点最活跃
+- **官方机构**(University/GovernmentAgency):活跃度低(0.1-0.3),工作时间(9-17)活动,响应慢(60-240分钟),影响力高(2.5-3.0)
+- **媒体**(MediaOutlet):活跃度中(0.4-0.6),全天活动(8-23),响应快(5-30分钟),影响力高(2.0-2.5)
+- **个人**(Student/Person/Alumni):活跃度高(0.6-0.9),主要晚间活动(18-23),响应快(1-15分钟),影响力低(0.8-1.2)
+- **公众人物/专家**:活跃度中(0.4-0.6),影响力中高(1.5-2.0)
+
+返回JSON格式(不要markdown):
+{{
+  "agent_configs": [
+    {{
+      "agent_id": <必须与输入一致>,
+      "activity_level": <0.0-1.0>,
+      "posts_per_hour": <发帖频率>,
+      "comments_per_hour": <评论频率>,
+      "active_hours": [<活跃小时列表,考虑中国人作息>],
+      "response_delay_min": <最小响应延迟分钟>,
+      "response_delay_max": <最大响应延迟分钟>,
+      "sentiment_bias": <-1.0到1.0>,
+      "stance": "<supportive/opposing/neutral/observer>",
+      "influence_weight": <影响力权重>
+    }},
+    ...
+  ]
+}}"""

+        system_prompt = "你是社交媒体行为分析专家。返回纯JSON,配置需符合中国人作息习惯。"
+
         try:
-            # 使用重试机制调用LLM API
-            from ..utils.retry import RetryableAPIClient
-
-            retry_client = RetryableAPIClient(max_retries=3, initial_delay=2.0, max_delay=60.0)
-
-            def call_llm():
-                return self.client.chat.completions.create(
-                    model=self.model_name,
-                    messages=[
-                        {
-                            "role": "system",
-                            "content": "你是社交媒体舆论模拟专家,擅长设计真实的模拟参数。返回纯JSON格式,不要markdown。"
-                        },
-                        {"role": "user", "content": prompt}
-                    ],
-                    response_format={"type": "json_object"},
-                    temperature=0.7,
-                    max_tokens=8000
-                )
-
-            response = retry_client.call_with_retry(call_llm)
-            result = json.loads(response.choices[0].message.content)
-            logger.info(f"LLM配置生成成功")
-            return result
-
+            result = self._call_llm_with_retry(prompt, system_prompt)
+            llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
         except Exception as e:
-            logger.error(f"LLM配置生成失败(已重试): {str(e)}")
-            # 返回默认配置
-            return self._generate_default_config(entities)
-
-    def _generate_default_config(self, entities: List[EntityNode]) -> Dict[str, Any]:
-        """生成默认配置(LLM失败时的fallback)"""
-        agent_configs = []
-
-        for i, e in enumerate(entities):
-            entity_type = (e.get_entity_type() or "Unknown").lower()
-
-            # 根据实体类型设置默认参数
-            if entity_type in ["university", "governmentagency", "ngo"]:
-                config = {
-                    "agent_id": i,
-                    "activity_level": 0.2,
-                    "posts_per_hour": 0.1,
-                    "comments_per_hour": 0.05,
-                    "active_hours": list(range(9, 18)),
-                    "response_delay_min": 60,
-                    "response_delay_max": 240,
-                    "sentiment_bias": 0.0,
-                    "stance": "neutral",
-                    "influence_weight": 3.0
-                }
-            elif entity_type in ["mediaoutlet"]:
-                config = {
-                    "agent_id": i,
-                    "activity_level": 0.6,
-                    "posts_per_hour": 1.0,
-                    "comments_per_hour": 0.5,
-                    "active_hours": list(range(6, 24)),
-                    "response_delay_min": 5,
-                    "response_delay_max": 30,
-                    "sentiment_bias": 0.0,
-                    "stance": "observer",
-                    "influence_weight": 2.5
-                }
-            elif entity_type in ["publicfigure", "expert"]:
-                config = {
-                    "agent_id": i,
-                    "activity_level": 0.5,
-                    "posts_per_hour": 0.3,
-                    "comments_per_hour": 0.5,
-                    "active_hours": list(range(8, 23)),
-                    "response_delay_min": 10,
-                    "response_delay_max": 60,
-                    "sentiment_bias": 0.0,
-                    "stance": "neutral",
-                    "influence_weight": 2.0
-                }
-            else:  # Student, Person, etc.
-                config = {
-                    "agent_id": i,
-                    "activity_level": 0.7,
-                    "posts_per_hour": 0.5,
-                    "comments_per_hour": 1.0,
-                    "active_hours": list(range(7, 24)),
-                    "response_delay_min": 1,
-                    "response_delay_max": 20,
-                    "sentiment_bias": 0.0,
-                    "stance": "neutral",
-                    "influence_weight": 1.0
-                }
-
-            agent_configs.append(config)
-
-        return {
-            "time_config": {
-                "total_simulation_hours": 72,
-                "minutes_per_round": 30,
-                "agents_per_hour_min": max(1, len(entities) // 10),
-                "agents_per_hour_max": max(5, len(entities) // 3),
-                "peak_hours": [9, 10, 11, 14, 15, 20, 21, 22],
-                "off_peak_hours": [0, 1, 2, 3, 4, 5]
-            },
-            "agent_configs": agent_configs,
-            "event_config": {
-                "initial_posts": [],
-                "hot_topics": [],
-                "narrative_direction": ""
-            },
-            "reasoning": "使用默认配置(LLM生成失败)"
-        }
+            logger.warning(f"Agent配置批次LLM生成失败: {e}, 使用规则生成")
+            llm_configs = {}

-    def _build_parameters(
-        self,
-        simulation_id: str,
-        project_id: str,
-        graph_id: str,
-        simulation_requirement: str,
-        entities: List[EntityNode],
-        llm_result: Dict[str, Any],
-        enable_twitter: bool,
-        enable_reddit: bool
-    ) -> SimulationParameters:
-        """根据LLM结果构建SimulationParameters对象"""
-
-        # 时间配置
-        time_cfg = llm_result.get("time_config", {})
-        time_config = TimeSimulationConfig(
-            total_simulation_hours=time_cfg.get("total_simulation_hours", 72),
-            minutes_per_round=time_cfg.get("minutes_per_round", 30),
-            agents_per_hour_min=time_cfg.get("agents_per_hour_min", 5),
-            agents_per_hour_max=time_cfg.get("agents_per_hour_max", 20),
-            peak_hours=time_cfg.get("peak_hours", [9, 10, 11, 14, 15, 20, 21, 22]),
-            off_peak_hours=time_cfg.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
-            peak_activity_multiplier=time_cfg.get("peak_activity_multiplier", 1.5),
-            off_peak_activity_multiplier=time_cfg.get("off_peak_activity_multiplier", 0.3)
-        )
-
-        # Agent配置
-        agent_configs = []
-        llm_agent_configs = {cfg["agent_id"]: cfg for cfg in llm_result.get("agent_configs", [])}
-
+        # 构建AgentActivityConfig对象
+        configs = []
         for i, entity in enumerate(entities):
-            cfg = llm_agent_configs.get(i, {})
+            agent_id = start_idx + i
+            cfg = llm_configs.get(agent_id, {})

-            agent_config = AgentActivityConfig(
-                agent_id=i,
+            # 如果LLM没有生成,使用规则生成
+            if not cfg:
+                cfg = self._generate_agent_config_by_rule(entity)
+
+            config = AgentActivityConfig(
+                agent_id=agent_id,
                 entity_uuid=entity.uuid,
                 entity_name=entity.name,
                 entity_type=entity.get_entity_type() or "Unknown",
                 activity_level=cfg.get("activity_level", 0.5),
                 posts_per_hour=cfg.get("posts_per_hour", 0.5),
                 comments_per_hour=cfg.get("comments_per_hour", 1.0),
-                active_hours=cfg.get("active_hours", list(range(8, 23))),
+                active_hours=cfg.get("active_hours", list(range(9, 23))),
                 response_delay_min=cfg.get("response_delay_min", 5),
                 response_delay_max=cfg.get("response_delay_max", 60),
                 sentiment_bias=cfg.get("sentiment_bias", 0.0),
                 stance=cfg.get("stance", "neutral"),
                 influence_weight=cfg.get("influence_weight", 1.0)
             )
-            agent_configs.append(agent_config)
+            configs.append(config)

-        # 事件配置
-        event_cfg = llm_result.get("event_config", {})
-        event_config = EventConfig(
-            initial_posts=event_cfg.get("initial_posts", []),
-            scheduled_events=event_cfg.get("scheduled_events", []),
-            hot_topics=event_cfg.get("hot_topics", []),
-            narrative_direction=event_cfg.get("narrative_direction", "")
-        )
+        return configs

-        # 平台配置
-        twitter_config = None
-        reddit_config = None
-
-        platform_cfgs = llm_result.get("platform_configs", {})
-
-        if enable_twitter:
-            tw_cfg = platform_cfgs.get("twitter", {})
-            twitter_config = PlatformConfig(
-                platform="twitter",
-                recency_weight=tw_cfg.get("recency_weight", 0.4),
-                popularity_weight=tw_cfg.get("popularity_weight", 0.3),
-                relevance_weight=tw_cfg.get("relevance_weight", 0.3),
-                viral_threshold=tw_cfg.get("viral_threshold", 10),
-                echo_chamber_strength=tw_cfg.get("echo_chamber_strength", 0.5)
-            )
-
-        if enable_reddit:
-            rd_cfg = platform_cfgs.get("reddit", {})
-            reddit_config = PlatformConfig(
-                platform="reddit",
-                recency_weight=rd_cfg.get("recency_weight", 0.3),
-                popularity_weight=rd_cfg.get("popularity_weight", 0.4),
-                relevance_weight=rd_cfg.get("relevance_weight", 0.3),
-                viral_threshold=rd_cfg.get("viral_threshold", 15),
-                echo_chamber_strength=rd_cfg.get("echo_chamber_strength", 0.6)
-            )
-
-        return SimulationParameters(
-            simulation_id=simulation_id,
-            project_id=project_id,
-            graph_id=graph_id,
-            simulation_requirement=simulation_requirement,
-            time_config=time_config,
-            agent_configs=agent_configs,
-            event_config=event_config,
-            twitter_config=twitter_config,
-            reddit_config=reddit_config,
-            llm_model=self.model_name,
-            llm_base_url=self.base_url,
-            generation_reasoning=llm_result.get("reasoning", "")
-        )
+    def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
+        """基于规则生成单个Agent配置(中国人作息)"""
+        entity_type = (entity.get_entity_type() or "Unknown").lower()
+
+        if entity_type in ["university", "governmentagency", "ngo"]:
+            # 官方机构:工作时间活动,低频率,高影响力
+            return {
+                "activity_level": 0.2,
+                "posts_per_hour": 0.1,
+                "comments_per_hour": 0.05,
+                "active_hours": list(range(9, 18)),  # 9:00-17:59
+                "response_delay_min": 60,
+                "response_delay_max": 240,
+                "sentiment_bias": 0.0,
+                "stance": "neutral",
+                "influence_weight": 3.0
+            }
+        elif entity_type in ["mediaoutlet"]:
+            # 媒体:全天活动,中等频率,高影响力
+            return {
+                "activity_level": 0.5,
+                "posts_per_hour": 0.8,
+                "comments_per_hour": 0.3,
+                "active_hours": list(range(7, 24)),  # 7:00-23:59
+                "response_delay_min": 5,
+                "response_delay_max": 30,
+                "sentiment_bias": 0.0,
+                "stance": "observer",
+                "influence_weight": 2.5
+            }
+        elif entity_type in ["professor", "expert", "official"]:
+            # 专家/教授:工作+晚间活动,中等频率
+            return {
+                "activity_level": 0.4,
+                "posts_per_hour": 0.3,
+                "comments_per_hour": 0.5,
+                "active_hours": list(range(8, 22)),  # 8:00-21:59
+                "response_delay_min": 15,
+                "response_delay_max": 90,
+                "sentiment_bias": 0.0,
+                "stance": "neutral",
+                "influence_weight": 2.0
+            }
+        elif entity_type in ["student"]:
+            # 学生:晚间为主,高频率
+            return {
+                "activity_level": 0.8,
+                "posts_per_hour": 0.6,
+                "comments_per_hour": 1.5,
+                "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],  # 上午+晚间
+                "response_delay_min": 1,
+                "response_delay_max": 15,
+                "sentiment_bias": 0.0,
+                "stance": "neutral",
+                "influence_weight": 0.8
+            }
+        elif entity_type in ["alumni"]:
+            # 校友:晚间为主
+            return {
+                "activity_level": 0.6,
+                "posts_per_hour": 0.4,
+                "comments_per_hour": 0.8,
+                "active_hours": [12, 13, 19, 20, 21, 22, 23],  # 午休+晚间
+                "response_delay_min": 5,
+                "response_delay_max": 30,
+                "sentiment_bias": 0.0,
+                "stance": "neutral",
+                "influence_weight": 1.0
+            }
+        else:
+            # 普通人:晚间高峰
+            return {
+                "activity_level": 0.7,
+                "posts_per_hour": 0.5,
+                "comments_per_hour": 1.2,
+                "active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],  # 白天+晚间
+                "response_delay_min": 2,
+                "response_delay_max": 20,
+                "sentiment_bias": 0.0,
+                "stance": "neutral",
+                "influence_weight": 1.0
+            }
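The bracket-balancing repair the commit adds for truncated LLM output can be exercised on its own. The following standalone function reproduces the heuristic (close a dangling string, then balance `[`/`]` and `{`/`}`); it is an illustrative sketch, not the class method itself, and like the original it only counts brackets, so braces inside string values can fool it:

```python
import json

def fix_truncated_json(content: str) -> str:
    """Close unterminated strings, brackets, and braces in JSON cut off mid-output."""
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    # A dangling last character usually means a string literal was cut off.
    if content and content[-1] not in '",}]':
        content += '"'
    content += ']' * open_brackets
    content += '}' * open_braces
    return content
```

For example, the truncated fragment `'{"hot_topics": ["关键词'` becomes parseable `'{"hot_topics": ["关键词"]}'`, while already-valid JSON passes through unchanged.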
@@ -238,14 +238,15 @@ class SimulationManager:
         document_text: str,
         defined_entity_types: Optional[List[str]] = None,
         use_llm_for_profiles: bool = True,
-        progress_callback: Optional[callable] = None
+        progress_callback: Optional[callable] = None,
+        parallel_profile_count: int = 3
    ) -> SimulationState:
        """
        准备模拟环境(全程自动化)

        步骤:
        1. 从Zep图谱读取并过滤实体
-        2. 为每个实体生成OASIS Agent Profile(可选LLM增强)
+        2. 为每个实体生成OASIS Agent Profile(可选LLM增强,支持并行)
        3. 使用LLM智能生成模拟配置参数(时间、活跃度、发言频率等)
        4. 保存配置文件和Profile文件
        5. 复制预设脚本到模拟目录
@@ -257,6 +258,7 @@ class SimulationManager:
             defined_entity_types: 预定义的实体类型(可选)
             use_llm_for_profiles: 是否使用LLM生成详细人设
             progress_callback: 进度回调函数 (stage, progress, message)
+            parallel_profile_count: 并行生成人设的数量,默认3

         Returns:
             SimulationState
@@ -314,7 +316,8 @@ class SimulationManager:
             total=total_entities
         )

-        generator = OasisProfileGenerator()
+        # 传入graph_id以启用Zep检索功能,获取更丰富的上下文
+        generator = OasisProfileGenerator(graph_id=state.graph_id)

         def profile_progress(current, total, msg):
             if progress_callback:
@@ -330,7 +333,9 @@ class SimulationManager:
         profiles = generator.generate_profiles_from_entities(
             entities=filtered.entities,
             use_llm=use_llm_for_profiles,
-            progress_callback=profile_progress
+            progress_callback=profile_progress,
+            graph_id=state.graph_id,  # 传入graph_id用于Zep检索
+            parallel_count=parallel_profile_count  # 并行生成数量
         )

         state.profiles_count = len(profiles)
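The manager's `progress_callback` documented above receives `(stage, progress, message)`. A minimal sketch of a callback that could be passed in — the formatting and function names here are illustrative, not part of the commit:

```python
def format_progress(stage: str, progress: float, message: str) -> str:
    """Render one progress update in the (stage, progress, message) shape the manager emits."""
    return f"[{stage}] {progress:.0%} {message}"

def progress_callback(stage: str, progress: float, message: str) -> None:
    # A hook with the signature SimulationManager's progress_callback parameter expects.
    print(format_progress(stage, progress, message))
```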