Enhance OASIS simulation capabilities and profile generation

- Updated README.md to include detailed descriptions of new features, including Zep hybrid search functionality and detailed persona generation for individual and group entities.
- Implemented a robust mechanism for checking simulation preparation status to avoid redundant profile generation.
- Added support for parallel profile generation, improving efficiency in creating OASIS Agent Profiles.
- Enhanced the simulation configuration generator to adopt a stepwise approach, ensuring better handling of complex configurations.
- Introduced error handling and retry mechanisms for LLM calls, improving the reliability of profile generation.
- Updated simulation management to support new API parameters for controlling profile generation behavior.
This commit is contained in:
666ghj 2025-12-01 19:40:07 +08:00
parent 5f159f6d88
commit af5c235695
5 changed files with 1602 additions and 408 deletions


@@ -1057,9 +1057,165 @@ for node in all_nodes:
| Method | Description |
|------|------|
| `generate_profile_from_entity(entity, user_id)` | Generate a single Profile from an entity |
| `generate_profiles_from_entities(entities)` | Generate Profiles in batch |
| `save_profiles_to_json(profiles, path, platform)` | Save to a JSON file |
| `generate_profile_from_entity(entity, user_id)` | Generate a single Profile from an entity, with a detailed persona |
| `generate_profiles_from_entities(entities, graph_id)` | Generate Profiles in batch |
| `save_profiles(profiles, path, platform)` | Save Profile files |
| `_search_zep_for_entity(entity_name)` | Query Zep retrieval for additional entity context |
### Optimizations (v2.0)
1. **Zep hybrid search**: uses multiple query strategies to gather rich entity information
2. **Entity-type differentiation**: individual entities and group/institution entities use different prompts
3. **Detailed persona generation**: produces persona descriptions of 500+ characters
### Zep Hybrid Search Strategy
The `_search_zep_for_entity()` method combines several search strategies to gather rich information:
**Query strategies:**
```python
queries = [
    f"总结{entity_name}的全部活动、事件和行为",
    f"{entity_name}与其他实体的关系和互动",
    f"{entity_name}的背景、历史和重要信息",
    f"关于{entity_name}的所有事实和描述",
]
```
**Note:** Zep has no built-in hybrid search endpoint, so edges and nodes must be searched separately. We issue **parallel requests** to run both searches at once:
```python
# Run the edges and nodes searches in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    edge_future = executor.submit(search_edges)  # scope="edges"
    node_future = executor.submit(search_nodes)  # scope="nodes"
    edge_result = edge_future.result(timeout=30)
    node_result = node_future.result(timeout=30)
```
**Search parameters:**

| Search type | scope | limit | Description |
|----------|-------|-------|------|
| Edge search | edges | 30 | Retrieves facts/relationship information |
| Node search | nodes | 20 | Retrieves summaries of related entities |

**Key parameters:**
- The `graph_id` parameter is required; without it the Zep API returns a 400 error
- Uses the `rrf` (Reciprocal Rank Fusion) reranker, which is stable and reliable
- Runs both searches in a thread pool for efficiency
**Returned data structure:**
```python
{
    "facts": [...],           # list of facts (from edges)
    "node_summaries": [...],  # summaries of related nodes (from nodes)
    "context": "..."          # combined context text
}
```
### LLM Generation and JSON Repair
To avoid parse failures on LLM-generated JSON, the following optimizations were implemented:
1. **No max_tokens cap**: lets the LLM make full use of the model's context window
2. **Multi-attempt retry**: up to 3 attempts, lowering the temperature on each retry
3. **Truncation detection and repair**: detects `finish_reason='length'` and automatically closes the JSON
4. **JSON repair utilities**:
   - `_fix_truncated_json()`: repairs truncated JSON (closes brackets and strings)
   - `_try_fix_json()`: multi-level repair strategy
     - extract the JSON portion
     - replace newlines inside string values
     - strip control characters
     - extract partial information from broken JSON
5. **Field validation**: ensures required fields exist, filling in `entity_summary` when missing

**Error-handling flow:**
```
LLM call → truncation check → JSON parse → repair attempt → partial extraction → rule-based generation
```
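The truncation-repair step can be sketched as follows; this is a simplified version of the `_fix_truncated_json` idea (it assumes the only damage is a dangling string or unclosed brackets, and the bracket counting is a heuristic that ignores brackets inside string values):

```python
import json

def fix_truncated_json(content: str) -> str:
    """Close a dangling string and any unclosed brackets in truncated JSON."""
    content = content.strip()
    # A payload cut off mid-string will not end in a quote, comma, or bracket
    if content and content[-1] not in '",}]':
        content += '"'
    # Close brackets/braces that were opened but never closed
    content += ']' * (content.count('[') - content.count(']'))
    content += '}' * (content.count('{') - content.count('}'))
    return content

# A response cut off by a token limit becomes parseable again
repaired = fix_truncated_json('{"bio": "a professor", "topics": ["education"')
print(json.loads(repaired))  # → {'bio': 'a professor', 'topics': ['education']}
```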
### Parallel Generation with Live Output
Agent personas can be generated in parallel for higher throughput:
```python
profiles = generator.generate_profiles_from_entities(
    entities=filtered.entities,
    use_llm=True,
    graph_id="mirofish_xxx",
    parallel_count=5  # number of parallel workers (default 5)
)
```
**API parameters:**
```json
POST /api/simulation/prepare
{
  "simulation_id": "sim_xxx",
  "parallel_profile_count": 5,  // optional: number of personas generated in parallel (default 5)
  "force_regenerate": false     // optional: force regeneration (default false)
}
```
**Live output:**
- Each persona is printed to the console as soon as it is generated (full content, never truncated)
- Includes username, bio, detailed persona, age, gender, MBTI, and more
- Makes it easy to monitor generation progress and quality in real time
### Avoiding Redundant Generation
The system automatically detects completed preparation work and skips regeneration:
**Detection conditions:**
1. `state.json` exists and `config_generated=true`
2. The required files exist: `reddit_profiles.json`, `twitter_profiles.csv`, `simulation_config.json`
**API response:**
```json
// When preparation is already complete
{
  "success": true,
  "data": {
    "simulation_id": "sim_xxx",
    "status": "ready",
    "message": "已有完成的准备工作,无需重复生成",
    "already_prepared": true,
    "prepare_info": {
      "entities_count": 93,
      "profiles_count": 93,
      "entity_types": ["Student", "Professor", ...],
      "existing_files": [...]
    }
  }
}
```
**Force regeneration:**
```json
POST /api/simulation/prepare
{
  "simulation_id": "sim_xxx",
  "force_regenerate": true  // ignore existing preparation and regenerate from scratch
}
```
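A client can combine `already_prepared` detection with polling into one idempotent flow. The sketch below covers only the decision logic over the documented response shapes (no HTTP client included; the `next_action` function name is illustrative, not part of the codebase):

```python
def next_action(response: dict) -> str:
    """Decide what a caller should do after POST /api/simulation/prepare.

    An already-prepared simulation returns already_prepared=true; a freshly
    started preparation returns a task_id to poll via /prepare/status.
    """
    data = response.get("data", {})
    if data.get("already_prepared"):
        return "done"                     # reuse the existing preparation
    if data.get("task_id"):
        return "poll:" + data["task_id"]  # poll /api/simulation/prepare/status
    return "error"

print(next_action({"success": True,
                   "data": {"status": "ready", "already_prepared": True}}))   # → done
print(next_action({"success": True,
                   "data": {"status": "preparing", "task_id": "task_abc"}}))  # → poll:task_abc
```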
### Entity Type Classification
```python
# Individual entities - generate a concrete person persona
INDIVIDUAL_ENTITY_TYPES = [
    "student", "alumni", "professor", "person", "publicfigure",
    "expert", "faculty", "official", "journalist", "activist"
]
# Group/institution entities - generate an official-account persona
GROUP_ENTITY_TYPES = [
    "university", "governmentagency", "organization", "ngo",
    "mediaoutlet", "company", "institution", "group", "community"
]
```
### Profile Data Structure
@@ -1071,7 +1227,7 @@ class OasisAgentProfile:
    user_name: str  # username
    name: str  # display name
    bio: str  # short bio (max 150 chars)
    persona: str  # detailed persona description
    persona: str  # detailed persona description (500+ chars)
    # Reddit fields
    karma: int = 1000
@@ -1094,6 +1250,37 @@ class OasisAgentProfile:
    source_entity_type: Optional[str] = None
```
### Detailed Persona Example
**Structure of an individual entity persona:**
```markdown
## 1. Basic Information
- Name, age, occupation/identity
- Education background, location
## 2. Background
- Key past experiences
- Connection to the event
- Social network
## 3. Personality
- MBTI type and how it shows
- Core personality traits
- Style of emotional expression
## 4. Social Media Behavior
- Posting frequency and timing
- Preferred content types
- Language style
## 5. Stances and Opinions
- Attitude toward the core topic
- Content likely to anger or move them
## 6. Distinctive Traits
- Catchphrases, personal hobbies, etc.
```
### Profile Generation Strategy
**1. LLM generation (default):**
@@ -1135,12 +1322,31 @@ Generate a social media user profile with:
The LLM analyzes the simulation requirements, document content, and graph entity information to automatically produce the best simulation parameter configuration.
**A stepwise generation strategy is used** (avoiding failures caused by generating overly long output in one shot):
1. Generate the time configuration (lightweight)
2. Generate the event configuration and trending topics
3. Generate Agent configurations in batches (**5 per batch**, to maintain quality)
4. Generate the platform configuration
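The stepwise strategy amounts to several small, independent generations that are merged at the end. A schematic sketch, with plain callables standing in for the real `_generate_*` methods:

```python
def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def generate_config_stepwise(entities, time_step, event_step, agent_batch_step,
                             platform_step, batch_size=5):
    """Assemble a simulation config from small, separately generated pieces."""
    config = {"time": time_step(), "events": event_step(), "agents": []}
    for batch in chunked(entities, batch_size):
        # Each batch is its own LLM call, keeping each output short and parseable
        config["agents"].extend(agent_batch_step(batch))
    config["platform"] = platform_step()
    return config
```

Failures stay contained to a single step, so a bad batch can be retried or replaced by rule-based output without discarding the rest of the config.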
| Method | Description |
|------|------|
| `generate_config(...)` | Intelligently generate the full simulation config |
| `_build_context(...)` | Build the LLM context (up to 50k chars) |
| `_generate_config_with_llm(...)` | Call the LLM to generate the config |
| `_generate_default_config(...)` | Default config (when the LLM fails) |
| `generate_config(...)` | Intelligently generate the full simulation config (stepwise) |
| `_generate_time_config(...)` | Generate the time configuration |
| `_generate_event_config(...)` | Generate the event configuration |
| `_generate_agent_configs_batch(...)` | Generate Agent configurations in batches |
| `_generate_agent_config_by_rule(...)` | Rule-based generation (when the LLM fails) |
### Chinese Daily-Rhythm Schedule
The system targets Chinese user populations and models activity on Beijing-time habits:

| Period | Time range | Activity multiplier | Description |
|------|----------|------------|------|
| Late night | 0:00-5:59 | 0.05 | Almost nobody active |
| Morning | 6:00-8:59 | 0.4 | Gradually waking up |
| Work | 9:00-18:59 | 0.7 | Moderate activity during work hours |
| Peak | 19:00-22:59 | 1.5 | Most active in the evening |
| Night | 23:00-23:59 | 0.5 | Activity tapering off |
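The table maps directly onto an hour-to-multiplier lookup; a minimal sketch (the function name is illustrative, not from the codebase):

```python
def activity_multiplier(hour: int) -> float:
    """Return the activity multiplier for an hour of day (Beijing time)."""
    if 0 <= hour <= 5:    # late night: almost nobody online
        return 0.05
    if 6 <= hour <= 8:    # morning: gradually waking up
        return 0.4
    if 9 <= hour <= 18:   # work hours: moderate activity
        return 0.7
    if 19 <= hour <= 22:  # evening peak: most active
        return 1.5
    return 0.5            # 23:00-23:59: tapering off
```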
### Configuration Content Generated by the LLM
@@ -1152,10 +1358,14 @@ class TimeSimulationConfig:
    minutes_per_round: int = 30  # simulated minutes per round
    agents_per_hour_min: int = 5  # min agents activated per hour
    agents_per_hour_max: int = 20  # max agents activated per hour
    peak_hours: List[int]  # peak hours [9,10,11,14,15,20,21,22]
    off_peak_hours: List[int]  # off-peak hours [0,1,2,3,4,5]
    peak_hours: List[int] = [19,20,21,22]  # peak hours (evening)
    off_peak_hours: List[int] = [0,1,2,3,4,5]  # off-peak hours (early morning)
    peak_activity_multiplier: float = 1.5  # peak activity multiplier
    off_peak_activity_multiplier: float = 0.3  # off-peak activity multiplier
    off_peak_activity_multiplier: float = 0.05  # extremely low early-morning activity
    morning_hours: List[int] = [6,7,8]  # morning hours
    morning_activity_multiplier: float = 0.4
    work_hours: List[int] = list(range(9, 19))  # work hours (9:00-18:59)
    work_activity_multiplier: float = 0.7
```
**2. AgentActivityConfig (per-Agent activity configuration):**
@@ -1178,14 +1388,18 @@ class AgentActivityConfig:
    influence_weight: float = 1.0  # influence weight
```
**3. Default parameter differences by entity type:**
**3. Default parameter differences by entity type (aligned with Chinese daily rhythms):**

| Entity type | Activity | Posts/hour | Response delay | Influence |
|----------|--------|----------|----------|--------|
| University/GovernmentAgency | 0.2 | 0.1/hour | 60-240 min | 3.0 |
| MediaOutlet | 0.6 | 1.0/hour | 5-30 min | 2.5 |
| PublicFigure/Expert | 0.5 | 0.3/hour | 10-60 min | 2.0 |
| Student/Person | 0.7 | 0.5/hour | 1-20 min | 1.0 |

| Entity type | Activity | Posts/hour | Active hours | Response delay | Influence |
|----------|--------|----------|----------|----------|--------|
| University/GovernmentAgency | 0.2 | 0.1/hour | 9:00-17:59 (work hours) | 60-240 min | 3.0 |
| MediaOutlet | 0.5 | 0.8/hour | 7:00-23:59 (all day) | 5-30 min | 2.5 |
| Professor/Expert | 0.4 | 0.3/hour | 8:00-21:59 (work + evening) | 15-90 min | 2.0 |
| Student | 0.8 | 0.6/hour | 8-13, 18-23 (daytime + evening) | 1-15 min | 0.8 |
| Alumni | 0.6 | 0.4/hour | 12-13, 19-23 (lunch break + evening) | 5-30 min | 1.0 |
| Person (ordinary user) | 0.7 | 0.5/hour | 9-13, 18-23 (daytime + evening) | 2-20 min | 1.0 |

**Note:** From 0:00 to 5:00 all entity types are nearly inactive, matching Chinese daily rhythms.
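The per-type defaults can be kept in a single lookup table; the sketch below mirrors a few rows of the table above (the dict layout and `defaults_for` helper are illustrative, not the actual `AgentActivityConfig` construction):

```python
# Per-entity-type defaults, mirroring a subset of the table above
ENTITY_TYPE_DEFAULTS = {
    "university": {"activity": 0.2, "posts_per_hour": 0.1,
                   "active_hours": list(range(9, 18)), "influence": 3.0},
    "mediaoutlet": {"activity": 0.5, "posts_per_hour": 0.8,
                    "active_hours": list(range(7, 24)), "influence": 2.5},
    "student": {"activity": 0.8, "posts_per_hour": 0.6,
                "active_hours": list(range(8, 14)) + list(range(18, 24)),
                "influence": 0.8},
    "person": {"activity": 0.7, "posts_per_hour": 0.5,
               "active_hours": list(range(9, 14)) + list(range(18, 24)),
               "influence": 1.0},
}

def defaults_for(entity_type: str) -> dict:
    # Unknown types fall back to the ordinary-person parameters
    return ENTITY_TYPE_DEFAULTS.get(entity_type.lower(), ENTITY_TYPE_DEFAULTS["person"])
```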
---
@@ -1228,8 +1442,41 @@ uploads/simulations/sim_xxxx/
```
**Important: OASIS platforms require different Profile formats:**
- **Twitter**: CSV format with fields `user_id,user_name,name,bio,friend_count,follower_count,statuses_count,created_at`
- **Reddit**: JSON format supporting detailed persona fields: `realname,username,bio,persona,age,gender,mbti,country,profession,interested_topics`
**Twitter CSV format** (per the official OASIS requirements):
```csv
user_id,name,username,user_char,description
0,张教授,professor_zhang,"full persona (used internally by the LLM)","short bio (shown publicly)"
```
- `user_id`: sequential ID starting from 0
- `name`: real name
- `username`: system username
- `user_char`: full persona (bio + persona), injected into the LLM system prompt to drive Agent behavior
- `description`: short bio shown on the user profile page
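Writing the Twitter file with the stdlib `csv` module keeps the quoting of `user_char` correct even when the persona contains commas or quotes (the sample row here is made up):

```python
import csv
import io

# Column order required by the OASIS Twitter profile format
FIELDS = ["user_id", "name", "username", "user_char", "description"]

profiles = [
    {"user_id": 0, "name": "张教授", "username": "professor_zhang",
     "user_char": 'Full persona text, with commas, "quotes", and details...',
     "description": "Short public bio"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(profiles)  # DictWriter quotes fields containing commas/quotes
print(buf.getvalue())
```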
**Reddit JSON format:**
```json
[
  {
    "realname": "张教授",
    "username": "professor_zhang",
    "bio": "简短简介",
    "persona": "详细人设描述",
    "age": 42,
    "gender": "男",
    "mbti": "INTJ",
    "country": "中国",
    "profession": "教授",
    "interested_topics": ["高等教育", "学术诚信"]
  }
]
```
**user_char vs. description:**

| Field | Purpose | Visibility |
|------|------|--------|
| user_char | Part of the LLM system prompt; determines how the Agent thinks and acts | internal only |
| description | Bio shown on the user profile page | visible to other users |
### Example Configuration File (simulation_config.json)


@@ -213,6 +213,112 @@ def create_simulation():
}), 500
def _check_simulation_prepared(simulation_id: str) -> tuple:
"""
检查模拟是否已经准备完成
检查条件
1. state.json 存在且 status "ready"
2. 必要文件存在reddit_profiles.json, twitter_profiles.csv, simulation_config.json
Args:
simulation_id: 模拟ID
Returns:
(is_prepared: bool, info: dict)
"""
import os
from ..config import Config
simulation_dir = os.path.join(Config.OASIS_SIMULATION_DATA_DIR, simulation_id)
# 检查目录是否存在
if not os.path.exists(simulation_dir):
return False, {"reason": "模拟目录不存在"}
# 必要文件列表
required_files = [
"state.json",
"simulation_config.json",
"reddit_profiles.json",
"twitter_profiles.csv",
"run_reddit_simulation.py",
"run_twitter_simulation.py",
"run_parallel_simulation.py"
]
# 检查文件是否存在
existing_files = []
missing_files = []
for f in required_files:
file_path = os.path.join(simulation_dir, f)
if os.path.exists(file_path):
existing_files.append(f)
else:
missing_files.append(f)
if missing_files:
return False, {
"reason": "缺少必要文件",
"missing_files": missing_files,
"existing_files": existing_files
}
# 检查state.json中的状态
state_file = os.path.join(simulation_dir, "state.json")
try:
import json
with open(state_file, 'r', encoding='utf-8') as f:
state_data = json.load(f)
status = state_data.get("status", "")
# 如果状态是ready或preparing已有文件认为准备完成
if status in ["ready", "preparing"] and state_data.get("config_generated"):
# 获取文件统计信息
profiles_file = os.path.join(simulation_dir, "reddit_profiles.json")
config_file = os.path.join(simulation_dir, "simulation_config.json")
profiles_count = 0
if os.path.exists(profiles_file):
with open(profiles_file, 'r', encoding='utf-8') as f:
profiles_data = json.load(f)
profiles_count = len(profiles_data) if isinstance(profiles_data, list) else 0
# 如果状态是preparing但文件已完成自动更新状态为ready
if status == "preparing":
try:
state_data["status"] = "ready"
from datetime import datetime
state_data["updated_at"] = datetime.now().isoformat()
with open(state_file, 'w', encoding='utf-8') as f:
json.dump(state_data, f, ensure_ascii=False, indent=2)
logger.info(f"自动更新模拟状态: {simulation_id} preparing -> ready")
status = "ready"
except Exception as e:
logger.warning(f"自动更新状态失败: {e}")
return True, {
"status": status,
"entities_count": state_data.get("entities_count", 0),
"profiles_count": profiles_count,
"entity_types": state_data.get("entity_types", []),
"config_generated": state_data.get("config_generated", False),
"created_at": state_data.get("created_at"),
"updated_at": state_data.get("updated_at"),
"existing_files": existing_files
}
else:
return False, {
"reason": f"状态不是ready: {status}",
"status": status,
"config_generated": state_data.get("config_generated", False)
}
except Exception as e:
return False, {"reason": f"读取状态文件失败: {str(e)}"}
@simulation_bp.route('/prepare', methods=['POST'])
def prepare_simulation():
"""
@@ -221,17 +327,25 @@ def prepare_simulation():
这是一个耗时操作接口会立即返回task_id
使用 GET /api/simulation/prepare/status 查询进度
特性
- 自动检测已完成的准备工作避免重复生成
- 如果已准备完成直接返回已有结果
- 支持强制重新生成force_regenerate=true
步骤
1. 从Zep图谱读取并过滤实体
2. 为每个实体生成OASIS Agent Profile带重试机制
3. LLM智能生成模拟配置带重试机制
4. 保存配置文件和预设脚本
1. 检查是否已有完成的准备工作
2. 从Zep图谱读取并过滤实体
3. 为每个实体生成OASIS Agent Profile带重试机制
4. LLM智能生成模拟配置带重试机制
5. 保存配置文件和预设脚本
请求JSON
{
"simulation_id": "sim_xxxx", // 必填模拟ID
"entity_types": ["Student", "PublicFigure"], // 可选指定实体类型
"use_llm_for_profiles": true // 可选是否用LLM生成人设
"use_llm_for_profiles": true, // 可选是否用LLM生成人设
"parallel_profile_count": 5, // 可选并行生成人设数量默认5
"force_regenerate": false // 可选强制重新生成默认false
}
返回
@@ -239,14 +353,17 @@ def prepare_simulation():
"success": true,
"data": {
"simulation_id": "sim_xxxx",
"task_id": "task_xxxx",
"status": "preparing",
"message": "准备任务已启动"
"task_id": "task_xxxx", // 新任务时返回
"status": "preparing|ready",
"message": "准备任务已启动|已有完成的准备工作",
"already_prepared": true|false // 是否已准备完成
}
}
"""
import threading
import os
from ..models.task import TaskManager, TaskStatus
from ..config import Config
try:
data = request.get_json() or {}
@@ -267,6 +384,25 @@ def prepare_simulation():
"error": f"模拟不存在: {simulation_id}"
}), 404
# 检查是否强制重新生成
force_regenerate = data.get('force_regenerate', False)
# 检查是否已经准备完成(避免重复生成)
if not force_regenerate:
is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
if is_prepared:
logger.info(f"模拟 {simulation_id} 已准备完成,跳过重复生成")
return jsonify({
"success": True,
"data": {
"simulation_id": simulation_id,
"status": "ready",
"message": "已有完成的准备工作,无需重复生成",
"already_prepared": True,
"prepare_info": prepare_info
}
})
# 从项目获取必要信息
project = ProjectManager.get_project(state.project_id)
if not project:
@@ -288,6 +424,7 @@ def prepare_simulation():
entity_types_list = data.get('entity_types')
use_llm_for_profiles = data.get('use_llm_for_profiles', True)
parallel_profile_count = data.get('parallel_profile_count', 5)
# 创建异步任务
task_manager = TaskManager()
@@ -384,7 +521,8 @@ def prepare_simulation():
document_text=document_text,
defined_entity_types=entity_types_list,
use_llm_for_profiles=use_llm_for_profiles,
progress_callback=progress_callback
progress_callback=progress_callback,
parallel_profile_count=parallel_profile_count
)
# 任务完成
@@ -414,7 +552,8 @@ def prepare_simulation():
"simulation_id": simulation_id,
"task_id": task_id,
"status": "preparing",
"message": "准备任务已启动,请通过 /api/simulation/prepare/status 查询进度"
"message": "准备任务已启动,请通过 /api/simulation/prepare/status 查询进度",
"already_prepared": False
}
})
@@ -438,9 +577,14 @@ def get_prepare_status():
"""
查询准备任务进度
支持两种查询方式
1. 通过task_id查询正在进行的任务进度
2. 通过simulation_id检查是否已有完成的准备工作
请求JSON
{
"task_id": "task_xxxx" // 必填prepare返回的task_id
"task_id": "task_xxxx", // 可选prepare返回的task_id
"simulation_id": "sim_xxxx" // 可选模拟ID用于检查已完成的准备
}
返回
@@ -448,21 +592,11 @@ def get_prepare_status():
"success": true,
"data": {
"task_id": "task_xxxx",
"status": "processing", // pending/processing/completed/failed
"progress": 45, // 0-100 总进度
"message": "[2/4] 生成Agent人设: 35/93 - 生成 教授张三 的人设...",
"progress_detail": { // 详细进度信息
"current_stage": "generating_profiles",
"current_stage_name": "生成Agent人设",
"stage_index": 2, // 当前阶段序号
"total_stages": 4, // 总阶段数
"stage_progress": 38, // 阶段内进度 0-100
"current_item": 35, // 当前处理项目序号
"total_items": 93, // 当前阶段总项目数
"item_description": "生成 教授张三 的人设..."
},
"result": null, // 完成后返回结果
"error": null // 失败时返回错误信息
"status": "processing|completed|ready",
"progress": 45,
"message": "...",
"already_prepared": true|false, // 是否已有完成的准备
"prepare_info": {...} // 已准备完成时的详细信息
}
}
"""
@@ -472,24 +606,75 @@ def get_prepare_status():
data = request.get_json() or {}
task_id = data.get('task_id')
simulation_id = data.get('simulation_id')
# 如果提供了simulation_id先检查是否已准备完成
if simulation_id:
is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
if is_prepared:
return jsonify({
"success": True,
"data": {
"simulation_id": simulation_id,
"status": "ready",
"progress": 100,
"message": "已有完成的准备工作",
"already_prepared": True,
"prepare_info": prepare_info
}
})
# 如果没有task_id返回错误
if not task_id:
if simulation_id:
# 有simulation_id但未准备完成
return jsonify({
"success": True,
"data": {
"simulation_id": simulation_id,
"status": "not_started",
"progress": 0,
"message": "尚未开始准备,请调用 /api/simulation/prepare 开始",
"already_prepared": False
}
})
return jsonify({
"success": False,
"error": "请提供 task_id"
"error": "请提供 task_id 或 simulation_id"
}), 400
task_manager = TaskManager()
task = task_manager.get_task(task_id)
if not task:
# 任务不存在但如果有simulation_id检查是否已准备完成
if simulation_id:
is_prepared, prepare_info = _check_simulation_prepared(simulation_id)
if is_prepared:
return jsonify({
"success": True,
"data": {
"simulation_id": simulation_id,
"task_id": task_id,
"status": "ready",
"progress": 100,
"message": "任务已完成(准备工作已存在)",
"already_prepared": True,
"prepare_info": prepare_info
}
})
return jsonify({
"success": False,
"error": f"任务不存在: {task_id}"
}), 404
task_dict = task.to_dict()
task_dict["already_prepared"] = False
return jsonify({
"success": True,
"data": task.to_dict()
"data": task_dict
})
except Exception as e:


@@ -1,6 +1,11 @@
"""
OASIS Agent Profile生成器
将Zep图谱中的实体转换为OASIS模拟平台所需的Agent Profile格式
优化改进
1. 调用Zep检索功能二次丰富节点信息
2. 优化提示词生成非常详细的人设
3. 区分个人实体和抽象群体实体
"""
import json
@@ -10,6 +15,7 @@ from dataclasses import dataclass, field
from datetime import datetime
from openai import OpenAI
from zep_cloud.client import Zep
from ..config import Config
from ..utils.logger import get_logger
@@ -137,6 +143,11 @@ class OasisProfileGenerator:
OASIS Profile生成器
将Zep图谱中的实体转换为OASIS模拟所需的Agent Profile
优化特性
1. 调用Zep图谱检索功能获取更丰富的上下文
2. 生成非常详细的人设包括基本信息职业经历性格特征社交媒体行为等
3. 区分个人实体和抽象群体实体
"""
# MBTI类型列表
@@ -153,11 +164,25 @@ class OasisProfileGenerator:
"Canada", "Australia", "Brazil", "India", "South Korea"
]
# 个人类型实体(需要生成具体人设)
INDIVIDUAL_ENTITY_TYPES = [
"student", "alumni", "professor", "person", "publicfigure",
"expert", "faculty", "official", "journalist", "activist"
]
# 群体/机构类型实体(需要生成群体代表人设)
GROUP_ENTITY_TYPES = [
"university", "governmentagency", "organization", "ngo",
"mediaoutlet", "company", "institution", "group", "community"
]
def __init__(
self,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
model_name: Optional[str] = None
model_name: Optional[str] = None,
zep_api_key: Optional[str] = None,
graph_id: Optional[str] = None
):
self.api_key = api_key or Config.LLM_API_KEY
self.base_url = base_url or Config.LLM_BASE_URL
@@ -171,6 +196,17 @@ class OasisProfileGenerator:
base_url=self.base_url
)
# Zep客户端用于检索丰富上下文
self.zep_api_key = zep_api_key or Config.ZEP_API_KEY
self.zep_client = None
self.graph_id = graph_id
if self.zep_api_key:
try:
self.zep_client = Zep(api_key=self.zep_api_key)
except Exception as e:
logger.warning(f"Zep客户端初始化失败: {e}")
def generate_profile_from_entity(
self,
entity: EntityNode,
@@ -245,28 +281,195 @@ class OasisProfileGenerator:
suffix = random.randint(100, 999)
return f"{username}_{suffix}"
def _search_zep_for_entity(self, entity: EntityNode) -> Dict[str, Any]:
"""
使用Zep图谱混合搜索功能获取实体相关的丰富信息
Zep没有内置混合搜索接口需要分别搜索edges和nodes然后合并结果
使用并行请求同时搜索提高效率
Args:
entity: 实体节点对象
Returns:
包含facts, node_summaries, context的字典
"""
import concurrent.futures
if not self.zep_client:
return {"facts": [], "node_summaries": [], "context": ""}
entity_name = entity.name
results = {
"facts": [],
"node_summaries": [],
"context": ""
}
# 必须有graph_id才能进行搜索
if not self.graph_id:
logger.debug(f"跳过Zep检索未设置graph_id")
return results
comprehensive_query = f"关于{entity_name}的所有信息、活动、事件、关系和背景"
def search_edges():
"""搜索边(事实/关系)"""
try:
return self.zep_client.graph.search(
query=comprehensive_query,
graph_id=self.graph_id,
limit=30,
scope="edges",
reranker="rrf"
)
except Exception as e:
logger.debug(f"Zep边搜索失败: {e}")
return None
def search_nodes():
"""搜索节点(实体摘要)"""
try:
return self.zep_client.graph.search(
query=comprehensive_query,
graph_id=self.graph_id,
limit=20,
scope="nodes",
reranker="rrf"
)
except Exception as e:
logger.debug(f"Zep节点搜索失败: {e}")
return None
try:
# 并行执行edges和nodes搜索
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
edge_future = executor.submit(search_edges)
node_future = executor.submit(search_nodes)
# 获取结果
edge_result = edge_future.result(timeout=30)
node_result = node_future.result(timeout=30)
# 处理边搜索结果
all_facts = set()
if edge_result and hasattr(edge_result, 'edges') and edge_result.edges:
for edge in edge_result.edges:
if hasattr(edge, 'fact') and edge.fact:
all_facts.add(edge.fact)
results["facts"] = list(all_facts)
# 处理节点搜索结果
all_summaries = set()
if node_result and hasattr(node_result, 'nodes') and node_result.nodes:
for node in node_result.nodes:
if hasattr(node, 'summary') and node.summary:
all_summaries.add(node.summary)
if hasattr(node, 'name') and node.name and node.name != entity_name:
all_summaries.add(f"相关实体: {node.name}")
results["node_summaries"] = list(all_summaries)
# 构建综合上下文
context_parts = []
if results["facts"]:
context_parts.append("事实信息:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
if results["node_summaries"]:
context_parts.append("相关实体:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
results["context"] = "\n\n".join(context_parts)
logger.info(f"Zep混合检索完成: {entity_name}, 获取 {len(results['facts'])} 条事实, {len(results['node_summaries'])} 个相关节点")
except concurrent.futures.TimeoutError:
logger.warning(f"Zep检索超时 ({entity_name})")
except Exception as e:
logger.warning(f"Zep检索失败 ({entity_name}): {e}")
return results
def _build_entity_context(self, entity: EntityNode) -> str:
"""构建实体的上下文信息"""
"""
构建实体的完整上下文信息
包括
1. 实体本身的边信息事实
2. 关联节点的详细信息
3. Zep混合检索到的丰富信息
"""
context_parts = []
# 添加相关边信息
# 1. 添加实体属性信息
if entity.attributes:
attrs = []
for key, value in entity.attributes.items():
if value and str(value).strip():
attrs.append(f"- {key}: {value}")
if attrs:
context_parts.append("### 实体属性\n" + "\n".join(attrs))
# 2. 添加相关边信息(事实/关系)
existing_facts = set()
if entity.related_edges:
relationships = []
for edge in entity.related_edges[:10]: # 最多取10条
if edge.get("fact"):
relationships.append(edge["fact"])
for edge in entity.related_edges: # 不限制数量
fact = edge.get("fact", "")
edge_name = edge.get("edge_name", "")
direction = edge.get("direction", "")
if fact:
relationships.append(f"- {fact}")
existing_facts.add(fact)
elif edge_name:
if direction == "outgoing":
relationships.append(f"- {entity.name} --[{edge_name}]--> (相关实体)")
else:
relationships.append(f"- (相关实体) --[{edge_name}]--> {entity.name}")
if relationships:
context_parts.append("Related facts:\n" + "\n".join(f"- {r}" for r in relationships))
context_parts.append("### 相关事实和关系\n" + "\n".join(relationships))
# 添加关联节点信息
# 3. 添加关联节点的详细信息
if entity.related_nodes:
related_names = [n["name"] for n in entity.related_nodes[:5]]
if related_names:
context_parts.append(f"Related to: {', '.join(related_names)}")
related_info = []
for node in entity.related_nodes: # 不限制数量
node_name = node.get("name", "")
node_labels = node.get("labels", [])
node_summary = node.get("summary", "")
# 过滤掉默认标签
custom_labels = [l for l in node_labels if l not in ["Entity", "Node"]]
label_str = f" ({', '.join(custom_labels)})" if custom_labels else ""
if node_summary:
related_info.append(f"- **{node_name}**{label_str}: {node_summary}")
else:
related_info.append(f"- **{node_name}**{label_str}")
if related_info:
context_parts.append("### 关联实体信息\n" + "\n".join(related_info))
# 4. 使用Zep混合检索获取更丰富的信息
zep_results = self._search_zep_for_entity(entity)
if zep_results.get("facts"):
# 去重:排除已存在的事实
new_facts = [f for f in zep_results["facts"] if f not in existing_facts]
if new_facts:
context_parts.append("### Zep检索到的事实信息\n" + "\n".join(f"- {f}" for f in new_facts[:15]))
if zep_results.get("node_summaries"):
context_parts.append("### Zep检索到的相关节点\n" + "\n".join(f"- {s}" for s in zep_results["node_summaries"][:10]))
return "\n\n".join(context_parts)
def _is_individual_entity(self, entity_type: str) -> bool:
"""判断是否是个人类型实体"""
return entity_type.lower() in self.INDIVIDUAL_ENTITY_TYPES
def _is_group_entity(self, entity_type: str) -> bool:
"""判断是否是群体/机构类型实体"""
return entity_type.lower() in self.GROUP_ENTITY_TYPES
def _generate_profile_with_llm(
self,
entity_name: str,
@@ -275,64 +478,272 @@ class OasisProfileGenerator:
entity_attributes: Dict[str, Any],
context: str
) -> Dict[str, Any]:
"""使用LLM生成详细人设"""
"""
使用LLM生成非常详细的人设
prompt = f"""Based on the following entity information, generate a detailed social media user profile for simulation purposes.
根据实体类型区分
- 个人实体生成具体的人物设定
- 群体/机构实体生成代表性账号设定
"""
Entity Information:
- Name: {entity_name}
- Type: {entity_type}
- Summary: {entity_summary}
- Attributes: {json.dumps(entity_attributes, ensure_ascii=False)}
is_individual = self._is_individual_entity(entity_type)
Context:
{context}
if is_individual:
prompt = self._build_individual_persona_prompt(
entity_name, entity_type, entity_summary, entity_attributes, context
)
else:
prompt = self._build_group_persona_prompt(
entity_name, entity_type, entity_summary, entity_attributes, context
)
Generate a JSON object with the following fields:
{{
"bio": "A short bio (max 150 chars) suitable for social media",
"persona": "A detailed persona description (2-3 sentences) describing personality, interests, and behavior patterns",
"age": <integer between 18-65, or null if not applicable>,
"gender": "<male/female/other, or null if not applicable>",
"mbti": "<MBTI type like INTJ, ENFP, etc., or null>",
"country": "<country name, or null>",
"profession": "<profession/occupation, or null>",
"interested_topics": ["topic1", "topic2", ...]
}}
Important:
- The profile should be consistent with the entity type and context
- Make the persona feel realistic and suitable for social media simulation
- If the entity is an organization, institution, or non-person, adapt the profile accordingly (e.g., as an official account)
- Return ONLY the JSON object, no additional text"""
# 尝试多次生成,直到成功或达到最大重试次数
max_attempts = 3
last_error = None
for attempt in range(max_attempts):
try:
# 使用重试机制调用LLM API
from ..utils.retry import RetryableAPIClient
retry_client = RetryableAPIClient(max_retries=3, initial_delay=1.0)
def call_llm():
return self.client.chat.completions.create(
response = self.client.chat.completions.create(
model=self.model_name,
messages=[
{"role": "system", "content": "You are a profile generator for social media simulation. Generate realistic user profiles based on entity information."},
{"role": "system", "content": self._get_system_prompt(is_individual)},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"},
temperature=0.7
temperature=0.7 - (attempt * 0.1) # 每次重试降低温度
# 不设置max_tokens让LLM自由发挥
)
response = retry_client.call_with_retry(call_llm)
result = json.loads(response.choices[0].message.content)
content = response.choices[0].message.content
# 检查是否被截断finish_reason不是'stop'
finish_reason = response.choices[0].finish_reason
if finish_reason == 'length':
logger.warning(f"LLM输出被截断 (attempt {attempt+1}), 尝试修复...")
content = self._fix_truncated_json(content)
# 尝试解析JSON
try:
result = json.loads(content)
# 验证必需字段
if "bio" not in result or not result["bio"]:
result["bio"] = entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}"
if "persona" not in result or not result["persona"]:
result["persona"] = entity_summary or f"{entity_name}是一个{entity_type}"
return result
except json.JSONDecodeError as je:
logger.warning(f"JSON解析失败 (attempt {attempt+1}): {str(je)[:80]}")
# 尝试修复JSON
result = self._try_fix_json(content, entity_name, entity_type, entity_summary)
if result.get("_fixed"):
del result["_fixed"]
return result
last_error = je
except Exception as e:
logger.warning(f"LLM生成人设失败已重试: {str(e)}, 使用规则生成")
logger.warning(f"LLM调用失败 (attempt {attempt+1}): {str(e)[:80]}")
last_error = e
import time
time.sleep(1 * (attempt + 1)) # 指数退避
logger.warning(f"LLM生成人设失败{max_attempts}次尝试): {last_error}, 使用规则生成")
return self._generate_profile_rule_based(
entity_name, entity_type, entity_summary, entity_attributes
)
def _fix_truncated_json(self, content: str) -> str:
"""修复被截断的JSON输出被max_tokens限制截断"""
import re
# 如果JSON被截断尝试闭合它
content = content.strip()
# 计算未闭合的括号
open_braces = content.count('{') - content.count('}')
open_brackets = content.count('[') - content.count(']')
# 检查是否有未闭合的字符串
# 简单检查:如果最后一个引号后没有逗号或闭合括号,可能是字符串被截断
if content and content[-1] not in '",}]':
# 尝试闭合字符串
content += '"'
# 闭合括号
content += ']' * open_brackets
content += '}' * open_braces
return content
def _try_fix_json(self, content: str, entity_name: str, entity_type: str, entity_summary: str = "") -> Dict[str, Any]:
"""尝试修复损坏的JSON"""
import re
# 1. 首先尝试修复被截断的情况
content = self._fix_truncated_json(content)
# 2. 尝试提取JSON部分
json_match = re.search(r'\{[\s\S]*\}', content)
if json_match:
json_str = json_match.group()
# 3. 处理字符串中的换行符问题
# 找到所有字符串值并替换其中的换行符
def fix_string_newlines(match):
s = match.group(0)
# 替换字符串内的实际换行符为空格
s = s.replace('\n', ' ').replace('\r', ' ')
# 替换多余空格
s = re.sub(r'\s+', ' ', s)
return s
# 匹配JSON字符串值
json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string_newlines, json_str)
# 4. 尝试解析
try:
result = json.loads(json_str)
result["_fixed"] = True
return result
except json.JSONDecodeError as e:
# 5. 如果还是失败,尝试更激进的修复
try:
# 移除所有控制字符
json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
# 替换所有连续空白
json_str = re.sub(r'\s+', ' ', json_str)
result = json.loads(json_str)
result["_fixed"] = True
return result
except:
pass
# 6. 尝试从内容中提取部分信息
bio_match = re.search(r'"bio"\s*:\s*"([^"]*)"', content)
persona_match = re.search(r'"persona"\s*:\s*"([^"]*)', content) # 可能被截断
bio = bio_match.group(1) if bio_match else (entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}")
persona = persona_match.group(1) if persona_match else (entity_summary or f"{entity_name}是一个{entity_type}")
# 如果提取到了有意义的内容,标记为已修复
if bio_match or persona_match:
logger.info(f"从损坏的JSON中提取了部分信息")
return {
"bio": bio,
"persona": persona,
"_fixed": True
}
# 7. 完全失败,返回基础结构
logger.warning(f"JSON修复失败返回基础结构")
return {
"bio": entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}",
"persona": entity_summary or f"{entity_name}是一个{entity_type}"
}
def _get_system_prompt(self, is_individual: bool) -> str:
"""获取系统提示词"""
base_prompt = "你是社交媒体用户画像生成专家。生成详细、真实的人设用于舆论模拟,最大程度还原已有现实情况。必须返回有效的JSON格式所有字符串值不能包含未转义的换行符。使用中文。"
return base_prompt
def _build_individual_persona_prompt(
self,
entity_name: str,
entity_type: str,
entity_summary: str,
entity_attributes: Dict[str, Any],
context: str
) -> str:
"""构建个人实体的详细人设提示词"""
attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else ""
context_str = context[:3000] if context else "无额外上下文"
return f"""为实体生成详细的社交媒体用户人设,最大程度还原已有现实情况。
实体名称: {entity_name}
实体类型: {entity_type}
实体摘要: {entity_summary}
实体属性: {attrs_str}
上下文信息:
{context_str}
请生成JSON包含以下字段:
1. bio: 社交媒体简介200
2. persona: 详细人设描述2000字的纯文本需包含:
- 基本信息年龄职业教育背景所在地
- 人物背景重要经历与事件的关联社会关系
- 性格特征MBTI类型核心性格情绪表达方式
- 社交媒体行为发帖频率内容偏好互动风格语言特点
- 立场观点对话题的态度可能被激怒/感动的内容
- 独特特征口头禅特殊经历个人爱好
- 个人记忆人设的重要部分要介绍这个个体与事件的关联以及这个个体在事件中的已有动作与反应
3. age: 年龄数字
4. gender: 性别/
5. mbti: MBTI类型
6. country: 国家
7. profession: 职业
8. interested_topics: 感兴趣话题数组
重要:
- 所有字段值必须是字符串或数字不要使用换行符
- persona必须是一段连贯的文字描述
- 使用中文
- 内容要与实体信息保持一致"""
def _build_group_persona_prompt(
self,
entity_name: str,
entity_type: str,
entity_summary: str,
entity_attributes: Dict[str, Any],
context: str
) -> str:
"""构建群体/机构实体的详细人设提示词"""
attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else ""
context_str = context[:3000] if context else "无额外上下文"
return f"""为机构/群体实体生成详细的社交媒体账号设定,最大程度还原已有现实情况。
实体名称: {entity_name}
实体类型: {entity_type}
实体摘要: {entity_summary}
实体属性: {attrs_str}
上下文信息:
{context_str}
请生成JSON包含以下字段:
1. bio: 官方账号简介200专业得体
2. persona: 详细账号设定描述2000字的纯文本需包含:
- 机构基本信息正式名称机构性质成立背景主要职能
- 账号定位账号类型目标受众核心功能
- 发言风格语言特点常用表达禁忌话题
- 发布内容特点内容类型发布频率活跃时间段
- 立场态度对核心话题的官方立场面对争议的处理方式
- 特殊说明代表的群体画像运营习惯
- 机构记忆机构人设的重要部分要介绍这个机构与事件的关联以及这个机构在事件中的已有动作与反应
3. age: null机构不适用
4. gender: null机构不适用
5. mbti: 可选用于描述账号风格如ISTJ代表严谨保守
6. country: 国家
7. profession: 机构职能描述
8. interested_topics: 关注领域数组
重要:
- 所有字段值必须是字符串数字或null
- persona必须是一段连贯的文字描述不要使用换行符
- 使用中文
- 机构账号发言要符合其身份定位"""
def _generate_profile_rule_based(
self,
entity_name: str,
@@ -398,29 +809,46 @@ Important:
"interested_topics": ["General", "Social Issues"],
}
def set_graph_id(self, graph_id: str):
"""设置图谱ID用于Zep检索"""
self.graph_id = graph_id
def generate_profiles_from_entities(
self,
entities: List[EntityNode],
use_llm: bool = True,
progress_callback: Optional[callable] = None
progress_callback: Optional[callable] = None,
graph_id: Optional[str] = None,
parallel_count: int = 5
) -> List[OasisAgentProfile]:
"""
批量从实体生成Agent Profile
批量从实体生成Agent Profile支持并行生成
Args:
entities: 实体列表
use_llm: 是否使用LLM生成详细人设
progress_callback: 进度回调函数 (current, total, message)
graph_id: 图谱ID,用于Zep检索获取更丰富上下文
parallel_count: 并行生成数量(默认5)
Returns:
Agent Profile列表
"""
profiles = []
total = len(entities)
import concurrent.futures
from threading import Lock
for idx, entity in enumerate(entities):
if progress_callback:
progress_callback(idx + 1, total, f"生成 {entity.name} 的人设...")
# 设置graph_id用于Zep检索
if graph_id:
self.graph_id = graph_id
total = len(entities)
profiles = [None] * total # 预分配列表保持顺序
completed_count = [0] # 使用列表以便在闭包中修改
lock = Lock()
def generate_single_profile(idx: int, entity: EntityNode) -> tuple:
"""生成单个profile的工作函数"""
entity_type = entity.get_entity_type() or "Entity"
try:
profile = self.generate_profile_from_entity(
@@ -428,23 +856,115 @@ Important:
user_id=idx,
use_llm=use_llm
)
profiles.append(profile)
# 实时输出生成的人设到控制台和日志
self._print_generated_profile(entity.name, entity_type, profile)
return idx, profile, None
except Exception as e:
logger.error(f"生成实体 {entity.name} 的人设失败: {str(e)}")
# 创建一个基础profile
profiles.append(OasisAgentProfile(
fallback_profile = OasisAgentProfile(
user_id=idx,
user_name=self._generate_username(entity.name),
name=entity.name,
bio=f"{entity.get_entity_type() or 'Entity'}: {entity.name}",
bio=f"{entity_type}: {entity.name}",
persona=entity.summary or "A participant in social discussions.",
source_entity_uuid=entity.uuid,
source_entity_type=entity.get_entity_type(),
))
source_entity_type=entity_type,
)
return idx, fallback_profile, str(e)
logger.info(f"开始并行生成 {total} 个Agent人设(并行数: {parallel_count})...")
print(f"\n{'='*60}")
print(f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}")
print(f"{'='*60}\n")
# 使用线程池并行执行
with concurrent.futures.ThreadPoolExecutor(max_workers=parallel_count) as executor:
# 提交所有任务
future_to_entity = {
executor.submit(generate_single_profile, idx, entity): (idx, entity)
for idx, entity in enumerate(entities)
}
# 收集结果
for future in concurrent.futures.as_completed(future_to_entity):
idx, entity = future_to_entity[future]
entity_type = entity.get_entity_type() or "Entity"
try:
result_idx, profile, error = future.result()
profiles[result_idx] = profile
with lock:
completed_count[0] += 1
current = completed_count[0]
if progress_callback:
progress_callback(
current,
total,
f"已完成 {current}/{total}: {entity.name}({entity_type})"
)
if error:
logger.warning(f"[{current}/{total}] {entity.name} 使用备用人设: {error}")
else:
logger.info(f"[{current}/{total}] 成功生成人设: {entity.name} ({entity_type})")
except Exception as e:
logger.error(f"处理实体 {entity.name} 时发生异常: {str(e)}")
with lock:
completed_count[0] += 1
profiles[idx] = OasisAgentProfile(
user_id=idx,
user_name=self._generate_username(entity.name),
name=entity.name,
bio=f"{entity_type}: {entity.name}",
persona=entity.summary or "A participant in social discussions.",
source_entity_uuid=entity.uuid,
source_entity_type=entity_type,
)
print(f"\n{'='*60}")
print(f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent")
print(f"{'='*60}\n")
return profiles
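上述方法的核心并发模式是"预分配结果列表 + as_completed 收集",保证返回顺序与输入实体顺序一致。剥离业务逻辑后,该模式可以用如下最小草图单独演示(函数名为示意):

```python
import concurrent.futures

def parallel_map_ordered(fn, items, max_workers=5):
    """并行执行fn,结果顺序与items一致"""
    results = [None] * len(items)  # 预分配,保持顺序
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {executor.submit(fn, item): i
                         for i, item in enumerate(items)}
        for future in concurrent.futures.as_completed(future_to_idx):
            results[future_to_idx[future]] = future.result()
    return results

print(parallel_map_ordered(lambda x: x * x, [1, 2, 3, 4]))  # → [1, 4, 9, 16]
```

即使各任务完成顺序不定,写回时按提交序号定位,输出仍与输入一一对应。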
def _print_generated_profile(self, entity_name: str, entity_type: str, profile: OasisAgentProfile):
"""实时输出生成的人设到控制台(完整内容,不截断)"""
separator = "-" * 70
# 构建完整输出内容(不截断)
topics_str = ', '.join(profile.interested_topics) if profile.interested_topics else ''
output_lines = [
f"\n{separator}",
f"[已生成] {entity_name} ({entity_type})",
f"{separator}",
f"用户名: {profile.user_name}",
f"",
f"【简介】",
f"{profile.bio}",
f"",
f"【详细人设】",
f"{profile.persona}",
f"",
f"【基本属性】",
f"年龄: {profile.age} | 性别: {profile.gender} | MBTI: {profile.mbti}",
f"职业: {profile.profession} | 国家: {profile.country}",
f"兴趣话题: {topics_str}",
separator
]
output = "\n".join(output_lines)
# 只输出到控制台避免重复logger不再输出完整内容
print(output)
def save_profiles(
self,
profiles: List[OasisAgentProfile],
@@ -470,10 +990,18 @@ Important:
def _save_twitter_csv(self, profiles: List[OasisAgentProfile], file_path: str):
"""
保存Twitter Profile为CSV格式
保存Twitter Profile为CSV格式符合OASIS官方要求
OASIS Twitter要求的CSV字段
user_id, user_name, name, bio, friend_count, follower_count, statuses_count, created_at
- user_id: 用户ID(根据CSV顺序,从0开始)
- name: 用户真实姓名
- username: 系统中的用户名
- user_char: 详细人设描述,注入到LLM系统提示中,指导Agent行为
- description: 简短的公开简介,显示在用户资料页面
user_char vs description 区别:
- user_char: 内部使用,LLM系统提示,决定Agent如何思考和行动
- description: 外部显示,其他用户可见的简介
"""
import csv
@@ -484,28 +1012,32 @@ Important:
with open(file_path, 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
# 写入表头
headers = ['user_id', 'user_name', 'name', 'bio', 'friend_count',
'follower_count', 'statuses_count', 'created_at']
# 写入OASIS要求的表头
headers = ['user_id', 'name', 'username', 'user_char', 'description']
writer.writerow(headers)
# 写入数据行
for profile in profiles:
# bio需要处理换行符和逗号
bio = profile.bio.replace('\n', ' ').replace('\r', ' ')
for idx, profile in enumerate(profiles):
# user_char: 完整人设(bio + persona),用于LLM系统提示
user_char = profile.bio
if profile.persona and profile.persona != profile.bio:
user_char = f"{profile.bio} {profile.persona}"
# 处理换行符(CSV中用空格替代)
user_char = user_char.replace('\n', ' ').replace('\r', ' ')
# description: 简短简介,用于外部显示
description = profile.bio.replace('\n', ' ').replace('\r', ' ')
row = [
profile.user_id,
profile.user_name,
profile.name,
bio,
profile.friend_count,
profile.follower_count,
profile.statuses_count,
profile.created_at
idx, # user_id: 从0开始的顺序ID
profile.name, # name: 真实姓名
profile.user_name, # username: 用户名
user_char, # user_char: 完整人设内部LLM使用
description # description: 简短简介(外部显示)
]
writer.writerow(row)
logger.info(f"已保存 {len(profiles)} 个Twitter Profile到 {file_path} (CSV格式)")
logger.info(f"已保存 {len(profiles)} 个Twitter Profile到 {file_path} (OASIS CSV格式)")
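按上述OASIS格式写出的五列CSV可以这样最小化演示(示例数据为虚构,仅展示列顺序和换行符处理):

```python
import csv
import io

profiles = [
    {"name": "张三", "username": "zhangsan_01",
     "user_char": "详细人设描述,\n含换行", "description": "普通学生"},
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["user_id", "name", "username", "user_char", "description"])
for idx, p in enumerate(profiles):
    writer.writerow([
        idx,  # user_id按CSV顺序从0开始
        p["name"],
        p["username"],
        p["user_char"].replace("\n", " "),  # 换行符用空格替代
        p["description"],
    ])

rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[1][:3])  # → ['0', '张三', 'zhangsan_01']
```

注意user_char中的逗号由csv模块自动加引号处理,无需手动转义。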
def _save_reddit_json(self, profiles: List[OasisAgentProfile], file_path: str):
"""
@@ -2,10 +2,17 @@
模拟配置智能生成器
使用LLM根据模拟需求、文档内容、图谱信息,自动生成细致的模拟参数
实现全程自动化,无需人工设置参数
采用分步生成策略,避免一次性生成过长内容导致失败:
1. 生成时间配置
2. 生成事件配置
3. 分批生成Agent配置
4. 生成平台配置
"""
import json
from typing import Dict, Any, List, Optional
import math
from typing import Dict, Any, List, Optional, Callable
from dataclasses import dataclass, field, asdict
from datetime import datetime
@@ -17,6 +24,28 @@ from .zep_entity_reader import EntityNode, ZepEntityReader
logger = get_logger('mirofish.simulation_config')
# 中国作息时间配置(北京时间)
CHINA_TIMEZONE_CONFIG = {
# 深夜时段(几乎无人活动)
"dead_hours": [0, 1, 2, 3, 4, 5],
# 早间时段(逐渐醒来)
"morning_hours": [6, 7, 8],
# 工作时段
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
# 晚间高峰(最活跃)
"peak_hours": [19, 20, 21, 22],
# 夜间时段(活跃度下降)
"night_hours": [23],
# 活跃度系数
"activity_multipliers": {
"dead": 0.05, # 凌晨几乎无人
"morning": 0.4, # 早间逐渐活跃
"work": 0.7, # 工作时段中等
"peak": 1.5, # 晚间高峰
"night": 0.5 # 深夜下降
}
}
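根据该配置把任意小时映射到活跃度系数,可用如下独立草图演示(`get_activity_multiplier` 为示意命名;为便于单独运行,配置在示例内复制了一份):

```python
CHINA_TIMEZONE_CONFIG = {
    "dead_hours": [0, 1, 2, 3, 4, 5],
    "morning_hours": [6, 7, 8],
    "work_hours": list(range(9, 19)),
    "peak_hours": [19, 20, 21, 22],
    "night_hours": [23],
    "activity_multipliers": {
        "dead": 0.05, "morning": 0.4, "work": 0.7, "peak": 1.5, "night": 0.5,
    },
}

def get_activity_multiplier(hour: int, cfg=CHINA_TIMEZONE_CONFIG) -> float:
    """返回指定小时(0-23)的活跃度系数"""
    m = cfg["activity_multipliers"]
    if hour in cfg["dead_hours"]:
        return m["dead"]
    if hour in cfg["morning_hours"]:
        return m["morning"]
    if hour in cfg["work_hours"]:
        return m["work"]
    if hour in cfg["peak_hours"]:
        return m["peak"]
    return m["night"]

print(get_activity_multiplier(3), get_activity_multiplier(20))  # → 0.05 1.5
```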
@dataclass
class AgentActivityConfig:
@@ -52,7 +81,7 @@ class AgentActivityConfig:
@dataclass
class TimeSimulationConfig:
"""时间模拟配置"""
"""时间模拟配置(基于中国人作息习惯)"""
# 模拟总时长(模拟小时数)
total_simulation_hours: int = 72  # 默认模拟72小时(3天)
@@ -63,13 +92,21 @@ class TimeSimulationConfig:
agents_per_hour_min: int = 5
agents_per_hour_max: int = 20
# 高峰时段(活跃度提升)
peak_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 14, 15, 20, 21, 22])
# 高峰时段(晚间19-22点,中国人最活跃的时间)
peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
peak_activity_multiplier: float = 1.5
# 低谷时段(活跃度降低)
off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5, 6])
off_peak_activity_multiplier: float = 0.3
# 低谷时段(凌晨0-5点,几乎无人活动)
off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
off_peak_activity_multiplier: float = 0.05 # 凌晨活跃度极低
# 早间时段
morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
morning_activity_multiplier: float = 0.4
# 工作时段
work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
work_activity_multiplier: float = 0.7
@dataclass
@@ -137,12 +174,13 @@ class SimulationParameters:
def to_dict(self) -> Dict[str, Any]:
"""转换为字典"""
time_dict = asdict(self.time_config)
return {
"simulation_id": self.simulation_id,
"project_id": self.project_id,
"graph_id": self.graph_id,
"simulation_requirement": self.simulation_requirement,
"time_config": asdict(self.time_config),
"time_config": time_dict,
"agent_configs": [asdict(a) for a in self.agent_configs],
"event_config": asdict(self.event_config),
"twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
@@ -164,10 +202,17 @@ class SimulationConfigGenerator:
使用LLM分析模拟需求文档内容图谱实体信息
自动生成最佳的模拟参数配置
采用分步生成策略
1. 生成时间配置和事件配置(轻量级)
2. 分批生成Agent配置(每批10-15个)
3. 生成平台配置
"""
# 上下文最大字符数
MAX_CONTEXT_LENGTH = 50000
# 每批生成的Agent数量
AGENTS_PER_BATCH = 15
def __init__(
self,
@@ -197,9 +242,10 @@ class SimulationConfigGenerator:
entities: List[EntityNode],
enable_twitter: bool = True,
enable_reddit: bool = True,
progress_callback: Optional[Callable[[int, int, str], None]] = None,
) -> SimulationParameters:
"""
智能生成完整的模拟配置
智能生成完整的模拟配置分步生成
Args:
simulation_id: 模拟ID
@@ -210,37 +256,107 @@
entities: 过滤后的实体列表
enable_twitter: 是否启用Twitter
enable_reddit: 是否启用Reddit
progress_callback: 进度回调函数(current_step, total_steps, message)
Returns:
SimulationParameters: 完整的模拟参数
"""
logger.info(f"开始智能生成模拟配置: simulation_id={simulation_id}")
logger.info(f"开始智能生成模拟配置: simulation_id={simulation_id}, 实体数={len(entities)}")
# 1. 构建上下文信息(截断到50000字符)
# 计算总步骤数
num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
total_steps = 3 + num_batches # 时间配置 + 事件配置 + N批Agent + 平台配置
current_step = 0
def report_progress(step: int, message: str):
nonlocal current_step
current_step = step
if progress_callback:
progress_callback(step, total_steps, message)
logger.info(f"[{step}/{total_steps}] {message}")
# 1. 构建基础上下文信息
context = self._build_context(
simulation_requirement=simulation_requirement,
document_text=document_text,
entities=entities
)
# 2. 调用LLM生成配置
llm_result = self._generate_config_with_llm(
context=context,
entities=entities,
enable_twitter=enable_twitter,
enable_reddit=enable_reddit
reasoning_parts = []
# ========== 步骤1: 生成时间配置 ==========
report_progress(1, "生成时间配置...")
time_config_result = self._generate_time_config(context, len(entities))
time_config = self._parse_time_config(time_config_result)
reasoning_parts.append(f"时间配置: {time_config_result.get('reasoning', '成功')}")
# ========== 步骤2: 生成事件配置 ==========
report_progress(2, "生成事件配置和热点话题...")
event_config_result = self._generate_event_config(context, simulation_requirement)
event_config = self._parse_event_config(event_config_result)
reasoning_parts.append(f"事件配置: {event_config_result.get('reasoning', '成功')}")
# ========== 步骤3-N: 分批生成Agent配置 ==========
all_agent_configs = []
for batch_idx in range(num_batches):
start_idx = batch_idx * self.AGENTS_PER_BATCH
end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
batch_entities = entities[start_idx:end_idx]
report_progress(
3 + batch_idx,
f"生成Agent配置 ({start_idx + 1}-{end_idx}/{len(entities)})..."
)
# 3. 构建SimulationParameters对象
params = self._build_parameters(
batch_configs = self._generate_agent_configs_batch(
context=context,
entities=batch_entities,
start_idx=start_idx,
simulation_requirement=simulation_requirement
)
all_agent_configs.extend(batch_configs)
reasoning_parts.append(f"Agent配置: 成功生成 {len(all_agent_configs)}")
# ========== 最后一步: 生成平台配置 ==========
report_progress(total_steps, "生成平台配置...")
twitter_config = None
reddit_config = None
if enable_twitter:
twitter_config = PlatformConfig(
platform="twitter",
recency_weight=0.4,
popularity_weight=0.3,
relevance_weight=0.3,
viral_threshold=10,
echo_chamber_strength=0.5
)
if enable_reddit:
reddit_config = PlatformConfig(
platform="reddit",
recency_weight=0.3,
popularity_weight=0.4,
relevance_weight=0.3,
viral_threshold=15,
echo_chamber_strength=0.6
)
# 构建最终参数
params = SimulationParameters(
simulation_id=simulation_id,
project_id=project_id,
graph_id=graph_id,
simulation_requirement=simulation_requirement,
entities=entities,
llm_result=llm_result,
enable_twitter=enable_twitter,
enable_reddit=enable_reddit
time_config=time_config,
agent_configs=all_agent_configs,
event_config=event_config,
twitter_config=twitter_config,
reddit_config=reddit_config,
llm_model=self.model_name,
llm_base_url=self.base_url,
generation_reasoning=" | ".join(reasoning_parts)
)
logger.info(f"模拟配置生成完成: {len(params.agent_configs)} 个Agent配置")
@@ -297,122 +413,327 @@ class SimulationConfigGenerator:
return "\n".join(lines)
def _generate_config_with_llm(
def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
"""带重试的LLM调用包含JSON修复逻辑"""
import re
max_attempts = 3
last_error = None
for attempt in range(max_attempts):
try:
response = self.client.chat.completions.create(
model=self.model_name,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"},
temperature=0.7 - (attempt * 0.1) # 每次重试降低温度
# 不设置max_tokens让LLM自由发挥
)
content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason
# 检查是否被截断
if finish_reason == 'length':
logger.warning(f"LLM输出被截断 (attempt {attempt+1})")
content = self._fix_truncated_json(content)
# 尝试解析JSON
try:
return json.loads(content)
except json.JSONDecodeError as e:
logger.warning(f"JSON解析失败 (attempt {attempt+1}): {str(e)[:80]}")
# 尝试修复JSON
fixed = self._try_fix_config_json(content)
if fixed:
return fixed
last_error = e
except Exception as e:
logger.warning(f"LLM调用失败 (attempt {attempt+1}): {str(e)[:80]}")
last_error = e
import time
time.sleep(2 * (attempt + 1))
raise last_error or Exception("LLM调用失败")
def _fix_truncated_json(self, content: str) -> str:
"""修复被截断的JSON"""
content = content.strip()
# 计算未闭合的括号
open_braces = content.count('{') - content.count('}')
open_brackets = content.count('[') - content.count(']')
# 检查是否有未闭合的字符串
if content and content[-1] not in '",}]':
content += '"'
# 闭合括号
content += ']' * open_brackets
content += '}' * open_braces
return content
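上面的括号配平逻辑可以抽成独立函数验证(注意这是启发式修复:字符串字面量内的花括号也会被计数,极端情况下仍可能失败):

```python
import json

def fix_truncated_json(content: str) -> str:
    """与上文 _fix_truncated_json 相同的配平逻辑,独立函数形式"""
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'  # 先闭合可能被截断的字符串
    content += ']' * open_brackets
    content += '}' * open_braces
    return content

truncated = '{"hot_topics": ["话题1", "话题2'
print(json.loads(fix_truncated_json(truncated)))  # → {'hot_topics': ['话题1', '话题2']}
```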
def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
"""尝试修复配置JSON"""
import re
# 修复被截断的情况
content = self._fix_truncated_json(content)
# 提取JSON部分
json_match = re.search(r'\{[\s\S]*\}', content)
if json_match:
json_str = json_match.group()
# 移除字符串中的换行符
def fix_string(match):
s = match.group(0)
s = s.replace('\n', ' ').replace('\r', ' ')
s = re.sub(r'\s+', ' ', s)
return s
json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)
try:
return json.loads(json_str)
except:
# 尝试移除所有控制字符
json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
json_str = re.sub(r'\s+', ' ', json_str)
try:
return json.loads(json_str)
except:
pass
return None
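其中的字符串字面量正则用于清理JSON字符串内部的裸换行符(`json.loads` 默认拒绝字符串中的控制字符)。单独演示该正则的效果:

```python
import json
import re

raw = '{"reasoning": "第一行\n第二行"}'  # 字符串内是真实换行符,直接解析会失败

def fix_string(match):
    # 只处理匹配到的字符串字面量,把换行替换为空格并压缩空白
    s = match.group(0)
    s = s.replace('\n', ' ').replace('\r', ' ')
    return re.sub(r'\s+', ' ', s)

cleaned = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, raw)
print(json.loads(cleaned))  # → {'reasoning': '第一行 第二行'}
```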
def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
"""生成时间配置"""
prompt = f"""基于以下模拟需求,生成时间模拟配置。
{context[:5000]}
## 任务
请生成时间配置JSON,注意:
- 用户群体为中国人,需符合北京时间作息习惯
- 凌晨0-5点几乎无人活动(活跃度系数0.05)
- 早上6-8点逐渐活跃(活跃度系数0.4)
- 工作时间9-18点中等活跃(活跃度系数0.7)
- 晚间19-22点是高峰期(活跃度系数1.5)
- 23点后活跃度下降(活跃度系数0.5)
当前实体数量: {num_entities}
返回JSON格式(不要markdown):
{{
"total_simulation_hours": <72-168,根据事件性质决定>,
"minutes_per_round": <15-60>,
"agents_per_hour_min": <每小时最少激活Agent数>,
"agents_per_hour_max": <每小时最多激活Agent数>,
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "<简要说明>"
}}"""
system_prompt = "你是社交媒体模拟专家。返回纯JSON格式,时间配置需符合中国人作息习惯。"
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"时间配置LLM生成失败: {e}, 使用默认配置")
return self._get_default_time_config(num_entities)
def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
"""获取默认时间配置(中国人作息)"""
return {
"total_simulation_hours": 72,
"minutes_per_round": 30,
"agents_per_hour_min": max(1, num_entities // 15),
"agents_per_hour_max": max(5, num_entities // 5),
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "使用默认中国人作息配置"
}
def _parse_time_config(self, result: Dict[str, Any]) -> TimeSimulationConfig:
"""解析时间配置结果"""
return TimeSimulationConfig(
total_simulation_hours=result.get("total_simulation_hours", 72),
minutes_per_round=result.get("minutes_per_round", 30),
agents_per_hour_min=result.get("agents_per_hour_min", 5),
agents_per_hour_max=result.get("agents_per_hour_max", 20),
peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
off_peak_activity_multiplier=0.05, # 凌晨几乎无人
morning_hours=result.get("morning_hours", [6, 7, 8]),
morning_activity_multiplier=0.4,
work_hours=result.get("work_hours", list(range(9, 19))),
work_activity_multiplier=0.7,
peak_activity_multiplier=1.5
)
def _generate_event_config(self, context: str, simulation_requirement: str) -> Dict[str, Any]:
"""生成事件配置"""
prompt = f"""基于以下模拟需求,生成事件配置。
模拟需求: {simulation_requirement}
{context[:3000]}
## 任务
请生成事件配置JSON:
- 提取热点话题关键词
- 描述舆论发展方向
- 设计初始帖子内容
返回JSON格式(不要markdown):
{{
"hot_topics": ["关键词1", "关键词2", ...],
"narrative_direction": "<舆论发展方向描述>",
"initial_posts": [
{{"content": "帖子内容", "poster_type": "MediaOutlet"}},
...
],
"reasoning": "<简要说明>"
}}"""
system_prompt = "你是舆论分析专家。返回纯JSON格式。"
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"事件配置LLM生成失败: {e}, 使用默认配置")
return {
"hot_topics": [],
"narrative_direction": "",
"initial_posts": [],
"reasoning": "使用默认配置"
}
def _parse_event_config(self, result: Dict[str, Any]) -> EventConfig:
"""解析事件配置结果"""
return EventConfig(
initial_posts=result.get("initial_posts", []),
scheduled_events=[],
hot_topics=result.get("hot_topics", []),
narrative_direction=result.get("narrative_direction", "")
)
def _generate_agent_configs_batch(
self,
context: str,
entities: List[EntityNode],
enable_twitter: bool,
enable_reddit: bool
) -> Dict[str, Any]:
"""调用LLM生成配置"""
start_idx: int,
simulation_requirement: str
) -> List[AgentActivityConfig]:
"""分批生成Agent配置"""
# 构建实体列表用于Agent配置
# 构建实体信息
entity_list = []
for i, e in enumerate(entities):
entity_list.append({
"agent_id": i,
"entity_uuid": e.uuid,
"agent_id": start_idx + i,
"entity_name": e.name,
"entity_type": e.get_entity_type() or "Unknown",
"summary": e.summary[:200] if e.summary else ""
"summary": e.summary[:150] if e.summary else ""
})
prompt = f"""你是一个社交媒体舆论模拟专家。请根据以下信息,生成详细的模拟参数配置。
prompt = f"""基于以下信息,为每个实体生成社交媒体活动配置。
{context}
模拟需求: {simulation_requirement}
## 实体列表(需要为每个实体生成活动配置)
## 实体列表
```json
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
```
## 任务
请生成一个JSON配置包含以下部分
为每个实体生成活动配置,注意:
- **时间符合中国人作息**:凌晨0-5点几乎不活动,晚间19-22点最活跃
- **官方机构**(University/GovernmentAgency):活跃度低(0.1-0.3),工作时间(9-17)活动,响应慢(60-240分钟),影响力高(2.5-3.0)
- **媒体**(MediaOutlet):活跃度中(0.4-0.6),全天活动(8-23),响应快(5-30分钟),影响力高(2.0-2.5)
- **个人**(Student/Person/Alumni):活跃度高(0.6-0.9),主要晚间活动(18-23),响应快(1-15分钟),影响力低(0.8-1.2)
- **公众人物/专家**:活跃度中(0.4-0.6),影响力中高(1.5-2.0)
1. **time_config** - 时间模拟配置
- total_simulation_hours: 模拟总时长(小时),根据事件性质决定(短期热点24-72小时,长期舆论168-336小时)
- minutes_per_round: 每轮代表的时间(分钟),建议15-60
- agents_per_hour_min/max: 每小时激活的Agent数量范围
- peak_hours: 高峰时段列表(0-23)
- off_peak_hours: 低谷时段列表
返回JSON格式(不要markdown):
{{
"agent_configs": [
{{
"agent_id": <必须与输入一致>,
"activity_level": <0.0-1.0>,
"posts_per_hour": <发帖频率>,
"comments_per_hour": <评论频率>,
"active_hours": [<活跃小时列表,考虑中国人作息>],
"response_delay_min": <最小响应延迟分钟>,
"response_delay_max": <最大响应延迟分钟>,
"sentiment_bias": <-1.0到1.0>,
"stance": "<supportive/opposing/neutral/observer>",
"influence_weight": <影响力权重>
}},
...
]
}}"""
2. **agent_configs** - 每个Agent的活动配置(必须为每个实体生成)
对于每个agent_id,设置:
- activity_level: 活跃度(0.0-1.0),官方机构通常0.1-0.3,媒体0.3-0.5,个人0.5-0.9
- posts_per_hour: 每小时发帖频率,官方机构0.05-0.2,媒体0.5-2,个人0.1-1
- comments_per_hour: 每小时评论频率
- active_hours: 活跃时间段列表,官方通常工作时间,个人更分散
- response_delay_min/max: 响应延迟模拟(分钟),官方较慢(30-180),个人较快(1-30)
- sentiment_bias: 情感倾向(-1到1),根据实体立场设置
- stance: 立场(supportive/opposing/neutral/observer)
- influence_weight: 影响力权重,知名人物和媒体较高
3. **event_config** - 事件配置
- initial_posts: 初始帖子列表,包含content和poster_agent_id
- hot_topics: 热点话题关键词列表
- narrative_direction: 舆论发展方向描述
4. **platform_configs** - 平台配置(如果启用)
- viral_threshold: 病毒传播阈值
- echo_chamber_strength: 回声室效应强度(0-1)
5. **reasoning** - 你的推理说明,解释为什么这样设置参数
## 重要原则
- 官方机构(University、GovernmentAgency)发言频率低但影响力大
- 媒体(MediaOutlet)发言频率中等,传播速度快
- 个人(Student、PublicFigure)发言频率高但影响力分散
- 根据模拟需求判断各实体的立场和情感倾向
- 时间配置要符合真实社交媒体的使用规律
请返回JSON格式,不要包含markdown代码块标记"""
system_prompt = "你是社交媒体行为分析专家。返回纯JSON,配置需符合中国人作息习惯。"
try:
# 使用重试机制调用LLM API
from ..utils.retry import RetryableAPIClient
retry_client = RetryableAPIClient(max_retries=3, initial_delay=2.0, max_delay=60.0)
def call_llm():
return self.client.chat.completions.create(
model=self.model_name,
messages=[
{
"role": "system",
"content": "你是社交媒体舆论模拟专家擅长设计真实的模拟参数。返回纯JSON格式不要markdown。"
},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"},
temperature=0.7,
max_tokens=8000
)
response = retry_client.call_with_retry(call_llm)
result = json.loads(response.choices[0].message.content)
logger.info(f"LLM配置生成成功")
return result
result = self._call_llm_with_retry(prompt, system_prompt)
llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
except Exception as e:
logger.error(f"LLM配置生成失败已重试: {str(e)}")
# 返回默认配置
return self._generate_default_config(entities)
logger.warning(f"Agent配置批次LLM生成失败: {e}, 使用规则生成")
llm_configs = {}
def _generate_default_config(self, entities: List[EntityNode]) -> Dict[str, Any]:
"""生成默认配置(LLM失败时的fallback)"""
agent_configs = []
# 构建AgentActivityConfig对象
configs = []
for i, entity in enumerate(entities):
agent_id = start_idx + i
cfg = llm_configs.get(agent_id, {})
for i, e in enumerate(entities):
entity_type = (e.get_entity_type() or "Unknown").lower()
# 如果LLM没有生成使用规则生成
if not cfg:
cfg = self._generate_agent_config_by_rule(entity)
config = AgentActivityConfig(
agent_id=agent_id,
entity_uuid=entity.uuid,
entity_name=entity.name,
entity_type=entity.get_entity_type() or "Unknown",
activity_level=cfg.get("activity_level", 0.5),
posts_per_hour=cfg.get("posts_per_hour", 0.5),
comments_per_hour=cfg.get("comments_per_hour", 1.0),
active_hours=cfg.get("active_hours", list(range(9, 23))),
response_delay_min=cfg.get("response_delay_min", 5),
response_delay_max=cfg.get("response_delay_max", 60),
sentiment_bias=cfg.get("sentiment_bias", 0.0),
stance=cfg.get("stance", "neutral"),
influence_weight=cfg.get("influence_weight", 1.0)
)
configs.append(config)
return configs
def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
"""基于规则生成单个Agent配置(中国人作息)"""
entity_type = (entity.get_entity_type() or "Unknown").lower()
# 根据实体类型设置默认参数
if entity_type in ["university", "governmentagency", "ngo"]:
config = {
"agent_id": i,
# 官方机构:工作时间活动,低频率,高影响力
return {
"activity_level": 0.2,
"posts_per_hour": 0.1,
"comments_per_hour": 0.05,
"active_hours": list(range(9, 18)),
"active_hours": list(range(9, 18)), # 9:00-17:59
"response_delay_min": 60,
"response_delay_max": 240,
"sentiment_bias": 0.0,
@@ -420,165 +741,69 @@ class SimulationConfigGenerator:
"influence_weight": 3.0
}
elif entity_type in ["mediaoutlet"]:
config = {
"agent_id": i,
"activity_level": 0.6,
"posts_per_hour": 1.0,
"comments_per_hour": 0.5,
"active_hours": list(range(6, 24)),
# 媒体:全天活动,中等频率,高影响力
return {
"activity_level": 0.5,
"posts_per_hour": 0.8,
"comments_per_hour": 0.3,
"active_hours": list(range(7, 24)), # 7:00-23:59
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "observer",
"influence_weight": 2.5
}
elif entity_type in ["publicfigure", "expert"]:
config = {
"agent_id": i,
"activity_level": 0.5,
elif entity_type in ["professor", "expert", "official"]:
# 专家/教授:工作+晚间活动,中等频率
return {
"activity_level": 0.4,
"posts_per_hour": 0.3,
"comments_per_hour": 0.5,
"active_hours": list(range(8, 23)),
"response_delay_min": 10,
"response_delay_max": 60,
"active_hours": list(range(8, 22)), # 8:00-21:59
"response_delay_min": 15,
"response_delay_max": 90,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 2.0
}
else: # Student, Person, etc.
config = {
"agent_id": i,
elif entity_type in ["student"]:
# 学生:晚间为主,高频率
return {
"activity_level": 0.8,
"posts_per_hour": 0.6,
"comments_per_hour": 1.5,
"active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23], # 上午+晚间
"response_delay_min": 1,
"response_delay_max": 15,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 0.8
}
elif entity_type in ["alumni"]:
# 校友:晚间为主
return {
"activity_level": 0.6,
"posts_per_hour": 0.4,
"comments_per_hour": 0.8,
"active_hours": [12, 13, 19, 20, 21, 22, 23], # 午休+晚间
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
else:
# 普通人:晚间高峰
return {
"activity_level": 0.7,
"posts_per_hour": 0.5,
"comments_per_hour": 1.0,
"active_hours": list(range(7, 24)),
"response_delay_min": 1,
"comments_per_hour": 1.2,
"active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23], # 白天+晚间
"response_delay_min": 2,
"response_delay_max": 20,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
agent_configs.append(config)
return {
"time_config": {
"total_simulation_hours": 72,
"minutes_per_round": 30,
"agents_per_hour_min": max(1, len(entities) // 10),
"agents_per_hour_max": max(5, len(entities) // 3),
"peak_hours": [9, 10, 11, 14, 15, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5]
},
"agent_configs": agent_configs,
"event_config": {
"initial_posts": [],
"hot_topics": [],
"narrative_direction": ""
},
"reasoning": "使用默认配置(LLM生成失败)"
}
def _build_parameters(
self,
simulation_id: str,
project_id: str,
graph_id: str,
simulation_requirement: str,
entities: List[EntityNode],
llm_result: Dict[str, Any],
enable_twitter: bool,
enable_reddit: bool
) -> SimulationParameters:
"""根据LLM结果构建SimulationParameters对象"""
# 时间配置
time_cfg = llm_result.get("time_config", {})
time_config = TimeSimulationConfig(
total_simulation_hours=time_cfg.get("total_simulation_hours", 72),
minutes_per_round=time_cfg.get("minutes_per_round", 30),
agents_per_hour_min=time_cfg.get("agents_per_hour_min", 5),
agents_per_hour_max=time_cfg.get("agents_per_hour_max", 20),
peak_hours=time_cfg.get("peak_hours", [9, 10, 11, 14, 15, 20, 21, 22]),
off_peak_hours=time_cfg.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
peak_activity_multiplier=time_cfg.get("peak_activity_multiplier", 1.5),
off_peak_activity_multiplier=time_cfg.get("off_peak_activity_multiplier", 0.3)
)
# Agent配置
agent_configs = []
llm_agent_configs = {cfg["agent_id"]: cfg for cfg in llm_result.get("agent_configs", [])}
for i, entity in enumerate(entities):
cfg = llm_agent_configs.get(i, {})
agent_config = AgentActivityConfig(
agent_id=i,
entity_uuid=entity.uuid,
entity_name=entity.name,
entity_type=entity.get_entity_type() or "Unknown",
activity_level=cfg.get("activity_level", 0.5),
posts_per_hour=cfg.get("posts_per_hour", 0.5),
comments_per_hour=cfg.get("comments_per_hour", 1.0),
active_hours=cfg.get("active_hours", list(range(8, 23))),
response_delay_min=cfg.get("response_delay_min", 5),
response_delay_max=cfg.get("response_delay_max", 60),
sentiment_bias=cfg.get("sentiment_bias", 0.0),
stance=cfg.get("stance", "neutral"),
influence_weight=cfg.get("influence_weight", 1.0)
)
agent_configs.append(agent_config)
# 事件配置
event_cfg = llm_result.get("event_config", {})
event_config = EventConfig(
initial_posts=event_cfg.get("initial_posts", []),
scheduled_events=event_cfg.get("scheduled_events", []),
hot_topics=event_cfg.get("hot_topics", []),
narrative_direction=event_cfg.get("narrative_direction", "")
)
# 平台配置
twitter_config = None
reddit_config = None
platform_cfgs = llm_result.get("platform_configs", {})
if enable_twitter:
tw_cfg = platform_cfgs.get("twitter", {})
twitter_config = PlatformConfig(
platform="twitter",
recency_weight=tw_cfg.get("recency_weight", 0.4),
popularity_weight=tw_cfg.get("popularity_weight", 0.3),
relevance_weight=tw_cfg.get("relevance_weight", 0.3),
viral_threshold=tw_cfg.get("viral_threshold", 10),
echo_chamber_strength=tw_cfg.get("echo_chamber_strength", 0.5)
)
if enable_reddit:
rd_cfg = platform_cfgs.get("reddit", {})
reddit_config = PlatformConfig(
platform="reddit",
recency_weight=rd_cfg.get("recency_weight", 0.3),
popularity_weight=rd_cfg.get("popularity_weight", 0.4),
relevance_weight=rd_cfg.get("relevance_weight", 0.3),
viral_threshold=rd_cfg.get("viral_threshold", 15),
echo_chamber_strength=rd_cfg.get("echo_chamber_strength", 0.6)
)
return SimulationParameters(
simulation_id=simulation_id,
project_id=project_id,
graph_id=graph_id,
simulation_requirement=simulation_requirement,
time_config=time_config,
agent_configs=agent_configs,
event_config=event_config,
twitter_config=twitter_config,
reddit_config=reddit_config,
llm_model=self.model_name,
llm_base_url=self.base_url,
generation_reasoning=llm_result.get("reasoning", "")
)
@@ -238,14 +238,15 @@ class SimulationManager:
document_text: str,
defined_entity_types: Optional[List[str]] = None,
use_llm_for_profiles: bool = True,
progress_callback: Optional[callable] = None
progress_callback: Optional[callable] = None,
parallel_profile_count: int = 3
) -> SimulationState:
"""
准备模拟环境(全程自动化)
步骤:
1. 从Zep图谱读取并过滤实体
2. 为每个实体生成OASIS Agent Profile(可选LLM增强)
2. 为每个实体生成OASIS Agent Profile(可选LLM增强,支持并行)
3. 使用LLM智能生成模拟配置参数(时间、活跃度、发言频率等)
4. 保存配置文件和Profile文件
5. 复制预设脚本到模拟目录
@@ -257,6 +258,7 @@ class SimulationManager:
defined_entity_types: 预定义的实体类型(可选)
use_llm_for_profiles: 是否使用LLM生成详细人设
progress_callback: 进度回调函数 (stage, progress, message)
parallel_profile_count: 并行生成人设的数量(默认3)
Returns:
SimulationState
@@ -314,7 +316,8 @@ class SimulationManager:
total=total_entities
)
generator = OasisProfileGenerator()
# 传入graph_id以启用Zep检索功能获取更丰富的上下文
generator = OasisProfileGenerator(graph_id=state.graph_id)
def profile_progress(current, total, msg):
if progress_callback:
@@ -330,7 +333,9 @@ class SimulationManager:
profiles = generator.generate_profiles_from_entities(
entities=filtered.entities,
use_llm=use_llm_for_profiles,
progress_callback=profile_progress
progress_callback=profile_progress,
graph_id=state.graph_id, # 传入graph_id用于Zep检索
parallel_count=parallel_profile_count # 并行生成数量
)
state.profiles_count = len(profiles)