MiroFish/txt2graph
666ghj 9657061b26 Add initial implementation of txt2graph tool for knowledge graph generation
- Created a new Streamlit application for visualizing knowledge graphs.
- Implemented text extraction from PDF, Markdown, and TXT files.
- Developed graph building logic using Zep Cloud API.
- Added support for custom entity types and relationships.
- Included interactive HTML visualization for generated graphs.
- Updated .gitignore to include new directories and files.
- Added example environment configuration file (.env.example) for API key setup.
- Created README.md with installation and usage instructions.
- Introduced various utility scripts and styles for enhanced functionality.
2025-11-28 14:07:42 +08:00

txt2graph

A tool for converting text files (PDF, Markdown, TXT) into knowledge graphs.

Features

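  • Supports multiple file formats (PDF, Markdown, TXT)
  • Knowledge graph construction powered by Zep Cloud
  • Automatic extraction of real-world entities (people, companies, organizations, locations, products, events, media)
  • Interactive graph visualization interface
  • Detailed views of entities and relationships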
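Multi-format support is usually dispatched on the file extension. The sketch below shows one way to map a path to a format label before choosing an extractor; the name `detect_format` and the extension table are hypothetical and are not taken from `text_extractor.py`.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the formats listed above.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".markdown": "markdown",
    ".txt": "txt",
}


def detect_format(path: str) -> str:
    """Return a format label for `path`, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive: "REPORT.PDF" -> ".pdf"
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


if __name__ == "__main__":
    print(detect_format("report.PDF"))  # pdf
    print(detect_format("notes.md"))    # markdown
```

Raising `ValueError` for unknown extensions lets a caller (for example, an upload handler) report the problem instead of silently producing an empty graph.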

Entity Types

This tool extracts only entities that actually exist in the real world and are capable of taking action:

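Type          Description                      Examples
Person        Real people                      Ma Huateng, Elon Musk
Company       Registered companies             Tencent, Apple Inc.
Organization  Organizations and institutions   Wuhan University, United Nations
Location      Geographic locations             Beijing, Silicon Valley
Product       Specific products or services    iPhone, WeChat
Event         Real events                      2024 Paris Olympics
Media         Media outlets                    People's Daily, CNN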
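The whitelist above can be expressed as a small enumeration and used to filter extracted labels. This is a hypothetical sketch: it is not the contents of `ontology.py` and does not use Zep's ontology API. The label "Concept" in the usage lines is an invented example of a type that would be rejected.

```python
from enum import Enum


class EntityType(Enum):
    """Hypothetical enumeration of the seven allowed entity types."""

    PERSON = ("Person", "Real people")
    COMPANY = ("Company", "Registered companies")
    ORGANIZATION = ("Organization", "Organizations and institutions")
    LOCATION = ("Location", "Geographic locations")
    PRODUCT = ("Product", "Specific products or services")
    EVENT = ("Event", "Real events")
    MEDIA = ("Media", "Media outlets")

    def __init__(self, label: str, description: str) -> None:
        # Enum unpacks each tuple value into these arguments.
        self.label = label
        self.description = description


ALLOWED_LABELS = {t.label for t in EntityType}


def is_allowed(label: str) -> bool:
    """Return True if `label` is one of the permitted entity types."""
    return label in ALLOWED_LABELS


if __name__ == "__main__":
    extracted = [("Elon Musk", "Person"), ("happiness", "Concept")]
    kept = [(name, kind) for name, kind in extracted if is_allowed(kind)]
    print(kept)  # [('Elon Musk', 'Person')]
```

Keeping the label and description together in one place means the UI legend and the extraction filter cannot drift apart.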

Installation

1. Activate the conda environment

conda activate MiroFish

2. Install dependencies

cd txt2graph
pip install -r requirements.txt

3. Configure environment variables

Copy .env.example to .env and fill in your Zep API key:

cp .env.example .env
# Edit .env and set ZEP_API_KEY

Get an API key at: https://app.getzep.com
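For reference, reading the key at startup might look like the following minimal sketch. It uses only the standard library and hypothetical helper names (`load_env_file`, `get_zep_api_key`); the project itself may rely on python-dotenv's `load_dotenv()` instead, which this README does not confirm.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader (hypothetical): copy KEY=VALUE lines into os.environ.

    Blank lines, comments, and lines without "=" are skipped.
    Variables already set in the environment are not overwritten.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")  # drop optional quotes
        os.environ.setdefault(key.strip(), value)


def get_zep_api_key() -> str:
    """Return ZEP_API_KEY, failing early with a helpful message if it is missing."""
    load_env_file()
    key = os.environ.get("ZEP_API_KEY", "")
    if not key:
        raise RuntimeError("ZEP_API_KEY is not set; copy .env.example to .env and fill it in.")
    return key
```

Failing at startup with a clear message is friendlier than a later authentication error from the first API call.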

Usage

Option 1: Web interface (recommended)

Start the Streamlit app:

streamlit run app.py

Then open the URL shown in the terminal in your browser (usually http://localhost:8501).

Option 2: Python API (scripts or the command line)

from text_extractor import extract_text
from graph_builder import build_graph_from_text

# Extract text from a file
text = extract_text("your_document.pdf")

# Build the knowledge graph
graph_data = build_graph_from_text(
    text=text,
    graph_name="My Knowledge Graph",
    progress_callback=print
)

# Inspect the result
print(f"Nodes: {len(graph_data.nodes)}")
print(f"Edges: {len(graph_data.edges)}")
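Passing `progress_callback=print` suggests the builder calls the callback with a single message argument. Assuming that signature (an inference from the example, not a documented contract), a callback that timestamps and keeps every message could look like this; `make_progress_logger` is a hypothetical helper.

```python
import time
from typing import Callable, List


def make_progress_logger(sink: List[str]) -> Callable[[str], None]:
    """Return a callback that timestamps each progress message and stores it.

    Assumes build_graph_from_text calls the callback with one string
    argument, the same way it would call print.
    """
    start = time.monotonic()

    def callback(message: str) -> None:
        elapsed = time.monotonic() - start
        line = f"[{elapsed:6.1f}s] {message}"
        sink.append(line)  # keep a full log for later inspection
        print(line)        # still echo to the console, like print would

    return callback
```

To use it, create `messages = []` and pass `progress_callback=make_progress_logger(messages)` in place of `print`; after the build finishes, `messages` holds the timestamped log, which helps when judging how long large documents take.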

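Project Structure

txt2graph/
├── app.py              # Streamlit web app
├── text_extractor.py   # Text extraction module
├── graph_builder.py    # Graph building module
├── ontology.py         # Entity type definitions
├── requirements.txt    # Dependency list
├── .env.example        # Example environment variables
└── README.md           # Documentation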

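Notes

  1. Processing time: building a knowledge graph can take several minutes, depending on the length of the text.
  2. API limits: Zep Cloud enforces limits on API calls, so large files should be processed in batches.
  3. Text quality: the quality of the input text directly affects entity-extraction results.
  4. Cost: Zep Cloud API calls may incur charges; check Zep's pricing before processing large volumes.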
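Batch processing can be sketched as a chunker that packs paragraphs into pieces under a character budget. The function name `split_into_chunks` and the 2,000-character default are assumptions for illustration; the real limits depend on your Zep Cloud plan.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Paragraphs (separated by blank lines) are packed greedily; a single
    paragraph longer than max_chars is hard-split into fixed-size pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split paragraphs that exceed the budget on their own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece fits in the current chunk
            else:
                chunks.append(current)  # flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps sentences about the same entity together, which tends to help extraction. Whether successive chunks can be added to one existing graph depends on `graph_builder.py`, which this README does not describe.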

Tech Stack