Design Principles

This page explains in depth how the Karpathy LLM Wiki works, focusing on how sources are ingested, how entities are extracted, and the core philosophy behind the pattern.

Karpathy's core argument

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation.

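In Karpathy's view, the fundamental weakness of RAG is that it is stateless: on every question, the LLM has to rediscover knowledge from scratch. What works better is to have the LLM incrementally build and maintain a persistent wiki: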

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.

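When you drop in a new article, the LLM does not just index it for later retrieval. It reads the article, extracts the key information, and integrates it into the existing wiki: updating entity pages, revising concept summaries, flagging contradictions between new and old claims, and strengthening or challenging the existing synthesis.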

The wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read.

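Three-layer architecture

LLM Wiki consists of three layers with clearly separated responsibilities:

Layer 1: Raw sources

The collection of original sources: articles, papers, data files. These files are immutable. The LLM only reads them and never modifies them. This is your single source of truth.

Layer 2: The wiki

Markdown pages generated by the LLM, including summary pages, entity pages, concept pages, comparisons, and syntheses. The LLM fully owns this layer. You do the reading; it does the writing.

Layer 3: The schema

The governing document (for example CLAUDE.md). It tells the LLM how the wiki is structured, which conventions apply, and how the Ingest/Query/Lint workflows run. The human and the LLM evolve this document together.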
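The ownership rules of the three layers can be expressed as a small path check. This is a minimal sketch, not the site's actual code: only raw/ and CLAUDE.md are named in the text, while the wiki/ directory name and the helpers layer_of and llm_may_write are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed layout: raw/ and CLAUDE.md appear in the text; wiki/ is hypothetical.
RAW_DIR = PurePosixPath("raw")
WIKI_DIR = PurePosixPath("wiki")
SCHEMA_FILE = PurePosixPath("CLAUDE.md")


def layer_of(path: str) -> str:
    """Classify a repository-relative path into one of the three layers."""
    p = PurePosixPath(path)
    if ".." in p.parts:
        # Refuse paths that could escape their directory.
        return "unknown"
    if p == SCHEMA_FILE:
        return "schema"
    if RAW_DIR in p.parents:
        return "raw"
    if WIKI_DIR in p.parents:
        return "wiki"
    return "unknown"


def llm_may_write(path: str) -> bool:
    """Only the wiki layer is freely writable by the LLM in this sketch.

    Raw sources are immutable. The schema is co-evolved with the human,
    so it is treated here as needing human review (an assumption).
    """
    return layer_of(path) == "wiki"
```

A guard like this could sit in front of any file-writing tool handed to the LLM, so that an attempted write under raw/ is rejected before it happens.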

Ingest: how sources get "compiled" into the wiki

Ingest is the core operation of LLM Wiki. In Karpathy's words:

You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages.

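The interactive demo below uses Kenny肯尼's "2026 AI 趋势观察" (2026 AI Trend Observations) as an example to show the full processing flow after a real source is dropped into the wiki.

Step 1 / 5: Drop in the raw source

Place the original article in the raw/ directory:

raw/260225-AI趋势观察（模型飞轮与应用爆发）-Kenny肯尼-公众号.md

Source file placed at raw/260225-AI趋势观察-Kenny肯尼-公众号.md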
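The bookkeeping half of the Ingest flow Karpathy describes (summary page, index, entity and concept pages, log) can be sketched over an in-memory wiki. This is an illustration under stated assumptions: the reading and summarizing are done by the LLM and arrive here as plain arguments, and the function name ingest, the sources/ and nodes/ prefixes, and the line formats are all hypothetical.

```python
def ingest(wiki: dict[str, str], source_title: str, summary: str,
           nodes: list[str], date: str) -> list[str]:
    """Apply one source to the wiki and return the pages it touched.

    `wiki` maps a page path to its markdown text. `summary` and `nodes`
    stand in for what the LLM produced after reading the source.
    """
    touched = []

    # 1. Write the summary page, linking to every entity/concept node.
    links = " ".join(f"[[{n}]]" for n in nodes)
    src_page = f"sources/{source_title}.md"
    wiki[src_page] = f"# {source_title}\n\n{summary}\n\nMentions: {links}\n"
    touched.append(src_page)

    # 2. Update the content index.
    index = wiki.get("index.md", "# Index\n")
    wiki["index.md"] = index + f"- [[{source_title}]]: {summary}\n"
    touched.append("index.md")

    # 3. Update (or create) each entity/concept page with a backlink.
    for n in nodes:
        page = f"nodes/{n}.md"
        body = wiki.get(page, f"# {n}\n")
        wiki[page] = body + f"- Mentioned in [[{source_title}]]\n"
        touched.append(page)

    # 4. Append an entry to the log; earlier entries are never rewritten.
    log = wiki.get("log.md", "# Log\n")
    wiki["log.md"] = log + f"- {date} ingest: [[{source_title}]]\n"
    touched.append("log.md")

    return touched
```

Even this toy version touches three pages plus one per node, which gives a feel for how a single real source can reach the 10-15 pages mentioned in the quote.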

Entity extraction: where the links come from

One of the most critical steps during Ingest is extracting entities and concepts. Karpathy describes the process as:

When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis.

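Entity pages (such as Kenny肯尼.md) and concept pages (such as 数据飞轮.md) form the wiki's nodes. Every source page connects to these nodes through [[wikilinks]]. It is these links that turn the knowledge from a pile of files into a network.

We use a real excerpt to show how the LLM "reads" these nodes out of the original text: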

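Entity/concept extraction demo

Original text (from the source)

In the 2026 trend observations, Kenny肯尼 points out that model evolution follows two paths: an intelligence track and a multimodal generation track.

The core driver behind this is the data flywheel (数据飞轮): agent execution, user feedback, high-quality data, and stronger models form a self-reinforcing loop.

At the application layer, productivity agents, vertical agents (垂直Agent), AI interaction platforms, and AI hardware (AI硬件) are all important opportunity areas.

Extraction results (generated wiki pages)

Kenny肯尼 (entity): AI observer, investor, trend analyst
数据飞轮 (concept): agent execution → feedback → data → stronger model
垂直Agent (concept): deep AI agents for specific industries
AI硬件 (concept): devices that give AI agents a physical entry point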
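Once pages carry [[wikilinks]], the network itself can be recovered mechanically. The sketch below parses links with a regular expression and inverts them into a backlink map. The alias form [[Title|label]] and the heading form [[Title#heading]] follow the common Obsidian convention and are assumed here; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Captures the target title of [[Title]], [[Title|label]] and [[Title#heading]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(text: str) -> list[str]:
    """Return the link targets in a page, in order of appearance."""
    return [target.strip() for target in WIKILINK.findall(text)]


def build_backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each link target to the set of page titles that link to it."""
    backlinks = defaultdict(set)
    for title, text in pages.items():
        for target in extract_links(text):
            backlinks[target].add(title)
    return dict(backlinks)
```

For example, a source page containing [[数据飞轮]] and [[Kenny肯尼|Kenny]] contributes one edge to each of those two nodes, and the resulting map is the kind of data a graph view needs.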

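Query & Lint: conversation and upkeep

Query is how you converse with the existing wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations to its sources. More importantly: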

Good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history.
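The two halves of Query, finding relevant pages and filing a good answer back, can be sketched as follows. This is a deliberately naive illustration: the keyword scoring is a stand-in for whatever search the LLM actually uses, whitespace tokenization does not segment Chinese text, and the names search and file_answer are hypothetical.

```python
def search(pages: dict[str, str], question: str, top_k: int = 3) -> list[str]:
    """Rank page titles by how often the question's terms occur in them."""
    terms = question.lower().split()
    scored = []
    for title, text in pages.items():
        haystack = (title + "\n" + text).lower()
        score = sum(haystack.count(term) for term in terms)
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken by title for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_k]]


def file_answer(pages: dict[str, str], title: str, answer: str,
                cited: list[str]) -> str:
    """Store an answer as a new wiki page that links back to its sources."""
    refs = ", ".join(f"[[{c}]]" for c in cited)
    pages[title] = f"{answer}\n\nSources: {refs}\n"
    return title
```

Because the filed page links to the pages it drew on, it joins the same network as every other page instead of disappearing into chat history.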

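Lint is the periodic health check. Karpathy suggests regularly asking the LLM to run the following checks:

Find contradictions between pages
Flag stale claims that newer sources have superseded
Find orphan pages with no inbound links
Identify important concepts that are mentioned repeatedly but do not yet have their own page
Look for missing cross-references and data gaps

On our site, the changelog is the audit record of these Ingest/Lint/Query operations.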
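Two of these checks, orphan pages and concepts mentioned without a page of their own, are purely structural and can be sketched without an LLM. Detecting contradictions and stale claims needs actual reading and is out of scope here. The function name lint, the min_mentions threshold, and the choice to ignore links coming from index and log pages are assumptions for illustration.

```python
import re
from collections import Counter

# Captures the target title at the start of a [[wikilink]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def lint(pages: dict[str, str], min_mentions: int = 2,
         ignore: tuple[str, ...] = ("index", "log")) -> dict[str, list[str]]:
    """Report orphan pages and frequently linked targets that have no page.

    Links from pages named in `ignore` are not counted, since an index
    that lists every page would otherwise hide all orphans.
    """
    inbound = Counter()
    for title, text in pages.items():
        if title in ignore:
            continue
        # Count each linking page once per target, and skip self-links.
        for target in {t.strip() for t in WIKILINK.findall(text)}:
            if target != title:
                inbound[target] += 1

    orphans = sorted(t for t in pages
                     if t not in ignore and inbound[t] == 0)
    missing = sorted(t for t, n in inbound.items()
                     if t not in pages and n >= min_mentions)
    return {"orphans": orphans, "missing": missing}
```

The output is a to-do list for the next maintenance pass: orphans are candidates for new cross-references, and missing targets are candidates for new concept pages.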

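index.md & log.md: two kinds of navigation

As the wiki grows, you need indexes along two different dimensions:

index.md

A content-oriented catalog. It organizes all pages by category (entities, concepts, sources, and so on), with a one-sentence summary and metadata for each page. When answering a query, the LLM reads the index first, then locates the specific pages.

log.md

A time-oriented log. It records every Ingest, Query, and Lint in append-only fashion, giving a clear timeline that helps you understand how the wiki has evolved.

In our demo, index.md is rendered as the home page table of contents and log.md as the changelog page. They are not add-on UI components; they are wiki content themselves.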
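The difference between the two files shows up clearly in how they are written: the index is regenerated from the current set of pages, while the log only ever grows. The sketch below illustrates both. The markdown layout, the field names category, title, and summary, and the function names are assumptions, not the demo's actual format.

```python
from collections import defaultdict

ALLOWED_OPS = {"ingest", "query", "lint"}


def render_index(pages: list[dict[str, str]]) -> str:
    """Rebuild index.md: pages grouped by category, one summary line each."""
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)

    lines = ["# Index"]
    for category in sorted(by_category):
        lines.append("")
        lines.append(f"## {category}")
        for page in sorted(by_category[category], key=lambda p: p["title"]):
            lines.append(f"- [[{page['title']}]]: {page['summary']}")
    return "\n".join(lines) + "\n"


def append_log(log_text: str, date: str, op: str, detail: str) -> str:
    """Return the log with one new entry appended; old entries are untouched."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unknown operation: {op}")
    return log_text + f"- {date} [{op}] {detail}\n"
```

Regenerating the index keeps it consistent with the pages that exist, and appending to the log preserves the timeline even when pages are later rewritten.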

Why does this work? The division of labor between human and LLM

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value.

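An LLM does not get bored, does not forget to update cross-references, and does not give up because the wiki has grown large. It can modify 15 files in a single operation. This is exactly where human-maintained wikis fail, and why an LLM can succeed.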

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The human plays the curator; the LLM plays the bookkeeper. This is the most essential insight of the Karpathy pattern.

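How this demo is implemented

This showcase site is a reference implementation built on the ideas above:

Astro + React Islands: all wiki pages are statically generated; React is used only for search, theme switching, the graph, and the interactive demos on this page.
Build-time pipeline: the markdown produced during Ingest is parsed at build time into JSON assets (search index, graph data, summary map, log timeline), so the site runs without a backend.
Wikilink system: a custom remark plugin converts [[title]] into links carrying data-wiki-title, and global event delegation provides hover previews.
Relationship graph: visualizes wikilink and tag relations between pages using react-force-graph-2d.
Chinese slugs: Chinese URLs are preserved (for example /Kenny肯尼), consistent with Obsidian's linking conventions.

You can enter the entity, concept, and source category pages from the sidebar, or read src-llm-wiki-宣言 directly for Karpathy's original text.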
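The wikilink conversion and the Chinese slug rule above can be illustrated with a short transformation. The site does this in a remark plugin; the Python below is only a stand-in that shows the input and output shapes. The exact HTML emitted, the [[Title|label]] alias form, and the replacement of spaces with hyphens in slugs are assumptions.

```python
import html
import re

# Matches [[Title]] and [[Title|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_wikilinks(markdown: str) -> str:
    """Replace each [[wikilink]] with an anchor carrying data-wiki-title."""
    def to_anchor(match: re.Match) -> str:
        title = match.group(1).strip()
        label = (match.group(2) or title).strip()
        # Keep non-ASCII characters in the slug; only spaces are replaced.
        slug = title.replace(" ", "-")
        return (
            f'<a href="/{html.escape(slug, quote=True)}" '
            f'data-wiki-title="{html.escape(title, quote=True)}">'
            f"{html.escape(label)}</a>"
        )
    return WIKILINK.sub(to_anchor, markdown)
```

With the title exposed as a data attribute, a single delegated event listener on the page can look up the hovered link's summary without attaching a handler to every anchor.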