vagor.one — Just a Curious Mind

Note: I had Claude replay our entire conversation as I struggled with Neural Composer's ingestion, the models I was using, and the fact that I had to keep restarting Obsidian to get the LightRAG server running again. Starting it through the plugin in Obsidian didn't produce the results I was hoping for.

The promise is clean: my notes, my models, my machine, no cloud. Ask a question, get an answer that actually knows what you have been working on for the last three years. No subscription. No data leaving the building.

Getting there took two days, two different AIs, one laptop crash, a wall of Go panic traces, and a single line in a configuration file that nobody (not the plugin docs, not the forum posts, not the AI assistants I used to debug the AI assistant) thought to mention.

This is that story.

What This Is Actually For

Before the how-to, the why — because there are two different tools in play here and conflating them costs you time.

Neural Composer is an Obsidian plugin that builds a knowledge graph from your vault. It reads your notes, extracts entities and relationships using a local LLM, and stores everything in a graph database on your machine. When you ask it something, it searches that graph for relevant context before answering. This is RAG — retrieval-augmented generation. It does not just search for keywords. It understands that the person mentioned in your Monday note is the same person in the contract three folders away, and that both are connected to the project you have been tracking for six months.

Obsidian's built-in wiki (links, graph view, backlinks) is a different thing entirely. It is a navigation and connection tool — it shows you how your notes relate based on the links you created manually. It does not read the content. It does not understand context. It is a map you draw yourself.

Both have their place. The wiki is instant, zero-overhead, and excellent for intentional linking. Neural Composer is slower to set up, costs compute, and gives you something the wiki cannot: the ability to ask questions your notes already know the answer to, even if you never explicitly connected them.

The setup described here runs Neural Composer locally. No API key. No OpenAI bill. No notes leaving your machine.

The Stack

What is running on the machine:

Ollama — serves local LLM models via an API on port 11434

LightRAG — the graph-based RAG engine, accessed via lightrag-server on port 9621

Neural Composer — the Obsidian plugin that manages LightRAG and provides the chat interface

Models in use: qwen3:30b-a3b for chat queries, qwen2.5:14b for document extraction, nomic-embed-text for embeddings

The architecture is: Obsidian talks to Neural Composer, which talks to LightRAG, which talks to Ollama, which runs the models. Four layers. Each one can fail independently and produce an error that looks like it came from a different layer entirely.
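Because each layer can fail on its own, it pays to test them from the bottom up, so that a dead Ollama is not mistaken for a LightRAG problem. A minimal sketch, assuming curl is installed and the default ports used in this setup; probe and probe_stack are hypothetical helper names, not part of any of these tools.

```shell
# probe NAME URL: report whether an HTTP endpoint answers at all.
# -s silences progress output, -f makes HTTP error statuses count as failures,
# -o /dev/null discards the body, --max-time stops a hung server from blocking.
probe() {
  if curl -sf -o /dev/null --max-time 5 "$2"; then
    echo "$1: up"
  else
    echo "$1: DOWN ($2)"
  fi
}

# probe_stack: check the lowest layer first; if Ollama is down,
# any LightRAG or plugin error above it is only a symptom.
probe_stack() {
  probe "ollama" "http://localhost:11434/api/tags"
  probe "lightrag" "http://localhost:9621/health"
}
```

Run probe_stack whenever Vault Chat errors. If Ollama reports DOWN, nothing above it is worth debugging yet.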

The Setup

Install Ollama from ollama.com. Pull the models you need:

ollama pull qwen3:30b-a3b
ollama pull qwen2.5:14b
ollama pull nomic-embed-text

These commands download the model files to your machine. qwen3:30b-a3b is 18.5GB and qwen2.5:14b is 9GB, so plan disk space accordingly. nomic-embed-text, the model that turns your text into vectors for semantic search, is only 274MB.
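Before configuring anything, it is worth confirming that all three models actually landed. A sketch that reads the output of ollama list on stdin and prints any required model that is missing. It assumes ollama list prints a header row followed by one model per line with the name in the first column, and that an untagged pull such as nomic-embed-text is listed as nomic-embed-text:latest; missing_models is a hypothetical name.

```shell
# missing_models MODEL...: read `ollama list` output on stdin and print every
# requested model that is not installed. Names without a tag are compared
# as NAME:latest, the tag Ollama uses when a pull specifies none.
missing_models() {
  installed=$(awk 'NR > 1 { print $1 }')   # skip the header row, keep the NAME column
  for model in "$@"; do
    case "$model" in
      *:*) want=$model ;;
      *)   want="$model:latest" ;;
    esac
    printf '%s\n' "$installed" | grep -qxF "$want" || echo "$model"
  done
}
```

Usage: ollama list | missing_models qwen3:30b-a3b qwen2.5:14b nomic-embed-text. No output means everything is installed.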

Install the Neural Composer plugin in Obsidian via the Community Plugins browser. Point it at your LightRAG binary path — on a Mac with Homebrew, that is /opt/homebrew/bin/lightrag-server. Configure the working directory where it will store the graph data. Set your Ollama models. Save. Click "Start Server."

That is the theory.

What Actually Happened

Neural Composer showed a green heartbeat. The server appeared to be running. Every attempt to use Vault Chat returned a connection error.

The diagnostic path started with the obvious:

curl http://localhost:9621/health

This command asks the LightRAG server if it is alive. It returned a healthy JSON response with full configuration details. So LightRAG was fine.

curl http://localhost:11434/api/tags

This asks Ollama for a list of installed models. It returned no list at all, just "connection refused": nothing was listening on port 11434.

Ollama was not running — despite the menu bar icon sitting there looking perfectly content.

ps aux | grep -i ollama

ps aux lists every running process on the system. grep -i ollama filters for anything with "ollama" in the name, case-insensitive. The Ollama GUI process was there. The actual server subprocess — the thing that binds to port 11434 and serves model requests — was not.

lsof -i :11434

lsof lists open files, and -i narrows that to network sockets, here anything on port 11434. This returned nothing. Confirmed: nothing was listening on 11434. The icon was a lie.
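lsof -i :11434 also lists client connections to the port, so once the server is up it can show more than just the server. A narrower sketch that reports only the process listening on a TCP port; -n, -P, -iTCP and -sTCP:LISTEN are standard lsof options, and listener is a hypothetical name.

```shell
# listener PORT: show the process listening on a TCP port, if any.
# -n and -P skip DNS and port-name lookups; -sTCP:LISTEN ignores client connections.
listener() {
  out=$(lsof -nP -iTCP:"$1" -sTCP:LISTEN 2>/dev/null)
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  else
    echo "nothing listening on $1"
  fi
}
```

Once Ollama is healthy, listener 11434 should show the server process rather than the "nothing listening" line.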

What Grok Did — And What It Missed

Before the terminal, there was Grok.

The connection error was already there when Grok got involved. What followed was a methodical tour of everything that was not the problem. Restart the server. Wait for the green heartbeat. Restart Obsidian. Check if the folder icon had turned from orange to green. Toggle between Local, Hybrid, and Global query modes. Try a short test message. Restart again. Check if the ingestion had finished. Re-ingest the folder. Exclude the one file that kept throwing a 409. Re-ingest again.

None of it touched the actual fault — because none of it left the plugin's UI layer.

To be fair to Grok: it started down the right path. It had the correct instinct to test the port directly:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:30b", "messages": [{"role": "user", "content": "Hello"}]}'

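Connection refused. Right kind of test, although it pointed at port 8000 rather than Ollama's 11434, so the refusal proved less than it seemed. The next move should have been: if the port is not listening, find out why the process is not binding to it. Read the logs. Check what the server is actually doing when it starts.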

Instead: restart the server. Try again. Restart Obsidian. Try again.

The same fix, repeated with more confidence, is not a different fix. Once a restart fails twice, a third restart is not a strategy — it is running out the clock.

The difference in approach that eventually resolved this was simple: treat it as a Unix service-debugging problem, not an Obsidian plugin problem. Every step either eliminated a possibility or produced a concrete new fact. curl to test the port. ps aux to check the process. lsof to check what was actually listening. The crash log to find the actual error. Each command a question with a specific answer, not a lever to pull and hope.

The fault was never in Neural Composer. It was never in LightRAG. It was never in the plugin configuration, the folder colours, the ingestion queue, or the query mode. It was a Go panic in Ollama's CORS validator, caused by an environment variable set in a LaunchAgent plist, triggered on every single launch attempt, logged clearly in a file nobody thought to check.

Grok is a capable tool. It is also a tool that stayed inside the application layer when the problem was two layers below it. That is worth knowing before you spend two days in the same place.

The Crash Log

cat ~/.ollama/logs/server.log

This reads Ollama's server log — the internal record of what the server process actually tried to do before dying. What came back was the same panic, repeated every second, on a loop:

panic: bad origin: origins must contain '*' or include http://,https://,chrome-extension://,safari-extension://,moz-extension://,ms-browser-extension://

Ollama was starting, hitting this validation check, panicking, dying, being restarted by the wrapper, hitting it again, dying again. Indefinitely. The menu bar icon reflected the wrapper process — still alive. The server process underneath — dead on arrival, every time.
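When a server restarts in a loop, the log is mostly one panic repeated. A small sketch that collapses the repeats so the distinct failures stand out; it assumes Go panics begin at the start of a line with "panic:", as in the output above, and summarize_panics is a hypothetical name.

```shell
# summarize_panics LOGFILE...: count each distinct "panic:" line, most frequent first.
# -h drops file names so identical panics from several logs are counted together.
summarize_panics() {
  grep -h '^panic:' "$@" | sort | uniq -c | sort -rn
}
```

Usage: summarize_panics ~/.ollama/logs/server.log. A single line with a large count is the signature of a crash loop.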

The culprit was an environment variable:

launchctl getenv OLLAMA_ORIGINS

launchctl getenv reads a variable from launchd, the macOS service manager that starts background processes and GUI applications for your session. The value returned:

app://obsidian.md

Ollama's CORS validation accepts origins starting with http://, https://, or specific browser extension schemes. app:// is the scheme Obsidian uses internally as an Electron application. It is not on Ollama's accepted list. Ollama treats any unrecognised scheme as a fatal configuration error and refuses to start.
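To catch a bad value before Ollama does, a sketch that applies the rule as the panic message states it: each comma-separated entry must contain * or begin with one of the listed schemes. This is an approximation read off the error text, not Ollama's actual validation code, and check_origins is a hypothetical name.

```shell
# check_origins VALUE: VALUE is comma-separated, like OLLAMA_ORIGINS.
# Prints "bad origin: ..." for each entry the rule would reject; returns 1 if any.
check_origins() {
  rest="$1,"
  status=0
  while [ -n "$rest" ]; do
    origin=${rest%%,*}   # text before the first comma
    rest=${rest#*,}      # drop that entry and its comma
    [ -n "$origin" ] || continue
    case $origin in
      *'*'*|http://*|https://*|chrome-extension://*|safari-extension://*|moz-extension://*|ms-browser-extension://*) ;;
      *) echo "bad origin: $origin"; status=1 ;;
    esac
  done
  return $status
}
```

Usage: check_origins "$(launchctl getenv OLLAMA_ORIGINS)". With the value above, it prints bad origin: app://obsidian.md.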

Someone — almost certainly a Neural Composer setup guide, or an earlier attempt at fixing a different problem — had added this value to the system environment via a LaunchAgent plist file. Two of them, actually:

find ~/Library/LaunchAgents -iname "*ollama*"

find searches a directory tree for files matching a pattern. This returned:

/Users/[user]/Library/LaunchAgents/ollama-obsidian.plist
/Users/[user]/Library/LaunchAgents/ollama.plist

ollama-obsidian.plist set OLLAMA_ORIGINS=app://obsidian.md via launchctl setenv at every login. ollama.plist baked the same value directly into the environment of a separate ollama serve process it launched independently — meaning there were potentially two Ollama instances competing for port 11434 on every boot, on top of the Ollama.app GUI.
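Searching by file name only finds agents with "ollama" in their name. A broader sketch searches plist contents for the variable itself across the standard per-user and system locations. It assumes the key is stored as plain text, which holds for XML plists and normally for binary ones too; plutil -p prints any match in readable form. find_origin_setters is a hypothetical name.

```shell
# find_origin_setters DIR...: list plist files under each DIR that mention OLLAMA_ORIGINS.
find_origin_setters() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue   # skip locations that do not exist on this machine
    grep -rl --include='*.plist' 'OLLAMA_ORIGINS' "$dir" 2>/dev/null
  done
  return 0
}
```

Usage: find_origin_setters ~/Library/LaunchAgents /Library/LaunchAgents /Library/LaunchDaemons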

The fix was to unload and delete both files, then set the correct value immediately:

launchctl unload ~/Library/LaunchAgents/ollama-obsidian.plist
launchctl unload ~/Library/LaunchAgents/ollama.plist
rm -f ~/Library/LaunchAgents/ollama-obsidian.plist
rm -f ~/Library/LaunchAgents/ollama.plist
launchctl setenv OLLAMA_ORIGINS "*"

launchctl unload stops a LaunchAgent and removes it from the active session. rm -f deletes the file permanently. launchctl setenv sets an environment variable for the current session and all GUI applications launched afterward.

launchctl setenv alone does not survive a reboot, so a new LaunchAgent re-applies the correct value at every login:

cat > ~/Library/LaunchAgents/com.user.ollama-origins.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.user.ollama-origins</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/launchctl</string>
        <string>setenv</string>
        <string>OLLAMA_ORIGINS</string>
        <string>*</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.user.ollama-origins.plist

Restart Ollama. Port 11434 answered. Vault Chat connected. Two days of connection errors resolved by a single environment variable that no guide mentioned.

The .env File — What Actually Needs to Be There

Neural Composer manages LightRAG through a .env file editable from within the plugin settings. The defaults get you running. These additions get you running well:

#####################################
### USER CUSTOM CONFIGURATION     ###
### (Overrides defaults above)    ###
#####################################
AUTH_DISABLED=true
TOKEN_SECRET=localonly
TIMEOUT=1800
LLM_TIMEOUT=600
EMBEDDING_TIMEOUT=120

# Role-specific model configuration
EXTRACT_LLM_BINDING=ollama
EXTRACT_LLM_MODEL=qwen2.5:14b
EXTRACT_LLM_BINDING_HOST=http://localhost:11434

KEYWORD_LLM_BINDING=ollama
KEYWORD_LLM_MODEL=qwen2.5:14b
KEYWORD_LLM_BINDING_HOST=http://localhost:11434

QUERY_LLM_BINDING=ollama
QUERY_LLM_MODEL=qwen3:30b-a3b
QUERY_LLM_BINDING_HOST=http://localhost:11434
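With three roles pointing at two models, a quick way to confirm what the file actually says is to list the model per role. A sketch, assuming plain KEY=value lines without quotes or inline comments; the .env path depends on the working directory configured in Neural Composer, so it is passed as an argument, and env_roles is a hypothetical name.

```shell
# env_roles FILE: print "ROLE model" for each ROLE_LLM_MODEL line, and
# "DEFAULT model" for a bare LLM_MODEL line if one is present.
env_roles() {
  sed -nE \
    -e 's/^LLM_MODEL=(.*)$/DEFAULT \1/p' \
    -e 's/^([A-Z]+)_LLM_MODEL=(.*)$/\1 \2/p' \
    "$1"
}
```

With the configuration above, it should list EXTRACT and KEYWORD on qwen2.5:14b and QUERY on qwen3:30b-a3b.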

The timeouts and the role split both deserve explanation.

TIMEOUT=1800 is the global operation timeout — 30 minutes. The default is 15. For large documents this matters.

LLM_TIMEOUT=600 is the per-chunk LLM timeout: 10 minutes per chunk, up from a default of 3 minutes. A large research document or a lengthy meeting log can exceed 3 minutes per chunk on a 30B model. When that happens there is no visible error in the plugin; the document is simply marked as failed. 10 minutes gives the model room to work.

The role split between qwen2.5:14b and qwen3:30b-a3b is the configuration that changed everything about ingestion performance.

14B vs 30B — What You Actually Lose

With a single model configured, the 30B model does two jobs: ingesting documents (extracting entities, relationships, keywords) and answering queries (synthesising information across the graph to respond to your questions).

These are not the same job.

Ingestion is structured extraction. Given a chunk of text, identify the people, organisations, concepts, dates, and relationships between them. Output them in a specific format. This is well within what a 14B model handles competently — the task is constrained, the format is defined, the LLM is following a template rather than reasoning freely.

Query is synthesis. Given a question and a set of retrieved graph nodes, reason across them, find connections, produce a coherent response that integrates context from multiple sources. This benefits meaningfully from a larger model.

What you lose with 14B on extraction: Subtle entity disambiguation — two people with similar roles in different projects may get conflated. Nuanced relationships in complex legal or technical language may be missed. Long-range context within very large chunks may be partially dropped.

What you keep: Everything else. Factual extraction from meeting notes, project documents, tenders, and research is accurate. Entity recognition is reliable. The graph structure is sound.

The practical result: ingestion that previously ran at walking pace — with the machine hot enough to be uncomfortable — now runs at a reasonable clip. Documents that were timing out at 3 minutes per chunk on the 30B now process cleanly. The queries, which still use the 30B, are unchanged in quality.

The machine runs cooler. The fan is quieter. Overnight ingestion actually finishes overnight.

The Ingestion Problems Nobody Warns You About

Embedded Base64 images in markdown files. Copy a table from a web page or a Word document into Obsidian, and you may be copying an invisible image alongside it. The image gets embedded as a Base64-encoded data URI directly in the markdown file: invisible in Obsidian's rendered view, but thousands of characters of raw text in the actual file. A daily note that looks like 200 words can contain 120,000 characters of encoded PNG data. LightRAG splits text into chunks of roughly 1,200 tokens by default, and Base64 breaks into far more tokens per character than prose does, so that note becomes at least two dozen chunks instead of one or two. It either times out or produces a graph full of nonsense extracted from image binary.

Check file sizes in Finder if ingestion counts look wrong. A daily note should be a few kilobytes. If it is several megabytes, open it in a plain text editor — not Obsidian — and look for a line starting with ![](data:image/png;base64,. Delete it. The note shrinks to its real size. Processing time drops from hours to seconds.
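Checking file sizes by hand gets tedious across hundreds of notes. A sketch that lists Markdown files containing Base64 image data URIs and strips embeds of the ![](data:image/png;base64,...) form from a single file, keeping a .bak copy. The pattern assumes that exact embed form; HTML img tags or reference-style images would need a different pattern. Both function names are hypothetical.

```shell
# find_base64_notes DIR: list Markdown files that contain a Base64 image data URI.
find_base64_notes() {
  grep -rlE --include='*.md' 'data:image/[^;]*;base64,' "$1"
}

# strip_base64_images FILE: delete ![alt](data:image/...;base64,...) embeds in place.
# -i.bak works with both GNU and BSD (macOS) sed and leaves FILE.bak behind.
strip_base64_images() {
  sed -E -i.bak 's|!\[[^]]*[]]\(data:image/[^;]*;base64,[^)]*\)||g' "$1"
}
```

Usage: run find_base64_notes on the vault folder, then strip_base64_images on each file it lists. Check the result before deleting the .bak copies so they do not linger in the vault.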

Scanned PDFs. Two PDF files in the vault consistently failed with "file content contains only whitespace." Both were scanned documents — images of pages, not text. LightRAG cannot read image-based PDFs. OCR them first using something like Adobe Acrobat or ocrmypdf before ingesting, or accept that they will never process and delete their records from the graph.
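To find scanned PDFs before ingestion rather than after, a sketch that checks whether a PDF has any extractable text and runs OCR only when it does not. It assumes pdftotext (from poppler) and ocrmypdf are installed separately; neither is part of this stack. --skip-text tells ocrmypdf to leave pages that already have text alone. The function names are hypothetical.

```shell
# has_text_layer FILE: succeed if pdftotext can extract any non-whitespace text.
# A scanned PDF without an OCR layer yields nothing and fails this check.
has_text_layer() {
  pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# ocr_if_needed IN OUT: write an OCRed copy to OUT only when IN has no text layer.
ocr_if_needed() {
  if has_text_layer "$1"; then
    echo "text layer present: $1"
  else
    ocrmypdf --skip-text "$1" "$2"
  fi
}
```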

The 409 error on re-ingestion. Drop a new file into your watched folder when LightRAG already has a pending or failed record for it — from a previous attempt — and you get a 409 conflict error. LightRAG will not overwrite an existing record without explicit deletion first. The fix:

# Find the document ID (replace filename with the name of the stuck file)
curl -s http://localhost:9621/documents | python3 -m json.tool | grep -B5 "filename"

# Delete the stuck record
curl -X DELETE "http://localhost:9621/documents/delete_document" \
  -H "Content-Type: application/json" \
  -d '{"doc_ids": ["doc-XXXXXXXX"]}'

Then drop the file in again.
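Hand-editing the JSON payload is where typos creep in. A sketch that builds the doc_ids body from its arguments and sends it to the same endpoint used above. It assumes IDs contain no quotes or backslashes, which holds for doc-XXXXXXXX style IDs; doc_ids_json and delete_docs are hypothetical names.

```shell
# doc_ids_json ID...: build {"doc_ids": ["id1", "id2"]} from the arguments.
doc_ids_json() {
  body=''
  for id in "$@"; do
    body="$body${body:+, }\"$id\""   # add a comma separator after the first entry
  done
  printf '{"doc_ids": [%s]}\n' "$body"
}

# delete_docs ID...: send the delete request to the local LightRAG server.
delete_docs() {
  curl -s -X DELETE "http://localhost:9621/documents/delete_document" \
    -H "Content-Type: application/json" \
    -d "$(doc_ids_json "$@")"
}
```

Usage: delete_docs doc-XXXXXXXX, with as many IDs as needed in one call.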

Managing the Pipeline

LightRAG exposes a REST API at http://localhost:9621. A few endpoints worth knowing:

# Check overall health and configuration
curl http://localhost:9621/health

# Get document status counts
curl -s http://localhost:9621/documents | python3 -m json.tool | grep '"status"' | sort | uniq -c

# Reprocess all failed documents
curl -X POST http://localhost:9621/documents/reprocess_failed

# Cancel the current pipeline run
curl -X POST http://localhost:9621/documents/cancel_pipeline

# Delete specific documents by ID
curl -X DELETE "http://localhost:9621/documents/delete_document" \
  -H "Content-Type: application/json" \
  -d '{"doc_ids": ["doc-xxx", "doc-yyy"]}'

The WebUI at http://localhost:9621/webui shows document status visually with filtering by state. Useful for seeing what is stuck and what has completed without parsing JSON in a terminal.
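The status-count one-liner above relies on json.tool putting each field on its own line. A variant that pulls every "status" value straight out of the raw response, so it works with or without pretty-printing. Like the original, it assumes each document entry carries a "status" field; count_statuses is a hypothetical name.

```shell
# count_statuses: read a /documents JSON response on stdin and count each status value.
# grep -o emits every "status": "value" pair on its own line; sed keeps only the value.
count_statuses() {
  grep -oE '"status" *: *"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/' | sort | uniq -c
}
```

Usage: curl -s http://localhost:9621/documents | count_statuses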

The Overnight Strategy

602 documents. A mix of daily notes, research files, strategy documents, meeting logs, book chapters, and tenders. Initial ingestion took multiple sessions across two days — partly because of the Ollama crash loop eating the first day entirely, partly because large documents need real time.

The practical approach: run ingestion overnight on a machine that can stay awake and plugged in. Prevent macOS from sleeping during the run:

caffeinate -i &

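caffeinate -i prevents the system from sleeping due to inactivity. The & runs it as a background process so the terminal stays usable. When the run is done, stop it with kill %1 from the same terminal window; %1 refers to the first background job started in that shell.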
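Rather than remembering to kill caffeinate, it can be tied to a process: caffeinate -w PID releases the assertion when that process exits. A sketch that attaches it to the newest process whose command line matches a pattern. It assumes the LightRAG server's command line contains lightrag-server, and keep_awake_while is a hypothetical name. If Neural Composer restarts the server mid-run, the assertion ends with the old process, so run it again.

```shell
# keep_awake_while PATTERN: hold a no-idle-sleep assertion until the newest
# process whose full command line matches PATTERN exits. macOS only (caffeinate).
keep_awake_while() {
  pid=$(pgrep -nf "$1")   # -n newest match, -f match against the full command line
  if [ -z "$pid" ]; then
    echo "no running process matches: $1" >&2
    return 1
  fi
  caffeinate -i -w "$pid" &
}
```

Usage: keep_awake_while lightrag-server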

A shared desktop machine is worth considering for initial ingestion if you have one — the graph data lives in ~/neural-data/ and can be synced to other devices via whatever sync method you use for your vault. Run the heavy lifting on the machine that can handle it overnight. Query from whatever device you are on.

Vault Chat vs Regular Chat

Neural Composer's chat interface has two modes that are easy to miss.

Regular chat (Enter) is scoped to whatever files you have added as context — typically the current note. It does not search the graph. It is fast and useful for working within a single document.

Vault Chat (Cmd+Shift+Enter) triggers a full graph search before answering. It finds relevant nodes across your entire vault, updates any embeddings for recently modified files, and uses that context to answer. This is the mode that makes the whole thing worth setting up.

The distinction is intentional — running a graph search on every casual message would be expensive and slow. Save Vault Chat for questions that benefit from cross-document context.

Alternatively, type @Vault anywhere in your message to trigger vault search inline without switching modes.

One Vault, Multiple Devices, One Machine Doing the Work

The vault is a single Obsidian vault synced across every device via Obsidian Sync — MacBook Pro, iPad, work laptop. Everything in one place. The notes are always current everywhere.

Neural Composer only runs on the MacBook Pro M4 Max with 48GB of unified memory. This is not a choice — it is a hardware reality. The work laptop is not fast enough to run a 30B model without becoming a space heater that also does not finish. The iPad has no local LLM capability at all. The desktop is a Gen 4 Intel with an Nvidia 3070 Ti and 96GB of RAM — capable on paper, but the M4 Max's unified memory architecture handles large model inference in a way that discrete GPU setups do not match for this workload. The MacBook runs the graph. Everything else benefits from it.

The workflow splits cleanly along those lines. Deep research — cross-document queries, pattern finding across tenders, surfacing connections between projects — happens on the MacBook where Neural Composer is running. Daily notes, quick reference, reviewing the vault — that happens on whatever device is at hand, because the vault is always there via sync.

This creates an interesting constraint: the graph intelligence is only queryable when the MacBook is running and Neural Composer is active. On the iPad, you have your notes. You do not have the graph.

That is where the next phase comes in.

The Karpathy LLM Wiki — named after Andrej Karpathy's approach to building a personal knowledge system — is intended to sit on top of what Neural Composer has already built. The goal is a lighter-weight interface that surfaces the connections and linked ideas from the graph in a way that travels with the vault across every device, without requiring a local LLM to be running on that device. You open your notes on the iPad, and the connections Neural Composer found are already there — surfaced, navigable, useful — without the iPad needing to run anything.

This matters particularly for someone who does not link manually. Obsidian's built-in graph view and backlinks are genuinely useful — if you build links as you write. Creating links feels like overhead when you are in the middle of capturing something. The result is a vault where the connections exist in the content but not in the link structure. Neural Composer finds those connections during ingestion. The Wiki makes them accessible without being tied to the machine that runs the models.

Whether the Karpathy Wiki reads LightRAG's graph data directly or builds its own index from the vault's markdown files — which Neural Composer has already processed — is the question the next setup will answer. That is a separate article. What is confirmed: the architecture is intentional, the ingestion is done, and the graph exists. The cross-device layer comes next.

The setup documentation for this stack is scattered, inconsistent across versions, and assumes a level of comfort with Unix process management and macOS internals that most people documenting their Obsidian setups do not have. The gap between "install the plugin and configure it" and "understand why it is not working" is wider than it should be.

What works: a fully local system that knows your vault, runs on your hardware, costs nothing per query, and improves as you add more notes. The graph-based approach finds connections that keyword search and even basic vector search miss — because it understands that two things are related, not just that they share words.

The app://obsidian.md origin string, sitting in a LaunchAgent plist that no guide told me to create and no guide told me to check, is going to break this setup for other people. It probably already has.

Now it is documented.

vagor.one — Shaped by water. Built by hand. Powered by curiosity.

Two Days, One Typo, and a Local Brain