Technical walkthrough
How YomiNinja Reads Japanese Text From Your Screen
From hotkey press to Yomitan popup — every step of the pipeline, explained.
The process
Four steps from text to definition
YomiNinja's pipeline typically completes in under 500 milliseconds from trigger to overlay display.
A capture is triggered
Two modes: manual hotkey (press a key, get one capture) or Auto OCR (continuous monitoring that fires automatically when screen content changes). Auto OCR uses frame comparison to detect when new text has appeared — typically when a dialogue box updates — and immediately initiates a new capture without requiring any input.
The capture region is either the entire selected application window, or a user-defined OCR Template — a saved rectangle that covers just the dialogue box. Templates are faster and more accurate because they eliminate noise from other screen elements.
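The frame-comparison trigger can be sketched as a per-pixel diff between consecutive captures. This is an illustrative approximation, not YomiNinja's actual comparison logic; the thresholds are invented parameters you would tune in practice.

```typescript
// Illustrative sketch (not YomiNinja's implementation): decide whether a
// new OCR capture should fire by counting how many pixels changed between
// two RGBA frames of the same dimensions.
function frameChanged(
  prev: Uint8Array,        // previous frame, RGBA bytes
  curr: Uint8Array,        // current frame, RGBA bytes
  pixelThreshold = 16,     // per-channel delta that counts as "changed"
  changedRatio = 0.005,    // fraction of pixels that must change to trigger
): boolean {
  let changed = 0;
  const pixels = curr.length / 4;
  for (let i = 0; i < curr.length; i += 4) {
    // Compare only the RGB channels; ignore alpha.
    if (
      Math.abs(curr[i] - prev[i]) > pixelThreshold ||
      Math.abs(curr[i + 1] - prev[i + 1]) > pixelThreshold ||
      Math.abs(curr[i + 2] - prev[i + 2]) > pixelThreshold
    ) {
      changed++;
    }
  }
  return changed / pixels >= changedRatio;
}
```

When the dialogue box advances, a burst of pixels changes inside the OCR Template region and the diff crosses the ratio threshold, firing a capture without any user input.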
Screenshot is sent to the OCR engine
The captured region is passed to the selected OCR engine via YomiNinja's gRPC backend. The engine processes the image and returns two things: the recognized text (the Japanese characters), and character-level bounding box coordinates — the pixel positions of each individual character in the image.
This character-level positioning is what makes hover-to-lookup possible. YomiNinja knows not just what was recognized, but exactly where each character appears on screen.
Local engines (PaddleOCR, MangaOCR, Apple Vision) process the image entirely on your machine. Cloud engines (Google Cloud Vision, Google Lens) send the capture to an external API and receive results back — slightly slower, but often more accurate on difficult fonts.
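The result returned by the engine can be pictured as text plus one bounding box per character. The field names below are illustrative, not YomiNinja's actual gRPC schema:

```typescript
// Hypothetical OCR result shape — field names are illustrative, not
// YomiNinja's actual gRPC message definitions.
interface CharBox {
  char: string;    // one recognized character
  x: number;       // left edge in pixels, relative to the captured region
  y: number;       // top edge in pixels
  width: number;
  height: number;
}

interface OcrResult {
  text: string;      // the full recognized string
  boxes: CharBox[];  // one bounding box per character
}

// Example: the two characters of 「本屋」 recognized side by side.
const result: OcrResult = {
  text: "本屋",
  boxes: [
    { char: "本", x: 120, y: 48, width: 32, height: 32 },
    { char: "屋", x: 152, y: 48, width: 32, height: 32 },
  ],
};
```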
A transparent overlay is rendered above the game
YomiNinja creates a transparent Electron window positioned above the game window in the display stack. This window has no visible chrome — it's invisible except for the text it renders.
Using the bounding box coordinates from the OCR result, YomiNinja places each character at the exact pixel position where it appears in the game. The overlay text sits precisely on top of the original game text.
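That placement step amounts to converting each bounding box into absolute CSS within the overlay document. This is a minimal sketch under two assumptions that are not confirmed by the source: that the overlay window is aligned with the capture origin (hence the offset parameters), and that the text is rendered with a transparent color so the game's own glyphs stay visible underneath.

```typescript
// Hedged sketch of overlay placement: convert a character's bounding box
// (pixels within the captured region) into absolute CSS for the overlay.
interface Box { x: number; y: number; width: number; height: number }

function overlayStyle(box: Box, offsetX = 0, offsetY = 0): Record<string, string> {
  return {
    position: "absolute",
    left: `${box.x + offsetX}px`,
    top: `${box.y + offsetY}px`,
    width: `${box.width}px`,
    height: `${box.height}px`,
    // Assumed technique: transparent text keeps the game's glyphs visible
    // while the real text node remains hoverable for Yomitan.
    color: "transparent",
  };
}
```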
Because the overlay is a real HTML document with actual text nodes (not an image), your cursor can interact with it normally. The underlying game receives mouse and keyboard input through the transparent areas. The overlay only intercepts input on the text characters themselves.
Borderless Windowed mode is required because exclusive fullscreen bypasses the operating system's display compositor entirely. Without the compositor, no transparent window can layer above a fullscreen application.
Hover a word — Yomitan fires instantly
Yomitan and 10ten Reader are Chrome extensions embedded inside YomiNinja's Electron Chromium context. They're pre-installed — no browser or Web Store required.
When you move your cursor over any character in the overlay, Yomitan's hover detection triggers on the underlying text node — exactly as it does on a Japanese webpage. It reads the full word (performing Japanese segmentation to find word boundaries), queries your imported dictionary files, and displays a popup with:
- Kanji and kana reading
- Part of speech and conjugation information
- Definitions from JMdict and any other imported dictionaries
- Pitch accent pattern (if you've imported a pitch accent dictionary)
- Frequency ranking (if you've imported a frequency list)
The popup appears within one frame. No network request is made at lookup time — all dictionary data is stored locally from the files you imported during setup.
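The word-boundary step can be illustrated with a greedy longest-match lookup against a dictionary. Yomitan's real engine also deinflects conjugated forms and scores candidates; this sketch (with a toy three-entry dictionary) only matches the longest entry starting at the hovered character:

```typescript
// Minimal longest-match sketch of hover lookup. Yomitan's actual engine is
// more sophisticated (deinflection, multiple dictionaries, scoring); this
// only finds the longest dictionary entry starting at the hovered index.
const dictionary = new Map<string, string>([
  ["食べる", "to eat"],
  ["食べ", "eating (stem)"],
  ["本", "book"],
]);

function lookupAt(
  text: string,
  index: number,
  maxLen = 10,
): [string, string] | null {
  // Try the longest candidate first, shrinking until a dictionary hit.
  for (let len = Math.min(maxLen, text.length - index); len > 0; len--) {
    const candidate = text.slice(index, index + len);
    const gloss = dictionary.get(candidate);
    if (gloss !== undefined) return [candidate, gloss];
  }
  return null;
}
```

Hovering the 食 in 本を食べる should therefore resolve to the whole word 食べる, not the single character.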
Comparison
OCR vs text hooking: how they differ technically
Text-hooking tools like Textractor work by injecting a DLL into the game's process and intercepting the function call where the game sends text to the display system — capturing the raw string before it's rendered to pixels. This is highly accurate but requires a compatible hook for each specific game engine.
YomiNinja reads pixels — it never touches the game process at all. This difference has several practical consequences:
| | OCR (YomiNinja) | Text Hooking (Textractor) |
|---|---|---|
| Game compatibility | Any game with visible text | Only games with a compatible hook |
| Accuracy | High, with occasional recognition errors | Exact — reads the source string |
| Process access | None — only screen capture | DLL injection into game process |
| Emulator support | Yes — reads from emulator window | Generally no |
| Anti-cheat risk | Minimal (screen capture only) | Higher (process injection) |
| Setup complexity | Low — select window, run | Varies — finding the correct hook can be difficult |
The practical recommendation: try Textractor or LunaTranslator first for visual novels on supported engines. Use YomiNinja for everything else — JRPGs, action games, emulated games, and any title where text hooking fails.
Integration
WebSocket output: connecting to external tools
Every time YomiNinja's OCR produces a result, it broadcasts the recognized text over a local WebSocket server (default port: 7331). Any application or script that connects to this WebSocket receives the OCR output in real time.
This is how Anki mining pipelines work: a texthooker page (running in your browser) connects to the WebSocket, receives each captured sentence, displays it as hoverable text, and lets Yomitan export it to Anki. The same interface can be used for:
- Logging game dialogue to a file
- Piping text to external TTS engines
- Feeding a second-screen display showing the current sentence
- Custom applications or scripts built on top of YomiNinja's output
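A consumer only needs a WebSocket client pointed at the default port. The message format below is an assumption — YomiNinja may send plain text or JSON, so the parser accepts both; inspect a live message before relying on specific fields. Node 22+ ships a global `WebSocket` client, so no extra package is needed.

```typescript
// Hedged sketch of a WebSocket consumer. The payload shape is an
// assumption, not YomiNinja's documented schema: we accept either a plain
// string or a JSON object with a `text` field.
function extractText(raw: string): string {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed === "string") return parsed;
    if (parsed && typeof parsed.text === "string") return parsed.text;
  } catch {
    // Not JSON — treat the payload as plain text.
  }
  return raw;
}

// Usage (Node 22+ global WebSocket client), against the default port:
// const ws = new WebSocket("ws://localhost:7331");
// ws.onmessage = (ev) => console.log(extractText(String(ev.data)));
```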
Try it
See the pipeline in action
Download YomiNinja free and follow the setup guide to go from zero to your first Yomitan popup in under ten minutes.