OpenAI Vision (BYOK)

Bring your own OpenAI API key for premium-quality translations from GPT vision models, as an alternative to the default Hosted Server backend.

Overview

OpenAI Vision is a BYOK (bring your own key) alternative to the default Hosted Server backend. Pick it when you want full control over the model, want OpenAI to bill you directly, or want to avoid our weekly quota. Bubble detection still runs locally in your browser via PaddleOCR; the individual bubble crops are then sent to OpenAI for OCR and translation.

Not sure which backend to pick?

The extension ships with the Hosted Server backend selected by default — no API key needed. Only switch to OpenAI Vision if you specifically want to supply your own key.

Setup

Get an OpenAI API key from platform.openai.com
Open the extension popup and go to the Backend tab
Select "OpenAI Vision" as the backend
Paste your API key
Choose a model (see Available Models below)
Click "Check Connection" to verify
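
For readers curious what a connection check could involve, here is a minimal sketch. It assumes the check lists models through OpenAI's GET /v1/models endpoint with a Bearer token; the extension's actual check is not documented here, and the names buildConnectionCheck, interpretStatus, and checkConnection are hypothetical.

```typescript
// Hypothetical sketch of a "Check Connection" request.
// Assumption: the check calls OpenAI's GET /v1/models endpoint.

interface ConnectionCheckRequest {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

type ConnectionResult = "ok" | "invalid_key" | "rate_limited" | "error";

// Injected sender so the logic can be exercised without a network call.
type Send = (
  url: string,
  init: ConnectionCheckRequest["init"]
) => Promise<{ status: number }>;

// Builds the request; rejects keys that are empty after trimming.
function buildConnectionCheck(apiKey: string): ConnectionCheckRequest {
  const trimmed = apiKey.trim();
  if (trimmed.length === 0) {
    throw new Error("API key is empty");
  }
  return {
    url: "https://api.openai.com/v1/models",
    init: { method: "GET", headers: { Authorization: `Bearer ${trimmed}` } },
  };
}

// Maps an HTTP status code to a coarse result for the UI.
function interpretStatus(status: number): ConnectionResult {
  if (status >= 200 && status < 300) return "ok";
  if (status === 401) return "invalid_key"; // OpenAI returns 401 for bad credentials
  if (status === 429) return "rate_limited"; // rate limit or quota exceeded
  return "error";
}

async function checkConnection(apiKey: string, send: Send): Promise<ConnectionResult> {
  const { url, init } = buildConnectionCheck(apiKey);
  const response = await send(url, init);
  return interpretStatus(response.status);
}
```

In a browser context, the sender could be supplied as `(url, init) => fetch(url, init)`.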

Available Models

Model	Speed	Quality	Cost
GPT-5.4 Nano	Fast	Good	~$0.01/page
GPT-5.4 Mini	Medium	Excellent	~$0.03/page
GPT-4o	Slower	Best	~$0.05/page

How It Works

Detect — PaddleOCR finds bubble regions in the image (runs locally)
Crop — Each bubble is cropped with 10px padding using OffscreenCanvas
Translate — Up to 30 crops per image are sent to GPT as separate image parts
Render — Translated text is drawn as canvas overlays on the original image
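
The Crop and Translate steps above can be sketched as follows. The 10px padding and the 30-crop cap come from the text; everything else is an assumption made for illustration: clamping the padded box to the image bounds, dropping crops beyond the cap, and the Chat Completions style content parts. The names padBox and buildUserContent are hypothetical, not the extension's own.

```typescript
// Sketch of the Crop and Translate steps under the stated assumptions.

interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

const CROP_PADDING_PX = 10; // from the text
const MAX_CROPS_PER_IMAGE = 30; // from the text

// Expands a detected bubble box by `padding` pixels on every side.
// Assumption: the padded box is clamped so it never leaves the image.
function padBox(
  box: Box,
  imageWidth: number,
  imageHeight: number,
  padding: number = CROP_PADDING_PX
): Box {
  const left = Math.max(0, box.x - padding);
  const top = Math.max(0, box.y - padding);
  const right = Math.min(imageWidth, box.x + box.width + padding);
  const bottom = Math.min(imageHeight, box.y + box.height + padding);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Content parts in the shape accepted by OpenAI's Chat Completions API.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Builds one user message: an instruction followed by one image part per crop.
// Assumption: crops beyond the cap are dropped rather than sent in a second request.
function buildUserContent(cropDataUrls: string[], instruction: string): ContentPart[] {
  const parts: ContentPart[] = [{ type: "text", text: instruction }];
  for (const url of cropDataUrls.slice(0, MAX_CROPS_PER_IMAGE)) {
    parts.push({ type: "image_url", image_url: { url } });
  }
  return parts;
}
```

In the browser, the padded box would supply the source rectangle for a drawImage call on an OffscreenCanvas 2D context, and each crop would be encoded as a data URL before being passed to buildUserContent.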

Your API key is stored locally

The API key is saved in your browser's local storage and is never sent to any server other than OpenAI's API. Track your costs in the History tab.

Cost Tracking

The extension tracks estimated costs per translation in the History tab. You can see total spend, number of translations, and tokens used.
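
As an illustration of how a per-translation estimate and the History totals could be derived, here is a minimal sketch. It assumes costs are computed from the token counts in OpenAI's usage object (prompt_tokens and completion_tokens) and a per-million-token price table. The Pricing type, the function names, and the prices in the usage example are placeholders, not the extension's actual rates or code.

```typescript
// Hypothetical cost estimation from token usage.

interface Usage {
  prompt_tokens: number; // input tokens, as reported by the API
  completion_tokens: number; // output tokens, as reported by the API
}

// Assumption: prices are expressed in USD per one million tokens.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface HistorySummary {
  totalSpendUsd: number;
  translations: number;
  tokensUsed: number;
}

// Estimated cost of a single translation request.
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = (usage.prompt_tokens / 1_000_000) * pricing.inputPerMillionUsd;
  const outputCost = (usage.completion_tokens / 1_000_000) * pricing.outputPerMillionUsd;
  return inputCost + outputCost;
}

// Aggregates the three totals the History tab is described as showing.
function summarize(entries: Usage[], pricing: Pricing): HistorySummary {
  let totalSpendUsd = 0;
  let tokensUsed = 0;
  for (const usage of entries) {
    totalSpendUsd += estimateCostUsd(usage, pricing);
    tokensUsed += usage.prompt_tokens + usage.completion_tokens;
  }
  return { totalSpendUsd, translations: entries.length, tokensUsed };
}
```

For example, with placeholder prices of 1 USD per million input tokens and 2 USD per million output tokens, `summarize([{ prompt_tokens: 1000, completion_tokens: 500 }], pricing)` reports one translation and 1500 tokens used. Real figures depend on the model's current pricing.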
