Bulk mode for LLMs.
Drop off a big workload, go do literally anything else,
come back to a finished results file.
Reliable delivery, every run. Predictable turnaround you can plan around.
JSONL in → we run it in bulk (GPUs doing their night shift) → JSONL out
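Concretely, the input file is one JSON request per line. Here's a minimal sketch of building one, assuming the OpenAI batch-file layout that an OpenAI-compatible API implies (the exact schema Doubleword expects is an assumption; check the docs):

    import json

    # One self-contained request per line. The field layout follows the
    # OpenAI batch-file convention; we assume Doubleword accepts the same
    # shape since its API is OpenAI-compatible.
    requests = [
        {
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "Qwen3-30B-A3B-Instruct",
                "messages": [{"role": "user", "content": doc}],
            },
        }
        for i, doc in enumerate(["First document...", "Second document..."])
    ]

    with open("batch_input.jsonl", "w") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")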
Same Intelligence.
Fraction of the Price.
Significantly cheaper than leading providers with comparable model performance.
OpenAI, gpt-4.1-mini (1381 ELO): $0.40 input / $1.60 output per MTok
Anthropic, claude-sonnet-4 (1389 ELO): $4.00 input / $15.00 output per MTok
Why pay for real-time when you don't need it? Doubleword optimizes for cost and passes the savings on to you.

Same intelligence, delivered async for massive savings.
Doubleword, Qwen3-30B-A3B-Instruct (1382 ELO): $0.05 input / $0.20 output per MTok
Doubleword, Qwen3-30B-A3B-Instruct (1382 ELO): $0.07 input / $0.30 output per MTok
* ELO scores from LMArena. Models with similar ELO have comparable intelligence.
Price comparison for other models available in private preview.
Calculate Your Savings
See how much you could save by switching to Doubleword.
Annual Current Cost: $26,280
Annual Doubleword Cost: $3,285
Annual Savings: $22,995 (88% less)
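Those figures follow directly from the per-token prices above. A quick sanity check in Python, with illustrative daily volumes (60 MTok in, 30 MTok out, chosen to match the example):

    # Prices ($/MTok) from the table: gpt-4.1-mini vs Qwen3-30B-A3B-Instruct.
    # The daily token volumes are illustrative.
    input_mtok, output_mtok = 60, 30  # per day

    current    = (0.40 * input_mtok + 1.60 * output_mtok) * 365  # $26,280
    doubleword = (0.05 * input_mtok + 0.20 * output_mtok) * 365  # $3,285

    savings = current - doubleword
    print(f"${savings:,.0f} saved ({savings / current:.0%} less)")  # $22,995 saved (88% less)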
Batch Inference, Done Right
We built the infrastructure others didn't bother to optimize. The result? Faster delivery, lower costs, and guarantees you can count on.
Save up to 75% on every batch
We designed our hardware, runtime, and orchestration stack specifically for batch workloads—letting us pass on dramatically lower costs to you.
Guaranteed SLAs, or your money back
Choose 1-hour or 24-hour delivery windows. If we miss it, you don't pay. Simple as that.
1-Hour Batch SLA
The shortest guaranteed batch SLA available. Perfect for chained data processing and offline agent workflows.
Streaming Results
Results flow back as they're processed. No waiting for the entire batch to complete.
One-Line Migration
OpenAI-compatible API. Switch your endpoint to api.doubleword.ai/v1, keep your code. Migration in minutes.
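If you're already on the OpenAI Python SDK, the migration is one changed line in client construction (the environment-variable name below is just an example):

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.doubleword.ai/v1",   # was: https://api.openai.com/v1
        api_key=os.environ["DOUBLEWORD_API_KEY"],  # hypothetical variable name
    )
    # Everything downstream (files, batches, chat completions) works unchanged.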
Built for Your Highest Volume Use Cases
From async agents to embeddings to image processing, we handle inference at scale.
Async or Offline Agents
Autonomous agents that run long, multi-step reasoning tasks in the background.
Data Processing Pipelines
Process large datasets with LLM-powered analysis at scale.
Try Use Case Guide
Common Questions
How does it work?
It's simple: 1) Submit your batch via API and pick a 1-hour or 24-hour SLA. 2) We process it on our optimized batch infrastructure. 3) Receive results streamed as they complete, guaranteed on time.
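With an OpenAI-compatible API, those steps plausibly map onto the standard batch endpoints. Note that the "1h" completion window is our assumption for the 1-hour SLA; the OpenAI convention only defines "24h":

    from openai import OpenAI

    client = OpenAI(base_url="https://api.doubleword.ai/v1", api_key="...")

    # 1) Upload the JSONL workload, then submit it with your chosen SLA.
    batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="1h",  # assumed value for the 1-hour SLA; "24h" for the daily window
    )
    # 2) and 3) happen server-side: poll or stream results as they complete.
    print(batch.id, batch.status)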
How are your prices so low?
We're not running at a loss or burning VC cash. Doubleword has built an inference stack optimized for high throughput and low cost from the ground up. By optimizing at every layer (hardware, runtime, and orchestration), we achieve significantly better unit economics than providers who bolt batch features onto real-time infrastructure.
What happens if you miss my SLA?
We guarantee delivery. Unlike other providers, who may expire your batch, we commit to your chosen SLA. In the unlikely event we fail to meet it, we won't expire your request and you won't be charged.
Can I get results before the whole batch finishes?
Yes! Results are streamed as they're processed, so you don't have to wait for every single request to complete before getting your first results.
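The exact streaming mechanism isn't specified on this page. One plausible consumption pattern, continuing the submission sketch above and assuming partial output files become readable before the batch finishes, is a simple poll loop:

    import json, time

    seen = set()
    while True:
        batch = client.batches.retrieve(batch.id)
        if batch.output_file_id:  # assumes partial results appear here before completion
            for line in client.files.content(batch.output_file_id).text.splitlines():
                record = json.loads(line)
                if record["custom_id"] not in seen:  # process each result exactly once
                    seen.add(record["custom_id"])
                    print(record["custom_id"], "done")
        if batch.status == "completed":
            break
        time.sleep(30)  # naive: re-downloads the file each poll; fine for a sketch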
Which models do you support?
We currently support popular open source LLMs and embedding models of various sizes in our private preview. We're actively adding more models and modalities based on user demand. Join the waitlist and let us know which models matter most to you.
How do I get started?
Join our private preview waitlist below. We're onboarding users in batches (pun intended) and providing free credits to early adopters so you can test the platform at no risk.
Want to dive deeper into how we achieve these results?
Read our CEO's technical deep-dive
Stop overpaying for inference.
The Doubleword Batched public preview is ideal for teams running batch or async inference—workloads where a 1-hour or 24-hour SLA works.
You're a good fit if you:
- Want to trial open-source models for batch use cases
- Have workloads ready to test now
- Are open to feedback & user interviews
Start building now
Get instant access to the platform.