Cloude.ai by MARCIANO tma
Token Minimization Architecture

We make money when you save tokens.

Every other AI platform makes more money when you use more tokens. That's the inversion. The compression engine, multi-model routing, and lifecycle intelligence behind xCloude.ai are designed so the math runs the other way — you get more value per dollar, we get paid out of what you would have wasted.

01

Compression

Sixteen strategies that shrink the request without losing the signal. Defined-terms dictionaries, prompt macros, duplicate-source detection, stuck-loop preemption.

02

Routing

Right model for the question. Easy queries go to fast cheap tiers. Hard queries to the heavy hitters. Refusals and overcaution tracked per model, not per platform.

03

Lifecycle

Models change behavior weekly. We log every refusal, every deprecation, every silent quality shift. You see what's working and what's drifting.

04

Receipts

Every prompt comes with a savings number. Compressed from 8,400 to 3,200 tokens. Saved $0.026. Visible. Measurable. Auditable.

How it works

xCloude sits between you and the four major AI providers — Anthropic, OpenAI, Google, xAI — running every request through a compression and routing layer designed to extract more output per token spent.

The compression pipeline

A request comes in. Before it ever leaves our edge functions, it passes through a layered compression sequence: defined-terms substitution (turning recurring phrases into short tokens), corpus-adaptive context inclusion (pruning what the model already knows), duplicate-source detection (flagging when you've pasted the same document twice without realizing), and stuck-loop preemption (catching iterative prompts that aren't going to break through). The provider receives a request that's faithful to your intent but typically uses fewer tokens than what you typed.

Multi-model routing

Different questions deserve different models. A factual lookup against current pricing data doesn't need Opus 4.7; Haiku 4.5 is faster, cheaper, and equally accurate for it. A nuanced legal analysis or strategic synthesis is where the heavier models earn their cost. The routing logic, itself a Pro-tier configurable layer, chooses a model based on your prompt's complexity, your historical preferences, and the per-model price/latency profile we keep current.
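A minimal sketch of the complexity half of that decision, in Python. The tier names, relative costs, complexity ceilings, and the keyword-based `complexity_score` are all hypothetical placeholders; a production router would presumably use a trained classifier plus the preference and latency signals described above. The rule being illustrated is simple: pick the cheapest tier whose complexity ceiling covers the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_mtok: float   # illustrative relative cost, not a real price
    max_complexity: float  # highest complexity score this tier should handle


# Hypothetical tiers; names and numbers are placeholders.
PROFILES = [
    ModelProfile("fast-tier", cost_per_mtok=1.0, max_complexity=0.4),
    ModelProfile("mid-tier", cost_per_mtok=3.0, max_complexity=0.7),
    ModelProfile("heavy-tier", cost_per_mtok=15.0, max_complexity=1.0),
]

# Crude cue words standing in for a real complexity classifier.
HARD_CUES = ("analyze", "strategy", "legal", "synthesize", "compare", "tradeoff")


def complexity_score(prompt: str) -> float:
    """Score in [0, 1]: a length component capped at 0.5 plus 0.2 per cue word."""
    lowered = prompt.lower()
    length_part = min(len(lowered.split()) / 400, 0.5)
    cue_part = 0.2 * sum(cue in lowered for cue in HARD_CUES)
    return min(length_part + cue_part, 1.0)


def route(prompt: str, profiles=PROFILES) -> ModelProfile:
    """Return the cheapest profile whose ceiling covers the prompt's score."""
    score = complexity_score(prompt)
    for profile in sorted(profiles, key=lambda p: p.cost_per_mtok):
        if score <= profile.max_complexity:
            return profile
    # Nothing covers the score: fall back to the most capable tier.
    return max(profiles, key=lambda p: p.max_complexity)
```

Sorting by cost rather than relying on list order keeps the rule correct if new tiers are appended later.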

Lifecycle awareness

Models change. Anthropic deprecates a Sonnet variant. Google retires Gemini 1.x and silently returns 404 to anyone still pointing at it. xAI announces a six-day window before Grok Imagine Pro becomes Grok Imagine Quality. Most platforms find out when their users complain. We find out when our model_lifecycle table tells us — and we route around the deprecation before it touches your prompt.
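The "route around the deprecation" step can be sketched as following replacement links in a lifecycle table until a live model is reached. The in-memory `LIFECYCLE` dict below stands in for the model_lifecycle table; its column names, model names, and dates are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical rows standing in for the model_lifecycle table.
LIFECYCLE = {
    "model-a-v1": {"retires_on": date(2025, 1, 15), "replacement": "model-a-v2"},
    "model-a-v2": {"retires_on": None, "replacement": None},
}


def resolve_model(requested: str, today: date, lifecycle: dict = LIFECYCLE) -> str:
    """Return a live model, following replacement links past retired ones."""
    seen = set()
    current = requested
    while current not in seen:
        seen.add(current)
        row = lifecycle.get(current)
        if row is None:
            return current  # not tracked: pass the request through unchanged
        retires_on: Optional[date] = row["retires_on"]
        if retires_on is None or today < retires_on:
            return current  # still live
        if row["replacement"] is None:
            raise LookupError(f"{current!r} is retired and has no replacement")
        current = row["replacement"]
    raise ValueError(f"replacement cycle starting at {requested!r}")
```

The `seen` set guards against a misconfigured table where two models name each other as replacements.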

"Token efficiency isn't a feature. It's the entire business model."

Architecture stack

  • Vercel frontend (Next.js + static HTML), single-page interactive surfaces
  • Supabase Postgres with row-level security on every user-scoped table
  • Three core edge functions: chat, multimodel, imagegen — each instrumented for full event logging
  • RAG-ready knowledge base with pgvector for project memory and corpus reuse
  • OAuth via Google + Spotify; Apple in queue; SSO for enterprise on the roadmap
  • Failure tracking via model_events, model_lifecycle, and isolated CSAM alerting in csam_alerts

The patent posture

The compression-strategy layer is patent-pending. Several of the sixteen strategies are operable today against any provider with zero buy-in, meaning they work for you whether or not Anthropic, OpenAI, Google, or xAI ever endorses them. The strategic frame for provider conversations is capacity extension, not licensing: a 40-80% efficiency improvement extends the useful life of every data center they've built. We welcome conversations with provider partnership teams.

What we measure

Every chat call, every compare query, every image generation, every refusal. Logged to a database you can query. Some reports are open to all signed-in users. The advanced and historical analytics are Pro-tier.

Live model health

Free
Real-time status across the four providers. Last-known-good model versions, average latency, recent error rates.
Live values from the model_lifecycle feed are shown once you sign in.

Your monthly token spend

Free
Your usage broken down by model, by surface (chat / compare / image), by date. Compared against what the same prompts would have cost going direct.

Per-prompt compression receipt

Free
After every call: original token count, compressed token count, savings in dollars. Surfaced inline in the chat UI.
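The receipt arithmetic is simple enough to show directly. This sketch assumes a flat input price of $5 per million tokens, a figure chosen only because it reproduces the example on this page (8,400 − 3,200 = 5,200 tokens saved; 5,200 × $5 / 1,000,000 = $0.026). Real prices vary by model, and output tokens are ignored here. `Decimal` keeps the dollar amounts exact.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Receipt:
    original_tokens: int
    compressed_tokens: int
    usd_per_million: Decimal  # assumed input price of the routed model

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.compressed_tokens

    @property
    def saved_usd(self) -> Decimal:
        return Decimal(self.tokens_saved) * self.usd_per_million / Decimal(1_000_000)

    @property
    def reduction(self) -> float:
        """Fraction of the original request removed by compression."""
        return self.tokens_saved / self.original_tokens


receipt = Receipt(8400, 3200, Decimal("5"))
```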

Asymmetric refusal analytics

Pro
Where do the four major providers refuse the same prompt differently? Which models refuse "female X" but permit "male X"? Which providers refuse historical content the others answer? Aggregated weekly and diffed against each provider's posted policy as of the relevant date.

Model deprecation early warning

Pro
When a provider's model starts behaving differently — answer drift, latency creep, refusal pattern shift — we see it in the event log before they announce it. Pro subscribers get the alert; everyone else finds out from the broken production app.
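One simple way to catch latency creep or a refusal pattern shift before an announcement is to compare a recent window of logged metrics against a baseline window. The metric names, the 25% tolerance, and the plain relative-change test below are assumptions; answer drift would need a content-level comparison this sketch doesn't attempt.

```python
from statistics import mean


def drift_alerts(baseline: dict, recent: dict, tolerance: float = 0.25) -> list:
    """Flag metrics whose recent mean moved more than `tolerance` (relative) from baseline."""
    alerts = []
    for metric, base_values in baseline.items():
        recent_values = recent.get(metric)
        if not base_values or not recent_values:
            continue  # nothing to compare
        base, now = mean(base_values), mean(recent_values)
        if base == 0:
            shifted = now != 0
        else:
            shifted = abs(now - base) / abs(base) > tolerance
        if shifted:
            alerts.append(f"{metric}: {base:.4g} -> {now:.4g}")
    return alerts
```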

Custom benchmark suites

Pro
Run your own prompt set against all four providers, on a schedule, and watch the response quality drift over time. Useful if your domain (tax, medical, legal, insurance) has a sensitivity the public benchmarks don't capture.

Provider TOS snapshot timeline

Pro
Every change to every provider's published acceptable use policy, captured the day it changed, indexed by topic. Line up the policy timeline against your own refusal timeline. Defensible primary-source archive.

The sixteen strategies

Compression is not one trick. It's a portfolio. The strategies stack: applying the first five typically yields a 25-40% reduction; applying the full sixteen plus the routing layer is where the 40-80% range lives. Five are open and available to all signed-in users. Eleven are Pro-tier.
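One way to model how the strategies stack: if strategy i removes a fraction r_i of whatever tokens survive the strategies applied before it, the reductions combine multiplicatively rather than additively. This treats the strategies as independent, which is a simplifying assumption; in practice they can overlap.

```latex
R_{\text{total}} = 1 - \prod_{i=1}^{n} \left(1 - r_i\right),
\qquad \text{e.g. } r_1 = 0.20,\ r_2 = 0.25 \;\Rightarrow\; R_{\text{total}} = 1 - (0.80)(0.75) = 0.40
```

Under this model, two strategies worth 20% and 25% on their own combine to 40%, not 45%.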

STR-01
Defined-Terms Dictionary (Free)
Recurring multi-word phrases get bound to short tokens at session start. "Renewable energy investment tax credit" becomes {ITC}. The model sees the short form; you see the long one.
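A minimal sketch of STR-01 in Python, assuming the dictionary is a plain phrase-to-abbreviation map and that a glossary line per term is sent once at session start so the model can read the short forms. The function names are hypothetical. The glossary itself costs tokens, so the saving only appears once a phrase recurs.

```python
import re


def bind_terms(text: str, terms: dict[str, str]) -> tuple[str, str]:
    """Return (glossary, compressed_text) with each phrase replaced by {SHORT}."""
    # Longest phrases first, so a short phrase nested inside a longer one can't win early.
    for phrase in sorted(terms, key=len, reverse=True):
        short_form = "{" + terms[phrase] + "}"
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        # A function replacement avoids re's backslash processing of the short form.
        text = pattern.sub(lambda _match, s=short_form: s, text)
    glossary = "\n".join(f"{{{short}}} = {phrase}" for phrase, short in terms.items())
    return glossary, text


def expand_terms(text: str, terms: dict[str, str]) -> str:
    """Reverse the binding so the user sees the long form."""
    for phrase, short in terms.items():
        text = text.replace("{" + short + "}", phrase)
    return text
```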
STR-02
Prompt Macros (Free)
User-defined templates with variable injection. /audit-memo {company} expands to a structured 800-token request that consistently produces a usable analysis.
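Macro expansion can be sketched with the standard library's `string.Template`. The macro table, the positional-argument convention, and the template text are hypothetical; the sketch writes placeholders as `$company` (Template syntax) rather than the `{company}` shown above, and uses `shlex` so a quoted argument such as "Acme Corp" stays one value.

```python
import shlex
from string import Template

# Hypothetical macro library: name -> (parameter names, template body).
MACROS = {
    "audit-memo": (
        ("company",),
        "Write an audit memo for $company.\n"
        "Sections: Scope, Findings, Risks, Recommendations.\n"
        "Cite the source document for every finding.",
    ),
}


def expand_macro(command: str, macros: dict = MACROS) -> str:
    """Expand '/name arg1 arg2 ...' into its template; plain prompts pass through."""
    if not command.startswith("/"):
        return command
    name, *args = shlex.split(command[1:])
    if name not in macros:
        raise KeyError(f"unknown macro: /{name}")
    params, body = macros[name]
    if len(args) != len(params):
        raise ValueError(f"/{name} expects {len(params)} argument(s), got {len(args)}")
    return Template(body).substitute(dict(zip(params, args)))
```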
STR-03
Duplicate Source Detection (Free)
Hashes every uploaded document. If you paste the same PDF twice in a session — or a near-identical version — the second one is silently dropped from the request and the model is told it already saw it.
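A sketch of the detection half of STR-03: an exact check on a SHA-256 hash of whitespace- and case-normalized text, plus a near-duplicate check using Jaccard similarity over five-word shingles. The shingle size and the 0.9 threshold are assumptions, and the pairwise comparison is fine for a session's worth of uploads but would need something like MinHash at larger scale.

```python
import hashlib
import re


def _normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip().lower()


def _shingles(text: str, k: int = 5) -> set:
    """Set of k-word windows; short texts yield a single (shorter) window."""
    words = _normalize(text).split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


class DuplicateDetector:
    """Exact match via SHA-256 of normalized text; near match via shingle Jaccard."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed cut-off for "near-identical"
        self.hashes = set()
        self.shingle_sets = []

    def seen_before(self, text: str) -> bool:
        digest = hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()
        if digest in self.hashes:
            return True
        shingles = _shingles(text)
        for other in self.shingle_sets:
            union = shingles | other
            if union and len(shingles & other) / len(union) >= self.threshold:
                return True
        # Only genuinely new documents are remembered.
        self.hashes.add(digest)
        self.shingle_sets.append(shingles)
        return False
```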
STR-04
Stuck-Loop Detection (Free)
When a prompt gets refused or returns junk three times in a row with the same approach, the engine flags it. You won't burn another 2,000 tokens on the fourth attempt.
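STR-04 reduces to a per-approach counter of consecutive failures. How an attempt is classified as "the same approach" (for example, a hash of the normalized prompt) and as a failure (a refusal or an unusable answer) is assumed to be decided by the caller; the default limit of three matches the description above.

```python
from collections import defaultdict


class StuckLoopDetector:
    """Flags an approach after `limit` consecutive failed attempts."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.failures = defaultdict(int)  # approach key -> consecutive failures

    def record(self, approach: str, failed: bool) -> bool:
        """Log one attempt; return True if the next attempt with this approach should be blocked."""
        if failed:
            self.failures[approach] += 1
        else:
            self.failures[approach] = 0  # a success breaks the streak
        return self.failures[approach] >= self.limit
```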
STR-05
Connection Quality Pre-flight (Free)
Before sending a 50K-token request, we check whether your network can sustain the round trip. That keeps the call from timing out at the provider's end with you charged anyway.
STR-06
Corpus-Adaptive Inclusion (Pro)
Prunes context the model demonstrably already knows. Never-remove rules guarantee that critical anchors stay.
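A minimal sketch of the pruning rule, assuming each context chunk already carries a `known_score` between 0 and 1 estimating how well the model knows it without help (how that score is measured is outside this sketch) and a `never_remove` flag for anchors. The 0.8 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    known_score: float  # 0..1, assumed estimate of prior model knowledge
    never_remove: bool = False


def prune_context(chunks: list[Chunk], threshold: float = 0.8) -> list[Chunk]:
    """Drop chunks the model demonstrably knows, but always keep pinned anchors."""
    return [c for c in chunks if c.never_remove or c.known_score < threshold]
```

Checking `never_remove` first means an anchor survives no matter how high its score is.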
STR-07
Source-Quality Routing (Pro)
Different sources are worth different amounts of context budget. Primary sources get full inclusion; aggregators get summary inclusion; low-quality sources get cited but not included.
STR-08
Unnecessary Tool-Use Prevention (Pro)
Models love to call search even when the answer is in their training data. The router suppresses the call when the question doesn't need fresh data.
STR-09
Abort Preservation (Pro)
If a generation is going to fail, we save the partial work and the prompt context so the next attempt resumes instead of starting over.
STR-10
Deliverable-Type Enforcement (Pro)
Asking for a memo? The model's tendency to drift into expository preamble is suppressed. You get the deliverable, not the warm-up.
STR-11
Defined-Term Hierarchy (Pro)
Multi-tier dictionaries that compose. Project-level terms inherit from organization-level terms which inherit from industry-level terms.
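Python's `collections.ChainMap` models this kind of inheritance directly: lookups search the project level first, then organization, then industry, so a more specific level can override a broader definition. The dictionaries below are illustrative.

```python
from collections import ChainMap

# Illustrative entries; the most general level comes last.
industry = {"ITC": "investment tax credit", "PTC": "production tax credit"}
organization = {"CFO": "chief financial officer"}
project = {"ITC": "renewable energy investment tax credit"}  # overrides industry

terms = ChainMap(project, organization, industry)

# Writes go to the first map only, so project terms never leak into shared levels.
terms["PPA"] = "power purchase agreement"
```

`dict(terms)` flattens the hierarchy into one plain dictionary if a downstream step expects an ordinary mapping.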
STR-12
Cross-Session Memory Compression (Pro)
Long-running projects accumulate state. Instead of replaying it, we maintain a compressed running summary that gets refreshed when material new context arrives.
STR-13
Provider Preamble Stripping (Pro)
"I'd be happy to help you with that..." Every provider has a tic. We strip them before they hit your bill.
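A sketch of one place preamble stripping clearly pays off: earlier assistant turns are resent as input on every follow-up call, so trimming their openers shrinks each later request. The phrase list is a hypothetical starting point, not any provider's actual tic list, and the `role`/`content` message format is an assumption.

```python
import re

# Hypothetical opener phrases; each provider would need its own list.
PREAMBLE_PHRASES = [
    r"i'?d be (?:happy|glad) to help(?: you)?(?: with that)?",
    r"(?:great|good) question",
    r"certainly",
]
# One or more openers at the very start, each followed by optional punctuation.
_PREAMBLE = re.compile(
    r"^\s*(?:(?:" + "|".join(PREAMBLE_PHRASES) + r")\b[.!,]*\s*)+",
    re.IGNORECASE,
)


def strip_preamble(reply: str) -> str:
    """Remove stock openers from the start of a reply."""
    return _PREAMBLE.sub("", reply, count=1)


def compact_history(messages: list[dict]) -> list[dict]:
    """Copy the history with assistant openers stripped before it is resent as input."""
    return [
        {**m, "content": strip_preamble(m["content"])} if m["role"] == "assistant" else m
        for m in messages
    ]
```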
STR-14
Speculative Routing (Pro)
For ambiguous prompts, send a short test to a fast model first. If the answer is good enough, ship it. Otherwise escalate to the heavy tier with the test result as priming context.
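The control flow of speculative routing can be sketched with the model calls and the quality gate passed in as plain functions. `fast_model`, `heavy_model`, and `good_enough` are hypothetical stand-ins (the gate might be a self-check, a rubric, or a classifier), and this sketch sends the full prompt as the test, which is a simplification.

```python
from typing import Callable, Tuple


def speculative_answer(
    prompt: str,
    fast_model: Callable[[str], str],
    heavy_model: Callable[[str], str],
    good_enough: Callable[[str, str], bool],
) -> Tuple[str, str]:
    """Return (tier, answer): ship the cheap draft if it passes, else escalate with it as priming."""
    draft = fast_model(prompt)
    if good_enough(prompt, draft):
        return "fast", draft
    primed = (
        f"{prompt}\n\n"
        "A faster model produced the draft below. Improve on it where it falls short.\n\n"
        f"Draft:\n{draft}"
    )
    return "heavy", heavy_model(primed)
```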
STR-15
Image Iteration Compression (Pro)
Iterating on a generated image? We don't resend the seed prompt. We send the delta against the previous version.
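Computing the delta itself is straightforward with the standard library's `difflib`. This sketch diffs the two prompts word by word and emits short edit instructions. Whether a given image endpoint accepts an edit instruction plus the previous image, rather than a fresh full prompt, is an assumption that varies by provider.

```python
from difflib import SequenceMatcher


def prompt_delta(previous: str, current: str) -> list[str]:
    """Word-level edit list that turns the previous image prompt into the current one."""
    old, new = previous.split(), current.split()
    edits = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
        removed = " ".join(old[i1:i2])
        added = " ".join(new[j1:j2])
        if op == "replace":
            edits.append(f"replace '{removed}' with '{added}'")
        elif op == "delete":
            edits.append(f"remove '{removed}'")
        elif op == "insert":
            edits.append(f"add '{added}'")
        # "equal" spans are unchanged and are not sent.
    return edits
```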
STR-16
Refusal Reroute (Pro)
When a model refuses a prompt that another provider would have permitted, the engine offers a one-click reroute to a model that can handle it. No retyping.
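Choosing the reroute target can be sketched as a lookup over historical refusal rates per model and prompt category, the kind of figures an event log accumulates. The category labels, model names, rates, and the 20% cut-off are hypothetical, and classifying the prompt into a category is left to the caller.

```python
def reroute_candidates(
    refused_by: str,
    category: str,
    refusal_rates: dict[str, dict[str, float]],
    max_rate: float = 0.2,
) -> list[str]:
    """Other models with a low enough refusal rate for this category, lowest rate first."""
    rates = {
        model: by_category[category]
        for model, by_category in refusal_rates.items()
        if model != refused_by and category in by_category
    }
    eligible = [model for model, rate in rates.items() if rate <= max_rate]
    return sorted(eligible, key=lambda model: rates[model])
```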

This is what xCloude.ai is, and what it wants to be.

A platform where the math is on your side. Where laws and rules and policies aren't excuses to refuse, they're frames to work within. Where the family at the kitchen table that built it gets paid out of what you'd otherwise waste, not out of how much you spend.
