
Google just shipped a free AI model that runs on your own laptop. For most of your automation, the per-token cloud bill is now optional. Here is what changed, how it works under the hood, and how Codoro turns it into a working system for your business.
Every AI automation you run today sends your data to the cloud and bills you for it. For one task it is pennies. At thousands of tasks a day, that meter never stops, and it grows every time your business does.
As of June 2026, that constraint is optional. Google shipped a free model that runs on your own hardware.
The 30-Second Version
BENCHMARK · GOOGLE DEEPMIND, JUNE 2026
16GB: the laptop RAM needed to run Gemma 4 12B locally, for free, with no per-token cloud bill. Released June 3, 2026 under the open Apache 2.0 license.
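For a rough sense of where a figure like 16GB comes from, here is a back-of-envelope sketch. It estimates memory for the model weights alone at a few bit widths. The bit widths are assumptions chosen for illustration, not published Gemma 4 settings, and the estimate ignores the context cache, activations, and whatever else the laptop is running.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 12 billion parameters at three assumed precisions (illustrative only).
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(12, bits):.0f} GB")
```

Under these assumptions, full 16-bit weights would not fit in 16GB, while 8-bit or 4-bit weights would leave room for everything else. A model of this size on a laptop generally implies reduced-precision weights.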
Every AI task you run today sends your data to the cloud and bills you per token. The more your automation works, the more it costs.
For one task it is pennies. At thousands of tasks a day (reading invoices, sorting tickets, tagging leads), it becomes a utility bill that scales up every time your business does. Per-token pricing means cost rises in step with volume: the harder your automation works, the more you pay.
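To make the shape of that curve concrete, here is a minimal sketch of per-token billing. The token count per task and the price per million tokens are hypothetical placeholders, not any provider's real pricing.

```python
def monthly_cloud_cost(tasks_per_day: int, tokens_per_task: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Per-token billing: the bill grows linearly with task volume."""
    total_tokens = tasks_per_day * tokens_per_task * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs for illustration only.
TOKENS_PER_TASK = 2_000
USD_PER_MILLION_TOKENS = 5.0

for tasks in (100, 1_000, 10_000):
    cost = monthly_cloud_cost(tasks, TOKENS_PER_TASK, USD_PER_MILLION_TOKENS)
    print(f"{tasks:>6} tasks/day -> ${cost:,.2f}/month")
```

With these made-up numbers, ten times the volume means ten times the bill. Nothing in the pricing model flattens the curve as you grow.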
And there is a second, quieter problem. Every one of those tasks sends your data out of your building. For most businesses that is an annoyance. For anyone in healthcare, legal, or finance, it is a compliance headache that no cloud privacy policy fully removes.
On June 3, Google DeepMind released Gemma 4 12B, a small, open-source model that runs on your own hardware for free.
Two things make it matter for business, not just researchers: it runs on a laptop with 16GB of RAM, and its Apache 2.0 license allows commercial use.
Google was not alone. A day earlier, Microsoft introduced its own local Aion models built to run on its Surface dev hardware. Two of the biggest cloud providers on earth, the same week, both pushing AI onto the device. That is the direction, not a coincidence.
Cost, privacy, and speed all change at once.
Flip 01: Cost
Rental becomes ownership. Cloud is pay-per-use forever. Local is set up once, then near-zero per run. The higher your volume, the bigger the win.
Flip 02: Privacy
Data stays in the building. It runs on your hardware, so sensitive information never leaves it. That is often the difference between being allowed to use AI on that data and not.
Flip 03: Speed
No round-trip to a data center means near-instant responses. That matters for anything real-time: a live call, a camera feed, a customer waiting on a screen.
A local model handles high-volume, repetitive tasks brilliantly. It will not do your deep, open-ended thinking.
Point it at invoices, camera feeds, or support tickets and it extracts, flags, and sorts instantly. Ask it to write a five-year strategy and it falls short. That still needs a large cloud model.
So the smart setup is not local or cloud. It is both: local for the repetitive 90%, cloud for the 10% that needs real reasoning. Deciding what runs where is the actual engineering, and it has to be built for your business.
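One way to picture "deciding what runs where" is a small routing function. The sketch below is a hypothetical policy, not a description of any specific product: the `Task` fields, the task kinds in `LOCAL_KINDS`, and the two model callables are all names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    kind: str        # e.g. "invoice_extraction" or "strategy_memo" (hypothetical)
    sensitive: bool  # True if the data must not leave your hardware
    payload: str


# Hypothetical set of repetitive, high-volume task kinds.
LOCAL_KINDS = {"invoice_extraction", "ticket_triage", "lead_tagging"}


def choose_backend(task: Task) -> str:
    """Return "local" or "cloud" for a task under this illustrative policy."""
    if task.sensitive:
        return "local"  # privacy wins, even for harder tasks
    if task.kind in LOCAL_KINDS:
        return "local"  # repetitive work stays on your hardware
    return "cloud"      # open-ended reasoning goes to the larger model


def run(task: Task,
        local_model: Callable[[str], str],
        cloud_model: Callable[[str], str]) -> str:
    """Send the payload to whichever backend the policy selects."""
    model = local_model if choose_backend(task) == "local" else cloud_model
    return model(task.payload)
```

In this sketch a sensitive task always stays local, even when a cloud model would reason better. A real system would have to decide that trade-off per business, which is why the routing rules are the part that needs engineering.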
The choice is not just a cost decision. It is an architecture decision that compounds over every task your automation runs.
| Approach | Cost | Data | Best For |
|---|---|---|---|
| Cloud-only | Per token, grows with volume | Leaves your building every call | Deep reasoning, low-volume complex tasks |
| Local-only | One-time setup, near-zero per run | Stays on your hardware | High-volume, sensitive, real-time tasks |
| Hybrid (routed) | Local for the 90%, cloud when needed | Sensitive work stays local | Almost every real business at scale |
The mistake is treating this as a binary. The businesses that win run a hybrid system where most of the work happens locally and the cloud is reserved for the few tasks that earn its cost. Building that routing layer correctly is where the savings actually get captured.
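As a rough illustration of where the savings come from, the sketch below compares a cloud-only bill with a hybrid one and estimates how long a one-time setup takes to pay for itself. Every figure is a hypothetical input. It also assumes tasks are similar in size, so the cloud bill shrinks in proportion to the share of tasks moved local.

```python
def hybrid_monthly_cost(cloud_only_cost: float, local_share: float,
                        local_running_cost: float = 0.0) -> float:
    """Cloud bill for the tasks still sent to the cloud, plus local running cost."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return cloud_only_cost * (1.0 - local_share) + local_running_cost


def breakeven_months(setup_cost: float, cloud_only_cost: float,
                     hybrid_cost: float) -> float:
    """Months until the one-time setup cost is covered by monthly savings."""
    monthly_saving = cloud_only_cost - hybrid_cost
    if monthly_saving <= 0:
        return float("inf")  # the hybrid setup never pays for itself
    return setup_cost / monthly_saving


# Hypothetical numbers for illustration only.
cloud_only = 3_000.0  # monthly cloud-only bill, USD
hybrid = hybrid_monthly_cost(cloud_only, local_share=0.9, local_running_cost=50.0)
months = breakeven_months(setup_cost=5_300.0, cloud_only_cost=cloud_only,
                          hybrid_cost=hybrid)
print(f"hybrid: ${hybrid:,.2f}/month, break-even after {months:.1f} months")
```

The break-even point moves quickly with volume. The larger the cloud-only bill, the faster a fixed setup cost is recovered.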
CODORO · AI AUTOMATION SYSTEMS · US AND UK
The model is free. The system that turns it into working automation is what we build.
Gemma 4 is an engine. On its own it does nothing. It sits idle until someone builds the car around it: the data feeding in, the routing deciding what runs local versus cloud, the answer triggering a real action, and the connection to the tools you already use.
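Codoro builds that system from scratch, configured specifically for your business. We handle every layer of the stack: data intake, local-versus-cloud routing, the actions each answer triggers, and integration with the tools you already use.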
If your cloud AI bill is climbing or your sensitive data has to stay in-house, that is a solvable architecture problem. Get in touch and we will build the system that fixes it.
Google and Microsoft just made capable AI cheap enough to run on your own hardware. For most of your work, the recurring per-token bill is now optional, and your sensitive data no longer has to leave the building.
The opportunity is not the free model itself. It is the hybrid system that routes work between local and cloud automatically, capturing the savings without losing reasoning power where you need it. The technology is here and proven.
The question, as always, is whether your setup is built to take advantage of it.
What is Gemma 4 12B?
Can a local model replace cloud models like GPT or Claude?
What does it cost to run AI locally?
How does Codoro deploy this for businesses?
Summary
Google just shipped a free AI model that runs on your own laptop, making the per-token cloud bill optional for most of your automation. Here is what changed, why local models are workers, not strategists, and how Codoro builds the hybrid system that routes work between local and cloud.