What local models can actually handle
Sergei Pak · designs AI operating systems
The question this started from: can a business run its real operations – mail, bank, contracts – on a model that sits on its own hardware, without sending the data out? And if so, where is the line: what does a local model handle on its own, and what still needs a big cloud one?
A year of running a live system on four companies' data produced the numbers. Here is what they show.
What a local model handles on its own
81% of requests go to a local model on ordinary hardware and never leave the machine. These are not exotic tasks but exactly the routine that operations are made of: triaging incoming mail, classification, summaries of long threads, reconciling statement lines against the books. A model of the class that installs on a work laptop is already enough for this – a year ago a model of that class carried noticeably less.
Where the cloud is still needed
The remaining ~19% are tasks where a small model falls short: a dense legal document, a long chain of reasoning, a rare phrasing. That's when a big cloud model comes in. Even then, what goes out isn't the whole pile but a trimmed slice for that one task, through a single gateway, with every call logged: who, which model, how many tokens.
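A minimal sketch of that shape, not the system's actual code: one gateway object is the only path to a cloud model, it is handed an already trimmed slice, and it logs who called, which model, and how many tokens. Every name here (CloudGateway, CallRecord, trim_for_task, the model name, the whitespace token count) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class CallRecord:
    """One logged cloud call: who, which model, how many tokens."""
    when: datetime
    who: str
    model: str
    tokens: int


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokens.
    # A real gateway would use the tokenizer of the model it calls.
    return len(text.split())


def trim_for_task(document: str, keywords: List[str]) -> str:
    """Keep only the paragraphs that mention one of the task's keywords."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    lowered = [k.lower() for k in keywords]
    kept = [p for p in paragraphs if any(k in p.lower() for k in lowered)]
    return "\n\n".join(kept)


@dataclass
class CloudGateway:
    """Single exit point: every cloud call goes through send() and is logged."""
    backend: Callable[[str, str], str]  # (model, prompt) -> answer, injected
    log: List[CallRecord] = field(default_factory=list)

    def send(self, who: str, model: str, prompt: str) -> str:
        answer = self.backend(model, prompt)
        tokens = count_tokens(prompt) + count_tokens(answer)
        self.log.append(CallRecord(datetime.now(timezone.utc), who, model, tokens))
        return answer


def fake_backend(model: str, prompt: str) -> str:
    # Stand-in for a real cloud client so the sketch runs offline.
    return f"[{model}] reviewed {count_tokens(prompt)} tokens"


contract = (
    "Clause 1. Payment is due in 30 days.\n\n"
    "Clause 2. Office plants are watered weekly.\n\n"
    "Clause 3. Liability is capped at the contract value."
)
gateway = CloudGateway(backend=fake_backend)
piece = trim_for_task(contract, ["liability"])
reply = gateway.send(who="legal-review", model="big-cloud-model", prompt=piece)
```

Here only the liability clause reaches the backend, and gateway.log holds one record for the call. Injecting the backend as a plain callable keeps the logging independent of whichever cloud client is actually used.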
The proportion shows in the bill: the system spends about half a dollar a week on the cloud. If every email and every statement went out to someone's API, that bill would be hundreds of times larger. How that economy works is in a separate post.
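The arithmetic behind a bill like that can be read straight off the call log. A rough sketch, assuming a flat per-token price: the price and the token counts below are placeholders picked only so the result lands near the half dollar mentioned above. They are not the system's real figures or any provider's price list.

```python
from typing import Iterable


def cloud_cost_usd(token_counts: Iterable[int], usd_per_million_tokens: float) -> float:
    """Cost of a set of logged cloud calls at a flat per-token price."""
    return sum(token_counts) * usd_per_million_tokens / 1_000_000


def times_larger(all_tokens: int, sent_tokens: int) -> float:
    """How many times bigger the bill would be if every token went out."""
    return all_tokens / sent_tokens


# Placeholder values, for illustration only.
PRICE = 5.0                               # hypothetical USD per million tokens
week_of_calls = [40_000, 35_000, 25_000]  # hypothetical tokens per logged cloud call
everything = 30_000_000                   # hypothetical tokens if nothing stayed local

weekly = cloud_cost_usd(week_of_calls, PRICE)             # 0.5 with these placeholders
ratio = times_larger(everything, sum(week_of_calls))      # 300.0 with these placeholders
```

The ratio depends on two things at once: how few requests go out, and how much each outgoing request is trimmed before it does.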
What this means for the data
Since most of the work never leaves the hardware, the question of where the data lives is settled by arithmetic, not a privacy policy: the main flow simply doesn't travel. The system installs as a signed package on your server or laptop, and files and statements stay with you. For the first client the whole thing landed on the manager's laptop in a day, the way in was a Telegram bot, and nothing left.
Compare it to the usual way people try AI: you paste a contract into ChatGPT, and now the copy is there, and where it lives is no longer your call. A local model sees nothing past your machine. There's no quiet background channel either: any direct call that skips the gateway is caught at commit time, and once a week the log can be pulled as a list, showing exactly what left.
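One way a commit-time check of this kind could look, as a sketch under stated assumptions: it parses staged Python files and flags any that import a cloud SDK directly instead of going through the gateway. The blocked module list, the gateway.py file name and the check_commit function are hypothetical. The source does not say how the real check is built.

```python
import ast
from typing import Dict, List

# Assumption: SDK modules that only the gateway is allowed to import.
CLOUD_MODULES = {"openai", "anthropic"}
# Assumption: the one file that is allowed to talk to the cloud.
GATEWAY_FILE = "gateway.py"


def direct_cloud_imports(source: str) -> List[str]:
    """Return the cloud SDK modules imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in CLOUD_MODULES:
                found.add(root)
    return sorted(found)


def check_commit(staged: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each offending file path to the cloud modules it imports directly."""
    violations = {}
    for path, source in staged.items():
        if path == GATEWAY_FILE:
            continue  # the gateway itself may import the SDK
        hits = direct_cloud_imports(source)
        if hits:
            violations[path] = hits
    return violations


staged_files = {
    "gateway.py": "import openai\n",
    "mail/triage.py": "from gateway import send\n",
    "contracts/review.py": "from anthropic import Anthropic\n",
}
violations = check_commit(staged_files)
```

Wired into a pre-commit hook, a non-empty result would fail the commit. A check like this only sees imports, so a raw HTTP call to a cloud endpoint would need a separate rule.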
The line keeps moving toward local
Open models improve faster than the cloud gets cheaper. The share you can keep on your own hardware grows with them, while the share that forces you to reach out shrinks. That is the answer to the data question: the more a local model handles, the fewer reasons to send anything out at all.
The modules, meanwhile, are signed packages on your side, not a subscription to a service. Stop working with me and they keep running, the data and the logic staying yours.
FAQ
- Where is business data physically stored with an AI operating system (AIOS)?
- On your own server or laptop. The system installs as a signed package inside your infrastructure rather than running as a service in someone else's cloud. Files, chats and statements stay with you and aren't copied anywhere by default.
- Does the data go to ChatGPT or another cloud?
- A local model on your hardware handles 81% of requests, and that data never leaves the machine. Only a trimmed slice goes to the cloud for hard tasks, through a single gateway and with every call logged. That's why the whole cloud bill is about half a dollar a week.
- What happens to the data if I stop using the system?
- The data and the logic stay with you. The modules are signed packages on your machine and keep running without me. There's no lock-in and no 'our cloud with your data' in this architecture.