Empowering Innovation: Deploying an Internal AI Node with Gemma 4 and NVIDIA Blackwell

Running our own AI node wasn't about chasing a trend. It was a deliberate choice about where our data lives, what our developers can rely on day to day, and how predictable our costs are. Here's what we built and why.

Why In-House: Data Residency and Privacy

The main driver was control over data. As a Belgian MSP handling client information, we operate under GDPR, and "where is this processed, and by whom?" is a question we have to answer concretely — not wave away.

Running a model on our own hardware means sensitive inputs — client code, internal documents, support tickets — are processed within our own infrastructure rather than sent to a third-party API. The node doesn't sit on some open internal network either: it runs inside our zero-trust architecture, where nothing is implicitly trusted just for being "inside." Every request to it is authenticated and authorized, the same way we'd treat traffic from outside. That doesn't make security someone else's problem; it makes it ours, which is exactly the point. We control the perimeter, the patching, and the access policy, and we can say precisely where a given piece of data went.
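
To make "authenticated and authorized" concrete, here is a minimal sketch of a per-request check. It is an illustration under assumptions, not our production setup: the caller names, secrets, scopes, and the HMAC signing scheme are all hypothetical, and a real deployment would normally lean on an identity provider and short-lived credentials rather than a dictionary in code.

```python
import hashlib
import hmac

# Hypothetical callers, secrets, and scopes, for illustration only.
CALLER_KEYS = {
    "ide-plugin": b"demo-secret-1",
    "blog-workflow": b"demo-secret-2",
}
CALLER_SCOPES = {
    "ide-plugin": {"completion"},
    "blog-workflow": {"completion", "rewrite"},
}


def sign(caller: str, body: bytes) -> str:
    """Compute the signature a known caller attaches to its request body."""
    return hmac.new(CALLER_KEYS[caller], body, hashlib.sha256).hexdigest()


def check_request(caller: str, action: str, body: bytes, signature: str) -> bool:
    """Authenticate the caller, then authorize the action. Deny by default."""
    key = CALLER_KEYS.get(caller)
    if key is None:
        return False  # unknown caller: being "inside" the network grants nothing
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False  # authentication failed (wrong key or tampered body)
    return action in CALLER_SCOPES.get(caller, set())  # authorization


if __name__ == "__main__":
    body = b'{"prompt": "explain this function"}'
    sig = sign("ide-plugin", body)
    print(check_request("ide-plugin", "completion", body, sig))
    print(check_request("ide-plugin", "rewrite", body, sig))
```

The point of the sketch is the order of checks: identity first, then permission for the specific action, with refusal as the default whenever either step fails.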

For some workloads we'll still use external providers where they're clearly the better tool. But for anything touching sensitive material, keeping it local removes an entire category of "what happens to this once it leaves us?" questions.

Guardrails for Autonomous Agents

Running a model locally is only half the work; the harder part is making an autonomous agent behave predictably. An agent that can read files, run commands, or touch internal systems needs boundaries that don't depend on it choosing to respect them.

We're putting a policy layer in front of our agents so that what they can access and what they can do is enforced by configuration, not left to the model's judgment in the moment. The goal is simple: an agent should be able to do its job and nothing beyond it, and we should be able to audit exactly what it did. This part is still being hardened — it's the piece we're most deliberate about before letting any agent near production data.
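
As a rough sketch of what "enforced by configuration" can look like, the example below checks every tool call against a static policy and records the decision before anything runs. The agent name, tool names, path patterns, and the in-memory audit list are hypothetical placeholders, and the sketch deliberately ignores details such as symlinks that a hardened version would have to handle.

```python
import fnmatch
import json
import posixpath
import time

# Hypothetical policy: which tools an agent may call and which paths it may touch.
POLICY = {
    "blog-editor": {
        "tools": {"read_file", "write_file"},
        "paths": ["drafts/*"],
    },
}

AUDIT_LOG = []  # stand-in for an append-only audit store


def is_allowed(agent: str, tool: str, path: str) -> bool:
    """Return True only if the policy explicitly permits this tool on this path."""
    rules = POLICY.get(agent)
    if rules is None or tool not in rules["tools"]:
        return False
    # Normalize first so "drafts/../secrets" cannot slip past the pattern.
    normalized = posixpath.normpath(path)
    return any(fnmatch.fnmatchcase(normalized, p) for p in rules["paths"])


def guarded_call(agent: str, tool: str, path: str, action):
    """Record the decision, then run the action only if the policy allows it."""
    allowed = is_allowed(agent, tool, path)
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "path": path,
        "allowed": allowed,
    }))
    if not allowed:
        raise PermissionError(f"{agent} may not use {tool} on {path}")
    return action(posixpath.normpath(path))
```

The model never sees the policy and cannot argue with it: a denied call raises before the action executes, and both allowed and denied attempts end up in the audit trail.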

A Coding Agent That Runs Offline

The most immediate payoff has been a coding assistant wired directly into our developers' IDEs. Because the model runs on our own node, it works offline and carries no per-seat licensing cost — two things that matter when you want a tool used freely rather than rationed.

We're realistic about the trade-off: an open-weight model running on a single workstation GPU won't match the absolute frontier on every task. But for the bulk of day-to-day work — boilerplate, refactors, explaining unfamiliar code, first-draft tests — it's fast, private, and always available. We reserve heavier external models for the cases that genuinely need them.
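
The split between local and external models can be pictured as a small routing rule. This is a simplified sketch under assumptions: the task categories, the `sensitive` flag, and the backend labels are hypothetical names chosen for illustration, and real routing would likely weigh more signals than these two.

```python
from dataclasses import dataclass

# Hypothetical task kinds the local model handles well day to day.
LOCAL_KINDS = {"boilerplate", "refactor", "explain", "draft_tests"}


@dataclass
class Task:
    kind: str        # what the developer is asking for
    sensitive: bool  # does the input touch client or internal material?


def choose_backend(task: Task) -> str:
    """Pick where a task runs: sensitive work never leaves the local node."""
    if task.sensitive:
        return "local"
    if task.kind in LOCAL_KINDS:
        return "local"
    return "external"


if __name__ == "__main__":
    print(choose_backend(Task(kind="refactor", sensitive=False)))
    print(choose_backend(Task(kind="architecture_review", sensitive=False)))
    print(choose_backend(Task(kind="architecture_review", sensitive=True)))
```

Checking sensitivity before task type is the design choice that matters here: a hard task on sensitive input still stays local, even if an external model would do it better.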

First Pilot: An AI-Assisted Writing Workflow

To put the node through its paces, our first internal project is an AI-assisted writing workflow for this very blog — yes, the kind of post you're reading. The honest goal isn't to replace writing; it's to cut the editing time between "we shipped something technical" and "it's published and readable," which is usually where these posts die.

It's an early pilot, and a human still reviews everything before it goes out. But it's a good test case: low risk, clear before/after, and a fast feedback loop on how well the local setup handles real work.

What's Next

With the node in place, the plan is to add more narrow, task-specific agents where the privacy and cost arguments are strongest — and to keep the guardrail layer ahead of whatever we let them touch. We'll share what works and, just as usefully, what doesn't.