Announcing Kotlin bindings for NobodyWho

You can now add NobodyWho as a Gradle dependency and ship an LLM that runs entirely on your users' devices. No API keys, no servers to babysit, no per-token bill at the end of the month — just a .gguf file on the device and a chat loop in your app.
Why on-device?
Most AI features in mobile apps today route every request through a hosted API. Running the model directly on the user's device is a different shape of product, and it brings real benefits:
- Privacy by design — user data never leaves the device
- Works offline — no internet connection required
- Low latency — no network round trip on every interaction
- No cloud costs — inference is free, no per-token billing
The tradeoff is raw capability — local models are smaller than frontier cloud models — but for chat, summarization, classification, and many agentic workflows they're more than enough.
What you get
The Kotlin bindings expose the same core API our Godot, Rust, Python, Flutter, Swift, and React Native users already know, including:
- Streaming chat with full token-by-token output via Kotlin
Flow - Tool calling with automatic parameter extraction via Kotlin reflection
- Sampling controls (temperature, constrained/JSON output, ...)
- Embeddings and a cross-encoder for RAG
- Feed image and audio inputs directly to your LLM
- Any model in
.ggufformat, powered by llama.cpp under the hood
It works on Android and anywhere else the JVM runs.
Getting started
A minimal chat looks like this:
import ai.nobodywho.Chat
import kotlinx.coroutines.runBlocking
fun main() = runBlocking {
val chat = Chat.fromPath(modelPath = "./model.gguf")
val response = chat.ask("Is water wet?").completed()
println(response)
}
For streaming, use asFlow():
chat.ask("Is water wet?").asFlow().collect { token ->
print(token)
}
For the full setup — picking a model, getting the .gguf onto the device, wiring up a streaming chat UI — see the Kotlin documentation.
Tool calling with reflection
One of the nicest things about the Kotlin bindings is tool calling. Kotlin's reflection API lets NobodyWho automatically extract parameter names and types from your function, so you don't have to declare them twice.
fun getWeather(city: String, unit: String): String {
return """{"temp": 22, "unit": "$unit"}"""
}
val weatherTool = Tool(
name = "get_weather",
description = "Get the current weather for a city",
function = ::getWeather
)
This is similar to how our Flutter bindings work — both use runtime reflection to derive the tool schema directly from the function signature. No redundant schema definitions needed.
One core, many languages
Kotlin joins a growing list of first-class NobodyWho targets. The same Rust core — wrapping llama.cpp — now powers bindings across:
- Godot — drop-in nodes for game dialogue, NPCs, and tooling
- Rust — the native API the rest are built on
- Python — for scripting, prototyping, and ML workflows
- Flutter — cross-platform mobile and desktop apps
- React Native — the JavaScript/TypeScript mobile ecosystem
- Swift — native iOS, macOS, watchOS, and visionOS apps
- Kotlin — Android and JVM apps
That's the whole point of NobodyWho: one well-maintained inference core, with idiomatic bindings for whichever language or framework you actually want to ship in. Every binding gets the same feature set — streaming, tool calling, sampling, embeddings, RAG — so you don't have to give up capabilities to use the language you prefer.
Join the community
We'd love to hear what you build with the new Kotlin bindings — and meet the people building with NobodyWho across all the other languages too.
- Discord — the best place to ask questions, share what you're working on, and chat with the team and other NobodyWho users.
- GitHub — open an issue if you hit a bug, or a discussion if you have an idea. And if you like what we're building, give us a star!
Happy hacking!
Published Jun 3, 2026