Local LLM inference on desktop and mobile

Run real-time AI on users' devices

Engineered for real-time speed, data security, and cost reduction.

Fast, free, and open source

NobodyWho is an on-device inference engine that runs LLMs locally on the end user's device. Developers can embed fast language models into mobile and desktop apps with just a few lines of code. Because inference runs entirely on-device, no servers are required and applications scale automatically with user adoption, which significantly cuts both operating costs and infrastructure footprint. The result is offline, privacy-preserving, and sustainable AI. The library is open source under the EUPL 1.2 and is free for both individuals and companies.
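
NobodyWho's exact API depends on the host platform, so as a rough, hypothetical illustration of what a few-lines on-device integration looks like, the sketch below uses the independent llama-cpp-python package, which loads the same GGUF model files. The model path and parameters are placeholders, and this is not NobodyWho's own API.

    # Illustrative sketch only: llama-cpp-python consumes the same GGUF
    # format, but it is a separate project with its own API.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./model.gguf",  # placeholder: any GGUF model file
        n_gpu_layers=-1,            # offload all layers to the GPU if present
        n_ctx=4096,                 # context window size
    )

    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Say hello in five words."}]
    )
    print(reply["choices"][0]["message"]["content"])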

  • Fully offline to protect your data
  • Choose from thousands of open-weight LLMs
  • Hardware acceleration on GPUs and CPUs alike
  • Don't pay for inference
  • Works on all five major operating systems: Windows, macOS, Linux, Android, and iOS
  • Dead-simple to use

Cut costs. Shrink your footprint. Keep your data private.

  • Local, offline

    Everything runs on the end-user device. Great for software resilience, data privacy, and eliminating running costs. No more stressing about server capacity.

  • Fast on any hardware

    Powered by Vulkan for cross-vendor GPU support, Metal for fast inference on Apple devices, and SIMD instructions to fully utilize the CPU.

  • Keep your data private

    User data never has to leave the user's device. Provide hard privacy guarantees to your end-users.

  • Free for commercial use

    Want to build a company using the NobodyWho inference library? Go ahead! The EUPL 1.2 license permits commercial use at no cost.

  • Thousands of LLMs

    Because NobodyWho uses the GGUF open standard, you can choose from thousands of pre-trained LLMs of various sizes and capabilities. Pick one that suits your specific task!

  • Boilerplate-free tool calling

    Just pass in a function. NobodyWho will figure out the rest, and even generate a formal grammar to guarantee that the types match! A sketch of the underlying technique follows this list.
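
To make the tool-calling claim concrete, here is a minimal, self-contained Python sketch of the general technique, not NobodyWho's internal code: reflect over a plain function's signature to build a machine-readable parameter schema, which an engine can compile into a formal grammar (such as llama.cpp's GBNF) that constrains token sampling to well-typed arguments. The example function and its types are hypothetical.

    # Self-contained sketch of schema extraction for tool calling.
    import inspect
    import json
    from typing import get_type_hints

    def schema_for(func):
        """Build a JSON-schema-style tool description from a plain function."""
        hints = get_type_hints(func)
        json_types = {int: "integer", float: "number", str: "string", bool: "boolean"}
        params = {
            name: {"type": json_types[hints[name]]}
            for name in inspect.signature(func).parameters
        }
        return {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        }

    def get_weather(city: str, fahrenheit: bool) -> str:
        """Look up the current weather for a city. (Hypothetical tool.)"""
        return f"Sunny in {city}"

    # An engine would compile this schema into a formal grammar that restricts
    # sampling, so generated tool calls always match the declared types.
    print(json.dumps(schema_for(get_weather), indent=2))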