Kimari

Local AI Runtime

Server running
v0.1.82-alpha — gate BLOCKED

Run useful local AI on older NVIDIA GPUs

Kimari is the framework. Kimari-4B is not released. No public weights, adapters, or GGUF files are available.

Why Kimari?

Built for the GPUs you already own.

Older GPU support

Designed for GTX 1060 and GTX 1080 — not just the latest cards.

Zero cloud dependency

Everything runs locally. No subscriptions, API keys, or telemetry.

OpenAI-compatible

Drop-in endpoint for existing tools, agents, and integrations.

CLI-first

One command to install, start, and diagnose. No npm needed.

Honest status

No inflated benchmarks, no "coming soon" claims. Alpha means alpha.

Future Kimari models

Private training/eval infrastructure for future Kimari models.

Integration Code

Connect your apps in seconds with the OpenAI-compatible API.
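For example, here is a minimal Python sketch using the official openai package (v1+). It assumes the server is running at the default endpoint and that the TinyLlama test model from the API examples below is loaded; the api_key value is a placeholder, since the local server does not check it.

# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at the local Kimari endpoint.
client = OpenAI(base_url="http://127.0.0.1:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="tinyllama-1.1b-chat-v1.0.Q4_K_M",  # model id from the API examples below
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)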


4 GPU Profiles · 228 tok/s Peak Prompt Speed · 100% Local-first · 3 Integration Guides

Current Status

Transparent project health at a glance.

Project Status
  • Framework / CLI: Working
  • Local GGUF runtime: Working
  • OpenAI-compatible endpoint: Working
  • GTX 1060 validation: Working
  • Gateway Dashboard: Preview
  • Open WebUI / Continue configs: Working
  • Kimari SFT/private adapter: Private
  • Public Kimari-4B weights: Blocked
  • Public GGUF Kimari model: Blocked
  • Release gate: Blocked

Model Status
  • TinyLlama test profile: Working
  • Kimari Runtime 1.5B: Private
  • Kimari Core 3B: Private
  • Kimari-4B: Blocked
  • Official Kimari GGUF: Blocked

Safety Defaults

  • Host: 127.0.0.1
  • Public bind: Disabled
  • Gate: BLOCKED
  • Tokens in UI: No
  • Public upload: No
  • Public GGUF: No
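One quick way to confirm the localhost-only default from outside the CLI is a minimal Python socket probe. This is a sketch; the LAN address below is a hypothetical placeholder for your machine's own network IP.

import socket

def port_open(host: str, port: int = 11435) -> bool:
    # connect_ex returns 0 when a TCP connection succeeds
    with socket.socket() as s:
        s.settimeout(2)
        return s.connect_ex((host, port)) == 0

print("loopback reachable:", port_open("127.0.0.1"))  # True while the server runs
print("LAN reachable:", port_open("192.168.1.50"))    # hypothetical LAN IP; expect False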

Quick Start

From zero to running in under 5 minutes.

1

Install

$ curl -fsSL https://raw.githubusercontent.com/smouj/kimari-local-ai/main/install.sh | bash

One-command install on Linux/WSL2

2

Open Console

$ kimari console

Guided setup experience

3

Diagnose

$ kimari doctor --deep

14 diagnostic checks

4

Download Model

$ kimari pull test

TinyLlama 1.1B test model

5

Start API

$ kimari start

OpenAI-compatible endpoint at 127.0.0.1:11435/v1

6

Open Dashboard

$ kimari gateway start --open

Local Gateway Dashboard


CLI Command Explorer

Every command at your fingertips.

$ kimari doctor --deep

14 deep diagnostic checks

$ kimari info

Installation info & version

$ kimari status

Server status

View full CLI reference

Command Builder

Visually construct Kimari CLI commands with the right flags.


System Requirements

Check if your system is ready to run Kimari.

System Requirements Check
Simulated compatibility check
  • Python 3.10+: PASS
  • NVIDIA GPU (6GB+ VRAM): PASS
  • CUDA 12.1+: PASS
  • 4GB free disk space: PASS
  • WSL2 / Linux: RECOMMENDED
  • llama.cpp binary: PASS
Compatibility Score: 85%

This is a simulated check. Run kimari doctor --deep for real diagnostics.

Detailed Requirements

Minimum and recommended specs for running Kimari.

Component | Minimum | Recommended
GPU | NVIDIA GTX 1060 6GB | NVIDIA RTX 3060 12GB+
VRAM | 6 GB | 12 GB+
CPU | 4 cores, 2.5 GHz | 8+ cores, 3.5 GHz+
RAM | 8 GB | 16 GB+
Disk | 4 GB free | 20 GB+ SSD
OS | Ubuntu 20.04+ / WSL2 | Ubuntu 22.04+ native
CUDA | 12.1 | 12.4+

Find Your Profile

Answer 3 questions and get a personalized GPU recommendation.

GPU Compatibility Quiz


Local Runtime Validation

Tested on a real NVIDIA GTX 1060 6GB under WSL2. Measured in tok/s with CUDA acceleration.

CUDA vs CPU
NVIDIA GTX 1060 6GB — TinyLlama 1.1B Q4_K_M
Prompt processing: 228 tok/s (CUDA) vs 77 tok/s (CPU)
Token generation: 73 tok/s (CUDA) vs 33 tok/s (CPU)
TinyLlama test model — NOT Kimari-4B. Local validation only.
GPU Profiles
Pre-tuned for specific hardware — VRAM and Quantization optimized
test (Default): GPU Any 6 GB+ · VRAM 6 GB · Quant Q4_K_M
gtx1060 (Requires Kimari-4B): GPU GTX 1060 · VRAM 6 GB · Quant Q4_K_M
gtx1080 (Requires Kimari-4B): GPU GTX 1080 · VRAM 8 GB · Quant Q5_K_M
turbo (Requires Kimari-4B): GPU 6 GB+ · VRAM 6 GB · Quant IQ4_XS

Validated test environment: GPU GTX 1060 6GB · OS WSL2 Ubuntu 24.04 · Backend llama-server CUDA · Test model TinyLlama 1.1B Q4_K_M

GPU Performance Comparison

Estimated benchmark performance across different NVIDIA GPUs.

Prompt Processing (tok/s)

GTX 1060 6GB: 228 tok/s
GTX 1080 8GB: 310 tok/s
RTX 3060 12GB: 520 tok/s
RTX 4070 12GB: 890 tok/s

Token Generation (tok/s)

GTX 1060 6GB: 73 tok/s
GTX 1080 8GB: 98 tok/s
RTX 3060 12GB: 145 tok/s
RTX 4070 12GB: 230 tok/s
GTX 1060 results are validated; others are projected estimates.
Test models: TinyLlama 1.1B (GTX 1060, GTX 1080) · Qwen3-4B (RTX 3060, RTX 4070)

Benchmark History

Performance progression across Kimari versions on GTX 1060.

All measurements on GTX 1060 6GB with TinyLlama 1.1B Q4_K_M. Prompt and generation speeds measured separately.

GPU VRAM Calculator

Check if your GPU can run a specific model — see VRAM usage at a glance.

VRAM Usage: 1.2 GB / 6 GB

Model fits!

Runs comfortably on GTX 1060 6GB

Recommended Quantization

IQ4_XS

Max Context Length

4,096 tokens

Optimization Tips

Reduce context length to 2048

Saves ~0.3 GB VRAM for faster inference on 6 GB cards

Use IQ4_XS quantization

Saves 15% VRAM vs Q4_K_M with minimal quality loss

Close other GPU applications

Free VRAM for larger batch sizes and smoother generation

Estimated VRAM values. Actual usage depends on context length and batch size.
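The calculator's arithmetic can be approximated with a simple heuristic. This is a sketch, not Kimari's actual formula; the bits-per-weight figures and the KV-cache and overhead constants are rough assumptions.

BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "IQ4_XS": 4.3}  # approximate

def estimate_vram_gb(params_b: float, quant: str, ctx: int = 4096) -> float:
    weights = params_b * BITS_PER_WEIGHT[quant] / 8  # quantized weight size in GB
    kv_cache = 0.3 * ctx / 4096                      # assumed ~0.3 GB per 4k context
    overhead = 0.3                                   # assumed runtime overhead
    return weights + kv_cache + overhead

# TinyLlama 1.1B at Q4_K_M with 4096 context -> ~1.3 GB,
# in line with the ~1.2 GB shown above.
print(f"{estimate_vram_gb(1.1, 'Q4_K_M'):.1f} GB")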

Model Performance Estimator

Estimate how a model will perform on your GPU for different tasks.

Latency: 180 ms
Tokens/sec: 73 tok/s
Time to First Token: 120 ms
Max Context: 4,096 tokens

Estimated Performance

These are projected estimates based on GPU specs and model size, not actual benchmarks. Real performance depends on quantization, context length, system config, and batch size. Only GTX 1060 results have been validated on real hardware.
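A common back-of-envelope check for the generation number: token generation is usually memory-bandwidth-bound, so tok/s is roughly memory bandwidth divided by the bytes read per token. This is a sketch only; the efficiency factor is an assumption tuned to match the validated GTX 1060 result.

def estimate_gen_tok_s(model_gb: float, bandwidth_gb_s: float,
                       efficiency: float = 0.3) -> float:
    # Each generated token reads (roughly) the whole model from VRAM.
    return bandwidth_gb_s * efficiency / model_gb

# GTX 1060 6GB has ~192 GB/s of memory bandwidth; TinyLlama Q4_K_M
# weights are ~0.7 GB -> ~82 tok/s, the same ballpark as the
# measured 73 tok/s.
print(f"{estimate_gen_tok_s(0.7, 192):.0f} tok/s")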

Model Comparison

Compare models side-by-side across speed, quality, VRAM, and context metrics.

Model Comparison Tool

Metric | TinyLlama 1.1B | Qwen3-4B | Kimari-4B preview
Parameters | 1.1B | 4B | 4B
Gen Speed | 73 tok/s | 42 tok/s | 38 tok/s
VRAM Required | 1.2 GB | 3.8 GB | 3.2 GB
Context Length | 4,096 | 8,192 | 8,192
Quantization | Q4_K_M | Q5_K_M | Q4_K_M
Quality Score | 35/100 | 65/100 | 70/100

Quality scores are estimated. Only GTX 1060 speed results are validated.

Local vs Cloud

How running Kimari locally compares to cloud API services.

Local vs Cloud Performance

Simulated comparison. Kimari values based on GTX 1060 6GB benchmarks. Cloud values are typical for API services.

Resource Monitor

Simulated real-time GPU, memory, and CPU usage gauges.

Live (simulated): GPU 45% · VRAM 20% · CPU 12% · RAM 38% · Temp 62°C · Power 85W

GPU: GTX 1060 6GB · Driver: CUDA 12.1 · Model: TinyLlama Q4_K_M · Uptime: 4h 23m

Simulated metrics for demonstration. Real values available via kimari status.

GPU Temperature Simulator

See how GPU temperature changes under different workloads.

Idle Mode

System idle, no model loaded

Temperature: 38°C · Fan Speed: 30% · Power Draw: 15W
Simulated temperatures. Actual values depend on ambient temp, airflow, and thermal paste.

Model Download Simulator

Simulate downloading different model quantizations with speed and progress.

Simulated download. Actual speed depends on network and HuggingFace CDN.

Deployment Checklist

Track your progress from install to production.

Progress: 0% · Prerequisites 0/3 · Install 0/2 · Configure 0/2 · Test 0/3 · Deploy 0/2

Progress saved to your browser localStorage.

Error Troubleshooter

Search for common errors and find solutions quickly.

CUDA out of memory (Runtime)

Reduce context length with --ctx-size flag, use a smaller quantization (IQ4_XS or Q4_K_M), or close other GPU applications. Run kimari optimize for recommended settings.

Model not found (Models)

Run kimari pull test or kimari pull recommended to download the model. Check that the model file exists in ~/.config/kimari/models/.

Port already in use (Server)

Another process is using port 11435. Run kimari stop first, or use --port flag to specify a different port.

llama-server binary not found (Install)

Re-run the install script to download llama-server. Ensure /usr/local/bin/llama-server exists and is executable. Run kimari doctor to check.

Slow token generation (Performance)

Ensure CUDA is properly installed and the GPU is being used (check for "CUDA" in kimari status output). CPU-only mode is significantly slower. Run kimari optimize for GPU-specific settings.

Connection refused (Server)

Make sure the server is running with kimari start. Check the endpoint URL matches the running server (default: http://127.0.0.1:11435/v1).

GGUF file corrupt (Models)

Re-download the model with kimari pull. Verify integrity with kimari models hash <path>. The SHA256 should match the published hash.
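To compute the hash without the CLI, the check is plain SHA256 over the file. A minimal Python sketch; the model filename is hypothetical.

import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    # Hash the file in 1 MiB chunks to keep memory use flat.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

model = Path("~/.config/kimari/models/model.gguf").expanduser()  # hypothetical name
print(sha256_of(model))  # compare against the published hash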

Gate BLOCKED (Models)

This is expected. The Kimari-4B release gate is BLOCKED — no public weights are available. Use test or recommended profiles with community models instead.

Gateway dashboard blank (Gateway)

Run kimari gateway setup to install dependencies, then kimari gateway start --open. Ensure Node.js 18+ is installed.

Context length overflow (Runtime)

The requested context length exceeds available VRAM. Reduce --ctx-size or use a smaller model. GTX 1060 6GB supports up to 8192 context with Q4_K_M on TinyLlama.

Common errors and solutions. For detailed diagnostics, run kimari doctor --deep.

Prompt Template Library

Ready-to-use system prompts for local LLMs — copy and customize.

Code Assistant

Development

Full-stack coding helper with best practices

You are an expert software engineer. Help me write clean, efficient, and well-documented code. When I describe a problem...

Creative Writing

Creative

Story and narrative generation assistant

You are a creative writing assistant. Help me craft engaging stories, poems, and narratives. Focus on vivid descriptions...

Code Reviewer

Development

Thorough code review with security focus

You are a senior code reviewer. Analyze my code for bugs, security vulnerabilities, performance issues, and style proble...

Data Analyst

Analysis

Data interpretation and insight generation

You are a data analysis expert. Help me interpret data, identify trends, and create insights. When I provide data, give ...

Technical Writer

Writing

Documentation and technical content creation

You are a technical writer. Help me create clear, accurate documentation for software projects. Write API docs, README f...

Debug Assistant

Development

Step-by-step debugging and error resolution

You are a debugging expert. When I share an error message or describe unexpected behavior, help me identify the root cau...

Smart Summarizer

Analysis

Multi-length text summarization

You are an expert summarizer. When I provide text, create a concise summary that captures the key points, main arguments...

Translation Expert

Language

Context-aware translation with cultural notes

You are a professional translator. Translate text between languages while preserving tone, context, and cultural nuances...

Configuration Wizard

Get a personalized configuration in 3 steps.


How Kimari Compares

See how Kimari stacks up against other local LLM tools.

Tools compared: Kimari · Ollama · LM Studio · text-gen-webui
Features compared:
  • Old GPU focus
  • CLI-first
  • OpenAI-compatible
  • Gateway Dashboard
  • GPU Profiles
  • Model Hashing
  • Local-only default
  • Open Source

How Kimari Works

A look inside the local AI runtime architecture.

  • Gateway Dashboard: http://127.0.0.1:3105
  • Your Applications: Open WebUI · Continue.dev
  • OpenAI-Compatible API: http://127.0.0.1:11435/v1
  • Kimari Runtime: llama-server · CUDA · GGUF
  • NVIDIA Hardware: GPU · VRAM · CUDA Cores
  • CLI Commands: kimari doctor · start · pull

Try the Terminal

Interactive simulator — type a Kimari command and see the output. No real server needed.

kimari-console
Kimari Terminal Simulator v0.1.82-alpha
Type "help" for available commands.
$

Ask Kimari AI

Chat with an AI assistant that knows everything about Kimari.

Powered by local AI knowledge

API Health Check

Simulate a health check on the local API endpoint.


Gateway Dashboard

Local monitoring and management from your browser.

http://127.0.0.1:3105

Overview

Runtime Active

GPU: GTX 1060 · VRAM: 1.2 / 6 GB · Model: TinyLlama · Gate: BLOCKED

VRAM Usage: 1221 MiB / 6144 MiB (20%)

$ kimari gateway start --open

Frequently Asked Questions

Common questions about Kimari, answered honestly.

API Endpoint Explorer

OpenAI-compatible endpoints ready for your integrations.

POST /v1/chat/completions

Send a chat message and get a completion response

Request Body
{
  "model": "tinyllama-1.1b-chat-v1.0.Q4_K_M",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "tinyllama-1.1b-chat-v1.0.Q4_K_M",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}
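The same request can also be sent with plain HTTP and no SDK. A minimal sketch mirroring the request body above:

import requests

payload = {
    "model": "tinyllama-1.1b-chat-v1.0.Q4_K_M",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "max_tokens": 256,
}
r = requests.post("http://127.0.0.1:11435/v1/chat/completions",
                  json=payload, timeout=60)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])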

Compatible Models

Detailed specifications for each compatible model.

🦙 TinyLlama 1.1B (1.1B params)
Quantization: Q4_K_M · VRAM: 1.2 GB · Speed: 73 tok/s · Context: 4096
Fast, compact model for basic chat and testing. Great starting point.

🧠 Qwen3-4B (4B params)
Quantization: Q5_K_M · VRAM: 3.8 GB · Speed: 42 tok/s · Context: 8192
Balanced model with good quality. Recommended for most use cases.

🔮 Kimari-4B (4B params)
Quantization: Q4_K_M · VRAM: 3.2 GB · Speed: 38 tok/s · Context: 8192
Future Kimari model. Not released — gate BLOCKED.

🦙 Llama 2 7B (7B params)
Quantization: Q4_K_M · VRAM: 5.5 GB · Speed: 18 tok/s · Context: 4096
Larger model requiring more VRAM. Best on 8GB+ cards.

Phi-3 Mini (3.8B params)
Quantization: Q4_K_M · VRAM: 2.8 GB · Speed: 55 tok/s · Context: 4096
Microsoft's compact model. Strong reasoning for its size.

💎 Gemma 2 2B (2B params)
Quantization: Q5_K_M · VRAM: 1.8 GB · Speed: 62 tok/s · Context: 8192
Google's efficient model. Good quality-to-size ratio.

What's New

Latest changes and improvements.

Project Timeline

Major milestones on the path to local AI.

Release · Jan 2026

v0.1.0: Initial CLI framework

First working CLI with doctor, start, pull commands

Feature · Feb 2026

v0.1.4: Gateway Dashboard preview

Local web dashboard for monitoring runtime

Milestone · Mar 2026

v0.1.6: GTX 1060 validated

228 tok/s prompt processing confirmed on real hardware

Feature · Mar 2026

v0.1.7: Integration guides

Open WebUI, Continue.dev, OpenClaw configurations

Feature · Apr 2026

v0.1.8: GPU profiles & hashing

Pre-tuned profiles and SHA256 model verification

Current · May 2026

v0.1.82: Current alpha

Gate BLOCKED — quality and safety review in progress

Safety & Privacy

Local-first and conservative by default.

  • Localhost-only defaults
  • No public bind unless explicitly requested
  • No token storage in dashboard
  • No public model upload
  • No public GGUF
  • No benchmark claims without reproducible validation
  • No automatic gate transitions

Roadmap

Honest, incremental progress.

Current

Local runtime + Gateway Dashboard polish

Next

Private adapter runtime preview

Next

Agent Gateway tools & web-search dry-run

Next

Manual review of private outputs

Later

Private GGUF export

Later

GTX 1060 / GTX 1080 validation

Later

Public preview decision

Hugging Face

The Space is a compatibility/demo tool. It does not run Kimari-4B.