Kimari is the framework. Kimari-4B is not released yet. No public weights, adapters, or GGUF.
Why Kimari?
Built for the GPUs you already own.
Designed for GTX 1060 and GTX 1080 — not just the latest cards.
Everything runs locally. No subscriptions, API keys, or telemetry.
Drop-in endpoint for existing tools, agents, and integrations.
One command to install, start, and diagnose. No npm needed.
No inflated benchmarks, no "coming soon" claims. Alpha means alpha.
Private training/eval infrastructure for future Kimari models.
Integration Code
Connect your apps in seconds with the OpenAI-compatible API.
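For example, the standard OpenAI Python client can point at the local endpoint unchanged. A minimal sketch, assuming the openai package is installed; the endpoint and model name are the ones shown elsewhere on this page:

```python
# Minimal sketch: a chat completion against the local Kimari endpoint,
# using the official `openai` Python package (pip install openai).
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11435/v1",  # Kimari's OpenAI-compatible endpoint
    api_key="sk-local",  # placeholder; a localhost-only server needs no real key
)

response = client.chat.completions.create(
    model="tinyllama-1.1b-chat-v1.0.Q4_K_M",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```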
Current Status
Transparent project health at a glance.
Safety Defaults
Quick Start
From zero to running in under 5 minutes.
1. Install: one-command install on Linux/WSL2
2. Open Console: guided setup experience
3. Diagnose: 14 diagnostic checks
4. Download Model: TinyLlama 1.1B test model
5. Start API: OpenAI-compatible endpoint at 127.0.0.1:11435/v1
6. Open Dashboard: Local Gateway Dashboard
Custom Install Command
CLI Command Explorer
Every command at your fingertips.
- 14 deep diagnostic checks
- Installation info & version
- Server status
Command Builder
Visually construct Kimari CLI commands with the right flags.
System Requirements
Check if your system is ready to run Kimari.
This is a simulated check. Run kimari doctor --deep for real diagnostics.
Detailed Requirements
Minimum and recommended specs for running Kimari.
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA GTX 1060 6GB | NVIDIA RTX 3060 12GB+ |
| VRAM | 6 GB | 12 GB+ |
| CPU | 4 cores, 2.5 GHz | 8+ cores, 3.5 GHz+ |
| RAM | 8 GB | 16 GB+ |
| Disk | 4 GB free | 20 GB+ SSD |
| OS | Ubuntu 20.04+ / WSL2 | Ubuntu 22.04+ native |
| CUDA | 12.1 | 12.4+ |
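Outside the simulated check, a rough self-check against the 6 GB minimum might look like the sketch below. It shells out to nvidia-smi, which is an NVIDIA tool, not part of the Kimari CLI:

```python
# Rough readiness sketch: read GPU name and total VRAM via nvidia-smi
# and compare against the 6 GB minimum from the table above.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.strip()

for line in out.splitlines():
    name, mib = (field.strip() for field in line.split(","))
    gb = int(mib) / 1024
    verdict = "meets" if gb >= 6 else "is below"
    print(f"{name}: {gb:.1f} GB VRAM ({verdict} the 6 GB minimum)")
```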
Find Your Profile
Answer 3 questions and get a personalized GPU recommendation.
GPU Compatibility Quiz
Which GPU do you have?
Local Runtime Validation
Tested on a real NVIDIA GTX 1060 6GB under WSL2. Measured in tok/s with CUDA acceleration.
| Profile | Target GPU | VRAM | Quantization |
|---|---|---|---|
| test (default) | Any 6 GB+ | 6 GB | Q4_K_M |
| gtx1060 | GTX 1060 | 6 GB | Q4_K_M |
| gtx1080 | GTX 1080 | 8 GB | Q5_K_M |
| turbo | 6 GB+ | 6 GB | IQ4_XS |
- GPU: GTX 1060 6GB
- OS: WSL2 Ubuntu 24.04
- Backend: llama-server CUDA
- Test Model: TinyLlama 1.1B Q4_K_M
GPU Performance Comparison
Estimated benchmark performance across different NVIDIA GPUs, charted as prompt processing (tok/s) and token generation (tok/s).
Benchmark History
Performance progression across Kimari versions on GTX 1060.
GPU VRAM Calculator
Check if your GPU can run a specific model — see VRAM usage at a glance.
Model fits!
Runs comfortably on GTX 1060 6GB
Recommended Quantization
IQ4_XS
Max Context Length
4,096 tokens
Optimization Tips
Reduce context length to 2048
Saves ~0.3 GB VRAM for faster inference on 6 GB cards
Use IQ4_XS quantization
Saves 15% VRAM vs Q4_K_M with minimal quality loss
Close other GPU applications
Free VRAM for larger batch sizes and smoother generation
Estimated VRAM values. Actual usage depends on context length and batch size.
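For intuition, a back-of-the-envelope version of the calculation might look like this sketch. The bits-per-weight and KV-cache constants are rough assumptions for illustration, not Kimari's actual formula:

```python
# Back-of-the-envelope VRAM estimate: weights ≈ params × bits/8,
# plus a per-token KV-cache cost and a fixed runtime overhead.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "IQ4_XS": 4.3}  # approximate GGUF averages

def estimate_vram_gb(params_b: float, quant: str, ctx_len: int,
                     kv_mb_per_1k_tokens: float = 150.0) -> float:
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8   # billions of params × bytes/param
    kv_gb = ctx_len / 1000 * kv_mb_per_1k_tokens / 1024  # KV cache grows with context
    return weights_gb + kv_gb + 0.5                      # +0.5 GB CUDA/runtime overhead (rough)

# e.g. a 4B model at Q4_K_M with 4,096 context on a 6 GB card:
print(f"{estimate_vram_gb(4.0, 'Q4_K_M', 4096):.1f} GB")  # ≈ 3.5 GB, fits in 6 GB
```

With these constants, halving the context from 4,096 to 2,048 saves roughly 0.3 GB, matching the optimization tip above.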
Model Performance Estimator
Estimate how a model will perform on your GPU for different tasks.
Estimated Performance
These are projected estimates based on GPU specs and model size, not actual benchmarks. Real performance depends on quantization, context length, system config, and batch size. Only GTX 1060 results have been validated on real hardware.
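One plausible way such projections work is the memory-bandwidth rule of thumb: generating a token streams the full weights once, so tok/s scales with bandwidth divided by model size. A sketch, where the efficiency factor is an assumption calibrated so the formula reproduces the validated GTX 1060 TinyLlama result:

```python
# Projected generation speed from the memory-bandwidth-bound heuristic:
# each generated token reads the full weights, so tok/s ≈ bandwidth / model size.
def estimate_gen_tok_s(bandwidth_gb_s: float, model_gb: float,
                       efficiency: float = 0.25) -> float:
    # efficiency folds in kernel overheads, KV-cache reads, etc. (assumed value,
    # chosen to match the validated ~73 tok/s TinyLlama number on a GTX 1060)
    return bandwidth_gb_s / model_gb * efficiency

# GTX 1060 (~192 GB/s) with TinyLlama 1.1B Q4_K_M (~0.67 GB of weights):
print(f"~{estimate_gen_tok_s(192, 0.67):.0f} tok/s")  # ≈ 72 tok/s
```

This is a single-point calibration; it will drift for other models and GPUs, which is exactly why only the GTX 1060 results are treated as validated.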
Model Comparison
Compare models side-by-side across speed, quality, VRAM, and context metrics.
Model Comparison Tool
| Metric | TinyLlama 1.1B | Qwen3-4B | Kimari-4B preview |
|---|---|---|---|
| Parameters | 1.1B | 4B | 4B |
| Gen Speed | 73 tok/s | 42 tok/s | 38 tok/s |
| VRAM Required | 1.2 GB | 3.8 GB | 3.2 GB |
| Context Length | 4,096 | 8,192 | 8,192 |
| Quantization | Q4_K_M | Q5_K_M | Q4_K_M |
| Quality Score | 35/100 | 65/100 | 70/100 |
Quality scores are estimated. Only GTX 1060 speed results are validated.
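As a small illustration, the table can be filtered by a VRAM budget. The numbers are copied from the table above; Kimari itself does not expose this helper:

```python
# Illustrative filter over the comparison table: which listed models
# fit a given VRAM budget?
MODELS = {
    "TinyLlama 1.1B": 1.2,       # GB VRAM required, from the table
    "Qwen3-4B": 3.8,
    "Kimari-4B preview": 3.2,    # not released; shown for comparison only
}

def models_that_fit(vram_budget_gb: float) -> list[str]:
    return [name for name, need in MODELS.items() if need <= vram_budget_gb]

print(models_that_fit(6.0))  # all three fit a GTX 1060's 6 GB
```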
Local vs Cloud
How running Kimari locally compares to cloud API services.
Local vs Cloud Performance
Resource Monitor
Simulated real-time GPU, memory, and CPU usage gauges.
Live (simulated)
- GPU: GTX 1060 6GB
- Driver: CUDA 12.1
- Model: TinyLlama Q4_K_M
- Uptime: 4h 23m
Simulated metrics for demonstration. Real values available via kimari status.
GPU Temperature Simulator
See how GPU temperature changes under different workloads.
System idle, no model loaded
Model Download Simulator
Simulate downloading different model quantizations with speed and progress.
Simulated download. Actual speed depends on network and HuggingFace CDN.
Deployment Checklist
Track your progress from install to production.
Progress is saved to your browser's localStorage.
Error Troubleshooter
Search for common errors and find solutions quickly.
- Reduce context length with the --ctx-size flag, use a smaller quantization (IQ4_XS or Q4_K_M), or close other GPU applications. Run kimari optimize for recommended settings.
- Run kimari pull test or kimari pull recommended to download the model. Check that the model file exists in ~/.config/kimari/models/.
- Another process is using port 11435. Run kimari stop first, or use the --port flag to specify a different port.
- Re-run the install script to download llama-server. Ensure /usr/local/bin/llama-server exists and is executable. Run kimari doctor to check.
- Ensure CUDA is properly installed and the GPU is being used (check for "CUDA" in kimari status output). CPU-only mode is significantly slower. Run kimari optimize for GPU-specific settings.
- Make sure the server is running with kimari start. Check that the endpoint URL matches the running server (default: http://127.0.0.1:11435/v1).
- Re-download the model with kimari pull. Verify integrity with kimari models hash <path>; the SHA256 should match the published hash (see the sketch below this list).
- This is expected. The Kimari-4B release gate is BLOCKED; no public weights are available. Use the test or recommended profiles with community models instead.
- Run kimari gateway setup to install dependencies, then kimari gateway start --open. Ensure Node.js 18+ is installed.
- The requested context length exceeds available VRAM. Reduce --ctx-size or use a smaller model. The GTX 1060 6GB supports up to 8192 context with Q4_K_M on TinyLlama.
Common errors and solutions. For detailed diagnostics, run kimari doctor --deep.
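The SHA256 verification in the hash-mismatch fix above can also be reproduced by hand. A sketch only; kimari models hash already does this for you:

```python
# Manual equivalent of `kimari models hash <path>`: compute a file's
# SHA256 to compare against the published hash.
import hashlib
import sys

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large GGUF files don't load into RAM at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    print(sha256_of(sys.argv[1]))  # e.g. a file under ~/.config/kimari/models/
```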
Prompt Template Library
Ready-to-use system prompts for local LLMs — copy and customize.
- Code Assistant (Development): full-stack coding helper with best practices
- Creative Writing (Creative): story and narrative generation assistant
- Code Reviewer (Development): thorough code review with a security focus
- Data Analyst (Analysis): data interpretation and insight generation
- Technical Writer (Writing): documentation and technical content creation
- Debug Assistant (Development): step-by-step debugging and error resolution
- Smart Summarizer (Analysis): multi-length text summarization
- Translation Expert (Language): context-aware translation with cultural notes
Configuration Wizard
Get a personalized configuration in 3 steps.
Which GPU do you have?
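Conceptually, the wizard's first step reduces to a lookup against the GPU profile table shown earlier. An illustrative sketch; the wizard's actual logic is not published:

```python
# Sketch of the wizard's first question: map a GPU answer to the
# pre-tuned profile data from the profile table above.
PROFILES = {
    "gtx1060": {"vram_gb": 6, "quant": "Q4_K_M"},
    "gtx1080": {"vram_gb": 8, "quant": "Q5_K_M"},
    "turbo":   {"vram_gb": 6, "quant": "IQ4_XS"},
    "test":    {"vram_gb": 6, "quant": "Q4_K_M"},  # default: any 6 GB+ card
}

def recommend(gpu_answer: str) -> str:
    profile = PROFILES.get(gpu_answer, PROFILES["test"])  # fall back to default
    return f"{gpu_answer}: {profile['quant']} on {profile['vram_gb']} GB VRAM"

print(recommend("gtx1060"))  # -> gtx1060: Q4_K_M on 6 GB VRAM
```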
How Kimari Compares
See how Kimari stacks up against other local LLM tools.
| Feature | Kimari | Ollama | LM Studio | text-gen-webui |
|---|---|---|---|---|
| Old GPU focus | | | | |
| CLI-first | | | | |
| OpenAI-compatible | | | | |
| Gateway Dashboard | | | | |
| GPU Profiles | | | | |
| Model Hashing | | | | |
| Local-only default | | | | |
| Open Source | | | | |
How Kimari Works
A look inside the local AI runtime architecture.
Request flow:
- Your Applications: Open WebUI · Continue.dev
- OpenAI-Compatible API: http://127.0.0.1:11435/v1
- Kimari Runtime: llama-server · CUDA · GGUF
- NVIDIA Hardware: GPU · VRAM · CUDA Cores

Alongside the flow, for monitoring and control:
- Gateway Dashboard: http://127.0.0.1:3105
- CLI Commands: kimari doctor · start · pull
Try the Terminal
Interactive simulator — type a Kimari command and see the output. No real server needed.
Ask Kimari AI
Chat with an AI assistant that knows everything about Kimari.
Powered by local AI knowledge
API Health Check
Simulate a health check on the local API endpoint.
Click "Run Check" to simulate an API health check
Frequently Asked Questions
Common questions about Kimari, answered honestly.
API Endpoint Explorer
OpenAI-compatible endpoints ready for your integrations.
POST /v1/chat/completions: send a chat message and get a completion response.

Request:
{
"model": "tinyllama-1.1b-chat-v1.0.Q4_K_M",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"temperature": 0.7,
"max_tokens": 256
}

Response:

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "tinyllama-1.1b-chat-v1.0.Q4_K_M",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I am doing well, thank you for asking. How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 18,
"total_tokens": 30
}
}

Compatible Models
Specifications for each compatible model.
- TinyLlama 1.1B (1.1B params): fast, compact model for basic chat and testing. Great starting point.
- Qwen3-4B (4B params): balanced model with good quality. Recommended for most use cases.
- Kimari-4B (4B params): future Kimari model. Not released; gate BLOCKED.
- Llama 2 7B (7B params): larger model requiring more VRAM. Best on 8GB+ cards.
- Phi-3 Mini (3.8B params): Microsoft's compact model. Strong reasoning for its size.
- Gemma 2 2B (2B params): Google's efficient model. Good quality-to-size ratio.
Project Timeline
Major milestones on the path to local AI.
- v0.1.0 — Initial CLI framework: first working CLI with doctor, start, pull commands
- v0.1.4 — Gateway Dashboard preview: local web dashboard for monitoring the runtime
- v0.1.6 — GTX 1060 validated: 228 tok/s prompt processing confirmed on real hardware
- v0.1.7 — Integration guides: Open WebUI, Continue.dev, OpenClaw configurations
- v0.1.8 — GPU profiles & hashing: pre-tuned profiles and SHA256 model verification
- v0.1.82 — Current alpha: gate BLOCKED, quality and safety review in progress
Safety & Privacy
Local-first and conservative by default.
- Localhost-only defaults
- No public bind unless explicitly requested
- No token storage in dashboard
- No public model upload
- No public GGUF
- No benchmark claims without reproducible validation
- No automatic gate transitions
Roadmap
Honest, incremental progress.
- Local runtime + Gateway Dashboard polish
- Private adapter runtime preview
- Agent Gateway tools & web-search dry-run
- Manual review of private outputs
- Private GGUF export
- GTX 1060 / GTX 1080 validation
- Public preview decision
Documentation
Everything you need, organized by category.
- Hugging Face: official org page
- GPU compatibility checker
- Reference community models
The Space is a compatibility/demo tool. It does not run Kimari-4B.
