Reversible lossless compression for AI models.
Built for storage, transfer, and verification, without modifying weights.
Tensor-aware codecs, partial access, and selective extraction, designed specifically for safetensors, GGUF, LoRA adapters, and training checkpoints.
Process large model files without loading everything into memory. Streaming reads and writes keep memory usage flat.
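To illustrate why block-based streaming keeps memory flat, here is a minimal sketch (not KMC's on-disk format) that reads one fixed-size block, compresses it independently with `zlib`, and writes it with a length prefix, so at most one block is held in memory at a time. The block size, the 4-byte length prefix, and the function names are hypothetical.

```python
import io
import struct
import zlib
from typing import BinaryIO

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB block size, not KMC's actual default


def stream_compress(src: BinaryIO, dst: BinaryIO, block_size: int = BLOCK_SIZE) -> int:
    """Compress src into dst one block at a time and return the block count."""
    count = 0
    while True:
        block = src.read(block_size)  # only one raw block is resident at a time
        if not block:
            break
        packed = zlib.compress(block, 6)
        # A little-endian length prefix lets a reader step through blocks one by one.
        dst.write(struct.pack("<I", len(packed)))
        dst.write(packed)
        count += 1
    return count


def stream_decompress(src: BinaryIO, dst: BinaryIO) -> None:
    """Inverse of stream_compress: decode length-prefixed blocks until EOF."""
    while True:
        prefix = src.read(4)
        if not prefix:
            break
        (size,) = struct.unpack("<I", prefix)
        dst.write(zlib.decompress(src.read(size)))


if __name__ == "__main__":
    raw = bytes(range(256)) * 10_000
    packed, restored = io.BytesIO(), io.BytesIO()
    stream_compress(io.BytesIO(raw), packed, block_size=64 * 1024)
    packed.seek(0)
    stream_decompress(packed, restored)
    assert restored.getvalue() == raw
```

With real files, pass handles from `open(path, "rb")` and `open(out_path, "wb")`; peak memory then depends on the block size rather than the model size.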
Compress blocks in parallel with --jobs N to spread packing across multiple CPU cores, which speeds up large models.
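Because blocks are compressed independently, they can be handed to a worker pool. This sketch assumes `--jobs N` behaves like a pool of N workers and uses `concurrent.futures.ThreadPoolExecutor`, which is viable here because CPython's `zlib` releases the GIL while compressing. Whether KMC uses threads or processes is not stated, so treat the executor choice and the function names as assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_parallel(data: bytes, block_size: int = 1 << 20, jobs: int = 4) -> list[bytes]:
    """Compress independent blocks on `jobs` worker threads."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map yields results in input order, so reassembly is deterministic
        # no matter which worker finishes first.
        return list(pool.map(zlib.compress, blocks))


def decompress_all(packed: list[bytes]) -> bytes:
    """Decompress every block and concatenate them in their original order."""
    return b"".join(zlib.decompress(p) for p in packed)
```

Keeping output order tied to input order is what makes the parallel result byte-identical to a serial run.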
Read individual files or tensors from an archive without full decompression. Built-in block, file, and tensor indexes.
Extract only what you need with --only and --tensor flags. No full unpack required.
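Selecting members by pattern is straightforward with the standard library. A sketch, assuming `--only` applies shell-style globs to archive paths (KMC's exact matching rules are not documented here); `select_members` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def select_members(names: list[str], pattern: str) -> list[str]:
    """Return archive member names matching a shell-style glob, in archive order."""
    # fnmatchcase is case-sensitive on every OS, and its "*" also matches "/",
    # so "*.safetensors" selects files in subdirectories too.
    return [name for name in names if fnmatchcase(name, pattern)]


members = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "tokenizer.json",
]
print(select_members(members, "*.safetensors"))
```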
Three-level indexing for efficient lookups: block offsets, file-to-block maps, and tensor-to-block maps, all built when the archive is opened.
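A sketch of how three index levels can compose, with hypothetical names and layout: a tensor name resolves to a byte range inside its file, the file's offset shifts that range into the archive's uncompressed stream, and a binary search over block offsets finds the only blocks that need decompressing. KMC may store block ids in its maps directly rather than computing them like this.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    offset: int  # start of the block in the uncompressed byte stream
    size: int    # uncompressed length of the block


def build_block_index(total_size: int, block_size: int) -> list[BlockEntry]:
    """Level 1: fixed-size blocks covering the concatenated file contents."""
    return [BlockEntry(off, min(block_size, total_size - off))
            for off in range(0, total_size, block_size)]


def blocks_for_range(index: list[BlockEntry], start: int, end: int) -> list[int]:
    """Return ids of blocks that overlap the half-open byte range [start, end)."""
    starts = [b.offset for b in index]
    first = max(bisect_right(starts, start) - 1, 0)  # block containing `start`
    hits = []
    for i in range(first, len(index)):
        if index[i].offset >= end:
            break
        if index[i].offset + index[i].size > start:
            hits.append(i)
    return hits


def tensor_blocks(index: list[BlockEntry], file_offsets: dict[str, int],
                  tensor_ranges: dict[str, tuple[str, int, int]], name: str) -> list[int]:
    """Levels 2 and 3: tensor -> (file, byte range) -> global range -> block ids."""
    file_name, start, end = tensor_ranges[name]
    base = file_offsets[file_name]
    return blocks_for_range(index, base + start, base + end)
```

For example, with 100-byte blocks, a tensor stored at bytes 20 to 130 of a file that begins at stream offset 500 maps to blocks 5 and 6 only.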
BytePlane and FloatPlane codecs exploit tensor structure for better compression on FP32, BF16, and FP16 data.
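The internals of BytePlane and FloatPlane are not described here. As an illustration of the general idea, this sketch uses a generic byte-plane (byte-shuffle) transform: plane i collects byte i of every fixed-width element, so the high-order bytes of similar floats, which carry the sign and exponent, sit next to each other and tend to compress better. The function names and the per-plane `zlib` step are assumptions.

```python
import struct
import zlib


def split_planes(data: bytes, width: int) -> list[bytes]:
    """Plane i holds byte i of every element (width 4 for FP32, 2 for BF16/FP16)."""
    if len(data) % width:
        raise ValueError("data length must be a multiple of the element width")
    return [data[i::width] for i in range(width)]


def merge_planes(planes: list[bytes]) -> bytes:
    """Inverse of split_planes: interleave the planes back into element order."""
    width = len(planes)
    out = bytearray(len(planes[0]) * width)
    for i, plane in enumerate(planes):
        out[i::width] = plane  # extended-slice assignment restores byte i of each element
    return bytes(out)


def byteplane_compress(data: bytes, width: int) -> list[bytes]:
    """Compress each byte plane separately."""
    return [zlib.compress(p, 9) for p in split_planes(data, width)]


def byteplane_decompress(packed: list[bytes]) -> bytes:
    return merge_planes([zlib.decompress(p) for p in packed])


if __name__ == "__main__":
    values = [1.0 + i * 1e-4 for i in range(1000)]
    raw = struct.pack(f"<{len(values)}f", *values)  # FP32 data, element width 4
    assert byteplane_decompress(byteplane_compress(raw, 4)) == raw
```

The transform only reorders bytes, so it is exactly reversible and the pipeline stays lossless.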
Full metadata parsing for safetensors (shards, dtypes, shapes) and GGUF (quantization types, tensor layout).
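Both container formats are publicly specified, so the first step of metadata parsing can be sketched independently of KMC. A safetensors file starts with an 8-byte little-endian header length followed by a JSON table. A GGUF file starts with the magic `GGUF`, a format version, and the tensor and metadata key-value counts. This sketch reads only those headers; KMC's own parser (including GGUF quantization-type decoding) is not shown, and the function names are hypothetical.

```python
import io
import json
import struct
from typing import BinaryIO


def read_safetensors_header(f: BinaryIO) -> tuple[dict, int]:
    """Return (tensor table, absolute offset where the tensor data section starts)."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian JSON length
    table = json.loads(f.read(header_len).decode("utf-8"))
    table.pop("__metadata__", None)  # optional string-to-string metadata map
    # Each entry holds dtype, shape, and data_offsets relative to the data section.
    return table, 8 + header_len


def parse_safetensors_bytes(data: bytes) -> tuple[dict, int]:
    """Convenience wrapper for an in-memory safetensors file."""
    return read_safetensors_header(io.BytesIO(data))


def read_gguf_header(f: BinaryIO) -> dict:
    """Read the fixed GGUF preamble (little-endian, format version 2 or later)."""
    magic, version = struct.unpack("<4sI", f.read(8))
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # Version 1 used 32-bit counts; versions 2 and 3 use 64-bit counts.
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensor_count": tensor_count,
            "metadata_kv_count": kv_count}
```

Neither function touches tensor data, which is why metadata inspection stays cheap even for multi-gigabyte files.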
Specialized workflows for LoRA adapters and training checkpoints with artifact-type detection and metadata.
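Artifact-type detection can be approximated from marker filenames. This hypothetical heuristic relies on conventions from Hugging Face PEFT (`adapter_config.json`) and the Transformers `Trainer` (`trainer_state.json`, `optimizer.pt`, `scheduler.pt`); KMC's real detection rules may differ.

```python
from pathlib import PurePath
from typing import Iterable

LORA_MARKERS = {"adapter_config.json"}
CHECKPOINT_MARKERS = {"trainer_state.json", "optimizer.pt", "scheduler.pt"}


def detect_artifact_type(filenames: Iterable[str]) -> str:
    """Classify a file listing as 'checkpoint', 'lora', or 'model' by marker names."""
    names = {PurePath(n).name for n in filenames}
    # Checkpoints are tested first: a LoRA training run's checkpoint directory
    # contains adapter files *and* optimizer/trainer state.
    if names & CHECKPOINT_MARKERS:
        return "checkpoint"
    if names & LORA_MARKERS:
        return "lora"
    return "model"
```

For a directory on disk, pass `(p.name for p in Path(root).iterdir())`.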
Every byte preserved exactly. SHA-256 verification at file and block level. No pickle used anywhere.
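Block-level hashes let verification point at the damaged region instead of only reporting that a file changed. A sketch with a hypothetical manifest layout (KMC's actual metadata format is not shown here), hashing uncompressed data with `hashlib.sha256`:

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[str]:
    """SHA-256 of each consecutive block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def build_manifest(data: bytes, block_size: int) -> dict:
    """Record a whole-file digest plus one digest per block."""
    return {
        "block_size": block_size,
        "file_sha256": hashlib.sha256(data).hexdigest(),
        "block_sha256": block_digests(data, block_size),
    }


def verify(data: bytes, manifest: dict) -> tuple[bool, list[int]]:
    """Return (ok, ids of mismatched or missing blocks)."""
    # Fast path: one file-level digest confirms an intact file.
    if hashlib.sha256(data).hexdigest() == manifest["file_sha256"]:
        return True, []
    # Slow path: block digests localize the damage.
    actual = block_digests(data, manifest["block_size"])
    expected = manifest["block_sha256"]
    n = max(len(actual), len(expected))
    bad = [i for i in range(n)
           if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]]
    return False, bad
```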
Install KMC and start compressing your AI models in minutes.
```bash
# Clone and install
git clone https://github.com/smouj/kimari-microcompress.git
cd kimari-microcompress
pip install -e ".[all]"

# Or install only the optional extras you need
pip install -e ".[safetensors]"  # Enhanced safetensors parsing
pip install -e ".[zipnn]"        # ZipNN benchmark comparison
```
```bash
# Pack a model directory
kmc pack ./my-model ./my-model.kmc

# Pack with tensor-aware mode
kmc pack ./my-model ./my-model.kmc --tensor-aware

# Verify integrity
kmc verify ./my-model.kmc

# Unpack archive
kmc unpack ./my-model.kmc ./restored-model/
```
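`kmc verify` checks the archive itself; as an independent end-to-end check, you can also hash both directory trees after unpacking and compare them. A sketch using only the standard library (the script name and function names are hypothetical):

```python
import hashlib
import sys
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large shards never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root: str) -> dict[str, str]:
    """Map each file's relative POSIX path under root to its SHA-256 digest."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): sha256_file(p)
            for p in sorted(base.rglob("*")) if p.is_file()}


def diff_trees(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Paths that are missing on one side or whose digests differ."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_trees.py ./my-model ./restored-model/
    mismatches = diff_trees(tree_digest(sys.argv[1]), tree_digest(sys.argv[2]))
    print("identical" if not mismatches else "\n".join(mismatches))
    sys.exit(1 if mismatches else 0)
```

Because compression is lossless, an empty mismatch list is the expected result after `kmc unpack`.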
Read individual files or tensors without decompressing the entire archive.
```python
from kmc.reader import KMCReader

with KMCReader("model.kmc") as reader:
    # List available files and tensors
    files = reader.list_files()
    tensors = reader.list_tensors()

    # Read a single file (only the needed blocks are decompressed)
    config = reader.read_file("config.json")

    # Read a specific tensor by name
    weight_bytes = reader.read_tensor("model.layers.0.mlp.down_proj.weight")

    # Extract a file to disk
    reader.extract_file("config.json", "./output/")

    # Extract a tensor to disk
    reader.extract_tensor("model.layers.0.mlp.down_proj.weight", "./output/")

    # Get file/tensor metadata
    file_info = reader.get_file_info("config.json")
    tensor_info = reader.get_tensor_info("model.layers.0.mlp.down_proj.weight")
```
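The reader snippet returns tensors as bytes. Assuming `read_tensor` yields the raw little-endian bytes exactly as stored in the safetensors file (the return type is not documented here), and that you know the dtype, for example from `get_tensor_info` (whose fields are not shown), the values can be decoded with the standard library. `decode_tensor` is a hypothetical helper using safetensors dtype strings.

```python
import struct


def decode_tensor(raw: bytes, dtype: str) -> list[float]:
    """Decode raw little-endian tensor bytes into a flat list of Python floats."""
    if dtype == "F32":
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    if dtype == "F16":
        return list(struct.unpack(f"<{len(raw) // 2}e", raw))  # "e" = IEEE half
    if dtype == "BF16":
        # bfloat16 is the upper 16 bits of an IEEE float32: shift left, reinterpret.
        halves = struct.unpack(f"<{len(raw) // 2}H", raw)
        widened = struct.pack(f"<{len(halves)}I", *(h << 16 for h in halves))
        return list(struct.unpack(f"<{len(halves)}f", widened))
    raise ValueError(f"unsupported dtype: {dtype}")
```

For large tensors, `numpy.frombuffer(raw, dtype="<f4").reshape(shape)` (or `"<f2"` for F16) avoids Python-level unpacking; NumPy has no native bfloat16 dtype, so BF16 still needs the shift trick.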
All major commands and flags at a glance.
```bash
# Core commands
kmc pack SOURCE OUTPUT               # Compress files/dirs into .kmc
kmc pack-lora SOURCE OUTPUT          # Compress LoRA adapter
kmc pack-checkpoint SOURCE OUTPUT    # Compress training checkpoint
kmc unpack ARCHIVE OUTPUT            # Decompress .kmc archive
kmc verify ARCHIVE                   # Full integrity verification
kmc inspect TARGET                   # Inspect archive or model
kmc bench SOURCE OUTPUT              # Benchmark compression

# Selective extraction
kmc unpack archive.kmc ./out --only "*.safetensors"
kmc unpack archive.kmc ./out --tensor "model.layer.0.weight"

# Inspect with flags
kmc inspect ./model/ --tensors          # Show tensor details
kmc inspect ./model/ --lora             # LoRA inspection
kmc inspect ./model/ --checkpoint       # Checkpoint inspection
kmc inspect model.gguf --gguf           # GGUF inspection
kmc inspect archive.kmc --compression   # Compression summary
```
| Flag | Command | Description |
|---|---|---|
| `--tensor-aware` | `pack` | Align blocks to tensor boundaries for safetensors files |
| `--gguf-aware` | `pack` | Adjust codec selection for quantized GGUF tensors |
| `--codec` | `pack`, `bench` | Codec: `auto`, `byteplane`, `floatplane`, `zstd`, `zlib`, `raw` |
| `--jobs N` | `pack` | Parallel compression with N workers |
| `--only` | `unpack` | Selective file extraction by glob pattern |
| `--tensor` | `unpack` | Selective tensor extraction by name |
| `--tensors` | `inspect` | Show detailed tensor information |
| `--compression` | `inspect` | Show compression summary with codec usage |
| `--json` | `inspect`, `bench` | Output as JSON for scripting |
| `--compare-codecs` | `bench` | Compare all available codecs |
KMC is a storage and transfer tool. Understanding these limitations is critical.
KMC does NOT perform compressed inference. It cannot reduce VRAM during model execution. It is designed for storage, transfer, and verification only.
Do not use KMC as a replacement for quantization. If you need smaller models for inference, use quantization (GGUF Q4_K, GPTQ, AWQ, etc.) instead.
Compression is lossless and reversible. Every byte is preserved exactly. There is no lossy mode and no weight modification of any kind.
KMC never deserializes pickle-based files: it records only their presence, size, and hash, so pickle poses no arbitrary-code-execution risk.
Dive deeper into the project with these resources.