Kimari MicroCompress (KMC) — Reversible Lossless Compression for AI Models

★ Features

Built for AI Model Workflows

Tensor-aware codecs, partial access, and selective extraction, designed specifically for safetensors, GGUF, LoRA adapters, and training checkpoints.

📤

Streaming I/O

Process large model files without loading everything into memory. Streaming reads and writes keep memory usage flat.
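
As a rough illustration of how streaming keeps memory flat, the sketch below compresses a file-like object in fixed-size chunks with Python's standard zlib module. It is not KMC's implementation: the name stream_compress, the 1 MiB chunk size, and the use of zlib are all assumptions made for the example.

```python
import io
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def stream_compress(src, dst, chunk_size=CHUNK_SIZE):
    """Compress src into dst one chunk at a time; return the bytes read.

    Only one input chunk is held in memory at a time, so peak memory
    depends on chunk_size rather than on the size of the file.
    """
    compressor = zlib.compressobj(level=6)
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit whatever is still buffered
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"example payload " * 100_000)
    target = io.BytesIO()
    read = stream_compress(source, target)
    print(read, "bytes in,", len(target.getvalue()), "bytes out")
```

With real files, src and dst would be objects returned by open(path, "rb") and open(path, "wb").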

⚡

Parallel Compression

Compress blocks in parallel with --jobs N to use multiple CPU cores and pack large models faster.
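
The idea behind block-parallel compression can be sketched with the standard library. This is a simplified illustration, not KMC's code: the block size, the function names, and the choice of zlib are assumptions. Because each block is compressed independently and results are collected in input order, the output does not depend on the number of workers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size for this sketch


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data, jobs=4, block_size=BLOCK_SIZE):
    """Compress every block independently using `jobs` worker threads.

    zlib releases the GIL while compressing, so threads can run on
    several cores. Executor.map returns results in input order.
    """
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(compressed):
    """Reverse compress_blocks by decompressing and joining the blocks."""
    return b"".join(zlib.decompress(block) for block in compressed)
```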

👁

Partial Access via KMCReader

Read individual files or tensors from an archive without full decompression. Built-in block, file, and tensor indexes.

🎯

Selective Extraction

Extract only what you need with --only and --tensor flags. No full unpack required.

📑

Block / File / Tensor Indexes

Three-level indexing for efficient lookups: block offsets, file-to-block maps, and tensor-to-block maps are built when an archive is opened.
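
To show why a file-to-block map allows partial reads, here is a toy in-memory archive. The layout (files concatenated, then cut into fixed-size zlib blocks) and the names build_archive and read_file are hypothetical and do not describe the actual .kmc format.

```python
import zlib


def build_archive(files, block_size):
    """Concatenate files, compress fixed-size blocks, and index each file.

    files maps name -> bytes. Returns (blocks, file_index), where
    file_index maps name -> (offset, length) in the uncompressed payload.
    """
    payload = b"".join(files.values())
    blocks = [
        zlib.compress(payload[i:i + block_size])
        for i in range(0, len(payload), block_size)
    ]
    file_index = {}
    offset = 0
    for name, data in files.items():
        file_index[name] = (offset, len(data))
        offset += len(data)
    return blocks, file_index


def read_file(blocks, file_index, name, block_size):
    """Return (data, touched_blocks), decompressing only the needed blocks."""
    start, length = file_index[name]
    if length == 0:
        return b"", []
    first = start // block_size
    last = (start + length - 1) // block_size
    touched = list(range(first, last + 1))
    raw = b"".join(zlib.decompress(blocks[i]) for i in touched)
    skip = start - first * block_size  # bytes before the file in block `first`
    return raw[skip:skip + length], touched
```

A tensor-to-block map would work the same way, with tensor names and byte ranges in place of file names.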

🎨

Tensor-Aware Codecs

BytePlane and FloatPlane codecs exploit tensor structure for better compression on FP32, BF16, and FP16 data.
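
A byte-plane transform, in its generic form, regroups the bytes of fixed-width values so that all first bytes come together, then all second bytes, and so on. For floating-point weights the sign and exponent bytes tend to repeat, which often helps a general-purpose compressor. The sketch below shows that generic transform only; it is an assumption about the technique's general shape, not KMC's BytePlane or FloatPlane codec.

```python
import struct
import zlib


def to_byte_planes(data, width):
    """Group byte k of every `width`-byte element into plane k."""
    if len(data) % width:
        raise ValueError("data length must be a multiple of width")
    return b"".join(data[k::width] for k in range(width))


def from_byte_planes(planes, width):
    """Invert to_byte_planes and restore the original byte order."""
    if len(planes) % width:
        raise ValueError("planes length must be a multiple of width")
    count = len(planes) // width
    out = bytearray(len(planes))
    for k in range(width):
        out[k::width] = planes[k * count:(k + 1) * count]
    return bytes(out)


if __name__ == "__main__":
    # 1000 little-endian FP32 values in a narrow range
    raw = struct.pack("<1000f", *[1.0 + i * 1e-3 for i in range(1000)])
    planes = to_byte_planes(raw, 4)
    assert from_byte_planes(planes, 4) == raw  # the transform is reversible
    print("plain: ", len(zlib.compress(raw)))
    print("planes:", len(zlib.compress(planes)))
```

The transform only reorders bytes, so it is lossless by construction. Whether it helps depends on the data.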

📄

safetensors & GGUF Metadata

Full metadata parsing for safetensors (shards, dtypes, shapes) and GGUF (quantization types, tensor layout).
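
For reference, the safetensors container starts with an 8-byte little-endian length, followed by a JSON header that maps each tensor name to its dtype, shape, and data_offsets. A minimal standard-library reader might look like the sketch below. The function names are hypothetical and this is not KMC's parser.

```python
import io
import json
import struct


def read_safetensors_header(stream):
    """Return (tensors, data_start) for a binary safetensors stream.

    tensors maps tensor name -> {"dtype", "shape", "data_offsets"}.
    data_start is the absolute position where tensor bytes begin.
    """
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header, 8 + header_len


def tensor_byte_range(tensors, data_start, name):
    """Translate a tensor's relative data_offsets into absolute positions."""
    begin, end = tensors[name]["data_offsets"]
    return data_start + begin, data_start + end


if __name__ == "__main__":
    # Build a tiny synthetic file with one 2x2 FP32 tensor
    meta = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
    encoded = json.dumps(meta).encode("utf-8")
    blob = struct.pack("<Q", len(encoded)) + encoded + bytes(16)
    tensors, start = read_safetensors_header(io.BytesIO(blob))
    print(tensors["w"], tensor_byte_range(tensors, start, "w"))
```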

🔀

LoRA / PEFT & Checkpoint Workflows

Specialized workflows for LoRA adapters and training checkpoints with artifact-type detection and metadata.

🔒

Deterministic & Lossless

Every byte is preserved exactly, with SHA-256 verification at file and block level. Pickle is not used anywhere.
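
Two-level verification can be sketched with hashlib: a file-level digest answers "is anything wrong?", and per-block digests answer "where?". The manifest layout and function names below are assumptions for illustration and do not reflect how KMC stores its hashes.

```python
import hashlib


def block_digests(data, block_size):
    """SHA-256 hex digest of each consecutive block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_manifest(data, block_size):
    """Record the file-level digest plus one digest per block."""
    return {
        "file_sha256": hashlib.sha256(data).hexdigest(),
        "block_size": block_size,
        "blocks": block_digests(data, block_size),
    }


def verify_file(data, manifest):
    """True if the whole file matches the recorded file-level digest."""
    return hashlib.sha256(data).hexdigest() == manifest["file_sha256"]


def find_bad_blocks(data, manifest):
    """Indexes of blocks that are missing, extra, or have a different digest."""
    actual = block_digests(data, manifest["block_size"])
    expected = manifest["blocks"]
    return [
        i
        for i in range(max(len(actual), len(expected)))
        if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]
    ]
```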

▶ Quick Start

Get Up and Running

Install KMC and start compressing your AI models in minutes.

Installation

# Clone and install
git clone https://github.com/smouj/kimari-microcompress.git
cd kimari-microcompress
pip install -e ".[all]"

# Or install only specific optional dependencies
pip install -e ".[safetensors]"  # Enhanced safetensors parsing
pip install -e ".[zipnn]"         # ZipNN benchmark comparison

Basic Usage

# Pack a model directory
kmc pack ./my-model ./my-model.kmc

# Pack with tensor-aware mode
kmc pack ./my-model ./my-model.kmc --tensor-aware

# Verify integrity
kmc verify ./my-model.kmc

# Unpack archive
kmc unpack ./my-model.kmc ./restored-model/

💻 Python API

KMCReader — Partial Access API

Read individual files or tensors without decompressing the entire archive.

KMCReader Example

from kmc.reader import KMCReader

with KMCReader("model.kmc") as reader:
    # List available files and tensors
    files = reader.list_files()
    tensors = reader.list_tensors()

    # Read a single file (only needed blocks decompressed)
    config = reader.read_file("config.json")

    # Read a specific tensor by name
    weight_bytes = reader.read_tensor("model.layers.0.mlp.down_proj.weight")

    # Extract a file to disk
    reader.extract_file("config.json", "./output/")

    # Extract a tensor to disk
    reader.extract_tensor("model.layers.0.mlp.down_proj.weight", "./output/")

    # Get file/tensor metadata
    file_info = reader.get_file_info("config.json")
    tensor_info = reader.get_tensor_info("model.layers.0.mlp.down_proj.weight")

Tip: KMCReader builds block, file, and tensor indexes on open, enabling efficient partial reads. Only the blocks needed for a requested file or tensor are read and decompressed.
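
The example above stores the result of read_tensor in weight_bytes. Assuming those are the raw little-endian tensor bytes as stored in a safetensors file (the text here does not spell this out), they could be decoded with the standard library as sketched below. The helper names are hypothetical, and the dtype would have to come from the tensor's metadata.

```python
import struct


def decode_f32(raw):
    """Decode little-endian FP32 bytes into a flat list of Python floats."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw[:count * 4]))


def decode_bf16(raw):
    """Decode little-endian BF16 bytes into a flat list of Python floats.

    A BF16 value is the upper 16 bits of an FP32 value, so shifting it
    left by 16 bits yields the bit pattern of the equivalent FP32 number.
    """
    count = len(raw) // 2
    halves = struct.unpack(f"<{count}H", raw[:count * 2])
    return [
        struct.unpack("<f", struct.pack("<I", half << 16))[0]
        for half in halves
    ]
```

For large tensors a NumPy view over the same bytes would be far faster. The pure-Python version is only meant to make the byte layout explicit.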

⌨ CLI Reference

Command-Line Interface

All major commands and flags at a glance.

Common Commands

# Core commands
kmc pack SOURCE OUTPUT               # Compress files/dirs into .kmc
kmc pack-lora SOURCE OUTPUT          # Compress LoRA adapter
kmc pack-checkpoint SOURCE OUTPUT    # Compress training checkpoint
kmc unpack ARCHIVE OUTPUT            # Decompress .kmc archive
kmc verify ARCHIVE                   # Full integrity verification
kmc inspect TARGET                   # Inspect archive or model
kmc bench SOURCE OUTPUT              # Benchmark compression

# Selective extraction
kmc unpack archive.kmc ./out --only "*.safetensors"
kmc unpack archive.kmc ./out --tensor "model.layer.0.weight"

# Inspect with flags
kmc inspect ./model/ --tensors         # Show tensor details
kmc inspect ./model/ --lora            # LoRA inspection
kmc inspect ./model/ --checkpoint      # Checkpoint inspection
kmc inspect model.gguf --gguf          # GGUF inspection
kmc inspect archive.kmc --compression  # Compression summary

Flag	Command	Description
`--tensor-aware`	pack	Align blocks to tensor boundaries for safetensors files
`--gguf-aware`	pack	Adjust codec selection for quantized GGUF tensors
`--codec`	pack, bench	Codec: auto, byteplane, floatplane, zstd, zlib, raw
`--jobs N`	pack	Parallel compression with N workers
`--only`	unpack	Selective file extraction by glob pattern
`--tensor`	unpack	Selective tensor extraction by name
`--tensors`	inspect	Show detailed tensor information
`--compression`	inspect	Show compression summary with codec usage
`--json`	inspect, bench	Output as JSON for scripting
`--compare-codecs`	bench	Compare all available codecs
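
Since --json is intended for scripting, a thin Python wrapper around kmc inspect could look like the sketch below. It uses only the flags listed in the table. Combining --json with --tensors is an assumption, the wrapper names are hypothetical, and no particular JSON schema is assumed: the parsed object is returned as-is.

```python
import json
import subprocess


def build_inspect_command(target, tensors=False):
    """Build the argument list for `kmc inspect TARGET --json`."""
    cmd = ["kmc", "inspect", target, "--json"]
    if tensors:
        cmd.append("--tensors")  # assumed to be combinable with --json
    return cmd


def inspect_json(target, tensors=False, run=subprocess.run):
    """Run kmc inspect and return its parsed JSON output.

    `run` is injectable so the wrapper can be exercised without kmc
    installed. check=True raises CalledProcessError on a non-zero exit.
    """
    result = run(
        build_inspect_command(target, tensors),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```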

⚠️ Important Limitations

Please Read Before Using

KMC is a storage and transfer tool. Understanding these limitations is critical.

🚫

Not Compressed Inference

KMC does NOT perform compressed inference. It cannot reduce VRAM during model execution. It is designed for storage, transfer, and verification only.

💬

No VRAM Reduction Claims

Do not use KMC as a replacement for quantization. If you need smaller models for inference, use quantization (GGUF Q4_K, GPTQ, AWQ, etc.) instead.

🔒

Lossless Only — No Weight Modification

Compression is lossless and reversible. Every byte is preserved exactly. There is no lossy mode and no weight modification of any kind.

🙁

No Pickle Used Anywhere

KMC never deserializes pickle-based files. It records only their presence, size, and hash, so packing them carries no risk of arbitrary code execution from pickle.
