v0.7.0-alpha

Kimari MicroCompress (KMC)

Reversible lossless compression for AI models.
Storage, transfer, and verification — without modifying weights.

🔒 Lossless & Deterministic · Parallel Compression · 👁 Partial Access · 🚀 No Pickle

Built for AI Model Workflows

Tensor-aware codecs, partial access, selective extraction, and more — designed specifically for safetensors, GGUF, LoRA, and training checkpoints.

📤

Streaming I/O

Process large model files without loading everything into memory. Streaming reads and writes keep memory usage flat.
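
One way to picture flat memory usage is a chunked copy loop. The sketch below is not KMC's implementation: it uses Python's standard zlib as a stand-in codec, and the chunk size and function names are assumptions chosen for illustration. It also hashes the data as it streams, so integrity can be checked without a second pass.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; hypothetical value, not KMC's block size


def stream_compress(src, dst, chunk_size=CHUNK_SIZE):
    """Compress src into dst chunk by chunk; return the SHA-256 of the input."""
    compressor = zlib.compressobj(6)
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit whatever the compressor still buffers
    return digest.hexdigest()


def stream_decompress(src, dst, chunk_size=CHUNK_SIZE):
    """Decompress src into dst chunk by chunk; return the SHA-256 of the output."""
    decompressor = zlib.decompressobj()
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # One compressed chunk can expand to more than chunk_size bytes.
        out = decompressor.decompress(chunk)
        digest.update(out)
        dst.write(out)
    tail = decompressor.flush()
    digest.update(tail)
    dst.write(tail)
    return digest.hexdigest()
```

Both functions accept any binary file-like object, so they work the same way with files opened in "rb"/"wb" mode and with in-memory buffers.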

Parallel Compression

Compress blocks in parallel with --jobs N to spread packing across multiple CPU cores on large models.
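
To illustrate how block-level parallelism can stay deterministic, here is a minimal sketch under stated assumptions: zlib stands in for KMC's codecs, the block size is hypothetical, and `jobs` only mirrors the idea behind --jobs N. The key point is that results are collected in input order, so the output does not depend on the worker count.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size, not taken from KMC


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data, jobs=4, block_size=BLOCK_SIZE):
    """Compress each block independently using up to `jobs` worker threads."""
    blocks = split_blocks(data, block_size)
    # CPython's zlib releases the GIL while compressing, so threads can
    # run on several cores at once.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in input order, whatever order the workers
        # finish in, which keeps the output identical for any job count.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(compressed):
    """Restore the original bytes from independently compressed blocks."""
    return b"".join(zlib.decompress(block) for block in compressed)
```

Because every block is compressed on its own, any single block can later be decompressed without touching the others, which is also what makes partial reads possible.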

👁

Partial Access via KMCReader

Read individual files or tensors from an archive without full decompression. Built-in block, file, and tensor indexes.

🎯

Selective Extraction

Extract only what you need with --only and --tensor flags. No full unpack required.
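
As a rough illustration of glob-based selection, the sketch below filters a list of archive member names with Python's fnmatch module. How KMC itself interprets patterns (case sensitivity, path separators) is not documented here, so treat the matching rules as an assumption; `select_files` is a hypothetical helper, not part of KMC.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the names that match a shell-style glob, preserving order."""
    # fnmatchcase is case-sensitive on every platform; note that "*" also
    # matches "/" in this module, unlike a shell glob.
    return [name for name in names if fnmatchcase(name, pattern)]


members = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "tokenizer.json",
]
shards = select_files(members, "*.safetensors")
```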

📑

Block / File / Tensor Indexes

Three-level indexing for efficient lookups. Block offsets, file-to-block maps, and tensor-to-block maps are built when an archive is opened.
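
The three levels can be pictured as one list and two dictionaries. This is a sketch of the idea only: the class names, field names, and layout are hypothetical and do not describe KMC's on-disk format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockEntry:
    offset: int           # byte offset of the compressed block in the archive
    compressed_size: int  # bytes to read from the archive
    raw_size: int         # bytes produced after decompression


@dataclass
class ArchiveIndex:
    blocks: list = field(default_factory=list)         # block id -> BlockEntry
    file_blocks: dict = field(default_factory=dict)    # file name -> block ids
    tensor_blocks: dict = field(default_factory=dict)  # tensor name -> block ids

    def _ranges(self, block_ids):
        return [(self.blocks[i].offset, self.blocks[i].compressed_size)
                for i in block_ids]

    def ranges_for_file(self, name):
        """(offset, length) pairs to read in order to restore one file."""
        return self._ranges(self.file_blocks[name])

    def ranges_for_tensor(self, name):
        """(offset, length) pairs to read in order to restore one tensor."""
        return self._ranges(self.tensor_blocks[name])
```

With such an index, reading one tensor means seeking to a few byte ranges and decompressing only those blocks.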

🎨

Tensor-Aware Codecs

BytePlane and FloatPlane codecs exploit tensor structure for better compression on FP32, BF16, and FP16 data.
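
The general byte-plane idea can be sketched in a few lines: byte k of every element is grouped into plane k, so that similar bytes (for example, sign and exponent bytes) sit next to each other before a general-purpose compressor runs. This is an illustration of the technique, not KMC's BytePlane codec; the function names are hypothetical and zlib is a stand-in backend.

```python
import struct
import zlib


def split_planes(raw, width):
    """Group byte k of every `width`-byte element into plane k."""
    if len(raw) % width:
        raise ValueError("buffer length is not a multiple of the element width")
    return [raw[k::width] for k in range(width)]


def merge_planes(planes):
    """Inverse of split_planes: interleave the planes back into elements."""
    width = len(planes)
    out = bytearray(len(planes[0]) * width)
    for k, plane in enumerate(planes):
        out[k::width] = plane
    return bytes(out)


def compressed_size(raw, width):
    """Total zlib size when each plane is compressed separately."""
    return sum(len(zlib.compress(p)) for p in split_planes(raw, width))


# Example input: little-endian FP32 values (element width 4 bytes).
values = [1.0 + i * 1e-3 for i in range(4096)]
raw = struct.pack(f"<{len(values)}f", *values)
restored = merge_planes(split_planes(raw, 4))
```

The transform is a pure permutation of bytes, so it is exactly reversible. Whether it helps depends on the data; comparing `compressed_size(raw, 4)` with `len(zlib.compress(raw))` is a quick way to check on a given tensor.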

📄

safetensors & GGUF Metadata

Full metadata parsing for safetensors (shards, dtypes, shapes) and GGUF (quantization types, tensor layout).
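
For context, the safetensors header is simple to read: an 8-byte little-endian unsigned length, followed by that many bytes of UTF-8 JSON mapping tensor names to dtype, shape, and data offsets. The sketch below parses it with the standard library only; it is not KMC's parser, and `read_safetensors_header` is a hypothetical name.

```python
import json
import struct


def read_safetensors_header(stream):
    """Return (tensors, data_start) from a binary stream at position 0.

    tensors maps each tensor name to its dtype, shape, and data_offsets;
    data_start is the absolute offset where the tensor bytes begin.
    """
    prefix = stream.read(8)
    if len(prefix) != 8:
        raise ValueError("file too short for a safetensors header")
    (header_len,) = struct.unpack("<Q", prefix)
    blob = stream.read(header_len)
    if len(blob) != header_len:
        raise ValueError("truncated safetensors header")
    header = json.loads(blob.decode("utf-8"))
    # "__metadata__" holds free-form string metadata, not a tensor.
    header.pop("__metadata__", None)
    return header, 8 + header_len


def tensor_byte_range(tensors, name, data_start):
    """Absolute (begin, end) byte range of one tensor inside the file."""
    begin, end = tensors[name]["data_offsets"]
    return data_start + begin, data_start + end
```

Only the header is read, so this works on multi-gigabyte shards without loading any weights.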

🔀

LoRA / PEFT & Checkpoint Workflows

Specialized workflows for LoRA adapters and training checkpoints with artifact-type detection and metadata.

🔒

Deterministic & Lossless

Every byte preserved exactly. SHA-256 verification at file and block level. No pickle used anywhere.
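
Two-level verification can be sketched as follows: a file-level SHA-256 answers "is anything wrong?", and per-block digests answer "where?". The block size and function names are assumptions for illustration, not KMC's format.

```python
import hashlib


def block_digests(data, block_size):
    """SHA-256 hex digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data, expected, block_size):
    """Return the indexes of blocks whose digest differs from `expected`."""
    actual = block_digests(data, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Missing or extra blocks (truncated or padded data) count as corrupt.
    bad.extend(range(min(len(actual), len(expected)),
                     max(len(actual), len(expected))))
    return bad


def verify(data, file_digest, expected_blocks, block_size):
    """Return [] if the file digest matches, else the corrupt block indexes."""
    if hashlib.sha256(data).hexdigest() == file_digest:
        return []
    return find_corrupt_blocks(data, expected_blocks, block_size)
```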

Get Up and Running

Install KMC and start compressing your AI models in minutes.

Installation
# Clone and install
git clone https://github.com/smouj/kimari-microcompress.git
cd kimari-microcompress
pip install -e ".[all]"

# Or install only specific optional dependencies
pip install -e ".[safetensors]"  # Enhanced safetensors parsing
pip install -e ".[zipnn]"        # ZipNN benchmark comparison

Basic Usage
# Pack a model directory
kmc pack ./my-model ./my-model.kmc

# Pack with tensor-aware mode
kmc pack ./my-model ./my-model.kmc --tensor-aware

# Verify integrity
kmc verify ./my-model.kmc

# Unpack archive
kmc unpack ./my-model.kmc ./restored-model/

KMCReader — Partial Access API

Read individual files or tensors without decompressing the entire archive.

KMCReader Example
from kmc.reader import KMCReader

with KMCReader("model.kmc") as reader:
    # List available files and tensors
    files = reader.list_files()
    tensors = reader.list_tensors()

    # Read a single file (only needed blocks decompressed)
    config = reader.read_file("config.json")

    # Read a specific tensor by name
    weight_bytes = reader.read_tensor("model.layers.0.mlp.down_proj.weight")

    # Extract a file to disk
    reader.extract_file("config.json", "./output/")

    # Extract a tensor to disk
    reader.extract_tensor("model.layers.0.mlp.down_proj.weight", "./output/")

    # Get file/tensor metadata
    file_info = reader.get_file_info("config.json")
    tensor_info = reader.get_tensor_info("model.layers.0.mlp.down_proj.weight")

Tip: KMCReader builds block, file, and tensor indexes on open, enabling efficient partial reads. Only the blocks needed for a requested file or tensor are read and decompressed.
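
Once tensor bytes are in hand, they still need to be interpreted. The sketch below assumes that `read_tensor` returns raw little-endian element bytes (the example above only shows a variable named `weight_bytes`) and that the dtype is known as a safetensors-style string such as "F32", "F16", or "BF16". `decode_tensor` is a hypothetical helper, not part of the KMC API.

```python
import struct


def decode_tensor(raw, dtype):
    """Decode little-endian tensor bytes into a flat list of Python floats."""
    if dtype == "F32":
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    if dtype == "F16":
        # struct's "e" format is IEEE 754 half precision.
        return list(struct.unpack(f"<{len(raw) // 2}e", raw))
    if dtype == "BF16":
        # bfloat16 is the upper 16 bits of an FP32 value, so shifting each
        # 16-bit word left by 16 gives the FP32 bit pattern.
        words = struct.unpack(f"<{len(raw) // 2}H", raw)
        return [struct.unpack("<f", struct.pack("<I", w << 16))[0]
                for w in words]
    raise ValueError(f"unsupported dtype: {dtype}")
```

For large tensors a NumPy or framework-level view would be the practical choice; the pure-Python version is only meant to show the byte layout.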

Command-Line Interface

All major commands and flags at a glance.

Common Commands
# Core commands
kmc pack SOURCE OUTPUT               # Compress files/dirs into .kmc
kmc pack-lora SOURCE OUTPUT          # Compress LoRA adapter
kmc pack-checkpoint SOURCE OUTPUT    # Compress training checkpoint
kmc unpack ARCHIVE OUTPUT            # Decompress .kmc archive
kmc verify ARCHIVE                   # Full integrity verification
kmc inspect TARGET                   # Inspect archive or model
kmc bench SOURCE OUTPUT              # Benchmark compression

# Selective extraction
kmc unpack archive.kmc ./out --only "*.safetensors"
kmc unpack archive.kmc ./out --tensor "model.layer.0.weight"

# Inspect with flags
kmc inspect ./model/ --tensors         # Show tensor details
kmc inspect ./model/ --lora            # LoRA inspection
kmc inspect ./model/ --checkpoint      # Checkpoint inspection
kmc inspect model.gguf --gguf          # GGUF inspection
kmc inspect archive.kmc --compression  # Compression summary

| Flag | Command | Description |
|------|---------|-------------|
| --tensor-aware | pack | Align blocks to tensor boundaries for safetensors files |
| --gguf-aware | pack | Adjust codec selection for quantized GGUF tensors |
| --codec | pack, bench | Codec: auto, byteplane, floatplane, zstd, zlib, raw |
| --jobs N | pack | Parallel compression with N workers |
| --only | unpack | Selective file extraction by glob pattern |
| --tensor | unpack | Selective tensor extraction by name |
| --tensors | inspect | Show detailed tensor information |
| --compression | inspect | Show compression summary with codec usage |
| --json | inspect, bench | Output as JSON for scripting |
| --compare-codecs | bench | Compare all available codecs |
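
For scripting, the pack flags listed above can be assembled into an argument list in Python. This is a sketch under assumptions: the helper name is hypothetical, the `--codec NAME` and `--jobs N` spellings with a space-separated value are assumed, and whether every flag combination is accepted by the CLI is not confirmed here.

```python
KNOWN_CODECS = {"auto", "byteplane", "floatplane", "zstd", "zlib", "raw"}


def build_pack_command(source, output, *, tensor_aware=False,
                       codec=None, jobs=None):
    """Build an argv list for `kmc pack` from the documented flags."""
    cmd = ["kmc", "pack", source, output]
    if tensor_aware:
        cmd.append("--tensor-aware")
    if codec is not None:
        if codec not in KNOWN_CODECS:
            raise ValueError(f"unknown codec: {codec}")
        cmd += ["--codec", codec]
    if jobs is not None:
        if jobs < 1:
            raise ValueError("jobs must be at least 1")
        cmd += ["--jobs", str(jobs)]
    return cmd


pack_cmd = build_pack_command("./my-model", "./my-model.kmc",
                              tensor_aware=True, jobs=8)
```

The resulting list can be passed to `subprocess.run(pack_cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.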

Please Read Before Using

KMC is a storage and transfer tool. Understanding these limitations is critical.

🚫

Not Compressed Inference

KMC does NOT perform compressed inference, so it cannot reduce VRAM usage during model execution. It is designed for storage, transfer, and verification only.

💬

No VRAM Reduction Claims

Do not use KMC as a replacement for quantization. If you need smaller models for inference, use quantization (GGUF Q4_K, GPTQ, AWQ, etc.) instead.

🔒

Lossless Only — No Weight Modification

Compression is lossless and reversible. Every byte is preserved exactly. There is no lossy mode and no weight modification of any kind.

🙁

No Pickle Used Anywhere

KMC never deserializes pickle-based files. It records only their presence, size, and hash, so packing or inspecting them carries no risk of arbitrary code execution from pickle.
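
Treating a pickle-based file as opaque bytes can be sketched like this. The suffix list and the record layout are assumptions for illustration, not KMC's actual detection rules; the point is that the file is only read as bytes and the pickle module is never involved.

```python
import hashlib
import os

# Hypothetical list of suffixes that commonly indicate pickle-based files.
PICKLE_SUFFIXES = (".pkl", ".pickle", ".pt", ".pth")


def record_opaque_file(path, chunk_size=1 << 20):
    """Record name, size, and SHA-256 of a file without deserializing it."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "name": os.path.basename(path),
        "pickle_based": path.lower().endswith(PICKLE_SUFFIXES),
        "size": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }
```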