These are the docs for v1.4.0, the current stable release. Earlier versions stay available from the version menu.

Support LibreYOLO

The best way to help is to star the repo. If you hit a problem or have a suggestion, open an issue, and code contributions are very welcome.

GitHub r/LibreYOLO

Two companion guides go deeper on specialized topics. The LibreVLM guide covers the vision-language tier (Qwen3-VL, Florence-2), which generates text that LibreYOLO parses into boxes. That differs from open-vocabulary detection (new in v1.3.1), which uses purpose-built, text-conditioned detectors and is documented on this page. The experimental tasks guide covers additional experimental workflows, including LoRA / DoRA fine-tuning.

Introduction

v1.4.0 validation scope

The heavily tested path is detection, training and inference for YOLO9 and RF-DETR, including RF-DETR segmentation.

Other model families and tasks are available but experimental. Multi-GPU training got a correctness overhaul in v1.4.0 and is much stronger than in v1.3.x, but it is still outside the validated scope.

LibreYOLO is an MIT-licensed computer-vision toolkit. v1.4.0 ships a broad catalogue across detection, segmentation, classification, depth, restoration, OCR and more, but the validated support surface is intentionally narrow:

YOLO9 detection - the CNN path.
RF-DETR detection - the transformer path.
RF-DETR segmentation - the heavily tested segmentation path.

We recommend those paths as the default choice for new projects because they receive the heaviest testing around detection, training and inference. Other supported families and tasks work through the same unified LibreYOLO() factory, but they are experimental in v1.4.0. Use them if you have a specific reason.

python

from libreyolo import LibreYOLO, SAMPLE_IMAGE

# Default: YOLO9 detection
model = LibreYOLO("LibreYOLO9c.pt")
result = model(SAMPLE_IMAGE, conf=0.25, save=True)

print(f"Detected {len(result)} objects")
print(result.boxes.xyxy)
print(result.saved_path)
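If a project needs to flag when it leaves the validated scope, the three validated paths listed above can be encoded in a small lookup. This is a minimal sketch: `VALIDATED_PATHS` and `support_tier` are hypothetical names for illustration and are not part of the LibreYOLO API.

```python
# Validated (heavily tested) family/task pairs in v1.4.0, copied from the list above.
# Hypothetical helper: nothing here is imported from libreyolo.
VALIDATED_PATHS = {
    ("YOLO9", "detect"),
    ("RF-DETR", "detect"),
    ("RF-DETR", "segment"),
}


def support_tier(family: str, task: str) -> str:
    """Return "validated" for the heavily tested paths, otherwise "experimental"."""
    return "validated" if (family, task) in VALIDATED_PATHS else "experimental"


for family, task in [("YOLO9", "detect"), ("RF-DETR", "segment"), ("YOLOX", "detect")]:
    print(f"{family} / {task}: {support_tier(family, task)}")
```

A check like this could sit in a training script or CI job to log a warning whenever an experimental path is selected.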

Key features

Heavy testing and recommended defaults for YOLO9 detection, RF-DETR detection, and RF-DETR segmentation
Unified LibreYOLO() factory for checkpoints, exported artifacts, and runtime loading
Detection, instance / semantic / panoptic segmentation, pose, classification, depth, restoration, background removal, OCR, point localization, and gaze through one consistent API
Image, directory, and video inference (with optional tiled inference for large frames)
Built-in multi-object tracking: ByteTrack, OC-SORT, BoT-SORT, and Deep OC-SORT with ReID
Per-family training augmentation control with a declarative support spec that warns when a knob is ignored
PyTorch-native quantization: fp16 / bf16 / fp8 / int8 / int4 recipes with QAT and QAD recovery
ONNX, TorchScript, TensorRT, OpenVINO, NCNN, CoreML, and TFLite export with embedded metadata, plus matching runtime backends
COCO-compatible validation with mAP metrics, plus segmentation, pose, panoptic, matte, and OCR validators
A libreyolo command-line tool for predict / train / val / export / quantize
Accepts images as file paths, URLs, PIL images, NumPy arrays, PyTorch tensors, or raw bytes

What's new in v1.4.0

15 new model families, including SegFormer (semantic), SwinIR and Real-ESRGAN (super-resolution), BiRefNet (background removal), ZipDepth and Depth Anything 3 (depth), PP-OCR (text), SigLIP2 (zero-shot classification), SAM 3, EdgeTAM and PicoSAM3 (promptable segmentation), and OmDet-Turbo and OV-DEIM (open-vocabulary detection).
Three new tasks: panoptic, matte, and ocr, each with its own result payload and validator.
A documented augmentation system. Every training augmentation knob now has a per-family support spec, the CLI warns when a family ignores a parameter you set, and this page finally has a full Data Augmentation section.
A quantization stack: model.quantize() and libreyolo quantize with nine recipes, honest simulation-based accuracy, QAT / QAD recovery, and packed low-bit checkpoints.
Two new trackers: BoT-SORT and Deep OC-SORT (appearance ReID), alongside ByteTrack and OC-SORT.
A multi-GPU correctness overhaul: correct DDP sharding everywhere, globally reduced loss normalizers, SyncBatchNorm defaults for BatchNorm-heavy families, and loud setup errors instead of silently wrong runs.
YOLOv7 training (the family was inference-only in v1.3.1), LoRA fine-tuning across seven more families, DINOv2 foundation-teacher distillation, and a TFLite runtime backend.

Compatibility notes in v1.4.0

Checkpoints only move forward. Checkpoints that use the new task strings (panoptic, matte, ocr) or finalized quantization state are not loadable by v1.3.1. Everything v1.3.x wrote still loads in v1.4.0, and Results and LibreEoMT keep full v1.3 positional-argument compatibility.
Some fine-tune defaults changed because the old ones were harmful: PicoDet (lr0 0.1 to 0.01) and DEIM (lr0 4e-4 to 1e-4, min_lr_ratio 0.5 to 0.05). Pass the old values explicitly to reproduce upstream COCO recipes.
Training results can shift where augmentation defaults were fixed: semantic segmentation now applies HSV jitter by default, restoration training adds coupled vertical flip and rot90, and AdamW no longer decays BatchNorm / bias parameters.
model.train(profile=True) keeps training after the profiled window instead of stopping. Pass profile_then_stop=True for the old behavior.
The libreyolo models --json schema changed (task-suffixed CLI names, new keys), and the formats and info JSON outputs gained keys. Update scripts that parse them.
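To reproduce an upstream recipe, the old fine-tune values from the notes above can be kept in one place and merged with any other overrides. This is a sketch only: `LEGACY_FINETUNE_DEFAULTS` and `legacy_train_kwargs` are hypothetical helpers, and the commented `model.train(...)` call assumes that `lr0` and `min_lr_ratio` are accepted as keyword arguments, which the notes imply but this sketch does not verify.

```python
# Pre-v1.4.0 fine-tune defaults, taken from the compatibility notes above.
# Hypothetical helper: not part of the LibreYOLO API.
LEGACY_FINETUNE_DEFAULTS = {
    "PicoDet": {"lr0": 0.1},
    "DEIM": {"lr0": 4e-4, "min_lr_ratio": 0.5},
}


def legacy_train_kwargs(family: str, **overrides) -> dict:
    """Merge a family's old defaults with caller overrides (overrides win)."""
    kwargs = dict(LEGACY_FINETUNE_DEFAULTS.get(family, {}))
    kwargs.update(overrides)
    return kwargs


print(legacy_train_kwargs("PicoDet"))
print(legacy_train_kwargs("DEIM"))

# Assumed usage (not executed here; the remaining train arguments are up to you):
# model.train(**legacy_train_kwargs("DEIM"))
```

Families that did not change defaults return an empty dict, so the same call site works for every family.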

Compatibility

Use this matrix as the quick v1.4.0 support map. ✓ marks a supported path, prev is a research preview, and cells reading "Not currently supported" are not supported in this release. YOLO9 and RF-DETR detection (plus RF-DETR segmentation) get the heaviest testing and are the recommended starting point; the other families are supported too, so please report an issue if something misbehaves.

Model family	Notes	Inference	Training	Detect	Segment	Semantic	Panoptic	Classify	Pose	OBB	Depth	Point	Restore	Matte	OCR	Gaze	ONNX	TorchScript	TensorRT	OpenVINO	NCNN	CoreML	TFLite
YOLO9	Recommended detect path; int8 / fp8 quantization	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	✓	✓
RF-DETR	Recommended detect + segment; pose / OBB preview; TFLite detect experimental, segment / pose blocked	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	prev	prev	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	✓	✓
YOLOX		✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	✓	✓
YOLOv7	Trainable as of v1.4.0 (SimOTA); was inference-only	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	Not currently supported	Not currently supported
YOLO9-E2E		✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	Not currently supported	Not currently supported
YOLO9-P2	Small objects; VisDrone weights only (non-commercial)	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	Not currently supported	Not currently supported
YOLO-NAS	multi-class pose training new in v1.4.0	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	Not currently supported	Not currently supported
D-FINE	segmentation + dynamic eval sizes new in v1.4.0; LoRA	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
DEIM	fine-tune defaults fixed in v1.4.0; LoRA	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
DEIMv2	LoRA	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
RT-DETR	LoRA	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	✓	Not currently supported
RT-DETRv2	LoRA	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
RT-DETRv4	dynamic eval sizes new in v1.4.0; LoRA	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
PicoDet	fine-tune defaults fixed in v1.4.0	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	✓	Not currently supported	Not currently supported
RTMDet	RTMDet-Ins segmentation (inference + val) new in v1.4.0	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
EC	LoRA (detect)	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported
EoMT	Semantic + instance + panoptic; instance and panoptic new in v1.4.0; inference and val only	✓	Not currently supported	Not currently supported	✓	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
SegFormer	New in v1.4.0; semantic b0-b5; ADE20K weights non-commercial	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
PIDNet	Semantic; inference and val only; ONNX / TorchScript / NCNN / TFLite	✓	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
DINOv2	semantic / classify / detect (needs transformers)	✓	✓	✓	Not currently supported	✓	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
MobileNetV4	Classifier (Apache-2.0)	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
ConvNeXt	Classifier (Apache-2.0); LoRA	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
EfficientNetV2	Classifier (Apache-2.0)	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
ResNet	Classifier	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
CLIP	Zero-shot classification	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
SigLIP2	New in v1.4.0; zero-shot classification, inference-only	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
Depth Anything V2	ONNX export new in v1.4.0 (fixed resolution, batch 1)	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
Depth Anything 3	New in v1.4.0; size l at 504, Apache-2.0	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
ZipDepth	New in v1.4.0; b / bnpu at 384, MIT	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
FOMO	no auto-download; ONNX export new in v1.4.0	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
NAFNet	Restoration (denoise / deblur); SIDD denoise weights	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported
SwinIR	New in v1.4.0; 4x super-resolution s / m / l, Apache-2.0; inference and val	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
Real-ESRGAN	New in v1.4.0; x4 / x2 / x4t super-resolution; inference and val	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	✓	Not currently supported	✓
BiRefNet	New in v1.4.0; background removal (matte) t / l at 1024	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
PP-OCR	New in v1.4.0; text detection + recognition; inference and val	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
YOLO1 / YOLO2 / YOLO3 / YOLO4	Museum tier, inference-only (YOLO1 new in v1.4.0)	✓	Not currently supported	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported
L2CS	Inference-only	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	✓	✓	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported	Not currently supported

The export columns summarize the canonical support matrix that ships in v1.4.0; query it exactly with libreyolo formats --family ... or libreyolo info --model ... --json (see Export). The promptable-segmentation tier (SAM, SAM 2, SAM 3, MobileSAM, EdgeTAM, PicoSAM3), the open-vocabulary tier (Grounding DINO, OWLv2, OmDet-Turbo, OV-DEIM), and the VLM tier live outside the LibreYOLO() factory and are not rows here; see their sections. CoreML exports produce .mlpackage bundles and require libreyolo[coreml]: macOS only, no INT8, and no embedded NMS for RF-DETR, D-FINE, DEIM, DEIMv2, or EC.
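For quick scripting, an excerpt of the export columns can be mirrored as a static lookup. The sketch below hand-copies five rows from the matrix above; `EXPORT_SUPPORT` and `families_exporting_to` are hypothetical names, and the shipped libreyolo formats command remains the authoritative source.

```python
# Export formats in the order of the matrix columns.
EXPORT_FORMATS = ("ONNX", "TorchScript", "TensorRT", "OpenVINO", "NCNN", "CoreML", "TFLite")

# Hand-copied excerpt of the matrix above (export columns only, five families).
# Hypothetical table for illustration: it is not read from the library.
EXPORT_SUPPORT = {
    "YOLO9": set(EXPORT_FORMATS),
    "RF-DETR": set(EXPORT_FORMATS) - {"NCNN"},
    "YOLOv7": {"ONNX", "TorchScript", "TensorRT", "OpenVINO", "NCNN"},
    "RT-DETR": {"ONNX", "TorchScript", "TensorRT", "OpenVINO", "CoreML"},
    "PIDNet": {"ONNX", "TorchScript", "NCNN", "TFLite"},
}


def families_exporting_to(fmt: str) -> list:
    """List the families in the excerpt that support the given export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unknown export format: {fmt!r}")
    return sorted(family for family, fmts in EXPORT_SUPPORT.items() if fmt in fmts)


print(families_exporting_to("CoreML"))  # ['RF-DETR', 'RT-DETR', 'YOLO9']
print(families_exporting_to("NCNN"))    # ['PIDNet', 'YOLO9', 'YOLOv7']
```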

Installation

Requirements

Python 3.10+
PyTorch 2.4+ and torchvision 0.19+
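A quick way to confirm an environment meets these minimums before installing is sketched below. `meets_minimum` is a hypothetical helper that compares only the leading numeric components of a version string, so local build tags such as +cu121 are ignored.

```python
import re
import sys


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric components of a version string to a minimum."""
    public = version.split("+")[0]  # drop local build tags such as "+cu121"
    parts = tuple(int(p) for p in re.findall(r"\d+", public)[: len(minimum)])
    return parts >= minimum


print("Python OK:", sys.version_info >= (3, 10))

try:
    import torch
    import torchvision
except ImportError:
    print("PyTorch / torchvision not installed")
else:
    print("PyTorch OK:", meets_minimum(torch.__version__, (2, 4)))
    print("torchvision OK:", meets_minimum(torchvision.__version__, (0, 19)))
```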

From PyPI

bash

pip install libreyolo

v1.4.0 is the current release on PyPI, and it is what these docs describe. Everything on this page works from the published package: you do not need a source install.

From source

bash

git clone https://github.com/LibreYOLO/libreyolo.git
cd libreyolo
pip install -e .

Optional dependencies

bash

# Quote the extras so shells such as zsh do not treat the brackets as a glob.

# ONNX export and inference
pip install "libreyolo[onnx]"
# or: pip install onnx onnxsim onnxruntime

# RT-DETR compatibility extra (currently no extra packages)
pip install "libreyolo[rtdetr]"

# RF-DETR support
pip install "libreyolo[rfdetr]"
# or: pip install transformers

# TensorRT export and inference (NVIDIA GPU)
pip install "libreyolo[tensorrt]"
# Installs TensorRT CUDA 12 Python packages on Linux/Windows.
# Host driver/CUDA compatibility still matters.

# OpenVINO export and inference (Intel CPU/GPU/VPU)
pip install "libreyolo[openvino]"
# INT8 export also needs: pip install nncf

# NCNN export and inference
pip install "libreyolo[ncnn]"
# or: pip install pnnx ncnn

# TFLite export + LiteRT runtime backend (Python 3.12+)
pip install "libreyolo[tflite]"
# "litert" is an alias extra: pip install "libreyolo[litert]"

# Tracking API compatibility extra
pip install "libreyolo[tracking]"
# Tracking dependencies are part of the base install; Deep OC-SORT's ReID
# embedder weights auto-download on first use.

# CoreML export and inference (macOS only for runtime)
pip install "libreyolo[coreml]"
# or: pip install coremltools

# L2CS gaze optional auto-download helper
pip install "libreyolo[gaze]"

# Promptable segmentation (LibreSAM: SAM-1, SAM-2, SAM 3, MobileSAM,
# EdgeTAM, PicoSAM3)
pip install "libreyolo[sam]"

# Open-vocabulary detection (Grounding DINO, OWLv2, OmDet-Turbo, OV-DEIM)
pip install "libreyolo[openvocab]"

# LibreLabel AI assist (SAM click-to-mask)
pip install "libreyolo[label]"

# Zero-shot classification
pip install "libreyolo[clip]"       # CLIP
pip install "libreyolo[siglip2]"    # SigLIP2 tokenizer (SentencePiece)

# Validation and training plots
pip install "libreyolo[plots]"

# SenseNova Vision preview
pip install "libreyolo[sensenova]"

# Converter-only dependencies for CLIP and SigLIP2 checkpoints
pip install "libreyolo[clip-convert]"
pip install "libreyolo[siglip2-convert]"

# LoRA fine-tuning (peft)
pip install "libreyolo[lora]"

# Experiment loggers
pip install "libreyolo[tensorboard]"   # or [mlflow], [wandb]

# EoMT instance / panoptic segmentation
pip install "libreyolo[eomt]"

# Install every optional LibreYOLO extra
pip install "libreyolo[all]"

If using uv, the most reliable path is an isolated venv per extra:

bash

# ONNX environment
uv venv .venv-onnx
uv pip install --python .venv-onnx/bin/python -e '.[onnx]'

# RT-DETR environment
uv venv .venv-rtdetr
uv pip install --python .venv-rtdetr/bin/python -e '.[rtdetr]'

# Repeat with .[rfdetr], .[openvino], .[ncnn], .[coreml], .[gaze], .[tracking], or .[tensorrt] as needed

This avoids mutating the project environment and keeps optional dependencies isolated. Vendor-specific extras such as TensorRT, OpenVINO, NCNN, and CoreML may still require platform-specific native packages.
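The "repeat as needed" step can be scripted. The sketch below only prints the same two uv commands shown above for each remaining extra; `uv_commands` is a hypothetical helper, and the bin/python path assumes a POSIX virtual-environment layout (Windows places the interpreter under Scripts instead).

```python
# Extras named in the "Repeat with ..." comment above.
EXTRAS = ["rfdetr", "openvino", "ncnn", "coreml", "gaze", "tracking", "tensorrt"]


def uv_commands(extra: str) -> list:
    """Return the two uv commands that create and populate one isolated venv."""
    venv = f".venv-{extra}"
    return [
        f"uv venv {venv}",
        # Assumes a POSIX layout for the venv interpreter path.
        f"uv pip install --python {venv}/bin/python -e '.[{extra}]'",
    ]


for extra in EXTRAS:
    print(f"# {extra} environment")
    print("\n".join(uv_commands(extra)))
    print()
```

Printing rather than executing keeps the sketch side-effect free; the output can be reviewed and pasted into a shell.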

Quickstart

For the most tested path, pick single-GPU YOLO9 detection, RF-DETR detection, or RF-DETR segmentation. They load through the same factory, accept the same inputs, and return the same Results object, so you can swap between them without changing surrounding code.

YOLO9 - CNN flagship

python

from libreyolo import LibreYOLO, SAMPLE_IMAGE

# Use the official checkpoint name and let the factory resolve the details
model = LibreYOLO("LibreYOLO9c.pt")

# Run on a single image (SAMPLE_IMAGE ships with the package)
result = model(SAMPLE_IMAGE)

print(f"Found {len(result)} objects")
print(result.boxes.xyxy)   # bounding boxes (N, 4)
print(result.boxes.conf)   # confidence scores (N,)
print(result.boxes.cls)    # class IDs (N,)

RF-DETR - transformer flagship

python

from libreyolo import LibreYOLO, SAMPLE_IMAGE

# Same factory, same call shape - just point at an RF-DETR checkpoint
model = LibreYOLO("LibreRFDETRs.pt")
result = model(SAMPLE_IMAGE)

print(f"Found {len(result)} objects")
print(result.boxes.xyxy)

Save annotated output

python

result = model(SAMPLE_IMAGE, save=True)
print(result.saved_path)   # e.g. runs/detect/predict/parkour.jpg

Process a directory

python

results = model("images/", save=True, batch=4)
for r in results:
    print(f"{r.path}: {len(r)} detections")
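Directory results are often summarized after the loop, for example to find images with no detections. The sketch below relies only on the `.path` attribute and `len()` used above; `FakeResult` is a stand-in so the example runs without downloading weights, and `detections_per_image` is a hypothetical helper rather than a LibreYOLO function.

```python
class FakeResult:
    """Stand-in for a LibreYOLO result: exposes only .path and len()."""

    def __init__(self, path: str, n_detections: int):
        self.path = path
        self._n = n_detections

    def __len__(self) -> int:
        return self._n


def detections_per_image(results) -> dict:
    """Map each result's path to its detection count."""
    return {str(r.path): len(r) for r in results}


# With the real API this would be: results = model("images/", save=True, batch=4)
results = [FakeResult("images/a.jpg", 3), FakeResult("images/b.jpg", 0)]

counts = detections_per_image(results)
empty = [path for path, n in counts.items() if n == 0]

print(counts)
print("Images with no detections:", empty)
```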

Available Models

Recommended validated path: YOLO9 detection or RF-DETR detection / segmentation

Detection, training and inference for these models receive the heaviest testing. Treat other families, tasks, and multi-GPU workflows as experimental in v1.4.0.

LibreYOLO v1.4.0 ships two validated flagship families plus a broad catalogue of supported models: fifteen families are new in this release alone. Every checkpoint-based model loads through the same LibreYOLO() factory, but only the validated paths below should be treated as heavily tested.

YOLO9 - CNN flagship

Recommended

Default: LibreYOLO9c.pt | Heavily tested: detection, training and inference | Detect-only in v1.4.0 | Quantizable: int8 / fp8

Size	Code	Input size	Use case	Detection checkpoint
Tiny	`"t"`	640	Fast inference	LibreYOLO9t.pt
Small	`"s"`	640	Balanced	LibreYOLO9s.pt
Medium	`"m"`	640	Higher accuracy	LibreYOLO9m.pt
Compact	`"c"`	640	Best accuracy	LibreYOLO9c.pt

YOLO9 is detection-only in v1.4.0. The non-detect flagship variants (including the old -seg checkpoints) were removed in v1.3.0; for segmentation use RF-DETR or the experimental segmentation families below.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")   # detection

RF-DETR - transformer flagship

Recommended

Recommended transformer path | Heavily tested: detection, segmentation, training and inference | Research preview: pose, OBB | LoRA + all quantization recipes

Size	Code	Input size	Use case	Detection checkpoint
Nano	`"n"`	384	Edge	LibreRFDETRn.pt
Small	`"s"`	512	Balanced	LibreRFDETRs.pt
Medium	`"m"`	576	Higher accuracy	LibreRFDETRm.pt
Large	`"l"`	704	Maximum accuracy	LibreRFDETRl.pt

LibreYOLO ships the Apache-clean RF-DETR detect sizes N/S/M/L on the Hugging Face org. The XL/2XL tiers are intentionally not shipped.

Segmentation (heavily tested): LibreRFDETRn-seg.pt, LibreRFDETRs-seg.pt, LibreRFDETRm-seg.pt, LibreRFDETRl-seg.pt. The larger -seg sizes (x, xx) carry the upstream RF-DETR seg-XL / seg-2XL weights under a non-commercial license: check the model card before commercial use. See the Segmentation section.

Pose (research preview): LibreRFDETRx-pose.pt (ported from RF-DETR GroupPose). OBB (research preview): LibreRFDETRn-obb.pt, LibreRFDETRs-obb.pt, LibreRFDETRm-obb.pt, LibreRFDETRl-obb.pt (oriented boxes, uses detection input sizes). These checkpoints are trained for six vehicle classes: bike, bus, car, other_vehicle, taxi, and truck. They are not COCO-80 models. Treat both pose and OBB as research previews, not validated paths.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreRFDETRs.pt")           # detect (validated)
# model = LibreYOLO("LibreRFDETRs-seg.pt")     # segment (validated)
# model = LibreYOLO("LibreRFDETRx-pose.pt")    # pose  (research preview)
# model = LibreYOLO("LibreRFDETRn-obb.pt")     # obb   (research preview)
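The checkpoint names in the two flagship tables follow a regular pattern, so a size code can be turned into a file name programmatically. The sketch below covers only the validated paths (YOLO9 detect, RF-DETR detect and segment at N/S/M/L); `yolo9_checkpoint` and `rfdetr_checkpoint` are hypothetical helpers, and the resulting string is what you would pass to LibreYOLO().

```python
# Size codes from the YOLO9 and RF-DETR tables above.
YOLO9_SIZES = ("t", "s", "m", "c")
RFDETR_SIZES = ("n", "s", "m", "l")
RFDETR_SUFFIXES = {"detect": "", "segment": "-seg"}  # validated tasks only


def yolo9_checkpoint(size: str = "c") -> str:
    """Build a YOLO9 detection checkpoint name from its size code."""
    if size not in YOLO9_SIZES:
        raise ValueError(f"YOLO9 size must be one of {YOLO9_SIZES}, got {size!r}")
    return f"LibreYOLO9{size}.pt"


def rfdetr_checkpoint(size: str = "s", task: str = "detect") -> str:
    """Build an RF-DETR detect or segment checkpoint name from its size code."""
    if size not in RFDETR_SIZES:
        raise ValueError(f"RF-DETR size must be one of {RFDETR_SIZES}, got {size!r}")
    if task not in RFDETR_SUFFIXES:
        raise ValueError(f"task must be one of {tuple(RFDETR_SUFFIXES)}, got {task!r}")
    return f"LibreRFDETR{size}{RFDETR_SUFFIXES[task]}.pt"


print(yolo9_checkpoint("c"))               # LibreYOLO9c.pt
print(rfdetr_checkpoint("s"))              # LibreRFDETRs.pt
print(rfdetr_checkpoint("s", "segment"))   # LibreRFDETRs-seg.pt
```

Rejecting unknown sizes up front surfaces typos before any download is attempted.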

Additional detection families

Detection-capable families that share the same factory and API surface as the validated paths. These are experimental in v1.4.0. Each checkpoint name links to its model card on the LibreYOLO org; pass any name to LibreYOLO() and the factory fetches it on first use.

Family	Status	Tasks	Checkpoints
YOLOX	Experimental	detect	LibreYOLOXn.pt, LibreYOLOXt.pt, LibreYOLOXs.pt, LibreYOLOXm.pt, LibreYOLOXl.pt, LibreYOLOXx.pt
YOLOv7	Experimental	detect (trainable as of v1.4.0)	LibreYOLO7b.pt
YOLO9-E2E	Experimental	detect	LibreYOLO9E2Et.pt, LibreYOLO9E2Es.pt, LibreYOLO9E2Em.pt, LibreYOLO9E2Ec.pt
YOLO-NAS	Experimental	detect, pose	LibreYOLONASs.pt, LibreYOLONASm.pt, LibreYOLONASl.pt, LibreYOLONASn-pose.pt, LibreYOLONASs-pose.pt, LibreYOLONASm-pose.pt, LibreYOLONASl-pose.pt
D-FINE	Experimental	detect, segment (new in v1.4.0)	LibreDFINEn.pt, LibreDFINEs.pt, LibreDFINEm.pt, LibreDFINEl.pt, LibreDFINEx.pt, LibreDFINEn-seg.pt, LibreDFINEs-seg.pt, LibreDFINEm-seg.pt, LibreDFINEl-seg.pt, LibreDFINEx-seg.pt
DEIM	Experimental	detect	LibreDEIMn.pt, LibreDEIMs.pt, LibreDEIMm.pt, LibreDEIMl.pt, LibreDEIMx.pt
DEIMv2	Experimental	detect	LibreDEIMv2atto.pt, LibreDEIMv2femto.pt, LibreDEIMv2pico.pt, LibreDEIMv2n.pt, LibreDEIMv2s.pt, LibreDEIMv2m.pt, LibreDEIMv2l.pt, LibreDEIMv2x.pt
RT-DETR	Experimental	detect	LibreRTDETRr18.pt, LibreRTDETRr34.pt, LibreRTDETRr50.pt, LibreRTDETRr50m.pt, LibreRTDETRr101.pt, LibreRTDETRl.pt, LibreRTDETRx.pt
RT-DETRv2	Experimental	detect	LibreRTDETRv2r18.pt, LibreRTDETRv2r34.pt, LibreRTDETRv2r50.pt, LibreRTDETRv2r50m.pt, LibreRTDETRv2r101.pt
RT-DETRv4	Experimental	detect	LibreRTDETRv4s.pt, LibreRTDETRv4m.pt, LibreRTDETRv4l.pt, LibreRTDETRv4x.pt
PicoDet	Experimental	detect	LibrePICODETs.pt, LibrePICODETm.pt, LibrePICODETl.pt
RTMDet	Experimental	detect, segment (RTMDet-Ins, inference + val)	LibreRTMDett.pt, LibreRTMDets.pt, LibreRTMDetm.pt, LibreRTMDetl.pt, LibreRTMDetx.pt, LibreRTMDett-seg.pt, LibreRTMDets-seg.pt, LibreRTMDetm-seg.pt, LibreRTMDetl-seg.pt, LibreRTMDetx-seg.pt
EdgeCrafter	Experimental	detect, pose, segment	LibreECs.pt, LibreECm.pt, LibreECl.pt, LibreECx.pt, LibreECs-pose.pt, LibreECm-pose.pt, LibreECl-pose.pt, LibreECx-pose.pt, LibreECs-seg.pt, LibreECm-seg.pt, LibreECl-seg.pt, LibreECx-seg.pt

Hosting note: YOLO-NAS checkpoints (plain text above) are hosted on Deci's CDN under their proprietary weights license, not on the LibreYOLO Hugging Face org. The factory still downloads them automatically on first use. DAMO-YOLO was removed in v1.3.0 and is no longer loadable.

New model families in v1.4.0

v1.4.0 adds fifteen model families. The checkpoint-based ones load through the same LibreYOLO() factory; the promptable-segmentation and open-vocabulary entries are constructed directly (see their sections). All of them are experimental.

Family	Task	Sizes	Checkpoints / how to load
SegFormer	semantic	b0-b5 (512; b5 at 640)	LibreSegformerb0-sem.pt, LibreSegformerb1-sem.pt, LibreSegformerb2-sem.pt, LibreSegformerb3-sem.pt, LibreSegformerb4-sem.pt, LibreSegformerb5-sem.pt
SwinIR	restore (4x super-resolution)	s / m / l	LibreSwinIRs-restore.pt, LibreSwinIRm-restore.pt, LibreSwinIRl-restore.pt
Real-ESRGAN	restore (super-resolution)	x4 / x2 / x4t	LibreRealESRGANx4-restore.pt, LibreRealESRGANx2-restore.pt, LibreRealESRGANx4t-restore.pt
BiRefNet	matte (background removal)	t / l (1024)	LibreBiRefNetl-matte.pt; the t weights are not rehosted yet
ZipDepth	depth	b / bnpu (384)	LibreZipDepthb-depth.pt, LibreZipDepthbnpu-depth.pt
Depth Anything 3	depth	l (504)	LibreDepthAnything3l-depth.pt
PP-OCR (v5)	ocr	t / l (960)	LibrePPOCRt-ocr.pt, LibrePPOCRl-ocr.pt
SigLIP2	zero-shot classify	b16 / so400m	LibreSigLIP2b16-cls.pt, LibreSigLIP2so400m-cls.pt
YOLOv1	detect (museum tier)	t / b (448, VOC-20)	LibreYOLO1b.pt; the tiny weights are lost upstream
SAM 3	promptable segmentation	large (1008)	`LibreSAM3()`: gated Hugging Face weights, Meta SAM license
EdgeTAM	promptable segmentation	edge (1024)	`LibreEdgeTAM()`: image inference only, Apache-2.0
PicoSAM3	promptable ROI segmentation	96 px	`LibrePicoSAM3()`: native port, ONNX-only export
OmDet-Turbo	open-vocabulary detect	t	`LibreOpenVocab("omdet-turbo")`
OV-DEIM	open-vocabulary detect (NMS-free)	s / m / l	`LibreOpenVocab("ov-deim")`; weights CC BY-NC 4.0
SenseNova Vision	7-task multimodal preview	7B	experimental: not yet in the CLI, UI, or model inventory; weights CC BY-NC 4.0

Existing families also grew tasks: EoMT adds instance segmentation and panoptic checkpoints, RTMDet adds RTMDet-Ins instance segmentation (inference and validation), and D-FINE adds experimental segmentation with published weights.
Licensing varies per family. SwinIR, EdgeTAM and Depth Anything 3 are Apache-2.0 end to end. SegFormer code is Apache-2.0 but the converted NVIDIA ADE20K weights are non-commercial (the download shows a license notice first). OV-DEIM and SenseNova weights are CC BY-NC 4.0. SAM 3 weights are gated on Hugging Face under the Meta SAM license. Check the model card before commercial use.

Task families beyond detection

Task families beyond detection, both new in v1.4.0 and carried over from earlier releases, each documented in its task section. DINOv2 needs pip install libreyolo[rfdetr] (transformers).

Family	Task	Documented in
MobileNetV4 / ConvNeXt / EfficientNetV2 / ResNet	classify	Classification
CLIP / SigLIP2	zero-shot classify	Classification
DINOv2	semantic, classify, detect	Semantic Segmentation
PIDNet / EoMT / SegFormer	semantic	Semantic Segmentation
EoMT	panoptic	Panoptic Segmentation
Depth Anything V2 / Depth Anything 3 / ZipDepth	depth	Depth Estimation
NAFNet / SwinIR / Real-ESRGAN	restore	Restoration & Upscaling
BiRefNet	matte	Background Removal
PP-OCR	ocr	OCR
FOMO	point	Point Localization

Promptable, open-vocabulary and VLM tiers: LibreSAM (promptable segmentation, libreyolo[sam]), LibreOpenVocab (open-vocabulary detection, libreyolo[openvocab]) and the LibreVLM tier of vision-language detectors (libreyolo[vlm]) are separate categories that load Hugging Face snapshots and are not routed through the LibreYOLO() checkpoint factory. Their weights inherit each upstream model's license.

Specialized models

Family	Status	Tasks	Checkpoints
L2CS	Experimental	gaze (inference-only) - see Gaze Estimation	LibreL2CSr50.pt

L2CS architecture sizes include r18, r34, r50, r101, and r152, but the upstream-published Gaze360 checkpoint is ResNet-50. Install libreyolo[gaze] for the optional download helper, or pass a local checkpoint path for other sizes. L2CS weights are not hosted by LibreYOLO (the Gaze360 dataset license forbids redistribution).

Factory function

Use the LibreYOLO() factory for every checkpoint-based model and exported runtime. Give it an official checkpoint name or exported artifact path, then let it choose the right model family, task, class count, and runtime:

python

from libreyolo import LibreYOLO

# Default: YOLO9 detection
model = LibreYOLO("LibreYOLO9c.pt")

# Flagship transformer: RF-DETR
model = LibreYOLO("LibreRFDETRs.pt")
model = LibreYOLO("LibreRFDETRs-seg.pt")        # validated segmentation

# The task suffix (or, without one, the family default) selects the task
model = LibreYOLO("LibreMobileNetV4s-cls.pt")   # classification (Apache, ImageNet-1k)
model = LibreYOLO("LibreDINOv2n.pt")            # semantic segmentation (DINOv2's default)
model = LibreYOLO("LibreDepthAnythingV2s-depth.pt")  # monocular depth
model = LibreYOLO("LibreFOMOs-point.pt")        # point localization (local weights)

# New in v1.4.0
model = LibreYOLO("LibreSegformerb2-sem.pt")    # semantic segmentation
model = LibreYOLO("LibreEoMTb-panoptic.pt")     # panoptic segmentation
model = LibreYOLO("LibreSwinIRm-restore.pt")    # 4x super-resolution
model = LibreYOLO("LibreBiRefNetl-matte.pt")    # background removal
model = LibreYOLO("LibrePPOCRt-ocr.pt")         # OCR (text detection + recognition)
model = LibreYOLO("LibreZipDepthb-depth.pt")    # depth

# Exported deployment formats
model = LibreYOLO("model.onnx")                 # ONNX Runtime
model = LibreYOLO("model.engine")               # TensorRT
model = LibreYOLO("model.mlpackage")            # CoreML (macOS)
model = LibreYOLO("model_openvino/")            # OpenVINO (directory)
model = LibreYOLO("model_ncnn/")                # NCNN (directory)
model = LibreYOLO("model.tflite")               # TFLite / LiteRT (new in v1.4.0)

For recognized official checkpoint filenames, LibreYOLO can auto-download missing weights. For custom filenames, point at an explicit local path. Keep new projects on YOLO9 detection or RF-DETR detection / segmentation; other families, tasks, and the new families are experimental in v1.4.0.

Tasks & Filenames

LibreYOLO uses a uniform filename convention so the factory can detect family, size, and task from the checkpoint name alone:

text

Libre<FAMILY><size>[-<task>].pt

Task suffixes

Task	Canonical name	Filename suffix	Owned by
Detection	`"detect"`	(none - implicit)	most families (default)
Instance segmentation	`"segment"`	`-seg`	RF-DETR, EdgeCrafter, RTMDet-Ins, D-FINE, EoMT
Semantic segmentation	`"semantic"`	`-sem`	DINOv2, PIDNet, EoMT, SegFormer
Panoptic segmentation	`"panoptic"`	`-panoptic`	EoMT (new in v1.4.0)
Pose estimation	`"pose"`	`-pose`	YOLO-NAS, EdgeCrafter, RF-DETR (preview)
Oriented boxes	`"obb"`	`-obb`	RF-DETR (preview)
Classification	`"classify"`	`-cls`	MobileNetV4, ConvNeXt, EfficientNetV2, ResNet, DINOv2; CLIP and SigLIP2 zero-shot
Monocular depth	`"depth"`	`-depth`	Depth Anything V2 / 3, ZipDepth
Image restoration	`"restore"`	`-restore`	NAFNet, SwinIR, Real-ESRGAN
Background removal	`"matte"`	`-matte`	BiRefNet (new in v1.4.0)
OCR	`"ocr"`	`-ocr`	PP-OCR (new in v1.4.0)
Point localization	`"point"`	`-point`	FOMO
Gaze estimation	`"gaze"`	`-gaze`	L2CS

Detection is implicit (no suffix), following the common YOLO convention. The factory accepts aliases at the API boundary ("detection", "seg", "keypoints", "cls", etc.); only the filename suffixes above appear in filenames. A task is available only when it is in that family's supported-task set.
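The suffix table can be read as a simple lookup. The sketch below is illustrative only: `SUFFIX_TO_TASK` and `task_from_filename` are hypothetical names, not part of the LibreYOLO API, and the real factory also consults checkpoint metadata and the family's supported-task set.

```python
from pathlib import PurePath

# Filename suffix -> canonical task name, copied from the table above.
SUFFIX_TO_TASK = {
    "seg": "segment",
    "sem": "semantic",
    "panoptic": "panoptic",
    "pose": "pose",
    "obb": "obb",
    "cls": "classify",
    "depth": "depth",
    "restore": "restore",
    "matte": "matte",
    "ocr": "ocr",
    "point": "point",
    "gaze": "gaze",
}


def task_from_filename(filename):
    """Return the canonical task named by a filename suffix, or None."""
    stem = PurePath(filename).stem        # "LibreRFDETRs-seg.pt" -> "LibreRFDETRs-seg"
    for token in stem.split("-")[1:]:     # skip the Libre<FAMILY><size> part
        if token in SUFFIX_TO_TASK:
            return SUFFIX_TO_TASK[token]
    return None                           # no task suffix: the family default applies


print(task_from_filename("LibreRFDETRs-seg.pt"))
print(task_from_filename("LibreYOLO9c.pt"))
```

Returning None rather than "detect" for a suffix-less name keeps the family default separate, which matters for families such as DINOv2 whose default is not detection.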

Resolution precedence

When you load a model, the task is resolved in this order:

text

explicit task=  →  checkpoint["task"]  →  filename suffix  →  family default

python

from libreyolo import LibreYOLO

# 1. Filename suffix decides → segment
model = LibreYOLO("LibreRFDETRs-seg.pt")

# 2. Override regardless of filename
model = LibreYOLO("custom_weights.pt", task="segment")

# 3. Detection is implicit
model = LibreYOLO("LibreYOLO9c.pt")  # task="detect"
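The precedence chain amounts to "first source that specifies a task wins". A minimal sketch of that rule, assuming each source is either a canonical task name or None; `resolve_task` and its parameter names are hypothetical and not LibreYOLO's internal API:

```python
def resolve_task(explicit=None, checkpoint_task=None, filename_task=None,
                 family_default="detect"):
    """Return the first specified task, in documented precedence order."""
    for candidate in (explicit, checkpoint_task, filename_task):
        if candidate is not None:
            return candidate
    return family_default  # nothing specified anywhere: fall back to the family


# Filename suffix decides when nothing higher in the chain is set.
print(resolve_task(filename_task="segment"))
# An explicit task= beats both checkpoint metadata and the filename.
print(resolve_task(explicit="detect", checkpoint_task="segment"))
```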

Per-family task support

Family	v1.4.0 status	Default	Supported tasks
YOLO9	detect single-GPU heavily tested; multi-GPU experimental	detect	detect
RF-DETR	detect and segment single-GPU heavily tested; pose and OBB research preview	detect	detect, segment, pose, obb
YOLOX	experimental	detect	detect
YOLOv7	experimental; trainable as of v1.4.0	detect	detect
YOLO9-E2E	experimental	detect	detect
YOLO9-P2	experimental (small objects)	detect	detect
YOLO-NAS	experimental; multi-class pose training new in v1.4.0	detect	detect, pose
D-FINE	experimental; segment new in v1.4.0	detect	detect, segment
DEIM / DEIMv2	experimental	detect	detect
RT-DETR / RT-DETRv2 / RT-DETRv4	experimental	detect	detect
PicoDet	experimental	detect	detect
RTMDet	experimental; RTMDet-Ins segment (inference and val) new in v1.4.0	detect	detect, segment
EdgeCrafter (EC)	experimental	detect	detect, pose, segment
YOLO1 / YOLO2 / YOLO3 / YOLO4	museum tier (inference-only); YOLO1 new in v1.4.0	detect	detect
PIDNet	experimental	semantic	semantic (inference and val only)
EoMT	experimental; instance and panoptic new in v1.4.0	semantic	semantic, segment, panoptic (inference and val only)
SegFormer	new in v1.4.0, experimental	semantic	semantic
DINOv2	experimental	semantic	semantic, classify, detect
MobileNetV4 / ConvNeXt / EfficientNetV2 / ResNet	experimental	classify	classify
CLIP / SigLIP2	experimental; SigLIP2 new in v1.4.0	classify	zero-shot classify (inference-only)
Depth Anything V2 / Depth Anything 3 / ZipDepth	experimental; DA3 and ZipDepth new in v1.4.0	depth	depth (inference and val only)
NAFNet	experimental	restore	restore
SwinIR / Real-ESRGAN	new in v1.4.0, experimental	restore	restore (inference and val only)
BiRefNet	new in v1.4.0, experimental	matte	matte (inference and val only)
PP-OCR	new in v1.4.0, experimental	ocr	ocr (inference and val only)
FOMO	experimental	point	point
L2CS	experimental	gaze	gaze (inference-only)

Three tiers sit outside the LibreYOLO() factory and are imported directly instead: LibreSAM (promptable segmentation, now including SAM 3, EdgeTAM and PicoSAM3), LibreOpenVocab (open-vocabulary detection, now including OmDet-Turbo and OV-DEIM), and LibreVLM. They are not checkpoint families, so LibreYOLO("sam_b") and friends will not resolve.

Legacy YOLO baselines

The museum tier holds the historical lineage so you can reproduce old baselines against modern ones with one API. These are inference-only: none of them can be trained in LibreYOLO, and they are not the path to pick for new work. Reach for YOLO9 or RF-DETR instead.

Family	Checkpoints	Input size	Weights license
LibreYOLO1 (new in v1.4.0)	LibreYOLO1b.pt	448 (fixed)	Public domain
LibreYOLO2	LibreYOLO2{t,b}.pt	416 / 608	Public domain
LibreYOLO3	LibreYOLO3{t,b,spp}.pt	416 / 416 / 608	Public domain
LibreYOLO4	LibreYOLO4{t,b}.pt	416 / 608	Public domain

YOLO1 predicts the 20 VOC classes; YOLO2 / 3 / 4 are COCO-80. YOLO1 ships size b only: the original tiny-yolov1 weights are lost upstream. YOLOv7 left this list in v1.4.0: it is now trainable (SimOTA loss) and lives with the detection families above. It is ported from the MIT-licensed MultimediaTechLab/YOLO, deliberately not from the GPL-3.0 reference implementation, so it is safe to use commercially.

YOLO9-P2, for small objects

LibreYOLO9P2 adds a stride-4 detection scale to YOLO9. That extra high-resolution head is what makes it worth the cost when your objects are tiny in frame, which is the classic aerial and drone-footage problem. It trains and exports like YOLO9.

One published checkpoint ships: LibreYOLO9P2s-visdrone.pt, trained on VisDrone. There is no COCO-pretrained P2 checkpoint. Note the license carefully: the VisDrone weights are CC BY-NC-SA 3.0, so they are non-commercial. Train your own P2 weights on a permissive dataset if you need commercial use.

Two rough edges remain in v1.4.0. TFLite export is not available for P2 or the museum families, and the CLI cannot resolve the variant filename, so load the VisDrone checkpoint from Python.

Examples

text

# Detection (implicit)
LibreYOLO9c.pt
LibreRFDETRs.pt
LibreRTDETRr50.pt

# Instance segmentation (-seg)
LibreRFDETRs-seg.pt
LibreECm-seg.pt

# Semantic segmentation (-sem)
LibreDINOv2n.pt          # semantic is DINOv2's default; -sem optional
LibreSegformerb0-sem.pt  # SegFormer requires the -sem suffix

# Panoptic segmentation (-panoptic)
LibreEoMTs-panoptic.pt

# Pose (-pose)
LibreYOLONASn-pose.pt
LibreECs-pose.pt
LibreRFDETRx-pose.pt     # preview

# Oriented boxes (-obb)
LibreRFDETRn-obb.pt      # preview

# Classification (-cls)
LibreMobileNetV4s-cls.pt
LibreConvNeXtt-cls.pt
LibreEfficientNetV2b0-cls.pt
# LibreDINOv2 classify checkpoints are not publicly shipped in v1.4.0
LibreSigLIP2b16-cls.pt   # zero-shot

# Depth (-depth)
LibreDepthAnythingV2s-depth.pt
LibreZipDepthb-depth.pt

# Restoration / super-resolution (-restore)
LibreNAFNetl-restore-sidd.pt
LibreSwinIRm-restore.pt
LibreRealESRGANx4-restore.pt

# Background removal (-matte)
LibreBiRefNetl-matte.pt

# OCR (-ocr)
LibrePPOCRt-ocr.pt

# Point (-point)
LibreFOMOs-point.pt

# Gaze (-gaze optional; only task for L2CS)
LibreL2CSr50.pt

Deprecated aliases

LibreYOLORTDETR and LibreYOLORFDETR are old names for LibreRTDETR and LibreRFDETR respectively. They still resolve with a DeprecationWarning - update imports when convenient.

Prediction

The single-GPU prediction path is heavily tested for YOLO9 detection, RF-DETR detection, and RF-DETR segmentation. Other families and tasks use the same API but are experimental in v1.4.0.

Basic prediction

python

result = model("image.jpg")

All prediction parameters

python

result = model(
    "image.jpg",
    conf=0.25,            # confidence threshold (default: 0.25)
    iou=0.45,             # NMS IoU threshold (default: 0.45)
    imgsz=640,            # input size override (default: model's native)
    device="auto",        # "auto", "cpu", "mps", "0", "cuda:0", ...
    classes=[0, 2, 5],    # filter to specific class IDs (default: all)
    max_det=300,          # max detections per image (default: 300)
    augment=False,        # test-time augmentation where implemented
    save=True,            # save annotated image (default: False)
    batch=4,              # directory batch size
    stream=False,         # video only: yield frame results instead of a list
    vid_stride=1,         # video only: process every N-th frame
    show=False,           # video only: display annotated frames
    tiling=False,         # large-image tiled detection
    overlap_ratio=0.2,    # tile overlap ratio
    output_path="out/",   # images: directory; video: final file path
    color_format="auto",  # "auto", "rgb", or "bgr"
    output_file_format="png",  # output format: "jpg", "png", "webp"
)

model.predict(...) is an alias for model(...).

Supported input formats

LibreYOLO accepts images in any of these formats:

python

from pathlib import Path

# File path (string or pathlib.Path)
result = model("photo.jpg")
result = model(Path("photo.jpg"))

# URL
result = model("https://example.com/image.jpg")
result = model("s3://bucket/image.jpg")
result = model("gs://bucket/image.jpg")

# PIL Image
from PIL import Image
img = Image.open("photo.jpg")
result = model(img)

# NumPy array (HWC or CHW, RGB or BGR, uint8 or float32)
import numpy as np
arr = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
result = model(arr)

# OpenCV (BGR) - specify color_format
import cv2
frame = cv2.imread("photo.jpg")
result = model(frame, color_format="bgr")

# PyTorch tensor (CHW or NCHW)
import torch
tensor = torch.randn(3, 640, 640)
result = model(tensor)

# Raw bytes
with open("photo.jpg", "rb") as f:
    result = model(f.read())

# BytesIO (read_bytes() closes the file handle for us)
from io import BytesIO
result = model(BytesIO(Path("photo.jpg").read_bytes()))

# Directory of images
results = model("images/", batch=4)

Working with results

Every prediction returns a Results object (or a list of them for directories):

python

result = model("image.jpg")

# Number of detections
len(result)  # e.g., 5

# Bounding boxes in xyxy format (x1, y1, x2, y2)
result.boxes.xyxy        # tensor of shape (N, 4)

# Bounding boxes in xywh format (center_x, center_y, width, height)
result.boxes.xywh        # tensor of shape (N, 4)

# Confidence scores
result.boxes.conf        # tensor of shape (N,)

# Class IDs
result.boxes.cls         # tensor of shape (N,)

# Combined data: [x1, y1, x2, y2, conf, cls]
# Tracking adds a track_id column before conf/cls.
result.boxes.data        # shape (N, 6), or (N, 7) when tracked

# Metadata
result.orig_shape        # (height, width) of original image
result.path              # source file path (or None)
result.names             # {0: "person", 1: "bicycle", ...}

# Move to CPU / convert to numpy
result_cpu = result.cpu()
boxes_np = result.boxes.numpy()
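The two box layouts carry the same information. As a worked example of the relationship between xyxy (corners) and xywh (center plus size), here is a plain-Python sketch; the helper names are illustrative and not part of the Results API, which already exposes both views:

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + w / 2, y1 + h / 2, w, h)


def xywh_to_xyxy(box):
    """(center_x, center_y, width, height) -> (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(xyxy_to_xywh((10, 20, 110, 220)))
```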

Class filtering

Filter detections to specific class IDs:

python

# Only detect people (class 0) and cars (class 2)
result = model("image.jpg", classes=[0, 2])

Batched in-memory inference

model.predict() accepts a list or tuple of in-memory images (NumPy arrays, PIL images, or tensors) and runs them as a true stacked-forward batch. Set batch > 1 to actually batch the forward pass on families that support it; a list of results is returned, one per input.

python

import numpy as np
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")

frames = [
    np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8),
    np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8),
    np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8),
]

results = model(frames, batch=4)   # list/tuple -> true batched inference
for r in results:
    print(len(r), r.boxes.xyxy.shape)

Model info

model.info() returns a JSON-friendly dict of family, size, task, parameter counts, input size, and class names, and logs a human-readable summary when verbose=True.

python

meta = model.info(detailed=False, verbose=True)
# meta -> {"family": ..., "size": ..., "task": ..., "params": ..., "imgsz": ..., "names": {...}, ...}

Tiled Inference

For images much larger than the model's input size (e.g., satellite imagery, drone footage), tiled inference splits the image into overlapping tiles, runs detection on each, and merges results.

Tiling is detection-only in v1.4.0. It rejects segmentation masks, and it cannot be combined with augment=True.

python

result = model(
    "large_aerial_image.jpg",
    tiling=True,
    overlap_ratio=0.2,   # 20% overlap between tiles (default)
    save=True,
)

# Extra metadata on tiled results
result.tiled           # True
result.num_tiles       # number of tiles used
result.saved_path      # output directory when save=True
result.tiles_path      # directory containing per-tile crops
result.grid_path       # grid visualization image

When save=True with tiling, LibreYOLO saves:

final_image.jpg - full image with all merged detections drawn
grid_visualization.jpg - image showing tile grid overlay
tiles/ - individual tile crops
metadata.json - tiling parameters and detection counts

If the image is already smaller than the model's input size, tiling is skipped automatically.
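To make the effect of overlap_ratio concrete, the sketch below lays out an overlapping tile grid for a given image size. It is an assumption about how such a grid can be built, not LibreYOLO's actual tiling code: the function names are hypothetical, and the choice to push the last tile flush against the far edge (which can make its overlap larger than requested) is ours.

```python
def tile_origins(length, tile, overlap_ratio):
    """Start offsets of tiles along one axis, covering [0, length)."""
    if length <= tile:
        return [0]                                    # one tile already covers the axis
    stride = max(1, round(tile * (1 - overlap_ratio)))
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)                     # last tile sits flush with the far edge
    return origins


def tile_boxes(width, height, tile=640, overlap_ratio=0.2):
    """All tiles as (x1, y1, x2, y2) in image pixels, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap_ratio)
        for x in tile_origins(width, tile, overlap_ratio)
    ]


for box in tile_boxes(1000, 700):
    print(box)
```

Detections from each tile would then be shifted by the tile's (x1, y1) offset and merged; that merge step is not shown here.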

Video Inference

Pass any video file to a flagship model and LibreYOLO auto-detects the format from the extension. Supported: .mp4, .avi, .mov, .mkv, .webm, .gif, and other common containers.

Save annotated video

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
results = model("clip.mp4", save=True)
# Saved under runs/detect/predict*/clip.mp4

For video, output_path must be a complete filename such as out/clip.mp4, not a directory. In v1.4.0 the per-frame Results objects do not populate saved_path; use the requested path or the default shown above.

Stream results (memory-flat)

For long videos, pass stream=True to get a generator. Each iteration yields the Results for one frame - no full list buffered in RAM.

python

for result in model("long_clip.mp4", stream=True):
    print(f"frame {result.frame_idx}: {len(result)} detections")

Frame subsampling

python

# Process every 2nd frame (halves compute and saved fps)
results = model("clip.mp4", vid_stride=2, save=True)
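The arithmetic behind vid_stride is simple enough to spell out. This sketch assumes subsampling starts at frame 0, which the text above does not state; both helper names are hypothetical.

```python
def strided_frames(total_frames, vid_stride):
    """Indices of the frames that get processed (assumes frame 0 is kept)."""
    return list(range(0, total_frames, vid_stride))


def saved_fps(source_fps, vid_stride):
    """Frame rate of the saved video when only every N-th frame is written."""
    return source_fps / vid_stride


print(strided_frames(10, 2))
print(saved_fps(30, 2))
```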

Live preview

python

# Display annotated frames in an OpenCV window while processing
results = model("clip.mp4", show=True)

VideoSource / VideoWriter for custom pipelines

When you need full control of decoding and encoding - custom frame transforms, mixing tracker output, writing to a non-default codec - use the building blocks directly:

python

from libreyolo import LibreYOLO
from libreyolo.utils.video import VideoSource, VideoWriter

model = LibreYOLO("LibreYOLO9c.pt")

with VideoSource("clip.mp4", vid_stride=1) as src, \
     VideoWriter("out.mp4", fps=src.fps, width=src.width, height=src.height) as out:
    for frame_bgr, frame_idx in src:
        result = model(frame_bgr, color_format="bgr")
        # ... draw, transform, etc.
        out.write_frame(frame_bgr)

Tracking

LibreYOLO ships four trackers that consume Results from any detector and add persistent track IDs: ByteTrack (default), OC-SORT (more robust to occlusion and non-linear motion), and, new in v1.4.0, BoT-SORT (camera-motion compensation) and Deep OC-SORT (appearance ReID on top of OC-SORT). Select one with tracker="bytetrack", "ocsort", "botsort" or "deepocsort" on model.track(). Tracking is most heavily tested with single-GPU YOLO9 detection and RF-DETR detection; other detection families are experimental in v1.4.0.

Install

bash

pip install libreyolo[tracking]   # compatibility extra; tracking deps ship in base dev install

Video tracking helper

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")

for result in model.track(
    "clip.mp4",
    track_conf=0.25,
    iou=0.45,
    save=True,             # writes runs/track/<video_stem>.mp4 by default
    vid_stride=1,
):
    print(result.frame_idx, result.track_id)

model.track() is a generator for video files. It runs detection frame by frame, uses the lower ByteTrack confidence internally for recovery, and yields Results with result.track_id and result.boxes.id populated.

Basic loop

python

from libreyolo import LibreYOLO, ByteTracker
from libreyolo.utils.video import VideoSource

model = LibreYOLO("LibreYOLO9c.pt")
tracker = ByteTracker()

with VideoSource("clip.mp4") as src:
    for frame_bgr, frame_idx in src:
        result = model(frame_bgr, color_format="bgr", conf=0.1)
        tracked = tracker.update(result)

        for i in range(len(tracked.boxes)):
            track_id = int(tracked.boxes.id[i])
            xyxy = tracked.boxes.xyxy[i].tolist()
            cls = int(tracked.boxes.cls[i])
            print(f"frame {frame_idx} - id {track_id} cls {cls} {xyxy}")

After tracker.update(), the returned Results (tracked in the loop above) holds the track IDs in tracked.boxes.id, and tracked.boxes.is_track is True.

TrackConfig knobs

python

from libreyolo import ByteTracker, TrackConfig

cfg = TrackConfig(
    track_high_thresh=0.25,           # first-stage match threshold
    track_low_thresh=0.1,             # second-stage (low-conf recovery)
    new_track_thresh=0.25,            # minimum conf to start a new track
    match_thresh=0.8,                 # IoU cost cutoff (stage 1)
    match_thresh_low=0.5,             # IoU cost cutoff (stage 2)
    match_thresh_unconfirmed=0.7,     # IoU cost cutoff for unconfirmed tracks
    track_buffer=30,                  # frames to keep lost tracks before removal
    frame_rate=30,                    # scales track_buffer
    fuse_score=True,                  # multiply IoU by detection score
    minimum_consecutive_frames=1,     # frames to confirm a new track
)
tracker = ByteTracker(config=cfg)
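The two confidence thresholds partition each frame's detections into a first-stage set and a low-score recovery set. The sketch below illustrates that split in plain Python, following the general ByteTrack idea; `split_by_score` is a hypothetical helper, and the exact boundary handling (>= versus >) in LibreYOLO's tracker is an assumption.

```python
def split_by_score(scores, track_high_thresh=0.25, track_low_thresh=0.1):
    """Return (high, low) index lists for the two association stages."""
    # Stage 1: confident detections are matched against existing tracks first.
    high = [i for i, s in enumerate(scores) if s >= track_high_thresh]
    # Stage 2: weaker detections can only rescue tracks left unmatched by stage 1.
    low = [i for i, s in enumerate(scores)
           if track_low_thresh <= s < track_high_thresh]
    return high, low


high, low = split_by_score([0.9, 0.3, 0.15, 0.05])
print(high, low)
```

This is also why the manual loop runs the detector at conf=0.1: detections below the detector's threshold never reach the tracker, so the second stage would have nothing to recover from.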

Reset between clips

python

tracker.reset()   # clears tracked / lost / removed lists and the ID counter

OC-SORT (occlusion-robust)

Select OC-SORT with tracker="ocsort" on model.track(). ByteTrack stays the default. With OC-SORT, track_conf maps to the tracker's det_thresh (for ByteTrack it maps to track_high_thresh).

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")

for result in model.track(
    "clip.mp4",
    tracker="ocsort",      # "bytetrack" (default), "ocsort", "botsort" or "deepocsort"
    track_conf=0.25,       # maps to OC-SORT det_thresh
    iou=0.45,
    save=True,
):
    print(result.frame_idx, result.track_id)

Pass an OCSortConfig for full control. Supplying a config instance selects the tracker by type, so the tracker= string is then ignored.

python

from libreyolo import LibreYOLO, OCSortConfig

cfg = OCSortConfig(
    det_thresh=0.25,     # boxes above this drive association and spawn new tracks
    max_age=30,          # frames a track survives without an observation
    min_hits=3,          # consecutive hits before a track is reported
    iou_threshold=0.3,   # minimum IoU for a valid association
    delta_t=3,           # frame span used to estimate velocity direction
    inertia=0.2,         # weight of the velocity-direction (momentum) term
    use_byte=False,      # enable the BYTE low-score recovery pass
)

model = LibreYOLO("LibreYOLO9c.pt")
for result in model.track("clip.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)

BoT-SORT (camera-motion compensation)

New in v1.4.0

BoT-SORT extends the ByteTrack association scheme with camera-motion compensation (CMC): it estimates global frame motion with sparse optical flow and warps predicted track positions before matching. That makes it the tracker to try when the camera moves: handheld footage, drones, vehicle-mounted cameras. The LibreYOLO port is motion-only (no ReID branch).

python

from libreyolo import LibreYOLO, BoTSortConfig

model = LibreYOLO("LibreYOLO9c.pt")

# Select by name with defaults
for result in model.track("drone.mp4", tracker="botsort", save=True):
    print(result.frame_idx, result.track_id)

# Or configure it: a config instance selects the tracker by type
cfg = BoTSortConfig(
    track_high_thresh=0.25,
    track_buffer=30,
    enable_cmc=True,              # camera-motion compensation on (default)
    cmc_method="sparseOptFlow",   # the shipped CMC estimator
    cmc_downscale=2,              # estimate flow at half resolution
)
for result in model.track("drone.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)

BoTSortTracker and BoTSortConfig are exported at the top level for the manual tracker.update(result) loop, same as ByteTracker.
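To illustrate what "warps predicted track positions" means, the sketch below applies a 2x3 affine camera-motion estimate to a predicted box. It is a simplified illustration, not the port's code: `warp_box` is a hypothetical helper, the affine is assumed to come from the optical-flow estimator, and a full tracker would typically update its motion-model state rather than raw corners.

```python
def warp_box(box_xyxy, affine):
    """Apply a 2x3 affine [[a, b, tx], [c, d, ty]] to a box's two corners.

    Exact for translation and scale; only approximate under rotation,
    because the result is re-squared into an axis-aligned box.
    """
    (a, b, tx), (c, d, ty) = affine
    x1, y1, x2, y2 = box_xyxy
    wx1, wy1 = a * x1 + b * y1 + tx, c * x1 + d * y1 + ty
    wx2, wy2 = a * x2 + b * y2 + tx, c * x2 + d * y2 + ty
    return (min(wx1, wx2), min(wy1, wy2), max(wx1, wx2), max(wy1, wy2))


# The camera motion shifts scene content 12 px right and 5 px up between frames.
pan = [[1, 0, 12], [0, 1, -5]]
print(warp_box((100, 50, 200, 150), pan))
```

Without this step, a pure camera pan makes every predicted box lag its detection by the same offset, which lowers IoU and causes missed matches.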

Deep OC-SORT (appearance ReID)

New in v1.4.0

Deep OC-SORT adds an appearance-embedding branch to OC-SORT, so tracks that disappear behind an occluder can be re-identified by how they look, not just where they were heading. The default embedder is an OSNet-AIN model auto-downloaded from LibreYOLO/LibreReID-osnet on first use; it runs on the same device as the detector. That extra forward pass makes Deep OC-SORT the slowest of the four trackers: reach for it when identity switches, not speed, are your problem.

python

from libreyolo import LibreYOLO
from libreyolo.tracking import DeepOCSortConfig

model = LibreYOLO("LibreYOLO9c.pt")

# Defaults: OSNet-AIN embedder, auto-downloaded
for result in model.track("mall.mp4", tracker="deepocsort", save=True):
    print(result.frame_idx, result.track_id)

# Tune the appearance term, or plug in your own embedder
cfg = DeepOCSortConfig(
    det_thresh=0.25,
    embedder="osnet_ain_x0_25",   # or a callable: (frame, boxes_xyxy) -> (N, D) features
    w_association_emb=0.75,       # weight of appearance vs motion in matching
    alpha_fixed_emb=0.95,         # EMA smoothing of per-track embeddings
)
for result in model.track("mall.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)

DeepOCSortTracker and DeepOCSortConfig live in libreyolo.tracking (they are not top-level exports). Custom embedder callables are supported: pass any function that maps a frame and (N, 4) boxes to (N, D) features.
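The alpha_fixed_emb knob controls an exponential moving average over each track's appearance embedding. A minimal sketch of that kind of update, assuming a fixed smoothing factor and L2 re-normalization; `update_embedding` is a hypothetical helper, and the shipped tracker may adapt the factor per detection:

```python
import math


def update_embedding(track_emb, det_emb, alpha=0.95):
    """Blend a new detection embedding into a track's running embedding."""
    # High alpha: the track's appearance changes slowly and resists noisy crops.
    mixed = [alpha * t + (1 - alpha) * d for t, d in zip(track_emb, det_emb)]
    norm = math.sqrt(sum(v * v for v in mixed)) or 1.0   # avoid dividing by zero
    return [v / norm for v in mixed]                     # keep unit length for cosine matching


print(update_embedding([1.0, 0.0], [0.0, 1.0], alpha=0.95))
```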

Choosing a tracker

Tracker	Select with	Strength	Cost
ByteTrack	`tracker="bytetrack"`	Fast, simple, the default	Lowest
OC-SORT	`tracker="ocsort"`	Occlusion and non-linear motion	Low
BoT-SORT	`tracker="botsort"`	Moving cameras (CMC)	Medium (optical flow per frame)
Deep OC-SORT	`tracker="deepocsort"`	Identity switches, re-ID after long occlusion	Highest (embedder forward pass)

Ensembling

Detection only · Python API only

LibreEnsemble runs two or more detection models and fuses their detections into one ordinary Results. Fusion happens at the detection level, never at the tensor level, so every member keeps its own input size, normalization and NMS. That is what lets you mix a grid detector with a DETR, or a .pt checkpoint with an exported backend, in the same ensemble.

Class spaces do not have to match. Members are unified by class name: identical name maps pass straight through, otherwise LibreYOLO builds the union and remaps each member into it. Boxes are only fused with boxes of the same unified class, and a class that only one member knows passes through unfused.
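The name-based unification can be sketched as follows. This is an illustration under stated assumptions, not LibreEnsemble's implementation: `unify_names` is a hypothetical helper, and assigning unified IDs in first-seen order is our choice.

```python
def unify_names(member_names):
    """member_names: one {class_id: name} dict per member.

    Returns (unified_map, remaps), where remaps[i] maps member i's class IDs
    to unified class IDs.
    """
    first = member_names[0]
    if all(names == first for names in member_names):
        identity = {cid: cid for cid in first}          # identical maps pass straight through
        return dict(first), [dict(identity) for _ in member_names]

    name_to_uid = {}
    for names in member_names:
        for cid in sorted(names):
            # setdefault only assigns a new ID the first time a name is seen.
            name_to_uid.setdefault(names[cid], len(name_to_uid))
    remaps = [{cid: name_to_uid[name] for cid, name in names.items()}
              for names in member_names]
    unified_map = {uid: name for name, uid in name_to_uid.items()}
    return unified_map, remaps


unified, remaps = unify_names([{0: "person", 1: "car"}, {0: "car", 1: "truck"}])
print(unified)
print(remaps)
```

Here "car" is ID 1 for the first member and ID 0 for the second, yet both land on the same unified ID, so their car boxes can be fused; "truck" is known to one member only and would pass through unfused.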

Fuse two detectors

python

from libreyolo import LibreEnsemble

# Weighted Boxes Fusion (the default), keep only boxes BOTH models found
ens = LibreEnsemble(["LibreYOLO9s.pt", "LibreRFDETRs.pt"], min_votes=2)

result = ens("image.jpg", conf=0.25)
print(result.boxes.xyxy)
print(result.names)     # the unified (union) class map
print(result.speed)     # per-member timings plus fusion

Trust weights and per-member settings

weights expresses how much you trust each member; a reasonable starting point is to set it proportional to each model's validation mAP. conf, iou and device accept either one value for everyone or one value per member.

python

ens = LibreEnsemble(
    ["LibreYOLO9s.pt", "LibreRFDETRs.pt"],
    weights=[1.0, 1.4],     # pull fused coordinates and scores toward member 2
    fusion="wbf",           # "wbf" | "wbf_seeded" | "nms" | your own callable
    fusion_iou=0.55,        # IoU used to CLUSTER boxes for fusion, not member NMS
    min_votes=1,            # keep boxes confirmed by at least N members
)

result = ens("image.jpg", conf=[0.25, 0.4])   # per-member confidence
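To show how weights and min_votes interact, here is a simplified sketch of fusing one cluster of overlapping same-class boxes in the spirit of Weighted Boxes Fusion. It is not LibreEnsemble's code: `fuse_cluster` and its parameters are hypothetical, clustering by fusion_iou is assumed to have happened already, and the score normalization is one common variant among several.

```python
def fuse_cluster(boxes, scores, weights, total_weight, min_votes=1):
    """Fuse one cluster holding at most one box per member.

    boxes: list of (x1, y1, x2, y2); scores and weights are parallel lists,
    where weights[i] is the trust weight of the member that produced boxes[i].
    total_weight is the sum of ALL member weights in the ensemble.
    Returns (fused_box, fused_score), or None if too few members voted.
    """
    if len(boxes) < min_votes:
        return None
    contrib = [s * w for s, w in zip(scores, weights)]   # per-box pull on the result
    norm = sum(contrib)
    fused_box = tuple(
        sum(c * box[k] for c, box in zip(contrib, boxes)) / norm
        for k in range(4)
    )
    # Dividing by the total weight penalizes boxes that few members confirmed.
    fused_score = norm / total_weight
    return fused_box, fused_score


box, score = fuse_cluster(
    boxes=[(0, 0, 10, 10), (2, 2, 12, 12)],
    scores=[0.8, 0.8],
    weights=[1.0, 1.0],
    total_weight=2.0,
    min_votes=2,
)
print(box, score)
```

Raising one member's weight moves the fused coordinates toward that member's box, which is the effect the weights=[1.0, 1.4] comment describes.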

Bring an outside detector

ExternalDetector wraps any callable that returns boxes, so a model that is not a LibreYOLO model can still join the ensemble. The function receives a PIL image and must return boxes in original-image pixels.

python

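from libreyolo import LibreEnsemble, ExternalDetector

def my_detector(image):
    # Run your own model on the PIL image here. The values below are
    # illustrative placeholders so the snippet is complete.
    boxes = [[10.0, 20.0, 200.0, 300.0]]   # xyxy, ORIGINAL-image pixels
    scores = [0.9]
    labels = [0]                           # key into names= below
    # -> (boxes_xyxy, scores, labels) in ORIGINAL-image pixels
    return boxes, scores, labels

member = ExternalDetector(my_detector, names={0: "person"})
ens = LibreEnsemble(["LibreYOLO9s.pt", member])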

Limits

Detection members only. Any member whose task is not detect raises. Segmentation and pose models cannot be ensembled.
At least two members are required.
min_votes above 1 requires a voting fusion. It raises with fusion="nms"; use wbf or wbf_seeded.
Images and image directories only. Video sources and stream=True raise: run the members individually for video.
ens.val() and ens.export() both raise. Validate and export the members individually.
batch is accepted for API parity but images are still processed one at a time.

Instance Segmentation

RF-DETR segmentation is the heavily tested segmentation path in v1.4.0. Four experimental options sit around it: EdgeCrafter (-seg) and, new in v1.4.0, RTMDet-Ins, EoMT instance segmentation and D-FINE segmentation. YOLO9 is detect-only and does not ship a segmentation head.

Run segmentation

python

from libreyolo import LibreYOLO

# RF-DETR segmentation, the heavily tested segmentation path
model = LibreYOLO("LibreRFDETRs-seg.pt")
result = model("photo.jpg")

# EdgeCrafter segmentation is also available but experimental
# model = LibreYOLO("LibreECs-seg.pt")

# Segmentation returns boxes + masks
print(result.boxes.xyxy)        # bounding boxes (N, 4)
print(result.boxes.cls)         # class IDs (N,)
print(result.masks.data.shape)  # (N, H, W) tensor of binary masks

Mask representations

python

# Raw bitmasks
result.masks.data        # tensor (N, H, W) - original image resolution

# Polygon contours (one ndarray of (M, 2) per instance)
result.masks.xy          # absolute pixel coords
result.masks.xyn         # normalized to [0, 1]

# Move / convert like Boxes
result.masks.cpu()
result.masks.numpy()
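
The relationship between the absolute and normalized polygon forms is plain arithmetic. The sketch below assumes the usual convention (x divided by image width, y by image height); `normalize_polygon` is a hypothetical helper, not a LibreYOLO function.

```python
def normalize_polygon(points_xy, orig_shape):
    """Convert absolute pixel polygon points to [0, 1] coordinates.

    points_xy:  list of (x, y) in pixels
    orig_shape: (height, width) of the original image
    """
    height, width = orig_shape
    return [(x / width, y / height) for x, y in points_xy]


polygon = [(64, 48), (320, 48), (320, 240)]
normalized = normalize_polygon(polygon, orig_shape=(480, 640))
print(normalized)   # [(0.1, 0.1), (0.5, 0.1), (0.5, 0.5)]
```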

Save annotated output

save=True draws boxes and translucent mask overlays automatically.

python

model("photo.jpg", save=True)

New segmentation families in v1.4.0

Three families gained instance segmentation in v1.4.0, all experimental and all returning the same boxes + masks results:

Family	Checkpoints	Scope
RTMDet-Ins	LibreRTMDett-seg.pt, LibreRTMDets-seg.pt, LibreRTMDetm-seg.pt, LibreRTMDetl-seg.pt, LibreRTMDetx-seg.pt	Inference and validation; training not implemented
EoMT (instance)	LibreEoMTl-seg.pt, LibreEoMTl-seg-1280.pt	Inference and validation; the -1280 variant trades speed for high-resolution masks
D-FINE	LibreDFINEn-seg.pt, LibreDFINEs-seg.pt, LibreDFINEm-seg.pt, LibreDFINEl-seg.pt, LibreDFINEx-seg.pt	Inference, validation and experimental training; CLI train auto-transfers detect weights to segment

D-FINE segmentation has verified parity with its ONNX and TensorRT exports. Note that tiled inference (tiling=True) rejects segmentation models loudly rather than silently dropping masks.

Training segmentation

RF-DETR segmentation uses the RF-DETR COCO-format training pipeline and is part of the heavily tested single-GPU scope. EdgeCrafter and D-FINE segmentation training are available but experimental. For segmentation-specific augmentation (copy-paste), see Data Augmentation.

Semantic Segmentation

Experimental | SegFormer + TTA new in v1.4.0

Semantic segmentation labels every pixel with a class. It is a different task from instance segmentation: there are no object instances and no boxes, just one dense class map. Pass task="semantic" (aliases: semseg, sem), and read the result from result.semantic_mask. On a semantic model result.boxes and result.masks are both None.

Models

Family	Checkpoints	Backbone	Trained on	Classes	Train?
LibrePIDNet	LibrePIDNet{s,m,l}-sem.pt	PIDNet 3-branch CNN	Cityscapes	19	No
LibreEoMT	LibreEoMTl-sem.pt	DINOv2 ViT-L	ADE20K	150	No
LibreSegformer (new in v1.4.0)	LibreSegformer{b0..b5}-sem.pt	MiT hierarchical transformer	ADE20K	150	Yes (fine-tune)
LibreDINOv2	none published: you train it	DINOv2 + dense head	your data	you choose	Yes

The families behave quite differently, so pick deliberately. LibrePIDNet is a fast real-time CNN carrying Cityscapes road-scene classes. LibreEoMT carries ADE20K's 150 general scene classes. Both ship pretrained weights and cannot be trained inside LibreYOLO: fine-tune them upstream and convert the result.

LibreSegformer, new in v1.4.0, is the middle path: six sizes (b0 to b5, 512 px; b5 at 640) of the SegFormer architecture with a bit-exact reference port, ADE20K pretrained weights, and a fine-tune trainer, so you can start from 150 general classes and adapt to your own. One licensing caveat: the code is Apache-2.0 but the converted NVIDIA ADE20K weights are non-commercial, and the download shows a license notice before fetching them.

LibreDINOv2 is the fine-tuning family without pretrained-head baggage: there is no published LibreDINOv2 semantic checkpoint. You construct it from the pretrained DINOv2 backbone with a fresh dense head and train it on your own masks. Reach for it when your classes are not Cityscapes or ADE20K and you want the strongest features under a fresh head.

Run semantic segmentation

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibrePIDNets-sem.pt")   # Cityscapes, 19 classes
result = model.predict("street.jpg", save=True)

sm = result.semantic_mask     # SemanticMask
print(sm.data.shape)          # (H, W) int class ids, on the ORIGINAL image canvas
print(sm.classes)             # sorted class ids present, 255 (ignore) excluded
print(model.names[13])        # 'car'

car = sm.class_mask(13)       # (H, W) bool mask for one class

print(result.saved_path)      # saved semantic overlay

print(result.boxes, result.masks)   # None None: semantic has no instances

SemanticMask API

python

sm = result.semantic_mask

sm.data               # (H, W) integer class ids at original resolution
sm.orig_shape         # (H, W)
sm.classes            # list[int] of ids present, excluding the ignore index
sm.class_mask(cid)    # (H, W) bool
SemanticMask.IGNORE_INDEX   # 255: the void label, never counted as a class

sm.cpu(); sm.numpy()

Validate

Validation reports mean IoU and pixel accuracy. Classes never seen in either the prediction or the ground truth are excluded from the mean rather than scored as zero. fitness is an alias of mIoU, so it is what drives best-checkpoint selection during training.

python

metrics = model.val(data="cityscapes.yaml")
print(metrics["metrics/mIoU"])
print(metrics["metrics/pixel_accuracy"])

bash

libreyolo val model=LibrePIDNets-sem.pt data=cityscapes.yaml split=val
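
The exclusion rule under Validate (classes absent from both prediction and ground truth are skipped, and 255 is ignored) can be illustrated with a small pure-Python sketch. It is a reference for the arithmetic only, not the validator's code; `mean_iou` is a hypothetical helper working on flat lists of class ids.

```python
IGNORE_INDEX = 255  # void label, excluded from metrics


def mean_iou(pred, target, num_classes):
    """Return (mIoU, pixel accuracy) over flat lists of class ids."""
    inter = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    correct = total = 0

    for p, t in zip(pred, target):
        if t == IGNORE_INDEX:
            continue                      # ignored pixels never count
        total += 1
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1
            correct += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union == 0:
            continue                      # absent from both: skipped, not scored as 0
        ious.append(inter[c] / union)

    return sum(ious) / len(ious), correct / total


pred = [0, 0, 1, 1, 1, 0]
target = [0, 0, 1, 0, 255, 0]
miou, acc = mean_iou(pred, target, num_classes=3)   # class 2 never appears
print(round(miou, 3), round(acc, 3))                # 0.625 0.8
```

Had class 2 been scored as zero instead of skipped, the mean would drop from 0.625 to about 0.417, which is why the rule matters on datasets where many classes are rare.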

Train (LibreDINOv2)

Masks are single-channel lossless images whose pixel value is the class id, paired to each image by filename stem. 255 means ignore and is excluded from both loss and metrics.

text

dataset/
    images/train/*.jpg
    images/val/*.jpg
    masks/train/*.png      # same stem as the image; pixel value = class id
    masks/val/*.png

python

from libreyolo import LibreDINOv2

# model_path=None -> pretrained DINOv2 backbone + a fresh dense head
model = LibreDINOv2(model_path=None, size="s", task="semantic", nb_classes=19)
model.train(data="cityscapes.yaml", epochs=100, batch_size=4, lr=1e-4)

In the dataset YAML, masks_dir names the mask directory (default masks). If you omit it, LibreYOLO rasterizes masks from YOLO polygon labels at load time and appends a background class. label_mapping remaps source pixel values to training ids, and anything unmapped becomes ignore.
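
The label_mapping rule (mapped values become training ids, everything else becomes ignore) amounts to a lookup with a default. The sketch below shows that behaviour on a tiny mask; `remap_mask` and the example mapping are hypothetical and only illustrate the rule stated above.

```python
IGNORE_INDEX = 255


def remap_mask(mask, label_mapping):
    """Map source pixel values to training ids; unmapped values become ignore."""
    return [[label_mapping.get(value, IGNORE_INDEX) for value in row] for row in mask]


# Hypothetical mapping: source value 7 -> training id 0, source value 26 -> training id 1
label_mapping = {7: 0, 26: 1}
source_mask = [
    [7, 7, 26],
    [7, 3, 26],   # 3 is not in the mapping
]
remapped = remap_mask(source_mask, label_mapping)
print(remapped)   # [[0, 0, 1], [0, 255, 1]]
```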

Test-time augmentation (new in v1.4.0)

Semantic models now accept augment=True on both predict() and val() (this raised in v1.3.1). TTA runs a horizontal-flip pass and averages logits, trading roughly 2x inference cost for a small, reliable mIoU gain. It is implemented for PIDNet, SegFormer, EoMT and DINOv2 semantic.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreSegformerb2-sem.pt")
result = model.predict("street.jpg", augment=True)   # flip-TTA
metrics = model.val(data="ade20k.yaml", augment=True)
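
The idea behind flip-TTA fits in a few lines: predict on the image and on its mirror, mirror the second set of logits back, and average. The sketch below uses nested lists and a toy stand-in model so it runs without any dependency; `flip_tta`, `hflip` and `toy_model` are hypothetical names, and the real implementation operates on tensors.

```python
def hflip(grid):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [row[::-1] for row in grid]


def flip_tta(model_fn, image):
    """Average logits from the image and from its mirrored copy.

    model_fn maps an (H, W) image to a list of per-class (H, W) logit maps.
    """
    plain = model_fn(image)
    # Predict on the mirrored image, then mirror each logit map back so
    # both passes line up pixel for pixel.
    mirrored = [hflip(class_map) for class_map in model_fn(hflip(image))]
    return [
        [[(a + b) / 2 for a, b in zip(row_p, row_m)]
         for row_p, row_m in zip(map_p, map_m)]
        for map_p, map_m in zip(plain, mirrored)
    ]


def toy_model(image):
    """Stand-in network with a deliberate left-to-right bias in its logits."""
    class0 = [[px + 0.1 * x for x, px in enumerate(row)] for row in image]
    class1 = [[-v for v in row] for row in class0]
    return [class0, class1]


image = [[0.0, 1.0, 2.0]]
averaged = flip_tta(toy_model, image)
print([round(v, 2) for v in averaged[0][0]])   # [0.1, 1.1, 2.1]
```

The toy model's position-dependent bias (0.0, 0.1, 0.2 across the row) becomes a constant 0.1 after averaging, which is the kind of directional error flip-TTA cancels. The two forward passes are where the roughly 2x cost comes from.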

Limits

Export support is family-specific. PIDNet supports ONNX, TorchScript, NCNN, and TFLite. DINOv2 and EoMT semantic support ONNX and TorchScript. SegFormer export remains blocked. Check libreyolo formats --family ... for the exact tier.
Only LibreDINOv2 and LibreSegformer train. LibrePIDNet.train() and LibreEoMT.train() raise.
Semantic training now applies HSV color jitter by default (new in v1.4.0), so retrained mIoU can shift slightly versus v1.3.1 runs. The knob comes from the family, not hsv_prob; see Data Augmentation.
EoMT semantic is size l only and locked to imgsz=512 (its checkpoint uses fixed position embeddings), and it cannot batch: val(batch=N) warns and still runs one image at a time.
imgsz divisibility differs per family: PIDNet needs a multiple of 8, EoMT of 16, DINOv2 of 14, and SegFormer of 32. Violations raise.
No tracking for semantic models.
Cityscapes, ADE20K and COCO-Stuff all require a manual download. LibreYOLO ships the dataset YAMLs, not the data.
Raw upstream checkpoints are rejected. Convert with the weights/convert_*_weights.py scripts.
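
Because a bad imgsz raises, it can help to snap a requested size to the family's multiple before calling predict() or val(). The helper below is a hypothetical convenience, not part of LibreYOLO; the multiples are the ones quoted in the limits above, and the dictionary keys are illustrative labels.

```python
# Multiples quoted in the limits above; the keys are illustrative labels.
IMGSZ_MULTIPLE = {"pidnet": 8, "eomt": 16, "dinov2": 14, "segformer": 32}


def snap_imgsz(imgsz, family):
    """Round imgsz up to the nearest multiple the family accepts."""
    multiple = IMGSZ_MULTIPLE[family]
    return -(-imgsz // multiple) * multiple   # ceiling division, then scale back


for family in IMGSZ_MULTIPLE:
    print(family, snap_imgsz(500, family))
```

Note that EoMT semantic is locked to imgsz=512 regardless, so snapping only helps for the other three families.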

Panoptic Segmentation

New in v1.4.0 | Inference and val only

Panoptic segmentation answers both questions at once: every pixel gets a class (like semantic segmentation), and countable objects are separated into instances (like instance segmentation). Roads and sky come back as single "stuff" segments; each car and person comes back as its own "thing" segment. The result is one segment-id map plus a per-segment info list, read from result.panoptic.

One family ships the task in v1.4.0: LibreEoMT with COCO-panoptic checkpoints (133 classes, 640 px) in three sizes: LibreEoMTs-panoptic.pt, LibreEoMTb-panoptic.pt, LibreEoMTl-panoptic.pt.

Run panoptic segmentation

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreEoMTb-panoptic.pt")   # task resolved from the -panoptic suffix
result = model.predict("street.jpg", save=True)

pan = result.panoptic             # PanopticSegmentation
print(pan.data.shape)             # (H, W) integer segment ids, original canvas
for seg in pan.segments_info:     # one dict per segment
    print(seg["id"], seg["category_id"], model.names[seg["category_id"]])

car_mask = pan.segment_mask(3)    # (H, W) bool mask for one segment id
print(result.saved_path)          # saved panoptic overlay

# Flip-TTA works here too (new in v1.4.0)
result = model.predict("street.jpg", augment=True)

PanopticSegmentation API

python

pan = result.panoptic
pan.data                 # (H, W) int segment-id map at original resolution
pan.segments_info        # list of {"id", "category_id", ...} dicts
pan.segment_ids          # ids present in the map
pan.segment_mask(sid)    # (H, W) bool mask for one segment

pan.cpu(); pan.numpy()

Validate with Panoptic Quality

Validation runs through PanopticValidator and reports Panoptic Quality (PQ), the standard metric that multiplies segmentation quality (average IoU of matched segments) by recognition quality (F1 over segments). augment=True is accepted for flip-TTA.

python

metrics = model.val(data="coco_panoptic.yaml")
print(metrics["metrics/PQ"])
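
For reference, the PQ arithmetic for a single class looks like the sketch below. It follows the standard definition (a predicted and a ground-truth segment match when their IoU exceeds 0.5; the reported PQ is then averaged over classes) and is not taken from PanopticValidator; `panoptic_quality` is a hypothetical helper.

```python
def panoptic_quality(matched_ious, num_false_pos, num_false_neg):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious:  IoU of each matched (true-positive) segment pair
    num_false_pos: predicted segments with no match
    num_false_neg: ground-truth segments with no match
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_pos + 0.5 * num_false_neg
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0   # segmentation quality
    rq = tp / denom                              # recognition quality (F1)
    return sq * rq, sq, rq


pq, sq, rq = panoptic_quality([0.9, 0.7, 0.8], num_false_pos=1, num_false_neg=1)
print(round(pq, 3), round(sq, 3), round(rq, 3))   # 0.6 0.8 0.75
```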

Limits

Inference and validation only. Panoptic training and export both raise in v1.4.0.
Checkpoints written with the panoptic task string are not loadable by v1.3.1.
On a panoptic model, result.boxes and result.masks are None: everything lives in result.panoptic.

Promptable Segmentation

Python API only | Inference only | SAM 3, EdgeTAM, PicoSAM3 new in v1.4.0

LibreSAM is a separate tier from the detector factory, because a promptable segmenter has a different contract: it runs a heavy image encoder once, then answers cheap spatial prompts (a click, a box) with a mask. There is no fixed class list. Install it with pip install "libreyolo[sam]".

Two things surprise people. First, LibreSAM is a factory function, not a class, and it is deliberately kept outside the LibreYOLO() loader, so LibreYOLO("sam_b") does not work. Import it directly. Second, the whole tier is Python-only: there is no CLI path to it.

Models

Family	Pass to LibreSAM()	Encoder	Notes
SAM-1	"base" (default), "large", "huge"	ViT-B / L / H	Apache-2.0
SAM-2.1	"sam2-tiny", "sam2-small", "sam2-base-plus", "sam2-large"	Hiera	Images only, no video
MobileSAM	"mobilesam"	TinyViT	Fastest; native LibreYOLO port

Those short aliases only work through the LibreSAM() factory. The concrete classes take canonical sizes, so LibreSAM1("base") is right and LibreSAM1("sam_b") raises.

New in v1.4.0: SAM 3, EdgeTAM, PicoSAM3

Three additions span the quality / speed spectrum. Construct them directly; snapshots download on first use.

Model	Construct	Working size	Notes
SAM 3	LibreSAM3()	1008	Highest quality; transformers-backed. Weights are gated on Hugging Face under the Meta SAM license: accept the terms and log in before first use.
EdgeTAM	LibreEdgeTAM()	1024	Edge-oriented; image inference only; Apache-2.0 end to end.
PicoSAM3	LibrePicoSAM3()	96	Native tiny port for ROI segmentation on very small crops; the only SAM-tier model with export (ONNX only).

python

from libreyolo import LibreSAM3, LibreEdgeTAM, LibrePicoSAM3

model = LibreSAM3()               # gated HF weights (Meta SAM license)
r = model.predict("img.jpg", points=[640, 360], labels=[1])

model = LibreEdgeTAM()            # Apache-2.0, edge-friendly
r = model.predict("img.jpg", bboxes=[100, 100, 500, 500])

model = LibrePicoSAM3()           # 96 px ROI segmenter, ONNX-exportable
r = model.predict("crop.jpg", bboxes=[8, 8, 88, 88])

Prompt with a click or a box

python

from libreyolo import LibreSAM

model = LibreSAM("base")          # SAM-1 ViT-B

# a single click
r = model.predict("img.jpg", points=[640, 360], labels=[1])
print(r.masks.data.shape)   # (1, H, W) bool, at the original resolution
print(r.boxes.xyxy)         # a tight box derived from the mask
print(r.boxes.conf)         # SAM's predicted mask quality, NOT a detection score

# a box prompt
r = model.predict("img.jpg", bboxes=[100, 100, 500, 500])

# segment everything (a coarse grid); lower the grid on CPU, it is slow
r = model.predict("img.jpg", points_per_side=16)

Encode once, prompt many times

This is the pattern that makes interactive use fast. The expensive encoder runs once per image and every later prompt reuses the embedding.

python

model.set_image("img.jpg")                        # heavy encoder runs ONCE
a = model.predict(points=[500, 375], labels=[1])  # cheap: decoder only
b = model.predict(bboxes=[100, 100, 200, 200])    # cheap: reuses the embedding
model.reset_image()

How prompts are shaped

Nesting depth carries meaning, and this is the single easiest thing to get wrong. Points are plain [x, y] pixels. A label of 1 means include, 0 means exclude.

You pass	It means
points=[x, y]	one object, one point
points=[[x, y], [x, y]]	TWO objects, one point each
points=[[[x, y], [x, y]]]	ONE object, two points

python

1 # refine ONE object with a positive and a negative click
2 r = model.predict(
3     "img.jpg",
4     points=[[[500, 375], [620, 400]]],   # one object, two points
5     labels=[1, 0],                        # include, then exclude
6 )
7 
8 # all three whole-vs-part candidate masks for an ambiguous click
9 r = model.predict("img.jpg", points=[640, 360], labels=[1], multimask=True)
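
Since nesting depth is easy to get wrong, a quick pre-flight check can report how a points argument will be read. The function below is a hypothetical helper that mirrors the table in this section; it is not LibreYOLO's own parser.

```python
def describe_points(points):
    """Return (num_objects, points_per_object) implied by nesting depth."""
    def depth(value):
        return 1 + depth(value[0]) if isinstance(value, (list, tuple)) else 0

    d = depth(points)
    if d == 1:                       # [x, y]
        return 1, [1]
    if d == 2:                       # [[x, y], [x, y]]: one point per object
        return len(points), [1] * len(points)
    if d == 3:                       # [[[x, y], [x, y]]]: points grouped per object
        return len(points), [len(obj) for obj in points]
    raise ValueError(f"unsupported nesting depth {d}")


print(describe_points([640, 360]))                     # (1, [1])
print(describe_points([[500, 375], [620, 400]]))       # (2, [1, 1])
print(describe_points([[[500, 375], [620, 400]]]))     # (1, [2])
```

Running it on the prompt you are about to send makes the "two objects" versus "one object, two points" distinction visible before the model is involved.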

Limits

Images only, across the tier. There is no video segmentation and no memory propagation across frames in v1.4.0 (this includes SAM 2, SAM 3 and EdgeTAM): track() raises. Call predict() per frame.
No training and no validation for any SAM family. Export raises everywhere except PicoSAM3, which exports to ONNX only.
PicoSAM3 accepts only bboxes= ROI prompts. Use LibreSAM2 or LibreSAM3 for points, text, masks, or segment-everything.
Mask prompts (masks=) are not supported and raise. Use points or boxes.
conf here filters on SAM's predicted mask quality, not on detection confidence. Detector intuition does not transfer.
Everything runs in fp32, even on CUDA. This is deliberate: half precision rounds prompt coordinates by several pixels at SAM's 1024px working size, which silently moves where you clicked.
Segment-everything is a simplified grid, not the reference automatic mask generator. It under-segments crowded scenes.
Weights download into ./weights/ relative to your working directory, so running from elsewhere re-downloads.

Open-Vocabulary Detection

Python API only | OmDet-Turbo + OV-DEIM new in v1.4.0

Give the model a list of class names as text and get real detection boxes back. No training, no labelled data. Change the list and you change what it detects. Install with pip install "libreyolo[openvocab]".

This is not the same as the LibreVLM tier, and the difference matters. These are purpose-built detectors conditioned on text: the detector head returns boxes with real model scores. A VLM instead generates text that LibreYOLO parses into boxes. Rule of thumb: if you want boxes for named classes, use open-vocab; if you want to describe a scene or give an instruction, use a VLM. On licensing, check per family: Grounding DINO, OWLv2 and OmDet-Turbo weights are Apache-2.0, but the OV-DEIM weights are CC BY-NC 4.0 (non-commercial), confirmed with the upstream author.

Models

Pass to LibreOpenVocab()	Class	Backbone	Default conf
"grounding-dino" (default, tiny)	LibreGroundingDINO	Swin-T + BERT	0.25
"grounding-dino-base"	LibreGroundingDINO	Swin-B + BERT	0.25
"owlv2"	LibreOWLv2	ViT-B/16	0.1
"owlv2-large"	LibreOWLv2	ViT-L/14	0.1
"omdet-turbo" (new in v1.4.0)	LibreOMDetTurbo	Swin-T, transformers-backed	0.25
"ov-deim" / "-m" / "-l" (new in v1.4.0)	LibreOVDEIM	DEIM, native NMS-free port	0.25

Detect anything you can name

The vocabulary is set on the model, with set_classes(), and it is sticky across later calls. There is no prompts= or text= argument on predict().

python

from libreyolo import LibreOpenVocab

model = LibreOpenVocab("grounding-dino")
model.set_classes(["person", "dog", "skateboard"])   # sticky vocabulary

result = model.predict("street.jpg", conf=0.25, text_threshold=0.25)
print(result.boxes.xyxy, result.boxes.conf)
print(result.names)     # {0: 'person', 1: 'dog', 2: 'skateboard'}

result = model.predict("another.jpg")   # same vocabulary, still set

# or set it at construction
model = LibreOpenVocab("owlv2", names=["forklift", "pallet"])

Watch out for the lookalike. predict(classes=...) is not the text API. It is the standard integer class-id filter and takes a list of ints. The text vocabulary goes through set_classes().

Practical notes

Short noun phrases work best. "remote control" beats "remote". Phrases that cannot be mapped back to one of your class names unambiguously are dropped, so a missing detection is sometimes a mapping drop rather than a detector miss.
There is no cap on how many classes you may pass. Grounding DINO automatically splits a long vocabulary into chunks that fit its text encoder and runs one forward pass per chunk, so cost grows with vocabulary size. That is the main latency knob you control.
text_threshold is Grounding DINO only. Passing it to the other families raises.
The families score differently, so tune conf per family rather than reusing a number.
OV-DEIM is the interesting speed option: a native, NMS-free port (not a transformers pipeline) in three sizes. Its text features are cached per vocabulary, and v1.4.0 fixed the device-switch crash on that cache. Remember the weights are non-commercial.
OmDet-Turbo is transformers-backed and respects iou= (it did not before v1.4.0).
Expect this to be far slower than a LibreYOLO detector. The honest workflow: use open-vocab to explore or auto-label an open vocabulary, then train a fast detector on the result.
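
The vocabulary chunking mentioned in the notes above explains why latency grows with the class list. The sketch below shows one greedy way to split names under a token budget. Everything in it is an assumption for illustration: `chunk_vocabulary` is hypothetical, token cost is approximated by word count rather than a real tokenizer, and the period-separated prompt is the format commonly used with Grounding DINO, not necessarily what LibreYOLO builds.

```python
def chunk_vocabulary(class_names, max_tokens=256):
    """Greedily split class names into prompts that fit a token budget.

    Token cost is approximated as words + 1 separator per class; a real
    tokenizer would give different counts.
    """
    chunks, current, used = [], [], 0
    for name in class_names:
        cost = len(name.split()) + 1
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        chunks.append(current)
    return chunks


names = ["person", "dog", "remote control", "skateboard", "traffic light"]
chunks = chunk_vocabulary(names, max_tokens=6)   # tiny budget to force splits
for chunk in chunks:
    print(" . ".join(chunk) + " .")              # one forward pass per chunk
```

Three chunks here means three forward passes for one image, which is the cost the notes describe as the main latency knob.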

Limits

No CLI. libreyolo predict model=grounding-dino does not work. This tier is reachable only from Python.
No training, no validation, no export, no tracking. All four raise.
imgsz and augment=True are rejected: the processor owns resizing. Apart from OmDet-Turbo, which respects iou=, iou is accepted but ignored, since no LibreYOLO NMS runs here.
Batching gives no speedup: images run one at a time. Everything is fp32.

Pose Estimation

Pose (keypoint) estimation runs on YOLO-NAS (-pose), EdgeCrafter (-pose), and an RF-DETR (-pose) preview. The published checkpoints are single-class ("person") with 17 COCO keypoints. New in v1.4.0, YOLO-NAS pose supports multi-class keypoint training: train on a dataset with several object classes, and multi-class checkpoints now load with their real class count and return real class ids (they were previously forced to single-class person).

Run pose

python

from libreyolo import LibreYOLO

# YOLO-NAS pose
model = LibreYOLO("LibreYOLONASs-pose.pt")
result = model("people.jpg")

# EdgeCrafter pose
# model = LibreYOLO("LibreECs-pose.pt")

# Per-person bbox + 17 keypoints
print(result.boxes.xyxy)          # person boxes (N, 4)
print(result.keypoints.xy.shape)  # (N, 17, 2) pixel coordinates

Preview: RF-DETR pose (ported from GroupPose) remains a research preview in v1.4.0.

python

from libreyolo import LibreYOLO

# RF-DETR pose preview
model = LibreYOLO("LibreRFDETRx-pose.pt")
result = model("people.jpg")
print(result.keypoints.xy.shape)  # (N, 17, 2)

Keypoint API

python

result.keypoints.xy        # (N, K, 2) absolute pixel coords
result.keypoints.xyn       # (N, K, 2) normalized to [0, 1]
result.keypoints.conf      # (N, K) per-keypoint confidence (None if model doesn't emit it)
result.keypoints.has_visible  # (N, K) bool - conf > 0

result.keypoints.cpu()
result.keypoints.numpy()

Save annotated output

python

model("people.jpg", save=True)  # draws boxes + skeleton

Pose training is supported for YOLO-NAS (including multi-class datasets as of v1.4.0); EdgeCrafter pose is currently inference-only. RF-DETR pose is a preview. YOLO9 is detect-only and ships no pose checkpoints. Pose validation under multi-GPU DDP was fixed in v1.4.0 (per-rank file clobbering and a collective deadlock).

Gaze Estimation

Gaze direction estimation is provided by the LibreL2CS family, an L2CS-Net port with a ResNet trunk and two angle-bin classification heads. It is a two-stage model: an upstream face detector locates faces, then the gaze head predicts per-face pitch and yaw in radians. It is inference-only and experimental in v1.4.0. (v1.4.0 also fixed face detection on OpenCV 5 by adding a YuNet detector.)

Install

bash

pip install "libreyolo[gaze]"   # optional Google Drive helper for Gaze360 weights

The published L2CS ResNet-50 weights are trained on Gaze360 and are not mirrored by LibreYOLO. Without the optional helper, pass a local checkpoint path or follow the manual download instructions printed by LibreL2CS.

Two-stage inference

python

from libreyolo import LibreYOLO
from libreyolo.models.l2cs.face import resolve_face_detector

# Gaze head
gaze = LibreYOLO("LibreL2CSr50.pt")

# Wire any LibreYOLO detector trained on faces
face = LibreYOLO("path/to/face-detector.pt")
gaze.face_detector = resolve_face_detector(face)

result = gaze("portrait.jpg")
print(result.boxes.xyxy)    # face boxes
print(result.gaze.data)     # (N, 2) tensor - pitch, yaw in radians

Decode angles

python

import math

for i in range(len(result.gaze)):
    pitch_rad, yaw_rad = result.gaze.data[i].tolist()
    pitch_deg = math.degrees(pitch_rad)   # radians -> degrees
    yaw_deg = math.degrees(yaw_rad)
    print(f"face {i}: pitch={pitch_deg:.1f} deg, yaw={yaw_deg:.1f} deg")

From the CLI: libreyolo predict model=LibreL2CSr50.pt source=portrait.jpg --face-detector path/to/face.pt.

Classification

Whole-image classification spans two supervised paths and a zero-shot path. LibreMobileNetV4 is the production classifier (Apache-2.0 ImageNet-1k weights, exportable to ONNX), with LibreConvNeXt, LibreEfficientNetV2 and LibreResNet as alternatives on the same API. LibreDINOv2 with task=classify is a DINOv2 backbone plus linear probe for transfer learning. v1.4.0 does not publish public -cls checkpoints for it, so construct and train a fresh head; a trained classifier can export to ONNX. For zero-shot (no training, labels as text), use CLIP or, new in v1.4.0, SigLIP2. Classification training gained its own augmentation pack in v1.4.0: auto_augment, erasing, mixup and cutmix.

Family	Checkpoints	Input	Weights	Fine-tune	ONNX export
LibreMobileNetV4	LibreMobileNetV4{s,m,l}-cls.pt	224 / 224 / 256	Apache-2.0 ImageNet-1k (production)	Cross-entropy	Yes
LibreDINOv2 (classify)	Not publicly shipped in v1.4.0	224	Fresh linear head	Linear probe	Yes

LibreMobileNetV4 (production classifier)

Apache-2.0 ImageNet-1k weights

A native MobileNetV4-conv port (derived from timm) whose 1000-class ImageNet-1k weights load bit-identically. Sizes s / m run at 224, l at 256. Checkpoints:

LibreMobileNetV4s-cls.pt, LibreMobileNetV4m-cls.pt, LibreMobileNetV4l-cls.pt

Load and predict. A single image returns one Results; read .probs directly off it (pass a list to get a list back).

python

from libreyolo import LibreYOLO

# MobileNetV4-conv-Small, Apache-2.0 ImageNet-1k weights (auto-downloaded if missing)
model = LibreYOLO("LibreMobileNetV4s-cls.pt")
result = model("cat.jpg")            # single image -> one Results

probs = result.probs                 # whole-image class vector, length = num classes
print(probs.top1, probs.top1conf)    # top-1 class id (int) and its confidence
print(probs.top5, probs.top5conf)    # 5 class ids and 5 confidences
print(result.names[probs.top1])      # human-readable class name

Fine-tune to a custom class set (ImageFolder layout). The head is rebuilt to the dataset class count automatically; the ImageNet-pretrained backbone transfers cleanly.

python

from libreyolo import LibreMobileNetV4

model = LibreMobileNetV4(size="s")   # ImageNet-pretrained backbone
model.train(
    data="imagenette160",            # known name, dataset root, or .zip URL
    epochs=5,
    batch=64,
    lr0=1e-3,                        # AdamW + cosine, 1-epoch warmup
    imgsz=224,
)

Validate (top-1 / top-5 accuracy):

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreMobileNetV4s-cls.pt")
metrics = model.val(data="imagenette160")
print(metrics["metrics/accuracy_top1"])
print(metrics["metrics/accuracy_top5"])

Export to ONNX (verified bit-exact against eager). The ONNX graph emits a single logits tensor.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreMobileNetV4s-cls.pt")
path = model.export(format="onnx", imgsz=224)   # single output: logits [batch, num_classes]

# Interop note: the ONNX output is RAW LOGITS, not softmaxed. The PyTorch
# predict path applies softmax for you; non-Python consumers must apply it
# themselves before reading probabilities.
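
The softmax a consumer needs to apply to the raw logits is short enough to port to any language. The sketch below shows the numerically stable form in plain Python; the logits are made-up values standing in for one row of the ONNX output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logits vector."""
    peak = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]                        # made-up row of raw logits
probs = softmax(logits)
top1 = max(range(len(probs)), key=probs.__getitem__)
print(top1, round(probs[top1], 3))
```

Subtracting the maximum before exponentiating does not change the result but keeps large logits from overflowing, which matters in fixed-width float runtimes.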

LibreDINOv2 classify (linear probe / transfer)

No public v1.4.0 checkpoints | ONNX export

A DINOv2-S encoder with a trainable linear head, run at 224. The n / s / m / l sizes control the projector width; all four share the same DINOv2-S encoder. LibreYOLO v1.4.0 does not publicly host LibreDINOv2*-cls.pt checkpoints. Build a fresh classifier and train it on your own ImageFolder dataset.

Build a fresh model with task="classify", train the new head, then use the same Probs prediction surface as MobileNetV4.

python

from libreyolo import LibreDINOv2

# Fresh DINOv2 backbone + random linear head, sized to the dataset
model = LibreDINOv2(size="s", task="classify", nb_classes=3)
model.train(data="path/to/imagefolder", epochs=5, lr=1e-4, batch=4)

# Validate the same way (top-1 / top-5)
metrics = model.val(data="path/to/imagefolder")
print(metrics["metrics/accuracy_top1"])

result = model("springer.jpg")
print(result.probs.top1, result.probs.top1conf)

model.export(format="onnx", imgsz=224)

Dataset layout (both families)

Classification uses an ImageNet-style ImageFolder tree (folders, not label files). Class index is assigned by sorted folder name. data= accepts a dataset root, a known name (e.g. imagenette160), or a .zip URL.

text

dataset_root/
  train/                # required; one subfolder per class
    class_a/img001.jpg
    class_a/img002.jpg
    class_b/img003.jpg
  val/                  # required for validation; same class folders as train
    class_a/img010.jpg
    class_b/img011.jpg
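
Because class indices come from sorted folder names, it is worth checking the mapping before training, especially when folder names mix cases or numbers. The sketch below builds a throwaway tree and derives the mapping with the standard library only; `class_index` is a hypothetical helper and assumes plain lexicographic sorting.

```python
import os
import tempfile


def class_index(split_dir):
    """Map class folder names to ids in sorted order."""
    folders = sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )
    return {name: idx for idx, name in enumerate(folders)}


with tempfile.TemporaryDirectory() as root:
    for name in ("zebra", "cat", "dog"):
        os.makedirs(os.path.join(root, "train", name))
    mapping = class_index(os.path.join(root, "train"))

print(mapping)   # {'cat': 0, 'dog': 1, 'zebra': 2}
```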

Results.probs reference

python

probs = result.probs        # Probs payload, 1-D vector of length = num classes
probs.data                  # raw tensor / ndarray of class probabilities
probs.top1                  # int   - argmax class id
probs.top5                  # list  - 5 class ids, highest first
probs.top1conf              # float - confidence of the top-1 class
probs.top5conf              # 5 confidences, aligned with probs.top5

Zero-shot classification: SigLIP2 and CLIP

SigLIP2 new in v1.4.0 | Inference-only

Zero-shot classifiers score an image against text labels you choose at runtime: no training, no fixed class list. v1.4.0 adds LibreSigLIP2 (sizes b16 and so400m, native torch port) alongside the existing CLIP family. Both load through the factory and set their vocabulary with set_classes(); SigLIP2 needs pip install "libreyolo[siglip2]" for its SentencePiece tokenizer.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreSigLIP2b16-cls.pt")
model.set_classes(["a forklift", "an empty aisle", "a spill"])

result = model.predict("warehouse.jpg")
print(model.names[result.probs.top1], float(result.probs.top1conf))

# Independent per-label probabilities (sigmoid) instead of softmax
result = model.predict("warehouse.jpg", multi_label=True)

SigLIP2's sigmoid training objective makes its multi_label=True scores meaningful on their own, which CLIP-style softmax scores are not: use it when several labels can be true at once. train() raises for both zero-shot families.
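
The difference between the two scoring modes shows up as soon as more than one label applies. The sketch below runs softmax and sigmoid over the same made-up logits; it illustrates the arithmetic only and says nothing about how either model produces its logits.

```python
import math


def softmax(logits):
    """Scores that compete: they always sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Scores that are independent: each label is judged on its own."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


# Made-up logits: two labels clearly apply to the image, one clearly does not.
logits = [4.0, 4.0, -4.0]
shared = softmax(logits)
independent = sigmoid(logits)
print([round(p, 3) for p in shared])        # [0.5, 0.5, 0.0]
print([round(p, 3) for p in independent])   # [0.982, 0.982, 0.018]
```

Under softmax the two true labels are forced to split the probability mass, so neither looks confident; under sigmoid both can score high at once.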

MobileNetV4 weights are production grade (Apache-2.0 ImageNet-1k, bit-identical load). LibreDINOv2 classify has no publicly hosted v1.4.0 checkpoint; train a fresh head.
There is no LibreRFDETR classifier since v1.3.0. Classification moved into the dedicated classifier families; legacy LibreRFDETR*-cls checkpoints are rejected on load.
ONNX classify output is raw logits. Apply softmax in non-Python consumers.
Predicting a single image returns one Results. Read result.probs directly, or pass a list and index the list: model(["a.jpg"])[0].probs.
New in v1.4.0: square_resize together with augment now raises instead of silently misbehaving, and the classifier families train multi-GPU via the spawn path.

Depth Estimation

ZipDepth + Depth Anything 3 new in v1.4.0 | Inference and val only

Monocular depth predicts a dense relative inverse-depth map: higher values are closer to the camera, with no metric unit implied. v1.4.0 has three depth families behind one API, differing mainly in size, license and target hardware:

Family	Sizes (input)	License	Checkpoints
LibreDepthAnythingV2	s / b / l / g (518)	s Apache-2.0; b / l / g CC-BY-NC-4.0	LibreDepthAnythingV2s-depth.pt, LibreDepthAnythingV2b-depth.pt, LibreDepthAnythingV2l-depth.pt; g converts from upstream
LibreDepthAnything3 (new)	l (504)	Apache-2.0	LibreDepthAnything3l-depth.pt
LibreZipDepth (new)	b / bnpu (384)	MIT	LibreZipDepthb-depth.pt, LibreZipDepthbnpu-depth.pt

LibreDepthAnything3 is a separate family from V2 (not an upgrade in place), with a single Apache-2.0 large checkpoint: the quality pick when the license matters. LibreZipDepth is the efficiency pick: MIT-licensed, 384 px, with a bnpu variant whose decoder avoids NPU-hostile ops for edge accelerators.

Run depth estimation

For Depth Anything V2, imgsz must be divisible by 14 (the DINOv2 patch grid). The depth map is returned on the original image canvas.

python

from libreyolo import LibreYOLO

# ViT-S encoder, Apache-2.0 weights (commercial use OK)
model = LibreYOLO("LibreDepthAnythingV2s-depth.pt")
result = model("street.jpg")

depth = result.depth_map          # DepthMap payload, (H, W) float on the original canvas
print(depth.data.shape)           # (H, W)
print(depth.min, depth.max, depth.mean)   # relative inverse depth: higher = closer
norm = depth.normalized()         # rescaled to [0, 1] over finite values

DepthMap API

python

depth = result.depth_map
depth.data          # (H, W) float tensor / ndarray, relative inverse depth
depth.min           # min over finite values
depth.max           # max over finite values
depth.mean          # mean over finite values
depth.normalized()  # (H, W) rescaled to [0, 1]; non-finite pixels become 0

depth.cpu()
depth.numpy()
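The behavior described for normalized() can be sketched in plain Python as a min-max rescale over the finite values. This illustrates the documented contract only, not the library's implementation; the helper name normalize_depth and the handling of a constant map are assumptions.

```python
import math


def normalize_depth(rows):
    """Min-max rescale finite values to [0, 1]; non-finite pixels become 0."""
    finite = [v for row in rows for v in row if math.isfinite(v)]
    if not finite:
        return [[0.0 for _ in row] for row in rows]
    lo, hi = min(finite), max(finite)
    span = hi - lo
    if span == 0:
        # Assumption: a constant map has no contrast, so return all zeros.
        return [[0.0 for _ in row] for row in rows]
    return [
        [(v - lo) / span if math.isfinite(v) else 0.0 for v in row]
        for row in rows
    ]


print(normalize_depth([[1.0, 3.0], [float("nan"), 2.0]]))
```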

Zero-shot validation

Validation runs zero-shot through the shared depth validator and reports standard depth metrics (AbsRel, RMSE, and delta thresholds). The validator letterboxes each image to a fixed square and excludes the padded pixels, whereas predict uses Depth Anything's native keep-aspect resize. On non-square images, val metrics are therefore a documented approximation of what predict produces.

python

metrics = model.val(data="depth_dataset.yaml")
print(metrics["metrics/abs_rel"])   # absolute relative error (lower is better)
print(metrics["metrics/rmse"])      # root mean squared error
print(metrics["metrics/delta1"])    # fraction within a 1.25x ratio (higher is better)
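For reference, the three metrics have standard textbook definitions, sketched below over flat lists of paired values. This is not the shared validator: how it aligns relative predictions to ground truth before scoring is not covered here, and the helper name depth_metrics is hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """AbsRel, RMSE and delta1 over paired, strictly positive depth values."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixel pairs to score")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # delta1: share of pixels whose prediction is within a 1.25x ratio of truth.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


print(depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```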

Export (new in v1.4.0)

Depth export was unblocked in v1.4.0 for Depth Anything V2 and ZipDepth under a fixed-resolution, batch-1 contract: the exported graph bakes in one input size and has no dynamic axes. Depth Anything 3 does not export yet.

python

model = LibreYOLO("LibreZipDepthb-depth.pt")
model.export(format="onnx")   # fixed resolution, batch 1

Not supported

python

model.train(data="...")   # raises NotImplementedError - all depth families are inference + val only

Depth Anything V2 licensing is split: size s is Apache-2.0 and fine for commercial use; b / l / g are CC-BY-NC-4.0 (non-commercial). For commercial use pick V2 size s, Depth Anything 3, or ZipDepth.
Depth is relative inverse depth with no metric unit. Calibrate on your side if you need meters.
For Depth Anything V2, imgsz must be divisible by 14 (the DINOv2 patch grid). Batched predict is disabled because keep-aspect resize yields variable per-image sizes.
Video input works for depth models as of v1.4.0 (it crashed in v1.3.1).
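If you do need meters, one common approach is to fit a scale and shift from a few pixels whose true distance you know. The sketch below assumes the model output is roughly affine in inverse metric depth, which is an assumption about your scene and model, not something LibreYOLO guarantees. The reference values and helper names are invented.

```python
def fit_scale_shift(pred, inv_metric):
    """Least-squares a, b such that a * pred + b approximates inv_metric."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(inv_metric) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, inv_metric))
    a = cov / var_p
    b = mean_t - a * mean_p
    return a, b


# Hypothetical reference pixels: model output vs. measured distance in meters.
pred = [0.2, 0.5, 1.0]
meters = [10.0, 4.0, 2.0]
a, b = fit_scale_shift(pred, [1.0 / m for m in meters])


def to_meters(value):
    """Convert one relative inverse-depth value using the fitted a, b."""
    return 1.0 / (a * value + b)


print(to_meters(0.5))
```

The fit is only as good as the reference points: refit per camera, and check it against distances you did not use for fitting.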

Restoration & Upscaling

SwinIR + Real-ESRGAN new in v1.4.0 · NAFNet trainable

The restore task takes a degraded image and returns a better one. Unlike most tasks here there is nothing to detect: the output is an image, returned as result.restored. In v1.4.0 the task covers two jobs: cleaning (denoise / deblur, output at input resolution) and, new in this release, super-resolution (output 2x or 4x larger per axis; result.restore_scale tells you the factor).

Family	Job	Sizes	Output scale	Train?
LibreNAFNet	denoise / deblur	s / l	1x	Yes
LibreSwinIR (new)	super-resolution	s / m / l	4x	No
LibreRealESRGAN (new)	super-resolution	x4 / x2 / x4t	4x / 2x / 4x (fast)	No

What a cleaning model actually fixes, whether it denoises or deblurs, is a property of the weights it was trained on, not of the model size. For super-resolution the scale is baked into the checkpoint: there is no scale argument at predict time.

Checkpoints

NAFNet publishes one checkpoint: LibreNAFNetl-restore-sidd.pt, a real-image denoiser trained on SIDD, converted bit-exactly from upstream NAFNet, MIT licensed. For deblurring there is no published checkpoint: convert the upstream GoPro weights with weights/convert_nafnet_weights.py. The plain names LibreNAFNets-restore.pt and LibreNAFNetl-restore.pt are not hosted, so asking for them will fail to download.

Super-resolution ships fully hosted: SwinIR LibreSwinIRs-restore.pt, LibreSwinIRm-restore.pt, LibreSwinIRl-restore.pt (Apache-2.0 code and weights) and Real-ESRGAN LibreRealESRGANx4-restore.pt, LibreRealESRGANx2-restore.pt, LibreRealESRGANx4t-restore.pt. The x4t size is the compact SRVGG variant: much faster, visibly softer.

Clean up an image

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreNAFNetl-restore-sidd.pt")   # SIDD denoiser
result = model("noisy.jpg")

img = result.restored           # RestoredImage
print(img.array.shape)          # (H, W, 3) uint8 RGB, at the original resolution
print(result.restore_scale)     # 1 for cleaning models
img.save("clean.png")           # save lossless

Cleaning runs at the image's native resolution: the input is padded to a multiple of 16 and cropped back afterwards, so you get the same size out that you put in.
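The pad-and-crop arithmetic is easy to check by hand. This sketch only computes sizes; the helper name pad_amounts is hypothetical, and how the library fills the padded border is not shown.

```python
def pad_amounts(height, width, multiple=16):
    """Extra rows / columns needed to reach the next multiple (0 if aligned)."""
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    return pad_h, pad_w


h, w = 1080, 1920
pad_h, pad_w = pad_amounts(h, w)
padded = (h + pad_h, w + pad_w)                     # size the network sees
restored = (padded[0] - pad_h, padded[1] - pad_w)   # size after cropping back
print(padded, restored)
```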

Upscale an image (new in v1.4.0)

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreSwinIRm-restore.pt")        # 4x super-resolution
result = model("small.jpg")
print(result.restore_scale)         # 4
result.restored.save("big.png")     # 4x height, 4x width

# Real-ESRGAN: seam-free tiled upscaling for large inputs
model = LibreYOLO("LibreRealESRGANx4-restore.pt")
result = model("photo.jpg", tile=512)   # process in 512px tiles, bounded VRAM

Save losslessly, or you undo the work

This is the one thing to get right. libreyolo predict --save writes JPEG by default, which re-introduces compression artefacts into an image you just spent a model cleaning up. Ask for PNG.

bash

libreyolo predict model=LibreNAFNetl-restore-sidd.pt source=noisy.jpg \
  save=true output-file-format=png

Train and validate (NAFNet)

NAFNet training takes paired degraded and clean images; SwinIR and Real-ESRGAN are inference and validation only. Validation reports PSNR and SSIM. New in v1.4.0, restoration training applies coupled vertical flip and 90-degree rotation by default (input and target transformed together), so retrained results move slightly versus v1.3.1; see Data Augmentation.

python

model = LibreYOLO("LibreNAFNetl-restore-sidd.pt")
model.train(data="gopro.yaml", epochs=100)

metrics = model.val(data="gopro.yaml")
print(metrics["metrics/psnr"], metrics["metrics/ssim"])

Two reporting quirks to expect while training: the console prints PSNR under the mAP50 column heading (a labelling bug, the number is PSNR), and PSNR/SSIM are computed with no border crop, so they are not directly comparable to published NAFNet benchmark figures.
Export: NAFNet supports ONNX (static shapes, imgsz multiple of 16) and TorchScript. SwinIR supports ONNX experimentally and TorchScript. Real-ESRGAN supports ONNX, TorchScript, NCNN, and TFLite. NAFNet TFLite and CoreML remain blocked.
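As a reminder of what the PSNR number means, here is the standard definition over two flat pixel sequences. It is a sketch, not the validator's code: it applies no border crop, and the 8-bit peak of 255 is an assumption about the data range.

```python
import math


def psnr(restored, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB over two equally sized pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([255, 0], [0, 0]))
```

Benchmark protocols that crop a border before scoring average over a different set of pixels, which is why figures computed without a crop are not directly comparable.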

Background Removal

New in v1.4.0 · Inference and val only

The matte task predicts a per-pixel alpha value in [0, 1]: how much each pixel belongs to the foreground subject. Unlike a binary segmentation mask, a matte captures soft edges (hair, fur, motion blur), which is what makes cutouts look right. v1.4.0 ships LibreBiRefNet, a BiRefNet port at 1024 px in sizes t and l. The LibreBiRefNetl-matte.pt weights are hosted; the t (lite) weights are not rehosted yet pending license confirmation, so convert them locally if you need the small one.

Cut out a subject

python

from libreyolo import LibreYOLO
from PIL import Image

model = LibreYOLO("LibreBiRefNetl-matte.pt")
result = model("portrait.jpg")

matte = result.matte            # Matte payload
print(matte.data.shape)         # (H, W) float32 alpha in [0, 1], original canvas

# RGBA cutout: original pixels with the matte as the alpha channel
rgba = result.cutout()          # (H, W, 4) uint8

Image.fromarray(rgba).save("subject.png")   # transparent background

# Or composite yourself
alpha = matte.array[..., None]  # (H, W, 1)
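Compositing yourself comes down to the usual alpha blend, shown here for a single RGB pixel in plain Python. The helper name composite and the pixel values are illustrative; on arrays, the same expression applies elementwise with the (H, W, 1) alpha from the block above.

```python
def composite(fg, bg, alpha):
    """Blend one RGB pixel: alpha * foreground + (1 - alpha) * background."""
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))


subject = (200, 100, 0)      # hypothetical foreground pixel
backdrop = (0, 0, 200)       # hypothetical new background pixel

print(composite(subject, backdrop, 1.0))   # fully foreground
print(composite(subject, backdrop, 0.5))   # soft edge, e.g. hair
print(composite(subject, backdrop, 0.0))   # fully background
```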

save=True writes the matte overlay; on video sources the overlay renders per frame (matte video overlays work as of v1.4.0). Validation runs through MatteValidator against ground-truth alpha maps.

Limits

Inference and validation only. Matte training raises in v1.4.0. Export supports experimental ONNX and fixed-1024 TorchScript; NCNN remains blocked.
Checkpoints written with the matte task string are not loadable by v1.3.1.
Save the cutout as PNG or WebP. JPEG has no alpha channel, so saving a cutout as JPEG silently flattens it.

OCR

New in v1.4.0 · Inference and val only

The ocr task reads text: a detection stage finds text regions as four-point polygons, then a recognition stage transcribes each one. v1.4.0 ships LibrePPOCR, a PP-OCRv5 port at 960 px in sizes t and l: LibrePPOCRt-ocr.pt, LibrePPOCRl-ocr.pt. Results arrive as result.ocr, an OCRRegions payload pairing every polygon with its text and two confidences (one from the detector, one from the recognizer).

Read text from an image

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibrePPOCRt-ocr.pt")
result = model("receipt.jpg", save=True)

ocr = result.ocr                 # OCRRegions
print(ocr.polygons.shape)        # (N, 4, 2) quad corners in pixels
for text, conf in zip(ocr.texts, ocr.conf):
    print(f"{conf:.2f}  {text}")

ocr.det_conf                      # (N,) detector scores, separate from recognition
print(result.saved_path)          # saved OCR overlay

CLI and validation

libreyolo predict --json emits an ocr array (polygon, text, confidences per region), which makes the CLI directly scriptable for document pipelines. Validation runs through OCRValidator, which matches predictions to ground truth with an optimal one-to-one assignment before scoring.

bash

libreyolo predict model=LibrePPOCRt-ocr.pt source=receipt.jpg --json | jq .ocr

Limits

Inference and validation only. OCR training and export raise in v1.4.0.
Checkpoints written with the ocr task string are not loadable by v1.3.1.
On an OCR model, result.boxes is None: regions are polygons in result.ocr, not axis-aligned boxes.
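If downstream code expects axis-aligned boxes, you can derive one from each quad yourself. A minimal sketch, assuming each polygon is four (x, y) pairs as in ocr.polygons; the helper name quad_to_box and the coordinates are hypothetical.

```python
def quad_to_box(quad):
    """Axis-aligned (x1, y1, x2, y2) box that encloses a four-point polygon."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical slightly rotated text region, corners in pixels.
quad = [(10, 20), (50, 25), (48, 40), (8, 35)]
print(quad_to_box(quad))
```

The enclosing box is looser than the quad for rotated text, so keep the polygon when you need a tight crop.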

Point Localization

Experimental

LibreFOMO is a FOMO-style point localizer (sizes s / m / l) for centroid-style detection: instead of boxes, each detection is a single image coordinate. Predictions arrive as result.points. Pretrained LibreFOMO weights are not auto-downloaded, so pass a local checkpoint path (or train from scratch, which is experimental and requires allow_experimental=True). New in v1.4.0: FOMO exports to ONNX under the fixed-resolution contract, so a trained point model can leave Python for edge deployment.

python

from libreyolo import LibreYOLO

# LibreFOMO weights are not hosted by LibreYOLO - pass a local checkpoint
model = LibreYOLO("path/to/LibreFOMOm-point.pt")
result = model("scene.jpg")

points = result.points       # Points payload, (N, 4) rows: x, y, class, confidence
print(points.xy)             # (N, 2) absolute pixel coords
print(points.xyn)            # (N, 2) normalized to [0, 1]
print(points.cls, points.conf)

Annotation (LibreLabel)

libreyolo label starts a local, browser-based annotation tool. It writes LibreYOLO-native label files exactly where the trainer already reads them, so a folder of images becomes a trainable dataset with no conversion step, no cloud account and no database. The server is Python standard library only, and it runs entirely on your machine.

Label a folder of images

bash

# open an existing dataset
libreyolo label data=path/to/data.yaml

# a bare folder works too: LibreYOLO scaffolds the dataset around it
libreyolo label data=path/to/images

# start on the project home screen and create a project in the browser
libreyolo label

Options

Option	Default	What it does
data	(none)	Dataset YAML or a folder. Omit to open the project home screen.
host	127.0.0.1	Interface to bind. See the sharing note below before changing this.
port	8000	Port to bind. Auto-bumps up to port+19 if taken.
device	auto	Device used by the AI assist features.
no_assist	false	Hard-disable every AI assist feature.
no_browser	false	Do not auto-open a browser.
share	false	Bind 0.0.0.0 so teammates on your LAN can label with you.
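The port auto-bump in the table can be pictured as a bounded scan. This is a sketch of the documented behavior, not LibreLabel's code: pick_port and the is_free predicate are hypothetical, and a real probe would try to bind a socket.

```python
def pick_port(start, is_free, attempts=20):
    """Return the first free port in [start, start + attempts - 1]."""
    for port in range(start, start + attempts):
        if is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


# Pretend ports 8000-8002 are taken (stand-in for a real socket probe).
taken = {8000, 8001, 8002}
print(pick_port(8000, lambda port: port not in taken))
```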

What you can label

Bounding boxes (detect), polygons (segment) and oriented boxes (obb, with a rotate handle). Keypoints, masks and depth files open read-only, so a save can never silently drop fields it does not understand. Classification labelling is not available yet.

AI assist, and the one rule it never breaks

LibreLabel can pre-label with one of your own detectors, turn a click into a mask with SAM, audit your existing labels for likely mistakes, find near-duplicate images, and detect train/val leakage. No AI path ever writes a label file. Every suggestion is held in memory until a human accepts it. AI assist also never downloads weights: if a checkpoint is not already on disk it refuses and tells you, rather than pulling hundreds of megabytes behind your back.

Box pre-labelling with any in-package detector works on the base install, no extra needed.
SAM click-to-mask needs pip install "libreyolo[label]" and the LibreSAM weights already downloaded.
Assist is task-aware: on an OBB project it is refused entirely, and on a segmentation project only the mask tools stay available.

Export

Export to YOLO, COCO or VOC (or several at once) from the Export dialog in the browser, with reproducible train/val/test splits. Note it is a browser action: there is no CLI export flag. Import is YOLO only, so COCO and VOC are export formats, not entry points.

Sharing, and a trap worth knowing

There is no authentication of any kind. Access is controlled purely by network position, so only share on a network you trust.

The counter-intuitive part: share=true is the safe way to let teammates in. It binds a wildcard address, and because admin rights require a loopback connection, you keep admin on your machine while teammates get a labelling-only view. Binding a specific address instead (host=192.168.1.50) makes your machine indistinguishable from a teammate, which hands full admin to every client on the LAN. Prefer share=true.
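The loopback rule behind share=true can be illustrated with the standard library. This sketches the idea only: is_admin is a hypothetical name, and LibreLabel's real check may differ.

```python
import ipaddress


def is_admin(peer_address: str) -> bool:
    """Hypothetical rule: only connections arriving over loopback are admin."""
    return ipaddress.ip_address(peer_address).is_loopback


print(is_admin("127.0.0.1"))      # your own browser on the host machine
print(is_admin("192.168.1.23"))   # a teammate on the LAN
```

With a wildcard bind your own browser still reaches the server over 127.0.0.1, so a rule like this can tell you apart from everyone else.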

Training

v1.4.0 validation scope

The heavily tested path is detection, training and inference for YOLO9 and RF-DETR, including RF-DETR segmentation.

The heavily tested training paths are single-GPU YOLO9 detection, RF-DETR detection, and RF-DETR segmentation. Other model-family trainers and multi-GPU workflows are available but experimental. YOLO9 is detect-only, so there is no YOLO9 segmentation or pose training. New in v1.4.0: YOLOv7 trains (SimOTA loss), SegFormer fine-tunes, and every augmentation knob is documented per family in the Data Augmentation section.

YOLO9 - CNN flagship training

python

from libreyolo import LibreYOLO

# Fine-tune from a pretrained checkpoint (recommended)
model = LibreYOLO("LibreYOLO9c.pt")

results = model.train(
    data="coco128.yaml",     # path to data.yaml (required)

    # Schedule
    epochs=300,              # default: 300
    batch=16,
    imgsz=640,

    # Optimizer
    lr0=0.01,                # initial learning rate
    optimizer="SGD",         # "SGD", "Adam", "AdamW"

    # System
    device="0",              # "" | "cpu" | "cuda" | "0" | "0,1"
    workers=8,
    seed=0,

    # Output
    project="runs/train",
    name="yolo9_exp",
    exist_ok=False,

    # Training features
    amp=True,                # automatic mixed precision
    patience=50,             # early stopping patience
    resume=False,            # resume from loaded checkpoint
    pretrained=True,         # transfer-learning init (True, a path, or None)
    cache="disk",            # cache decoded images: False | True/"ram" | "disk"
    freeze=10,               # freeze first N groups, or a list of indices / module names
    save_plots=True,         # write final validation plots to the run dir
)

print(f"Best mAP50-95: {results['best_mAP50_95']:.3f}")
print(f"Best checkpoint: {results['best_checkpoint']}")

After training completes, the model instance is automatically reloaded with the best weights so you can call model(...) immediately. freeze, cache, pretrained, and save_plots are accepted across the trainer-backed families.

RF-DETR - transformer flagship training

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreRFDETRs.pt")

results = model.train(
    data="path/to/data.yaml",
    epochs=100,
    batch_size=4,            # NOTE: RF-DETR uses batch_size, not batch
    lr=1e-4,
    output_dir="runs/train/rfdetr_exp",
)

RF-DETR has its own training signature (batch_size, lr, output_dir) but shares LibreYOLO's dataset loader. Pass a data.yaml for detection or segmentation in either YOLO TXT or native COCO JSON layout - see Dataset Format.

LoRA fine-tuning

Experimental

lora=True injects LoRA adapters for low-VRAM fine-tuning: only the adapters (plus the parts that must stay trainable, like detection heads) receive gradients. It requires the optional peft dependency (pip install "libreyolo[lora]"). v1.4.0 extends LoRA well beyond RF-DETR: the supported families are RF-DETR, D-FINE, DEIM, DEIMv2, RT-DETR v1 / v2 / v4, EC, and ConvNeXt (D-FINE and EC detect-only). Unsupported families still raise a clear error rather than ignoring the flag. On export(), adapters are merged into the dense weights, so deployed artifacts need no peft at runtime.

python

model = LibreYOLO("LibreRFDETRs.pt")
results = model.train(data="data.yaml", epochs=50, lora=True)

# Works the same on the newly supported families
model = LibreYOLO("LibreDEIMs.pt")
results = model.train(data="data.yaml", epochs=50, lora=True)

Experiment loggers

Pass loggers= to stream metrics to TensorBoard, MLflow, or Weights & Biases. It accepts a name ("tensorboard", "mlflow", "wandb"), a configured logger instance, or an iterable mixing both. Each backend is an optional extra (libreyolo[tensorboard], libreyolo[mlflow], libreyolo[wandb]).

python

from libreyolo import LibreYOLO
from libreyolo.training.loggers import MLflowLogger

model = LibreYOLO("LibreYOLO9c.pt")

# By name
model.train(data="coco128.yaml", loggers="tensorboard")

# Mix configured instances and names
model.train(
    data="coco128.yaml",
    loggers=[MLflowLogger(experiment_name="my-exp"), "tensorboard"],
)

Loggers are a Python-API feature only. There is no CLI flag for them; the rest of the new training knobs (--task, --cache, --lora, --freeze, --save-plots) are exposed on the CLI.

Training results dict

python

{
    "final_loss": 2.31,
    "best_mAP50": 0.682,
    "best_mAP50_95": 0.451,
    "best_epoch": 87,
    "save_dir": "runs/train/yolo9_exp",
    "best_checkpoint": "runs/train/yolo9_exp/weights/best.pt",
    "last_checkpoint": "runs/train/yolo9_exp/weights/last.pt",
}

Resuming training

python

# Load the checkpoint with the factory, then resume
model = LibreYOLO("runs/train/yolo9_exp/weights/last.pt")
results = model.train(data="coco128.yaml", resume=True)

Custom dataset YAML format

data.yaml

path: /path/to/dataset
train: images/train
val: images/val
test: images/test  # optional

nc: 3
names: ["cat", "dog", "bird"]

Additional training paths

Other families have trainer hooks, but they are not the recommended path in v1.4.0. Keep new work on YOLO9 detection or RF-DETR detection/segmentation; use experimental trainers only for compatibility, benchmark reproduction, or targeted research. PicoDet, RTMDet, and EC training require an explicit allow_experimental=True acknowledgement. Note that v1.4.0 fixed harmful fine-tune defaults for PicoDet (lr0 from 0.1 to 0.01) and DEIM (lr0 from 4e-4 to 1e-4): pass the old values explicitly if you need to reproduce upstream COCO recipes.

Training from a YAML config

Every model.train(...) accepts cfg="train.yaml" to load all parameters from a file. Explicit kwargs still win over yaml values, so you can use a yaml for the baseline and override individual fields per run.

python

model = LibreYOLO("LibreYOLO9c.pt")
results = model.train(cfg="configs/yolo9_finetune.yaml")
# Override individual fields:
# results = model.train(cfg="configs/yolo9_finetune.yaml", epochs=50)
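The precedence rule amounts to a dictionary merge in which explicit keyword arguments are applied last. A sketch of the idea, not the trainer's actual resolution code; resolve_train_args and the sample values are hypothetical.

```python
def resolve_train_args(yaml_values, **overrides):
    """Explicit keyword arguments win over values loaded from the YAML file."""
    return {**yaml_values, **overrides}


baseline = {"epochs": 300, "batch": 16, "lr0": 0.01}   # as if read from train.yaml
resolved = resolve_train_args(baseline, epochs=50)
print(resolved)
```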

Gradient accumulation

Pass nbs (nominal batch size) to opt into gradient accumulation. The trainer steps the optimizer every nbs / batch forward passes, which lets you train at the recipe's reference batch size on smaller hardware.

python

# Effective batch 64 on a single GPU that only fits batch=8
model.train(data="coco128.yaml", batch=8, nbs=64)
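The arithmetic behind nbs is simple enough to verify before a run. In this sketch the rounding and the floor of 1 are assumptions about how the trainer handles non-integer ratios, and accumulation_steps is a hypothetical name.

```python
def accumulation_steps(nbs: int, batch: int) -> int:
    """Forward passes per optimizer step (rounded, never below 1)."""
    return max(round(nbs / batch), 1)


batch = 8
steps = accumulation_steps(nbs=64, batch=batch)
effective_batch = steps * batch
print(steps, effective_batch)
```

If batch already meets or exceeds nbs, the result is 1 and the optimizer steps on every forward pass.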

Distributed training (DDP, overhauled in v1.4.0)

Multi-GPU training got a correctness overhaul in v1.4.0. It is still outside the heavily tested scope, but the failure modes changed from silent to loud:

Correct sharding everywhere. DEIM, D-FINE and YOLO-NAS pose previously trained the full dataset at the full batch on every rank (so extra GPUs bought nothing); they now shard correctly, and loss normalizers are all-reduced globally so gradients match single-GPU training.
SyncBatchNorm defaults on under DDP for the BatchNorm-heavy families (YOLO9, YOLOX, YOLOv7, YOLO-NAS, PicoDet, RTMDet, FOMO), fixing a real multi-GPU convergence degradation from per-rank BN statistics.
Hard setup errors instead of silently wrong runs: a global batch that does not divide by the world size, a per-rank batch below 1 after AutoBatch, or a custom loader that does not shard now raise at setup.
Spawn-path multi-GPU reaches the classifier families (ResNet, ConvNeXt, EfficientNetV2, MobileNetV4) and NAFNet: pass device="0,1" and workers are spawned for you, no torchrun needed.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
# Two GPUs: the global batch is split per rank (16 -> 8 + 8)
model.train(data="coco128.yaml", epochs=300, batch=16, device="0,1")
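The batch split and the setup error for indivisible batches can be sketched as a small check. It mirrors the behavior described above but is not the trainer's code; per_rank_batch and the error messages are hypothetical.

```python
def per_rank_batch(global_batch: int, world_size: int) -> int:
    """Split a global batch evenly across ranks, or fail loudly at setup."""
    if global_batch % world_size != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by world size {world_size}"
        )
    per_rank = global_batch // world_size
    if per_rank < 1:
        raise ValueError("per-rank batch must be at least 1")
    return per_rank


print(per_rank_batch(16, 2))   # the 16 -> 8 + 8 split from the example above
```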

device="0,1" (or a list [0, 1]) selects multi-GPU. Under torchrun the launcher owns the process group; outside it, LibreYOLO spawns DDP workers itself. Both paths run the same trainer.

bash

# Explicit torchrun launch also works
torchrun --nproc_per_node=2 train_yolo9.py

Data Augmentation

Documented + spec-checked in v1.4.0

Training-time augmentation is configured directly on model.train(): mosaic, MixUp, HSV jitter, flips, and the affine warp (rotation, translation, scale, shear, perspective) are all plain keyword arguments. The same knobs work as key=value pairs on libreyolo train, where mosaic= and mixup= are CLI shorthands for mosaic_prob and mixup_prob.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
model.train(
    data="coco128.yaml",
    epochs=100,

    # Augmentation knobs (example values; base defaults are in the table below)
    mosaic_prob=1.0,       # 4-image mosaic
    mixup_prob=0.5,        # blend in a second sample
    hsv_prob=1.0,          # HSV color jitter
    flip_prob=0.5,         # horizontal flip
    flipud=0.1,            # vertical flip (good for aerial imagery)
    degrees=10.0,          # random rotation range for the affine warp
    translate=0.1,         # random translation fraction
    shear=2.0,             # random shear, degrees
    perspective=0.0005,    # projective warp magnitude (0 = pure affine)
    no_aug_epochs=15,      # final epochs with strong augmentation off
)

bash

libreyolo train model=yolo9-c data=coco128.yaml epochs=100 \
  mosaic=1.0 mixup=0.5 hsv_prob=1.0 flip_prob=0.5 degrees=10 translate=0.1

The augmentation knobs

These are the base TrainConfig fields. The defaults below are the base values: families override them with tuned recipes (YOLO9 defaults to degrees=0, shear=0 and mixup off, while YOLOX keeps all three on, for example). Print the exact resolved defaults for your model with libreyolo cfg.

Knob	Base default	What it does
`mosaic_prob`	1.0	Probability of building a 4-image mosaic sample.
`mixup_prob`	1.0	Probability of blending in a second sample (MixUp).
`hsv_prob`	1.0	Probability of HSV color jitter.
`flip_prob`	0.5	Horizontal-flip probability.
`flipud`	0.0	Vertical-flip probability. Off by default; useful when the scene has no fixed up (aerial, microscopy).
`degrees`	10.0	Random-rotation range for the affine warp, in degrees.
`translate`	0.1	Random-translation fraction for the affine warp.
`mosaic_scale`	(0.1, 2.0)	Random-scale range for the affine warp.
`shear`	2.0	Random-shear range for the affine warp, in degrees.
`perspective`	0.0	Projective warp magnitude, sampled in [-p, +p]; around 0.0005 is typical. 0 keeps the warp purely affine.
`mixup_scale`	(0.5, 1.5)	Jitter-scale range applied to the MixUp partner image.
`no_aug_epochs`	15	Final epochs trained with strong augmentation disabled, so the model converges on clean images.

Which families honor which knobs

Not every family runs every augmentation: each one trains through the pipeline its recipe came with. v1.4.0 makes this explicit with a declarative spec (libreyolo/data/augment/spec.py) that is pinned to the real pipelines by tests. Every knob has one of three statuses per family, and the CLI now warns whenever you explicitly set a parameter the selected family ignores, so a typo or a wrong assumption no longer fails silently.

Status	Meaning
used	The knob reaches the training pipeline and changes samples.
gated by mosaic	The knob only applies to samples that took the mosaic branch; with mosaic_prob=0 it never fires.
ignored	The knob never reaches this family’s pipeline; setting it does nothing (and the CLI warns).

Pipeline	Families	What actually runs
YOLOX-style mosaic	YOLO9, YOLO9-E2E, YOLO9-P2, YOLOX, YOLOv7, RTMDet, PicoDet, RT-DETR, RT-DETRv2, FOMO	HSV jitter and flips run per sample. The affine warp (degrees / translate / mosaic_scale / shear / perspective) and MixUp run on the mosaic canvas only, so they are gated by mosaic_prob. RTMDet, PicoDet, RT-DETR, RT-DETRv2 and FOMO have no vertical flip; FOMO also drops perspective.
YOLO-NAS	YOLO-NAS	No mosaic (mosaic_prob is ignored). Instead a per-sample affine is always on, so degrees / translate / shear / perspective apply directly, and MixUp is independent of mosaic.
DETR-style pass-through	D-FINE, DEIM, DEIMv2, RT-DETRv4, EC	Only flip_prob and no_aug_epochs are yours to tune. Color jitter, zoom-out and IoU-crop are fixed recipe constants, and there is no mosaic, MixUp or affine warp. Exception: EC pose honors hsv_prob, degrees and translate through its keypoint-aware affine.
RF-DETR native	RF-DETR	Flip, scale jitter and random crop from the native recipe; flip_prob and no_aug_epochs are configurable, HSV is not.
Classification	ResNet, ConvNeXt, MobileNetV4, EfficientNetV2, DINOv2 (classify)	The detection knobs never apply (horizontal flip is a fixed 0.5). Use the classification pack below.
Semantic	SegFormer (and the shared semantic pipeline)	Scale jitter and HSV come from family attributes rather than TrainConfig knobs; flip is a fixed 0.5. HSV jitter defaults on as of v1.4.0.
Restoration	NAFNet	Coupled input / target crop, flips and rot90 at fixed probabilities (vertical flip + rot90 new in v1.4.0). TrainConfig knobs are ignored.

Mosaic gating, explained

In the YOLOX-style pipelines, MixUp and the affine warp ride inside the mosaic branch: a sample first becomes a 4-image mosaic (with probability mosaic_prob), and only then is the mosaic canvas warped and optionally blended with another sample. Two practical consequences:

Setting mosaic_prob=0 also turns off MixUp and the affine warp for these families, whatever their knobs say. v1.4.0 warns at training start when mixup_prob > 0 can never fire because mosaic_prob=0.
To train with light augmentation but keep some geometry, lower mosaic_prob rather than zeroing it. To keep mosaic but drop the geometry, switch the warp off explicitly with degrees=0 translate=0 shear=0.

python

# Minimal augmentation: flips only
model.train(
    data="data.yaml",
    mosaic_prob=0.0,   # also disables mixup + affine in mosaic-gated families
    mixup_prob=0.0,
    hsv_prob=0.0,
    flip_prob=0.5,
    no_aug_epochs=0,
)
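The gating can be pictured as nested branches. This toy sketch only records which augmentations would fire for one sample; it is not the real pipeline, it omits HSV jitter, and applied_augmentations is a hypothetical name.

```python
import random


def applied_augmentations(mosaic_prob, mixup_prob, flip_prob, rng):
    """Return the names of the augmentations that fire for one sample."""
    applied = []
    if rng.random() < mosaic_prob:
        applied.append("mosaic")
        applied.append("affine")        # the warp runs on the mosaic canvas
        if rng.random() < mixup_prob:
            applied.append("mixup")     # MixUp is only reachable from here
    if rng.random() < flip_prob:
        applied.append("flip")          # per sample, independent of mosaic
    return applied


rng = random.Random(0)
# With mosaic_prob=0, mixup_prob=1.0 never fires: the branch is unreachable.
print(applied_augmentations(0.0, 1.0, 0.5, rng))
```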

Classification augmentation pack (new in v1.4.0)

The classification ImageFolder pipeline has its own four knobs, all off by default. At most one of MixUp / CutMix runs per batch: MixUp fires with probability mixup, otherwise CutMix with probability cutmix, so the two should sum to at most 1.

Knob	Default	What it does
`auto_augment`	None	Policy name: "randaugment", "autoaugment" or "augmix".
`erasing`	0.0	RandomErasing probability.
`mixup`	0.0	Batch-MixUp probability with soft labels. Python API only: on the CLI, --mixup is the detection mixup_prob alias.
`cutmix`	0.0	Batch-CutMix probability with soft labels.

python

from libreyolo import LibreMobileNetV4

model = LibreMobileNetV4(size="s")
model.train(
    data="imagenette160",
    epochs=20,
    auto_augment="randaugment",
    erasing=0.25,
    mixup=0.2,
    cutmix=0.2,
)
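One way to read the "sum to at most 1" rule is a single random draw per batch, split into three ranges. That reading is an assumption, as is the helper name pick_batch_mix; the pipeline's exact sampling may differ.

```python
import random


def pick_batch_mix(mixup, cutmix, rng):
    """One draw per batch: MixUp, else CutMix, else neither."""
    u = rng.random()
    if u < mixup:
        return "mixup"
    if u < mixup + cutmix:
        return "cutmix"
    return None


rng = random.Random(0)
picks = [pick_batch_mix(0.2, 0.2, rng) for _ in range(10_000)]
print(picks.count("mixup"), picks.count("cutmix"), picks.count(None))
```

With mixup=0.2 and cutmix=0.2 as in the example above, about 60% of batches would pass through unmixed under this reading.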

Task-specific extras

A few knobs live on family TrainConfig subclasses rather than the base config, and are reachable from Python or a training YAML (the CLI does not expose them):

Knob	Families	What it does
`copy_paste`	RF-DETR (segment), YOLO9 lineage	Copy-paste instance augmentation probability for segmentation training: instances are cut out and pasted into the sample.
`copy_paste_mode`	same	Source of pasted instances: "flip" mirrors the same sample; "mixup" pulls a second sample (RF-DETR supports "flip" only).
`rot90`	YOLO9 lineage (OBB path)	Random 90-degree rotation probability for oriented-box training; ignored for axis-aligned detection.
`crop_resize_prob`	RF-DETR, D-FINE (segment), EC (segment)	Random crop-resize probability in the native pipelines.
`brightness_contrast_prob`	YOLO-NAS (pose), EC (pose)	Brightness / contrast jitter probability for keypoint training.
`affine_prob`	YOLO-NAS (pose), EC (pose)	Keypoint-aware affine probability.

Training augmentation vs test-time augmentation

Everything above happens during train(). Test-time augmentation is separate: predict(augment=True) / val(augment=True) run extra augmented forward passes and merge the outputs at inference time. In v1.4.0 TTA covers detection families where implemented, the four semantic families (PIDNet, SegFormer, EoMT, DINOv2), and EoMT panoptic.

Distillation

YOLO9 and YOLOX students · DINOv2 teacher new in v1.4.0

Knowledge distillation trains a small student model against a larger frozen teacher, so the student learns from the teacher's intermediate features on top of its own labels. You get a model that runs at the student's speed but recovers some of the teacher's accuracy. Point distill_model at a teacher checkpoint and distillation turns on.

Distill a big model into a small one

python

from libreyolo import LibreYOLO

student = LibreYOLO("LibreYOLO9t.pt")     # small student

student.train(
    data="coco.yaml",
    epochs=100,
    distill_model="LibreYOLO9c.pt",   # the frozen teacher: this turns distillation ON
    distill_loss_type="mgd",          # "mgd" (default) or "cwd"
    dis=2e-5,                         # global weight; omit to take the per-loss default
)

bash

libreyolo train model=LibreYOLO9t.pt data=coco.yaml epochs=100 \
  distill-model=LibreYOLO9c.pt distill-loss-type=mgd dis=2e-5

During training the distillation term shows up as a distill loss component alongside the usual ones.

Arguments

Argument	Default	Meaning
distill_model	None	Teacher checkpoint path. Setting it enables distillation.
dis	None	Global distillation loss weight. Falls back to 2e-5 for MGD, 1.0 for CWD.
distill_loss_type	"mgd"	Feature loss: "mgd" or "cwd". The DINOv2 foundation teacher uses "feat_mse" instead (see below).
distill_mask_ratio	0.65	MGD only: fraction of spatial positions masked. Python API only.
distill_tau	1.0	CWD only: softmax temperature. Python API only.

Note the short name: the weight argument is dis, not distill_loss_weight. Three of these have CLI flags (distill-model, dis, distill-loss-type); distill_mask_ratio and distill_tau are reachable from Python or a training YAML only.

Foundation-teacher distillation: DINOv2 (new in v1.4.0)

Beyond checkpoint teachers, v1.4.0 adds distill_model="dinov2": the student's backbone features are regressed against a frozen DINOv2 foundation encoder with a feat_mse loss. No teacher checkpoint of your own is needed, which makes this the cheapest way to add a distillation signal to a YOLO9-backbone training run.

python

from libreyolo import LibreYOLO

student = LibreYOLO("LibreYOLO9s.pt")
student.train(
    data="coco128.yaml",
    epochs=100,
    distill_model="dinov2",       # frozen foundation teacher
    distill_loss_type="feat_mse", # feature regression against DINOv2
    distill_normalize=True,       # normalize features before the loss
)

The teacher runs at a DINOv2-compatible resolution internally; v1.4.0 fixed a border-cropping bug at sizes that are not multiples of 14, so odd input sizes distill correctly.

MGD or CWD

MGD (Masked Generative Distillation, the default) masks random spatial positions in the student features and asks it to regenerate the teacher's. Because it regresses raw feature magnitudes, its default weight is small: 2e-5.

CWD (Channel-Wise Distillation) turns each channel into a spatial distribution and matches them with a KL divergence. Normalizing per channel makes it scale invariant, so it copes better when teacher and student feature magnitudes are far apart. Its default weight is 1.0.

We do not publish a head-to-head accuracy comparison of the two, so treat MGD as the default and try CWD if the loss scale looks unhealthy.
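To make the CWD description concrete, here is a minimal pure-Python sketch of a channel-wise distillation loss in its commonly published form: each channel is turned into a distribution over spatial positions with a temperature softmax, and the student is pulled toward the teacher with a KL divergence. This is an illustration, not LibreYOLO's implementation; the function names and the toy feature values are made up for the example, and the tau squared scaling is the conventional choice rather than something this page states.

```python
import math

def spatial_softmax(values, tau=1.0):
    """Turn one channel's activations into a distribution over spatial positions."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cwd_loss(student, teacher, tau=1.0):
    """KL(teacher || student), averaged over channels.

    `student` and `teacher` are lists of channels; each channel is a flat
    list of activations, one value per spatial position.
    """
    loss = 0.0
    for s_chan, t_chan in zip(student, teacher):
        p_t = spatial_softmax(t_chan, tau)
        p_s = spatial_softmax(s_chan, tau)
        loss += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return (tau ** 2) * loss / len(teacher)

# Toy features: 2 channels, 4 spatial positions each (hypothetical values).
teacher = [[0.1, 2.0, 0.3, 0.2], [1.0, 1.0, 3.0, 0.5]]
student = [[0.2, 1.5, 0.1, 0.4], [0.8, 1.2, 2.0, 0.9]]
shifted = [[v + 100.0 for v in chan] for chan in student]

print(cwd_loss(teacher, teacher))      # identical features give zero loss
print(cwd_loss(student, teacher) > 0)  # any mismatch gives a positive loss
# A constant offset added to every channel does not change the loss.
print(abs(cwd_loss(student, teacher) - cwd_loss(shifted, teacher)) < 1e-9)
```

The last line shows the practical effect of per-channel normalization: the loss depends on where each channel puts its mass, not on the absolute level of the activations, which is why a weight near 1.0 is workable for CWD while raw feature regression needs a much smaller one.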

Limits

Only the YOLO9 and YOLOX families can be students. Every other family raises at setup, because distillation needs feature tap points that only these two declare. The DINOv2 foundation teacher targets YOLO9 backbones.
Teacher and student strides must match exactly. Both supported families use strides 8/16/32, so in practice you distill within a family, across sizes. Channel widths may differ freely: a 1x1 adapter bridges them.
Multi-GPU, mixed precision and gradient accumulation all work with distillation on.
Resuming works, and the adapter state is restored, but the teacher is not stored in the checkpoint: pass distill_model again when you resume.

Training Monitor

Every training run, for every model family, writes machine-readable progress files into its run directory. You do not have to enable anything. libreyolo monitor serves them as a live dashboard, and because it only reads files it works equally well on a running job, a finished one, or one that crashed.

bash

libreyolo monitor                     # watch runs/ on http://127.0.0.1:8420
libreyolo monitor runs/train/exp      # open one run directly
libreyolo monitor --port 9000 --no-browser

Run artifacts

The first two files below, status.json and metrics.jsonl, are the contract, and they are the reason this is useful to scripts and agents as well as to humans: you can poll a run's state without parsing logs.

File	What it is
status.json	Current state of the run, rewritten atomically every epoch.
metrics.jsonl	Append-only, one JSON object per epoch. The full metric history.
train.log	The run log.

status.json always carries state (running, completed or failed), pid, progress, eta_seconds, and the current and best metric. If the run dies it records state: "failed" plus an error object with the exception type and message, so a crash is visible in the file rather than only in a terminal you have closed.

python

import json
import time

def wait_for_run(run_dir):
    """Poll status.json until the run leaves the "running" state."""
    while True:
        # Open and close the file on every poll so no handle is leaked.
        with open(f"{run_dir}/status.json") as fh:
            status = json.load(fh)
        if status["state"] != "running":
            return status
        print(f'{status["progress"]:.0%}  eta {status["eta_seconds"]:.0f}s  '
              f'best {status["best_metric"]}')
        time.sleep(30)

final = wait_for_run("runs/train/exp")
if final["state"] == "failed":
    print(final["error"]["type"], final["error"]["message"])
else:
    print(final["checkpoints"]["best"])

The monitor also exposes the same data over HTTP (/api/status, /api/metrics, /api/log, /api/images), so you can drive a dashboard of your own from it.
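metrics.jsonl can be consumed with the standard library alone. The sketch below reads the file line by line and picks the best epoch for a metric you name. The field names in the demo ("epoch", "mAP50-95") are hypothetical, since the per-epoch schema is not listed here, and tolerating a truncated last line is a defensive assumption rather than documented behavior.

```python
import json
import os
import tempfile

def read_metrics(path):
    """Parse an append-only JSONL file into a list of dicts, one per line."""
    rows = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                # Defensive: stop at a line that is not complete JSON.
                break
    return rows

def best_row(rows, key):
    """Return the row with the highest value for `key`, or None if no row has it."""
    scored = [r for r in rows if key in r]
    return max(scored, key=lambda r: r[key]) if scored else None

# Demo on a synthetic file; the field names are placeholders.
demo = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
with open(demo, "w") as fh:
    fh.write('{"epoch": 1, "mAP50-95": 0.31}\n')
    fh.write('{"epoch": 2, "mAP50-95": 0.42}\n')
    fh.write('{"epoch": 3, "mAP50-')  # simulated half-written line

rows = read_metrics(demo)
print(len(rows), best_row(rows, "mAP50-95"))
```

Point read_metrics at runs/train/exp/metrics.jsonl and substitute the metric key your run actually logs.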

Profiling

libreyolo profile measures where the time actually goes, in training and in inference. It is deliberately a measuring tool and nothing else: it never edits your config or tunes anything for you. It tells you what is slow and leaves the decision to you.

Behavior change in v1.4.0: model.train(profile=True) now keeps training after the profiled window instead of stopping (and it no longer corrupts resume state). Pass profile_then_stop=True to restore the old capture-and-exit behavior.

Profile training or inference

bash

1 # training: is the GPU actually busy, or am I dataloader-bound?
2 libreyolo profile run coco128 --weights LibreYOLO9t.pt --batch 16 --repeat 3
3 
4 # inference: latency percentiles and where they are spent
5 libreyolo profile infer bus.jpg --weights LibreYOLO9t.pt --runs 200

profile infer reports p50, p90 and p99 latency, throughput, and a split across preprocess, forward and postprocess (NMS), plus a verdict on what is bounding you. That split is usually the punchline: a model that looks slow is often spending its time in NMS or in preprocessing rather than in the network.
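If the percentile columns are unfamiliar, the following sketch shows how p50, p90 and p99 are derived from raw per-run latencies using the nearest-rank definition. It is a generic illustration with made-up timings; the profiler may use a different interpolation rule, so small differences from its reported values would be expected.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# 100 hypothetical runs: mostly 8 ms, a few at 9 ms, one 40 ms stall.
latencies_ms = [8.0] * 95 + [9.0] * 4 + [40.0]

for pct in (50, 90, 99):
    print(f"p{pct}: {percentile(latencies_ms, pct)} ms")
print(f"max: {max(latencies_ms)} ms")
print(f"mean: {sum(latencies_ms) / len(latencies_ms):.2f} ms")
```

With these numbers p50 and p90 are both 8.0 ms and p99 is 9.0 ms, while the single stall only shows up in the maximum and nudges the mean to 8.36 ms. That is the reason to read the percentiles together rather than rely on an average.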

Then look closer

Both commands write the same profile.json, and the analysis subcommands all read it, so you profile once and then interrogate the result from several angles.

Subcommand	What it answers
summary	The high-level diagnosis: utilisation, what is bounding you, the kernel mix.
phases	Where the time went: forward, backward, dataload, optimizer.
kernels	Which individual GPU kernels dominate.
ops	The framework view: which operations cost the most CPU time.
get	Print one metric, for use inside a script.
compare	Diff two profiles, before and after a change.
what-if	Estimate the payoff of a change before you write it.

bash

libreyolo profile summary runs/profile/prof/profile.json
libreyolo profile kernels runs/profile/prof/profile.json --top 20
libreyolo profile compare before.json after.json

Two practical notes. Every subcommand takes --json, which makes the profiler usable inside an automated optimize loop. And compare will only report statistical significance if both profiles were captured with --repeat 2 or higher: a single run is noisy enough to mislead you, especially when the job is launch-bound.

Validation

Run COCO-standard evaluation on a validation set. The heavily tested validation paths are single-GPU YOLO9 detection, RF-DETR detection, and RF-DETR segmentation.

python

results = model.val(
    data="coco128.yaml",   # dataset config
    batch=16,
    imgsz=640,
    conf=0.001,            # low conf for mAP calculation
    iou=0.6,               # NMS IoU threshold
    split="val",           # "val", "test", or "train"
    save_json=False,       # save predictions as COCO JSON
    verbose=True,          # print per-class metrics
    plots=True,            # save validation plots (metrics, per-class AP, confusion matrix); alias for save_plots
)

print(f"mAP50:    {results['metrics/mAP50']:.3f}")
print(f"mAP50-95: {results['metrics/mAP50-95']:.3f}")

Validation results dict

By default, LibreYOLO uses COCO evaluation and returns precision, recall, AP/AR metrics, and per-image timing:

python

{
    "metrics/mAP50-95": 0.489,   # COCO primary metric (AP@[.5:.95])
    "metrics/mAP50": 0.721,      # AP@0.5 (PASCAL VOC style)
    "metrics/mAP75": 0.534,      # AP@0.75 (strict)
    "metrics/precision": 0.68,
    "metrics/recall": 0.61,
    "metrics/precision(B)": 0.68, # bbox aliases
    "metrics/recall(B)": 0.61,
    "metrics/mAP50(B)": 0.721,
    "metrics/mAP50-95(B)": 0.489,
    "metrics/mAP_small": 0.291,
    "metrics/mAP_medium": 0.532,
    "metrics/mAP_large": 0.648,
    "metrics/AR1": 0.362,        # Average Recall (max 1 det)
    "metrics/AR10": 0.571,
    "metrics/AR100": 0.601,
    "metrics/AR_small": 0.387,
    "metrics/AR_medium": 0.641,
    "metrics/AR_large": 0.739,
    "speed/preprocess_ms": 1.2,
    "speed/inference_ms": 6.8,
    "speed/postprocess_ms": 0.9,
    "speed/total_ms": 8.9,
    "speed/total_s": 12.3,
    "speed/images_seen": 1382,
}
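The speed/* keys are enough to derive a per-stage breakdown and a throughput figure. The sketch below uses the example values from the dict above and assumes that speed/total_s is the wall time spent on speed/images_seen images; that reading fits the sample numbers but is an assumption, and speed_summary is a hypothetical helper, not part of the library.

```python
results = {
    "speed/preprocess_ms": 1.2,
    "speed/inference_ms": 6.8,
    "speed/postprocess_ms": 0.9,
    "speed/total_ms": 8.9,
    "speed/total_s": 12.3,
    "speed/images_seen": 1382,
}

def speed_summary(results):
    """Derive per-stage time shares and images per second from the speed/* keys."""
    stages = ("preprocess", "inference", "postprocess")
    total_ms = results["speed/total_ms"]
    shares = {s: results[f"speed/{s}_ms"] / total_ms for s in stages}
    images_per_s = results["speed/images_seen"] / results["speed/total_s"]
    return shares, images_per_s

shares, images_per_s = speed_summary(results)
for stage, share in shares.items():
    print(f"{stage:<12}{share:.0%}")
print(f"throughput  {images_per_s:.1f} img/s")
```

In a real script, pass the dict returned by model.val() instead of the literal.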

Segmentation validation returns mask metrics with (M) suffixes alongside bbox metrics with (B) suffixes; OBB validation adds (OBB) metrics. Pose validation returns COCO keypoint metrics through PoseValidator. Beyond those, validators cover classify (top-1 / top-5), semantic (mIoU / pixel accuracy), point, depth (zero-shot), and, new in v1.4.0, panoptic (PanopticValidator with the Panoptic Quality metric), matte (MatteValidator), and OCR (OCRValidator with optimal one-to-one assignment). Semantic and panoptic validation accept augment=True for flip-TTA as of v1.4.0 (this raised before). Pass plots=True (or --save-plots on the CLI) to write metric, per-class AP, confusion-matrix, and sample plots to the run directory.

Quantization

New in v1.4.0 · YOLO9 and RF-DETR

LibreYOLO quantizes models directly in PyTorch. A quantized model keeps the normal predict / val / train / save contract, so accuracy is measured with the same validators as float models, and accuracy recovery is just train() on the quantized model (QAT), optionally with the existing distillation kwargs (QAD).

The grammar: quantize, then optionally recover

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9s.pt")

# Step 1: quantize. calib is a small UNLABELED image set, used forward-only
# to derive activation ranges and scales.
qmodel = model.quantize(recipe="int8", calib="coco128.yaml", samples=128)

qmodel.val(data="coco8.yaml")            # honest accuracy, same validators
qmodel.predict("bus.jpg")
qmodel.save("LibreYOLO9s-int8.pt")       # manifest-carrying checkpoint

# Step 2 (optional): QAT is plain train() on the quantized model
qmodel.train(data="coco.yaml", epochs=5)

# QAD: same, plus the existing distillation kwargs
qmodel.train(data="coco.yaml", epochs=5, distill_model="LibreYOLO9m.pt")

bash

libreyolo quantize --model LibreYOLO9s.pt --recipe int8 --calib coco8.yaml
libreyolo train model=LibreYOLO9s-int8.pt data=coco.yaml epochs=5

LibreYOLO("LibreYOLO9s-int8.pt") restores the quantized structure and scales automatically: checkpoints carry a quant manifest, and trainer checkpoints written during QAT / QAD carry it too, so best.pt from a QAT run is itself a quantized checkpoint. model.quant_info() reports the recipe, module counts, calibration state and execution tier; model.dequantize() restores float modules in place.

Recipes

Recipe	What it does	Families	Calibration
`fp16`	Half-precision cast with a float32 I/O contract. Inference-only.	yolo9, rfdetr	none
`bf16`	bfloat16 cast: fp32 exponent range at half storage; the fix when fp16 overflows on DETR-style models. Inference-only.	yolo9, rfdetr	none
`fp8`	E4M3 weight + activation simulation on Conv2d and Linear.	yolo9, rfdetr	required
`int8`	W8A8: per-channel INT8 weights, per-tensor affine INT8 activations.	yolo9, rfdetr	required (calib=None gives weights-only)
`w4a16`	Grouped INT4 weights, float activations, Linear only.	rfdetr	not needed
`w4a8`	Grouped INT4 weights + INT8 activations; maps to NPU W4A8 deployments.	rfdetr	required
`nvfp4`	NVFP4 W4A4: E2M1 elements, 16-element blocks, FP8 block scales.	rfdetr	not needed (dynamic)
`mxfp4`	OCP MXFP4: E2M1 elements, 32-element blocks, power-of-two scales.	rfdetr	not needed (dynamic)
`int2`	Research preview: grouped 2-bit weights + INT8 activations. PTQ alone is unusable; QAT / QAD required.	rfdetr	required

The split is deliberate: sub-8-bit acceleration is GEMM-only on current hardware, so the Linear-only recipes are rejected for conv-heavy families like YOLO9 (use int8 or fp8 there); transformer families (RF-DETR) are the target for the 4-bit recipes. Per-family keep_high_precision defaults protect the first layer and the heads; override with quantize(..., keep_high_precision=("head.",)) if you know what you are doing.
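The recipe table can be mirrored as a small lookup to catch an unsupported combination before a long job starts. The mapping below is transcribed by hand from the table on this page, not read from the library, and recipe_problems is a hypothetical helper; "optional" encodes the int8 row, where calib=None gives weights-only quantization.

```python
# recipe -> (families that accept it, calibration requirement), per the table above
RECIPES = {
    "fp16":  ({"yolo9", "rfdetr"}, "none"),
    "bf16":  ({"yolo9", "rfdetr"}, "none"),
    "fp8":   ({"yolo9", "rfdetr"}, "required"),
    "int8":  ({"yolo9", "rfdetr"}, "optional"),  # calib=None gives weights-only
    "w4a16": ({"rfdetr"}, "none"),
    "w4a8":  ({"rfdetr"}, "required"),
    "nvfp4": ({"rfdetr"}, "none"),
    "mxfp4": ({"rfdetr"}, "none"),
    "int2":  ({"rfdetr"}, "required"),
}

def recipe_problems(recipe, family, has_calib):
    """Return a list of human-readable problems; an empty list means the combination looks fine."""
    if recipe not in RECIPES:
        return [f"unknown recipe {recipe!r}"]
    families, calibration = RECIPES[recipe]
    problems = []
    if family not in families:
        problems.append(f"{recipe} is not offered for {family}")
    if calibration == "required" and not has_calib:
        problems.append(f"{recipe} needs a calibration set")
    return problems

print(recipe_problems("int8", "yolo9", has_calib=False))   # weights-only int8 is allowed
print(recipe_problems("w4a16", "yolo9", has_calib=True))   # Linear-only recipe on a conv family
print(recipe_problems("fp8", "rfdetr", has_calib=False))   # missing calibration
```

The library performs its own checks in quantize(); a table like this is only useful as an early guard in your own tooling and needs updating whenever the recipe list changes.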

Calibration data is not training data

calib= is a few hundred images, no labels read, forward-only. Its job is activation ranges and scales. The default is coco128.yaml (auto-downloaded); multiple batches matter because ranges are estimated across them.
data= on train / val is the labeled dataset, for gradients and metrics. Different argument, different job.
The default range estimator is minmax; algorithm="percentile" exists but measured worse everywhere, and it collapses DETR-family accuracy because transformer activation outliers are load-bearing. What actually fixes small-model int8 sensitivity is calibrating on enough batches: with the coco128 default, YOLO9-t lands within about one mAP point of fp32.
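The difference between the two range estimators can be seen in a toy example. The sketch below is a generic illustration of min/max versus percentile range estimation followed by affine fake-quantization; it is not LibreYOLO's estimator, and the function names, the 99% cut and the activation values are all made up for the demonstration.

```python
def minmax_range(batches):
    """Widest range seen across all calibration batches."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

def percentile_range(batches, pct=99.0):
    """Keep only the central pct% of values, discarding both tails."""
    values = sorted(v for b in batches for v in b)
    cut = int(len(values) * (100.0 - pct) / 200.0)
    return values[cut], values[len(values) - 1 - cut]

def fake_quant(x, lo, hi, bits=8):
    """Affine quantize then dequantize, staying in float throughout."""
    if hi == lo:
        return lo
    scale = (hi - lo) / (2 ** bits - 1)
    q = round((min(max(x, lo), hi) - lo) / scale)  # clamp, then snap to the grid
    return lo + q * scale

# Four batches of activations in [0, 1), with one large outlier in the last batch.
normal = [i / 100 for i in range(100)]
batches = [normal, normal, normal, normal[:99] + [50.0]]

mm = minmax_range(batches)
pc = percentile_range(batches)
print("minmax range    ", mm)
print("percentile range", pc)
print("outlier 50.0 ->", fake_quant(50.0, *mm), "vs", fake_quant(50.0, *pc))
print("typical 0.5  ->", fake_quant(0.5, *mm), "vs", fake_quant(0.5, *pc))
```

The percentile range represents typical values more finely but clamps the outlier from 50.0 down to about 0.99. When such outliers carry signal, as the text says they do in transformer activations, that clamp is where the accuracy goes.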

Honest numbers: simulation first

v1.4.0 executes quantized arithmetic in simulation (fake-quantize with straight-through gradients, computed in fp32 islands). Simulation is numerics-true: a val() score on any device is a real claim about the quantized arithmetic. It is not a speed claim; packed low-bit kernels are a deployment concern. The fp16 and bf16 casts are the exception: they execute natively.

Deploying quantized models

python

# Finalize: pack real low-bit weights, strip fp32 masters
qmodel.export(format="pt")    # -> <name>-final.pt

# int8 exports straight to QDQ ONNX with the model's own calibrated scales
qmodel = LibreYOLO("LibreYOLO9s-int8.pt")
qmodel.export(format="onnx")  # ONNX Runtime / TensorRT consume real INT8 kernels

Finalized checkpoints store packed weights and shrink accordingly (measured: YOLO9-s int8 29.5 to 9.6 MB; RF-DETR-n nvfp4 122 to 26 MB), and unpacking reproduces the simulation bit for bit on the device you finalized on. Loading one gives an inference-ready model, and train() on it re-prepares masters automatically.
For fp16 / bf16, call dequantize() and use the float exporters (half=True gives fp16 ONNX).
The sub-8-bit Linear recipes and fp8 have no deployable ONNX form yet: they execute in PyTorch and crystallize via format="pt".
In-tree Triton kernels back the simulation, with a pluggable registry and a LIBREYOLO_QUANT_KERNELS override.
Checkpoints with finalized quant state are not loadable by v1.3.1.

Export

Export PyTorch models to ONNX, TorchScript, TensorRT, OpenVINO, NCNN, CoreML, or TFLite for deployment. The heavily tested export paths remain single-GPU YOLO9 detection, RF-DETR detection, and RF-DETR segmentation.

The support matrix is canonical (new in v1.4.0)

v1.4.0 replaces guesswork with a canonical export-support matrix. Every family / task / format combination has a tier: validated (tested, safe), experimental (exports with a warning), or blocked (raises up front instead of producing a broken artifact). Query it before you build a pipeline around a format:

bash

1 # Formats and tiers for one family / task
2 libreyolo formats --family yolo9
3 libreyolo formats --family rfdetr --task segment
4 
5 # Everything about one model, including its export_support map
6 libreyolo info --model LibreYOLO9s.pt --json | jq .export_support

v1.4.0 also unblocked whole task groups under a fixed-resolution, batch-1 contract: PIDNet (semantic), FOMO (point), and ZipDepth / Depth Anything V2 (depth) now export to ONNX. Two behavior guarantees landed with the matrix: export never mutates the live model before the request is accepted (LoRA adapters fold and quantized models re-prepare only after format lookup and option preflight succeed), and LoRA adapters are merged into dense weights on export, so deployed artifacts need no peft at runtime.
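A deployment script can apply the same three-tier policy before it commits to a format. The sketch below assumes the export_support map is a flat mapping from format name to tier string; the real JSON schema is not shown on this page, so treat that shape, the sample values and the check_export helper as hypothetical. Treating an unlisted format as blocked is a conservative choice made for this example.

```python
import warnings

def check_export(export_support, fmt):
    """Gate an export on its tier: raise on blocked, warn on experimental."""
    tier = export_support.get(fmt, "blocked")  # unknown formats are treated as blocked here
    if tier == "blocked":
        raise ValueError(f"{fmt} export is blocked for this model")
    if tier == "experimental":
        warnings.warn(f"{fmt} export is experimental; verify outputs before shipping")
    return tier

# Placeholder values, not the real matrix.
support = {"onnx": "validated", "tflite": "experimental", "coreml": "blocked"}

print(check_export(support, "onnx"))
print(check_export(support, "tflite"))
try:
    check_export(support, "coreml")
except ValueError as exc:
    print("refused:", exc)
```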

Quick export

python

# ONNX (default)
model.export()

# TorchScript
model.export(format="torchscript")

# TensorRT (requires NVIDIA GPU + TensorRT)
model.export(format="tensorrt")

# OpenVINO (optimized for Intel hardware)
model.export(format="openvino")

# NCNN (via PNNX)
model.export(format="ncnn")

# CoreML (.mlpackage, macOS runtime)
model.export(format="coreml")

# TFLite (needs Python 3.12+); "litert" is an accepted alias
model.export(format="tflite")

# Quantized checkpoints: pack low-bit weights, or emit QDQ INT8 ONNX
qmodel.export(format="pt")      # finalized packed checkpoint
qmodel.export(format="onnx")    # int8 -> QDQ ONNX (see Quantization)

All export parameters

python

path = model.export(
    format="onnx",            # "onnx", "torchscript", "tensorrt", "openvino", "ncnn", "coreml", or "tflite"
    output_path="model.onnx", # output file (auto-generated if None)
    imgsz=640,                # input resolution (default: model's native); also accepts (h, w) for rectangular
    opset=None,               # ONNX opset (auto: 13, or 17 for wrappers that need it)
    simplify=True,            # run onnxsim graph simplification
    dynamic=True,             # enable dynamic batch axis (ONNX); TFLite requires static shapes
    half=False,               # export in FP16
    batch=1,                  # batch size for static graph
    device=None,              # device to trace on (default: model's current device)
    int8=False,               # INT8 quantization: TensorRT, OpenVINO, or ONNX (YOLO9 detection only)
    data=None,                # calibration dataset for INT8
    fraction=1.0,             # fraction of calibration data to use
    allow_download_scripts=False, # allow data.yaml download hooks during calibration
    workspace=4.0,            # TensorRT workspace size (GB)
    min_batch=1,              # TensorRT dynamic profile minimum batch
    opt_batch=1,              # TensorRT dynamic profile optimal batch
    max_batch=8,              # TensorRT dynamic profile maximum batch
    hardware_compatibility="none", # TensorRT compatibility mode
    gpu_device=0,             # GPU device index for TensorRT
    trt_config=None,          # optional TensorRT YAML config path
    compute_units="all",      # CoreML routing: all, cpu_only, cpu_and_gpu, cpu_and_ne
    nms=False,                # embed NMS in the graph (ONNX YOLO9 detection, or CoreML)
    iou=0.45,                 # embedded-NMS IoU threshold
    conf=0.25,                # embedded-NMS confidence threshold
    max_det=300,              # embedded-NMS max detections (ONNX only)
    verbose=False,            # verbose logging
)

OpenVINO INT8 export additionally requires nncf. NCNN export writes a directory containing model.ncnn.param, model.ncnn.bin, and metadata.yaml. CoreML export writes a .mlpackage bundle, requires coremltools, and does not support INT8.

ONNX embedded NMS (YOLO9 detection)

Pass nms=True to bake NMS into an exported ONNX graph so the model emits final boxes directly. This is currently limited to the yolo9 family on the detect task (other families/tasks raise). It forces a fixed batch-1 graph (dynamic=False) and records nms / nms_conf / nms_iou / max_det in the ONNX metadata.

python

model = LibreYOLO("LibreYOLO9c.pt")
model.export(format="onnx", nms=True, conf=0.25, iou=0.45, max_det=300)
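For readers who want to see what conf, iou and max_det control once NMS is baked into the graph, here is a plain-Python sketch of greedy non-maximum suppression. It is the textbook class-agnostic algorithm, written for illustration; the operator that LibreYOLO embeds in the ONNX graph may differ in details such as per-class handling, and the boxes and scores are invented.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, conf=0.25, iou_thr=0.45, max_det=300):
    """Greedy class-agnostic NMS; returns indices of kept boxes, best score first."""
    # 1. Drop low-confidence candidates and sort the rest by score.
    order = sorted((i for i, s in enumerate(scores) if s >= conf),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # 2. Keep a box only if it does not overlap a better box too much.
        if all(box_iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
            # 3. Stop once the detection budget is reached.
            if len(keep) == max_det:
                break
    return keep

boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.90, 0.80, 0.70, 0.10]
print(nms(boxes, scores))             # [0, 2]
print(nms(boxes, scores, max_det=1))  # [0]
```

Box 1 is suppressed because it overlaps box 0 with an IoU of about 0.92, and box 3 never enters the ranking because its score is below conf.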

int8=True is now also supported for ONNX (in addition to TensorRT and OpenVINO), again limited to YOLO9 detection; it needs a calibration data= dataset.

TFLite (LiteRT) export

Runtime backend new in v1.4.0

LibreYOLO has a TFLite export path built on onnx2tf. TFLite is the format of Google's LiteRT runtime (TensorFlow Lite was renamed LiteRT in 2024; the .tflite file format is unchanged). It supports YOLO9 and YOLOX detection, the MobileNetV4 / ConvNeXt / EfficientNetV2 / ResNet classifiers, PIDNet semantic segmentation, and Real-ESRGAN restoration. RF-DETR detection is experimental; RF-DETR segmentation and pose are blocked. It requires Python 3.12+ (the onnx2tf 2.4.x wheels do not target older Python) plus the optional extra libreyolo[tflite] (alias: libreyolo[litert]). Export is FP32 and static-shape only (no half, int8, or dynamic yet).

bash

pip install "libreyolo[tflite]"   # Python 3.12+; [litert] is the same extra

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
model.export(format="tflite")   # writes a .tflite file; format="litert" also works

For RF-DETR, the exporter rewrites each GridSample node into a TFLite-safe bilinear subgraph because onnx2tf's default lowering is numerically broken.

TFLite is no longer export-only. New in v1.4.0, LibreYOLO("model.tflite") loads the file through a LiteRT runtime backend (ai-edge-litert), so the same factory that runs your ONNX and TensorRT artifacts now runs TFLite too; see TFLite Inference.

ONNX metadata

Exported ONNX files include embedded metadata:

Key	Example value
`libreyolo_version`	`"1.4.0"`
`model_family`	`"yolox"`
`model_size`	`"s"`
`nb_classes`	`"80"`
`names`	`'{"0": "person", "1": "bicycle", ...}'`
`imgsz`	`"640"`
`dynamic`	`"True"`
`half`	`"False"`

This metadata is automatically read back when loading the exported file with LibreYOLO("model.onnx").
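If you load the file with another runtime, you have to decode these values yourself, because ONNX metadata is stored as strings. The sketch below converts a raw string map into Python types. The encodings are inferred from the example values in the table (a JSON object for names, "True" / "False" for the flags), so treat them as assumptions; parse_metadata is a hypothetical helper. With ONNX Runtime the raw map would typically come from session.get_modelmeta().custom_metadata_map.

```python
import json

def parse_metadata(raw):
    """Decode string-valued ONNX metadata into Python types; unknown keys pass through."""
    parsed = dict(raw)
    # names is a JSON object whose keys are class indices written as strings.
    parsed["names"] = {int(k): v for k, v in json.loads(raw["names"]).items()}
    parsed["nb_classes"] = int(raw["nb_classes"])
    parsed["dynamic"] = raw["dynamic"] == "True"
    parsed["half"] = raw["half"] == "True"
    return parsed

# Raw map shaped like the table above, shortened to two classes for the demo.
raw = {
    "libreyolo_version": "1.4.0",
    "model_family": "yolox",
    "model_size": "s",
    "nb_classes": "2",
    "names": '{"0": "person", "1": "bicycle"}',
    "imgsz": "640",
    "dynamic": "True",
    "half": "False",
}

meta = parse_metadata(raw)
print(meta["names"][0], meta["nb_classes"], meta["dynamic"], meta["half"])
```

imgsz is left as a string on purpose: the table only shows a square example, and how a rectangular size is encoded is not stated.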

TorchScript Inference

Run an exported .torchscript model through the same runtime-backend prediction API.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.torchscript")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

ONNX Inference

Run inference using ONNX Runtime instead of PyTorch. Useful for deployment environments without PyTorch.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.onnx")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

Auto-metadata

If the ONNX file was exported by LibreYOLO, class names and class count are read automatically from the embedded metadata:

python

# Export with metadata
model.export(format="onnx", output_path="model.onnx")

# Load - names and nb_classes auto-populated
onnx_model = LibreYOLO("model.onnx")
print(onnx_model.names)       # {0: "person", 1: "bicycle", ...}
print(onnx_model.nb_classes)  # 80

For ONNX files without metadata (e.g., exported by other tools), specify nb_classes manually:

python

model = LibreYOLO("external_model.onnx", nb_classes=20)

Device selection

python

# Auto-detect (CUDA if available, else CPU)
model = LibreYOLO("model.onnx", device="auto")

# Force CPU
model = LibreYOLO("model.onnx", device="cpu")

# Force CUDA
model = LibreYOLO("model.onnx", device="cuda")

Prediction parameters

Runtime artifacts loaded through LibreYOLO() support the shared runtime prediction API:

python

result = model(
    "image.jpg",
    conf=0.25,
    iou=0.45,
    imgsz=640,
    classes=[0, 2],
    max_det=300,
    save=True,
    output_path="output/annotated.jpg",  # final file path when save=True
    color_format="auto",
)

Runtime backends do not expose PyTorch-only options such as tiling, overlap_ratio, or output_file_format.

Runtime backends also handle saving a little differently from the PyTorch wrappers: if you set output_path, pass a final file path, not a directory. If you omit it, the current backend default is under runs/detections/.

TensorRT Inference

Run inference using TensorRT for maximum throughput on NVIDIA GPUs. Requires CUDA plus the TensorRT Python bindings.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.engine")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

TensorRT artifacts loaded through LibreYOLO() support the same core runtime prediction API as ONNX and OpenVINO, including the same file-path-only output_path behavior for save=True.

OpenVINO Inference

Run inference using OpenVINO, optimized for Intel CPUs, GPUs, and VPUs.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model_openvino/")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

OpenVINO directories loaded through LibreYOLO() read metadata.yaml when present and support the same core runtime prediction API.

NCNN Inference

Run inference using NCNN for lightweight deployment on CPU or Vulkan-capable GPU targets.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model_ncnn/")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

An NCNN export directory contains model.ncnn.param, model.ncnn.bin, and usually metadata.yaml.

CoreML Inference

Run an exported .mlpackage through CoreML on macOS. CoreML routes execution with compute_units instead of PyTorch device strings.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.mlpackage", compute_units="all")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

Supported compute_units values are all, cpu_only, cpu_and_gpu, and cpu_and_ne.

TFLite Inference

New in v1.4.0

Run an exported .tflite file through Google's LiteRT interpreter, the runtime formerly named TensorFlow Lite. Requires Python 3.12+ and pip install "libreyolo[tflite]" (or the [litert] alias), which brings in ai-edge-litert.

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.tflite")

result = model("image.jpg", conf=0.25, iou=0.45, save=True)
print(result.boxes.xyxy)

TFLite artifacts support the same core runtime prediction API as the other backends, including the file-path-only output_path behavior for save=True. Exported graphs are static-shape, so run at the exported imgsz.

CLI

Installing LibreYOLO registers a libreyolo command on your PATH (entry point in pyproject.toml). The CLI mirrors the Python API and accepts key=value syntax.

Subcommands

Command	Purpose
`predict`	Run inference on images, directories, or videos
`train`	Train a model on a dataset
`val`	Evaluate a model on a dataset
`export`	Export to ONNX / TorchScript / TensorRT / OpenVINO / NCNN / CoreML / TFLite
`quantize`	Quantize a model with a recipe + calibration set (new in v1.4.0)
`label`	Launch LibreLabel, the local browser annotation tool
`monitor`	Serve a live dashboard over training runs
`profile`	Profile training or inference, then analyse the result
`ui`	Launch a local drag-and-drop / paste browser inference UI
`doctor`	Run pre-training dataset health checks (YOLO detection format)
`checks`	Print Python, torch, CUDA, GPU, and optional-package info
`models`	List registered model families and CLI shortcut names (enriched in v1.4.0; --json schema changed)
`formats`	List export formats; --family / --task filter by support tier (new in v1.4.0)
`cfg`	Print the default training configuration YAML
`info`	Load a model and print family, size, task, device, classes, and its export_support map
`metadata`	Inspect raw checkpoint metadata from a .pt file
`version`	Print LibreYOLO + Python + torch versions

Model name shortcuts

The CLI accepts short names (yolo9-c) that resolve to weight filenames (LibreYOLO9c.pt) - discoverable via libreyolo models. You can also pass any explicit checkpoint path.

Caveats in the released v1.4.0: without transformers installed, libreyolo models omits RF-DETR and DINOv2 instead of listing them as unavailable. Some task-suffixed shortcuts for non-detection-default families also fail to resolve. Install libreyolo[rfdetr] for the transformer families, and pass the full official checkpoint filename when a shortcut fails.

Common options

Command	Important options
`predict`	`conf`, `iou`, `imgsz`, `classes`, `max_det`, `half`, `batch`, `tiling`, `overlap_ratio`, `output_file_format`, `project`, `name`, `exist_ok`, `face_detector`
`train`	`epochs`, `batch`, `imgsz`, `lr0`, `optimizer`, `scheduler`, `workers`, `seed`, `resume`, `amp`, `task`, `cache`, `lora`, `freeze`, `save_plots`, `allow_download_scripts`, `dry_run`, plus the augmentation knobs (`mosaic`, `mixup`, `hsv_prob`, `flip_prob`, `degrees`, ...)
`val`	`split`, `batch`, `imgsz`, `conf`, `iou`, `max_det`, `half`, `save_plots`, `data_dir`, `use_coco_eval`, `project`, `name`, `exist_ok`, `save_json`, `allow_download_scripts`
`export`	`format`, `imgsz`, `batch`, `half`, `int8`, `dynamic`, `simplify`, `nms`, `conf`, `iou`, `max_det`, `opset`, `data`, `fraction`, `device`, `allow_download_scripts`, `verbose`

Predict

bash

1 # Flagship: YOLO9
2 libreyolo predict model=yolo9-c source=image.jpg conf=0.25 save=true
3 
4 # Flagship: RF-DETR
5 libreyolo predict model=rfdetr-s source=image.jpg save=true
6 
7 # Video - saved under runs/detect/predict*/
8 libreyolo predict model=yolo9-c source=clip.mp4 save=true
9 
10 # Tiled inference for very large images
11 libreyolo predict model=yolo9-c source=aerial.jpg tiling=true save=true
12 
13 # Gaze (requires a face detector)
14 libreyolo predict model=LibreL2CSr50.pt source=portrait.jpg \
15     --face-detector path/to/face.pt save=true

Train

bash

libreyolo train model=yolo9-c data=coco128.yaml epochs=300 batch=16 device=0

# Dry-run prints the resolved config without launching training
libreyolo train model=yolo9-c data=coco128.yaml --dry-run

Validate

bash

libreyolo val model=runs/train/exp/weights/best.pt data=coco128.yaml split=val

Export

bash

libreyolo export model=runs/train/exp/weights/best.pt format=onnx dynamic=true
libreyolo export model=best.pt format=tensorrt half=true
libreyolo export model=best.pt format=openvino int8=true data=coco128.yaml
libreyolo export model=best.pt format=coreml

Export with embedded NMS and rectangular size

bash

# Embed NMS into an ONNX YOLO9 detection graph
libreyolo export model=yolo9-c format=onnx nms=true conf=0.25 iou=0.45 max_det=300

# Rectangular export size (imgsz accepts a single value or two comma-separated dims)
libreyolo export model=yolo9-c format=onnx imgsz=640,480

# TFLite (Python 3.12+, libreyolo[tflite])
libreyolo export model=rfdetr-s format=tflite

Quantize (new in v1.4.0)

bash

1 # PTQ: int8 with a small calibration set
2 libreyolo quantize --model LibreYOLO9s.pt --recipe int8 --calib coco8.yaml --samples 128
3 
4 # Write to an explicit path, machine-readable output
5 libreyolo quantize --model LibreRFDETRn.pt --recipe nvfp4 --out rfdetr-nvfp4.pt --json

The quantized checkpoint then flows through the normal commands: libreyolo val for honest accuracy, libreyolo train for QAT, and libreyolo export for deployment. See Quantization.

Local inference UI

libreyolo ui serves a local browser page where you drop, paste, or pick images, choose a model, and view results. It binds to 127.0.0.1:8000 by default and moves to the next free port if that one is taken.

bash

libreyolo ui                       # opens http://127.0.0.1:8000
libreyolo ui --port 9000 --no-browser --device 0

Dataset health check

libreyolo doctor runs pre-training checks on a YOLO detection-format dataset and exits non-zero when errors are found (--strict also fails on warnings), so it can gate CI.

bash

libreyolo doctor coco8.yaml
libreyolo doctor --data coco8.yaml --strict --json
libreyolo doctor coco8.yaml --fast --only labels   # skip image decoding, run one check family

Machine-readable output

Every command accepts --json (structured stdout for piping into scripts or agents) and --quiet (suppress stderr progress lines). The core predict, train, val, and export commands also accept --help-json to dump their parameter schema as JSON.

bash

libreyolo predict model=yolo9-c source=img.jpg --json | jq .

libreyolo train --help-json > train_schema.json

API Reference

LibreYOLO (factory)

python

LibreYOLO(
    model_path: str,
    *,
    device: str = "auto",
    task: str | None = None,    # override only when a custom artifact is ambiguous
    nb_classes: int | None = None,  # mainly for external exported artifacts
    compute_units: str = "all", # CoreML only: all, cpu_only, cpu_and_gpu, cpu_and_ne
) -> model wrapper or runtime backend

Prefer official checkpoint filenames and exported artifact paths, then let the factory resolve the details. It handles PyTorch checkpoints, .onnx, .torchscript, .engine, .tensorrt, .mlpackage, .tflite (new in v1.4.0), OpenVINO directories containing model.xml, and NCNN directories containing model.ncnn.param plus model.ncnn.bin. The task argument is for ambiguous custom artifacts; otherwise resolution comes from checkpoint metadata, filename suffix, and family default.

Prediction (PyTorch model wrappers)

python

model(
    source,                     # image input (see supported formats)
    *,
    conf: float = 0.25,
    iou: float = 0.45,
    imgsz: int = None,
    device: str = "auto",
    classes: list[int] = None,
    max_det: int = 300,
    augment: bool = False,
    save: bool = False,
    batch: int = 1,
    stream: bool = False,
    vid_stride: int = 1,
    show: bool = False,
    output_path: str = None,
    color_format: str = "auto",
    tiling: bool = False,
    overlap_ratio: float = 0.2,
    output_file_format: str = None,
) -> Results | list[Results] | Generator[Results, None, None]

Prediction (runtime backends)

python

backend(
    source,
    *,
    conf: float = 0.25,
    iou: float = 0.45,
    imgsz: int = None,
    classes: list[int] = None,
    max_det: int = 300,
    save: bool = False,
    batch: int = 1,
    output_path: str = None,    # final file path when save=True
    color_format: str = "auto",
) -> Results | list[Results]

If output_path is omitted for a runtime backend, the current default save location is runs/detections/.

Results

python

result = Results(
    boxes: Boxes | None,
    orig_shape: tuple[int, int],  # (height, width)
    path: str | None,
    names: dict[int, str],
    masks: Masks | None = None,
    keypoints: Keypoints | None = None,
    probs: Probs | None = None,
    obb: OBB | None = None,
    gaze: Gaze | None = None,
    points: Points | None = None,
    semantic_mask: SemanticMask | None = None,
    depth_map: DepthMap | None = None,
    restored: RestoredImage | None = None,
    speed: dict[str, float] | None = None,
    track_id = None,
    frame_idx: int | None = None,
    # New in v1.4.0 (placed after the complete v1.3 signature, so
    # positional v1.3 call sites keep working):
    panoptic: PanopticSegmentation | None = None,
    matte: Matte | None = None,
    ocr: OCRRegions | None = None,
    restore_scale: int = 1,
)

len(result)          # number of detections
result.cpu()         # copy with tensors on CPU
result.cuda()        # copy with tensors on CUDA
result.numpy()       # copy with numpy arrays
result.summary()     # list[dict] with the payloads present
result.to_json()     # JSON string from summary()
result.cutout()      # (H, W, 4) RGBA ndarray; matte results only

Boxes

python

boxes = Boxes(boxes, conf, cls)

boxes.xyxy           # (N, 4) tensor - x1, y1, x2, y2
boxes.xywh           # (N, 4) tensor - cx, cy, w, h
boxes.conf           # (N,) tensor - confidence scores
boxes.cls            # (N,) tensor - class IDs
boxes.id             # (N,) track IDs when tracking, else None
boxes.is_track       # True when track IDs are attached
boxes.data           # (N, 6) [xyxy, conf, cls], or (N, 7) with track IDs

len(boxes)           # number of boxes
boxes.cpu()          # copy on CPU
boxes.numpy()        # copy as numpy arrays
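
The two box layouts carry the same information. As a minimal sketch of how xyxy (corners) and xywh (center plus size) relate, here is the conversion on plain tuples; the helper names are illustrative and are not part of the Boxes API, which exposes both layouts directly.

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (cx, cy, w, h) center and size."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def xywh_to_xyxy(box):
    """(cx, cy, w, h) center and size -> (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


corners = (10, 20, 50, 80)
center = xyxy_to_xywh(corners)
print(center)                # (30.0, 50.0, 40, 60)
print(xywh_to_xyxy(center))  # (10.0, 20.0, 50.0, 80.0)
```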

Task payloads

python

result.masks.data        # segmentation masks, (N, H, W)
result.masks.xy          # list of mask contours in pixel coordinates
result.masks.xyn         # normalized mask contours

result.keypoints.xy      # pose keypoint coordinates
result.keypoints.xyn     # normalized keypoint coordinates
result.keypoints.conf    # keypoint confidence when present

result.obb.xywhr         # (N, 5): center x/y, width, height, rotation
result.obb.xyxyxyxy      # (N, 4, 2): four oriented box corners
result.obb.conf          # (N,) confidence scores
result.obb.cls           # (N,) class IDs

result.gaze.data         # (N, 2): pitch, yaw in radians
result.gaze.pitch_deg    # pitch in degrees
result.gaze.yaw_deg      # yaw in degrees
result.gaze.direction_3d # approximate 3D direction vectors

result.semantic_mask.data      # (H, W) class-id map (semantic)
result.depth_map.data          # (H, W) relative inverse depth
result.points.xy               # (N, 2) point detections (FOMO)
result.restored.array          # (H, W, 3) uint8 restored image
result.restore_scale           # int upscale factor; 1 unless super-resolution

# New in v1.4.0
result.panoptic.data           # (H, W) segment-id map
result.panoptic.segments_info  # per-segment {"id", "category_id", ...}
result.matte.data              # (H, W) float32 alpha in [0, 1]
result.ocr.polygons            # (N, 4, 2) text-region quads
result.ocr.texts               # list[str] transcriptions
result.ocr.conf                # (N,) recognition scores
result.ocr.det_conf            # (N,) detection scores
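
To illustrate how the two OBB layouts relate, the sketch below turns one xywhr row into four corners with a standard 2D rotation. It rests on assumptions the reference above does not state: the rotation is taken to be in radians, and the corner order is chosen arbitrarily. `xywhr_to_corners` is a hypothetical helper, so prefer result.obb.xyxyxyxy in real code.

```python
import math


def xywhr_to_corners(cx, cy, w, h, r):
    """Rotate the four half-extent offsets by r (assumed radians) around the center."""
    cos_r, sin_r = math.cos(r), math.sin(r)
    # Corner order is an assumption of this sketch, not the library's convention.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_r - dy * sin_r, cy + dx * sin_r + dy * cos_r)
        for dx, dy in offsets
    ]


# With zero rotation the result is an ordinary axis-aligned box.
print(xywhr_to_corners(10, 10, 4, 2, 0.0))
```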

model.track()

python

model.track(
    source,                       # video file path
    *,
    track_conf: float = 0.25,
    iou: float = 0.45,
    imgsz: int = None,
    classes: list[int] = None,
    max_det: int = 300,
    save: bool = False,
    show: bool = False,
    vid_stride: int = 1,
    output_path: str = None,
    tracker: str = "bytetrack",   # "bytetrack" | "ocsort" | "botsort" | "deepocsort"
    tracker_config = None,        # a config instance selects the tracker by type
    augment: bool = False,
    **tracker_kwargs,
) -> Generator[Results, None, None]

model.quantize() (new in v1.4.0)

python

model.quantize(
    recipe: str,                  # "fp16" | "bf16" | "fp8" | "int8" | "w4a16"
                                  # | "w4a8" | "nvfp4" | "mxfp4" | "int2"
    calib: str = "coco128.yaml",  # unlabeled calibration images (forward-only)
    samples: int = 128,
    batch: int = 8,
    algorithm: str = "auto",      # "auto" (minmax) | "minmax" | "percentile"
    keep_high_precision = None,   # module-name substrings to keep in float
    verbose: bool = True,
) -> model                        # quantized in place

model.quant_info()                # dict describing the quant state, or None
model.dequantize()                # restore float modules in place
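
For readers unfamiliar with the two calibration algorithm names, here is a generic sketch of what "minmax" and "percentile" range estimation usually mean for symmetric INT8 quantization. It illustrates the general technique only; LibreYOLO's internal calibration may differ, and all function names are hypothetical.

```python
import math


def minmax_range(values):
    """Range estimate = largest absolute value seen during calibration."""
    return max(abs(v) for v in values)


def percentile_range(values, pct=99.9):
    """Range estimate = nearest-rank percentile of the absolute values.

    Clipping a few outliers keeps more resolution for the typical values.
    """
    ordered = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def int8_scale(abs_max):
    """Symmetric INT8: real_value ~= scale * q, with q in [-127, 127]."""
    return abs_max / 127


activations = [1.0, -2.0, 3.0, 100.0]  # toy data with one outlier
print(int8_scale(minmax_range(activations)))
print(int8_scale(percentile_range(activations, pct=50)))
```

The outlier dominates the minmax scale, while the percentile estimate ignores it at the cost of clipping that value.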

model.export()

python

model.export(
    format: str = "onnx",       # "onnx", "torchscript", "tensorrt", "openvino",
                                # "ncnn", "coreml", "tflite" (alias "litert"), or "pt"
    *,
    output_path: str | None = None,
    imgsz: int | None = None,
    opset: int | None = None,   # auto: 13, or 17 for wrappers that need it
    simplify: bool = True,
    dynamic: bool = True,
    half: bool = False,
    batch: int = 1,
    device: str | None = None,
    int8: bool = False,
    data: str | None = None,    # calibration data for INT8
    fraction: float = 1.0,      # fraction of calibration data
    allow_download_scripts: bool = False,
    workspace: float = 4.0,     # TensorRT workspace (GB)
    min_batch: int = 1,         # TensorRT dynamic profile minimum batch
    opt_batch: int = 1,         # TensorRT dynamic profile optimal batch
    max_batch: int = 8,         # TensorRT dynamic profile maximum batch
    hardware_compatibility: str = "none",
    gpu_device: int = 0,
    trt_config = None,          # optional TensorRT YAML config path
    compute_units: str = "all", # CoreML only
    nms: bool = False,          # CoreML embedded NMS where supported
    iou: float = 0.45,          # CoreML embedded NMS IoU threshold
    conf: float = 0.25,         # CoreML embedded NMS confidence threshold
    verbose: bool = False,
) -> str                        # path to exported file or directory

model.val()

python

model.val(
    data: str = None,           # path to data.yaml
    batch: int = 16,
    imgsz: int = None,
    conf: float = 0.001,
    iou: float = 0.6,
    workers: int = 4,
    allow_download_scripts: bool = False,
    device: str = None,
    split: str = "val",         # "val", "test", or "train"
    augment: bool = False,
    save_json: bool = False,
    verbose: bool = True,
) -> dict

Returns (COCO evaluation, default):

python

{
    "metrics/mAP50-95": float,   # COCO primary metric
    "metrics/mAP50": float,
    "metrics/mAP75": float,
    "metrics/mAP_small": float,
    "metrics/mAP_medium": float,
    "metrics/mAP_large": float,
    "metrics/AR1": float,
    "metrics/AR10": float,
    "metrics/AR100": float,
    "metrics/AR_small": float,
    "metrics/AR_medium": float,
    "metrics/AR_large": float,
}
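
Because the return value is a flat dict, it is easy to gate a pipeline on it. The sketch below assumes only the key names listed above; `check_metrics` is a hypothetical helper, and the numbers are made up for illustration rather than measured results.

```python
def check_metrics(metrics, thresholds):
    """Return the metric keys whose value falls below the required minimum."""
    return [
        key
        for key, minimum in thresholds.items()
        if metrics.get(key, 0.0) < minimum
    ]


# Stand-in for the dict returned by model.val(); values are illustrative only.
example_metrics = {"metrics/mAP50-95": 0.41, "metrics/mAP50": 0.58}

failures = check_metrics(
    example_metrics,
    {"metrics/mAP50-95": 0.40, "metrics/mAP50": 0.60},
)
print(failures)  # ['metrics/mAP50']
```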

model.train() (YOLO9)

python

model.train(
    data: str,                  # path to data.yaml (required)
    *,
    epochs: int = 300,
    batch: int = 16,
    imgsz: int = 640,
    lr0: float = 0.01,
    optimizer: str = "SGD",
    device: str = "",
    workers: int = 8,
    seed: int = 0,
    project: str = "runs/train",
    name: str = "yolo9_exp",
    exist_ok: bool = False,
    resume: bool = False,
    amp: bool = True,
    patience: int = 50,
    allow_download_scripts: bool = False,
    callbacks = None,
) -> dict

Returns the standard LibreYOLO training dict with final_loss, best_mAP50, best_mAP50_95, best_epoch, save_dir, best_checkpoint, and last_checkpoint.

model.train() (RF-DETR)

python

model.train(
    data: str,                  # path to data.yaml
    epochs: int = 100,
    batch_size: int = 4,
    lr: float = 1e-4,
    output_dir: str = "runs/train",
    resume: str = None,
    **kwargs,                   # additional RF-DETR training args
) -> dict

Additional experimental trainers exist for YOLO-NAS, D-FINE, DEIM, DEIMv2, EC, PicoDet, RT-DETRv2/v4, and RTMDet, plus the new classification (MobileNetV4, ConvNeXt, EfficientNetV2, DINOv2), semantic-segmentation (DINOv2), and point (FOMO) families. They follow the same model.train(data="...yaml", ...) shape but their defaults and experimental gates are family-specific.

Runtime artifact loading

Load exported artifacts through LibreYOLO(), the same way you load PyTorch checkpoints. The factory chooses ONNX Runtime, TorchScript, TensorRT, OpenVINO, NCNN, or CoreML from the path:

python

from libreyolo import LibreYOLO

model = LibreYOLO("model.onnx")
model = LibreYOLO("model.torchscript")
model = LibreYOLO("model.engine")
model = LibreYOLO("model_openvino/")
model = LibreYOLO("model_ncnn/")
model = LibreYOLO("model.mlpackage", compute_units="all")

Advanced integrations can reach lower-level runtime modules, but normal application code should stay on the factory path.

ValidationConfig

python

from libreyolo import ValidationConfig

config = ValidationConfig(
    data="coco128.yaml",
    data_dir=None,             # override dataset root directory
    split="val",               # "val", "test", or "train"
    batch_size=16,
    imgsz=640,
    conf_thres=0.001,
    iou_thres=0.6,
    max_det=300,
    iou_thresholds=(           # mAP IoU sweep
        0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95,
    ),
    device="auto",
    save_dir=None,
    save_json=False,
    verbose=True,
    num_workers=4,
    half=False,
    augment=False,             # test-time augmentation (TTA)
    allow_download_scripts=False,
    # Pose-only fields (PoseValidator)
    keypoints_json=None,
    images_dir=None,
    oks_sigmas=None,
)

# Load/save YAML
config = ValidationConfig.from_yaml("config.yaml")
config.to_yaml("config.yaml")

Architecture Guide

This section is for contributors who want to understand the codebase internals.

Base class design

PyTorch model families inherit from BaseModel in libreyolo/models/base/model.py. Subclasses implement these abstract methods:

Method	Purpose
`_init_model()`	Build and return the nn.Module
`_get_available_layers()`	Return layer-name to module mapping
`_get_preprocess_numpy()`	Return the NumPy preprocessor used for export / calibration
`_preprocess()`	Image to tensor conversion
`_forward()`	Model forward pass
`_postprocess()`	Raw output to detection dicts

BaseModel provides the shared wrapper behavior: prediction, export, validation, size/name metadata, and training helpers. The actual single-image, batch, and tiled inference flow lives in libreyolo/models/base/inference.py, while deployment runtimes live under libreyolo/backends/.
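
The split between shared wrapper behavior and per-family hooks follows the familiar template-method pattern. The sketch below is a toy illustration of that pattern only: `SketchBaseModel` and `ToyModel` are hypothetical stand-ins, only three of the hooks from the table are shown, and the real BaseModel does considerably more.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Hypothetical stand-in for BaseModel: owns the flow, not the details."""

    @abstractmethod
    def _preprocess(self, image):
        """Image to tensor conversion."""

    @abstractmethod
    def _forward(self, tensor):
        """Model forward pass."""

    @abstractmethod
    def _postprocess(self, raw):
        """Raw output to detection dicts."""

    def __call__(self, image):
        # The shared pipeline: each family only fills in the three steps.
        return self._postprocess(self._forward(self._preprocess(image)))


class ToyModel(SketchBaseModel):
    """A fake family that 'detects' the brightest pixel."""

    def _preprocess(self, image):
        return [pixel / 255 for pixel in image]

    def _forward(self, tensor):
        return max(tensor)

    def _postprocess(self, raw):
        return [{"score": raw}]


detections = ToyModel()([0, 51, 255])
print(detections)  # [{'score': 1.0}]
```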

Package structure

text

libreyolo/
    __init__.py          # Public API exports + deprecated-alias resolver
    tasks.py             # Task types, suffix conventions, resolution rules
    assets/parkour.jpg   # SAMPLE_IMAGE
    models/
        __init__.py      # LibreYOLO() factory + model registry bootstrap
        base/
            model.py     # BaseModel - shared wrapper behaviour
            inference.py # Shared prediction pipeline (image/dir/video/tiled)
        yolox/           # LibreYOLOX (detect)
        yolo9/           # LibreYOLO9 (detect)
        yolo9_e2e/       # LibreYOLO9E2E (detect)
        yolonas/         # LibreYOLONAS (detect, pose)
        dfine/           # LibreDFINE (detect)
        deim/            # LibreDEIM (detect)
        deimv2/          # LibreDEIMv2 (detect)
        rtdetr/          # LibreRTDETR (detect)
        rtdetrv2/        # LibreRTDETRv2 (detect)
        rtdetrv4/        # LibreRTDETRv4 (detect)
        rfdetr/          # LibreRFDETR (detect, segment, pose, obb) - lazy-loaded
        ec/              # LibreEC / EdgeCrafter (detect, pose, segment)
        picodet/         # LibrePICODET (detect)
        rtmdet/          # LibreRTMDet (detect)
        dinov2/          # LibreDINOv2 (semantic, classify) - lazy-loaded
        mobilenetv4/     # LibreMobileNetV4 (classify)
        convnext/        # LibreConvNeXt (classify)
        efficientnetv2/  # LibreEfficientNetV2 (classify)
        depth_anything/  # LibreDepthAnythingV2 (depth)
        fomo/            # LibreFOMO (point)
        l2cs/            # LibreL2CS (gaze, inference-only)
    backends/
        base.py
        onnx.py          # ONNX Runtime loader
        torchscript.py   # TorchScript loader
        tensorrt.py      # TensorRT loader
        openvino.py      # OpenVINO loader
        ncnn.py          # NCNN loader
        coreml.py        # CoreML loader
    export/
        exporter.py      # BaseExporter and format registry
        onnx.py / torchscript.py / tensorrt.py / openvino.py / ncnn.py / coreml.py
        config.py / calibration.py
    training/
        trainer.py       # Shared trainer scaffolding
        config.py        # TrainConfig dataclass (single source of truth)
        augment.py / callbacks.py / distributed.py / ema.py / scheduler.py
        artifacts.py / train_config.yaml
        # Per-family trainers live in models/<family>/trainer.py
    validation/
        config.py                # ValidationConfig
        base.py / preprocessors.py
        detection_validator.py   # DetectionValidator, SegmentationValidator
        pose_validator.py        # PoseValidator
        coco_evaluator.py        # COCOEvaluator
    tracking/
        tracker.py       # ByteTracker
        config.py        # TrackConfig
        kalman_filter.py / matching.py / strack.py
    cli/
        __init__.py      # libreyolo entrypoint (Typer app)
        commands/        # predict / train / val / export / special
        aliases.py / config.py / parsing.py / output.py / errors.py
    utils/
        results.py       # Results, Boxes, Masks, Keypoints, Probs, OBB, Gaze
        image_loader.py  # Unified image loading
        video.py         # VideoSource, VideoWriter, video inference loop
        general.py       # Path helpers, NMS, tiling utilities
        download.py / drawing.py / logging.py / predict_args.py
        serialization.py / box_ops.py
    data/
        dataset.py / pose_dataset.py / utils.py / yolo_coco_api.py
    config/
        datasets/        # Built-in dataset YAML configs (coco8, coco128, coco5000, coco, etc.)
        export/          # TensorRT default YAML

Adding a new model family

1. Create libreyolo/models/newmodel/model.py with a class inheriting BaseModel.
2. Set FAMILY, FILENAME_PREFIX, INPUT_SIZES, SUPPORTED_TASKS, and DEFAULT_TASK as needed.
3. Implement registry hooks such as can_load(), detect_size(), detect_nb_classes(), and detect_size_from_filename().
4. Implement the model init, preprocess, forward, postprocess, train, and validation hooks that the family needs.
5. Create the supporting network and utilities under libreyolo/models/newmodel/.
6. Add the import to libreyolo/models/__init__.py; subclass registration happens when the import runs.
7. Export the class from libreyolo/__init__.py.
8. (Optional) Override val_preprocessor_class if validation preprocessing differs from the standard path.

Export architecture

User code should export through model.export(...). Internally, BaseExporter in libreyolo/export/exporter.py owns the format registry, and concrete exporters register themselves through subclass registration.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
model.export(format="onnx")

To add a new export format, implement a new BaseExporter subclass with a unique format_name and import it from libreyolo/export/exporter.py so the registry is populated.
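
Registration through subclassing is commonly built on Python's `__init_subclass__` hook. The sketch below shows that general mechanism under stated assumptions: `SketchExporter`, its `_registry` dict, and `create()` are hypothetical names, and the real BaseExporter may register formats differently. Only the idea of a unique format_name per subclass comes from the text above.

```python
class SketchExporter:
    """Hypothetical stand-in for BaseExporter with a format registry."""

    format_name = None
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        # Runs once per subclass, at class-definition (import) time.
        super().__init_subclass__(**kwargs)
        if cls.format_name is None:
            raise TypeError(f"{cls.__name__} must set format_name")
        if cls.format_name in SketchExporter._registry:
            raise ValueError(f"duplicate format_name: {cls.format_name!r}")
        SketchExporter._registry[cls.format_name] = cls

    @classmethod
    def create(cls, format_name):
        """Look up and instantiate the exporter registered for a format."""
        try:
            return cls._registry[format_name]()
        except KeyError:
            raise ValueError(f"unknown format: {format_name!r}") from None


class OnnxSketchExporter(SketchExporter):
    format_name = "onnx"


exporter = SketchExporter.create("onnx")
print(type(exporter).__name__)  # OnnxSketchExporter
```

This is why the import matters: a subclass that is never imported never runs its class body, so it never reaches the registry.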

Dataset Format

Every task loads through one data.yaml. Detection, instance segmentation, and OBB accept two interchangeable label formats (YOLO TXT or native COCO JSON), and the loader picks the right one from the config. Pose, semantic segmentation, depth, and classification each add a small format of their own. The table maps every task to its layout.

Formats by task

Task	Data layout	Labels
Detection	`data.yaml` + `labels/*.txt`, or COCO JSON	One box per line
Instance segmentation	`data.yaml` + polygon `.txt`, or COCO JSON	Polygon per line (TXT) / polygons + RLE (COCO)
OBB	`data.yaml` + rotated-box `.txt`, or COCO JSON	One rotated box per line
Pose	`data.yaml` + `.txt` + `kpt_shape`/`flip_idx`	Box + keypoints per line
Semantic segmentation	`data.yaml` + `masks_dir/` PNGs	Per-pixel class ID (255 = ignore)
Depth	`data.yaml` + `depths_dir/` maps	Per-pixel depth (0 = invalid)
Classification	ImageFolder (`train/<class>/`)	Folder name = class

data.yaml structure

This file is the shared contract for detection, segmentation, OBB, and pose. train/val/test may each be a directory, a .txt file list (one image path per line), or a list of paths. nc is optional: when omitted, it is inferred from names.

data.yaml

path: /absolute/path/to/dataset   # dataset root
train: images/train               # dir, .txt file list, or list of paths
val: images/val
test: images/test                 # optional

nc: 80                            # optional; inferred from names if absent
names: ["person", "bicycle", "car", "..."]

Configs resolve from an explicit path, the working directory, then the built-ins under libreyolo/config/datasets/. Roots default under ~/datasets; override with LIBREYOLO_DATASETS_DIR.

YOLO TXT labels

The default layout: one .txt per image under labels/, mirroring the images/ tree with the same file stem. All coordinates are normalized to [0, 1].

text

dataset/
    images/train/img001.jpg
    labels/train/img001.txt        # same stem as the image

text

# Detection      one box per line
<class_id> <cx> <cy> <w> <h>

# Segmentation   one polygon per line (box derived from the vertices)
<class_id> <x1> <y1> <x2> <y2> ... <xn> <yn>

# Pose           box, then K keypoints (needs kpt_shape / flip_idx below)
<class_id> <cx> <cy> <w> <h> <kx1> <ky1> <v1> ... <kxK> <kyK> <vK>

# OBB            four rotated-box corners
<class_id> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4>
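
As a minimal sketch of reading the detection row format, the helper below parses one line and scales the normalized center box to pixel corners. `parse_detection_line` is a hypothetical name used for illustration; it is not a LibreYOLO function.

```python
def parse_detection_line(line, img_w, img_h):
    """Parse '<class_id> <cx> <cy> <w> <h>' into (class_id, x1, y1, x2, y2) pixels."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    cx, cy, w, h = (float(value) for value in fields[1:])
    # Coordinates are normalized to [0, 1], so scale by the image size.
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return class_id, x1, y1, x2, y2


print(parse_detection_line("0 0.5 0.5 0.5 0.25", img_w=640, img_h=480))
# (0, 160.0, 180.0, 480.0, 300.0)
```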

data.yaml (pose)

kpt_shape: [17, 3]   # K keypoints, 3 values each: x, y, visibility
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
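
flip_idx exists because a horizontal flip swaps left and right body parts. The sketch below shows the usual interpretation with a three-keypoint toy skeleton: mirror the x coordinate, then reorder so each slot still holds the correct side. This is the conventional meaning of such an index list, written as a hypothetical helper, and is not taken from LibreYOLO's augmentation code.

```python
def hflip_keypoints(keypoints, flip_idx):
    """keypoints: list of (x, y, v) with x and y normalized to [0, 1]."""
    # Step 1: mirror every point around the vertical center line.
    mirrored = [(1.0 - x, y, v) for x, y, v in keypoints]
    # Step 2: slot i takes the mirrored point from slot flip_idx[i],
    # so "left eye" keeps meaning the subject's left eye.
    return [mirrored[i] for i in flip_idx]


# Toy skeleton: 0 = nose, 1 = left eye, 2 = right eye.
keypoints = [(0.5, 0.1, 2), (0.25, 0.2, 2), (0.75, 0.2, 1)]
flipped = hflip_keypoints(keypoints, flip_idx=[0, 2, 1])
print(flipped)  # [(0.5, 0.1, 2), (0.25, 0.2, 1), (0.75, 0.2, 2)]
```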

Native COCO JSON

Detection, segmentation, and OBB also load COCO JSON directly: add an annotations: block mapping each split to its JSON file. train/val then point at image directories (not .txt lists). Requires pycocotools; class names come from the JSON categories, so nc/names are optional.

data.yaml (COCO)

path: dataset
train: images/train               # image directory
val: images/val
annotations:
  train: annotations/train.json   # COCO instances JSON
  val: annotations/val.json

The same switch feeds YOLO9, RF-DETR, DEIM, and D-FINE training and the detection, OBB, and pose validators. A COCO layout with annotations/instances_train2017.json on disk is also detected automatically, without the annotations: key.

Which segmentation format? A YOLO polygon row is a single ring per instance: it cannot express a hole or a split (multi-part) mask. COCO JSON keeps every polygon of an instance and decodes RLE masks, holes included. Use COCO JSON when instances have holes or disconnected parts; either format is fine for simple blobs. Crowd annotations (iscrowd: 1) are skipped.

Semantic segmentation masks

Pair each image with a single-channel mask whose pixel values are class IDs; 255 marks ignored pixels. masks_dir is substituted for images in each path (default masks), and masks must be lossless (PNG) with the same stem as their image. Optional label_mapping remaps source IDs to train IDs (unmapped values become ignore). Omit masks_dir to rasterize masks from YOLO polygon labels at load time, with a background class appended.

text

dataset/
    images/train/scene001.jpg
    masks/train/scene001.png       # single-channel class IDs, 255 = ignore

data.yaml (semantic)

path: /path/to/dataset
train: images/train
val: images/val
masks_dir: masks
nc: 3
names: ["road", "building", "vegetation"]
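
The label_mapping rule (source IDs remapped to train IDs, unmapped values becoming ignore) can be sketched on a tiny nested-list mask. `remap_mask` is a hypothetical helper and the IDs are invented for the example; a real loader would work on image arrays.

```python
IGNORE = 255  # the ignore value used by the mask format described above


def remap_mask(mask, label_mapping):
    """Map source class IDs to train IDs; anything unmapped becomes IGNORE."""
    return [[label_mapping.get(pixel, IGNORE) for pixel in row] for row in mask]


source_mask = [
    [7, 8],
    [11, 255],
]
print(remap_mask(source_mask, {7: 0, 8: 1}))  # [[0, 1], [255, 255]]
```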

Depth maps

Pair each image with a single-channel depth map under depths_dir (default depths). 16-bit PNG/TIF is divided by depth_scale (default 256.0); .npy float files are used as-is. Zero, negative, and non-finite pixels are invalid. An optional depth_stem_suffix and a *_mask validity map are honored automatically. Depth is validation-only.

data.yaml (depth)

path: /path/to/dataset
val: images/val
depths_dir: depths
depth_scale: 256.0                # 16-bit PNG encoding: value / 256 = depth
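
The decoding and validity rules can be expressed per pixel in a few lines. This sketch assumes only what the section states (integer PNG values divided by depth_scale; zero, negative, and non-finite depths invalid); both helper names are hypothetical.

```python
import math


def decode_png_depth(raw_value, depth_scale=256.0):
    """Convert a 16-bit PNG/TIF pixel value into depth units."""
    return raw_value / depth_scale


def is_valid_depth(depth):
    """Zero, negative, and non-finite depths are treated as invalid."""
    return math.isfinite(depth) and depth > 0


print(decode_png_depth(512))                    # 2.0
print(is_valid_depth(decode_png_depth(0)))      # False
print(is_valid_depth(float("nan")))             # False
```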

Classification

Classification uses an ImageNet-style ImageFolder tree instead of a data.yaml - see Classification for the layout. data= takes a dataset root, a .zip URL, or a known name.

Built-in datasets

Configs ship under libreyolo/config/datasets/. Download behavior differs per config: URL-backed sets fetch on first use, script-backed sets need allow_download_scripts=True, and a few must be placed locally.

Config	Task	Download
`coco8`	Detection (8 images)	Automatic
`coco128`	Detection (128 images)	Automatic
`coco5000`	Detection	Script: `allow_download_scripts=True`
`coco` / `coco-val-only`	Detection (full)	Script: `allow_download_scripts=True`
`coco8-pose` / `coco-pose`	Pose	Script: `allow_download_scripts=True`
`cocostuff`	Semantic (182 classes)	Manual: place locally

python

results = model.val(data="coco8.yaml")                          # auto-downloads
results = model.train(data="coco128.yaml", epochs=10)           # auto-downloads
model.train(data="coco8-pose.yaml", allow_download_scripts=True)  # script config

1	from libreyolo import LibreYOLO, SAMPLE_IMAGE
2
3	# Default: YOLO9 detection
4	model = LibreYOLO("LibreYOLO9c.pt")
5	result = model(SAMPLE_IMAGE, conf=0.25, save=True)
6
7	print(f"Detected {len(result)} objects")
8	print(result.boxes.xyxy)
9	print(result.saved_path)

1	git clone https://github.com/LibreYOLO/libreyolo.git
2	cd libreyolo
3	pip install -e .

1	# ONNX export and inference
2	pip install libreyolo[onnx]
3	# or: pip install onnx onnxsim onnxruntime
4
5	# RT-DETR compatibility extra (currently no extra packages)
6	pip install libreyolo[rtdetr]
7
8	# RF-DETR support
9	pip install libreyolo[rfdetr]
10	# or: pip install transformers
11
12	# TensorRT export and inference (NVIDIA GPU)
13	pip install libreyolo[tensorrt]
14	# Installs TensorRT CUDA 12 Python packages on Linux/Windows.
15	# Host driver/CUDA compatibility still matters.
16
17	# OpenVINO export and inference (Intel CPU/GPU/VPU)
18	pip install libreyolo[openvino]
19	# INT8 export also needs: pip install nncf
20
21	# NCNN export and inference
22	pip install libreyolo[ncnn]
23	# or: pip install pnnx ncnn
24
25	# TFLite export + LiteRT runtime backend (Python 3.12+)
26	pip install libreyolo[tflite]
27	# "litert" is an alias extra: pip install libreyolo[litert]
28
29	# Tracking API compatibility extra
30	pip install libreyolo[tracking]
31	# Tracking dependencies are part of the base install; Deep OC-SORT's ReID
32	# embedder weights auto-download on first use.
33
34	# CoreML export and inference (macOS only for runtime)
35	pip install libreyolo[coreml]
36	# or: pip install coremltools
37
38	# L2CS gaze optional auto-download helper
39	pip install libreyolo[gaze]
40
41	# Promptable segmentation (LibreSAM: SAM-1, SAM-2, SAM 3, MobileSAM,
42	# EdgeTAM, PicoSAM3)
43	pip install libreyolo[sam]
44
45	# Open-vocabulary detection (Grounding DINO, OWLv2, OmDet-Turbo, OV-DEIM)
46	pip install libreyolo[openvocab]
47
48	# LibreLabel AI assist (SAM click-to-mask)
49	pip install libreyolo[label]
50
51	# Zero-shot classification
52	pip install libreyolo[clip] # CLIP
53	pip install libreyolo[siglip2] # SigLIP2 tokenizer (SentencePiece)
54
55	# Validation and training plots
56	pip install libreyolo[plots]
57
58	# SenseNova Vision preview
59	pip install libreyolo[sensenova]
60
61	# Converter-only dependencies for CLIP and SigLIP2 checkpoints
62	pip install libreyolo[clip-convert]
63	pip install libreyolo[siglip2-convert]
64
65	# LoRA fine-tuning (peft)
66	pip install libreyolo[lora]
67
68	# Experiment loggers
69	pip install libreyolo[tensorboard] # or [mlflow], [wandb]
70
71	# EoMT instance / panoptic segmentation
72	pip install libreyolo[eomt]
73
74	# Install every optional LibreYOLO extra
75	pip install libreyolo[all]

1	# ONNX environment
2	uv venv .venv-onnx
3	uv pip install --python .venv-onnx/bin/python -e '.[onnx]'
4
5	# RT-DETR environment
6	uv venv .venv-rtdetr
7	uv pip install --python .venv-rtdetr/bin/python -e '.[rtdetr]'
8
9	# Repeat with .[rfdetr], .[openvino], .[ncnn], .[coreml], .[gaze], .[tracking], or .[tensorrt] as needed

1	from libreyolo import LibreYOLO, SAMPLE_IMAGE
2
3	# Use the official checkpoint name and let the factory resolve the details
4	model = LibreYOLO("LibreYOLO9c.pt")
5
6	# Run on a single image (SAMPLE_IMAGE ships with the package)
7	result = model(SAMPLE_IMAGE)
8
9	print(f"Found {len(result)} objects")
10	print(result.boxes.xyxy) # bounding boxes (N, 4)
11	print(result.boxes.conf) # confidence scores (N,)
12	print(result.boxes.cls) # class IDs (N,)

1	from libreyolo import LibreYOLO, SAMPLE_IMAGE
2
3	# Same factory, same call shape - just point at an RF-DETR checkpoint
4	model = LibreYOLO("LibreRFDETRs.pt")
5	result = model(SAMPLE_IMAGE)
6
7	print(f"Found {len(result)} objects")
8	print(result.boxes.xyxy)

1	result = model(SAMPLE_IMAGE, save=True)
2	print(result.saved_path) # e.g. runs/detect/predict/parkour.jpg

1	results = model("images/", save=True, batch=4)
2	for r in results:
3	print(f"{r.path}: {len(r)} detections")

1	from libreyolo import LibreYOLO
2
3	model = LibreYOLO("LibreYOLO9c.pt") # detection

1	from libreyolo import LibreYOLO
2
3	model = LibreYOLO("LibreRFDETRs.pt") # detect (validated)
4	# model = LibreYOLO("LibreRFDETRs-seg.pt") # segment (validated)
5	# model = LibreYOLO("LibreRFDETRx-pose.pt") # pose (research preview)
6	# model = LibreYOLO("LibreRFDETRn-obb.pt") # obb (research preview)

1	from libreyolo import LibreYOLO
2
3	# 1. Filename suffix decides → segment
4	model = LibreYOLO("LibreRFDETRs-seg.pt")
5
6	# 2. Override regardless of filename
7	model = LibreYOLO("custom_weights.pt", task="segment")
8
9	# 3. Detection is implicit
10	model = LibreYOLO("LibreYOLO9c.pt") # task="detect"

1	# Detection (implicit)
2	LibreYOLO9c.pt
3	LibreRFDETRs.pt
4	LibreRTDETRr50.pt
5
6	# Instance segmentation (-seg)
7	LibreRFDETRs-seg.pt
8	LibreECm-seg.pt
9
10	# Semantic segmentation (-sem)
11	LibreDINOv2n.pt # semantic is DINOv2's default; -sem optional
12	LibreSegformerb0-sem.pt # SegFormer requires the -sem suffix
13
14	# Panoptic segmentation (-panoptic)
15	LibreEoMTs-panoptic.pt
16
17	# Pose (-pose)
18	LibreYOLONASn-pose.pt
19	LibreECs-pose.pt
20	LibreRFDETRx-pose.pt # preview
21
22	# Oriented boxes (-obb)
23	LibreRFDETRn-obb.pt # preview
24
25	# Classification (-cls)
26	LibreMobileNetV4s-cls.pt
27	LibreConvNeXtt-cls.pt
28	LibreEfficientNetV2b0-cls.pt
29	# LibreDINOv2 classify checkpoints are not publicly shipped in v1.4.0
30	LibreSigLIP2b16-cls.pt # zero-shot
31
32	# Depth (-depth)
33	LibreDepthAnythingV2s-depth.pt
34	LibreZipDepthb-depth.pt
35
36	# Restoration / super-resolution (-restore)
37	LibreNAFNetl-restore-sidd.pt
38	LibreSwinIRm-restore.pt
39	LibreRealESRGANx4-restore.pt
40
41	# Background removal (-matte)
42	LibreBiRefNetl-matte.pt
43
44	# OCR (-ocr)
45	LibrePPOCRt-ocr.pt
46
47	# Point (-point)
48	LibreFOMOs-point.pt
49
50	# Gaze (-gaze optional; only task for L2CS)
51	LibreL2CSr50.pt

1	result = model(
2	"image.jpg",
3	conf=0.25, # confidence threshold (default: 0.25)
4	iou=0.45, # NMS IoU threshold (default: 0.45)
5	imgsz=640, # input size override (default: model's native)
6	device="auto", # "auto", "cpu", "mps", "0", "cuda:0", ...
7	classes=[0, 2, 5], # filter to specific class IDs (default: all)
8	max_det=300, # max detections per image (default: 300)
9	augment=False, # test-time augmentation where implemented
10	save=True, # save annotated image (default: False)
11	batch=4, # directory batch size
12	stream=False, # video only: yield frame results instead of a list
13	vid_stride=1, # video only: process every N-th frame
14	show=False, # video only: display annotated frames
15	tiling=False, # large-image tiled detection
16	overlap_ratio=0.2, # tile overlap ratio
17	output_path="out/", # images: directory; video: final file path
18	color_format="auto", # "auto", "rgb", or "bgr"
19	output_file_format="png", # output format: "jpg", "png", "webp"
20	)

1	# File path (string or pathlib.Path)
2	result = model("photo.jpg")
3	result = model(Path("photo.jpg"))
4
5	# URL
6	result = model("https://example.com/image.jpg")
7	result = model("s3://bucket/image.jpg")
8	result = model("gs://bucket/image.jpg")
9
10	# PIL Image
11	from PIL import Image
12	img = Image.open("photo.jpg")
13	result = model(img)
14
15	# NumPy array (HWC or CHW, RGB or BGR, uint8 or float32)
16	import numpy as np
17	arr = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
18	result = model(arr)
19
20	# OpenCV (BGR) - specify color_format
21	import cv2
22	frame = cv2.imread("photo.jpg")
23	result = model(frame, color_format="bgr")
24
25	# PyTorch tensor (CHW or NCHW)
26	import torch
27	tensor = torch.randn(3, 640, 640)
28	result = model(tensor)
29
30	# Raw bytes
31	with open("photo.jpg", "rb") as f:
32	result = model(f.read())
33
34	# BytesIO
35	from io import BytesIO
36	result = model(BytesIO(open("photo.jpg", "rb").read()))
37
38	# Directory of images
39	results = model("images/", batch=4)

result = model("image.jpg")

# Number of detections
len(result)  # e.g., 5

# Bounding boxes in xyxy format (x1, y1, x2, y2)
result.boxes.xyxy  # tensor of shape (N, 4)

# Bounding boxes in xywh format (center_x, center_y, width, height)
result.boxes.xywh  # tensor of shape (N, 4)

# Confidence scores
result.boxes.conf  # tensor of shape (N,)

# Class IDs
result.boxes.cls  # tensor of shape (N,)

# Combined data: [x1, y1, x2, y2, conf, cls]
# Tracking adds a track_id column before conf/cls.
result.boxes.data  # shape (N, 6), or (N, 7) when tracked

# Metadata
result.orig_shape  # (height, width) of original image
result.path  # source file path (or None)
result.names  # {0: "person", 1: "bicycle", ...}

# Move to CPU / convert to numpy
result_cpu = result.cpu()
boxes_np = result.boxes.numpy()
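
The two box layouts above differ only by a change of coordinates. As a minimal sketch of that relationship, assuming the center-based xywh convention stated in the comment, the conversion can be written in plain Python. The helper names `xyxy_to_xywh` and `xywh_to_xyxy` are illustrative and not part of LibreYOLO.

```python
from typing import List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Corner box (x1, y1, x2, y2) -> center box (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Center box (cx, cy, w, h) -> corner box (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


# Hypothetical example box, not taken from a real detection.
corner_box = [10.0, 20.0, 110.0, 220.0]
center_box = xyxy_to_xywh(corner_box)
round_trip = xywh_to_xyxy(center_box)
```

Converting to center form and back returns the original corners, which is a quick sanity check when mixing the two layouts.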

# Only detect people (class 0) and cars (class 2)
result = model("image.jpg", classes=[0, 2])

meta = model.info(detailed=False, verbose=True)
# meta -> {"family": ..., "size": ..., "task": ..., "params": ..., "imgsz": ..., "names": {...}, ...}

result = model(
    "large_aerial_image.jpg",
    tiling=True,
    overlap_ratio=0.2,  # 20% overlap between tiles (default)
    save=True,
)

# Extra metadata on tiled results
result.tiled  # True
result.num_tiles  # number of tiles used
result.saved_path  # output directory when save=True
result.tiles_path  # directory containing per-tile crops
result.grid_path  # grid visualization image
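
To build intuition for how `overlap_ratio` affects the tile count, here is a rough sketch of one common tiling scheme. It is not LibreYOLO's actual implementation: it assumes fixed square tiles, a stride of `tile * (1 - overlap_ratio)`, and a final tile snapped to the image border. The function names and the 640-pixel tile size are assumptions made for illustration.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap_ratio: float) -> List[int]:
    """Start offsets of tiles along one axis; the last tile sits flush with the edge."""
    if length <= tile:
        return [0]
    stride = max(1, round(tile * (1 - overlap_ratio)))
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # snap the final tile to the border
    return origins


def tile_boxes(
    width: int, height: int, tile: int = 640, overlap_ratio: float = 0.2
) -> List[Tuple[int, int, int, int]]:
    """All tile rectangles as (x1, y1, x2, y2), clipped to the image size."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap_ratio)
        for x in tile_origins(width, tile, overlap_ratio)
    ]


# Hypothetical 1600x1000 image split into 640-pixel tiles with 20% overlap.
tiles = tile_boxes(1600, 1000)
```

Under these assumptions a larger `overlap_ratio` shrinks the stride, so the same image yields more tiles and more inference calls.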

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")
results = model("clip.mp4", save=True)
# Saved under runs/detect/predict*/clip.mp4

for result in model("long_clip.mp4", stream=True):
    print(f"frame {result.frame_idx}: {len(result)} detections")

# Process every 2nd frame (halves compute and saved fps)
results = model("clip.mp4", vid_stride=2, save=True)
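
The arithmetic behind that comment can be sketched as follows. This assumes sampling starts at frame 0 and that the saved frame rate is the source rate divided by the stride, as the comment implies. Both helper names are hypothetical.

```python
from typing import List


def strided_frames(total_frames: int, vid_stride: int) -> List[int]:
    """Indices of the frames that get processed (assumes sampling starts at 0)."""
    return list(range(0, total_frames, vid_stride))


def output_fps(source_fps: float, vid_stride: int) -> float:
    """Frame rate that keeps the saved video at real-time playback speed."""
    return source_fps / vid_stride


# Hypothetical 7-frame clip at 30 fps, processed with vid_stride=2.
kept = strided_frames(7, 2)
saved_fps = output_fps(30.0, 2)
```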

# Display annotated frames in an OpenCV window while processing
results = model("clip.mp4", show=True)

from libreyolo import LibreYOLO
from libreyolo.utils.video import VideoSource, VideoWriter

model = LibreYOLO("LibreYOLO9c.pt")

with VideoSource("clip.mp4", vid_stride=1) as src, \
     VideoWriter("out.mp4", fps=src.fps, width=src.width, height=src.height) as out:
    for frame_bgr, frame_idx in src:
        result = model(frame_bgr, color_format="bgr")
        # ... draw, transform, etc.
        out.write_frame(frame_bgr)

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9c.pt")

for result in model.track(
    "clip.mp4",
    track_conf=0.25,
    iou=0.45,
    save=True,  # writes runs/track/<video_stem>.mp4 by default
    vid_stride=1,
):
    print(result.frame_idx, result.track_id)

from libreyolo import LibreYOLO, ByteTracker
from libreyolo.utils.video import VideoSource

model = LibreYOLO("LibreYOLO9c.pt")
tracker = ByteTracker()

with VideoSource("clip.mp4") as src:
    for frame_bgr, frame_idx in src:
        result = model(frame_bgr, color_format="bgr", conf=0.1)
        tracked = tracker.update(result)

        for i in range(len(tracked.boxes)):
            track_id = int(tracked.boxes.id[i])
            xyxy = tracked.boxes.xyxy[i].tolist()
            cls = int(tracked.boxes.cls[i])
            print(f"frame {frame_idx} - id {track_id} cls {cls} {xyxy}")

from libreyolo import ByteTracker, TrackConfig

cfg = TrackConfig(
    track_high_thresh=0.25,  # first-stage match threshold
    track_low_thresh=0.1,  # second-stage (low-conf recovery)
    new_track_thresh=0.25,  # minimum conf to start a new track
    match_thresh=0.8,  # IoU cost cutoff (stage 1)
    match_thresh_low=0.5,  # IoU cost cutoff (stage 2)
    match_thresh_unconfirmed=0.7,  # IoU cost cutoff for unconfirmed tracks
    track_buffer=30,  # frames to keep lost tracks before removal
    frame_rate=30,  # scales track_buffer
    fuse_score=True,  # multiply IoU by detection score
    minimum_consecutive_frames=1,  # frames to confirm a new track
)
tracker = ByteTracker(config=cfg)
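
To make the `match_thresh` and `fuse_score` comments concrete, the sketch below shows how an IoU-based matching cost is typically computed in ByteTrack-style trackers: similarity is the IoU, optionally multiplied by the detection score, and the cost is one minus that similarity. This is an illustration under those assumptions, not LibreYOLO's internal code, and `iou` and `match_cost` are hypothetical helper names.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_cost(
    track_box: Sequence[float],
    det_box: Sequence[float],
    det_score: float,
    fuse_score: bool = True,
) -> float:
    """Cost = 1 - similarity; low-confidence detections cost more when fused."""
    similarity = iou(track_box, det_box)
    if fuse_score:
        similarity *= det_score
    return 1.0 - similarity


# Hypothetical track and detection: the detection covers half of the track box.
track = (0.0, 0.0, 10.0, 10.0)
detection = (0.0, 0.0, 10.0, 5.0)
cost = match_cost(track, detection, det_score=0.8)
accepted = cost <= 0.8  # compare against a match_thresh-style cost cutoff
```

With fusion enabled, a detection with the same overlap but a lower score gets a higher cost, so it is less likely to survive the cutoff.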

from libreyolo import LibreYOLO, OCSortConfig

cfg = OCSortConfig(
    det_thresh=0.25,  # boxes above this drive association and spawn new tracks
    max_age=30,  # frames a track survives without an observation
    min_hits=3,  # consecutive hits before a track is reported
    iou_threshold=0.3,  # minimum IoU for a valid association
    delta_t=3,  # frame span used to estimate velocity direction
    inertia=0.2,  # weight of the velocity-direction (momentum) term
    use_byte=False,  # enable the BYTE low-score recovery pass
)

model = LibreYOLO("LibreYOLO9c.pt")
for result in model.track("clip.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)

from libreyolo import LibreYOLO, BoTSortConfig

model = LibreYOLO("LibreYOLO9c.pt")

# Select by name with defaults
for result in model.track("drone.mp4", tracker="botsort", save=True):
    print(result.frame_idx, result.track_id)

# Or configure it: a config instance selects the tracker by type
cfg = BoTSortConfig(
    track_high_thresh=0.25,
    track_buffer=30,
    enable_cmc=True,  # camera-motion compensation on (default)
    cmc_method="sparseOptFlow",  # the shipped CMC estimator
    cmc_downscale=2,  # estimate flow at half resolution
)
for result in model.track("drone.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)
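
Camera-motion compensation is easier to picture with a small example. In BoT-SORT-style trackers the estimator produces a 2x3 affine matrix describing how the background moved between frames, and track boxes are warped by it before matching. The sketch below shows only that warping step, under the assumption of a plain affine model. `warp_point` and `warp_box` are illustrative names and the matrix is made up.

```python
from typing import Sequence, Tuple

# 2x3 affine matrix laid out as [[a, b, tx], [c, d, ty]] (assumed convention).
Affine = Sequence[Sequence[float]]


def warp_point(m: Affine, x: float, y: float) -> Tuple[float, float]:
    """Apply the affine transform to a single point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def warp_box(m: Affine, box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Warp all four corners of an xyxy box and return their bounding box."""
    x1, y1, x2, y2 = box
    corners = [warp_point(m, x, y) for x in (x1, x2) for y in (y1, y2)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical camera pan: the scene shifts 5 px right and 3 px up.
shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0]]
warped = warp_box(shift, (10.0, 10.0, 20.0, 20.0))
```

Warping the previous box this way keeps it aligned with the object when the camera itself moves, which matters for drone or handheld footage.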

from libreyolo import LibreYOLO
from libreyolo.tracking import DeepOCSortConfig

model = LibreYOLO("LibreYOLO9c.pt")

# Defaults: OSNet-AIN embedder, auto-downloaded
for result in model.track("mall.mp4", tracker="deepocsort", save=True):
    print(result.frame_idx, result.track_id)

# Tune the appearance term, or plug in your own embedder
cfg = DeepOCSortConfig(
    det_thresh=0.25,
    embedder="osnet_ain_x0_25",  # or a callable: (frame, boxes_xyxy) -> (N, D) features
    w_association_emb=0.75,  # weight of appearance vs motion in matching
    alpha_fixed_emb=0.95,  # EMA smoothing of per-track embeddings
)
for result in model.track("mall.mp4", tracker_config=cfg, save=True):
    print(result.frame_idx, result.track_id)
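
The "EMA smoothing of per-track embeddings" comment refers to an exponential moving average over appearance features. A minimal sketch of a fixed-alpha update, assuming unit-normalized embeddings, looks like this. The real tracker may adapt alpha to detection confidence, and `normalize` and `update_embedding` are hypothetical helpers rather than LibreYOLO functions.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def update_embedding(
    track_emb: Sequence[float], det_emb: Sequence[float], alpha: float = 0.95
) -> List[float]:
    """Blend the stored track embedding with a new detection embedding."""
    mixed = [alpha * t + (1 - alpha) * d for t, d in zip(track_emb, det_emb)]
    return normalize(mixed)


# Hypothetical 2-D embeddings; real appearance features have many more dimensions.
smoothed = update_embedding([1.0, 0.0], [0.0, 1.0], alpha=0.95)
```

An alpha close to 1 makes the stored embedding change slowly, so a single bad crop has little effect on a track's appearance model.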

from libreyolo import LibreEnsemble

# Weighted Boxes Fusion (the default), keep only boxes BOTH models found
ens = LibreEnsemble(["LibreYOLO9s.pt", "LibreRFDETRs.pt"], min_votes=2)

result = ens("image.jpg", conf=0.25)
print(result.boxes.xyxy)
print(result.names)  # the unified (union) class map
print(result.speed)  # per-member timings plus fusion
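
For readers unfamiliar with Weighted Boxes Fusion, the following is a deliberately simplified sketch of the idea combined with a `min_votes`-style filter. Same-class boxes that overlap are clustered, each cluster is fused into a confidence-weighted average box, and clusters backed by fewer than `min_votes` distinct models are dropped. It is not LibreEnsemble's implementation: the 0.55 IoU threshold, the tuple layout, and the function names are assumptions. It also matches against each cluster's highest-confidence box rather than the running fused box that full WBF uses.

```python
from typing import List, Sequence, Tuple

# (model_index, xyxy box, confidence, class_id): an illustrative layout.
Det = Tuple[int, Tuple[float, float, float, float], float, int]


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two xyxy boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(dets: List[Det], iou_thr: float = 0.55, min_votes: int = 2):
    """Cluster same-class overlapping boxes, then fuse clusters with enough votes."""
    clusters: List[List[Det]] = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        for cluster in clusters:
            ref = cluster[0]  # highest-confidence member of the cluster
            if ref[3] == det[3] and iou(ref[1], det[1]) >= iou_thr:
                cluster.append(det)
                break
        else:
            clusters.append([det])

    fused = []
    for cluster in clusters:
        if len({d[0] for d in cluster}) < min_votes:
            continue  # not enough distinct models agreed on this object
        total = sum(d[2] for d in cluster)
        box = tuple(sum(d[1][k] * d[2] for d in cluster) / total for k in range(4))
        fused.append((box, total / len(cluster), cluster[0][3]))
    return fused


# Hypothetical detections from two models; only the first object is seen by both.
detections: List[Det] = [
    (0, (10.0, 10.0, 50.0, 50.0), 0.9, 0),
    (1, (12.0, 12.0, 52.0, 52.0), 0.6, 0),
    (0, (100.0, 100.0, 150.0, 150.0), 0.8, 1),
]
fused_boxes = fuse(detections, min_votes=2)
```

Raising `min_votes` trades recall for precision: an object must be found by more members before it appears in the fused output.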