How to Train YOLOv9 for Custom Object Detection

Allan Kouidri - 3/5/2024
YOLOv9 object detection basketball

In this guide, we'll show the process of training a YOLOv9 model using a custom dataset. Specifically, we'll provide an example that focuses on training a vision model to recognize basketball players on a court. However, this guide is versatile, allowing you to apply it to any dataset of your choosing.

What is YOLOv9?

With the continuous evolution of computer vision technologies, YOLOv9 emerges as the latest advancement, developed by Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao.

This trio of researchers has a rich history in the field, having contributed to the development of preceding models such as YOLOv4, YOLOR, and YOLOv7. YOLOv9 not only continues the legacy of its predecessors but also introduces significant innovations that set new benchmarks in object detection capabilities.

YOLOv9 is an advanced object detection model that represents a significant leap forward in computer vision technology. It is the latest iteration in the "You Only Look Once" (YOLO) series, known for its high speed and accuracy in detecting objects in images. 

YOLOv9 stands out due to its incorporation of Programmable Gradient Information (PGI) and the introduction of the Generalized Efficient Layer Aggregation Network (GELAN), two groundbreaking innovations designed to enhance model performance and efficiency.

Accuracy and Performance

YOLOv9 performance comparison on MS COCO dataset
Comparison of real-time object detectors on the MS COCO dataset [1]

The YOLOv9 model is available in four variants, categorized based on their parameter count:

Model    | size (pixels) | AP (val) | AP50 (val) | AP75 (val) | Params (M) | FLOPs (G)
---------|---------------|----------|------------|------------|------------|----------
YOLOv9-S | 640           | 46.8%    | 63.4%      | 50.7%      | 7.1        | 26.3
YOLOv9-M | 640           | 51.4%    | 68.1%      | 56.1%      | 20.0       | 76.3
YOLOv9-C | 640           | 53.0%    | 70.2%      | 57.8%      | 25.3       | 102.1
YOLOv9-E | 640           | 55.6%    | 72.8%      | 60.6%      | 57.3       | 189.0

As of the latest update, the weights for the YOLOv9-S and YOLOv9-M models remain unpublished. The differentiation in model sizes caters to a range of application needs, from lightweight models for edge devices to more comprehensive models for high-performance computing environments.

In terms of performance, YOLOv9 sets a new standard in object detection. The smallest configuration, YOLOv9-S, achieves an impressive 46.8% AP (Average Precision) on the MS COCO validation set despite its compact size, while the largest variant, YOLOv9-E, reaches a remarkable 55.6% AP, establishing a new state-of-the-art benchmark for object detection performance.

This leap in accuracy demonstrates the effectiveness of YOLOv9's innovative optimization strategies.

Architecture and Innovations

The YOLOv9 architecture introduces a significant advancement in the field of object detection by incorporating Programmable Gradient Information (PGI) and a new network architecture called Generalized Efficient Layer Aggregation Network (GELAN). Here's an explanation of these key components:

Programmable Gradient Information (PGI)

PGI is a novel concept aimed at addressing the challenge of data loss within deep neural networks. In traditional architectures, as information passes through multiple layers, some of it gets lost, leading to less efficient learning and model performance. PGI allows for more precise control over the gradients during the training process, ensuring that critical information is preserved and utilized more effectively. This leads to improved learning outcomes and model accuracy.

YOLOv9 programmable gradient information (PGI)
[2]
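
To make the idea more concrete, here is a loose, minimal PyTorch sketch of training with an auxiliary supervision branch in the spirit of PGI. This is our own illustration, not the authors' implementation: the module names, layer shapes, and placeholder losses are all invented for the example. The auxiliary head injects a second gradient path into the shared backbone during training and is skipped entirely at inference.

import torch
import torch.nn as nn

class PGIStyleDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared backbone whose gradients we want to keep informative
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU())
        self.main_head = nn.Conv2d(32, 85, 1)  # kept at inference
        self.aux_head = nn.Conv2d(32, 85, 1)   # training-time-only auxiliary branch

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            # The auxiliary predictions add extra gradient signal to the backbone
            return self.main_head(feats), self.aux_head(feats)
        return self.main_head(feats)  # the auxiliary branch costs nothing at inference

model = PGIStyleDetector()
main_out, aux_out = model(torch.randn(1, 3, 640, 640))
# Placeholder losses: a real detector would use box/class/objectness losses
loss = main_out.pow(2).mean() + 0.25 * aux_out.pow(2).mean()
loss.backward()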

Generalized Efficient Layer Aggregation Network (GELAN)

GELAN represents a significant innovation within the YOLOv9 architecture. It is designed to enhance the model's performance and efficiency by optimizing how different layers in the network aggregate and process information. The key focus of GELAN is to maximize parameter utilization, ensuring that the model can achieve higher accuracy without a proportional increase in computational resources or model size.

YOLOv9 GELAN
[2]
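
As a rough illustration of the layer-aggregation idea (again a hand-rolled sketch, not YOLOv9's actual GELAN code), the block below splits its input CSP-style, runs one half through a stack of interchangeable computational blocks, and concatenates every intermediate output before a 1x1 transition convolution, so each parameter contributes to several feature paths:

import torch
import torch.nn as nn

class GELANStyleBlock(nn.Module):
    """CSP-style split, a stack of swappable computational blocks, and
    concatenation of all intermediate outputs before a transition conv."""
    def __init__(self, channels: int, num_blocks: int = 2):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        self.split = nn.Conv2d(channels, channels, 1)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.SiLU())
            for _ in range(num_blocks)
        ])
        # The transition fuses the two split halves plus one output per block
        self.transition = nn.Conv2d(half * (2 + num_blocks), channels, 1)

    def forward(self, x):
        x = self.split(x)
        a, b = x.chunk(2, dim=1)
        outs = [a, b]
        for block in self.blocks:
            b = block(b)
            outs.append(b)  # every intermediate result is aggregated
        return self.transition(torch.cat(outs, dim=1))

y = GELANStyleBlock(64)(torch.randn(1, 64, 80, 80))
print(y.shape)  # torch.Size([1, 64, 80, 80])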

This architecture allows YOLOv9 to tackle object detection tasks with greater precision and efficiency, setting a new benchmark in the performance of deep learning models for computer vision.

The combination of PGI and GELAN in YOLOv9 represents a holistic approach to improving the learning capabilities of neural networks, focusing not just on the depth or width of the model, but also on how effectively it can learn and retain information throughout the training process.

This leads to a model that is not only highly accurate but also efficient in terms of computational resources, making it suitable for a wide range of applications from edge devices to cloud-based systems.

Easily train YOLOv9 on a custom dataset

The Ikomia API allows you to train and run inference with a YOLOv9 object detector with minimal coding.

Setup

To begin, install the API in a virtual environment [3]. This setup ensures a clean start and avoids dependency conflicts.


pip install ikomia

Dataset

For this tutorial, we're using a Basketball dataset [4] from Roboflow with 539 images to illustrate the training of our custom YOLOv9 object detection model. The dataset contains nine labels:

  • Real Physical Objects: Player, Referee, Hoop, Ball
  • TV Screen Information: Team Name, Team Points, Time Remaining, Period, Shot Clock

These labels encompass both tangible objects on the basketball court and digitally overlaid information typically displayed on a TV screen, offering a comprehensive approach to object detection within the context of a basketball game.

Basketball dataset
[4]
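
If you want to pull the same dataset programmatically, the snippet below uses the roboflow pip package (install it with pip install roboflow); the workspace, project, and version identifiers come from the dataset URL [4], and the API key is your own. Downloading in "coco" format produces the _annotations.coco.json layout expected by the converter in the next section.

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # replace with your Roboflow API key
project = rf.workspace("roboflow-universe-projects").project("basketball-players-fy4c2")
dataset = project.version(12).download("coco")  # COCO-format export
print(dataset.location)  # local folder containing the train/valid/test splits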

Train YOLOv9 with a few lines of code

You can also open the notebook we have prepared and run it directly.


from ikomia.dataprocess.workflow import Workflow
import os


#----------------------------- Step 1 -----------------------------------#
# Create a workflow which will take your dataset as input and
# train a YOLOv9 model on it
#------------------------------------------------------------------------#
wf = Workflow()

#----------------------------- Step 2 -----------------------------------#
# First you need to convert the COCO format to IKOMIA format.
# Add an Ikomia dataset converter to your workflow.
#------------------------------------------------------------------------#

dataset = wf.add_task(name="dataset_coco")

dataset.set_parameters({
    "json_file":"Path/To/Dataset/train/_annotations.coco.json",
    "image_folder":"Path/To/Dataset/train",
    "task":"detection",
    "output_folder":os.getcwd()+"/dataset"
})

#----------------------------- Step 3 -----------------------------------#
# Then, you want to train a YOLOv9 model.
# Add YOLOv9 training algorithm to your workflow
#------------------------------------------------------------------------#

train = wf.add_task(name="train_yolo_v9", auto_connect=True)
train.set_parameters({
    "model_name":"yolov9-c",
    "epochs":"50",
    "batch_size":"8",
    "train_imgsz":"640",
    "test_imgsz":"640",
    "dataset_split_ratio":"0.8",
    "output_folder":os.getcwd(),
}) 

#----------------------------- Step 4 -----------------------------------#
# Execute your workflow.
# It automatically runs all your tasks sequentially.
#------------------------------------------------------------------------#
wf.run()

Here are the configurable parameters:

  • model_name (str) - default 'yolov9-c': Model architecture to train. Must be one of:
      - yolov9-s (coming soon)
      - yolov9-m (coming soon)
      - yolov9-c
      - yolov9-e

  • train_imgsz (int) - default '640': Input image size during training.
  • test_imgsz (int) - default '640': Input image size during evaluation.
  • epochs (int) - default '50': Number of complete passes through the training dataset.
  • batch_size (int) - default '8': Number of samples processed before the model is updated.
  • dataset_split_ratio (float) - default '0.9': Fraction of the dataset used for training, with the remainder used for evaluation; must lie strictly between 0 and 1.
  • output_folder (str, optional): Path where the trained model will be saved.
  • config_file (str, optional): Path to a .yaml hyperparameter configuration file.
  • dataset_folder (str, optional): Path where the re-formatted dataset will be saved.
  • model_weight_file (str, optional): Path to pretrained model weights, which can be used to fine-tune a model, as shown in the sketch below.
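
For example, here is a hypothetical follow-up run that fine-tunes from the weights produced by a previous training; the [Timestamp] placeholder stands for the run folder that the algorithm creates inside output_folder.

train = wf.add_task(name="train_yolo_v9", auto_connect=True)
train.set_parameters({
    "model_name": "yolov9-c",
    "model_weight_file": "Path/To/[Timestamp]/weights/best.pt",  # weights from an earlier run
    "epochs": "25",
    "batch_size": "8",
})
wf.run()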

The training process for 50 epochs was completed in approximately 50 minutes on an NVIDIA L4 24 GB GPU.

Once your model has completed its training phase, you can assess the performance by analyzing the graphs produced by the YOLOv9 training process. These visualizations represent various metrics that are crucial for understanding the effectiveness of your object detection model.

YOLOv9 training metrics

In summary, these plots suggest that the model has learned and improved its ability to detect and classify objects as training progressed. The high precision along with increasing recall and mAP values are indicative of a well-performing model. However, we can see that the model would have benefited from being trained for longer.

Run your fine-tuned YOLOv9 model

We can test our custom model using the 'infer_yolo_v9' algorithm. By default, the algorithm uses the COCO-pretrained yolov9-c model, but we can apply our fine-tuned model by setting the 'model_weight_file' and 'class_file' parameters accordingly.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Create your workflow for YOLO inference
wf = Workflow()

# Add the YOLOv9 algorithm to your workflow
yolov9 = wf.add_task(name="infer_yolo_v9", auto_connect=True)

yolov9.set_parameters({
    "model_weight_file":"Path/To/[Timestramp]/weights/best.pt",
    "class_file":"Path/To/[Timestramp]/classes.yaml",
    "conf_thres":"0.3",
    "iou_thres":"0.25"
})

# Run on your image
wf.run_on(url="https://pbs.twimg.com/ext_tw_video_thumb/1660454979298115585/pu/img/A_Jrl2uawkkDi_Kf.jpg")
# wf.run_on(path=os.getcwd()+"/test/youtube-128_jpg.rf.2723e31eec77e1ff7b73c45c625082f6.jpg")

# Get the object detection image output
img_bbox = yolov9.get_image_with_graphics()

# Display
display(img_bbox)

Inference with the fine-tuned YOLOv9 model on a basketball court

Our model successfully identified the players, referees, and hoop, as well as the team points, period, and time remaining.
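
If you need to keep the annotated output, you can also write it to disk. This is a small sketch assuming get_image_with_graphics() returns an RGB NumPy array; swap the channel conversion if your build returns BGR.

import cv2

# Convert from RGB (assumed) to BGR, which cv2.imwrite expects
cv2.imwrite("basketball_detection.jpg", cv2.cvtColor(img_bbox, cv2.COLOR_RGB2BGR))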

Build your own Computer Vision workflow

  • Consult the documentation for detailed API information.
  • Access the latest State-of-the-art algorithms via Ikomia HUB.
  • Use Ikomia STUDIO for an intuitive experience with these technologies.

References

[1] https://github.com/WongKinYiu/yolov9

[2] Wang, C.-Y., Yeh, I-H., Liao, H.-Y. M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616.

[3] How to create a virtual environment in Python

[4] https://universe.roboflow.com/roboflow-universe-projects/basketball-players-fy4c2/dataset/12
