Train Deep Learning models with Ikomia
Today, Deep Learning algorithms represent one of the main research fields in Computer Vision. Ikomia is a platform that aims to facilitate the evaluation and sharing of Computer Vision algorithms, so it must provide ready-to-use Deep Learning models in its marketplace. Such algorithms are often composed of two independent parts: training and inference (forward pass). That’s why you may find two Ikomia plugins for the same algorithm in the marketplace. Currently, inference-only plugins are the more numerous.
In this post, we describe how to train well-known Deep Learning models on your own data with Ikomia plugins. We focus on three application fields: classification, object detection and semantic segmentation. All plugins used in this tutorial are available in the public marketplace, and their source code is available in our GitHub repository.
Image classification is the task of identifying what an image represents. The training process therefore uses image datasets in which the class of each image is known. As a result, a classification algorithm does not need any additional annotations: the label is given by the folder that contains the image. By convention, the file structure of the dataset is fixed. We follow the PyTorch structure: a root folder with train and val sub-folders and, at the last folder level, as many sub-folders as there are classes.
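As a rough illustration of this convention, the expected layout is drawn in the comment below, and a small helper checks that a folder follows it. The class names (ants, bees), the function name `list_classes` and the rule that train and val must contain the same classes are assumptions made for this sketch. None of this is part of the Ikomia API.

```python
from pathlib import Path

# Expected layout (class names are hypothetical):
# dataset_root/
#   train/
#     ants/  bees/
#   val/
#     ants/  bees/


def list_classes(root):
    """Return {"train": [...], "val": [...]} with the class sub-folder names."""
    root = Path(root)
    layout = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing sub-folder: {split_dir}")
        # Each direct sub-folder of a split is one class.
        layout[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    # Assumption of this sketch: both splits must expose the same classes.
    if layout["train"] != layout["val"]:
        raise ValueError("train and val do not contain the same class folders")
    return layout
```

Running such a check before opening the folder in Ikomia can catch a misplaced or misspelled class folder early.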
Loading such a dataset in Ikomia is easy. Just use the ‘Open Folder’ function from the main menu. You can then view the images it contains directly.
First, select the Deep Learning model you want to train. If you have not installed such an algorithm yet, open the Ikomia marketplace and search for a classification model. Use the search bar with suitable keywords (e.g. classification, train) to find it easily. You can then pick your Deep Learning model from the Ikomia process library.
Second, each algorithm comes with its own set of hyper-parameters, which you have to set up correctly in the window that pops up when you select your algorithm. Parameters can also be modified in the information area of the Workflow Creator. For example, the ResNet parameters window looks like this:
After that, you should have a new workflow containing a single node. Training algorithms for classification require a single input of the Folder data type. So we now have to set the global input of our workflow (consult this post if you are not familiar with the Workflow Creator):
- add a new global input.
- select the Folder data type.
- select the root folder of the classification dataset in the project view.
Finally, connect the global input to your node and you are ready to launch your training.
At the time of writing, the Ikomia marketplace offers the following classification models:
- ResNet (18, 34, 50, 101 and 152 layers)
- ResNeXt (50 and 101 layers)
- MnasNet (for mobile devices)
Of course, this list will continue growing and contributions are welcome!
Object detection aims to find objects of certain target classes in a given image, localize them precisely and assign each instance a corresponding class label. Training such algorithms requires object annotations for each image of the dataset. Basically, the annotation for a given object consists of a bounding box (or polygon) associated with a class label. This tedious operation is carried out by humans or by other deep learning algorithms.
Unfortunately, there is no standard format to represent such data. Even though some reference datasets like COCO or PascalVOC have defined popular formats, you may have to deal with many others. That’s why Ikomia has defined its own annotation format (inspired by popular frameworks like TensorFlow and PyTorch). We (and the community) then provide specific plugins in the marketplace to load datasets from a third-party format into the Ikomia one. This system ensures that any training algorithm can be connected after any valid dataset reader.
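To see why a common format helps, consider bounding boxes alone: COCO stores a box as [x, y, width, height], while PascalVOC stores its corner coordinates (xmin, ymin, xmax, ymax). The sketch below converts both into one shared record. The record layout and the function names are hypothetical and chosen only for illustration. They are not the actual Ikomia annotation format.

```python
def from_coco(bbox, label):
    """Convert a COCO box [x, y, width, height] to the shared record."""
    x, y, w, h = bbox
    return {"label": label, "box": (x, y, x + w, y + h)}


def from_pascal_voc(xmin, ymin, xmax, ymax, label):
    """Convert PascalVOC corner coordinates to the shared record."""
    return {"label": label, "box": (xmin, ymin, xmax, ymax)}


# The same object described in both source formats yields the same record,
# so a training step only has to understand one representation.
coco_record = from_coco([10, 20, 30, 40], "cat")
voc_record = from_pascal_voc(10, 20, 40, 60, "cat")
```

A dataset reader plays the role of these converters: once the data is in the shared form, the training node does not need to know where it came from.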
At the time of writing, the Ikomia marketplace offers the following dataset readers:
- COCO 2017
- PascalVOC 2012
- VGG Image Annotator (VIA)
And of course, this list will continue growing and contributions are welcome!
Training an object detection algorithm requires a compatible dataset as input, so we need to install a suitable dataset reader from the marketplace (or develop one if none exists). Such a process node doesn’t need data to be loaded in Ikomia beforehand. In other words, a dataset reader is a self-input node that works without setting a global input for the workflow. After executing the node, you will see the dataset image gallery (this can take a while). You can visualize the annotations of an image by double-clicking on it (double-click again to go back to the gallery).
Then choose your Deep Learning model and connect it to the dataset reader. As for image classification, you need to set up the hyper-parameters and you are ready to go.
At the time of writing, the Ikomia marketplace offers the following object detection models (more to come…):
- Faster RCNN
- Tiny YOLO v3
- YOLO v3
- Tiny YOLO v4
- YOLO v4
- EfficientNet B0
Segmentation algorithms can be divided into three categories:
- instance segmentation: detect and segment (i.e. find the boundaries of) each object instance in the image
- semantic segmentation: assign a class label to each pixel of the image
- panoptic segmentation: combine both of the above
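The difference between the first two categories can be made concrete with a toy example. The sketch below merges per-object binary masks (the instance view) into a single per-pixel class map (the semantic view). The function name, the nested-list mask representation and the background value 0 are assumptions made for this illustration only.

```python
def semantic_from_instances(instances, height, width, background=0):
    """Merge (class_id, binary_mask) pairs into one per-pixel class map."""
    semantic = [[background] * width for _ in range(height)]
    for class_id, mask in instances:
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    # The pixel keeps its class; which object it belongs to is lost.
                    semantic[r][c] = class_id
    return semantic


# Two separate objects of class 1 and one object of class 2 on a 2x3 image.
instances = [
    (1, [[1, 0, 0], [0, 0, 0]]),
    (1, [[0, 0, 1], [0, 0, 0]]),
    (2, [[0, 0, 0], [0, 1, 0]]),
]
semantic_map = semantic_from_instances(instances, height=2, width=3)
```

In the resulting map, both objects of class 1 carry the same label, which is exactly the information a semantic mask keeps and an instance mask refines.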
In terms of datasets, we find the same data as in the object detection field, with segmentation masks (labelled images) provided in addition. Ikomia manages these datasets the same way it does for object detection: we provide plugins in the marketplace to load datasets in Ikomia format.
The training scheme is also the same as for object detection: create a workflow containing a dataset reader node followed by a training node for segmentation. Again, you will find a dedicated window to set up the hyper-parameters.
Depending on the Deep Learning model, you may have to consider different training modes:
- From scratch: the training process learns all weights from scratch. This mode requires a very large dataset to achieve good accuracy. It is also very time-consuming.
- Transfer learning: the training process starts from a pre-trained model (trained on a large dataset). It learns all weights of the network but takes much less time than training from scratch. It is often a good trade-off between training time and accuracy.
- Fine-tuning: like transfer learning, the training process starts from a pre-trained model, but it does not learn all weights of the network. Instead, it keeps the features from what one can call the backbone and learns only the weights of the final layer, from which we derive predictions. This mode is quite fast and may produce sufficient accuracy.
You will find these options in the parameter windows of the algorithms, among the other hyper-parameters. Basically, you have to choose whether to start from a pre-trained model and whether to keep the features from the backbone network (Feature extract mode).
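The way these two options combine into the three modes can be sketched in a few lines. This is a plain-Python illustration, not plugin code: the function name `plan_training`, the layer names and the choice to reject feature extraction without a pre-trained model are all assumptions of this sketch.

```python
def plan_training(layers, use_pretrained, feature_extract):
    """Return (mode, trainable_layers) for an ordered list of layer names.

    The last name in `layers` is treated as the final prediction layer.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    if not use_pretrained:
        if feature_extract:
            # Assumption: freezing a randomly initialised backbone is rejected.
            raise ValueError("feature extract mode needs a pre-trained model")
        return "from scratch", list(layers)
    if feature_extract:
        # Backbone weights are kept; only the final layer is learned.
        return "fine-tuning", [layers[-1]]
    # All weights are updated, starting from the pre-trained values.
    return "transfer learning", list(layers)


# Hypothetical layer names for a small classification network.
layers = ["conv1", "layer1", "fc"]
mode, trainable = plan_training(layers, use_pretrained=True, feature_extract=True)
```

With both options checked, only the final layer remains trainable, which is why this mode is the fastest of the three.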
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
Training plugins from Ikomia natively integrate MLflow. Thus, starting a training will automatically open the MLflow dashboard. It is a centralized view where experiments, runs, parameters and metrics are reported. That way, you can keep track of your model tests. Accuracy and loss curves are also available.
For now, we support only the logging capability of MLflow. However, we plan to provide a more complete integration in the near future.
Here is what the MLflow dashboard presents:
This tutorial is now finished and you are ready to train Deep Learning models on your own data. Note that the Ikomia marketplace is public and free, and all contributions are very welcome. Please visit our API documentation if you plan to publish your Deep Learning algorithm.
Thanks for reading.