This article presents “image blending”, a Image augmentation method. It explains the homogeneous mixture of images for generate realistic training data for object detection. A concrete application is illustrated in the detection of starfish on images of the coral reef.
Machine learning models are becoming increasingly important in research and industry, due to impressive progress, especially in tasks that were previously reserved for human experts. To learn, models go through a training phase for which they generally need labelled data. A general rule is that the more data available, the more the model gains in performance since it will be able to capture more phenomena (patterns) during its learning. This observation is all the more true as the problem and the data are complex.
In the context of object recognition topics, it is not uncommon for the learning game to be small, or even non-existent. It is indeed expensive to manually label large volumes of images, especially when a certain expertise is required (in the context of medical imaging for example).
One of the most used methods to address this problem is Data Augmentation (DA), in other words, the artificial increase in the size of the dataset using image manipulation methods. This increase is typically achieved by performing operations that modify the appearance of the image, without changing its semantics: for example, by changing the brightness, by rotating or mirroring, by changing the scale, or by adding noise.
In this article, we'll look at a more advanced augmentation method: image blending. Image blending is a process of transferring an image from the source domain to the target domain while ensuring that the transformed pixels conform to the target domain to ensure consistency.
The idea is thus to increase the number of images available for training an object detection model, by directly adding the target object in different backgrounds. Labelled images are thus very easily obtained: knowing where the object has been introduced, you can create the corresponding bounding box (which serves as a label) at the same time.
More specifically, we will be interested in Gradient Domain Blending, which allows for a homogeneous mixture of images. The advantage is to generate images without borders between the background and the added object, so that they are realistic enough to be useful in training a detection model.
The principle, inspired by this work, consists in solving the Poisson equation associated with the image gradient over the zone, defined by a mask, where the mixture must take place. The mask makes it possible to better specify where the object of interest is located on the image, and to define a gradient to smooth the transition between the two images.
The algorithm thus takes four elements as input:
A new image, already labeled, containing the object of interest is obtained at the output. Thus, by repeating the process on a sufficiently large number of background images, we can create a new database ready to be used for training an object detection algorithm such as YOLO.
Opposite, an example of an object of interest and the corresponding mask: the white zone defines more or less the position of the object on the image.
The example of application that we are going to study comes from a dataset made available for a Kaggle competition proposing the detection of a starfish family (Crowns of Thorns Starfish, or COTs) on video images of the Australian Great Barrier Reef.
The available dataset contains 23,000 images, including about 5,000 including the objects to be detected. The idea is to use the 18,000 images of the seabed that do not contain useful information as a basis for this method of augmentation.
The first step consists in building a dataset, in our case of about fifty entries, of objects to be added to the funds in order to simulate a variety of cases. The bulk of the manual work is at this stage, to collect the images and define the corresponding masks. This dataset is then increased using conventional techniques (mirror, rotation, contrast, etc.) in order to obtain an object base that is as varied as possible.
But generating images by randomly positioning the object on a background is not enough: in order to obtain a model that can be used in real cases, the images generated must be plausible examples in reality. For example, let's say we're looking to develop an algorithm for detecting bikes on the road. If we use this method to augment an existing dataset by inserting bike images on top of background images containing a road, we want to insert the 'bike' objects on the road and not at the top of a tree at the edge for example.
Thus, the second step consists in analyzing the real images of the dataset in order to be inspired by them. We are particularly interested in the size of the objects to be detected, and in their spatial distribution on real images.
The distributions thus obtained will allow us to build images whose diversity is as close as possible to reality.
Finally, we move on to the generation of new images:
Below are 2 before-and-after examples of augmented images with the corresponding label, which a detection model could use as a training set.
Even after analyzing real images to understand their context, it is not impossible to generate outlier images. For example, in the example images above, objects to be detected could be inadvertently introduced into areas where it is impossible to have real objects (such as areas where there is only water). Without additional verification, introducing this unrealistic data into the training game could cause the model to learn bad patterns.
One solution is to start by training a first model on the non-augmented dataset. We can then, on a principle similar to that of GaNS, use this model to detect the objects present in our new images: if the model based on real data succeeds in identifying the new “false” objects, we can infer that they are realistic enough to be used as a training base.
In conclusion, data augmentation is an indispensable component in the process of improving predictive models. More specifically, image blending makes it possible to go further in areas where few labelled data are available. However, we should not fall into the trap of increasing without understanding the context of the data and the purpose of the machine learning model, in order not to introduce outlier data into our model.
Do you have an idea of use cases involving automatic image analysis but do you feel that your training data set is too small? Contact us to identify how increasing data could be useful.