YOLOv11 Instance Segmentation with OpenCV and Java (Part 3)

In this article, we will look at instance segmentation using YOLOv11.
This is Part 3 of a 3-part series.
Instance segmentation goes a step further than object detection and involves identifying individual objects in an image and segmenting them from the rest of the image.
In the first part and second part, we set up our project, prepared the input image for the model, loaded our model, and fed the input data through the network. We also had a look at the post-processing and extracted the segmentation information from the results. Now, we will see how to overlay the segmentation masks over the original image.
In this part, we'll inspect the maskOutputs in detail and perform a matrix multiplication between the 32 mask coefficients we extracted in the previous step and the 32 prototype masks in maskOutputs.
Understanding the Mask Outputs
After we ran our input image through the network, we got two Mat output objects. The first one (boxOutputs) was the bounding box predictions, which we examined closely in the previous part. Here's the code as a reminder:
List<String> outNames = net.getUnconnectedOutLayersNames();
List<Mat> outputsList = new ArrayList<>();
net.forward(outputsList, outNames);
// Get relevant outputs and print them out
Mat boxOutputs = outputsList.get(0);
Mat maskOutputs = outputsList.get(1);
LOGGER.info("Boxes Output: "+boxOutputs.toString());
LOGGER.info("Masks Output: "+maskOutputs.toString());
The boxOutputs had a shape of [1, 116, 8400], which we transposed to [8400 x 116] to make it easier to work with. This meant that for each of the 8400 predictions, we had 116 values with the following structure:

- Bounding box predictions
- Class probabilities
- Mask coefficients (which we extracted as a last step in the previous part).
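As a quick sanity check of that structure, here is a toy sketch of the column offsets in one transposed prediction row. The constant names are mine, and NUM_CLASSES = 80 is an assumption (a model trained on the standard COCO classes); adjust it for a custom model:

```java
public class RowLayout {
    static final int NUM_CLASSES = 80;  // assumption: standard COCO classes
    static final int NUM_COEFFS = 32;   // one coefficient per prototype mask

    static final int BOX_END = 4;                        // columns 0..3: cx, cy, w, h
    static final int CLASS_END = BOX_END + NUM_CLASSES;  // columns 4..83: class scores
    static final int COEFF_END = CLASS_END + NUM_COEFFS; // columns 84..115: mask coefficients

    public static void main(String[] args) {
        // 4 + 80 + 32 = 116, matching the [1, 116, 8400] output shape
        System.out.println(COEFF_END); // 116
    }
}
```

This is also why the extraction code later uses colRange(4 + NUM_CLASSES, 4 + NUM_CLASSES + 32) to pick out the coefficients.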
Now let's concentrate on the masks output (maskOutputs):
INFO: Masks Output: Mat [ 1*32*160*160*CV_32FC1, isCont=true, isSubmat=false, nativeObj=0x600000b57d20, dataAddr=0x14abf8000 ]
We see that it has a shape of [1, 32, 160, 160]. Let's break it down:
The maskOutputs contains 32 prototype masks, each of size 160x160 pixels (the 1 is just the batch size, meaning we're processing one image).
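To make that shape concrete: in a continuous Mat, [1, 32, 160, 160] is one flat float buffer, one 160x160 plane per prototype mask. A toy sketch of how a (mask, y, x) triple maps to a flat offset (plain Java, no OpenCV, purely illustrative):

```java
public class MaskIndex {
    static final int MASKS = 32;
    static final int RES = 160;

    // Flat offset of pixel (y, x) in prototype mask m, for a
    // contiguous [1, 32, 160, 160] float buffer (batch size 1).
    static int flatIndex(int m, int y, int x) {
        return (m * RES + y) * RES + x;
    }

    public static void main(String[] args) {
        // Mask 0 starts at offset 0; mask 1 starts one full 160x160 plane later.
        System.out.println(flatIndex(0, 0, 0));     // 0
        System.out.println(flatIndex(1, 0, 0));     // 25600
        System.out.println(flatIndex(31, 159, 159)); // 819199 = 32*160*160 - 1
    }
}
```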
Now we get to the main part:
Instance segmentation works by taking each of these 32 masks and multiplying it by its corresponding mask coefficient from the boxOutputs.
For example, if the first mask coefficient is 0.3, we multiply every pixel in the first mask by 0.3. If the second coefficient is 0.9, we multiply every pixel in the second mask by 0.9, and so on. We do this for all 32 masks and then sum the results pixel by pixel to create one combined mask. This is called a linear combination, and it is what gives us the final segmentation mask.
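Here is a minimal sketch of that linear combination with toy 2x2 "masks" (plain Java arrays instead of OpenCV Mats, just to show the arithmetic):

```java
public class LinearCombination {
    // Weighted sum of prototype masks: combined[p] = sum_i coeffs[i] * protos[i][p]
    static float[] combine(float[][] protos, float[] coeffs) {
        float[] combined = new float[protos[0].length];
        for (int i = 0; i < protos.length; i++) {
            for (int p = 0; p < combined.length; p++) {
                combined[p] += coeffs[i] * protos[i][p];
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two toy 2x2 prototype masks (flattened) and their coefficients
        float[][] protos = {
            {1f, 0f, 1f, 0f},
            {0f, 1f, 1f, 0f}
        };
        float[] coeffs = {0.3f, 0.9f};
        // 0.3 * {1,0,1,0} + 0.9 * {0,1,1,0} ≈ {0.3, 0.9, 1.2, 0.0}
        System.out.println(java.util.Arrays.toString(combine(protos, coeffs)));
    }
}
```

Pixels where both masks "agree" on the object get the largest combined value, which is exactly the effect we want before applying the sigmoid.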
Let's have a look at the code snippet we inspected last time:
var segmentationMasks = new ArrayList<Mat>();
LOGGER.info("-----Start analysing the inference-----");
for (int i = 0; i < mat2D.rows(); i++) {
    Mat detectionMat = mat2D.row(i);
    List<Double> scores = new ArrayList<>();
    for (int j = 4; j < NUM_CLASSES + 4; j++) {
        scores.add(mat2D.get(i, j)[0]);
    }
    MaxScore maxScore = ScoreUtils.findMaxScore(scores);
    if (maxScore.maxValue() < 0.6) {
        continue;
    }
    // Extract mask coefficients
    Mat maskCoeffs = detectionMat.colRange(4 + NUM_CLASSES, 4 + NUM_CLASSES + 32);
    // Generate mask for this detection
    Mat objectMask = generateMask(maskOutputs, maskCoeffs);
    segmentationMasks.add(objectMask);
}
We'll have a closer look at one of the last lines there:
Mat objectMask = generateMask(maskOutputs, maskCoeffs);
Generating the Final Mask
We just talked about how we can use the maskOutputs and the maskCoefficients to generate the final mask, so let's have a look at the actual implementation of the generateMask method:
private static Mat generateMask(Mat prototypesMasks, Mat maskCoeffs) {
    Mat mask = new Mat(MASK_RESOLUTION, MASK_RESOLUTION, CvType.CV_32F, new Scalar(0));
    for (int i = 0; i < 32; i++) {
        float coeff = (float) maskCoeffs.get(0, i)[0];
        Mat prototypeMask = prototypesMasks.col(i).reshape(1, MASK_RESOLUTION);
        Core.addWeighted(mask, 1.0, prototypeMask, coeff, 0.0, mask);
    }
    // Apply sigmoid to get final mask (σ(x) = 1 / (1 + e^(-x)))
    Mat floatMask = new Mat();
    mask.convertTo(floatMask, CvType.CV_32F);
    Core.multiply(floatMask, new Scalar(-1.0), floatMask); // -x
    Core.exp(floatMask, floatMask);                        // e^(-x)
    Core.add(floatMask, new Scalar(1.0), floatMask);       // 1 + e^(-x)
    Core.divide(1.0, floatMask, floatMask);                // 1 / (1 + e^(-x))
    return floatMask;
}
MASK_RESOLUTION is 160, as we saw above; I've just added it as a constant.
Let's have a closer look at what happens here:
- We create a blank mask and iteratively add all 32 prototype masks, each weighted by its corresponding coefficient. We use OpenCV's addWeighted function to do the weighted addition: on every iteration, we add prototypeMask × coeff to the mask Mat, which in the end represents our final mask. (The fifth parameter, 0.0, is the gamma parameter, a scalar value that can be added to the result.)
- The combined mask can contain values outside the typical [0,1] range expected for masks. To address this, we apply the sigmoid activation function as a post-processing step:
σ(x) = 1/(1 + e^(-x))
This squashes all the pixel values into the range (0,1).
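The sigmoid itself is just one line of math. The article's code vectorizes it over the whole Mat with Core.multiply, Core.exp, Core.add, and Core.divide; here is the equivalent scalar version in plain Java, only for clarity:

```java
public class Sigmoid {
    // σ(x) = 1 / (1 + e^(-x)): maps any real value into (0, 1)
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0)); // 0.5
        // Large positive values mean "object", large negative mean "background"
        System.out.println(sigmoid(4.0));  // close to 1
        System.out.println(sigmoid(-4.0)); // close to 0
    }
}
```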
Now we have all the segmentation masks, one per detection, in the segmentationMasks list!
Overlaying the Mask on the Original Image
Let's simplify our logic a bit and assume we only have one object for segmentation in our image (like my plane example in the main article image).
All the masks we get are going to be very similar, so we can just take one of them and apply it to our image to see the result:
Mat objectMask = segmentationMasks.get(0);
Note again that this shortcut only works because we have a single object in the image.
Now let's overlay this mask on top of our image and visualize the instance segmentation.
What we need to do next is the following:
- Resize the mask to match the dimensions of the original image
- Convert the mask from floating-point values to the full 0–255 pixel range
- Apply a threshold to make all object pixels white; if we choose a threshold of 200, pixels ≥ 200 become 255 (white) and pixels < 200 become 0 (black)
- Apply a pink overlay to the original image where the mask is active
Here's the code:
Mat objectMask = segmentationMasks.get(0);
Mat result = image.clone();
Mat resizedMask = resizeMask(objectMask, image);
// Here we just get black and white (the object is in white)
Mat normalizedMask = normalizeMask(resizedMask);
// Use threshold to make all object pixels white
Imgproc.threshold(normalizedMask, normalizedMask, 200, 255, Imgproc.THRESH_BINARY);
// Create a solid pink colored image with the same size as our result image
Mat pinkLayer = new Mat(result.size(), result.type(), PINK);
// Copy pink pixels from pinkLayer to result only where normalizedMask is white (255)
pinkLayer.copyTo(result, normalizedMask);
Imgproc.resize(result, result, new Size(IMG_SIZE, IMG_SIZE), 0, 0, Imgproc.INTER_LINEAR);
HighGui.imshow("results", result);
HighGui.waitKey(0);
HighGui.destroyAllWindows();
System.exit(0);
IMG_SIZE is 640, the input size for this YOLO model (see Part 1). The PINK constant is new Scalar(203, 192, 255); keep in mind that OpenCV stores colors in BGR order, so this corresponds to pink (RGB 255, 192, 203).
The methods for resizing and normalization are pretty simple too:
private static Mat resizeMask(Mat mask, Mat originalImage) {
    Mat resized = new Mat();
    Imgproc.resize(mask, resized, originalImage.size(), 0, 0, Imgproc.INTER_LINEAR);
    LOGGER.info("Resized mask size: " + resized.size());
    return resized;
}

private static Mat normalizeMask(Mat mask) {
    Mat normalized = new Mat();
    Core.normalize(mask, normalized, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
    return normalized;
}
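Core.normalize with NORM_MINMAX performs a linear min-max rescale. Conceptually it works like this toy plain-Java version (not the OpenCV implementation, and assuming the mask values are not all identical):

```java
public class MinMaxNormalize {
    // Rescale values linearly so the minimum maps to 0 and the maximum to 255,
    // mirroring what Core.normalize(src, dst, 0, 255, Core.NORM_MINMAX, CV_8U) does
    static int[] normalize(float[] values) {
        float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
        for (float v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.round((values[i] - min) / (max - min) * 255f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy mask values spanning background to object
        float[] mask = {0f, 0.5f, 1f};
        System.out.println(java.util.Arrays.toString(normalize(mask)));
        // → [0, 128, 255]
    }
}
```

After this step, the 200/255 threshold in the overlay code cleanly separates object pixels from background.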
I hope you managed to follow along. If you run the program now, you'll see the result!

Conclusion
Hopefully this series helped you get a grasp of how to do instance segmentation in Java with the help of OpenCV and YOLOv11. The process is not very straightforward, but I hope we managed to tackle all the tricky parts in detail. If you have any questions, I'll be happy to help!
Java is becoming better and better when it comes to rapidly experimenting with object detection, instance segmentation, and integration with LLMs. This series demonstrates what is now possible in this ecosystem with the help of OpenCV.
Thanks for reading!