Image segmentation-based pathfinding

Project carried out during the second year of my Master's degree

The aim of this project was to determine a goal point in an image so that a robot could navigate autonomously while avoiding obstacles. A robot can already navigate autonomously in a known environment thanks to SLAM (Simultaneous Localization And Mapping), but SLAM requires a map of the environment built beforehand.

The objective here was to enable navigation without a pre-built map of the environment while still avoiding obstacles. The idea was to have the robot recognize the road through a camera and follow the highest road point in the image.

To recognize the road, I relied on a fully convolutional network for semantic segmentation, pre-trained by Nvidia and named SegNet. It was chosen so that the algorithm could work both on outdoor roads and in building corridors. Since only the road class is kept, anything lying on the road that is not classified as road (i.e. an obstacle) is automatically excluded from the traversable area.

This project was developed on an Nvidia Jetson Nano and could run at more than 60 FPS, as shown in the GIF. Once the goal was determined (the red dot in the GIF), the path to reach it could be computed.

First step

Recovering the road mask produced by SegNet.

Only the mask was recovered, to improve the performance of the algorithm.
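A minimal sketch of this mask extraction, assuming the segmentation output is a per-pixel class map and that `ROAD_CLASS_ID` is a hypothetical index for the road class (the real index depends on the dataset the model was trained on):

```python
import numpy as np

# Hypothetical class index for "road" in the model's label palette.
ROAD_CLASS_ID = 3

def extract_road_mask(class_map: np.ndarray) -> np.ndarray:
    """Return a binary mask: 1 where a pixel is labeled road, 0 elsewhere."""
    return (class_map == ROAD_CLASS_ID).astype(np.uint8)

# Toy 4x4 class map in which road pixels (value 3) form a diagonal band.
class_map = np.array([
    [0, 0, 3, 3],
    [0, 3, 3, 0],
    [3, 3, 0, 0],
    [3, 0, 0, 0],
])
mask = extract_road_mask(class_map)
```

Working on this single-channel mask rather than the full color overlay keeps the downstream processing cheap.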

Second step

Conversion of the image into a graph of nodes (here the nodes are cells, and the edges between cells are their adjacency).

Once the graph is obtained, a first Breadth-First Search is performed and used as a filter. This search was inspired by the classic "Number of Islands" problem; it keeps the connected region of the graph with the most nodes and removes the noise (represented here by cells detached from the main region, see the image of the third step).
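The filter described above can be sketched as follows, assuming the road region is encoded as a grid of 0/1 cells: a BFS flood fill in the "Number of Islands" style visits each connected region once, and everything outside the largest region is zeroed out.

```python
from collections import deque

def largest_component(grid):
    """Noise filter: keep only the largest 4-connected region of 1-cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                # BFS flood fill over one "island"
                comp, queue = set(), deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return [[1 if (r, c) in best else 0 for c in range(cols)]
            for r in range(rows)]
```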

Third step

Searching for the path on the road region with a Breadth-First Search from the bottom center of the image towards the objective.

The target was placed on the highest cell of the graph. Its x-coordinate was computed by averaging the x-positions of the highest cells of the graph.
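A minimal sketch of this step, again assuming a binary grid (1 = road): the target is placed on the highest road row at the mean x of its road cells, and a BFS from the bottom center returns a shortest 4-connected path over road cells.

```python
from collections import deque

def find_target(grid):
    """Place the target on the highest road row, at the mean x of that row."""
    for r, row in enumerate(grid):
        xs = [c for c, v in enumerate(row) if v == 1]
        if xs:
            return (r, sum(xs) // len(xs))
    return None

def bfs_path(grid, start, goal):
    """Shortest 4-connected path over road cells (value 1), by BFS."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:      # walk predecessors back to the start
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        y, x = cur
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (y + dy, x + dx)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 1 and nxt not in prev):
                prev[nxt] = cur
                queue.append(nxt)
    return None
```

Because every move between adjacent cells has the same cost, BFS is guaranteed to return a shortest path here.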

First idea to track the objective

Before implementing a breadth-first search, my idea was to determine the angle between the target in the image and the bottom center of the image, and then to make the robot follow this heading by adjusting its orientation with an IMU.
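This heading computation can be sketched as follows, assuming pixel coordinates given as (x, y) with y growing downward as in images (`heading_angle` is an illustrative helper, not code from the project):

```python
import math

def heading_angle(target, bottom_center):
    """Angle in degrees between the image's vertical axis and the line
    from the bottom center to the target (positive = target to the right).

    Points are (x, y) in pixels, with y growing downward as in images.
    """
    dx = target[0] - bottom_center[0]
    dy = bottom_center[1] - target[1]   # flip y so that "up" is positive
    return math.degrees(math.atan2(dx, dy))
```

For a 640x480 image, a target straight ahead of the bottom center gives 0 degrees, and the robot would steer to drive the angle back toward zero.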

Breadth-First Search


Choice of the pathfinding algorithm

The breadth-first search was chosen over the well-known A* because, after some tests, BFS proved more efficient in certain situations, as shown in the pictures above.

We measured 0.30 seconds for the breadth-first search versus 0.94 seconds for A*, because the heuristic had to be computed for a large number of cells.
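For comparison, a minimal A* on the same kind of unit-cost grid might look like this (Manhattan heuristic; an illustrative sketch, not the implementation timed in the project). Since BFS already returns a shortest path on an unweighted grid, the heuristic `h`, evaluated for every pushed cell, is pure overhead whenever it cannot prune much of the road region:

```python
import heapq

def astar(grid, start, goal):
    """A* with a Manhattan heuristic on a 4-connected, unit-cost grid."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    g = {start: 0}            # best known cost from the start to each cell
    prev = {start: None}
    heap = [(h(start), start)]
    while heap:
        _, cur = heapq.heappop(heap)
        if cur == goal:
            path = []
            while cur is not None:      # walk predecessors back to the start
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        y, x = cur
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (y + dy, x + dx)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 1
                    and g[cur] + 1 < g.get(nxt, float("inf"))):
                g[nxt] = g[cur] + 1
                prev[nxt] = cur
                heapq.heappush(heap, (g[nxt] + h(nxt), nxt))
    return None
```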



This project did not give very convincing results. The SegNet model was trained to recognize many classes and is not specifically designed to recognize a road precisely, which led to too much noise and inaccuracy. The focus should have been on either indoor or outdoor navigation. One possibility would have been to train a model specifically for outdoor roads, or to use a model trained on a dataset like Cityscapes.

However, this project suggests a way to add obstacle avoidance to a robot's trajectory. If a robot has a global map, it can move from a point A to a point B. This project could then let the robot compute a path in its local map (the part of the environment visible through its camera) and adapt its trajectory to avoid obstacles without straying too far from the trajectory dictated by the global map.

Skills developed during this project

The ROS and OpenCV libraries were used to retrieve, through a subscriber, the image segmentation produced by SegNet and published on a rostopic by the ros_deep_learning node.

This project was written in Python using the following libraries: OpenCV, NumPy, Pygame, collections (for pathfinding) and time.

Pygame was used to convert the image's pixels into a cell grid on which the pathfinding algorithm runs.
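Independently of the Pygame display, this pixel-to-cell conversion could be sketched as a simple block downsampling of the road mask (the cell size and the majority threshold below are illustrative choices, not the project's exact parameters):

```python
import numpy as np

def mask_to_grid(mask: np.ndarray, cell: int = 16) -> np.ndarray:
    """Downsample a binary road mask into a coarse cell grid.

    A cell becomes road (1) when more than half of its pixels are road.
    """
    h, w = mask.shape
    gh, gw = h // cell, w // cell
    # Split the mask into (cell x cell) blocks and average each block.
    blocks = mask[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell)
    return (blocks.mean(axis=(1, 3)) > 0.5).astype(np.uint8)
```

Coarsening the grid this way shrinks the graph the pathfinding has to explore, which matters on a board like the Jetson Nano.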