Localization for Warehouse Autonomous Robots

Gaurav Gupta
March 5, 2023

Mobile robots are transforming the logistics industry. For a long time, tape-based navigation robots (Automated Guided Vehicles, AGVs) have borne the load (pun intended!) of transporting goods within factory floors. They are still responsible for a large share of material movement in heavy industries. While they primarily run on simple line following and are robust in operation, they come with some limitations.

Firstly, installing magnetic tapes is an investment-heavy process in terms of both capital and human resources. Once set up, it is difficult to keep the process flexible, since all the possible marker lanes have already been laid down. Further, one would still require additional markers to know the robot’s position along the length (or the curve) of a lane, and the programming can be very custom to each deployment. Lastly, this is a wear-and-tear-heavy setup that requires regular maintenance expenditure.

An AGV in a warehouse application (source)

Lately, however, the trend and growth are in ‘natural’ or marker-less navigation carried out by Autonomous Mobile Robots (AMRs). Robots using LiDARs and cameras typically create a map of the environment and use it to re-localize themselves as they move through that world. Barring the difference in complexity and the absence of GPS, this is very much in line with how self-driving cars navigate. While this is the promised future of robot navigation, the available technology is often not 100% reliable: localization may fail if the environment is too dynamic, and achieving robustness with the state-of-the-art open-source software (ROS/ROS2 packages) is not trivial and takes serious R&D effort. Further, it might add to the robot’s BoM (Bill of Materials) owing to expensive sensors and computing units. That said, this remains the best shot and is certainly the future of navigating robots in a human-collaborative workspace.

An AMR in an automated storage and retrieval system (ASRS) (source)

Ground-based QR navigation

QR tags and related markers such as ArUco are a popular way to localize robots. They can be mounted on walls, the ceiling, or even the floor! In the floor-mounted case, robots are equipped with downward-facing cameras that capture the marker data as the robot passes over each marker. While there is a range of markers from different universities and industrial vendors, the fundamental principle is the same: they contain patterns that are easy to recognize and that a marker-specific decoder can interpret. Decoding gives the user the unique identity of the marker and the pose of the camera relative to the marker. At installation time, the user records, once, the pose of each marker relative to a fixed frame (say, a map frame). Whenever the vision system reports a detection, these two pieces of information can be combined through homogeneous transformations to give the pose of the robot in that fixed frame.
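The frame composition described above can be sketched as follows. This is a minimal illustration that assumes planar (x, y, heading) poses and a known camera mounting offset on the robot; the function names and all numeric values are hypothetical and not taken from any particular marker library.

```python
import numpy as np


def se2(x, y, theta):
    """Homogeneous 3x3 transform for a planar pose (x, y in metres, theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])


def to_pose(T):
    """Recover (x, y, theta) from a homogeneous 3x3 transform."""
    return float(T[0, 2]), float(T[1, 2]), float(np.arctan2(T[1, 0], T[0, 0]))


def robot_pose_in_map(T_map_marker, T_marker_camera, T_robot_camera):
    """Chain map->marker, marker->camera and camera->robot to get map->robot."""
    return T_map_marker @ T_marker_camera @ np.linalg.inv(T_robot_camera)


# Hypothetical values for illustration only.
T_map_marker = se2(2.0, 3.0, np.pi / 2)   # surveyed once at installation
T_marker_camera = se2(0.10, 0.0, 0.0)     # camera pose reported by the marker decoder
T_robot_camera = se2(0.20, 0.0, 0.0)      # assumed camera mounting offset on the robot

x, y, theta = to_pose(robot_pose_in_map(T_map_marker, T_marker_camera, T_robot_camera))
```

A real deployment would typically work with full 3D (4x4) transforms and would need the camera-to-robot offset from calibration; the planar version is used here only to keep the chain of transforms easy to follow.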

Marker based navigation at a warehouse (source)

This marker-based localization method is relatively inexpensive to set up, as it just involves physical printouts of these patterns and some industrial-grade protection against wear and tear. Unfortunately, the markers are prone to damage if they share the floor with humans or other material-handling traffic. Another challenge is that the camera only generates a high-certainty pose intermittently, i.e. when a marker is within its field of view. The camera must also have a high enough capture rate if the robot moves at higher speeds.
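To get a feel for the capture-rate requirement, here is a back-of-the-envelope sketch. It assumes a downward-facing camera at a fixed height, a straight pass over the marker, and that a frame is only usable when the whole marker is in view; it ignores motion blur and exposure time. The function name and all parameter values are hypothetical.

```python
import math


def frames_per_pass(height_m, fov_deg, marker_size_m, speed_mps, fps):
    """Estimate how many frames see the whole marker during one straight pass over it."""
    # Length of floor visible along the direction of travel for a downward-facing camera.
    footprint_m = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)
    # Travel distance over which the marker lies entirely inside the image.
    usable_m = footprint_m - marker_size_m
    if usable_m <= 0.0 or speed_mps <= 0.0:
        return 0
    return math.floor(usable_m / speed_mps * fps)


# Camera 15 cm above the floor, 60 degree field of view, 5 cm marker, 30 fps.
slow = frames_per_pass(0.15, 60.0, 0.05, speed_mps=1.0, fps=30)
fast = frames_per_pass(0.15, 60.0, 0.05, speed_mps=2.0, fps=30)
```

With these assumed numbers, doubling the speed from 1 m/s to 2 m/s drops the usable frames per marker from three to one, which leaves little margin for a failed decode.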

So how do we localize if an accurate pose is only generated every half a second or so? Let’s find out!

EKF Localization: Sensor Fusion

In an earlier article on GPS localization, I talked about the use of an Extended Kalman Filter (EKF) for state estimation of an outdoor robot with noisy GPS, odometry, and IMU readings. The problem of indoor marker-based navigation is quite similar. On one hand, it is easier, in the sense that the pose estimate from a vision system is highly accurate and reliable compared to GPS readings; on the other hand, the demand for accuracy is much higher. An error of a few centimeters can set the robot off course from the next marker and result in a spiral of decreasing confidence in the state estimate, let alone failing to meet the demands of the application.

For this application, we can use wheel odometry which is prone to drift, pose from a vision system that is highly reliable but infrequently available, and control inputs that don’t necessarily translate into desired state change with high accuracy. The exact accuracy depends on the specific resolution and quality of encoders, motor driver, target speed, camera resolution, and many other factors.

The filter uses a process model that predicts the state of the robot at each time step based on the previous state and the control inputs (i.e. command velocity). The filter also incorporates measurements from wheel odometry and, whenever a marker is in view, from the downward-facing camera looking at the fiducial markers, to update the prediction and produce an improved estimate of the robot’s state. The EKF can handle non-linearity in the process and measurement models, making it well-suited for this application. By continuously fusing the measurements from the camera and the odometry, the filter can provide a robust and accurate estimate of the robot’s pose, even in the presence of noise and uncertainty.
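The predict/update cycle can be sketched roughly as below. This is not the filter used for the results in this article; it is a simplified illustration that assumes a planar state [x, y, theta], a unicycle motion model driven by the commanded velocity, and a marker measurement that reports the full pose directly (so the measurement Jacobian is the identity). Wheel odometry is left out as a separate measurement to keep the example short, and all noise values are made up.

```python
import numpy as np


def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi


class PoseEKF:
    """Minimal planar EKF with state [x, y, theta]."""

    def __init__(self, x0, P0, Q, R):
        self.x = np.asarray(x0, dtype=float)
        self.P = np.asarray(P0, dtype=float)
        self.Q = np.asarray(Q, dtype=float)  # process noise added per prediction step
        self.R = np.asarray(R, dtype=float)  # marker pose measurement noise

    def predict(self, v, w, dt):
        """Propagate the state with a unicycle model driven by commanded (v, w)."""
        x, y, th = self.x
        self.x = np.array([x + v * np.cos(th) * dt,
                           y + v * np.sin(th) * dt,
                           wrap(th + w * dt)])
        # Jacobian of the motion model with respect to the state.
        F = np.array([[1.0, 0.0, -v * np.sin(th) * dt],
                      [0.0, 1.0,  v * np.cos(th) * dt],
                      [0.0, 0.0, 1.0]])
        self.P = F @ self.P @ F.T + self.Q

    def update_marker(self, z):
        """Correct with a full pose measurement [x, y, theta], so H is the identity."""
        innovation = np.asarray(z, dtype=float) - self.x
        innovation[2] = wrap(innovation[2])
        S = self.P + self.R
        K = self.P @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.x[2] = wrap(self.x[2])
        self.P = (np.eye(3) - K) @ self.P


ekf = PoseEKF(x0=[0.0, 0.0, 0.0],
              P0=np.diag([1e-4, 1e-4, 1e-4]),
              Q=np.diag([1e-4, 1e-4, 1e-5]),
              R=np.diag([1e-4, 1e-4, 1e-4]))

for _ in range(50):                    # 0.5 s of dead reckoning at 100 Hz
    ekf.predict(v=1.0, w=0.0, dt=0.01)
trace_before = np.trace(ekf.P)

ekf.update_marker([0.52, 0.01, 0.0])   # hypothetical marker fix
trace_after = np.trace(ekf.P)
```

Between markers only `predict` runs, so the covariance grows with every step; a marker fix then pulls both the state and the covariance back, so `trace_after` is smaller than `trace_before`.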

Simulation and results

We used Unity3D to set up a differential drive robot (read more about our simulation setup here) and modeled some drift and Gaussian noise to emulate real-world odometry. We also emulated ground markers to generate an accurate robot pose: whenever the robot comes within a neighborhood of a marker, we publish the true robot pose with some added Gaussian noise. This is a good representation of how vision systems generate pose.
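The simulation itself runs in Unity3D, but the two noise models can be illustrated outside it. The sketch below is a stand-alone approximation, not the simulation code: it assumes odometry drift can be modeled as a constant scale error plus Gaussian noise per step, and that a marker fix is the true position plus Gaussian noise whenever the robot is within a fixed radius of a marker. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def noisy_odometry_step(true_step_m, scale_drift=0.02, sigma_m=0.0005):
    """Corrupt a true travelled distance with a systematic scale error plus Gaussian noise."""
    return true_step_m * (1.0 + scale_drift) + rng.normal(0.0, sigma_m)


def marker_fix(true_xy, markers_xy, radius_m=0.05, sigma_m=0.003):
    """Return a noisy position fix if the robot is within radius_m of any marker, else None."""
    true_xy = np.asarray(true_xy, dtype=float)
    distances = np.linalg.norm(np.asarray(markers_xy, dtype=float) - true_xy, axis=1)
    if np.min(distances) > radius_m:
        return None
    return true_xy + rng.normal(0.0, sigma_m, size=2)


markers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # hypothetical 1 m marker spacing
true_x, odom_x, fixes = 0.0, 0.0, 0
for _ in range(200):                 # drive 2 m along x in 1 cm steps
    true_x += 0.01
    odom_x += noisy_odometry_step(0.01)
    if marker_fix((true_x, 0.0), markers) is not None:
        fixes += 1
```

Over the 2 m run the odometry estimate `odom_x` runs ahead of `true_x` because of the scale error, while fixes are only produced in the short stretches near each marker.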

The video above shows a fleet of robots in a goods-to-person application. Robots use ground markers to navigate and deliver goods to an operator, who can then make a dispatch cart out of them. In the plots below, we show the output from a similar run and follow it up with some technical analysis.

EKF localization with ground markers
x-y covariance from EKF output

As is evident, the EKF output (green) stays close to the ground truth (yellow), while the wheel encoder estimate (blue) drifts over time. The covariance plot further demonstrates that the robot’s confidence in its pose reduces over time until it encounters another marker. Overall, the robot does a good job of localizing itself in the given environment (the filtered plot superimposes on the ground truth), but as noted earlier, the specific accuracy would depend on the robot control, encoder accuracy, distance between successive markers, speed of motion, and many other factors. Requirements are often sub-centimeter for warehouse applications, as the robot is required to dock precisely to a conveyor or lift trolleys, so some fine-tuning and the possible inclusion of an IMU could be helpful.

While this read covered robot localization using ground-based markers, it can be easily extended to wall- or ceiling-based markers as well. The robot would in general spot multiple markers at once, but the accuracy of the pose from each marker will be lower because of the noise due to the larger distance between the camera and the marker.

If you are looking to leverage autonomous robots for your warehouse automation use cases, we can fast-forward your software development cycle and help your robots move faster, localize better, and deploy swiftly. Get in touch!

Copyright © 2023 Black Coffee Robotics