dc.description.abstract | This thesis presents an automatic forklift approach for lifting and handling pallets. The
project specifically develops a solution for autonomous object detection and pose
estimation using Machine Learning (ML), point cloud processing, and arithmetic calculations.
The project is based on a real-life scenario identified together with the industrial partner
Red Rock, which includes a forklift operation, where the machine is supposed to identify,
lift, and handle pallets autonomously. A key to achieving this automation is to localize and
classify the pallet as well as to estimate the Six Dimensional (6D) pose of the pallet, which
includes its (x, y, z) position and (pitch, roll, yaw) orientation. When positioned directly
in front of the pallet, the system must perform pose estimation at a distance of around
2 meters and at angles from 0° to ±45°.
A systematic solution consisting of two major phases, object detection and pose estimation,
is developed to achieve the project goal. For object detection, the You Only Look Once X
(YOLOX)-S ML algorithm is selected and implemented. The algorithm is pre-trained on
the COCO dataset. It is then trained by transfer learning on the Logistics Objects in Context
(LOCO) dataset to be able to detect pallets in an industrial environment. To speed up
detection inference, the algorithm is optimized with the Intel OpenVINO toolkit, which
improves inference latency by a factor of more than 2.5 on a Central Processing Unit (CPU). The output
of the YOLOX-S algorithm is a bounding box around the pallet, and a custom struct links
object detection and pose estimation together. The pose estimation algorithm converts the
Two Dimensional (2D) bounding box data into Three Dimensional (3D) vectors, so that
only the relevant points in the point cloud are kept, while all irrelevant points from the
environment are filtered out. A series of arithmetic calculations is then applied to the
filtered point cloud, including Random Sample Consensus (RANSAC) and vector operations,
where the former finds the largest vertical plane of the identified pallet. Based on the
object detection output and the pose estimation calculations, a 3D vector and a 3D point
that together give the pallet’s pose are found.
Several tests and experiments have been performed to evaluate and validate the developed
solution. The tests are based on a developed ground truth setup consisting of an AprilTag
marker which provides a robust and precise ground truth measurement. Results from the
standstill experiment show that the algorithm can estimate the position within 0.3 and 7.5
millimeters for the x and y axes, and within 1.6 and 28.6 millimeters for the z-axis. The
pitch orientation was kept within 3.65° and 5.21°, while the yaw orientation was kept
within 0.86° and 2.64°. Overall, the standstill test results cover the best and worst
case, respectively, within 0° and 45°. | |
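The pose estimation steps described in the abstract (cropping the point cloud to the detected bounding box, fitting the largest vertical plane with RANSAC, and reading the orientation from the plane normal) could look roughly like the following minimal sketch. It is not the thesis implementation. The function names, the organized `(H, W, 3)` cloud layout, the camera frame (x right, y down, z forward), and all thresholds are assumptions made for illustration.

```python
import numpy as np


def crop_to_bbox(cloud, bbox):
    """Keep only the 3D points whose pixels fall inside a 2D bounding box.

    Assumes an organized cloud of shape (H, W, 3) aligned with the image and
    bbox = (u_min, v_min, u_max, v_max) in pixels (both are assumptions).
    """
    u_min, v_min, u_max, v_max = bbox
    points = cloud[v_min:v_max, u_min:u_max].reshape(-1, 3)
    return points[np.isfinite(points).all(axis=1)]  # drop invalid depth


def fit_plane_ransac(points, n_iters=200, threshold=0.01,
                     up=(0.0, -1.0, 0.0), max_tilt_deg=30.0, rng=None):
    """Return (unit normal, point on plane, inlier mask) of the largest
    roughly vertical plane, or (None, None, empty mask) if none is found."""
    rng = np.random.default_rng(rng)
    up = np.asarray(up, dtype=float)
    max_dot = np.sin(np.radians(max_tilt_deg))
    best_mask = np.zeros(len(points), dtype=bool)
    best_normal, best_point = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # degenerate (collinear) sample
            continue
        normal = normal / norm
        if abs(normal @ up) > max_dot:  # plane is not vertical enough
            continue
        mask = np.abs((points - sample[0]) @ normal) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_normal, best_point = mask, normal, sample[0]
    return best_normal, best_point, best_mask


def yaw_from_normal(normal):
    """Yaw in degrees of the plane about the vertical axis.

    Zero means the plane squarely faces the camera (assumed frame:
    x right, y down, z forward).
    """
    n = normal if normal[2] < 0 else -normal  # orient normal toward camera
    return float(np.degrees(np.arctan2(n[0], -n[2])))
```

In this sketch the 3D point of the pose could be taken as the centroid of the inlier points (`points[mask].mean(axis=0)`), and the 3D vector as the oriented plane normal. Whether the thesis derives them this way is not stated in the abstract.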