A Machine Learning and Point Cloud Processing based Approach for Object Detection and Pose Estimation: Design, Implementation, and Validation

Nylund, Simon; Bringager, Fredrik

Master thesis

URI

https://hdl.handle.net/11250/3020380

Date

2022

Collections

Master's theses in Information and Communication Technology [505]

Abstract

This thesis presents an approach to automating a forklift for lifting and handling pallets. More specifically, the project develops a solution for autonomous object detection and pose estimation using Machine Learning (ML), point cloud processing, and arithmetic calculations. The project is based on a real-life scenario identified together with the industrial partner Red Rock, in which a forklift is to identify, lift, and handle pallets autonomously. A key to achieving this automation is to localize and classify the pallet and to estimate its Six Dimensional (6D) pose, which comprises its (x, y, z) position and (pitch, roll, yaw) orientation. With the forklift positioned in front of the pallet, the pose estimation must work at a distance of about 2 meters and at angles from 0° to ±45°.

A systematic solution consisting of two major phases, object detection and pose estimation, is developed to achieve the project goal. For object detection, the You Only Look Once X (YOLOX)-S ML algorithm is selected and implemented. The algorithm is pre-trained on the COCO dataset and then fine-tuned through transfer learning on the Logistics Objects in Context (LOCO) dataset so that it can detect pallets in an industrial environment. To speed up inference, the algorithm is optimized with the Intel OpenVINO toolkit, which reduces inference latency on a Central Processing Unit (CPU) by a factor of more than 2.5. The output of the YOLOX-S algorithm is a bounding box around the pallet, and a custom struct links object detection and pose estimation together. The pose estimation algorithm converts the Two Dimensional (2D) bounding box data into Three Dimensional (3D) vectors, so that only the relevant points in the point cloud are kept and all irrelevant points in the environment are filtered out. A series of arithmetic calculations is then applied to the filtered point cloud, including Random Sample Consensus (RANSAC) and vector operations, of which the former finds the largest vertical plane of the identified pallet. Based on the object detection output and the pose estimation calculations, a 3D vector and a 3D point are found, which together give the pallet's pose.
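
The pose estimation phase described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It assumes an organized point cloud aligned pixel for pixel with the camera image, a camera frame with x to the right, y down, and z forward, and a bounding box given as (x_min, y_min, x_max, y_max) in pixels. The function names, thresholds, and angle sign conventions are hypothetical. Because a single plane normal does not determine roll, the sketch only returns a position, yaw, and pitch.

```python
import numpy as np


def crop_to_bbox(organized_cloud, bbox):
    """Keep only the 3D points whose pixels fall inside a 2D bounding box.

    organized_cloud: (H, W, 3) array, one XYZ point per pixel (assumed layout).
    bbox: (x_min, y_min, x_max, y_max) in pixel coordinates (assumed layout).
    """
    x_min, y_min, x_max, y_max = bbox
    points = organized_cloud[y_min:y_max, x_min:x_max].reshape(-1, 3)
    # Missing depth measurements are commonly NaN or inf; drop those rows.
    return points[np.isfinite(points).all(axis=1)]


def ransac_vertical_plane(points, threshold=0.01, iterations=300,
                          max_tilt_deg=30.0, up=(0.0, 1.0, 0.0), seed=0):
    """Find the roughly vertical plane supported by the most points.

    Returns (unit normal, centroid of the inliers, boolean inlier mask).
    """
    rng = np.random.default_rng(seed)
    up = np.asarray(up, dtype=float)
    max_up_component = np.sin(np.radians(max_tilt_deg))
    best_mask, best_count = None, 0
    for _ in range(iterations):
        a, b, c = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(b - a, c - a)
        length = np.linalg.norm(normal)
        if length < 1e-12:
            continue  # degenerate sample: the three points are collinear
        normal /= length
        if abs(normal @ up) > max_up_component:
            continue  # candidate plane is too far from vertical
        mask = np.abs((points - a) @ normal) < threshold
        count = int(mask.sum())
        if count > best_count:
            best_mask, best_count = mask, count
    if best_mask is None:
        raise ValueError("no vertical plane found")
    # Refine the winning candidate with a least-squares fit over its inliers.
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    normal = np.linalg.svd(inliers - centroid, full_matrices=False)[2][-1]
    return normal, centroid, best_mask


def pose_from_plane(normal, centroid):
    """Turn a plane (3D vector plus 3D point) into position, yaw, and pitch.

    Angles are in degrees; yaw is 0 when the plane faces the camera head-on.
    """
    # Flip the normal if needed so it points back toward the camera origin.
    n = -normal if normal @ centroid > 0 else normal
    yaw = np.degrees(np.arctan2(n[0], -n[2]))
    pitch = np.degrees(np.arctan2(n[1], np.hypot(n[0], n[2])))
    return centroid, yaw, pitch
```

A caller would chain the three steps: crop the cloud with the detector's bounding box, fit the plane to the remaining points, and read the pose from the fitted normal and centroid. The vertical constraint is one simple way to prefer the pallet's front face over the floor; the tolerance of 30° is an arbitrary illustrative value.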

Several tests and experiments have been performed to evaluate and validate the developed solution. The tests are based on a ground truth setup developed for the project, consisting of an AprilTag marker that provides a robust and precise ground truth measurement. Results from the standstill experiment show that the algorithm can estimate the position within 0.3 and 7.5 millimeters for the x and y axes, and within 1.6 and 28.6 millimeters for the z axis. The pitch orientation was kept within 3.65° and 5.21°, and the yaw orientation within 0.86° and 2.64°. Overall, the standstill test results cover the best and worst cases, respectively, for angles between 0° and 45°.

Publisher

University of Agder