A Machine Learning and Point Cloud Processing based Approach for Object Detection and Pose Estimation: Design, Implementation, and Validation
Abstract
This thesis presents an automatic forklift approach for lifting and handling pallets. More specifically, the project develops a solution for autonomous object detection and pose estimation using Machine Learning (ML), point cloud processing, and arithmetic calculations. The project is based on a real-life scenario identified together with the industrial partner Red Rock: a forklift operation in which the machine is supposed to identify, lift, and handle pallets autonomously. A key to achieving this automation is to localize and classify the pallet and to estimate its Six Dimensional (6D) pose, which includes its (x, y, z) position and (pitch, roll, yaw) orientation. With the forklift positioned directly in front of the pallet, the pose estimation must work at a distance of around 2 meters and at angles from 0° to ±45°.
A systematic solution consisting of two major phases, object detection and pose estimation, is developed to achieve the project goal. For object detection, the You Only Look Once X (YOLOX)-S ML algorithm is selected and implemented. The algorithm is pre-trained on the COCO dataset and then fine-tuned through transfer learning on the Logistics Objects in Context (LOCO) dataset so that it can detect pallets in an industrial environment. To speed up inference, the algorithm is optimized with the Intel OpenVINO toolkit, which improves inference latency by a factor of more than 2.5 on a Central Processing Unit (CPU). The output of the YOLOX-S algorithm is a bounding box around the pallet, and a custom struct links object detection and pose estimation together. The pose estimation algorithm converts the Two Dimensional (2D) bounding box data into Three Dimensional (3D) vectors, which are used to keep only the relevant points in the point cloud and to filter out all irrelevant points from the environment. A series of arithmetic calculations is then applied to the filtered point cloud, including Random Sample Consensus (RANSAC) and vector operations, where the former finds the largest vertical plane of the identified pallet. Based on the object detection output and the pose estimation calculations, a 3D vector and a 3D point that together give the pallet's pose are found.
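The pose estimation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a pinhole camera model with hypothetical intrinsics (fx, fy, cx, cy), a camera frame with x to the right, y down, and z forward, and a yaw sign convention chosen only for this example. The function names, the RANSAC threshold, and the verticality tolerance are likewise assumptions.

```python
import numpy as np


def points_in_bbox(points, bbox, fx, fy, cx, cy):
    """Keep the 3D points whose pinhole projection falls inside a 2D box.

    points: (N, 3) array in the camera frame (assumed: x right, y down, z forward).
    bbox:   (x_min, y_min, x_max, y_max) in pixels, e.g. a detector output.
    """
    x_min, y_min, x_max, y_max = bbox
    front = points[points[:, 2] > 0]          # drop points behind the camera
    u = fx * front[:, 0] / front[:, 2] + cx   # pixel column
    v = fy * front[:, 1] / front[:, 2] + cy   # pixel row
    keep = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return front[keep]


def ransac_vertical_plane(points, n_iters=200, threshold=0.01,
                          max_up_component=0.5, seed=0):
    """Find the vertical plane with the most inliers using RANSAC.

    A candidate plane counts as vertical when the y component of its unit
    normal is small (assumed tolerance: max_up_component).
    Returns (unit normal facing the camera, inlier centroid, inlier mask).
    """
    if len(points) < 3:
        raise ValueError("need at least 3 points to fit a plane")
    rng = np.random.default_rng(seed)
    best_mask, best_count = None, 0
    for _ in range(n_iters):
        a, b, c = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(b - a, c - a)
        length = np.linalg.norm(normal)
        if length < 1e-9:                     # degenerate (collinear) sample
            continue
        normal = normal / length
        if abs(normal[1]) > max_up_component:  # not vertical enough
            continue
        mask = np.abs((points - a) @ normal) < threshold
        count = int(mask.sum())
        if count > best_count:
            best_count, best_mask = count, mask
    if best_mask is None:
        raise ValueError("no vertical plane found")
    # Refine with a least-squares fit on the inliers: the right singular
    # vector with the smallest singular value is the plane normal.
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    normal = np.linalg.svd(inliers - centroid)[2][-1]
    if normal[2] > 0:                         # orient the normal toward the camera
        normal = -normal
    return normal, centroid, best_mask


def estimate_pallet_pose(points, bbox, fx, fy, cx, cy):
    """Return (3D point, 3D unit normal, yaw in degrees) for the detected face."""
    cropped = points_in_bbox(points, bbox, fx, fy, cx, cy)
    normal, centroid, _ = ransac_vertical_plane(cropped)
    # Assumed convention: yaw is 0 when the face looks straight at the camera.
    yaw_deg = float(np.degrees(np.arctan2(normal[0], -normal[2])))
    return centroid, normal, yaw_deg
```

In this sketch the 3D point is the centroid of the plane inliers and the 3D vector is the plane normal. A production pipeline would more likely rely on a point cloud library for the cropping and the plane segmentation, and would derive pitch and roll from the same normal and the pallet edges.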
Several tests and experiments have been performed to evaluate and validate the developed solution. The tests are based on a purpose-built ground truth setup consisting of an AprilTag marker, which provides a robust and precise ground truth measurement. Results from the standstill experiment show that the algorithm can estimate the position to within 0.3 and 7.5 millimeters for the x and y axes, and to within 1.6 and 28.6 millimeters for the z-axis. The pitch orientation was kept within 3.65° and 5.21°, while the yaw orientation was kept within 0.86° and 2.64°. The paired values are the best and worst cases, respectively, of the standstill tests performed between 0° and 45°.