A Machine Learning and Point Cloud Processing based Approach for Object Detection and Pose Estimation: Design, Implementation, and Validation
Abstract
This thesis presents an automatic forklift approach for lifting and handling pallets. More specifically, the project develops a solution for autonomous object detection and pose estimation based on Machine Learning (ML), point cloud processing, and arithmetic calculations. The project is based on a real-life scenario identified together with the industrial partner Red Rock, involving a forklift operation in which the machine is supposed to identify, lift, and handle pallets autonomously. A key to achieving this automation is to localize and classify the pallet as well as to estimate the Six Dimensional (6D) pose of the pallet, which includes its (x, y, z) position and (pitch, roll, yaw) orientation. With the forklift positioned directly in front of the pallet, the pose estimation must be performed at a distance of approximately 2 meters and at angles from 0° to ±45°.
A systematic solution consisting of two major phases, object detection and pose estimation, is developed to achieve the project goal. For object detection, the You Only Look Once X (YOLOX)-S ML algorithm is selected and implemented. The algorithm is pre-trained on the COCO dataset and thereafter transfer-learned on the Logistics Objects in Context (LOCO) dataset to be able to detect pallets in an industrial environment. To improve the detection inference, the algorithm is optimized with the Intel OpenVINO toolkit, reducing inference latency by over 2.5 times on the Central Processing Unit (CPU). The output of the YOLOX-S algorithm is a bounding box around the pallet, and a custom struct links object detection and pose estimation together. The pose estimation algorithm converts the Two Dimensional (2D) bounding box data into Three Dimensional (3D) vectors, so that only the relevant points in the point cloud are kept while all irrelevant points from the environment are filtered out. A series of arithmetic calculations is then applied to the filtered point cloud, including Random Sample Consensus (RANSAC) and vector operations; the former extracts the largest vertical plane of the identified pallet. Based on the object detection output and the pose estimation calculations, a 3D vector and a 3D point defining the pallet's pose are found.
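The RANSAC plane-extraction step can be sketched as follows. This is a minimal illustration in Python/NumPy, not the thesis implementation; the point cloud, iteration count, and inlier threshold are assumed values chosen for the example.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.01, seed=None):
    """Fit the dominant plane in an (N, 3) point cloud with RANSAC.

    Returns (unit normal, a point on the plane, boolean inlier mask).
    n_iters and threshold are illustrative defaults, not thesis values."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_normal, best_point = None, None
    for _ in range(n_iters):
        # Sample 3 distinct points and form a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, skip it
            continue
        normal /= norm
        # Point-to-plane distance for every point in the cloud.
        dists = np.abs((points - p0) @ normal)
        inliers = dists < threshold
        # Keep the candidate supported by the most inliers.
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_normal, best_point = inliers, normal, p0
    return best_normal, best_point, best_inliers

# Synthetic example: a vertical plane at x = 2 m (a pallet face seen
# head-on) plus scattered clutter from the environment.
rng = np.random.default_rng(0)
face = np.column_stack([np.full(500, 2.0),
                        rng.uniform(-0.5, 0.5, 500),
                        rng.uniform(0.0, 1.0, 500)])
clutter = rng.uniform(-1.0, 3.0, (100, 3))
normal, point, mask = ransac_plane(np.vstack([face, clutter]), seed=1)
```

The returned normal and point correspond to the 3D vector and 3D point mentioned above: together they define the plane of the pallet face, from which the yaw and pitch angles can be derived with vector operations.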
Several tests and experiments have been performed to evaluate and validate the developed solution. The tests are based on a developed ground truth setup consisting of an AprilTag marker, which provides a robust and precise ground truth measurement. Results from the standstill experiment show that the algorithm estimates the position within 0.3 to 7.5 millimeters on the x and y axes, and within 1.6 to 28.6 millimeters on the z axis. The pitch orientation error was kept within 3.65° to 5.21°, while the yaw orientation error was kept within 0.86° to 2.64°. Overall, the standstill test results cover the best and worst cases at angles between 0° and 45°, respectively.