ARSFineTune: On-the-Fly Tuning of Vision Models for Unmanned Ground Vehicles
Document Type
Conference Proceeding
Publication Date
1-1-2024
Abstract
The performance of semantic segmentation (SS) can degrade when the data distribution in the deployed environment differs from what the model learned during training. While domain adaptation (DA) and continual learning (CL) methods have been proposed to improve performance in new or unseen domains over time, the effort required to annotate large swathes of training data during deployment is non-trivial; acquiring new data for training incurs significant network and device memory costs as well as manual labeling effort. To address this, we propose ARSFineTune, a novel framework that actively selects the most informative regions of the imagery encountered by a mobile robot for the CL network to learn from, greatly reducing the annotation-related data transfer overhead. We first propose an efficient entropy-driven ranking mechanism to identify candidate regions and rank challenging images at the edge node. We then facilitate a cyclical feedback loop between the server and the edge, continuously refining semantic segmentation accuracy by fine-tuning the model with minimal data transferred to and from the field-deployed device. We implement ARSFineTune in a real-time setting using the Robot Operating System (ROS), where a Jackal unmanned ground vehicle (UGV) collaborates with a central server. Through extensive experiments, we find that ARSFineTune delivers competitive performance, closely aligning with existing state-of-the-art techniques, while requiring substantially less data for fine-tuning. Specifically, with only 5% of the labeled regions in the entire dataset (the 25% most challenging regions of the 20% most problematic samples) used for fine-tuning, ARSFineTune reaches a performance level nearly identical (≈ 97%) to the previous state-of-the-art model, achieving mIoU scores of 59.5% on the Cityscapes dataset and 41% on the CAD-EdgeTune dataset, which is challenging due to lighting conditions that vary over time. The reduction in annotation effort also contributes to a 23.5% improvement in network latency and 41% less memory usage during model inference on the UGV; a 79% reduction in data transfer time between the UGV and the annotation server; and, finally, a 16.59% reduction in latency, 28.57% less power usage, and 10% less memory usage on the server during model fine-tuning.
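For illustration, the sketch below shows the general idea behind an entropy-driven region-ranking step of the kind the abstract describes; it is not the authors' implementation, and the grid-based region split, the probability-array layout, and the selection fraction are illustrative assumptions.

    # Minimal sketch (assumptions noted above), not the paper's actual method.
    # Assumes a segmentation model exposing per-pixel class probabilities
    # of shape (num_classes, H, W); `probs`, `grid`, and `frac` are hypothetical names.
    import numpy as np

    def region_entropy_scores(probs: np.ndarray, grid: int = 4) -> np.ndarray:
        """Split the frame into a grid x grid set of regions and return the
        mean per-pixel predictive entropy of each region (higher = harder)."""
        eps = 1e-12
        entropy = -(probs * np.log(probs + eps)).sum(axis=0)  # (H, W)
        h, w = entropy.shape
        scores = np.zeros((grid, grid))
        for i in range(grid):
            for j in range(grid):
                patch = entropy[i * h // grid:(i + 1) * h // grid,
                                j * w // grid:(j + 1) * w // grid]
                scores[i, j] = patch.mean()
        return scores

    def select_challenging(scores: np.ndarray, frac: float = 0.25):
        """Return indices of the top-`frac` highest-entropy regions, i.e. the
        candidates an edge node might send to the server for annotation."""
        flat = scores.ravel()
        k = max(1, int(round(frac * flat.size)))
        top = np.argsort(flat)[::-1][:k]
        return [np.unravel_index(idx, scores.shape) for idx in top]

Under these assumptions, transmitting only the selected regions (rather than whole frames) is what keeps the edge-to-server annotation traffic small.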
Identifier
85202341903 (Scopus)
ISBN
9798350369441
Publication Title
Proceedings - 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
External Full Text Location
https://doi.org/10.1109/DCOSS-IoT61029.2024.00033
First Page
170
Last Page
178
Grant
#N00014-23-1-2119
Fund Ref
U.S. Army
Recommended Citation
Ahmed, Masud; Hasan, Zahid; Faridee, Abu Zaher Md; Anwar, Mohammad Saeid; Jayarajah, Kasthuri; Purushotham, Sanjay; You, Suya; and Roy, Nirmalya, "ARSFineTune: On-the-Fly Tuning of Vision Models for Unmanned Ground Vehicles" (2024). Faculty Publications. 912.
https://digitalcommons.njit.edu/fac_pubs/912