CTFS: A consolidated transformer framework for instance and semantic segmentation tasks

Neural Netw. 2025 Jun 24:191:107745. doi: 10.1016/j.neunet.2025.107745. Online ahead of print.

Abstract

Instance segmentation and semantic segmentation are fundamental tasks that support many computer vision applications. Recently, researchers have investigated the feasibility of constructing a unified transformer framework and leveraging multi-task learning techniques to optimize instance and semantic segmentation simultaneously. However, these methods learn the proportion and distribution of task-shared parameters concurrently during training, which makes the network difficult to optimize sufficiently. In addition, conventional gradient rectification algorithms address gradient conflicts from an overall, whole-vector perspective, and therefore fail to resolve conflicts among the individual elements of the gradient vectors. In this study, we develop a consolidated Transformer framework, CTFS, to address these issues. For the first issue, we introduce an affinity-guided sharing strategy (AGSS) that learns the proportion and distribution of task-shared parameters in two separate stages: the proportion learned first serves as prior knowledge to guide the subsequent learning of the distribution, reducing the difficulty of network optimization. For the second issue, we propose a fine-grained gradient rectification strategy (FGRS) that mitigates gradient conflicts element by element during backpropagation. Built upon the standard Swin Transformer without complicating its architecture, CTFS achieves impressive performance on both the COCO dataset for instance segmentation and the ADE20K dataset for semantic segmentation.
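To make the contrast between whole-vector and element-wise rectification concrete, the sketch below shows one possible element-wise rule. The abstract does not specify the exact FGRS update, so the rule here (sum agreeing elements, keep only the larger-magnitude element where signs conflict) is an illustrative assumption, not the paper's method; the function name `elementwise_rectify` is likewise hypothetical.

```python
import numpy as np

def elementwise_rectify(g1, g2):
    """Illustrative element-wise gradient rectification for two task
    gradients g1 and g2 (hypothetical rule; CTFS's actual FGRS update
    is not given in the abstract).

    Where the two gradients agree in sign, their elements are summed;
    where they conflict, only the larger-magnitude element is kept,
    so no single element is cancelled outright by the other task.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    agree = g1 * g2 >= 0                                   # per-element sign agreement
    dominant = np.where(np.abs(g1) >= np.abs(g2), g1, g2)  # larger-magnitude element
    return np.where(agree, g1 + g2, dominant)

# Example: elements 2 and 3 conflict in sign and are resolved
# individually, unlike a whole-vector projection step.
print(elementwise_rectify([1.0, -2.0, 3.0], [2.0, 1.0, -1.0]))  # [ 3. -2.  3.]
```

A whole-vector method such as gradient projection would instead test a single inner product between g1 and g2 and rectify the entire vector at once, which is the coarse behavior the abstract argues against.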

Keywords: Gradient conflicts; Instance segmentation; Multi-task learning; Semantic segmentation.