CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation