Abstract and 1 Introduction

  2. Related work

    2.1. Generative Data Augmentation

    2.2. Active Learning and Data Analysis

  3. Preliminary

  4. Our method

    4.1. Estimation of Contribution in the Ideal Scenario

    4.2. Batched Streaming Generative Active Learning

  5. Experiments and 5.1. Offline Setting

    5.2. Online Setting

  6. Conclusion, Broader Impact, and References

A. Implementation Details

B. More ablations

C. Discussion

D. Visualization

6. Conclusion

In this paper, we introduce a new problem: how to effectively screen and utilize generated data so as to further improve the performance of downstream perception tasks. To address it, we propose a gradient-based method for estimating the contribution of generated data and embed it into the actual training process. Building on this estimator, we design a complete pipeline that automatically generates data to improve downstream perception performance. Experiments show that our method outperforms both unfiltered and CLIP-filtered baselines on long-tailed segmentation tasks.
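To make the estimation idea more concrete, the sketch below shows one standard way to score a generated sample with gradients: the inner product between the sample's training gradient and the gradient of a loss on target (validation) data, in the spirit of TracIn (Pruthi et al., 2020). The function names, the single-checkpoint approximation, and the toy linear model are illustrative assumptions, not the exact estimator or training pipeline used in the paper.

```python
import torch
import torch.nn as nn

def flat_grad(loss, params):
    # Gradient of a scalar loss w.r.t. `params`, flattened into one vector.
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def contribution_score(model, loss_fn, gen_x, gen_y, val_x, val_y):
    """First-order contribution of one generated sample: the dot product
    between its training gradient and the gradient of the target-task
    (validation) loss at the current parameters. A positive score means a
    gradient step on this sample is expected to reduce the target loss."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_gen = flat_grad(loss_fn(model(gen_x), gen_y), params)
    g_val = flat_grad(loss_fn(model(val_x), val_y), params)
    return torch.dot(g_gen, g_val).item()

# Toy usage: a linear classifier on random features stands in for the
# downstream perception model; in practice, candidates would be scored
# in batches during training and only the highest-scoring ones kept.
model = nn.Linear(16, 4)
loss_fn = nn.CrossEntropyLoss()
gen_x, gen_y = torch.randn(1, 16), torch.tensor([2])
val_x, val_y = torch.randn(8, 16), torch.randint(0, 4, (8,))
print(contribution_score(model, loss_fn, gen_x, gen_y, val_x, val_y))
```

In a streaming setting, scores of this kind can be recomputed at successive checkpoints, so that low- or negative-scoring generated samples are discarded before they enter the training set.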

Broader Impact

Our goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Jordan Ash, Surbhi Goel, Akshay Krishnamurthy, and Sham Kakade. Gone fishing: Neural active learning with fisher embeddings. In Adv. Neural Inform. Process. Syst., pages 8927–8939, 2021.

Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. In Int. Conf. Learn. Represent., 2020.

Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, and David J. Fleet. Synthetic data from diffusion models improves imagenet classification. Transactions on Machine Learning Research, 2023.

Wenbin Cai, Ya Zhang, and Jun Zhou. Maximizing expected model change for active learning in regression. In IEEE Int. Conf. Data Mining, pages 51–60. IEEE, 2013.

Arantxa Casanova, Pedro O. Pinheiro, Negar Rostamzadeh, and Christopher J. Pal. Reinforced active learning for image segmentation. In Int. Conf. Learn. Represent., 2020.

Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, and Dit-Yan Yeung. Integrating geometric control into text-to-image diffusion models for high-quality detection data generation via text prompt. arXiv preprint arXiv:2306.04607, 2023.

Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, and Chunhua Shen. Divergen: Improving instance segmentation by learning wider data distribution with more diverse generative data. arXiv preprint arXiv:2405.10185, 2024.

Vitaly Feldman and Chiyuan Zhang. What neural networks memorize and why: Discovering the long tail via influence estimation. Adv. Neural Inform. Process. Syst., 33:2881–2891, 2020.

Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, and Wangmeng Zuo. Diverse data augmentation with diffusions for effective test-time prompt tuning. In Int. Conf. Comput. Vis., pages 2704–2714, 2023.

Yonatan Geifman and Ran El-Yaniv. Deep active learning over the long tail. arXiv preprint arXiv:1711.00941, 2017.

Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 2918–2928, 2021.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.

Mohamed Goudjil, Mouloud Koudil, Mouldi Bedda, and Noureddine Ghoggali. A novel active learning method using svm for text classification. Int. J. Autom. Comput., 15:290–298, 2018.

Agrim Gupta, Piotr Dollar, and Ross Girshick. Lvis: A dataset for large vocabulary instance segmentation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5356–5364, 2019.

Zayd Hammoudeh and Daniel Lowd. Training data influence analysis and estimation: A survey. arXiv preprint arXiv:2212.04612, 2022.

Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, and Bohan Zhuang. Sensitivity-aware visual parameter-efficient fine-tuning. In Int. Conf. Comput. Vis., pages 11825–11835, 2023.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conf. Comput. Vis. Pattern Recog., pages 770–778, 2016.

Suyog Dutt Jain and Kristen Grauman. Active image segmentation propagation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 2864–2873, 2016.

Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, and Dawn Song. Scalability vs. utility: Do we have to sacrifice one for the other in data importance quantification? In IEEE Conf. Comput. Vis. Pattern Recog., pages 8239–8247, 2021.

Ajay J Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. Multi-class active learning for image classification. In IEEE Conf. Comput. Vis. Pattern Recog., pages 2372–2379. IEEE, 2009.

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick. Segment anything. In Int. Conf. Comput. Vis., pages 4015–4026, 2023.

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proc. Int. Conf. Mach. Learn., pages 1885–1894. PMLR, 2017.

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

David D Lewis. A sequential algorithm for training text classifiers: Corrigendum and additional data. In ACM SIGIR Forum, pages 13–19. ACM New York, NY, USA, 1995.

David D Lewis and Jason Catlett. Heterogeneous uncertainty sampling for supervised learning. In Machine learning proceedings 1994, pages 148–156. Elsevier, 1994.

Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Sanja Fidler, and Antonio Torralba. Bigdatasetgan: Synthesizing imagenet with pixel-wise annotations. In IEEE Conf. Comput. Vis. Pattern Recog., pages 21330–21340, 2022.

Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, and Jianfeng Gao. Semantic-sam: Segment and recognize anything at any granularity. arXiv preprint arXiv:2307.04767, 2023a.

Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. Open-vocabulary object segmentation with diffusion models. In Int. Conf. Comput. Vis., pages 7667–7676, 2023b.

Robert F Ling. Residuals and influence in regression, 1984.

Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, and Chunhua Shen. Matcher: Segment anything with one shot using all-purpose feature matching. arXiv preprint arXiv:2305.13310, 2023.

Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, and Conghui He. Influence selection for active learning. In Int. Conf. Comput. Vis., pages 9274–9283, 2021.

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, and Baining Guo. Swin transformer v2: Scaling up capacity and resolution. In IEEE Conf. Comput. Vis. Pattern Recog., 2022.

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

Timo Lüddecke and Alexander Ecker. Image segmentation using text and image prompts. In IEEE Conf. Comput. Vis. Pattern Recog., pages 7086–7096, 2022.

Wenjie Luo, Alex Schwing, and Raquel Urtasun. Latent structured active learning. Adv. Neural Inform. Process. Syst., 26, 2013.

Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, and Mauricio Reyes. Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network. In Int. Conf. Med. Image Comput. Comput.-Assist. Interv., pages 580–588. Springer, 2018.

Hieu T Nguyen and Arnold Smeulders. Active learning using pre-clustering. In Proc. Int. Conf. Mach. Learn., page 79, 2004.

Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In Adv. Neural Inform. Process. Syst., pages 19920–19930, 2020.

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R Zaiane, and Martin Jagersand. U2-net: Going deeper with nested u-structure for salient object detection. Pattern recognition, 106:107404, 2020.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In Proc. Int. Conf. Mach. Learn., pages 8748–8763. PMLR, 2021.

Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B Gupta, Xiaojiang Chen, and Xin Wang. A survey of deep active learning. ACM computing surveys (CSUR), 54(9):1–40, 2021.

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE Conf. Comput. Vis. Pattern Recog., pages 10684–10695, 2022.

Alvin E Roth. The Shapley value: essays in honor of Lloyd S. Shapley. Cambridge University Press, 1988.

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural Inform. Process. Syst., 35:36479–36494, 2022.

Akanksha Saran, Safoora Yousefi, Akshay Krishnamurthy, John Langford, and Jordan T. Ash. Streaming active learning with deep neural networks. In Proc. Int. Conf. Mach. Learn., pages 30005–30021. PMLR, 2023.

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In Int. Conf. Learn. Represent., 2018.

Alex Shonenkov, Misha Konstantinov, Daria Bakshandaeva, Christoph Schuhmann, Ksenia Ivanova, and Nadiia Klokova. Deepfloyd-if, 2023.

Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In IEEE Conf. Comput. Vis. Pattern Recog., pages 761–769, 2016.

Yukun Su, Jingliang Deng, Ruizhou Sun, Guosheng Lin, Hanjing Su, and Qingyao Wu. A unified transformer framework for group-based segmentation: Co-segmentation, co-saliency detection and video salient object detection. IEEE Transactions on Multimedia, 2023.

Yoad Tewel, Rinon Gal, Gal Chechik, and Yuval Atzmon. Key-locked rank one editing for text-to-image personalization. ACM SIGGRAPH 2023 Conference Proceedings, 2023.

Alexander Vezhnevets, Joachim M Buhmann, and Vittorio Ferrari. Active learning for semantic segmentation with expected change. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3162–3169. IEEE, 2012.

Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, and Du Tran. Open-world instance segmentation: Exploiting pseudo ground truth from learned pairwise affinity. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4422–4432, 2022.

Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, and Li Cheng. Segment anything is not always perfect: An investigation of sam on different real-world applications. Mach. Intell. Res., pages 1–14, 2024.

Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, and Chunhua Shen. Datasetdm: Synthesizing data with perception annotations using diffusion models. Adv. Neural Inform. Process. Syst., 2023a.

Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, and Chunhua Shen. Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models. In Int. Conf. Comput. Vis., pages 1206–1217, 2023b.

Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, and Chen Change Loy. Mosaicfusion: Diffusion models as data augmenters for large vocabulary instance segmentation. arXiv preprint arXiv:2309.13042, 2023.

Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, and Hengshuang Zhao. Freemask: Synthetic images with dense annotations make stronger segmentation models. In Adv. Neural Inform. Process. Syst., 2023.

Chih-Kuan Yeh, Joon Kim, Ian En-Hsu Yen, and Pradeep K Ravikumar. Representer point selection for explaining deep neural networks. Adv. Neural Inform. Process. Syst., 31, 2018.

Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, and Chen Change Loy. Open-vocabulary sam: Segment and recognize twenty-thousand classes interactively. arXiv preprint arXiv:2401.02955, 2024.

Yi Ke Yun and Weisi Lin. Selfreformer: Self-refined network with transformer for salient object detection. arXiv preprint arXiv:2205.11283, 2022.

Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Yu Qiao, Peng Gao, and Hongsheng Li. Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners. In IEEE Conf. Comput. Vis. Pattern Recog., pages 15211–15222, 2023.

Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, and Sanja Fidler. Datasetgan: Efficient labeled data factory with minimal human effort. In IEEE Conf. Comput. Vis. Pattern Recog., pages 10145–10155, 2021.

Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, and Nenghai Yu. X-paste: Revisiting scalable copy-paste for instance segmentation using clip and stablediffusion. In Proc. Int. Conf. Mach. Learn., 2023.

Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Probabilistic two-stage detection. arXiv preprint arXiv:2103.07461, 2021.

Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, and Chunhua Shen. Segprompt: Boosting open-world segmentation via category-level prompt learning. In Int. Conf. Comput. Vis., pages 999–1008, 2023.

Authors:

(1) Muzhi Zhu, Zhejiang University, China (equal contribution);

(2) Chengxiang Fan, Zhejiang University, China (equal contribution);

(3) Hao Chen, Zhejiang University, China ([email protected]);

(4) Yang Liu, Zhejiang University, China;

(5) Weian Mao, Zhejiang University, China and The University of Adelaide, Australia;

(6) Xiaogang Xu, Zhejiang University, China;

(7) Chunhua Shen, Zhejiang University, China ([email protected]).


This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.