Generating Context-Aware Natural Answers for Questions in 3D Scenes

Mohammed Munzer Dwedari¹

Matthias Nießner¹

Dave Zhenyu Chen¹

¹Technical University of Munich

// Abstract

3D question answering is a young and still largely unexplored field in 3D vision-language research. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we recast question answering as a sequence generation task in order to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on language rewards to preserve the global sentence semantics. We also adapt a pragmatic language understanding reward to further improve sentence quality. Our method sets a new state of the art on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).

// Method Overview

The input scene and question are first encoded into object proposals and question embeddings, respectively. These are combined into one sequence and fed into a transformer encoder. The contextualized sequence is then fed into a transformer decoder, which generates the answer.
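
As a rough illustration of this data flow, the sketch below joins object proposal features and question embeddings into a single sequence and then runs a simple decoding loop. It is only a sketch under stated assumptions, not the authors' implementation: the names `build_joint_sequence` and `greedy_decode`, the segment ids, the `step_fn` callback that stands in for the transformer encoder and decoder, and the choice of greedy decoding are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

OBJECT_SEGMENT = 0    # hypothetical marker for object proposal tokens
QUESTION_SEGMENT = 1  # hypothetical marker for question tokens


def build_joint_sequence(
    object_proposals: Sequence[Vector],
    question_embeddings: Sequence[Vector],
) -> Tuple[List[Vector], List[int]]:
    """Concatenate both inputs into one sequence for the encoder."""
    tokens = list(object_proposals) + list(question_embeddings)
    if len({len(v) for v in tokens}) > 1:
        raise ValueError("all tokens must share one feature dimension")
    sequence = [list(v) for v in tokens]
    segments = (
        [OBJECT_SEGMENT] * len(object_proposals)
        + [QUESTION_SEGMENT] * len(question_embeddings)
    )
    return sequence, segments


def greedy_decode(
    step_fn: Callable[[List[Vector], List[int]], int],
    context: List[Vector],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Generate answer token ids one at a time until EOS or max_len.

    step_fn is a stand-in (assumption) for the decoder: it receives the
    contextualized sequence and the tokens so far, and returns the next id.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = step_fn(context, tokens)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS token


# Usage with toy inputs and a stub decoder that emits token 7 twice, then EOS.
sequence, segments = build_joint_sequence([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]])
answer_ids = greedy_decode(
    lambda ctx, toks: 7 if len(toks) < 3 else 2, sequence, bos_id=1, eos_id=2
)
```

In a real model the segment ids would typically be replaced or complemented by learned embeddings, and the stub would be an actual decoder forward pass.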

// Training

First, we train our model with a word-level cross-entropy loss and a cross-entropy object localization loss. Once the answer reconstruction loss converges, we switch to reinforcement learning, where we optimize our model directly on the CIDEr score. We use the REINFORCE algorithm to define the loss and estimate the gradient. In addition to the answer CIDEr score, we incorporate a pragmatic helper reward that encourages correct reconstruction of the question from the generated answer, which further improves answer quality.
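
The two training stages can be sketched numerically as follows. This is a simplified illustration rather than the authors' code: the function names, the `helper_weight` used to mix the two rewards, and the `baseline` term are assumptions not stated on this page, and the CIDEr and question reconstruction scores are taken as precomputed numbers instead of being computed here.

```python
import math
from typing import Sequence


def word_level_cross_entropy(
    token_probs: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> float:
    """Stage 1: mean negative log-likelihood of the ground-truth answer tokens."""
    if len(token_probs) != len(target_ids):
        raise ValueError("need one probability distribution per target token")
    nll = [-math.log(probs[t]) for probs, t in zip(token_probs, target_ids)]
    return sum(nll) / len(nll)


def combined_reward(
    answer_cider: float,
    question_reward: float,
    helper_weight: float = 0.5,  # assumed mixing weight, not from the source
) -> float:
    """Answer CIDEr plus a weighted question reconstruction (helper) reward."""
    return answer_cider + helper_weight * question_reward


def reinforce_loss(
    sampled_log_probs: Sequence[float],
    reward: float,
    baseline: float = 0.0,  # assumed variance-reduction term, not from the source
) -> float:
    """Stage 2: REINFORCE surrogate loss for one sampled answer.

    In an autodiff framework the reward is treated as a constant, so the
    gradient of this value is -(reward - baseline) * grad(log p(answer)).
    """
    return -(reward - baseline) * sum(sampled_log_probs)


# Usage: a sampled two-token answer with a CIDEr of 0.8 and helper reward of 0.4.
reward = combined_reward(answer_cider=0.8, question_reward=0.4)
loss = reinforce_loss([-1.0, -2.0], reward, baseline=0.5)
```

Minimizing this surrogate raises the log-probability of sampled answers whose reward exceeds the baseline and lowers it for those that fall below.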

// Citation

@inproceedings{Dwedari_2023_BMVC,
	author    = {Mohammed Munzer Dwedari and Matthias Niessner and Zhenyu Chen},
	title     = {Generating Context-Aware Natural Answers for Questions in 3D Scenes},
	booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
	publisher = {BMVA},
	year      = {2023},
	url       = {https://papers.bmvc2023.org/0596.pdf}
}

Please contact Mohammed Munzer Dwedari for any feedback or questions.