Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

05/11/2023
by   Zhaoyang Zhang, et al.
0

We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks (all for one) and fully shared among multiple tasks (one for all), resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset