Preface

Motivation

1. VLMs allow us to explore modality alignment

2. Multimodal understanding may be important for AGI

The arguments for

The arguments against

Vision Language Models