I do research in Natural Language Processing (NLP), a field where we teach machines to understand and produce human language. My current research focuses on multimodal vision-language modeling, where linguistic and visual information complement each other, bringing models closer to human-level intelligence. At Hopkins, I worked with Professor Benjamin Van Durme on modeling causality, semantic understanding, and probing visual commonsense. I am also interested in dialogue agents, language grounding in robotics, and real-world NLP applications. I speak four languages myself, and I believe in the power of language technologies to bring people closer together and make everyday life easier.