Open-domain question answering (ODQA) systems face diverse challenges, from resolving conflicting information to interpreting figurative expressions and representing meaning in a human-understandable form. This dissertation presents three complementary contributions toward building more robust and interpretable QA systems.
First, we investigate QA model performance on figurative language. We introduce FigurativeQA, a benchmark of yes/no questions with figurative and literal contexts, and show that popular BERT-based QA systems underperform significantly on figurative text. However, prompting-based approaches, such as ChatGPT with chain-of-thought reasoning, can mitigate this gap, particularly when figurative contexts are automatically simplified.
Second, we present ASQ, a novel tool for automatically generating question-answer meaning representations (QMR) from Abstract Meaning Representation (AMR) graphs. ASQ enables scalable and linguistically grounded QA dataset construction, bridging traditional formal semantics with natural language interfaces. We show that ASQ-generated questions exhibit high content fidelity and overlap with existing crowd-annotated resources like QAMR.
Finally, we explore how large language models (LLMs) handle conflicting evidence in ODQA, proposing a multi-agent framework in which answers generated by different models are evaluated through a verification step. Experiments on the QACC dataset with state-of-the-art LLMs (GPT-4o, Claude 4, DeepSeek-R1) show that model diversity enhances answer quality, although requiring explanations during verification does not always yield improvements.
Together, these contributions advance the interpretability, robustness, and accuracy of QA systems.
Event Host: Geetanjali Rakshit, Ph.D. Candidate, Computer Science & Engineering