The application of AI in medicine and healthcare has advanced rapidly over the past decade, leveraging multiple data modalities to deliver expert-level diagnoses. However, the widespread adoption of medical AI in clinical workflows faces two significant obstacles: hallucinations, where models generate spurious or unfounded outputs, and social bias, which can lead to unfair or discriminatory results. Both issues are central to model reliability. In this talk, I will show why previously dominant methods such as model ensembling are suboptimal for emerging generative tasks, which demand a more nuanced understanding and integration of complex multimodal datasets. I will then discuss how current strategies and insights from reinforcement learning (in particular, preference optimization) and human-centered AI can mitigate hallucinations and bias in medical AI models. In addition, I will demonstrate how my proposed method helps bridge the gap in translational medical research. By examining these challenges and potential solutions, I aim to emphasize why continued research and carefully designed methodologies are essential for building truly trustworthy and equitable AI-driven healthcare tools.