Standards and Best Practices in NLP Evaluation
I design rigorous evaluation guidelines for NLP, including for few-shot learning and for long-form summarization; the latter work won an outstanding paper award at EACL 2023 🏆.
I’ve organized community shared tasks to evaluate NLP systems for biomedical literature retrieval and understanding, including TREC-COVID and SCIVER.
I’ve also worked on developing standardized benchmarks for the domain fit and efficiency of language models.