Standards and Best Practices in NLP Evaluation

I design rigorous evaluation guidelines for NLP, including for few-shot learning and for long-form summarization; the latter work won an Outstanding Paper Award at EACL 2023 🏆.

I’ve organized community shared tasks to evaluate NLP systems for biomedical literature retrieval and understanding, including TREC-COVID and SCIVER.

I’ve also worked on developing standardized benchmarks that measure the domain fit and efficiency of language models.