NLP for Sensemaking over Large Collections
I’ve published some of the largest gold-standard datasets for training and evaluating language models on scientific literature understanding tasks, including:
- SciTLDR for summarization,
- SciFact for claim verification,
- Qasper for question answering,
- MultiCite for citation discourse understanding.
Recently, I’ve been interested in helping humans re-find documents they’ve seen before, even when they don’t remember identifying details.