A New Sentence Compression Dataset and Its Use in an Abstractive Generate-and-Rank Sentence Compressor

Galanis Dimitrios, Androutsopoulos Ion
In Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop, pages 1–11, Edinburgh, Scotland, 2011
Abstract. Sentence compression has attracted much interest in recent years, but most sentence compressors are extractive, i.e., they only delete words. There is a lack of appropriate datasets to train and evaluate abstractive sentence compressors, i.e., methods that apart from deleting words can also rephrase expressions. We present a new dataset that contains candidate extractive and abstractive compressions of source sentences. The candidate compressions are annotated with human judgements for grammaticality and meaning preservation. We discuss how the dataset was created, and how it can be used in generate-and-rank abstractive sentence compressors. We also report experimental results with a novel abstractive sentence compressor that uses the dataset.