Abstract Text Summarization Project Proposal



Abstractive text summarization is a sub-field of Natural Language Processing (NLP). Text summarization takes an input paragraph or sentence and condenses it into its key components or ideas. The field has changed since its conception: the initial focus on "expert systems" and classical machine learning has shifted to deep learning models. With this shift has come an opportunity for teams to build new, innovative models, and thus a literature gap has emerged.

The approach proposed by the CSS2 group for abstractive text summarization works by extracting the key ideas of a sentence and generating new sentences that represent them concisely.

This is done through two main steps: language representation and model implementation. Language representation explores methods of transforming natural language into a computer representation (i.e. tokenization and vectorization), whilst model implementation is the process of developing a model that can generate these sentences.
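The language representation step described above can be illustrated with a minimal sketch. This is a toy bag-of-words example for illustration only, not the group's actual pipeline, and the function names and corpus are hypothetical:

```python
# Toy sketch of tokenization and vectorization (bag-of-words).
# Illustrative only; real pipelines use subword tokenizers and embeddings.

def tokenize(text):
    """Split a sentence into lowercase word tokens."""
    return text.lower().split()

def build_vocab(corpus):
    """Map each unique token to an integer index."""
    vocab = {}
    for sentence in corpus:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Turn a sentence into a fixed-length count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1
    return vec

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(vectorize("the cat sat", vocab))  # prints [1, 1, 1, 0]
```

Modern systems replace the count vectors with learned embeddings (e.g. GloVe, Word2Vec, or transformer token embeddings), but the tokenize-then-vectorize structure is the same.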


Overall, the proposed approach to tackling this problem has been excellently designed with the assistance of the group's project supervisor.

With the shift from expert systems to deep learning models, text summarization has seen a huge influx of new work, leading to multiple research gaps in a highly important and applicable field of computer science. This is especially apparent because, whilst text summarization was long dominated by the same group of researchers across multiple papers, this changed significantly post-2017 with the rise of Google Brain and their proposed transformer-based framework for NLP. This, in addition to popular models such as GloVe, Word2Vec, and FastText being comparatively recent developments, has left multiple gaps open for further study.

The proposed study aim, "to design and develop a model capable of outperforming current leaders in text summarization", has also been thoroughly thought out, being small in scope, measurable, non-vague, and attainable within the time allocated for this studio. This, paired with thoroughly explored research questions and objectives determining how the group can quantitatively evaluate their NLP text summarization results through ROUGE/BLEU metrics, has also helped the project follow the scientific method.

This is further supported by the literature presented in their presentation. The group has included multiple citations from a range of individuals and organizations, with a balance of academic and industry-related research. Through this, the students have focused on their proposed research question and have highlighted, critiqued, and evaluated several papers relevant to their research scope. This has helped them identify what has gone well, the pitfalls of proposed models, and new solutions for NLP summarization. It has also assisted the students in selecting high-quality datasets for this research study: CNN/DailyMail, XSum, and Gigaword, which have been carefully chosen for their well-labeled and curated text summarization extracts to compare against.


Conversely, whilst the proposed study is highly detailed and well designed, it suffers from several pitfalls and fails to address several key questions.

Whilst the research gap has been thoroughly explained as arising from Google Brain's change to the approach of NLP text summarization models, and the proposed aim is to build a unique text summarization model with several supporting papers explaining how previous methods such as Masked Language Modelling (MLM) and Next Sentence Prediction (NSP) work, the group has failed to explain the implementation of their proposed model and how it differs from current NLP models. Whilst a technology stack has been specified (Python with TensorFlow, Keras, PyTorch, and Hugging Face), a proposed algorithm has not, and this should be explored prior to the submission of the next paper.
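For context on the cited methods, the data-preparation side of the MLM objective can be sketched in a few lines. This is a simplified toy illustration under assumed defaults (a 15% masking rate and a plain word-level `[MASK]` token), not the group's implementation or any specific library's API:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Toy MLM data prep: hide ~15% of tokens behind [MASK].
    Returns the masked sequence plus a map of position -> original token,
    which a model would be trained to predict from context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the model's prediction target at position i
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
```

Real implementations (e.g. BERT-style pretraining) add refinements such as subword tokenization and occasionally replacing masked positions with random tokens, but the core objective is as above.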

Additionally, beyond hyperparameter tuning, several approaches have been suggested for increasing the overall accuracy of the model; however, the metrics used to evaluate the models are purely quantitative, relying only on ROUGE/BLEU.

Papers have suggested that, due to the nuanced nature of NLP and summarization, whilst ROUGE/BLEU are good metrics for measuring the percentage of keywords matched, they do not evaluate readability or full context; a mixed approach combining quantitative and qualitative evaluation is therefore suggested, with the researchers also manually rating their results against the human-written summaries using a table-based rubric.
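The critique above can be made concrete: a ROUGE-1-style recall score counts only unigram overlap between candidate and reference, so word order, and therefore readability, is invisible to it. A minimal sketch (simplified, without stemming, stopword handling, or multi-reference support):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams matched in the candidate.
    Simplified sketch; real ROUGE also reports precision/F1 and n-gram variants."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

reference = "the cat sat on the mat"
fluent    = "the cat sat on the mat"
scrambled = "mat the on sat cat the"

print(rouge1_recall(fluent, reference))     # prints 1.0
print(rouge1_recall(scrambled, reference))  # also prints 1.0, despite being unreadable
```

The scrambled candidate scores a perfect 1.0 even though it is gibberish, which is precisely why a complementary qualitative evaluation is recommended.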

Furthermore, this qualitative evaluation should be incorporated into their methodology of incremental benchmarks and comparisons within the project.

Additionally, their project proposes a general NLP text summarization model as a differentiator from current models. Whilst this is a great idea in theory, implementing a general-purpose model is a non-trivial problem and may prove too difficult in both a hardware and a technical sense: highly trained models such as GPT-3 are trained on millions of dollars' worth of hardware and fed enormous amounts of data, so financial constraints make it highly likely that anything comparable would fail to materialize.

It is therefore suggested that the group swap from a generalized text summarization model to one focusing on a specific niche, such as academic writing or report writing, as each sub-field is highly nuanced, with its own jargon and sentence structures that cause issues in summarization.


Overall, the presentation was made to a high standard, going above and beyond the course requirements. It had a clearly chosen and specific research aim, with a highly detailed methodology supporting and clearly explaining the approach. This, combined with relevant and critical research questions and objectives, has further assisted the project by detailing its feasibility and scope in more depth.

The overall research gap is quite broad, and the suggested topic is of high importance in both industry and academic research settings; it also covers multiple new and important breakthroughs that have not yet been empirically evaluated, with most papers remaining purely algorithmic in approach.

Furthermore, with additional editing to narrow the scope, in addition to a focus on more methods of evaluating NLP summarization accuracy, this project has a high likelihood of success and is directly relevant to the field.