Early Stage Researcher at CNRS
Juliette has a background in Electrical Engineering and Information Technology and a strong interest in AI and NLP. She is based at LORIA in Nancy, where she will focus on explaining text generation models. She enjoys dancing and running.
Lorraine Research Laboratory in Computer Science and its Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)
UMR 7503, Campus Scientifique, BP 239, F-54506
Vandoeuvre-lès-Nancy Cedex, France
Explainable Models for Text Production
· Main Supervisor: Dr. Claire Gardent, LORIA – Centre National de la Recherche Scientifique (CNRS), firstname.lastname@example.org
· PhD Co-Supervisor: Dr. Albert Gatt, Institute of Linguistics and Language Technology – Università ta’ Malta (UOM)
· Inter-sectoral Secondment Supervisor: Dr. Lina María Rojas Barahona, Learning and Natural Dialogue Teams – Orange
· Inter-sectoral Secondment Supervisor: Dr. Johannes Heinecke, Learning and Natural Dialogue Teams – Orange
PhD research topic
The broad goal of this PhD thesis will be to provide explainable models of text generation which permit identifying relevant mismatches between input and output. Two text production applications will be considered: verbalisation of Knowledge Bases (KB) and text summarization. While for both tasks the generated text should match the input, for summarization it should express only the key information contained in the input. Different questions therefore arise, both on how to analyse semantic adequacy and on how to explain the behaviour of a generation system.
(1) A model of text production which (i) provides a clear explanation of cases where text production fails to be semantically adequate and (ii) permits distinguishing cases of failure due to biases in the data from failure cases due to an inadequate model.
(2) An evaluation of this model on standard benchmarks for KB verbalisation and Text Summarization.
(3) An explanation model based on (i) breaking up the end-to-end decoding process into several explainable substeps and generating the output text from both the input and these intermediate predictions, and (ii) evaluation metrics that assess how accurate these intermediate predictions are and how well they correlate with success.
The challenge here will be to identify relevant substeps and evaluation criteria for adapting the method to text generation. We will decompose Natural Language Generation (NLG) into some or all of the traditional NLG modules, thereby allowing for a more fine-grained evaluation of how neural NLG systems handle the various choices that need to be made to produce a well-formed text.
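The decomposition described above can be illustrated with a minimal sketch. The example below is a hypothetical, rule-based stand-in for the traditional NLG modules (content selection, planning, surface realisation), not the thesis's actual neural model; its point is that exposing each intermediate prediction lets a mismatch between input and output be traced to a specific substep.

```python
# Hypothetical sketch: an NLG pipeline split into explainable substeps.
# The toy rules (truncation, sorting, template realisation) are
# illustrative assumptions only.

from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) KB triple


def select_content(triples: List[Triple], max_facts: int = 2) -> List[Triple]:
    """Substep 1: decide which input facts to express (here: truncate)."""
    return triples[:max_facts]


def plan(selected: List[Triple]) -> List[Triple]:
    """Substep 2: order the selected facts (here: sort by predicate)."""
    return sorted(selected, key=lambda t: t[1])


def realise(planned: List[Triple]) -> str:
    """Substep 3: map each planned fact to a sentence."""
    return " ".join(f"{s} {p.replace('_', ' ')} {o}." for s, p, o in planned)


def generate(triples: List[Triple]) -> Dict[str, object]:
    """Run the pipeline, keeping every intermediate prediction so that
    semantic inadequacy can be attributed to a specific substep."""
    selected = select_content(triples)
    planned = plan(selected)
    return {"selected": selected, "planned": planned, "text": realise(planned)}


kb = [
    ("Nancy", "located_in", "France"),
    ("Nancy", "has_population", "104000"),
    ("LORIA", "located_in", "Nancy"),
]
result = generate(kb)
# A fact in `kb` that is missing from result["selected"] points to a
# content-selection failure rather than a realisation failure.
```

In an end-to-end neural system these substeps are entangled inside the decoder; the thesis's proposal is to make analogous intermediate predictions explicit and to evaluate each one, rather than judging only the final text.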