Search-based Negative Prompt Optimisation for Text-to-Image Generation

Abstract

Text-to-image generative models are machine learning models that take a natural-language description as input and generate images matching that description. Like other generative models, they tend to be imprecise for various reasons, such as hallucinations or randomness, and are influenced by the input description (a.k.a. the user's prompt). As a result, the generated images may not fully meet users' expectations. Prompt engineering (i.e., the process of structuring text so that it can be interpreted and understood by a generative model) poses a significant challenge, demanding considerable manual effort to ensure high-quality image generation. In this work, we explore the use of a local search guided by sentence similarity to optimise text-to-image generation via negative prompts. Our results suggest that this approach can improve the generation process, yielding more accurate images with no additional human effort.
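
To make the idea of a similarity-guided local search over negative prompts concrete, the following is a minimal illustrative sketch and not the implementation evaluated in this work. Everything in it is an assumption made for illustration: the hill-climbing variant, the toggle-one-term neighbourhood, the hypothetical `toy_describe` function (a stand-in for generating an image and describing it in text), and the token-overlap `jaccard_similarity` (a stand-in for a learned sentence-similarity model).

```python
from typing import Callable, FrozenSet, List, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in for sentence similarity: overlap of lowercase token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def toy_describe(prompt: str, negative: FrozenSet[str]) -> str:
    """Hypothetical stand-in for 'generate an image, then caption it'.

    This fake model leaks two unwanted concepts into the caption unless
    they are listed in the negative prompt.
    """
    unwanted = {"blurry", "watermark"}
    return " ".join([prompt] + sorted(unwanted - negative))


def local_search(
    prompt: str,
    vocabulary: List[str],
    describe: Callable[[str, FrozenSet[str]], str],
    similarity: Callable[[str, str], float] = jaccard_similarity,
    max_iters: int = 20,
) -> Tuple[FrozenSet[str], float]:
    """Hill climbing over sets of negative-prompt terms (assumed design).

    A neighbour differs from the current set by toggling one vocabulary term.
    The fitness is the similarity between the prompt and the description of
    the output produced with that negative prompt.
    """
    current: FrozenSet[str] = frozenset()
    best = similarity(prompt, describe(prompt, current))
    for _ in range(max_iters):
        # Score every neighbour reachable by adding or removing one term.
        scored = [
            (similarity(prompt, describe(prompt, current ^ {term})), current ^ {term})
            for term in vocabulary
        ]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # local optimum: no neighbour strictly improves the fitness
        current, best = top, top_score
    return current, best


negative_terms, score = local_search(
    "a red bicycle", ["blurry", "watermark", "cartoon"], toy_describe
)
print(sorted(negative_terms), score)  # ['blurry', 'watermark'] 1.0
```

In this toy setting the search keeps only the terms that raise the similarity and ignores "cartoon", which has no effect on the description. With a real generator and a learned similarity model, each fitness evaluation would be far more expensive and noisy, so the neighbourhood size and iteration budget would need to be chosen accordingly.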

Acknowledgements

  • UCL CS ethics committee for the user study (UCL-CSREC-205-R)
  • KNOwledge Discovery and Information Systems (KNODIS)
  • VARIATIVA: Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under Grant PID2021‑128695OB‑I00
  • Research Group T61_23R: Gobierno de Aragón (Spain)
  • Spanish Ministry of Science and Innovation under the Excellence Network AI4Software (Red2022-134647-T)