CVPR 2024 | Dynamic Prompt Optimizing for Text-to-Image Generation

Contribution

  1. Dynamic fine-control prompt editing framework
  2. Effective results
      • improves image aesthetics
      • ensures semantic consistency between prompts and generated images
      • aligns more closely with human preferences
  3. Insightful findings
      • artist names and texture-related modifiers enhance the artistic quality of generated images
      • it is more effective to introduce these terms in the latter half of the diffusion process
      • assigning a lower weight to complex terms promotes more balanced image generation

Task

Input
  • a pre-trained text-to-image generative model
  • user input text
Output
  • a modified prompt with fine-grained control, such that the generated image exhibits enhanced visual effects while remaining faithful to the semantics of the initial prompt.
The modified prompt is obtained through an append operation: modifiers are appended to the user input text.

Method

Dynamic Fine-control Prompt, DF-Prompt

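Each modifier token is coupled with an effect range and a specific weight, resulting in a (token, effect range, weight) triple.
  • The weight scales the token embedding, controlling the overall influence of the token during generation.
  • The effect range is a normalized range that delineates the start and end steps within the iterative denoising process.
Define the DF-Prompt as the initial prompt together with the sequence of such triples.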
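To make the triple concrete, here is a minimal Python sketch of how a (token, effect range, weight) entry could be stored and queried during denoising. The names `ModifierTriple` and `active_modifiers`, the example modifiers, and the convention that progress is `step / total_steps` are all assumptions for illustration; the notes do not specify how the paper implements this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModifierTriple:
    """One DF-Prompt entry (hypothetical container, not the paper's code)."""
    token: str
    effect_range: Tuple[float, float]  # normalized (start, end) within [0, 1]
    weight: float                      # scales the token embedding


def active_modifiers(
    triples: List[ModifierTriple], step: int, total_steps: int
) -> List[Tuple[str, float]]:
    """Return (token, weight) pairs whose effect range covers this step."""
    # Assumed convention: progress is the fraction of denoising steps done.
    progress = step / total_steps
    return [
        (t.token, t.weight)
        for t in triples
        if t.effect_range[0] <= progress <= t.effect_range[1]
    ]


# Illustrative values only.
triples = [
    ModifierTriple("highly detailed", (0.0, 1.0), 1.0),
    ModifierTriple("oil painting", (0.5, 1.0), 0.75),
]
print(active_modifiers(triples, step=10, total_steps=50))
print(active_modifiers(triples, step=40, total_steps=50))
```

In this sketch the second modifier only takes effect in the latter half of the process and with a reduced weight, which mirrors the kind of control the triple is meant to express.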

Overview

[Figure: overview of the two-stage method]

Stage 1: Plain Prompt Refinement

Given
  • plain input prompt
Predict
  • suffix modifiers one by one, until the model outputs the stop sign.
  • i.e. construct the refined prompt by appending the predicted modifiers to the plain input prompt.

Data Selection (Construct Dataset)

  • Start with a prompt taken from publicly available prompt data.
  • The prompt is split at a division point (the first comma in the prompt).
  • The text before the division point is the short prompt.
  • The remaining tokens form the modifier set.
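The splitting rule above can be sketched in a few lines of Python. The function name `split_prompt` is hypothetical, and treating each comma-separated phrase after the division point as one modifier is an assumption; the notes only say that the remaining tokens form the modifier set.

```python
from typing import List, Tuple


def split_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Split a prompt at its first comma into (short prompt, modifiers)."""
    head, sep, tail = prompt.partition(",")
    short_prompt = head.strip()
    # Assumption: each comma-separated phrase after the split is one modifier.
    modifiers = [m.strip() for m in tail.split(",") if m.strip()] if sep else []
    return short_prompt, modifiers


short, mods = split_prompt("a cat on a sofa, oil painting, highly detailed")
print(short)
print(mods)
```

A prompt without a comma yields an empty modifier set in this sketch, so such prompts would contribute no training signal for modifier prediction.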
Define a confidence score, built from:
  • a relevance term that measures image-text relevance using a pre-trained CLIP model
  • an aesthetic term that returns the aesthetic score
  • a tolerance constant
Dataset

Fine-tuning

Train with the teacher forcing method.
The loss is computed on the next token.
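As a reminder of what teacher forcing with a next-token loss means, here is a minimal pure-Python sketch. It assumes the usual negative log-likelihood objective; the function `teacher_forced_nll`, the toy `uniform_model`, and the example tokens are hypothetical and stand in for a real language model.

```python
import math
from typing import Callable, Dict, List, Sequence


def teacher_forced_nll(
    next_token_probs: Callable[[Sequence[str]], Dict[str, float]],
    prompt_tokens: Sequence[str],
    target_tokens: Sequence[str],
) -> float:
    """Average negative log-likelihood of the targets under teacher forcing."""
    if not target_tokens:
        raise ValueError("target_tokens must not be empty")
    context: List[str] = list(prompt_tokens)
    total = 0.0
    for target in target_tokens:
        probs = next_token_probs(context)
        total += -math.log(probs[target])
        # Teacher forcing: extend the context with the ground-truth token,
        # not with the model's own prediction.
        context.append(target)
    return total / len(target_tokens)


def uniform_model(context: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a language model: uniform over a tiny vocabulary."""
    vocab = ["oil painting", "highly detailed", "4k", "<stop>"]
    return {tok: 1.0 / len(vocab) for tok in vocab}


loss = teacher_forced_nll(
    uniform_model, ["a cat on a sofa"], ["oil painting", "4k", "<stop>"]
)
print(loss)
```

Including the stop token among the targets is what lets the fine-tuned model learn when to stop emitting modifiers.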

Stage 2: DF-Prompt Generation

Given
  • initial text
Output:
  • DF-Prompt

Method: online reinforcement learning

  • initial state: the initial text.
  • action space: tripartite
    • word space
    • discrete time range space
    • discrete weight space
  • policy: policy model
    • At each step of online exploration, the model selects an action in accordance with the policy model.
  • reward function
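The tripartite action space described above can be illustrated with a small sampling sketch. The contents of `WORDS`, `TIME_RANGES`, and `WEIGHTS` are made-up placeholder values, and `sample_action` with its three probability vectors is a hypothetical interface; the notes do not record the actual discrete values or how the policy model parameterizes them.

```python
import random
from typing import List, Tuple

# Hypothetical discrete spaces; the real values are not given in these notes.
WORDS: List[str] = ["oil painting", "highly detailed", "soft lighting"]
TIME_RANGES: List[Tuple[float, float]] = [(0.0, 1.0), (0.0, 0.5), (0.5, 1.0)]
WEIGHTS: List[float] = [0.5, 0.75, 1.0, 1.25]

Action = Tuple[str, Tuple[float, float], float]


def sample_action(
    rng: random.Random,
    word_probs: List[float],
    range_probs: List[float],
    weight_probs: List[float],
) -> Action:
    """Sample one (word, time range, weight) action from the three spaces."""
    word = rng.choices(WORDS, weights=word_probs, k=1)[0]
    time_range = rng.choices(TIME_RANGES, weights=range_probs, k=1)[0]
    weight = rng.choices(WEIGHTS, weights=weight_probs, k=1)[0]
    return word, time_range, weight


rng = random.Random(0)
uniform = lambda n: [1.0 / n] * n  # stand-in for policy model outputs
actions = [
    sample_action(rng, uniform(len(WORDS)), uniform(len(TIME_RANGES)), uniform(len(WEIGHTS)))
    for _ in range(3)
]
for action in actions:
    print(action)
```

Each sampled action has the same shape as a DF-Prompt triple, so a sequence of actions appended to the initial text forms a candidate prompt that the reward can then score.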

    Training

    The policy model interacts with the text-to-image model (adjustments are made to the text encoder module).
    Loss function: includes a term that measures the difference between the output modifiers of the policy model and those of the initial model.
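The formula for this difference term is not preserved in these notes. One common way to measure how far a fine-tuned policy has drifted from its initial model is the KL divergence between their output distributions, so the sketch below assumes that choice; it may differ from what the paper uses. The names `kl_divergence`, `regularized_loss`, and the coefficient `beta` are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same modifiers."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def regularized_loss(policy_loss: float, p_policy, p_initial, beta: float = 0.1) -> float:
    """Assumed form: policy loss plus a weighted drift penalty."""
    return policy_loss + beta * kl_divergence(p_policy, p_initial)


p_initial = [0.5, 0.3, 0.2]  # initial model's distribution over modifiers
p_policy = [0.7, 0.2, 0.1]   # current policy's distribution
print(kl_divergence(p_policy, p_initial))
print(regularized_loss(1.0, p_policy, p_initial))
```

A penalty of this kind keeps the policy's modifier choices close to the fine-tuned Stage 1 model while the reward pushes it toward better images.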
     