CVPR 2024 | Dynamic Prompt Optimizing for Text-to-Image Generation

Contribution

Dynamic fine-control prompt editing framework

Effective results

improve image aesthetics

ensure semantic consistency between prompts and generated images

align more closely with human preferences

Insightful findings

artist names and texture-related modifiers enhance the artistic quality of generated images

it is more effective to introduce these terms in the latter half of the diffusion process

assigning a lower weight to complex terms promotes a more balanced image generation

Task

Input

a pre-trained text-to-image generative model

user input text

Output

a modified prompt with fine-grained control, such that the image generated from it exhibits enhanced visual effects while remaining faithful to the semantics of the initial prompt.

The modified prompt is formed by appending modifiers to the initial prompt.

Method

Dynamic Fine-control Prompt (DF-Prompt)

Each token is coupled with an effect range and a specific weight, resulting in a triple (token, effect range, weight).

The weight scales the token embedding, controlling the token's overall influence during generation.

The effect range is a normalized range that delineates the start and end steps of the token's influence within the iterative denoising process.
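As a minimal sketch of the triple described above, the following Python models one DF-Prompt entry and checks whether a token is active at a given denoising step. The names `DFToken`, `is_active`, and `scaled_embedding` are hypothetical and not taken from the paper. Mapping a step to progress as `step / total_steps`, treating range boundaries as inclusive, and applying the weight as a plain multiplication are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DFToken:
    """One DF-Prompt entry: a token, its normalized effect range, and its weight."""
    token: str
    start: float   # normalized start of the effect range, in [0, 1]
    end: float     # normalized end of the effect range, in [0, 1]
    weight: float  # multiplier applied to the token embedding

    def __post_init__(self) -> None:
        if not 0.0 <= self.start <= self.end <= 1.0:
            raise ValueError("effect range must satisfy 0 <= start <= end <= 1")

    def is_active(self, step: int, total_steps: int) -> bool:
        """Return True if 0-based denoising step `step` falls inside the range."""
        progress = step / total_steps  # assumption: linear step-to-progress mapping
        return self.start <= progress <= self.end

    def scaled_embedding(self, embedding: List[float]) -> List[float]:
        """Scale a token embedding by the weight (assumed to be a plain multiply)."""
        return [self.weight * x for x in embedding]
```

For example, `DFToken("oil painting", 0.5, 1.0, 0.75)` would influence only the second half of a 50-step denoising run, at three quarters of its normal strength.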

Define

Overview

Stage 1: Plain Prompt Refinement

Given

plain input prompt

Predict

suffix modifiers one by one, until the model outputs the stop sign.

i.e. construct the refined prompt by appending the predicted modifiers to the plain input prompt.

Data Selection (Construct Dataset)

Start with a prompt taken from a publicly available prompt source.

The prompt is split at a division point (its first comma).

The text before the division point is the short prompt.

The remaining tokens form the modifier set.
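The split described above can be sketched in a few lines of Python. The function name `split_prompt` is hypothetical, and treating the remaining text as a comma-separated list of modifiers is an assumption. The notes only state that the remaining tokens form the modifier set.

```python
from typing import List, Tuple


def split_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Split a prompt at its first comma into (short prompt, modifier list)."""
    head, sep, tail = prompt.partition(",")
    short_prompt = head.strip()
    if not sep:
        # No comma: the whole prompt is the short prompt, with no modifiers.
        return short_prompt, []
    # Assumption: the remaining modifiers are themselves comma-separated.
    modifiers = [m.strip() for m in tail.split(",") if m.strip()]
    return short_prompt, modifiers
```

Calling `split_prompt("a cat on a sofa, oil painting, highly detailed")` yields the short prompt `"a cat on a sofa"` and the modifiers `["oil painting", "highly detailed"]`.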

Define a confidence score

an image-text relevance term, measured using a pre-trained CLIP model

an aesthetic scoring function, which returns the aesthetic score

a tolerance constant

Dataset

Fine-tuning

teacher forcing method

loss on the next token
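To make the fine-tuning objective concrete, here is a minimal pure-Python sketch of a next-token cross-entropy loss under teacher forcing. The function name `next_token_loss` and the list-based logits are illustrative assumptions. A real training loop would use a deep learning framework and the language model's own logits.

```python
import math
from typing import List, Sequence


def next_token_loss(logits: Sequence[Sequence[float]], targets: List[int]) -> float:
    """Mean cross-entropy of the target tokens under per-position logits.

    With teacher forcing, logits[i] is computed from the ground-truth prefix
    (the plain prompt plus targets[:i]), never from the model's own samples.
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("need a non-empty target list and one logit row per target")
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]  # equals -log softmax(row)[target]
    return total / len(targets)
```

With uniform logits over a vocabulary of four tokens, the loss equals `math.log(4)`, the entropy of a uniform guess.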

Stage 2: DF-Prompt Generation

Given

initial text

Output:

DF-Prompt

Method: online reinforcement learning

initial state: the initial text

action space: tripartite

word space
discrete time range space
discrete weight space

policy: policy model

At each step of online exploration, the model selects an action in accordance with the policy model.

reward function
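The tripartite action space and the per-step action selection above can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate words, time ranges, and weights are hypothetical values, `build_action_space` and `rollout` are hypothetical helper names, the policy is stood in for by any callable returning one probability per action, and the fixed number of steps is an assumption rather than something these notes specify.

```python
import random
from itertools import product
from typing import Callable, List, Sequence, Tuple

Action = Tuple[str, Tuple[float, float], float]  # (word, time range, weight)
Policy = Callable[[str, List[Action]], Sequence[float]]


def build_action_space(
    words: List[str],
    time_ranges: List[Tuple[float, float]],
    weights: List[float],
) -> List[Action]:
    """Enumerate every (word, time range, weight) combination."""
    return list(product(words, time_ranges, weights))


def rollout(
    initial_text: str,
    actions: List[Action],
    policy: Policy,
    max_steps: int,
    rng: random.Random,
) -> List[Action]:
    """Sample one action per step from the policy's distribution over `actions`."""
    chosen: List[Action] = []
    for _ in range(max_steps):
        # State = initial text plus the actions chosen so far.
        probs = policy(initial_text, chosen)
        chosen.append(rng.choices(actions, weights=probs, k=1)[0])
    return chosen


# Hypothetical candidate values, for illustration only.
WORDS = ["oil painting", "highly detailed"]
TIME_RANGES = [(0.0, 1.0), (0.0, 0.5), (0.5, 1.0)]
WEIGHTS = [0.5, 1.0, 1.5]
ACTIONS = build_action_space(WORDS, TIME_RANGES, WEIGHTS)


def uniform_policy(text: str, chosen: List[Action]) -> Sequence[float]:
    """Placeholder policy: equal probability for every action."""
    return [1.0 / len(ACTIONS)] * len(ACTIONS)
```

A call such as `rollout("a cat", ACTIONS, uniform_policy, 3, random.Random(0))` returns three sampled triples. In training, a learned policy would replace `uniform_policy`, and the reward would score the image generated from the resulting DF-Prompt.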

Training

The policy model interacts with the text-to-image model (with adjustments made to its text encoder module).

Loss function

where one term measures the difference between the modifiers output by the policy model and those output by the initial model.