Text-to-Image Alignment

SmooSense allows users to visualize and evaluate text-to-image AI performance. Its intuitive UI utilizes word scores and visual masks to help pinpoint misalignments between the text and the image, simplifying model analysis.

This example shows human labelers' feedback on text-to-image generation. The model was given a text prompt and generated an image. Human labelers were then asked to highlight the image areas and the words that had alignment issues.

Data Source

Data is excerpted from: Rapidata/text-2-image-Rich-Human-Feedback-32k

Rapidata can help you get similar human labeling for your data.

Use SmooSense to visualize your data

Word scores

Name the column so that its name contains word_score.
Each cell value should be a string: the JSON dump of a list of [word, score] pairs. For example:

[["seven", 2.04], ["pixelated", 0.5219], ...]
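A minimal sketch of producing such a cell value in Python, using only the standard json module. The helper name encode_word_scores, the column name word_scores, and the row layout are illustrative assumptions; the only requirements taken from this page are that the column name contains word_score and that the cell is a JSON string of word and score pairs.

```python
import json


def encode_word_scores(pairs):
    """Serialize (word, score) pairs into a JSON string of [word, score] lists."""
    return json.dumps([[str(word), float(score)] for word, score in pairs])


# Hypothetical row: the column name only needs to contain "word_score".
row = {
    "word_scores": encode_word_scores([("seven", 2.04), ("pixelated", 0.5219)]),
}

print(row["word_scores"])
```

Storing the pairs as a single JSON string keeps the column a plain string type, which matches the cell format described above.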

Image mask

Name the column so that its name contains image_mask.
Save each mask as a grayscale PNG file and store its URL in that column.
Make sure there is also a column named image_url that contains the URL of the corresponding image.
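The sketch below shows one way to write a mask as an 8-bit grayscale PNG using only the Python standard library. The function name encode_grayscale_png, the sample mask values, and the example URLs are all hypothetical. How SmooSense interprets pixel intensity is not specified on this page, so treating brighter pixels as stronger misalignment is an assumption.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, data):
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_grayscale_png(mask):
    """Encode a list of rows of 0-255 integers as an 8-bit grayscale PNG."""
    height = len(mask)
    width = len(mask[0])
    if any(len(row) != width for row in mask):
        raise ValueError("all mask rows must have the same width")
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in mask)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


# Hypothetical 4x2 mask; assumed convention: brighter means a stronger issue.
mask = [
    [0, 0, 128, 255],
    [0, 64, 255, 255],
]
png_bytes = encode_grayscale_png(mask)

# After uploading png_bytes somewhere reachable, reference both URLs in the row.
# These URLs are placeholders, not real locations.
row = {
    "image_url": "https://example.com/images/0001.png",
    "image_mask": "https://example.com/masks/0001.png",
}
```

In practice an imaging library can do the same job in a line or two; the hand-rolled encoder here only avoids extra dependencies.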
