Andrew Hutchinson
Source: www.socialmediatoday.com, July 2024
Pinterest is developing its own AI text-to-image generation process, though Pinterest’s approach is slightly different to what you’re seeing in other apps.
As outlined in a new overview from the Pinterest Engineering team, Pinterest’s “Canvas” model aims to provide generated options for product backgrounds, without altering the product shot itself as the main focus.
That takes a little more training. Most text-to-image models are designed to create an image based on a description, by learning from the text captions attached to other images. Most product shots, however, don’t describe the background within the caption, so Pinterest’s team has had to come up with a new way to isolate the background and foreground, and then make it easy to guide the tool with simple commands.
As per Pinterest:
“Training Pinterest Canvas gives us a strong base model that understands what objects look like, what their names are, and how they are typically composed into scenes. However, as previously stated, our goal is training models that can visualize or reimagine real ideas or products in new contexts.”
So, conceptually, Pinterest is looking to use its existing database of product images to establish common framing, placement and background types, in order to better facilitate AI background generation requests.
It’s a complex approach, but Pinterest has now built a system that can do this with a high level of accuracy.
“[We] use a segmentation model to generate product masks by separating the foreground and background. Existing text captions typically describe only the product while neglecting the background, which is critical to guide the background inpainting process, so we incorporate more complete and detailed captions from a visual LLM. In this stage, we train a LoRA on all UNet layers to enable rapid, parameter efficient fine-tuning. Finally, we briefly fine-tune on a curated set of highly-engaged promoted product images, to steer the model toward aesthetics that resonate with Pinners.”
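For readers unfamiliar with the "LoRA" step Pinterest mentions, here is a minimal sketch of the general low-rank adaptation idea, not Pinterest's actual code. The pretrained weight matrix stays frozen and only two small matrices are trained, which is what makes the fine-tuning parameter-efficient. All names and sizes here (`lora_forward`, `down`, `up`, `alpha`, the 8x8 layer) are illustrative assumptions, and a real setup would apply this to the UNet layers of a diffusion model through a deep learning framework.

```python
import random


def matmul(a, b):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, frozen_w, down, up, alpha):
    """Compute x @ W + (alpha / rank) * (x @ down @ up).

    frozen_w: pretrained weights, never updated during fine-tuning.
    down, up: small trainable matrices of shape (d_in, rank) and (rank, d_out).
    """
    rank = len(up)
    scale = alpha / rank
    base = matmul(x, frozen_w)
    update = matmul(matmul(x, down), up)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


# Hypothetical layer sizes, chosen only to keep the example small.
d_in, d_out, rank = 8, 8, 2
rng = random.Random(0)

frozen_w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
down = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
up = [[0.0] * d_out for _ in range(rank)]  # zero init: the adapter starts as a no-op
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

y = lora_forward(x, frozen_w, down, up, alpha=4.0)

full_params = d_in * d_out           # weights touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # weights touched by the adapter
print(full_params, lora_params)
```

Because `up` starts at zero, the adapted layer initially behaves exactly like the frozen one, and training only has to learn the small correction. The saving grows with layer size, since the adapter scales with `rank * (d_in + d_out)` rather than `d_in * d_out`.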
So, again, the system is specifically designed to generate backgrounds based on existing Pin images, while Pinterest has also sought to align the model around certain visual styles, in order to further simplify creation.
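The "change the background, leave the product alone" constraint can be pictured as mask-based compositing. The sketch below is a simplified illustration under stated assumptions, not Pinterest's pipeline: a real inpainting model uses the mask while generating the image, whereas this toy `composite` function (a hypothetical name) only shows what the mask guarantees, using tiny grids of numbers in place of images.

```python
def composite(product, generated, mask):
    """Keep product pixels where mask == 1; use generated pixels elsewhere."""
    return [
        [p if m == 1 else g for p, g, m in zip(p_row, g_row, m_row)]
        for p_row, g_row, m_row in zip(product, generated, mask)
    ]


# Toy 3x3 "image": 99 marks the product pixel, 10 the original background.
product = [
    [10, 10, 10],
    [10, 99, 10],
    [10, 10, 10],
]

# Mask as a segmentation model might produce it: 1 = product, 0 = background.
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

# Stand-in for a newly generated background.
generated = [[50] * 3 for _ in range(3)]

result = composite(product, generated, mask)
for row in result:
    print(row)
```

The product pixel passes through unchanged, while every background pixel comes from the generated image.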
In the end, that should enable brands to type in whatever style they like, based on common descriptors, and Pinterest’s system will then provide options for their product shots in that aesthetic.
It’s an interesting concept, which Pinterest is already testing with selected ad partners.
It could be a good way to create more variations of your Pin images, and enhance your product’s appeal within different design approaches.
You can read more about Pinterest’s approach to AI background generation in the Pinterest Engineering team’s overview.