Recraft – the first AI model built for designers – beats top performing image generation models across multiple challenges
There’s a lot of exciting stuff going on in the AI art space. Midjourney v6, DALL-E 3 and other AI image generators can now produce diverse, stunning, highly detailed images. Creatives have a range of AI tools to choose from.
But professional designers have a very specific set of needs. They need to be able to create and work with both vector and raster images and generate multiple graphics in consistent styles. They need to follow brand guidelines and be able to iterate with a high level of control and precision.
A powerful AI art engine is not enough – professionals need an AI platform that’s built just for them, tools that support workflows to easily turn ideas into high quality branded graphics.
Midjourney v6, DALL-E 3 and others weren’t set up to perform design work out of the box. Designers, illustrators and marketing professionals who use AI in their daily work can’t work easily across most AI image generation platforms. Specifically, they don’t have much control over image generation and style, which results in inconsistent output.
They also can’t follow their natural design workflow. Midjourney v6 is only available on Discord and DALL-E 3 is most often used through ChatGPT, which creates a choppy experience for designers. They want all the relevant tools in one place. And this is why we built Recraft from the ground up, starting with a foundation model that understands style control.
We are a team of AI researchers and engineers with deep experience in building foundation models from scratch. In the early days of Recraft we started with a strong understanding of the design workflow. We wanted the foundation model itself to fit perfectly into the workflows, so our users would have the most integrated experience possible.
As a result, Recraft is the first AI art platform that addresses the specific needs of professional designers. It allows them to:
We believed it would only be possible to create a platform like Recraft if we trained our foundation model from scratch – so we did exactly that. And now we are testing it against the best foundation models: DALL-E 3 and Midjourney v6.
We’ll dive into the benchmark study in a bit, but first we want to explain what’s different about the Recraft model.
It would have been much easier to build Recraft on top of an existing open-source model. Why did we decide to do it the hard way and train our own model from scratch? Because existing open-source models were lacking some basic intelligence that designers need in order to satisfy quality expectations.
The first one is anatomical perfection. For example, the correct number of fingers on a human hand, or the correct number of body parts in the human body in a complex pose. If an AI foundation model has not been trained to draw a ballet dancer or a football player in every pose, no amount of fine-tuning will produce an image of an anatomically perfect ballet dancer in arabesque or a football player running across the goal line.
Professional designers need to deliver the highest quality images with perfect anatomy, not just creative pieces that look nice at a glance but are anatomically off. AI art generators traditionally struggle with human anatomy, so we knew we needed to train our own model.
As an early stage startup we wanted to preserve as much cash as possible, and large-scale training runs are expensive. We knew we needed to improve the product experience so we could turn it into revenue. Our aim was to keep raising the bar and to keep raising funding for additional training runs, so we could ultimately reach state-of-the-art (SOTA) quality. But the first training run was key – we viewed it as our “one shot.”
With our model trained and performing well, we wanted to see how it would measure up against the SOTA models, namely Midjourney v6 and DALL-E 3. It started as an internal exercise, but once we saw the results we knew we needed to share them with both the AI and the design communities.
We benchmarked Recraft vs. Midjourney v6, DALL-E 3, Stable Diffusion XL, and a handful of other players. As we ran the benchmark we noticed a huge gap in model quality between Midjourney v6 and DALL-E 3 on one side and the rest of the pack on the other.
We saw that Recraft came out on top: slightly ahead of both Midjourney v6 and DALL-E 3, and significantly ahead of everyone else.
A detailed comparison of the benchmark study is below. It wasn’t possible to include everyone, since we’re using a huge dataset to draw comparisons and have limited resources, but we're sharing our methodology so other companies can provide their inference results. We will be happy to add them to the benchmark.
To conduct an accurate evaluation that compares Recraft with other models we utilized the PartiPrompts dataset (Cornell University, GitHub), a community standard evaluation benchmark that comprises 1632 English prompts spanning diverse categories and challenging aspects. Each prompt in the benchmark is associated with two labels: Category and Challenge.
Category indicates a broad group that a prompt belongs to, and tells you what the prompt is about. Some examples of Categories are "Indoor Scenes," "Food and Beverages," "Illustrations" and "People."
Challenge highlights an aspect which makes a prompt difficult to understand. Some examples of challenges are "Quantity," "Writing & Symbols" and "Fine-grained Detail."
The dataset mostly contains prompts that present strong challenges for the current best models.
This benchmark study compares Recraft with Midjourney v6, DALL-E 3 and the best open-source models: Stable Diffusion XL, Stable Cascade and Playground v2.5.
DALL-E 3
DALL-E 3 is available on different platforms, such as ChatGPT, Bing Image Creator and the OpenAI API. The OpenAI API has two essential parameters: style (either ‘natural’ or ‘vivid’) and quality (‘standard’ or ‘HD’). Since no information is provided about the settings used by ChatGPT and Bing Image Creator, we assessed each of them independently to ensure a comprehensive and accurate comparison.
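As a rough illustration of the two parameters mentioned above, the sketch below enumerates every style and quality combination as a request payload. It assumes that "each of them" refers to the API settings and that the full grid was covered, which the text does not spell out. The helper name `dalle3_variants` and the sample prompt are hypothetical, and no network call is made.

```python
from itertools import product

# The two parameters named in the text. The lowercase spellings are the
# values the OpenAI Images API accepts for DALL-E 3.
STYLES = ("natural", "vivid")
QUALITIES = ("standard", "hd")


def dalle3_variants(prompt: str, size: str = "1024x1024") -> list:
    """Build one request payload per style/quality combination.

    Hypothetical helper: it only prepares dictionaries and sends nothing.
    """
    return [
        {
            "model": "dall-e-3",
            "prompt": prompt,
            "size": size,
            "style": style,
            "quality": quality,
            "n": 1,
        }
        for style, quality in product(STYLES, QUALITIES)
    ]


# Sample prompt invented for illustration only.
variants = dalle3_variants("a red cube on top of a blue sphere")
for v in variants:
    print(v["style"], v["quality"])
```

Each payload could then be passed to an image generation client, keeping the results for every setting separate so they can be compared independently.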
Midjourney v6
Midjourney v6 images were generated through the Midjourney Discord bot. The number of parameters for Midjourney v6 and DALL-E 3 is unknown, but the parameter counts and settings of the other models we benchmarked are listed below.
Stable Diffusion XL has 2.6 billion parameters. It was configured with the default parameters: 50 steps, guidance_scale=5, base network + refiner.
Playground v2.5 has 2.6 billion parameters, the same number as Stable Diffusion XL. It was configured with the default parameters: 50 steps, guidance_scale=3.
Stable Cascade has 5.1 billion parameters. It was configured with the default parameters: 20 prior steps, 10 decoder steps, guidance_scale=4.
Recraft has 20 billion parameters, roughly four to eight times as many as the open-source models above.
As Recraft does not provide access to a “raw” model without a style specification, we employed the default settings under a Recraft style called ‘Photorealism,’ which is the most similar to the other platforms’ default style.
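For readers who want to rerun the setup, the settings described above can be collected in one place. This is a minimal sketch: the `ModelSetup` class and its field names are hypothetical, and only the numbers and setting names come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ModelSetup:
    """Hypothetical record of one benchmarked model's size and settings."""
    name: str
    params_billions: Optional[float]  # None where the size is not public
    open_source: bool
    settings: dict = field(default_factory=dict)


SETUPS = [
    ModelSetup("Stable Diffusion XL", 2.6, True,
               {"steps": 50, "guidance_scale": 5, "pipeline": "base+refiner"}),
    ModelSetup("Playground v2.5", 2.6, True,
               {"steps": 50, "guidance_scale": 3}),
    ModelSetup("Stable Cascade", 5.1, True,
               {"prior_steps": 20, "decoder_steps": 10, "guidance_scale": 4}),
    ModelSetup("Recraft", 20.0, False, {"style": "Photorealism"}),
    ModelSetup("Midjourney v6", None, False),
    ModelSetup("DALL-E 3", None, False),
]

recraft = next(s for s in SETUPS if s.name == "Recraft")

# Size of Recraft relative to each open-source model, rounded to one decimal.
size_ratios = {
    s.name: round(recraft.params_billions / s.params_billions, 1)
    for s in SETUPS
    if s.open_source
}
print(size_ratios)
```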
We started by generating four 1024x1024 images for each prompt, using each of the models, which resulted in a total of 6528 images for every model.
We then conducted a user study comparing Recraft with other models in which 1435 assessors participated.
For each sample, assessors were presented with a prompt and two images in a random order. They were tasked with selecting either "Image A is better" (assigning +1 for model A), "Image B is better" (assigning +1 for model B) or "Same" (assigning +0.5 for each model).
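The scoring rule above can be sketched in a few lines. This is an illustration of the arithmetic only, not our evaluation code: the function names and the sample votes are made up, and the vote labels "A", "B" and "same" stand in for the three answer options.

```python
from collections import defaultdict


def tally(votes):
    """Sum points per model from (model_a, model_b, choice) votes.

    choice is "A", "B" or "same": a win is worth 1 point,
    and "same" gives 0.5 points to each model.
    """
    points = defaultdict(float)
    for model_a, model_b, choice in votes:
        if choice == "A":
            points[model_a] += 1.0
        elif choice == "B":
            points[model_b] += 1.0
        elif choice == "same":
            points[model_a] += 0.5
            points[model_b] += 0.5
        else:
            raise ValueError(f"unknown choice: {choice!r}")
    return dict(points)


def win_share(points, model, opponent):
    """Share of the points between two models that went to `model`."""
    total = points.get(model, 0.0) + points.get(opponent, 0.0)
    return points.get(model, 0.0) / total if total else 0.0


# Invented votes; model order varies because images are shown in random order.
votes = [
    ("Recraft", "Other", "A"),
    ("Other", "Recraft", "A"),
    ("Recraft", "Other", "same"),
    ("Recraft", "Other", "A"),
]
points = tally(votes)
share = win_share(points, "Recraft", "Other")
print(points, share)
```

Because every comparison hands out exactly one point in total, the two shares in a pairing always add up to 100%.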
We followed a set of established general practices [1,2,3] to guide assessors and to govern the evaluation process. Assessors were asked to adhere to a set of criteria encompassing anatomical and scene structure correctness, image-prompt alignment, detail and texture quality and aesthetic preference.
In order to allow others to repeat this benchmarking exercise, we are publishing all the images generated by each model that we used in our comparison. We’d love to see more benchmarks follow, so publishing these images will make it more affordable for companies to repeat the study.
In the graph below, the bar indicates Recraft’s overall performance in a pairwise image comparison vs. the model indicated at left. The model that gets more than 50% wins. If a bar is green, it means Recraft won more than 50%; if it is gray, then Recraft was outperformed by the model indicated at left.
Comparison results are available in this Recraft Comparison spreadsheet for your reference. Overall results indicate that Recraft demonstrates comparable performance to DALL-E 3 and surpasses Midjourney v6 and all other models by a significant margin.
The PartiPrompts dataset is categorized into 11 Challenges. Each Challenge indicates why a prompt is considered difficult. When compared to the benchmarked open-source models, Recraft outperformed all of them on every single Challenge.
When compared to Midjourney v6 and DALL-E 3, Recraft performed better than both on some Challenges and was outperformed on others.
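A per-Challenge breakdown like this one can be computed by grouping the pairwise votes by Challenge label and applying the green/gray rule used in the graphs. The sketch below is illustrative only: the function names and the sample votes are invented, outcomes are recorded from Recraft's side, and treating a share of exactly 50% as gray is an assumption.

```python
from collections import defaultdict

# Points per outcome, seen from Recraft's side of each comparison.
SCORE = {"win": 1.0, "same": 0.5, "loss": 0.0}


def share_by_challenge(votes):
    """Return Recraft's share of points per Challenge.

    votes: iterable of (challenge, outcome), outcome in SCORE.
    """
    totals = defaultdict(lambda: [0.0, 0])  # challenge -> [points, votes]
    for challenge, outcome in votes:
        totals[challenge][0] += SCORE[outcome]
        totals[challenge][1] += 1
    return {c: pts / n for c, (pts, n) in totals.items()}


def bar_color(share):
    """Green when Recraft takes more than 50%, gray otherwise.

    A share of exactly 50% is not covered by the text; gray is an assumption.
    """
    return "green" if share > 0.5 else "gray"


# Invented votes for two Challenges.
votes = [
    ("Quantity", "win"),
    ("Quantity", "win"),
    ("Quantity", "loss"),
    ("Imagination", "loss"),
    ("Imagination", "same"),
]
shares = share_by_challenge(votes)
colors = {c: bar_color(s) for c, s in shares.items()}
print(shares, colors)
```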
Recraft demonstrates the best performance in these Challenges:
Recraft demonstrates an ability to understand more complex scene descriptions in terms of positioning of objects, perspective and relationships between objects.
Below is an example from the Challenge “Quantity” that demonstrates Recraft’s ability to accurately represent quantities and spatial arrangement:
And here is an example from the Challenge “Perspective” that shows Recraft’s ability to get perspective right vs. all others:
And another example from the Challenge “Basic” that demonstrates Recraft outperforming others on the basic prompt '101':
In the Challenges “Writing & Symbols,” “Linguistic Structures” and “Fine-grained Detail,” DALL-E 3 outperforms Recraft, but Recraft outperforms Midjourney v6.
An example of a Recraft win from the Challenge “Writing & Symbols”:
It’s worth mentioning that Recraft is not even optimized to generate text in images yet. We are working actively to improve this functionality as it is so essential to the work of designers and marketers.
Here’s an example of a Recraft win vs. all others on the Challenge "Linguistic structures":
The word "without" adds complexity in relation to objects, and only Recraft and DALL-E 3’s ‘vivid’ model are able to represent it accurately.
The DALL-E 3 result highlights something else that some models tend to do, which is to crop images. This is often the case with both DALL-E 3 and Midjourney v6.
Have a look at the nuances in the challenge "Fine-grained Detail":
In the Challenges "Complex," "Imagination" and "Style & Format," Recraft was outperformed by both Midjourney v6 and DALL-E 3.
It’s important to note that many of the prompts in those three Challenges specify a style. Because Recraft is a tool for designers, it handles style differently than other models. Style can be specified by uploading one or more images, creating a hybrid from different images or selecting from a comprehensive library of styles that have been chosen expressly for graphic design projects.
The PartiPrompts dataset was also arranged into Categories, with Recraft demonstrating better performance than Midjourney v6 in all Categories except “Arts” and “Animals”.
In the graph below, the bar indicates Recraft’s performance in a pairwise image comparison vs. Midjourney v6. Recraft won more than 50% in all but 2 Categories.
The Category “Arts” has a lot of prompts that involve complex styles. As mentioned, Recraft handles style by allowing designers to import styles, create their own hybrid styles or select from the Recraft style library. Because the benchmark used plain text prompts only, its methodology could not showcase the full scope of Recraft’s style capabilities.
And we definitely plan to investigate what's happening with animals!
In the Categories benchmark vs. DALL-E 3, Recraft outperforms DALL-E 3’s ‘natural’ model in all Categories.
When compared with DALL-E 3’s ‘vivid HD’ setting, Recraft shows better performance in 7 out of 12 Categories. The biggest differences are again in “Arts” and “Animals”.
Another observation we have made (examples below) is that all models except Recraft frequently generate visual distortions, including cropped images, images with text signatures and bordered images. Midjourney v6 often generates images with text signatures, and DALL-E 3 generates cropped and bordered images. Open-source models show both problems.
We made every effort to ensure an objective comparison in this benchmark study, and therefore want to comment on a few things that may have had an impact on results and that may result in different outcomes for users on these models going forward.
Style preferences
Every model has its default style. While assessors were encouraged to judge images according to a clear set of criteria, it cannot be overlooked that assessor style preference may have played a part in image selection.
Prompt engineering
In this comparison, prompts from the PartiPrompts dataset were utilized in their original form without any additional prompt engineering. It's important to note that some models may require complex prompt engineering to get the best results for specific prompts.
Extended LLM functionality
Additionally, certain commercial methods, such as DALL-E 3, might employ large language models (LLMs) that are capable of rewriting a prompt before it is passed to the text-to-image model.
Considering the complexity of including all such factors and selecting prompt engineering tailored to each method, we opted for an "end-to-end" comparison, where original prompts were used for all methods.
Since this benchmark study limits the focus to specific performance criteria without providing an overall understanding of what’s possible with Recraft, we wanted to point to a few things Recraft does exceptionally well.
High quality imagery
Recraft is trained to provide outstanding image quality. It is capable of producing human form in complex poses with accurate anatomy down to minute details, producing elaborate scenes and environments and understanding complex relations between objects.
Image set creation
Recraft can generate multiple images in the same style. This means designers can easily create image or icon sets with a coherent look and feel without any style prompt engineering.
Style creation
Recraft users have full control of style. Not only do they have a rich and continually expanding library of styles to choose from, they can even upload their own image and create as many visuals as they want in that style. They can upload images in a specific brand style or create an experimental style by mixing styles from multiple images.
Vector art creation
Unlike most AI tools, Recraft excels at generating vectors that designers can easily iterate on. Recraft also creates raster images and photography so designers have total creative freedom.
Control and precision
Recraft lets designers work flexibly, generating images in their own styles, with precise branded color palettes. Tools are intuitive and allow for a high level of control with a simple slide, click or drag. Backgrounds or parts of the image are easily removed or modified. An image can be reworked to be simpler or more granular, or used as the basis for a new image that is anywhere from 0% to 100% similar. Everything can be upscaled to achieve better image quality or increase the level of detail.
Brand adherence
Recraft is designed to make it easy to create visuals that adhere to brand guidelines. Designers can specify a brand style they want illustrations or icons to follow, and can provide hex colors to generate visuals in a specific brand palette.
We are constantly fine-tuning, upgrading and expanding functionality. Recrafters are always invited to give us their feedback and feature requests, to tell us what they love or don’t love about the Recraft experience.
We ship updates at least weekly, with multiple enhancements and new features. Some are big and some small, but all noticeably contribute to a smoother, more user-friendly product that helps professional AI designers, illustrators and marketers do their work exceptionally well.