This seems more like a test of how one specific form of prompt writing works on several models.
This prompt seems too long for some models, such as Flux, where they don’t recommend more than 22 tokens, and some models, like the ACTUAL most customizable model, Stable Diffusion, work better with simple words or phrases separated by commas.
A great opportunity was lost to teach people some good general rules, like how simpler, more concise prompts lead to better adherence, or how important prompt order is, because the author just wanted to cut and paste a description and see what happens. Now readers are led to think there are only one or two models that actually work, and the rest aren’t even worth their time.
I also noticed that, besides two of the scenes being set in the streets of London, the article didn’t clearly express its goal. Was it how closely the models followed the prompt? If so, why did the author so often complain about the lack of realism in some of them?
Was it how realistic they were? Because the author didn’t use any terms that generate realistic scenes in the last two prompts, and even made the mistake of using the term “realism” in both of them. That is actually bad practice when trying to generate photorealistic images, because no normal person describes an actual photo as “realistic”. The models therefore often pull in drawings and illustrations, which are what the words realism/realistic tend to be used for. It’s even to the point that it’s recommended you put “realistic” in the negative prompt (when possible) to ensure you get photographs rather than artwork.
It’s suggested you specifically mention the word “photo”, as in the first prompt, or, for more precision, a focal length like 35 mm (or even a specific brand and model of lens!), because that sort of description would only be attached to actual photos in the training set.
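
To make those rules concrete, here is a rough sketch in plain Python of how such a prompt could be assembled. The helper names (build_prompt, build_negative_prompt) and the example scene are made up for illustration, not taken from the article or from any model's API; the only ideas borrowed from above are short comma-separated phrases, subject first, photo terms like "photo" and "35 mm", and "realistic" in the negative prompt.

```python
def build_prompt(subject, details=(), photo_terms=("photo", "35 mm")):
    """Join short phrases with commas, most important first.

    Hypothetical helper: the subject leads because prompt order matters,
    and the photo terms are the kind of words attached to real photographs.
    """
    parts = [subject, *details, *photo_terms]
    # Drop empty entries and stray whitespace so the prompt stays concise.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


def build_negative_prompt(extra=()):
    """Negative prompt for models that support one (an assumption; not all do)."""
    terms = ["realistic", *extra]
    return ", ".join(t.strip() for t in terms if t and t.strip())


# Example scene is invented for illustration.
prompt = build_prompt(
    "woman walking a dog on a London street",
    details=["rainy evening"],
)
negative = build_negative_prompt()
print(prompt)
print(negative)
```

The two strings would then go into whatever prompt and negative-prompt fields your tool exposes. How much each term actually helps will vary by model, so treat this as a starting point rather than a recipe.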