Tutorial: How to write structured JSON prompts from scratch, for anything you want to generate
Example: opening movie sequences with Veo3
Step 1
Start from a simple structure. If you ask a chatbot (Kimi 2 or ChatGPT) to just write you JSON, it will add a lot of fields that you do not need, such as resolution and sometimes duration. So ask it to make only the fields that you need.
I usually start by asking for "prompt", "camera" and "style" for each shot if it is a sequence.
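Here is a minimal sketch of what such a starting structure could look like, built in Python so it is easy to edit and print. The top-level "shots" key and both shot descriptions are made up for illustration; only the "prompt", "camera" and "style" fields come from the step above.

```python
import json

# Hypothetical two-shot opening sequence. The "shots" wrapper and the
# descriptions are assumptions for illustration, not a required schema.
sequence = {
    "shots": [
        {
            "prompt": "A lighthouse on a cliff at dawn, waves breaking below",
            "camera": "slow aerial push-in",
            "style": "cinematic, muted colors, film grain",
        },
        {
            "prompt": "A keeper climbs the spiral staircase holding a lantern",
            "camera": "handheld, following from behind",
            "style": "cinematic, warm lantern light, film grain",
        },
    ]
}

# The resulting JSON text is what you paste into the generator as the prompt.
prompt_text = json.dumps(sequence, indent=2)
print(prompt_text)
```

Keeping every shot to the same three fields makes it easy to see at a glance when something extra has crept in.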
Step 2
Try the simplest JSON prompt you have first, before adding to it and iterating on it. This is similar to writing code: starting small makes it easier to find and fix a problem.
Step 3
Keep asking Kimi 2 / ChatGPT to change particular parts, but keep an eye on the output. Make sure nothing is added that you did not ask for.
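If you would rather not check by eye, a small script can flag fields the chatbot added on its own. This is only a sketch: it assumes the prompt is a JSON object with a "shots" list (a hypothetical layout) and that "prompt", "camera" and "style" are the only fields you want. The revised prompt below is made up for the example.

```python
import json

# Fields we asked for; anything else counts as an unrequested addition.
ALLOWED_FIELDS = {"prompt", "camera", "style"}

def unexpected_fields(prompt_json: str) -> dict[int, list[str]]:
    """Map shot index -> sorted field names that are not in ALLOWED_FIELDS."""
    data = json.loads(prompt_json)
    extras = {}
    for index, shot in enumerate(data["shots"]):
        extra = sorted(set(shot) - ALLOWED_FIELDS)
        if extra:
            extras[index] = extra
    return extras

# Hypothetical revision returned by the chatbot, with two fields sneaked in.
revised = """
{
  "shots": [
    {"prompt": "A lighthouse at dawn", "camera": "aerial push-in", "style": "cinematic"},
    {"prompt": "A keeper climbs the stairs", "camera": "handheld", "style": "cinematic",
     "resolution": "4k", "duration": "8s"}
  ]
}
"""

print(unexpected_fields(revised))  # {1: ['duration', 'resolution']}
```

An empty result means the revision kept to the fields you asked for.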
JSON and screenshots in comments 👇
A few of you asked me:
Is there an advantage to writing JSON / YAML / XML prompts vs plain English (or another human language) prompts?
Which is better, JSON or YAML?
I will say it again: whether you use JSON, YAML, XML or another markup language, at the time of writing there's no special treatment (no interpreter program built into AI generators, be that Veo3 or any other). It is all treated as text, so yes, you can write prompts in English instead.
The whole idea of using JSON or another markup language is to have a structure that makes a complex system easier to manage (for example, if you are making a whole long sequence in Flow, it becomes easier to organize).
I talked to a few people who work with clients and use these in complex systems (for a long ad or for a short film). They find it's easier to organize: people include enough detail and leave out the extra material that often leads to more hallucinations.
In my past job I worked as a coder, so I checked whether current video generators understand a prompt in JSON or YAML as something beyond just text. They do not; they treat it just like any other text. This means that all those brackets in JSON are counted as tokens, so if you care about the length of your prompt, bear that in mind. JSON is more widely known; YAML is shorter, so if you care about the number of tokens in your prompt it may be the better option.
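To get a rough feel for the size difference, the sketch below writes the same shot as indented JSON and as YAML and compares character counts. Character count is only a proxy here, since real token counts depend on each generator's tokenizer. The "shots" layout, the shot text and the tiny `to_simple_yaml` helper are all assumptions made for this example; the helper only handles this one flat shape and is not a general YAML writer.

```python
import json

# Hypothetical single-shot sequence used for the comparison.
sequence = {
    "shots": [
        {
            "prompt": "A lighthouse on a cliff at dawn, waves breaking below",
            "camera": "slow aerial push-in",
            "style": "cinematic, muted colors, film grain",
        }
    ]
}

def to_simple_yaml(data: dict) -> str:
    """Render {"shots": [flat dicts of plain strings]} as YAML text.

    Illustration only: no quoting or escaping, so values must not contain
    characters that need it (for example ": " or a leading "#").
    """
    lines = ["shots:"]
    for shot in data["shots"]:
        for position, (key, value) in enumerate(shot.items()):
            # The first field of each shot starts the list item with "- ".
            prefix = "  - " if position == 0 else "    "
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)

json_text = json.dumps(sequence, indent=2)
yaml_text = to_simple_yaml(sequence)

print(yaml_text)
print("JSON characters:", len(json_text))
print("YAML characters:", len(yaml_text))
```

Minified JSON would narrow the gap, so treat the numbers as a rough guide rather than a measurement.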
If you prefer English, use only the necessary words and put enough detail into your prompt.