AI Generation — Creator Side
Celestory combines the power of nocode (creating applications without writing code) and vibe coding (describing what you want to create in plain language and letting the AI generate the code) to quickly test, compare, and iterate on the best Artificial Intelligence models on the market. You can integrate and evaluate a wide range of models, including sovereign models (developed in Europe or open source, independent of US tech giants) such as those from Mistral or Flux, directly from the Cloud interface, then download them locally once validated for production execution.
AI Generation Menu
The AI Generation menu is accessible from the editor and is structured into 5 tabs:
| Tab | Description |
|---|---|
| Text | Conversational chat with LLMs (Large Language Models: AI models capable of understanding and generating text) for text generation |
| Image | Image generation and editing via prompts |
| Video | Creation of videos from text, images, or existing videos |
| Audio | Speech synthesis, music, sound effects, and dialogues |
| Previous generations | Complete history of all your past generations |
At the top of each tab, a quota counter displays your AI credit consumption in real time (e.g., 45 / 200 generations), with a button to add AI credits.
Text Tab
The Text tab offers an integrated conversational chat to interact directly with language models (LLMs, or Large Language Models: AI programs trained to understand and generate text, like very advanced virtual assistants). This is the heart of textual AI generation in Celestory: you can co-write content, generate code, write narrative scripts, or query your project data naturally.
How does it work?
You enter a message in the chat field. The model receives:
- The history of your previous exchanges (to maintain context)
- Optionally text variables from your project attached to the message
- Your current prompt (the instruction or question you send to the AI)
It generates a response that displays progressively in the chat, rendered in real time in Markdown (a lightweight text formatting syntax: # Title, **bold**, - list, displayed cleanly as rich text).
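As a mental model, the request assembly described above can be sketched as follows. This is a hypothetical illustration of the general chat-API pattern, not Celestory's actual internal code; the function and field names are invented for the example.

```python
# Hypothetical sketch: a chat request typically combines (1) conversation
# history, (2) attached project text variables, and (3) the current prompt.

def build_request(history, attached_variables, prompt):
    """Combine history, project text variables, and the current prompt
    into a single ordered message list for the model."""
    messages = list(history)  # previous exchanges keep the context
    for name, content in attached_variables.items():
        # attached text variables are injected as extra context
        messages.append({"role": "user",
                         "content": f"[variable {name}]\n{content}"})
    messages.append({"role": "user", "content": prompt})  # current instruction
    return messages

request = build_request(
    history=[{"role": "user", "content": "Hello"},
             {"role": "assistant", "content": "Hi! How can I help?"}],
    attached_variables={"style_guide": "Use a friendly tone."},
    prompt="Write a short NPC greeting.",
)
print(len(request))  # 4 messages: 2 history + 1 variable + 1 prompt
```

The key point is the ordering: history first (context), then attached variables, then the new instruction last.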
Chat Interface
- Left panel: List of your previous conversations, automatically named. You can resume an existing conversation or create a new one.
- Right panel: Chat area with model selection, token counter, message history, and input field.
Key Features
- Model Selection: Choose from available models, grouped by provider.
- Token Counter: The interface displays in real time the number of tokens (units of text that the AI processes; one token ≈ ¾ of a word in English: "hello" = 1 token, a 10-word sentence ≈ 13 tokens) used by the conversation history compared to the context window (the maximum amount of text the model can "remember" in a single conversation) of the selected model (e.g., 12,530 / 256,000 tokens).
- Input Estimation: Before sending a message, the number of additional tokens your input will consume is displayed (e.g., + 842 tokens).
- Text Variable Attachment: You can attach text variables (text content stored in your Celestory project, such as a character, a game rule, or a style guide) from your project to the prompt. Each attached variable displays its token weight. This allows you to dynamically inject project content into the AI request.
- Markdown Preview: AI responses are rendered in Markdown (text formatting syntax: # titles, **bold words**, and - lists display like in a Word document) in real time, ideal for visually validating structured documents.
- History Management: Deletion of individual messages, reloading of saved conversations.
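The ≈ ¾-word rule of thumb above can be turned into a quick budgeting helper. This is only a sketch of the heuristic, not Celestory's actual tokenizer; real models use subword tokenizers whose counts differ.

```python
# Rough token estimate from the "1 token ≈ 3/4 of an English word" rule of
# thumb. Real tokenizers (BPE and similar) count differently; this is only
# a heuristic for budgeting against a model's context window.

def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)  # about 4 tokens for every 3 words

def fits_in_context(history_tokens: int, new_text: str,
                    context_window: int = 256_000) -> bool:
    """Check whether the next message still fits in the model's memory."""
    return history_tokens + estimate_tokens(new_text) <= context_window

print(estimate_tokens("hello"))  # 1 token, as in the example above
print(estimate_tokens("The quick brown fox jumps over the lazy sleeping dog"))  # 10 words -> 13
```

Comparing this estimate against the counter shown in the interface tells you how much headroom a conversation has left before the model starts losing context.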
Use Cases
| Use Case | Description |
|---|---|
| Vibe Coding (HTML5 Block) | Attach your HTML/CSS style guide as a variable, describe the desired interface, then copy the code into an HTML5 block in your graph. Iterate in conversation until the result is achieved. |
| Narrative Script Generation | Create dialogues for NPCs (Non-Player Characters: characters controlled by the game, not the player), scene descriptions, or narration texts directly from the chat, then save them as text variables. |
| MD Documentation | Generate structured Markdown documents (guides, sheets, game rules) by attaching your methodology as a variable. The MD preview instantly validates the rendering. |
| Project Data Analysis | Attach a variable containing your JSON or text data (e.g., level list, stats), then ask the AI questions to extract insights. |
| Content Translation | Pass your game texts (variables) to the model to translate them into several languages, conversation by conversation. |
| Model Comparison | Ask the same question to Mistral, Gemini, and GPT-5 Mini in three separate conversations to compare responses and choose the most suitable model. |
Available Text Models
| Provider | Model | Context Window (max conversation memory) |
|---|---|---|
| Mistral 🇫🇷 | | 256,000 tokens |
| Mistral 🇫🇷 | | 256,000 tokens |
| | | 1,000,000 tokens |
| OpenAI | | 400,000 tokens |
| Anthropic | | 200,000 tokens |
| Meta | | 1,000,000 tokens |
| xAI | | 2,000,000 tokens |
Mistral models (sovereign French models) and Llama (Meta, open source) can be downloaded after validation for local execution.
Image Tab
The Image tab allows for the generation and editing of images through textual descriptions, with two modes of operation.
Text to image Mode
How does it work? You describe the image you want to create in the prompt field. The model generates an entirely new image from scratch, without a source image. The more precise the prompt (style, colors, composition, atmosphere), the better the result will match your vision.
| Model | Ratios/Sizes | Features |
|---|---|---|
| Nano Banana | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9 | Up to 4 images, PNG/JPG/WebP formats |
| Flux 2 Klein 4B | square, square_hd, portrait_3_4, landscape_4_3... | PNG/JPG/WebP formats |
| GPT Image 1.5 | auto, 1024×1024, 1024×1536, 1536×1024 | Transparent/opaque background, low/medium/high quality, low/high input fidelity, up to 4 images |
Use Cases:
- Game Assets: Generate background scenes, characters, objects, or interface icons tailored to your project's universe, without needing an illustrator.
- Quick Variants: Produce 4 versions of the same asset in one click to choose the best result before integration.
- Narrative Illustrations: Create visuals to punctuate key moments in an interactive story or training module.
- Moodboards / Visual Prototyping: Quickly test different artistic directions before validating the graphic direction of a project.
- Images with Transparent Backgrounds (GPT Image 1.5): Generate interface elements (sprites, stickers, badges) with transparent backgrounds, ready to integrate directly into Celestory.
Image to image Mode (Image Editing)
How does it work? You provide one or more reference images and a prompt describing the changes to be made. The model relies on your existing images to generate a result that preserves their structure, style, or content, while applying your textual instructions.
| Model | Additional Features |
|---|---|
| Nano Banana / edit | Multiple reference images, same ratios as text-to-image |
| Flux 2 Klein 4B / edit | Multiple reference images |
| GPT Image 1.5 / edit | Multiple reference images, optional mask (inpainting: targeted retouching of an image area), transparent background, adjustable quality |
Use Cases:
- Asset Reskinning: Take an existing character and apply a new style (e.g., transform a hero into a winter version or a pixel art version) without starting from scratch.
- Targeted Inpainting (retouching a specific area of an image without touching the rest) (GPT Image 1.5 only): Paint a mask (a gray/black area drawn on the image to tell the AI what to modify) over a specific area of the image (e.g., a character's face, the background of a scene) and ask the AI to regenerate only that area according to a prompt, leaving the rest of the image intact.
- Style Fusion: Provide multiple reference source images (e.g., an illustration with a graphic style + a photo) to create a result that combines both aesthetics.
- Series Consistency: Generate several variants of the same character (different poses, expressions, outfits) while keeping the character's reference images to maintain visual consistency.
Common Parameters
- Prompt (the textual instruction you give to the AI): Description of the desired image.
- Ratio / Size: Output format of the image (e.g., 16:9 for landscape, 9:16 for mobile portrait, 1:1 for square).
- Number of Images: Batch generation (up to 4).
- Output Format: PNG (max quality, transparent background possible), JPG (lightweight, good for photos), or WebP (lightweight, supports transparency).
- Reference Images: In editing mode, you can provide several source images for the AI to draw inspiration from.
- Mask Image: Region drawn on the image to define the zone to be modified (inpainting).
- Advanced Mode: Free entry of the exact model name and JSON (structured data format) parameters for full control.
Video Tab
The Video tab offers 6 generation modes to cover all AI video creation cases, from quick prototyping to enriched content production.
Text to video Mode
How does it work? You describe a scene in a prompt (the text instruction you give to the AI: action, atmosphere, style, camera movement…) and the model generates an entirely synthetic video from scratch. The negative prompt (what you do NOT want to see in the generation) lets you exclude undesirable elements (e.g., "no text, no blur").
| Model | Durations | Options |
|---|---|---|
| LTX-2 Fast | 6 to 20 seconds | Optional audio, 1080p/1440p/2160p, 25/50 FPS (frames per second: 25 = cinema, 50 = fluid video), negative prompt |
| Veo 3.1 Fast (Google) | 4s, 6s, 8s | Optional audio, 9:16 / 16:9, 720p/1080p, negative prompt |
| Kling Video v2.6 Pro | 5s, 10s | Optional audio, 16:9 / 9:16 / 1:1, negative prompt |
| Wan v2.6 🇨🇳 | 5s, 10s, 15s | Reference audio, 5 ratios, 720p/1080p, negative prompt |
Use Cases:
- Game Cinematics: Generate intro or transition sequences for narrative games without pre-existing video resources.
- Vertical Content (9:16): Create micro-animations for mobile experiences or short formats (social media, ads).
- Training Materials: Dynamically illustrate abstract concepts (e.g., a video on the water cycle for a pedagogical module).
- Atmosphere Prototyping: Quickly test different visual atmospheres (style, colors, pace) before validating the direction of a project.
Image to video Mode
How does it work? You provide a still image and describe the movement or evolution you want to give it. The model "animates" your image while respecting its composition and content. This is ideal for bringing existing graphic assets to life.
| Model | Durations | Options |
|---|---|---|
| LTX-2 Fast | 6 to 20 seconds | Optional audio, reference image, 25/50 FPS |
| Veo 3.1 Fast | 4s, 6s, 8s | Optional audio, ref. image, negative prompt |
| Wan v2.6 | 5s, 10s, 15s | Reference image, 720p/1080p |
Use Cases:
- Illustration Animation: Bring an illustrated game character to life (e.g., make a dragon's wings flap, make a hero's hair float in the wind).
- Animated Backgrounds: Transform a static background scene (forest, futuristic city) into a moving video background to enrich immersion.
- Animated Portraits: Add slight animation to a character portrait for dialogue screens or menus.
- Transition Between Scenes: Start from a screenshot of the end of a scene to generate a fluid transition to the next scene.
Start and end frame Mode
How does it work? You provide two images: the first frame (start) and the last frame (end) of the video (a frame is a still image, a single "photo" from a video). The model automatically generates all intermediate frames to create a fluid transition between the two. You control the starting and ending points and let the AI invent the path.
| Model | Options |
|---|---|
| Veo 3.1 Fast | Start + end image, 4s/6s/8s, optional audio, 720p/1080p |
| Kling Video O1 Standard | Start + end image, 5s/10s |
| Kling Video O1 | Start + end image, 5s/10s |
| Wan FLF2V | Start + end image, 5s, 480p/720p, negative prompt |
Use Cases:
- Character Morphing: Show the transformation of a character (e.g., a frog becoming a prince) by defining the initial and final states.
- Storyboard Transitions: In the pre-production phase, generate intermediate content between two key frames of a storyboard.
- Map Zooms: Start from a global view (frame 1) and arrive at a specific point on the map (frame 2) with an automatically generated zoom.
- Reveal Effects: Switch from an empty screen (start) to a full scene (end) for dramatic intros.
Image and Audio to Video Mode (AI Avatar)
How does it work? You provide a portrait image (photo or illustration of a face) and an audio file (voice, dialogue, song). The model synchronizes lip movements and facial expressions in the image with the audio to create a video of a character speaking or singing realistically.
| Model | Usage |
|---|---|
| Kling Video AI Avatar v2 | Portrait image + audio → speaking avatar video |
| ByteDance OmniHuman v1.5 | Image + audio, 720p/1080p |
Use Cases:
- NPC Video: Create non-player characters that speak directly to the player in an ultra-immersive way, from a simple illustration + a TTS audio file generated in the Audio tab.
- AI Presenters: Generate a presenter avatar for training modules or onboarding videos without needing a real camera.
- Video Localization: Use translated audio to make the same avatar "speak" in several languages, with automatic lip-syncing.
- Dubbing Existing Assets: Take an existing character illustration and make it "speak" by synchronizing recorded or TTS-generated dialogue audio.
Video to Video Mode (Extension / Retake)
How does it work? You start from an existing video that you want to modify or extend. The model analyzes your source video and generates either a natural continuation (extension) or an alternative version from a defined starting point in the video (retake).
| Model | Usage |
|---|---|
| Veo 3.1 Fast / extend | Extension of existing video by +7 seconds |
| LTX-2 / retake | Regeneration from a starting point in the video (startTime) |
Use Cases:
- Cinematic Extension: Lengthen a video that is too short by naturally extending the action or animation already underway.
- Video End Correction: If the last part of a video is not suitable, define a cut-off point and regenerate an alternative ending.
- Loop Creation: Generate a continuation of a video that can be seamlessly chained with the beginning to create looping animations.
Motion Control
How does it work? You combine a character image and a motion reference video (video of a real person moving). The model applies the movements from the reference video to your character from the image, creating an animated video where your character reproduces the captured gestures exactly.
| Model | Options |
|---|---|
| Kling Video v2.6 / motion-control | Image + motion video, character orientation (image or video), 5s/10s durations |
Use Cases:
- Character Animations: Make a game character dance, run, or gesture by applying real motion captures.
- Action Scenes: Use reference videos of martial arts or sports to animate character fights.
- Educational Characters: Have a trainer avatar perform didactic gestures (pointing, showing, greeting) in an e-learning module.
Common Video Parameters
- Prompt: Textual description of the video.
- Negative prompt: Elements to exclude from the generation.
- Duration: Variable according to the model.
- Aspect ratio: 16:9, 9:16, 4:3, 1:1...
- Resolution: 480p to 2160p according to the model.
- FPS: 25 or 50 (LTX-2 models).
- Audio: Optional, mandatory, or by reference file.
- Start / end image(s): To guide video generation.
- Advanced Mode: Custom JSON parameters.
Audio Tab
The Audio tab offers 4 sound generation modes based on ElevenLabs to cover all needs: voice, musical atmosphere, effects, and dialogues.
Text to Speech Mode (TTS)
How does it work? You enter text in the prompt and select a voice from the 20 available. The model generates an audio file of the chosen voice reading the text, with fine control over stability, similarity, and speaking speed.
| Model | Parameters |
|---|---|
| ElevenLabs TTS Turbo v2.5 | 20 voices (Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill), stability, similarity boost, speed (0.7–1.2×), language code |
Use Cases:
- Scene Narration: Generate the voiceover that accompanies transitions or key moments in a narrative game.
- NPC Dialogues: Produce audio dialogue lines for each non-player character, with a different voice per character.
- Accessibility: Generate audio versions of texts to make a training session or experience accessible to visually impaired people.
- Multilingual Dubbing: Couple with translated text generation to produce NPC audio in multiple languages with the same voice.
Music
How does it work? You describe in text the style, atmosphere, and emotion of the music you want (e.g., "epic action music for a fight scene, orchestral, crescendo tension"). The model generates an original musical track matching your description.
| Model | Parameters |
|---|---|
| ElevenLabs Music | Prompt text describing the desired music |
Use Cases:
- Game Soundtracks: Create unique background music for each zone, scene, or player emotional state.
- Narrative Music: Accompany reading or narration sequences with background music tailored to the story's tone.
- Training Ambiance: Generate light, non-distracting background music for e-learning modules.
- Audio Prototyping: Quickly test several musical directions for a project before commissioning a final composition.
Sound Effects
How does it work? You describe the desired sound effect in text (e.g., "distant thunder sound approaching", "mechanical button click"). You control the duration (from 0.5 to 22 seconds) and the prompt influence (from 0 = very free to 1 = very faithful). The loop option lets you create seamless looping effects.
| Model | Parameters |
|---|---|
| ElevenLabs Sound Effects v2 | Duration (0.5–22 seconds), prompt influence (0–1), loopable |
Use Cases:
- Interaction Effects: Generate tailor-made feedback sounds (button click, validation, error, alert) for your experience's interfaces.
- Looping Audio Environments: Create background sounds (rain, forest, crowd, sea) to be activated in a loop to immerse the player in an environment.
- Action Effects: Quickly generate all the contextual sounds of a game scene: hits, explosions, magic, creaking doors.
- Notification Sounds: Produce custom alert, success, or victory sounds for player reward moments.
Multi-voice Dialogue
How does it work? You use a specialized Rich Text editor to compose a conversation between several characters. Each text segment can be assigned a different voice from the 20 available. The model generates a continuous audio file that chains all the lines with the corresponding voices, faithfully reproducing the interaction.
| Model | Parameters |
|---|---|
| ElevenLabs Text-to-Dialogue v3 | Multi-voice rich editor, 20 voices assignable per line, stability, speaker boost, language code |
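Conceptually, a multi-voice dialogue is an ordered list of (voice, text) segments. The sketch below is a hypothetical illustration of that data structure, not Celestory's or ElevenLabs' actual payload format; only the voice names (Sarah, George) come from the voice list documented above.

```python
# Hypothetical model of a multi-voice dialogue: each line is a segment
# carrying the voice to use and the text to speak, in playback order.

dialogue = [
    {"voice": "Sarah", "text": "Welcome, traveler. What brings you here?"},
    {"voice": "George", "text": "I seek the old map of the northern woods."},
    {"voice": "Sarah", "text": "Then follow me, and stay close."},
]

def script_preview(segments):
    """Render the dialogue as a readable script before generating audio."""
    return "\n".join(f"{s['voice']}: {s['text']}" for s in segments)

print(script_preview(dialogue))
```

Structuring lines this way is what lets the generator chain all the segments into a single continuous audio file with the right voice per line.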
Use Cases:
- Ready-to-use Dialogue Scenes: Generate a complete conversation between two NPCs without editing audio files separately; the result is a single chained audio file.
- Audio Cutscenes: Produce dramatic audio sequences with multiple characters for narrative-heavy games, without real dubbing actors.
- Podcasts / Microlearning: Create educational dialogues between a fictional trainer and learner to make training modules more engaging.
- Multilingual Dialogues: With the language code parameter, force each voice to speak the target language for localized productions.
Previous Generations Tab
This tab gives access to the complete history of all your AI generations (text, image, video, sound). You can:
- Consult past results
- Filter by type (text, image, video, audio) and source (editor or in-game)
- Paginate the history to navigate through older generations
- Find and reuse outputs already generated
Best Practices: Competency Prompts and Monitoring
Constraining the Model with Competency Prompts
To get the best results, it is essential to constrain the AI model by providing precise instructions and best practices. This is called a competency prompt: a reference document (style guide, methodology, business rules) that you attach to your requests to guide generation.
Concrete Examples:
- Vibe Coding (HTML5 Block): When you use AI generation to create interfaces via the HTML5 block, attach your HTML/CSS style guide to the prompt. The model will then respect your colors, typographies, and components.
- Markdown Document Generation: When you create text documents via the MD preview of the Text menu, attach a Markdown documentation methodology (type structure, titling conventions, glossary) so that the AI produces documents compliant with your standards.
Thanks to the system of text variables attachable to the chat, you can store your style guides and methodologies as project variables and inject them with one click into each AI conversation.
Major Use Case: Vibe Coding with the HTML5 Block
One of the most powerful use cases of Celestory is vibe coding:
1. Open the AI chat in the Text tab.
2. Attach your HTML/CSS style guide as a text variable.
3. Describe the desired interface, animation, or component in natural language.
4. Copy the generated code into an HTML5 block of your graph.
5. Instantly view the result in the nocode engine.
6. Iterate through AI conversations until the desired result is achieved, without manually writing a single line of code.
Token Monitoring by Model
Celestory displays the token counter per model in real time:
- History Tokens: how many tokens your conversation has already consumed vs. the model's context window.
- Input Tokens: how many tokens your next message will add (including attached variables).
- Generation Quota: total number of generations used out of your monthly quota.
Monitor these metrics to:
- Optimize your prompts and reduce consumption.
- Choose the right model for the quality/cost ratio of your use case.
- Anticipate quota overages before a production launch.
Advanced Mode
For each generation type (image, video, audio), an advanced mode allows for manual entry of:
- The exact model name (to access models not listed).
- Parameters in JSON format for full control over the request sent to the API.
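To illustrate what an advanced-mode request might contain, here is a sketch built from the parameters listed in this guide. The key names below are hypothetical placeholders, not Celestory's documented API schema; use the exact model name and parameter names expected by your target model.

```python
# Hypothetical advanced-mode parameters assembled as a dict, then serialized
# to JSON as the advanced mode expects. Key names are illustrative only.
import json

advanced_params = {
    "prompt": "A watercolor forest at dawn, soft volumetric light",
    "negative_prompt": "text, watermark, blur",
    "aspect_ratio": "16:9",
    "num_images": 2,
    "output_format": "png",
}

payload = json.dumps(advanced_params, indent=2)
print(payload)
```

Keeping such payloads as project text variables makes them easy to reuse and tweak between generations.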
Updated on: 04/03/2026
