AI Generation — Creator Side
Celestory combines the power of nocode (creating applications without writing code) and vibe coding (describing what you want to create in plain language and letting the AI generate the code) to quickly test, compare, and iterate on the best Artificial Intelligence models on the market. You can integrate and evaluate a wide range of models, including sovereign models (developed in Europe or open source, independent of US tech giants) such as those from Mistral or Flux, directly from the Cloud interface, then download them locally once validated for production execution.
AI Generation Menu
The AI Generation menu is accessible from the editor and is structured into 5 tabs:
| Tab | Description |
|---|---|
| Text | Conversational chat with LLMs (Large Language Models: AI models capable of understanding and generating text) for text generation |
| Image | Image generation and editing via prompts |
| Video | Creation of videos from text, images, or existing videos |
| Audio | Speech synthesis, music, sound effects, and dialogues |
| Previous generations | Complete history of all your past generations |
At the top of each tab, a quota counter displays your AI credit consumption in real time (e.g., 45 / 200 generations), with a button to add AI credits.
Text Tab
The Text tab offers an integrated conversational chat to interact directly with language models (LLMs, or Large Language Models: AI programs trained to understand and generate text, like very advanced virtual assistants). This is the heart of textual AI generation in Celestory: you can co-write content, generate code, write narrative scripts, or query your project data naturally.
How does it work?
You enter a message in the chat field. The model receives:
- The history of your previous exchanges (to maintain context)
- Optionally text variables from your project attached to the message
- Your current prompt (the instruction or question you send to the AI)
It generates a response that displays progressively in the chat, rendered in real time in Markdown (a lightweight text formatting syntax: # Title, **bold**, - list, displayed cleanly as rich text).
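As a mental model, the request assembly described above can be sketched as follows. This is a hypothetical illustration of the general chat-API pattern, not Celestory's actual internal code; the function and field names are invented for the example.

```python
# Hypothetical sketch: a chat request typically combines (1) conversation
# history, (2) attached project text variables, and (3) the current prompt.

def build_request(history, attached_variables, prompt):
    """Combine history, project text variables, and the current prompt
    into a single ordered message list for the model."""
    messages = list(history)  # previous exchanges keep the context
    for name, content in attached_variables.items():
        # attached text variables are injected as extra context
        messages.append({"role": "user",
                         "content": f"[variable {name}]\n{content}"})
    messages.append({"role": "user", "content": prompt})  # current instruction
    return messages

request = build_request(
    history=[{"role": "user", "content": "Hello"},
             {"role": "assistant", "content": "Hi! How can I help?"}],
    attached_variables={"style_guide": "Use a friendly tone."},
    prompt="Write a short NPC greeting.",
)
print(len(request))  # 4 messages: 2 history + 1 variable + 1 prompt
```

The key point is the ordering: history first (context), then attached variables, then the new instruction last.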
Chat Interface
- Left panel: List of your previous conversations, automatically named. You can resume an existing conversation or create a new one.
- Right panel: Chat area with model selection, token counter, message history, and input field.
Key Features
- Model Selection: Choose from available models, grouped by provider.
- Token Counter: The interface displays in real time the number of tokens (units of text that the AI processes; one token ≈ ¾ of a word in English: "hello" = 1 token, a 10-word sentence ≈ 13 tokens) used by the conversation history compared to the context window (the maximum amount of text the model can "remember" in a single conversation) of the selected model (e.g., 12,530 / 256,000 tokens).
- Input Estimation: Before sending a message, the number of additional tokens your input will consume is displayed (e.g., + 842 tokens).
- Text Variable Attachment: You can attach text variables (text content stored in your Celestory project, such as a character, a game rule, or a style guide) from your project to the prompt. Each attached variable displays its token weight. This allows you to dynamically inject project content into the AI request.
- Markdown Preview: AI responses are rendered in Markdown (text formatting syntax: # titles, **bold words**, and - lists display like in a Word document) in real time, ideal for visually validating structured documents.
- History Management: Deletion of individual messages, reloading of saved conversations.
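The ≈ ¾-word rule of thumb above can be turned into a quick budgeting helper. This is only a sketch of the heuristic, not Celestory's actual tokenizer; real models use subword tokenizers whose counts differ.

```python
# Rough token estimate from the "1 token ≈ 3/4 of an English word" rule of
# thumb. Real tokenizers (BPE and similar) count differently; this is only
# a heuristic for budgeting against a model's context window.

def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)  # about 4 tokens for every 3 words

def fits_in_context(history_tokens: int, new_text: str,
                    context_window: int = 256_000) -> bool:
    """Check whether the next message still fits in the model's memory."""
    return history_tokens + estimate_tokens(new_text) <= context_window

print(estimate_tokens("hello"))  # 1 token, as in the example above
print(estimate_tokens("The quick brown fox jumps over the lazy sleeping dog"))  # 10 words -> 13
```

Comparing this estimate against the counter shown in the interface tells you how much headroom a conversation has left before the model starts losing context.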
Use Cases
| Use Case | Description |
|---|---|
| Vibe Coding (HTML5 Block) | Attach your HTML/CSS style guide as a variable, describe the desired interface, then copy the code into an HTML5 block in your graph. Iterate in conversation until the result is achieved. |
| Narrative Script Generation | Create dialogues for NPCs (Non-Player Characters: characters controlled by the game, not the player), scene descriptions, or narration texts directly from the chat, then save them as text variables. |
| MD Documentation | Generate structured Markdown documents (guides, sheets, game rules) by attaching your methodology as a variable. The MD preview instantly validates the rendering. |
| Project Data Analysis | Attach a variable containing your JSON or text data (e.g., level list, stats), then ask the AI questions to extract insights. |
| Content Translation | Pass your game texts (variables) to the model to translate them into several languages, conversation by conversation. |
| Model Comparison | Ask the same question to Mistral, Gemini, and GPT-5 Mini in three separate conversations to compare responses and choose the most suitable model. |
Available Text Models
| Provider | Model | Context Window (max conversation memory) |
|---|---|---|
| Mistral 🇫🇷 | | 256,000 tokens |
| Mistral 🇫🇷 | | 256,000 tokens |
| | | 1,000,000 tokens |
| OpenAI | | 400,000 tokens |
| Anthropic | | 200,000 tokens |
| Meta | | 1,000,000 tokens |
| xAI | | 2,000,000 tokens |
Mistral models (sovereign French models) and Llama (Meta, open source) can be downloaded after validation for local execution.
Image Tab
The Image tab allows for the generation and editing of images through textual descriptions, with two modes of operation.
Text to image Mode
How does it work? You describe the image you want to create in the prompt field. The model generates an entirely new image from scratch, without a source image. The more precise the prompt (style, colors, composition, atmosphere), the better the result will match your vision.
| Model | Ratios/Sizes | Features |
|---|---|---|
| Nano Banana | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9 | Up to 4 images, PNG/JPG/WebP formats |
| Flux 2 Klein 4B | square, square_hd, portrait_3_4, landscape_4_3... | PNG/JPG/WebP formats |
| GPT Image 1.5 | auto, 1024×1024, 1024×1536, 1536×1024 | Transparent/opaque background, low/medium/high quality, low/high input fidelity, up to 4 images |
Use Cases:
- Game Assets: Generate background scenes, characters, objects, or interface icons tailored to your project's universe, without needing an illustrator.
- Quick Variants: Produce 4 versions of the same asset in one click to choose the best result before integration.
- Narrative Illustrations: Create visuals to punctuate key moments in an interactive story or training module.
- Moodboards / Visual Prototyping: Quickly test different artistic directions before validating the graphic direction of a project.
- Images with Transparent Backgrounds (GPT Image 1.5): Generate interface elements (sprites, stickers, badges) with transparent backgrounds, ready to integrate directly into Celestory.
Image to image Mode (Image Editing)
How does it work? You provide one or more reference images and a prompt describing the changes to be made. The model relies on your existing images to generate a result that preserves their structure, style, or content, while applying your textual instructions.
| Model | Additional Features |
|---|---|
| Nano Banana / edit | Multiple reference images, same ratios as text-to-image |
| Flux 2 Klein 4B / edit | Multiple reference images |
| GPT Image 1.5 / edit | Multiple reference images, optional mask (inpainting: targeted retouching of an image area), transparent background, adjustable quality |
Use Cases:
- Asset Reskinning: Take an existing character and apply a new style (e.g., transform a hero into a winter version or a pixel art version) without starting from scratch.
- Targeted Inpainting (retouching a specific area of an image without touching the rest) (GPT Image 1.5 only): Paint a mask (a gray/black area drawn on the image to tell the AI what to modify) over a specific area of the image (e.g., a character's face, the background of a scene) and ask the AI to regenerate only that area according to a prompt, leaving the rest of the image intact.
- Style Fusion: Provide multiple reference source images (e.g., an illustration with a graphic style + a photo) to create a result that combines both aesthetics.
- Series Consistency: Generate several variants of the same character (different poses, expressions, outfits) while keeping the character's reference images to maintain visual consistency.
Common Parameters
- Prompt (the textual instruction you give to the AI): Description of the desired image.
- Ratio / Size: Output format of the image (e.g., 16:9 for landscape, 9:16 for mobile portrait, 1:1 for square).
- Number of Images: Batch generation (up to 4).
- Output Format: PNG (max quality, transparent background possible), JPG (lightweight, good for photos), or WebP (lightweight, supports transparency).
- Reference Images: In editing mode, you can provide several source images for the AI to draw inspiration from.
- Mask Image: Region drawn on the image to define the zone to be modified (inpainting).
- Advanced Mode: Free entry of the exact model name and JSON (structured data format) parameters for full control.
Video Tab
The Video tab offers 6 generation modes to cover all AI video creation cases, from quick prototyping to enriched content production.
Text to video Mode
How does it work? You describe a scene in a prompt (the text instruction you give to the AI: action, atmosphere, style, camera movement…) and the model generates an entirely synthetic video from scratch. The negative prompt (what you do NOT want to see in the generation) lets you exclude undesirable elements (e.g., "no text, no blur").
| Model | Durations | Options |
|---|---|---|
| LTX-2 Fast | 6 to 20 seconds | Optional audio, 1080p/1440p/2160p, 25/50 FPS (frames per second: 25 = cinema, 50 = fluid video), negative prompt |
| Veo 3.1 Fast (Google) | 4s, 6s, 8s | Optional audio, 9:16 / 16:9, 720p/1080p, negative prompt |
| Kling Video v2.6 Pro | 5s, 10s | Optional audio, 16:9 / 9:16 / 1:1, negative prompt |
| Wan v2.6 🇨🇳 | 5s, 10s, 15s | Reference audio, 5 ratios, 720p/1080p, negative prompt |
Use Cases:
- Game Cinematics: Generate intro or transition sequences for narrative games without pre-existing video resources.
- Vertical Content (9:16): Create micro-animations for mobile experiences or short formats (social media, ads).
- Training Materials: Dynamically illustrate abstract concepts (e.g., a video on the water cycle for a pedagogical module).
- Atmosphere Prototyping: Quickly test different visual atmospheres (style, colors, pace) before validating the direction of a project.
Image to video Mode
How does it work? You provide a still image and describe the movement or evolution you want to give it. The model "animates" your image while respecting its composition and content. This is ideal for bringing existing graphic assets to life.
| Model | Durations | Options |
|---|---|---|
| LTX-2 Fast | 6 to 20 seconds | Optional audio, reference image, 25/50 FPS |
| Veo 3.1 Fast | 4s, 6s, 8s | Optional audio, ref. image, negative prompt |
| Wan v2.6 | 5s, 10s, 15s | Reference image, 720p/1080p |
Use Cases:
- Illustration Animation: Bring an illustrated game character to life (e.g., make a dragon's wings flap, make a hero's hair float in the wind).
- Animated Backgrounds: Transform a static background scene (forest, futuristic city) into a moving video background to enrich immersion.
- Animated Portraits: Add slight animation to a character portrait for dialogue screens or menus.
- Transition Between Scenes: Start from a screenshot of the end of a scene to generate a fluid transition to the next scene.
Start and end frame Mode
How does it work? You provide two images: the first frame (start) and the last frame (end) of the video (a frame is a still image, a single "photo" from a video). The model automatically generates all intermediate frames to create a fluid transition between the two. You control the starting and ending points and let the AI invent the path.
| Model | Options |
|---|---|
| Veo 3.1 Fast | Start + end image, 4s/6s/8s, optional audio, 720p/1080p |
| Kling Video O1 Standard | Start + end image, 5s/10s |
| Kling Video O1 | Start + end image, 5s/10s |
| Wan FLF2V | Start + end image, 5s, 480p/720p, negative prompt |
Use Cases:
- Character Morphing: Show the transformation of a character (e.g., a frog becoming a prince) by defining the initial and final states.
- Storyboard Transitions: In the pre-production phase, generate intermediate content between two key frames of a storyboard.
- Map Zooms: Start from a global view (frame 1) and arrive at a specific point on the map (frame 2) with an automatically generated zoom.
- Reveal Effects: Switch from an empty screen (start) to a full scene (end) for dramatic intros.
Image and Audio to Video Mode (AI Avatar)
How does it work? You provide a portrait image (photo or illustration of a face) and an audio file (voice, dialogue, song). The model synchronizes lip movements and facial expressions in the image with the audio to create a video of a character speaking or singing realistically.
| Model | Usage |
|---|---|
| Kling Video AI Avatar v2 | Portrait image + audio → speaking avatar video |
| ByteDance OmniHuman v1.5 | Image + audio, 720p/1080p |
Use Cases:
- NPC Video: Create non-player characters that speak directly to the player in an ultra-immersive way, from a simple illustration + a TTS audio file generated in the Audio tab.
- AI Presenters: Generate a presenter avatar for training modules or onboarding videos without needing a real camera.
- Video Localization: Use translated audio to make the same avatar "speak" in several languages, with automatic lip-syncing.
- Dubbing Existing Assets: Take an existing character illustration and make it "speak" by synchronizing recorded or TTS-generated dialogue audio.
Video to Video Mode (Extension / Retake)
How does it work? You start from an existing video that you want to modify or extend. The model analyzes your source video and generates either a natural continuation (extension) or an alternative version from a defined starting point in the video (retake).
| Model | Usage |
|---|---|
| Veo 3.1 Fast / extend | Extension of existing video by +7 seconds |
| LTX-2 / retake | Regeneration from a starting point in the video (startTime) |
Use Cases:
- Cinematic Extension: Lengthen a video that is too short by naturally extending the action or animation already underway.
- Video End Correction: If the last part of a video is not suitable, define a cut-off point and regenerate an alternative ending.
- Loop Creation: Generate a continuation of a video that can be seamlessly chained with the beginning to create looping animations.
Motion Control
How does it work? You combine a character image and a motion reference video (video of a real person moving). The model applies the movements from the reference video to your character from the image, creating an animated video where your character reproduces the captured gestures exactly.
| Model | Options |
|---|---|
| Kling Video v2.6 / motion-control | Image + motion video, character orientation (image or video), 5s/10s durations |
Use Cases:
- Character Animations: Make a game character dance, run, or gesture by applying real motion captures.
- Action Scenes: Use reference videos of martial arts or sports to animate character fights.
- Educational Characters: Have a trainer avatar perform didactic gestures (pointing, showing, greeting) in an e-learning module.
Common Video Parameters
- Prompt: Textual description of the video.
- Negative prompt: Elements to exclude from the generation.
- Duration: Variable according to the model.
- Aspect ratio: 16:9, 9:16, 4:3, 1:1...
- Resolution: 480p to 2160p according to the model.
- FPS: 25 or 50 (LTX-2 models).
- Audio: Optional, mandatory, or by reference file.
- Start / end image(s): To guide video generation.
- Advanced Mode: Custom JSON parameters.
Audio Tab
The Audio tab offers 4 sound generation modes based on ElevenLabs to cover all needs: voice, musical atmosphere, effects, and dialogues.
Text to Speech Mode (TTS)
How does it work? You enter text in the prompt and select a voice from the 20 available. The model generates an audio file of the chosen voice reading the text, with fine control over stability, similarity, and speaking speed.
| Model | Parameters |
|---|---|
| ElevenLabs TTS Turbo v2.5 | 20 voices (Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill), stability, similarity boost, speed (0.7–1.2×), language code |
Use Cases:
- Scene Narration: Generate the voiceover that accompanies transitions or key moments in a narrative game.
- NPC Dialogues: Produce audio dialogue lines for each non-player character, with a different voice per character.
- Accessibility: Generate audio versions of texts to make a training session or experience accessible to visually impaired people.
- Multilingual Dubbing: Couple with translated text generation to produce NPC audio in multiple languages with the same voice.
Music
How does it work? You describe in text the style, atmosphere, and emotion of the music you want (e.g., "epic action music for a fight scene, orchestral, crescendo tension"). The model generates an original musical track matching your description.
| Model | Parameters |
|---|---|
| ElevenLabs Music | Prompt text describing the desired music |
Use Cases:
- Game Soundtracks: Create unique background music for each zone, scene, or player emotional state.
- Narrative Music: Accompany reading or narration sequences with background music tailored to the story's tone.
- Training Ambiance: Generate light, non-distracting background music for e-learning modules.
- Audio Prototyping: Quickly test several musical directions for a project before commissioning a final composition.
Sound Effects
How does it work? You describe the desired sound effect in text (e.g., "distant thunder sound approaching", "mechanical button click"). You control the duration (from 0.5 to 22 seconds) and the prompt influence (from 0 = very free to 1 = very faithful). The loop option lets you create seamless looping effects.
| Model | Parameters |
|---|---|
| ElevenLabs Sound Effects v2 | Duration (0.5–22 seconds), prompt influence (0–1), loopable |
Use Cases:
- Interaction Effects: Generate tailor-made feedback sounds (button click, validation, error, alert) for your experience's interfaces.
- Looping Audio Environments: Create background sounds (rain, forest, crowd, sea) to be activated in a loop to immerse the player in an environment.
- Action Effects: Quickly generate all the contextual sounds of a game scene: hits, explosions, magic, creaking doors.
- Notification Sounds: Produce custom alert, success, or victory sounds for player reward moments.
Multi-voice Dialogue
How does it work? You use a specialized Rich Text editor to compose a conversation between several characters. Each text segment can be assigned a different voice from the 20 available. The model generates a continuous audio file that chains all the lines with the corresponding voices, faithfully reproducing the interaction.
| Model | Parameters |
|---|---|
| ElevenLabs Text-to-Dialogue v3 | Multi-voice rich editor, 20 voices assignable per line, stability, speaker boost, language code |
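Conceptually, a multi-voice dialogue is an ordered list of (voice, text) segments. The sketch below is a hypothetical illustration of that data structure, not Celestory's or ElevenLabs' actual payload format; only the voice names (Sarah, George) come from the voice list documented above.

```python
# Hypothetical model of a multi-voice dialogue: each line is a segment
# carrying the voice to use and the text to speak, in playback order.

dialogue = [
    {"voice": "Sarah", "text": "Welcome, traveler. What brings you here?"},
    {"voice": "George", "text": "I seek the old map of the northern woods."},
    {"voice": "Sarah", "text": "Then follow me, and stay close."},
]

def script_preview(segments):
    """Render the dialogue as a readable script before generating audio."""
    return "\n".join(f"{s['voice']}: {s['text']}" for s in segments)

print(script_preview(dialogue))
```

Structuring lines this way is what lets the generator chain all the segments into a single continuous audio file with the right voice per line.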
Use Cases:
- Ready-to-use Dialogue Scenes: Generate a complete conversation between two NPCs without editing audio files separately; the result is a single chained audio file.
- Audio Cutscenes: Produce dramatic audio sequences with multiple characters for narrative-heavy games, without real dubbing actors.
- Podcasts / Microlearning: Create educational dialogues between a fictional trainer and learner to make training modules more engaging.
- Multilingual Dialogues: With the language code parameter, force each voice to speak the target language for localized productions.
Previous Generations Tab
This tab gives access to the complete history of all your AI generations (text, image, video, sound). You can:
- Consult past results
- Filter by type (text, image, video, audio) and source (editor or in-game)
- Paginate the history to navigate through older generations
- Find and reuse outputs already generated
Best Practices: Competency Prompts and Monitoring
Constraining the Model with Competency Prompts
To get the best results, it is essential to constrain the AI model by providing precise instructions and best practices. This is called a competency prompt: a reference document (style guide, methodology, business rules) that you attach to your requests to guide generation.
Concrete Examples:
- Vibe Coding (HTML5 Block): When you use AI generation to create interfaces via the HTML5 block, attach your HTML/CSS style guide to the prompt. The model will then respect your colors, typographies, and components.
- Markdown Document Generation: When you create text documents via the MD preview of the Text menu, attach a Markdown documentation methodology (type structure, titling conventions, glossary) so that the AI produces documents compliant with your standards.
Thanks to the system of text variables attachable to the chat, you can store your style guides and methodologies as project variables and inject them with one click into each AI conversation.
Major Use Case: Vibe Coding with the HTML5 Block
One of the most powerful use cases of Celestory is vibe coding:
1. Open the AI chat in the Text tab.
2. Attach your HTML/CSS style guide as a text variable.
3. Describe the desired interface, animation, or component in natural language.
4. Copy the generated code into an HTML5 block of your graph.
5. Instantly view the result in the nocode engine.
6. Iterate through AI conversations until the desired result is achieved, without manually writing a single line of code.
Token Monitoring by Model
Celestory displays the token counter per model in real time:
- History Tokens: how many tokens your conversation has already consumed vs. the model's context window.
- Input Tokens: how many tokens your next message will add (including attached variables).
- Generation Quota: total number of generations used out of your monthly quota.
Monitor these metrics to:
- Optimize your prompts and reduce consumption.
- Choose the right model for the quality/cost ratio of your use case.
- Anticipate quota overages before a production launch.
Advanced Mode
For each generation type (image, video, audio), an advanced mode allows for manual entry of:
- The exact model name (to access models not listed).
- Parameters in JSON format for full control over the request sent to the API.
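To illustrate what an advanced-mode request might contain, here is a sketch built from the parameters listed in this guide. The key names below are hypothetical placeholders, not Celestory's documented API schema; use the exact model name and parameter names expected by your target model.

```python
# Hypothetical advanced-mode parameters assembled as a dict, then serialized
# to JSON as the advanced mode expects. Key names are illustrative only.
import json

advanced_params = {
    "prompt": "A watercolor forest at dawn, soft volumetric light",
    "negative_prompt": "text, watermark, blur",
    "aspect_ratio": "16:9",
    "num_images": 2,
    "output_format": "png",
}

payload = json.dumps(advanced_params, indent=2)
print(payload)
```

Keeping such payloads as project text variables makes them easy to reuse and tweak between generations.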
Updated on: 04/03/2026
