
๐Ÿ–Œ๏ธ AI Generation โ€” Creator Side


Celestory combines the power of nocode (creating applications without writing code) and vibe coding (describing what you want to create in plain language and letting the AI generate the code) to let you quickly test, compare, and iterate on the best Artificial Intelligence models on the market. You can integrate and evaluate a wide range of models — including sovereign models (developed in Europe or open source, independent of US tech giants) like those from Mistral or Flux — directly from the Cloud interface, then, once validated, download them for local execution in production.



AI Generation Menu


The AI Generation menu is accessible from the editor and is structured into 5 tabs:


| Tab | Description |
| --- | --- |
| 📝 Text | Conversational chat with LLMs (Large Language Models: AI models capable of understanding and generating text) for text generation |
| 🖼️ Image | Image generation and editing via prompts |
| 🎬 Video | Creation of videos from text, images, or existing videos |
| 🔊 Audio | Speech synthesis, music, sound effects, and dialogues |
| 📋 Previous generations | Complete history of all your past generations |


At the top of each tab, a quota counter displays your AI credit consumption in real time (e.g., 45 / 200 generations), with a button to add AI credits.



๐Ÿ“ Text Tab


The Text tab offers an integrated conversational chat to interact directly with language models (LLMs — Large Language Models: AI programs trained to understand and generate text, like very advanced virtual assistants). This is the heart of textual AI generation in Celestory: you can co-write content, generate code, write narrative scripts, or query your project data naturally.


How does it work?


You enter a message in the chat field. The model receives:

  1. The history of your previous exchanges (to maintain context)
  2. Optionally, any text variables from your project attached to the message
  3. Your current prompt (the instruction or question you send to the AI)


It generates a response that displays progressively in the chat, rendered in Markdown (a lightweight text formatting format: # Title, **bold**, - list โ†’ displays cleanly as rich text) in real time.
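For example, an illustrative raw response like the one below displays as a formatted heading, bold text, and a bullet list:

```markdown
# Quest Guide
The **Moonlight Quest** unlocks after chapter 2.
- Talk to the innkeeper
- Collect three moonstones
```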


Chat Interface

  • Left panel: List of your previous conversations, automatically named. You can resume an existing conversation or create a new one.
  • Right panel: Chat area with model selection, token counter, message history, and input field.


Key Features

  • Model Selection: Choose from available models, grouped by provider.
  • Token Counter: The interface displays in real time the number of tokens (units of text that the AI processes โ€” one token โ‰ˆ ยพ of a word in English; "hello" = 1 token, a 10-word sentence โ‰ˆ 13 tokens) used by the conversation history compared to the context window (maximum amount of text the model can "remember" in a single conversation) of the selected model (e.g., 12,530 / 256,000 tokens).
  • Input Estimation: Before sending a message, the number of additional tokens your input will consume is displayed (e.g., + 842 tokens).
  • Text Variable Attachment: You can attach text variables (text content stored in your Celestory project, such as a character, a game rule, or a style guide) from your project to the prompt. Each attached variable displays its token weight. This allows you to dynamically inject project content into the AI request.
  • Markdown Preview: AI responses are rendered in Markdown (text formatting format: # titles, **bold words**, and - lists display like in a Word document) in real time, ideal for visually validating structured documents.
  • History Management: Deletion of individual messages, reloading of saved conversations.


Use Cases


| Use Case | Description |
| --- | --- |
| Vibe Coding (HTML5 Block) | Attach your HTML/CSS style guide as a variable, describe the desired interface, then copy the code into an HTML5 block in your graph. Iterate in conversation until the result is achieved. |
| Narrative Script Generation | Create dialogues for NPCs (Non-Player Characters: characters controlled by the game, not the player), scene descriptions, or narration texts directly from the chat, then save them as text variables. |
| MD Documentation | Generate structured Markdown documents (guides, sheets, game rules) by attaching your methodology as a variable. The MD preview instantly validates the rendering. |
| Project Data Analysis | Attach a variable containing your JSON or text data (e.g., level list, stats), then ask the AI questions to extract insights. |
| Content Translation | Pass your game texts (variables) to the model to translate them into several languages, conversation by conversation. |
| Model Comparison | Ask the same question to Mistral, Gemini, and GPT-5 Mini in three separate conversations to compare responses and choose the most suitable model. |


Available Text Models


| Provider | Model | Context Window (max conversation memory) |
| --- | --- | --- |
| Mistral 🇫🇷 | mistral-large-2512 | 256,000 tokens |
| Mistral 🇫🇷 | ministral-3b-2512 | 256,000 tokens |
| Google | gemini-3-flash-preview | 1,000,000 tokens |
| OpenAI | gpt-5-mini | 400,000 tokens |
| Anthropic | claude-haiku-4.5 | 200,000 tokens |
| Meta | llama-4-maverick | 1,000,000 tokens |
| xAI | grok-4.1-fast | 2,000,000 tokens |


Mistral models (sovereign French models) and Llama (Meta, open source) can be downloaded after validation for local execution.



๐Ÿ–ผ๏ธ Image Tab


The Image tab allows for the generation and editing of images through textual descriptions, with two modes of operation.


Text to image Mode


How does it work? You describe the image you want to create in the prompt field. The model generates an entirely new image from scratch, without a source image. The more precise the prompt (style, colors, composition, atmosphere; e.g., « isometric pixel-art tavern interior, warm light, 16:9 »), the closer the result will be to your vision.


| Model | Ratios/Sizes | Features |
| --- | --- | --- |
| Nano Banana | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9 | Up to 4 images, PNG/JPG/WebP formats |
| Flux 2 Klein 4B | square, square_hd, portrait_3_4, landscape_4_3... | PNG/JPG/WebP formats |
| GPT Image 1.5 | auto, 1024×1024, 1024×1536, 1536×1024 | Transparent/opaque background, low/medium/high quality, low/high input fidelity, up to 4 images |


Use Cases:

  • ๐ŸŽจ Game Assets: Generate background scenes, characters, objects, or interface icons tailored to your project's universe, without needing an illustrator.
  • ๐Ÿ–ผ๏ธ Quick Variants: Produce 4 versions of the same asset in one click to choose the best result before integration.
  • ๐Ÿ“– Narrative Illustrations: Create visuals to punctuate key moments in an interactive story or training module.
  • ๐ŸŽญ Moodboards / Visual Prototyping: Quickly test different artistic directions before validating the graphic direction of a project.
  • ๐Ÿท๏ธ Images with Transparent Backgrounds (GPT Image 1.5): Generate interface elements (sprites, stickers, badges) with transparent backgrounds, directly integratable into Celestory.



Image to image Mode (Image Editing)


How does it work? You provide one or more reference images and a prompt describing the changes to be made. The model relies on your existing images to generate a result that preserves their structure, style, or content, while applying your textual instructions.


| Model | Additional Features |
| --- | --- |
| Nano Banana / edit | Multiple reference images, same ratios as text-to-image |
| Flux 2 Klein 4B / edit | Multiple reference images |
| GPT Image 1.5 / edit | Multiple reference images, optional mask (inpainting: targeted retouching of an image area), transparent background, adjustable quality |


Use Cases:

  • ๐Ÿ–Œ๏ธ Asset Reskinning: Take an existing character and apply a new style (e.g., transform a hero into a winter version or a pixel art version) without starting from scratch.
  • โœ๏ธ Targeted Inpainting (retouching a specific area of an image without touching the rest) (GPT Image 1.5 only): Paint a mask (a gray/black area drawn on the image to tell the AI what to modify) over a specific area of the image (e.g., a character's face, the background of a scene) and ask the AI to regenerate only that area according to a prompt, leaving the rest of the image intact.
  • ๐Ÿงฉ Style Fusion: Provide multiple reference source images (e.g., an illustration with a graphic style + a photo) to create a result that combines both aesthetics.
  • ๐Ÿ”„ Series Consistency: Generate several variants of the same character (different poses, expressions, outfits) while keeping the character's reference images to maintain visual consistency.


Common Parameters

  • Prompt (the textual instruction you give to the AI): Description of the desired image.
  • Ratio / Size: Output format of the image (e.g., 16:9 for landscape, 9:16 for mobile portrait, 1:1 for square).
  • Number of Images: Batch generation (up to 4).
  • Output Format: PNG (max quality, transparent background possible), JPG (lightweight, good for photos), or WebP (lightweight, supports transparency).
  • Reference Images: In editing mode, possibility to provide several source images for the AI to draw inspiration from.
  • Mask Image: Zone drawn on the image to delimit the area to be modified (inpainting).
  • Advanced Mode: Free entry of the exact model name and JSON (structured data format) parameters for full control.



๐ŸŽฌ Video Tab


The Video tab offers 6 generation modes to cover all AI video creation cases, from quick prototyping to enriched content production.



Text to video Mode


How does it work? You describe a scene in a prompt (the text instruction you give to the AI: action, atmosphere, style, camera movementโ€ฆ) and the model generates an entirely synthetic video from scratch. The negative prompt (what you do NOT want to see in the generation) allows you to exclude undesirable elements (e.g., ยซ no text, no blur ยป).


| Model | Durations | Options |
| --- | --- | --- |
| LTX-2 Fast | 6 to 20 seconds | Optional audio, 1080p/1440p/2160p, 25/50 FPS (frames per second: 25 = cinema, 50 = fluid video), negative prompt |
| Veo 3.1 Fast (Google) | 4s, 6s, 8s | Optional audio, 9:16 / 16:9, 720p/1080p, negative prompt |
| Kling Video v2.6 Pro | 5s, 10s | Optional audio, 16:9 / 9:16 / 1:1, negative prompt |
| Wan v2.6 🇨🇳 | 5s, 10s, 15s | Reference audio, 5 ratios, 720p/1080p, negative prompt |


Use Cases:

  • ๐ŸŽฌ Game Cinematics: Generate intro or transition sequences for narrative games without pre-existing video resources.
  • ๐Ÿ“ฑ Vertical Content (9:16): Create micro-animations for mobile experiences or short formats (social media, ads).
  • ๐Ÿซ Training Materials: Dynamically illustrate abstract concepts (e.g., a video on the water cycle for a pedagogical module).
  • ๐ŸŒ Atmosphere Prototyping: Quickly test different visual atmospheres (style, colors, pace) before validating the direction of a project.



Image to video Mode


How does it work? You provide a still image and describe the movement or evolution you want to give it. The model "animates" your image while respecting its composition and content. This is ideal for bringing existing graphic assets to life.


| Model | Durations | Options |
| --- | --- | --- |
| LTX-2 Fast | 6 to 20 seconds | Optional audio, reference image, 25/50 FPS |
| Veo 3.1 Fast | 4s, 6s, 8s | Optional audio, ref. image, negative prompt |
| Wan v2.6 | 5s, 10s, 15s | Reference image, 720p/1080p |


Use Cases:

  • โœจ Illustration Animation: Bring an illustrated game character to life (e.g., make a dragon's wings flap, make a hero's hair float in the wind).
  • ๐Ÿž๏ธ Animated Backgrounds: Transform a static background scene (forest, futuristic city) into a moving video background to enrich immersion.
  • ๐Ÿ“ธ Animated Portraits: Add slight animation to a character portrait for dialogue screens or menus.
  • ๐Ÿ–ผ๏ธ Transition Between Scenes: Start from a screenshot of the end of a scene to generate a fluid transition to the next scene.



Start and end frame Mode


How does it work? You provide two images: the first frame (a still image, the equivalent of a single "photo" from the video) and the last frame of the video. The model automatically generates all intermediate frames to create a fluid transition between the two. You control the start and end points, and let the AI invent the path.


| Model | Options |
| --- | --- |
| Veo 3.1 Fast | Start + end image, 4s/6s/8s, optional audio, 720p/1080p |
| Kling Video O1 Standard | Start + end image, 5s/10s |
| Kling Video O1 | Start + end image, 5s/10s |
| Wan FLF2V | Start + end image, 5s, 480p/720p, negative prompt |


Use Cases:

  • ๐Ÿ”„ Character Morphing: Show the transformation of a character (e.g., a frog becoming a prince) by defining the initial and final states.
  • ๐Ÿš€ Storyboard Transitions: In the pre-production phase, generate intermediate content between two key frames of a storyboard.
  • ๐Ÿ—บ๏ธ Map Zooms: Start from a global view (frame 1) and arrive at a specific point on the map (frame 2) with an automatically generated zoom.
  • โšก Reveal Effects: Switch from an empty screen (start) to a full scene (end) for dramatic intros.



Image and Audio to Video Mode (AI Avatar)


How does it work? You provide a portrait image (photo or illustration of a face) and an audio file (voice, dialogue, song). The model synchronizes lip movements and facial expressions in the image with the audio to create a video of a character speaking or singing realistically.


| Model | Usage |
| --- | --- |
| Kling Video AI Avatar v2 | Portrait image + audio → speaking avatar video |
| ByteDance OmniHuman v1.5 | Image + audio, 720p/1080p |


Use Cases:

  • ๐ŸŽ™๏ธ NPC Video: Create non-player characters that speak directly to the player in an ultra-immersive way, from a simple illustration + a TTS audio file generated in the Audio tab.
  • ๐Ÿ“บ AI Presenters: Generate a presenter avatar for training modules or onboarding videos without needing a real camera.
  • ๐ŸŒ Video Localization: Use translated audio to make the same avatar "speak" in several languages, with automatic lip-syncing.
  • ๐ŸŽญ Dubbing Existing Assets: Take an existing character illustration and make it "speak" by synchronizing recorded or TTS-generated dialogue audio.



Video to Video Mode (Extension / Retake)


How does it work? You start from an existing video that you want to modify or extend. The model analyzes your source video and generates either a natural continuation (extension) or an alternative version from a defined starting point in the video (retake).


| Model | Usage |
| --- | --- |
| Veo 3.1 Fast / extend | Extension of existing video by +7 seconds |
| LTX-2 / retake | Regeneration from a starting point in the video (startTime) |


Use Cases:

  • โž• Cinematic Extension: Lengthen a video that is too short by naturally extending the action or animation already underway.
  • โœ๏ธ Video End Correction: If the last part of a video is not suitable, define a cut-off point and regenerate the end alternatively.
  • ๐Ÿ” Loop Creation: Generate a continuation of a video that can be seamlessly chained with the beginning to create looping animations.



Motion Control


How does it work? You combine a character image and a motion reference video (video of a real person moving). The model applies the movements from the reference video to your character from the image, creating an animated video where your character reproduces the captured gestures exactly.


| Model | Options |
| --- | --- |
| Kling Video v2.6 / motion-control | Image + motion video, character orientation (image or video), 5s/10s durations |


Use Cases:

  • ๐Ÿ’ƒ Character Animations: Make a game character dance, run, or gesture by applying real motion captures.
  • ๐Ÿฅ‹ Action Scenes: Use reference videos of martial arts or sports to animate character fights.
  • ๐ŸŽ“ Educational Characters: Have a trainer avatar perform didactic gestures (pointing, showing, greeting) in an e-learning module.



Common Video Parameters

  • Prompt: Textual description of the video.
  • Negative prompt: Elements to exclude from the generation.
  • Duration: Variable according to the model.
  • Aspect ratio: 16:9, 9:16, 4:3, 1:1...
  • Resolution: 480p to 2160p according to the model.
  • FPS: 25 or 50 (LTX-2 models).
  • Audio: Optional, mandatory, or by reference file.
  • Start / end image(s): To guide video generation.
  • Advanced Mode: Custom JSON parameters.



๐Ÿ”Š Audio Tab


The Audio tab offers 4 sound generation modes based on ElevenLabs to cover all needs: voice, musical atmosphere, effects, and dialogues.



Text to Speech Mode (TTS)


How does it work? You enter text in the prompt and select a voice from the 20 available. The model generates an audio file of the chosen voice reading the text, with fine control over stability, similarity, and speaking speed.


| Model | Parameters |
| --- | --- |
| ElevenLabs TTS Turbo v2.5 | 20 voices (Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill), stability, similarity boost, speed (0.7–1.2×), language code |


Use Cases:

  • ๐ŸŽ™๏ธ Scene Narration: Generate the voiceover that accompanies transitions or key moments in a narrative game.
  • ๐Ÿ’ฌ NPC Dialogues: Produce audio dialogue lines for each non-player character, with a different voice for each character.
  • ๐Ÿ“š Accessibility: Generate audio versions of texts to read to make a training session or experience accessible to visually impaired people.
  • ๐ŸŒ Multilingual Dubbing: Couple with translated text generation to produce NPC audios in multiple languages with the same voice.



Music


How does it work? You describe in text the style, atmosphere, and emotion of the music you want (e.g., ยซ epic action music for a fight scene, orchestral, crescendo tension ยป). The model generates an original musical track matching your description.


| Model | Parameters |
| --- | --- |
| ElevenLabs Music | Prompt text describing the desired music |


Use Cases:

  • ๐ŸŽฎ Game Soundtracks: Create unique background music for each zone, scene, or player's emotional state.
  • ๐Ÿ“– Narrative Music: Accompany reading or narration sequences with background music tailored to the story's tone.
  • ๐Ÿ‹๏ธ Training Ambiance: Generate light, non-distracting background music for e-learning modules.
  • ๐ŸŽฏ Audio Prototyping: Quickly test several musical directions for a project before commissioning a final composition.



Sound Effects


How does it work? You describe the desired sound effect in text (e.g., ยซ distant thunder sound approaching ยป, ยซ mechanical button click ยป). You control the duration (from 0.5 to 22 seconds) and prompt influence (from 0 = very free to 1 = very faithful). The loop option allows for the creation of seamless loop effects.


| Model | Parameters |
| --- | --- |
| ElevenLabs Sound Effects v2 | Duration (0.5–22 seconds), prompt influence (0–1), loopable |


Use Cases:

  • ๐Ÿ’ฅ Interaction Effects: Generate tailor-made feedback sounds (button click, validation, error, alert) for your experience's interfaces.
  • ๐ŸŒŠ Looping Audio Environments: Create background sounds (rain, forest, crowd, sea) to be activated in a loop to immerse the player in an environment.
  • โš”๏ธ Action Effects: Hits, explosions, magic, creaking doors โ€” quickly generate all the contextual sounds of a game scene.
  • ๐Ÿ”” Notification Sounds: Produce custom alert, success, or victory sounds for player reward moments.



Multi-voice Dialogue


How does it work? You use a specialized Rich Text editor to compose a conversation between several characters. Each text segment can be assigned a different voice from the 20 available. The model generates a continuous audio file that chains all the lines with the corresponding voices, faithfully reproducing the interaction.


| Model | Parameters |
| --- | --- |
| ElevenLabs Text-to-Dialogue v3 | Multi-voice rich editor, 20 voices assignable per line, stability, speaker boost, language code |


Use Cases:

  • ๐Ÿ—ฃ๏ธ Ready-to-use Dialogue Scenes: Generate a complete conversation between two NPCs without having to edit audio files separately โ€” the result is a single chained audio file.
  • ๐ŸŽญ Audio Cutscenes: Produce dramatic audio sequences with multiple characters for narrative-heavy games, without real dubbing actors.
  • ๐Ÿ“ป Podcasts / Microlearning: Create educational dialogues between a fictive trainer and learner to animate training modules engagingly.
  • ๐ŸŒ Multilingual Dialogues: With the language code parameter, force each voice to speak in the target language for localized productions.



๐Ÿ“‹ Previous Generations Tab


This tab gives access to the complete history of all your AI generations (text, image, video, sound). You can:

  • Consult past results
  • Filter by type (text, image, video, audio) and source (editor or in-game)
  • Browse the paginated history to reach older generations
  • Find and reuse outputs already generated



๐Ÿ’ก Best Practices: Competency Prompts and Monitoring


Constraining the Model with Competency Prompts


To get the best results, it is essential to constrain the AI model by providing precise instructions and best practices. This is called a competency prompt: a reference document (style guide, methodology, business rules) that you attach to your requests to guide generation.


Concrete Examples:


  • Vibe Coding (HTML5 Block): When you use AI generation to create interfaces via the HTML5 block, attach your HTML/CSS style guide to the prompt. The model will then respect your colors, typographies, and components.
  • Markdown Document Generation: When you create text documents via the MD preview of the Text menu, attach a Markdown documentation methodology (document structure, heading conventions, glossary) so that the AI produces documents compliant with your standards.


Thanks to the system of text variables attachable to the chat, you can store your style guides and methodologies as project variables and inject them with one click into each AI conversation.
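As an illustration, a hypothetical competency-prompt variable (all names and values below are invented for the example) stored in your project might contain:

```markdown
# UI Style Guide (hypothetical competency prompt)
- Colors: background #1B1E2B, accent #F2B705, text #FFFFFF
- Typography: "Inter", sans-serif; buttons rounded 8px
- Constraint: always return a single self-contained HTML fragment, no external libraries
```

Attaching this variable to every conversation keeps the model's output consistent across generations.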


Major Use Case: Vibe Coding with the HTML5 Block


One of the most powerful use cases of Celestory is vibe coding:

  1. Open the AI chat in the Text tab.
  2. Attach your HTML/CSS style guide as a text variable.
  3. Describe the desired interface, animation, or component in natural language.
  4. Copy the generated code into an HTML5 block of your graph.
  5. Instantly view the result in the nocode engine.
  6. Iterate through AI conversations until the desired result is achieved โ€” without manually writing a single line of code.
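To make the loop concrete, here is a minimal, hypothetical sketch of the kind of snippet such a conversation might produce (class names, colors, and markup are invented for the example; your own style guide will yield different code), ready to paste into an HTML5 block:

```html
<!-- Hypothetical AI-generated snippet for an HTML5 block -->
<style>
  .continue-btn {
    background: #F2B705;            /* accent color taken from the attached style guide */
    color: #1B1E2B;
    border: none;
    border-radius: 8px;
    padding: 12px 24px;
    font-family: "Inter", sans-serif;
    cursor: pointer;
  }
</style>
<button class="continue-btn">Continue</button>
```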


Token Monitoring by Model


Celestory displays the token counter per model in real time:

  • History Tokens: how many tokens your conversation has already consumed vs. the model's context window.
  • Input Tokens: how many tokens your next message will add (including attached variables).
  • Generation Quota: total number of generations used out of your monthly quota.
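For example, combining the figures shown earlier: a history at 12,530 / 256,000 tokens plus an input estimated at + 842 tokens means your next request will send roughly 13,372 tokens, about 5% of the 256,000-token context window of a model like mistral-large-2512.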


Monitor these metrics to:

  • Optimize your prompts and reduce consumption.
  • Choose the right model for the quality/cost ratio of your use case.
  • Anticipate quota overages before a production launch.


Advanced Mode


For each generation type (image, video, audio), an advanced mode allows for manual entry of:

  • The exact model name (to access models not listed).
  • Parameters in JSON format for full control over the request sent to the API.
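As an illustration only (the field names below are hypothetical placeholders; the exact parameters accepted depend on each model's API), an advanced-mode JSON payload for an image generation might look like:

```json
{
  "model": "nano-banana",
  "prompt": "isometric pixel-art tavern interior, warm light",
  "aspect_ratio": "16:9",
  "num_images": 4,
  "output_format": "png"
}
```

The keys mirror the common parameters described above (prompt, ratio, number of images, output format); check each model's documentation for its actual parameter names.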


Updated on: 04/03/2026
