Text-to-video model
from Wikipedia

A compilation video generated using OpenAI's Sora 2 text-to-video model

A text-to-video model is a form of generative artificial intelligence that uses a natural language description as input to produce a video relevant to the input text.[1] Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models.[2]

Models


There are different models, including open source models. CogVideo, which takes Chinese-language input,[3] is the earliest text-to-video model, with 9.4 billion parameters; a demo version of its open-source code was first presented on GitHub in 2022.[4] That year, Meta Platforms released a partial text-to-video model called "Make-A-Video",[5][6][7] and Google Brain (later Google DeepMind) introduced Imagen Video, a text-to-video model with a 3D U-Net.[8][6][9][10][11]

In February 2023, Runway released Gen-1 and Gen-2, among the first commercially available text-to-video and video-to-video models accessible to the public through a web interface. Gen-1, initially released as a video-to-video model, allowed users to transform existing video footage using text or image prompts.[12] Gen-2, introduced in March 2023 and made publicly available in June 2023, added text-to-video capabilities, enabling users to generate videos from text prompts alone.[13]

In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation.[14] The VideoFusion model decomposes the diffusion process into two components: base noise and residual noise, with the base noise shared across frames to ensure temporal coherence. By utilizing a pre-trained image diffusion model as a base generator, the model efficiently generated high-quality and coherent videos. Fine-tuning the pre-trained model on video data addressed the domain gap between image and video data, enhancing the model's ability to produce realistic and consistent video sequences.[15] In the same month, Adobe introduced Firefly AI as part of its product features.[16]
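The noise decomposition described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation; the mixing coefficient `lam`, which controls how much noise is shared across frames, is an assumed parameter for demonstration:

```python
import numpy as np

def decomposed_noise(num_frames, shape, lam=0.5, seed=0):
    """Sketch of a VideoFusion-style decomposition: each frame's noise
    mixes a base component (shared across frames, giving temporal
    coherence) with a frame-specific residual component."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(shape)            # base noise, shared by all frames
    eps = []
    for _ in range(num_frames):
        resid = rng.standard_normal(shape)       # residual noise for this frame
        # Convex mixing of variances keeps each frame's noise unit-variance.
        eps.append(np.sqrt(lam) * base + np.sqrt(1 - lam) * resid)
    return np.stack(eps)

frames = decomposed_noise(num_frames=8, shape=(16, 16), lam=0.5)
# Because frames share the base component, adjacent frames are correlated.
corr = np.corrcoef(frames[0].ravel(), frames[1].ravel())[0, 1]
```

The shared base component is what ties the frames together: with `lam=0` the frames would be independent noise, while larger values of `lam` make the per-frame noise, and hence the denoised frames, more similar.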

In January 2024, Google announced development of a text-to-video model named Lumiere which is anticipated to integrate advanced video editing capabilities.[17] Matthias Niessner and Lourdes Agapito at AI company Synthesia work on developing 3D neural rendering techniques that can synthesise realistic video by using 2D and 3D neural representations of shape, appearances, and motion for controllable video synthesis of avatars.[18] In June 2024, Luma Labs launched its Dream Machine video tool.[19][20] That same month,[21] Kuaishou extended its Kling AI text-to-video model to international users. In July 2024, TikTok owner ByteDance released Jimeng AI in China, through its subsidiary, Faceu Technology.[22] By September 2024, the Chinese AI company MiniMax debuted its video-01 model, joining other established AI model companies like Zhipu AI, Baichuan, and Moonshot AI, which contribute to China's involvement in AI technology.[23] In December 2024 Lightricks launched LTX Video as an open source model.[24]

Alternative approaches to text-to-video models include[25] Google's Phenaki, Hour One, Colossyan,[3] Runway's Gen-3 Alpha,[26][27] and OpenAI's Sora.[28][29] Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged.[30] FLUX.1 developer Black Forest Labs has announced its text-to-video model SOTA.[31] Google was preparing to launch a video generation tool named Veo for YouTube Shorts in 2025.[32] In May 2025, Google launched the Veo 3 iteration of the model. It was noted for its impressive audio generation capabilities, which had been a limitation of earlier text-to-video models.[33] In July 2025, Lightricks released an update to LTX Video capable of generating clips of up to 60 seconds,[34][35] and in October 2025 it released LTX-2, with audio capabilities built in.[36]

Architecture and training


Several architectures have been used to create text-to-video models. Similar to text-to-image models, these models can be trained using recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, which have been used for pixel-transformation models and stochastic video generation models, aiding consistency and realism respectively.[37] Transformer models are an alternative. Generative adversarial networks (GANs), variational autoencoders (VAEs), which can aid in the prediction of human motion,[38] and diffusion models have also been used to develop the image generation aspects of the model.[39]

Text-video datasets used to train models include, but are not limited to, WebVid-10M, HDVILA-100M, CCV, ActivityNet, and Panda-70M.[40][41] These datasets contain millions of original videos of interest, generated videos, captioned videos, and textual information that help train models for accuracy. Text-prompt datasets used to train models include, but are not limited to, PromptSource, DiffusionDB, and VidProM.[40][41] These datasets provide the range of text inputs needed to teach models how to interpret a variety of textual prompts.

The video generation process involves synchronizing the text inputs with video frames, ensuring alignment and consistency throughout the sequence. This predictive process is subject to decline in quality as the length of the video increases due to resource limitations.[41] The Will Smith Eating Spaghetti test is a benchmark for models.[42]

Limitations


Despite the rapid evolution of text-to-video models in their performance, a primary limitation is that they are very computationally heavy, which limits their capacity to provide high-quality and lengthy outputs.[43][44] Additionally, these models require a large amount of specific training data to be able to generate high-quality and coherent outputs, which raises the issue of accessibility.[44][43]

Moreover, models may misinterpret textual prompts, resulting in video outputs that deviate from the intended meaning. This can occur due to limitations in capturing semantic context embedded in text, which affects the model's ability to align generated video with the user's intended message.[44][41] Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are currently being tested and refined to enhance their alignment capabilities and overall performance in text-to-video generation.[44]

Another issue with the outputs is that text or fine details in AI-generated videos often appear garbled, a problem that stable diffusion models also struggle with. Examples include distorted hands and unreadable text.

Ethics


The deployment of text-to-video models raises ethical considerations related to content generation. These models have the potential to create inappropriate or unauthorized content, including explicit material, graphic violence, misinformation, and likenesses of real individuals without consent.[40] Ensuring that AI-generated content complies with established standards for safe and ethical usage is essential, as content generated by these models may not always be easily identified as harmful or misleading. The ability of AI to recognize and filter out NSFW or copyrighted content remains an ongoing challenge, with implications for both creators and audiences.[40]

Impacts and applications


Text-to-video models offer a broad range of applications that may benefit various fields, from educational and promotional to creative industries. These models can streamline content creation for training videos, movie previews, gaming assets, and visualizations, making it easier to generate content.[45]

During the Russo-Ukrainian war, fake videos made with Artificial Intelligence were created as part of a propaganda war against Ukraine and shared in social media. These included depictions of children in the Ukrainian Armed Forces, fake ads targeting children encouraging them to denounce critics of the Ukrainian government, or fictitious statements by Ukrainian President Volodymyr Zelenskyy about the country's surrender, among others.[46][47][48][49][50][51]

Movies


Kaur vs Kore is the first Indian feature film made using generative AI; it features a dual role for the AI character of Sunny Leone and is set for release in 2026.[52][53][54]

Chiranjeevi Hanuman – The Eternal is an Indian movie made entirely using generative AI, created by Vijay Subramaniam and set for theatrical release in 2026. The movie was widely criticised by filmmakers in the Bollywood industry for relying entirely on AI, whose use was seen as an existential threat to their careers.[55][56][57]

Series


Mahabharat: Ek Dharmayudh is an Indian mythological OTT series released in October 2025 and streamed on JioHotstar. It is recognized as the first series created entirely using artificial intelligence to generate visuals and character animations, and consists of 100 episodes.[58][59][60]

Comparison of models

| Model/Product | Company | Year released | Status | Key features | Capabilities | Pricing | Video length | Supported languages |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Synthesia | Synthesia | 2019 | Released | AI avatars, multilingual support for 60+ languages, customization options[61] | Specialized in realistic AI avatars for corporate training and marketing[61] | Subscription-based, starting around $30/month | Varies based on subscription | 60+ |
| Vexub | Vexub | 2023 | Released | Text-to-video from prompt, focus on TikTok and YouTube storytelling formats for social media[62] | Generates AI videos (1–15 mins) from text prompts; includes editing and voice features[62] | Subscription-based, with various plans | Up to ~15 minutes | 70+ |
| InVideo AI | InVideo | 2021 | Released | AI-powered video creation, large stock library, AI talking avatars[61] | Tailored for social media content with platform-specific templates[61] | Free plan available, paid plans starting at $16/month | Varies depending on content type | Multiple (not specified) |
| Fliki | Fliki AI | 2022 | Released | Text-to-video with AI avatars and voices, extensive language and voice support[61] | Supports 65+ AI avatars and 2,000+ voices in 70 languages[61] | Free plan available, paid plans starting at $30/month | Varies based on subscription | 70+ |
| Runway Gen-2 | Runway AI | 2023 | Released | Multimodal video generation from text, images, or videos[63] | High-quality visuals, various modes like stylization and storyboard[63] | Free trial, paid plans (details not specified) | Up to 16 seconds | Multiple (not specified) |
| Pika Labs | Pika Labs | 2024 | Beta | Dynamic video generation, camera and motion customization[64] | User-friendly, focused on natural dynamic generation[64] | Currently free during beta | Flexible, supports longer videos with frame continuation | Multiple (not specified) |
| Runway Gen-3 Alpha | Runway AI | 2024 | Alpha | Enhanced visual fidelity, photorealistic humans, fine-grained temporal control[65] | Ultra-realistic video generation with precise key-framing and industry-level customization[65] | Free trial available, custom pricing for enterprises | Up to 10 seconds per clip, extendable | Multiple (not specified) |
| Google Veo | Google | 2024 | Released | Google Gemini prompting, voice acting, sound effects, background music, cinema-style realistic videos[66] | Generates very realistic and detailed character models, scenes, and clips with matching voice acting, ambient sounds, and background music; can extend clips with continuity[67] | Varies ($250 Google Pro/Ultra AI subscription, plus additional AI credit top-ups) | Eight seconds per clip (clips can be continued/extended as separate clips) | 50+ |
| OpenAI Sora | OpenAI | 2024 | Alpha | Deep language understanding, high-quality cinematic visuals, multi-shot videos[68] | Capable of creating detailed, dynamic, and emotionally expressive videos; still under development with safety measures[68] | Pricing not yet disclosed | Expected to generate longer videos; duration specifics TBD | Multiple (not specified) |
| Runway Gen-4 | Runway | 2025 | Released | Consistent characters across scenes,[69] world consistency,[70] camera control, physics simulation | Generates 5–10 second clips with consistent characters, objects, and environments across multiple shots[71] | Credit-based subscription, part of paid plans | 5–10 seconds | Multiple (not specified) |

from Grokipedia
A text-to-video model is a system that synthesizes video sequences from textual descriptions, typically by conditioning spatiotemporal diffusion processes on text embeddings derived from large language models to iteratively denoise latent video representations into coherent frames with motion. These models build on architectures originally developed for static image generation, extending them to capture temporal dependencies through mechanisms like 3D convolutions, transformer-based attention, or flow matching to model dynamics across frames. Early approaches relied on autoregressive or GAN-based methods, but diffusion models have dominated since 2022 due to superior sample quality and scalability, as evidenced by benchmarks showing reduced perceptual artifacts in generated clips. Key advancements include OpenAI's Sora series, with Sora 2 released in 2025 featuring improved physical accuracy, native audio integration, and enhanced controllability via a diffusion transformer architecture to generate high-definition videos with complex scene compositions and simulated physics, offering free access as of 2026 with limitations on resolution and duration in some access modes. Google's Lumiere, introduced in early 2024, uses a space-time U-Net on latent patches to produce diverse, realistic motion in shorter clips, outperforming prior models in motion coherence per human evaluations. Stability AI's Stable Video Diffusion, also from 2023–2024 iterations, enables fine-tuning for customized outputs via an open-source latent diffusion model adapted for video, facilitating applications in animation and effects prototyping. These models have achieved notable realism in rendering objects, lighting, and basic interactions, with quantitative metrics like FVD scores dropping below 200 on datasets such as UCF-101, indicating improved alignment with real video distributions.
Despite progress, persistent limitations include failures in long-term temporal coherence, violations of physical laws in generated scenarios (e.g., impossible trajectories or mass-conservation errors), and computational demands exceeding hundreds of GPU-hours per clip, stemming from training on web-scraped datasets that prioritize statistical correlations over causal mechanisms. Controversies arise from risks of misuse in fabricating deceptive content, prompting calls for watermarking and regulatory scrutiny, alongside debates over copyright in corpora dominated by unlicensed media. Empirical evaluations reveal systemic biases toward over-representation of common motifs, yielding less reliable outputs for underrepresented cultural or physical contexts.

Definition and Historical Development

Core Concept and Foundational Principles

Text-to-video models are systems designed to synthesize dynamic video sequences from textual prompts, producing frames that maintain spatial fidelity within each image and temporal coherence across the sequence to depict plausible motion and events. These models condition the generation process on text embeddings derived from pre-trained language encoders, such as CLIP or T5, to align output semantics with descriptive inputs like "a dog jumping over a fence." The core objective is to approximate the conditional distribution p(v | t), where v represents the video and t the text prompt, enabling controllable synthesis of novel content not present in training data. Unlike static image generation, video models must explicitly capture inter-frame dependencies to avoid artifacts like flickering or implausible dynamics, which arise from the high-dimensional nature of video data, typically involving thousands of pixels per frame over dozens of frames. At their foundation, contemporary text-to-video models predominantly leverage diffusion processes, a probabilistic framework inspired by nonequilibrium thermodynamics, where a forward diffusion gradually corrupts video latents with isotropic Gaussian noise over T timesteps until reaching a tractable noise distribution, and a reverse denoising process iteratively reconstructs structured data conditioned on text. This reverse process parameterizes a neural network that learns to predict noise or denoised samples, formalized as training to minimize a variational lower bound on the data likelihood, often simplified to denoising score matching for scalability. Empirical success stems from diffusion's ability to model complex multimodal distributions without adversarial training instabilities, as demonstrated in early video adaptations achieving coherent short clips of 2–10 seconds at resolutions up to 256x256 pixels.
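The forward corruption process described above has a well-known closed form, which the sketch below illustrates; the linear beta schedule and tensor sizes are illustrative assumptions rather than the settings of any particular model:

```python
import numpy as np

def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Closed-form forward process:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, I).
    As t approaches T, the sample approaches pure Gaussian noise."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1 - a_bar) * eps, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t = prod of (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 16, 16))     # toy 8-frame "video" latent
x_early, _ = forward_diffuse(x0, 10, alphas_cumprod, rng)
x_late, _ = forward_diffuse(x0, T - 1, alphas_cumprod, rng)
# Early timesteps remain close to the data; late ones are nearly pure noise.
```

Training a denoiser amounts to sampling a random timestep, applying this corruption, and regressing the network's output onto the noise `eps` that was added.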
Causal modeling of motion relies on data-driven learning of spatio-temporal correlations, though outputs can deviate from physical realism if training datasets underrepresent edge cases like rare interactions or long-range dependencies. To mitigate the exponential compute costs of pixel-space diffusion, arising from video's volumetric data footprint (e.g., H x W x T x C dimensions), foundational implementations compress videos into lower-dimensional latent representations via spatiotemporal autoencoders, such as variational autoencoders (VAEs) or vector-quantized variants, before applying diffusion. This latent paradigm, first scaled for images in Stable Diffusion, preserves perceptual quality while reducing parameter counts and denoising steps, enabling training on datasets with billions of frame-text pairs sourced from web videos. Architecturally, models extend 2D backbones with 3D convolutions or temporal attention mechanisms in transformer-based diffusion transformers (DiTs) to propagate information across time, ensuring consistent object trajectories and scene flows; for instance, bidirectional causal masking in some designs allows global context while simulating forward generation. Cross-attention layers fuse text conditionals into the denoising network at multiple scales, with classifier-free guidance amplifying adherence to prompts by interpolating between conditional and unconditional predictions during sampling, boosting semantic fidelity at the cost of diversity. These principles prioritize empirical scalability over exhaustive physical modeling, relying on vast, diverse training corpora to implicitly encode causal structures like object permanence or occlusion, though evaluations reveal persistent gaps in handling complex interactions or extended durations without fine-tuning or cascaded refinement stages.
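The classifier-free guidance mentioned above reduces to a one-line combination of the two noise predictions. A minimal sketch follows; the guidance scale of 7.5 is a commonly used illustrative value, not a universal standard, and the constant arrays stand in for real network outputs:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditioned one. A scale of 1 recovers the
    conditional prediction; larger scales amplify prompt adherence."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.zeros((2, 2))   # stand-in for the unconditional prediction
eps_c = np.ones((2, 2))    # stand-in for the text-conditioned prediction
guided = cfg_combine(eps_u, eps_c, guidance_scale=7.5)
# -> every entry equals 7.5: the conditional direction, scaled up
```

Because the extrapolation pushes samples toward regions the conditional model favors, raising the scale tightens prompt adherence while narrowing output diversity, matching the trade-off noted in the text.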
Source surveys, such as those aggregating peer-reviewed works up to mid-2024, underscore diffusion's dominance due to its stable training dynamics and superior sample quality over GAN-based predecessors, which suffered mode collapse in temporal domains.

Early Research and Precursors (Pre-2022)

Early efforts in text-to-video generation prior to 2022 primarily relied on generative adversarial networks (GANs) and variational autoencoders (VAEs) to produce short, low-resolution video clips conditioned on textual descriptions, often limited to simple scenes due to computational constraints and dataset scarcity. These approaches decomposed video synthesis into static scene layout (e.g., background and objects) and dynamic motion elements, using text embeddings to guide generation. Datasets such as the Microsoft Video Description Corpus (MSVD) provided paired text-video data, but lacked the scale and diversity needed for complex outputs, resulting in generations typically under 10 seconds long and resolutions below 64x64 pixels. A foundational work, "Video Generation From Text" (2017), introduced a hybrid VAE-GAN model that automatically curated a text-video corpus from online sources and separated static "gist" features for layout from dynamic filters conditioned on text, enabling plausible but rudimentary videos like "a man playing guitar." Building on this, the 2017 ACM paper "Generating Videos from Captions" employed encoder-decoder architectures with LSTMs for temporal modeling, focusing on caption-driven synthesis but struggling with motion realism. GAN variants advanced the field: the 2019 IJCAI paper "Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis" used adaptive filters in the discriminator to improve text alignment and temporal coherence, outperforming baselines on MSVD in human evaluations of video quality. Similarly, IRC-GAN (2019) integrated recurrent convolutions to refine adversarial training, reducing mode collapse in generated motion. Later pre-2022 developments included TiVGAN (2020), a step-wise evolutionary GAN that first generated images from text before extending to video frames, achieving better frame consistency on datasets like Pororo.
GODIVA (2021) shifted toward transformer-based autoregressive modeling for open-domain videos, generating up to 16-frame clips at higher fidelity but still prone to artifacts in complex motion. These models highlighted persistent challenges: poor temporal consistency (e.g., flickering objects), limited generalization beyond narrow domains, and high training instability from GANs, paving the way for diffusion-based paradigms post-2021. Evaluation metrics, such as adapted Inception Scores or human judgments, underscored qualitative improvements but quantitative gaps in realism compared to later diffusion models.

Breakthrough Era (2022–2023)

In late 2022, the field of text-to-video generation experienced rapid advancements driven by diffusion-based architectures, which extended successful text-to-image techniques such as latent diffusion to incorporate temporal dynamics. These models leveraged large datasets of captioned videos to learn spatiotemporal representations, enabling the synthesis of coherent motion from static textual prompts, though outputs remained constrained to short clips of 2–10 seconds at resolutions up to 256x256 or 512x512 pixels. On September 29, 2022, Meta announced Make-A-Video, a pipeline that inflates text-conditioned image features into video latents using a spatiotemporal upsampler and decoder trained on millions of video-text pairs. The model generated whimsical, low-fidelity clips emphasizing creative but often artifact-prone motion, such as animated scenes of animals or objects, without public release due to ethical risks like deepfakes. Google Research followed in October 2022 with Phenaki, introduced via a preprint on October 5, which pioneered variable-length generation by employing a bidirectional masked transformer (MaskGIT) to autoregressively predict discrete video tokens conditioned on evolving text sequences. Capable of producing clips up to 2 minutes long at 128x128 resolution, Phenaki demonstrated narrative continuity across scenes (e.g., a prompt sequence describing a character riding a bicycle through changing environments) but suffered from compounding errors in longer outputs and required extensive computational resources for training on diverse, open-domain video data. Concurrently, Google unveiled Imagen Video on October 6, 2022, a cascaded diffusion system building on the Imagen architecture, comprising a base low-resolution video generator followed by spatial and temporal super-resolution stages to yield high-definition results up to 1280x768 at 24 frames per second.
It prioritized fidelity in physics simulation and human motion over length, generating 2–4 second clips with superior semantic alignment to prompts compared to predecessors, yet like others, it was withheld from public access to mitigate misuse potential. By 2023, refinements emerged, including Meta's Emu Video on November 16, which applied efficient diffusion sampling to Emu image embeddings for faster, higher-quality 5-second clips, reducing training costs through distillation from larger teacher models. These efforts highlighted diffusion's efficacy for causal video modeling but underscored persistent challenges: temporal inconsistency, high latency (often minutes per clip on GPU clusters), and data biases amplifying stereotypes in outputs, as empirically observed in evaluations against human-rated coherence metrics.

Commercial Acceleration (2024–Present)

In 2024, text-to-video models transitioned from research prototypes to commercially viable products, with major firms releasing accessible platforms that enabled widespread user experimentation and integration into creative workflows. OpenAI's Sora, initially previewed in February, launched as a faster variant called Sora Turbo on December 9, 2024, allowing limited public access through ChatGPT Plus subscriptions and emphasizing safeguards against misuse. Concurrently, Runway introduced Gen-3 Alpha on June 17, 2024, a model trained on videos and images to support text-to-video, image-to-video, and text-to-image generation, powering tools used by millions for professional-grade outputs up to 10 seconds at 1280x768 resolution. Luma AI's Dream Machine followed on June 12, 2024, generating high-quality clips from text or images in minutes, with subsequent updates like version 1.5 in August enhancing motion coherence and realism. Google announced Veo in May 2024, integrating it into Vertex AI for enterprise video generation from text or images, focusing on cost reduction and production efficiency. Kuaishou's Kling AI emerged as a competitor, offering text-to-video capabilities with hyper-realistic dynamics, initially limited but expanding to global access via web interfaces. This proliferation spurred competitive advancements, including longer clip durations, improved physics simulation, and multimodal inputs, driven by training on vast datasets. By mid-2024, models like Gen-3 Alpha and Dream Machine supported extensions beyond initial generations, enabling users to create coherent sequences through iterative prompting, though computational costs remained high, often requiring paid credits for high-fidelity renders. Commercial platforms introduced tiered pricing, such as Runway's subscription model for unlimited generations, contrasting with earlier research-only demos and accelerating adoption across creative industries.
Into 2025, acceleration intensified with iterative releases emphasizing speed, audio synchronization, and mobile accessibility. OpenAI unveiled Sora 2 on September 30, 2025, incorporating audio generation for dialogue and effects alongside visuals, launched via a standalone app that amassed over 1 million downloads in under five days (surpassing ChatGPT's initial uptake) and enabling remixing of user-generated clips. Kuaishou released Kling AI 2.5 Turbo on September 26, 2025, upgrading text-to-video quality with faster inference and enhanced detail in motion and lighting. Luma expanded Dream Machine with a mobile app in November 2024 and Ray 2 in January 2025, prioritizing boundary-pushing video synthesis for 25 million registered users by late 2024. Google advanced Veo to version 3 in 2025, accessible through the Gemini app with a Google AI Pro or Ultra subscription and generating 8-second videos with sound from text prompts or uploaded images, integrating it with tools like Flow for cinematic scene creation. These updates reflected a market shift toward integrated ecosystems, where models not only generated videos but also supported editing, upscaling, and provenance tracking to address authenticity concerns. The era marked a surge in venture investment and enterprise adoption, with platforms reporting exponential user growth amid benchmarks showing superior temporal consistency over 2023 predecessors, such as Veo 3's lip-sync accuracy and Sora 2's multimodal fidelity. However, challenges persisted, including high inference costs (often $0.01–$0.10 per second of video) and ethical debates over deepfakes, prompting features like watermarks in Sora and Veo outputs. Competition from Chinese firms like Kuaishou and MiniMax highlighted global disparities in data access and regulation, accelerating open-source alternatives while proprietary leaders maintained edges in scale and refinement.
By October 2025, text-to-video tools had democratized short-form content creation, though full-length video coherence remained an ongoing frontier.

Technical Architecture and Training

Core Architectures (Diffusion Models, Transformers, and Hybrids)

Diffusion models constitute the primary paradigm for text-to-video generation, extending the denoising process from images to spatiotemporal data by iteratively refining noise into coherent video sequences conditioned on textual descriptions. These models typically encode videos into latent representations via autoencoders to reduce computational demands, then apply a reverse diffusion process that predicts noise removal across frames while preserving temporal consistency through mechanisms like 3D convolutions or temporal attention layers. Early implementations, such as VideoLDM, leverage latent diffusion models (LDMs) to synthesize high-resolution videos by factorizing the denoising process into spatial and temporal components, enabling efficient training on large datasets of captioned videos. This approach mitigates the quadratic growth in parameters inherent to full spatiotemporal attention, achieving resolutions up to 256x256 at 49 frames with reduced VRAM usage compared to pixel-space diffusion. Transformer architectures have increasingly supplanted convolutional U-Nets in diffusion-based video models, offering superior scalability through self-attention mechanisms that process sequences of spacetime patches, discrete tokens derived from compressed video latents arranged along spatial and temporal dimensions. The Diffusion Transformer (DiT), originally proposed for image generation, replaces U-Net convolutional blocks with transformer layers comprising multi-head attention and feed-forward networks, facilitating longer context modeling and parallel computation essential for video's extended sequences. In text-to-video applications, DiTs condition generation via cross-attention to text embeddings from large language models, as seen in models like CogVideoX, which integrates an expert transformer to enhance motion dynamics and textual fidelity during diffusion steps. OpenAI's Sora exemplifies this shift, employing a DiT operating on spacetime latent patches to simulate physical world dynamics, supporting videos up to 60 seconds at high resolution through hierarchical patch encoding that unifies image and video processing.
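The spacetime-patch tokenization used by DiT-style models can be illustrated with plain array reshapes. This is a minimal sketch; the patch sizes and tensor layout below are arbitrary illustrative choices, not those of any specific model:

```python
import numpy as np

def patchify_video(video, pt, ph, pw):
    """Split a (T, H, W, C) video into non-overlapping spacetime patches of
    size (pt, ph, pw), each flattened into one token for a DiT-style model."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)      # group the patch-grid axes first
    tokens = v.reshape(-1, pt * ph * pw * C)  # one row per spacetime patch
    return tokens

# Toy "latent video": 16 frames of a 32x32 grid with 4 channels.
video = np.arange(16 * 32 * 32 * 4, dtype=np.float32).reshape(16, 32, 32, 4)
tokens = patchify_video(video, pt=2, ph=8, pw=8)
# 8 temporal x 4 x 4 spatial patches = 128 tokens, each of 2*8*8*4 = 512 values
```

The resulting token sequence is what the transformer's self-attention operates on; because tokens span both space and time, attention can relate a patch in one frame to patches elsewhere in the clip.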
Hybrid architectures combine diffusion's probabilistic sampling with the transformer's sequential reasoning, often merging latent diffusion backbones with autoregressive or parallel transformer components to address limitations in long-range coherence and efficiency. For instance, Vchitect-2.0 introduces a parallel transformer design within a diffusion framework, partitioning video across spatial and temporal axes to scale generation for high-resolution, long-duration outputs while maintaining causal masking for autoregressive-like dependencies. Other hybrids, such as Hydra-based designs, integrate state-space models with DiTs in a unified backbone, leveraging the former's linear complexity for temporal modeling to produce extended videos beyond training lengths, as demonstrated in evaluations yielding improved FID scores on benchmarks like UCF-101. These fusions exploit diffusion's robustness to mode collapse alongside the transformer's expressivity, though they introduce trade-offs in training stability requiring techniques like flow matching for accelerated convergence.

Data Requirements and Training Paradigms

Text-to-video models necessitate expansive datasets of video clips annotated with textual descriptions to capture correlations between language and spatiotemporal content. Prominent examples include WebVid-10M, which contains 10.7 million video-text pairs encompassing roughly 52,000 hours of footage scraped from stock video platforms, enabling large-scale pre-training for conditional generation. Another key resource is InternVid, a video-centric dataset with millions of clips paired with captions, designed to foster transferable representations across multimodal tasks. These corpora prioritize diversity in actions, environments, and durations, typically short clips of 10–30 seconds, to train models on realistic dynamics, though sourcing high-fidelity annotations remains resource-intensive due to manual or automated captioning limitations. Data quality demands extend beyond scale to temporal consistency and resolution variety, as low-quality inputs propagate artifacts in generated outputs. Datasets like VidGen-1M aggregate 1 million clips with detailed, human-verified captions to address gaps in consistency, often filtering for high resolutions and frame rates exceeding 24 fps. Kinetics variants, such as Kinetics-700 with over 650,000 YouTube-sourced videos across 700 action classes, supplement these by providing labeled motion primitives, though they require additional text pairing for direct text-to-video use. Overall, training corpora aggregate billions of frames, with proprietary efforts reportedly scaling to hundreds of thousands of hours, underscoring the empirical necessity of volume for emergent capabilities like physics simulation in outputs. Training paradigms predominantly leverage diffusion processes conditioned on text embeddings from models like CLIP or T5, extending 2D image diffusion to 3D spatiotemporal domains.
Latent diffusion models compress videos via spatiotemporal variational autoencoders into lower-dimensional representations, applying noise addition and denoising iteratively to reduce overhead, often by factors of 8-16 compared to pixel-space diffusion. Common approaches factorize modeling into spatial (via 2D convolutional blocks) and temporal (via temporal attention or 3D convolutions) components, as in VideoLDM, trained end-to-end on text-video pairs with objectives minimizing reconstruction error under classifier-free guidance for prompt adherence. Joint pre-training on images and videos initializes parameters from text-to-image systems, exploiting abundant static data to bootstrap video-specific temporal layers, followed by video-only fine-tuning on datasets like WebVid. This paradigm, evident in models like Sora, incorporates world-modeling objectives to enforce physical realism, with training spanning thousands of GPU-hours on clusters exceeding 10,000 H100 equivalents. Hierarchical strategies, such as patch-based generation, further optimize for high resolutions by progressively refining coarse-to-fine latents, mitigating the quadratic scaling of attention in long sequences. Such methods empirically outperform autoregressive alternatives in coherence but demand careful hyperparameter tuning to avoid mode collapse on underrepresented dynamics.
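The noise-addition and denoising objective described above can be sketched in a few lines. This is a toy, scalar illustration under assumed but common choices (a cosine-style schedule and ~10% condition dropout for classifier-free guidance), not any particular model's implementation:

```python
import math
import random

def alpha_bar(t, T=1000):
    """Cumulative signal fraction under a cosine-style noise schedule:
    ~1 at t=0 (no noise), ~0 at t=T (pure noise)."""
    return math.cos(0.5 * math.pi * t / T) ** 2

def noisy_latent(x0, t, T=1000):
    """Forward diffusion: x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps, elementwise."""
    abar = alpha_bar(t, T)
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    return xt, eps

def denoising_loss(eps_pred, eps_true):
    """The standard training objective: MSE between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps_true)) / len(eps_true)

def pick_condition(text_embedding, p_uncond=0.1):
    """Classifier-free guidance training: drop the text condition ~10% of the
    time so one network learns both conditional and unconditional denoising."""
    return None if random.random() < p_uncond else text_embedding
```

In a real system `x0` is a spatiotemporal latent tensor from the VAE and the noise predictor is the temporal U-Net or DiT; the schedule and loss have the same shape.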

Inference and Generation Processes

In text-to-video diffusion models, inference begins with encoding the input text prompt using a pre-trained text encoder, such as CLIP or T5, to produce conditioning embeddings that guide the generation process. These embeddings are injected into a denoising network, typically a U-Net augmented with temporal layers or 3D convolutions, which operates in a compressed latent space to reduce computational overhead. The process initializes a sequence of noisy latent representations for the video frames, often starting from pure Gaussian noise, and iteratively refines them over multiple timesteps, predicting and subtracting noise at each step to reconstruct coherent spatiotemporal content. The core denoising loop employs classifier-free guidance, where the model samples from both conditioned and unconditioned distributions to amplify adherence to the prompt, enhancing semantic alignment while mitigating mode collapse. Effective prompting strategies, as documented by model developers, emphasize specifying the subject, actions, visual style, camera movements, and duration to improve output coherence and fidelity. For temporal consistency across frames, architectures incorporate mechanisms like temporal attention blocks or flow-based priors that propagate motion information, preventing artifacts such as flickering or inconsistent object trajectories; for instance, models like VideoLDM insert lightweight temporal convolution layers into the U-Net to model inter-frame dependencies without full 3D parameterization. Sampling schedulers, such as DDIM or PLMS, accelerate this reverse diffusion by skipping intermediate steps, typically reducing the step count from 1000 to 20-50 iterations while preserving quality. Upon completing denoising, the refined latent video is decoded frame-by-frame via a variational autoencoder (VAE) to pixel space, often followed by super-resolution or upsampling modules to achieve higher resolutions such as 576x1024.
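The guided denoising loop and skip-step sampling can be sketched with scalar stand-ins; the guidance scale of 7.5 is a common default, and the linear update below is a deliberate simplification of the true reverse-diffusion rule (assumed for illustration only):

```python
def guided_eps(denoiser, x, t, cond, scale=7.5):
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond).
    `denoiser` stands in for the text-conditioned U-Net/DiT."""
    eps_uncond = denoiser(x, t, None)
    eps_cond = denoiser(x, t, cond)
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def sample(denoiser, x, cond, steps=(1000, 750, 500, 250, 1), step_size=0.1):
    """DDIM-style skip-step sampling: visit only a sparse subset of the 1000
    training timesteps, refining x by subtracting predicted noise each time."""
    for t in steps:
        eps = guided_eps(denoiser, x, t, cond)
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x
```

With an identity "denoiser" (one that reports the current latent as the noise), each of the five steps shrinks the latent by a factor of 0.9, illustrating how few-step schedules still converge toward a clean sample.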
In models emphasizing efficiency, such as those using consistency distillation, inference bypasses iterative denoising entirely by mapping noise directly to clean latents in one or a few steps, cutting generation time from minutes to seconds on consumer hardware. Proprietary systems like OpenAI's Sora extend this pipeline to longer durations (up to 60 seconds) by scaling diffusion over spacetime patches, though exact details remain undisclosed, relying on massive parallel computation for photorealistic outputs. Commercial models report approximate generation times for short clips (e.g., 5-10 seconds) of 30 seconds to a few minutes for Runway ML (Gen-3), 1-5 minutes for Luma Dream Machine, and 5-30 minutes or longer for Kling AI due to queuing; reference-guided generation adds a video-analysis step to the pipeline, and these times vary with video length, resolution, subscription tier, and system load. These processes demand significant GPU resources, with optimizations like latent-space operations enabling feasible deployment on clusters of A100 or H100 equivalents.

Computational Demands and Optimization Techniques

Text-to-video models, predominantly based on diffusion processes extended to spatiotemporal data, impose substantial computational demands during both training and inference due to the high dimensionality of video sequences, which encompass spatial frames and temporal dynamics. Training such models typically requires clusters of thousands of high-end GPUs; for instance, proprietary systems like OpenAI's Sora have been estimated to utilize between 4,200 and 10,500 H100 GPUs for approximately one month to achieve production-scale capabilities. Open-source alternatives, such as Open-Sora 2.0, demonstrate that commercial-level performance can be attained with optimized pipelines costing around $200,000 in compute resources, leveraging progressive multi-stage training from low resolutions (e.g., 256×256 pixels) to higher ones while minimizing overall GPU-hours through data-efficient curation and architectural efficiencies. These demands stem from the need to process vast datasets of video-text pairs, often exceeding billions of frames, to learn coherent motion and semantics, resulting in floating-point operation counts orders of magnitude higher than text-to-image counterparts, potentially in the range of 10^24 to 10^25 FLOPs for frontier models, though exact figures for closed systems remain undisclosed. Inference further amplifies resource intensity, as it involves iterative denoising over extended latent sequences to produce temporally consistent outputs, and is often constrained on consumer hardware. For example, generating short clips (e.g., 4 seconds at 240p resolution) with open implementations like Open-Sora on a single RTX 3090 GPU consumes significant VRAM and requires about one minute per clip, limiting output length and quality due to memory bottlenecks. Lighter open-source models can, however, run locally on consumer PCs with at least 8-12 GB of GPU VRAM at acceptable inference speed and quality, particularly for shorter clips or lower resolutions.
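The training estimate quoted above converts straightforwardly to GPU-hours; this back-of-envelope sketch assumes a 30-day month of continuous utilization:

```python
def gpu_hours(num_gpus, days):
    """Total accelerator-hours for a cluster running continuously."""
    return num_gpus * days * 24

# The reported 4,200-10,500 H100s for roughly one month imply:
low = gpu_hours(4_200, 30)    # 3,024,000 GPU-hours
high = gpu_hours(10_500, 30)  # 7,560,000 GPU-hours
```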
Production deployments, such as those for Sora 2, support up to 1080p resolution and 20-second durations but necessitate specialized accelerators like H100 clusters for real-time or batch inference, with rendering times scaling quadratically with video length and resolution. These constraints arise from the autoregressive or parallel sampling of frame sequences in diffusion models, where maintaining physical realism demands high-fidelity latent representations that exceed the 24-48 GB of VRAM typical of high-end consumer GPUs. Optimization techniques have emerged to mitigate these demands, focusing on architectural innovations, inference efficiencies, and hardware accelerations while preserving generative fidelity. Latent diffusion architectures compress videos into lower-dimensional spaces prior to processing, reducing spatial and temporal compute by factors of 10-100 compared to pixel-space methods, as implemented in two-stage pipelines that first generate coarse latents and then refine them progressively. Diffusion Transformers (DiT) hybridize attention mechanisms with diffusion steps for scalable video modeling, enabling efficient handling of long sequences via causal masking and rotary positional encodings, as seen in Open-Sora's design, which achieves high-quality outputs with reduced parameter counts through expert mixtures and flow-matching alternatives to traditional denoising. Inference optimizations include adaptive sampling schedules that align step counts with perceptual quality, cutting generation time by up to 50% without quality loss, alongside hardware-specific accelerations like TensorRT for transformer-based models, which fuse operations and quantize weights to 8-bit precision for 2-4x speedups on GPUs. Additional strategies encompass distillation to smaller student models, zero-shot conditioning to avoid full retraining, and tokenization efficiencies like VidTok, which chunks videos into compact representations to lower memory footprints during both training and inference.
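The savings from latent-space generation come directly from element counts. The sketch below assumes a hypothetical VAE with 8x spatial and 4x temporal downsampling and 4 latent channels (common design points, not any specific model's configuration):

```python
def tensor_elems(frames, height, width, channels):
    """Number of scalar elements in a video tensor."""
    return frames * height * width * channels

# A 2-second, 24 fps clip at 576x1024 RGB in pixel space...
pixels = tensor_elems(48, 576, 1024, 3)
# ...versus its latent under 8x spatial / 4x temporal downsampling, 4 channels.
latents = tensor_elems(48 // 4, 576 // 8, 1024 // 8, 4)
compression = pixels / latents  # 192x fewer elements to denoise per step
```

The 192x factor is (8 × 8 spatial) × (4 temporal) × (3/4 channel ratio), which is where the 10-100x compute reductions cited above originate before accounting for VAE overhead.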
These techniques collectively enable broader accessibility, though they often trade marginal fidelity for practicality in resource-constrained settings.

Key Models and Comparative Analysis

Pioneering and Open-Source Models

One of the earliest open-source text-to-video models was Alibaba's ModelScope Text-to-Video Synthesis, a multi-stage diffusion model with 1.7 billion parameters capable of generating videos from English text descriptions using a UNet3D architecture. Released in 2023, it marked a foundational step in accessible diffusion-based video generation by providing pre-trained weights and inference code, though outputs were limited to short clips with moderate fidelity due to training on constrained datasets. In 2022, THUDM's CogVideo emerged as another pioneering effort, employing autoregressive transformer architectures to produce coherent video sequences from textual prompts, with initial versions generating 4-second clips at 240x426 resolution. Its open-source release facilitated rapid experimentation, influencing subsequent models by demonstrating scalable autoregressive generation, albeit with challenges in temporal consistency and computational efficiency. AnimateDiff, introduced in 2023, advanced open-source capabilities by integrating lightweight motion modules into existing text-to-image models, enabling animation without full retraining. This plug-and-play approach generated 16-24 frame videos at 512x512 resolution, prioritizing motion smoothness over novel content creation, and spurred community extensions like custom adapters for longer sequences. Stability AI's Stable Video Diffusion, released on November 21, 2023, represented a significant milestone as the first open foundation model extending Stable Diffusion to video, supporting text-to-video and image-to-video synthesis for 14-25 frames at 576x1024 resolution.
Trained on millions of video-text pairs, it achieved higher realism through latent diffusion techniques but required substantial GPU resources for inference, such as high-end consumer GPUs (e.g., an RTX 4090) or cloud instances; its open weights on Hugging Face enabled local generation and fine-tuning without reliance on proprietary APIs. Subsequent developments include Genmo's Mochi 1, noted for smooth motion, accurate prompt adherence, and uncensored outputs; Tencent's HunyuanVideo, offering accurate prompt adherence alongside multi-language text-to-video support; Wan 2.2 (the 14B version), enabling text-to-video and image-to-video generation at high resolutions; SkyReels, aimed at cinematic realism; Lightricks' LTX-Video, recognized for strong performance; and THUDM's CogVideoX, versatile across workflows. These models, available on Hugging Face, can be run locally via interfaces like ComfyUI or SwarmUI and integrated through the Diffusers library for building AI video generation tools. They build on earlier efforts, offering specialized strengths in motion, accessibility, and efficiency while remaining open source for community adaptation, though they continue to face challenges in long-form generation and resource demands.

Proprietary Leaders (Sora, Runway, Kling, etc.)

OpenAI's Sora, first previewed on February 15, 2024, is a flagship text-to-video model capable of generating high-definition videos up to 20 seconds in length from textual prompts, emphasizing visual quality and prompt adherence through diffusion transformer architectures, though it can still produce errors such as object deformation or unnatural movement in complex physics, multi-character interactions, or extreme actions. Full public access via sora.com launched on December 9, 2024, initially supporting videos up to 1080p resolution and 20 seconds, with integration into ChatGPT for Plus and Pro subscribers. An upgraded Sora 2, released September 30, 2025, introduced synchronized audio generation including dialogue and ambient sounds, improved physics simulation, and enhanced consistency such as reduced flickering, alongside a dedicated app for remixing and inserting user appearances into clips. As of February 2026, Sora 2 Pro supports the 9:16 vertical format and maintains consistent character appearance via a "character cameo" feature. OpenAI shifted API access from Microsoft Azure to the OpenAI API, with per-second pricing substantially lower than prior structures to support scalability. Initially available in the United States and Canada, the rollout excludes regions including the EU and Australia. Access remains gated behind paid tiers, with daily usage quotas based on subscription level to manage computational demands; generation prohibits depictions of real persons, especially in image-to-video modes, and restricts violent or sensitive content per OpenAI policies. Runway ML's Gen-3 Alpha, unveiled June 17, 2024, powers proprietary text-to-video, image-to-video, and text-to-image tools through joint training on video and image datasets, enabling coherent motion and stylistic control for clips up to a 10-second base duration, extendable by generating sequential clips and merging them.
A Turbo variant followed in August 2024, offering roughly sevenfold speed increases at half the cost while maintaining output fidelity for clips up to several seconds. Users access these via Runway's platform with credit-based subscriptions; standard tiers provide limited monthly generation (e.g., 62 seconds of Gen-3 video), and a free plan offers limited credits for short videos. The model excels at integrating text overlays and novel scene dynamics but requires precise prompting for optimal results. As of late 2024/early 2025, features and availability change rapidly; users should check official sites for current status. Kuaishou's Kling AI, debuting June 10, 2024, employs a diffusion-based architecture with 3D spatio-temporal joint attention to produce fluid, high-fidelity videos from text or image prompts, supporting up to two minutes at 1080p resolution in select plans and known for realistic motion in text-to-video and image-to-video generation. Subsequent iterations include Kling 1.6 in December 2024, with enhanced generation stability, and Kling 2.5 Turbo in September 2025, which improves reference fidelity in elements like color, lighting, and texture while accelerating inference. Available through Kuaishou's platform with a credit system, Kling provides free daily credits for text-to-video generation and prioritizes realistic motion modeling but faces regional access restrictions outside China. In 2025 comparisons among Runway, Kling, Pika, and Luma Dream Machine, Kling AI was widely regarded as the strongest for realistic, cinematic slow-motion footage, including fashion content, excelling in natural motion and physics; Runway Gen-3 ranked a strong second for artistic cinematic styles and prompt adherence, while Luma Dream Machine and Pika Labs, though capable, generally ranked lower in realism and motion quality for cinematic slow-motion use.
All such tools prohibit explicit NSFW content, with platform filters constraining borderline prompts. Other notable proprietary entrants include Google's Veo 3.1, which emphasizes realism and photorealism, with native 9:16 vertical support for TikTok and Reels, text-to-video generation, and improved character consistency using reference images (the "Ingredients to Video" feature) for persistent characters, expressions, and objects across scenes; free access is limited to a waitlist in Google Labs/VideoFX. Runway Gen-4 provides faster generation (e.g., 10-second videos in about 30 seconds via its Turbo variant), supports the 9:16 aspect ratio and text-to-video generation, and offers strong character consistency using reference images for characters, locations, and objects across scenes, alongside professional control tools such as reference-driven consistency and advanced prompting for motion and scenes.
Veo 3.1 is integrated with Gemini for consumer access, generating videos with synchronized sound from text prompts or uploaded images; it requires Google AI Pro or Ultra subscriptions for credit-based generation and is integrated into YouTube Shorts, enabling AI generation of short videos from text or image prompts directly within the platform. Luma AI's Dream Machine, powered by the Ray3 model, supports text-to-video and image-to-video with features such as keyframe control, video extension, looping, character consistency, and modification via natural-language prompts; it generates coherent multi-shot videos of up to approximately 10 seconds, emphasizes natural motion, suits creative storytelling and fantastical scenes, and offers a free tier with limited generations per month. MiniMax's Hailuo AI generates videos from text and image prompts with enhanced motion smoothness and style consistency. Pika Labs, founded in April 2023 by former Stanford AI PhD students Demi Guo and Chenlin Meng, offers user-friendly text-to-video and image-to-video generation of up to 12 seconds or more, suited to stylized clips including cute or imaginative content, with a free plan featuring daily credits and watermarked videos; its models, such as Pika 2.1, focus on text-to-video generation with API access, facilitating rapid iteration for creative workflows. Both operate under subscription models as of 2025. Given detailed prompts (e.g., "a cute fluffy alien creature with big sparkling eyes exploring a glowing forest"), these tools can generate whimsical scenes effectively. As of late 2025, leading AI video generators for creating historical-evolution videos from text prompts include Kling AI, Runway Gen-3, Luma Dream Machine, and Google Veo.
These tools excel in temporal consistency, realism, and longer video durations (up to 1-2 minutes or more with extensions), making them suitable for depicting sequential historical or evolutionary processes. Kling AI is frequently praised for high-quality motion and complex scene handling. OpenAI Sora remains limited in access but is highly anticipated for advanced capabilities in 2026. No single tool is definitively "best" for 2026, as the field evolves rapidly. For videos exceeding native durations, a common workaround involves generating multiple short clips sequentially and combining them in editing software such as CapCut or Adobe Premiere. For very long durations spanning minutes, hybrid tools like Synthesia, HeyGen, Pictory, or InVideo utilize virtual presenters or stock footage to produce extended text-to-video content without strict length limits. In 2026, leading models for high resolution include Google Veo 3.1 (native 4K with upscaling to 8K via post-processing in tools like Gemini), Luma Ray3 (Hi-Fi 4K HDR), and LTX-2 (native 4K at 50fps). Native 8K generation remains uncommon, typically relying on upscaling; other prominent options like OpenAI Sora 2 and Runway Gen-4.5 generally output at 4K or lower resolutions. In 2025 evaluations, top proprietary AI video generators included Google Veo 3, recognized as best overall for cinematic quality, audio integration, and usability; Kling for visual realism and motion; OpenAI Sora for prompt adherence, storyboarding, and community features; Runway for advanced editing tools; and Hailuo MiniMax for excellent prompt adherence and speed. Rankings varied by source, with Veo 3 and Sora frequently highlighted as leaders. These leaders maintain closed architectures to protect training data and IP, contrasting open-source alternatives, though their outputs often require post-processing for production use due to inconsistencies in long-form coherence. 
As of February 2026, top free AI video generators (with free tiers or fully free access) include Sora 2 (OpenAI) as the best overall free option with high-quality text-to-video and audio sync, requiring no subscription but with limited resolution and duration in some modes; Runway, offering a strong free tier for creative generation, image-to-video, and editing on a credits-based system; Kling AI, providing free daily credits for photorealistic human motion and lip-sync; Pika, with free monthly credits suited for creative, social media-focused videos; and Luma Dream Machine, allowing free draft videos though limited and often watermarked. Many impose restrictions such as limited credits, watermarks, or short clip lengths, with the "best" option varying by use case, such as realism versus creativity.

Performance Metrics and Benchmarks

Text-to-video models are evaluated using a combination of automatic metrics assessing visual fidelity, temporal dynamics, and semantic alignment, alongside human preference studies to capture subjective quality. Key automatic metrics include Fréchet Video Distance (FVD), which quantifies distributional differences between generated and reference videos while incorporating temporal structure; lower scores indicate better performance, with advanced models achieving FVD values below 200 on standard datasets such as UCF-101. Fréchet Inception Distance (FID) measures per-frame realism, with state-of-the-art open-source models reporting FID scores around 10-20 on benchmarks like MSRVTT. CLIPScore evaluates text-video alignment by computing the cosine similarity between text embeddings and video frame features, where scores exceeding 0.3 typically indicate strong prompt adherence. Comprehensive benchmarks dissect performance across granular dimensions to address limitations in holistic metrics like FVD, which can overlook specific failures such as flickering or subject inconsistency. EvalCrafter, introduced in 2023 and updated through 2024 evaluations, assesses models on 700 diverse prompts using 17 metrics spanning visual quality (e.g., aesthetics and sharpness via LAION-Aesthetics), content quality (e.g., object presence via DINO), motion quality (e.g., warping error and amplitude classification), and text-video alignment (e.g., CLIP-based scores), with overall rankings derived from weighted human preferences aligning objective scores to user favorability. VBench, with its 2025 iteration VBench-2.0, employs a hierarchical suite of 16+ dimensions including subject consistency, temporal flickering (measured via frame-to-frame variance), motion smoothness (optical-flow-based), and spatial relationships, normalizing scores between approximately 0.3 and 0.8 across open and closed models; human annotations confirm alignment with automatic evaluations, revealing persistent gaps in long-sequence consistency.
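The CLIPScore-style alignment check reduces to a cosine similarity. The sketch below uses toy stand-in vectors rather than real CLIP embeddings, and omits the rescaling that the full metric applies:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_score(text_emb, frame_embs):
    """Score a video as the clipped cosine between the text embedding and
    the mean of the per-frame embeddings."""
    dim = len(text_emb)
    n = len(frame_embs)
    mean_frame = [sum(f[i] for f in frame_embs) / n for i in range(dim)]
    return max(0.0, cosine(text_emb, mean_frame))
```

In practice the embeddings come from a CLIP text encoder and image encoder applied to sampled frames, and scores are averaged over a prompt set.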
T2V-CompBench, presented at CVPR 2025, focuses on compositional abilities with multi-level metrics (MLLM-based, detection-based, tracking-based) to probe complex scene interactions, highlighting deficiencies in attribute binding and temporal ordering. Proprietary models often outperform open-source counterparts in practical benchmarks emphasizing real-world deployability, such as maximum video length, resolution, and generation efficiency, though direct quantitative comparisons are constrained by limited access and datasets. OpenAI's Sora supports 1080p videos up to 60 seconds at 24 FPS, enabling complex multi-shot narratives with high fidelity, as demonstrated in its February 2024 previews, surpassing the earlier 5-10 second limits of models like Gen-3. Kling achieves 720p-1080p outputs of 5-10 seconds at 24-30 FPS with render times of 121-574 seconds, excelling in motion realism per user tests. Gen-3 targets clips of 4-8 seconds at 24 FPS with roughly 45-second render times, prioritizing cinematic versatility. These capabilities reflect scaling laws in which increased parameters and compute correlate with improved quality, yet benchmarks like Video-Bench reveal discrepancies between automatic scores and human-aligned preferences, with MLLM evaluators (e.g., GPT-4V) exposing over-optimism in metrics for dynamic scenes. Academic evaluations lag commercial releases, as models like Sora evade full benchmarking until open APIs emerge, underscoring the need for standardized, accessible protocols to mitigate evaluation biases toward accessible open-source systems.

Evolution of Capabilities Across Iterations

Early text-to-video models, emerging around 2022, relied on extensions of image diffusion techniques and produced clips typically limited to 2-5 seconds in duration, with resolutions under 256x256 pixels, frequent motion artifacts, and poor temporal coherence, such as unnatural object deformations or inconsistent backgrounds. These limitations stemmed from challenges in modeling spatiotemporal dependencies, often addressed via cascaded architectures separating spatial and temporal generation. By 2023-early 2024, iterations like Runway's Gen-2 introduced hybrid diffusion-transformer architectures, extending clip lengths to 4-16 seconds and supporting inputs beyond text, such as images for stylized extensions, while improving prompt adherence through better spatiotemporal factorization for motion. Runway's Gen-3 Alpha, released June 2024, advanced this via large-scale multimodal training on proprietary infrastructure, enabling video-to-video conditioning, higher stylistic control, and sequences up to 10 seconds at 720p with enhanced world simulation for plausible physics and multi-entity interactions. Similarly, Kling AI's initial 2024 release supported up to 10-second clips with basic motion brushes for localized edits, evolving by mid-2025 to Kling 2.0/2.5, which added cinematic lighting, slow-motion fidelity, and durations exceeding 2 minutes through upgraded architectures and diffusion priors. OpenAI's Sora, announced February 2024, represented a pivotal advance by scaling transformer-based spatiotemporal patches to generate up to 60-second videos at 1080p, achieving superior fidelity, causal motion (e.g., realistic bouncing), and multi-shot consistency via a unified video tokenizer trained on internet-scale data. Sora 2, launched September 2025, further refined these capabilities with explicit physics simulation layers, reducing hallucinations in dynamic scenes and adding precise controllability for elements like camera paths, while maintaining or extending length capabilities.
Across models, iterative gains correlated with compute scaling (often 10-100x increases per version) and dataset curation emphasizing high-quality video frames, yielding measurable uplifts in benchmarks like VBench for motion smoothness (from ~0.6 to ~0.9 normalized scores) and in human preference evaluations. By 2026, these tools and others are expected to offer longer videos, better coherence, and more advanced features as AI video synthesis continues to progress rapidly.
| Model iteration | Release date | Key capability advances | Max duration | Resolution |
|---|---|---|---|---|
| Runway Gen-2 | 2023 | Image-conditioned generation, improved prompt fidelity | 4-16 s | — |
| Runway Gen-3 Alpha | June 2024 | Multimodal (text/image/video) inputs, enhanced temporal modeling | 10 s+ | 720p |
| Sora (v1) | Feb 2024 | Spatiotemporal transformers, complex scene causality | 60 s | 1080p |
| Sora 2 | Sep 2025 | Physics-aware simulation, advanced controls | 60 s+ | 1080p |
| Kling 1.x | Mid-2024 | Motion brushes, basic 3D awareness | 10 s | 720p |
| Kling 2.0/2.5 | 2025 | Cinematic aesthetics, extended sequencing | 2 min+ | 1080p |
These evolutions reflect a shift from frame-by-frame synthesis to holistic video understanding, though persistent gaps remain in long-form coherence and rare-event generation, as evidenced by failure modes in benchmarks like Dynabench, where later models still score below 0.8 on edge-case dynamics.

Applications and Broader Impacts

Creative and Commercial Deployments

Text-to-video models enable filmmakers and artists to prototype scenes, generate storyboards, and experiment with cinematic styles efficiently. OpenAI's Sora, released in February 2024 and updated to Sora 2 in September 2025, supports video generation up to one minute in length, allowing creators to produce photorealistic, animated, or surreal content from textual descriptions; collaborations with artists such as Minne Atairu have demonstrated its use in artistic video explorations adhering closely to prompts. Runway ML's tools, including Gen-3 and Gen-4, facilitate scene editing and background replacement in film production, with applications in previsualization for independent shorts and feature films. In advertising and marketing, these models accelerate content creation for commercials and campaigns. Commercial platforms provide AI-driven generation for professional ads, enabling teams to produce customized videos without traditional shooting constraints and granting users full commercial rights to outputs; such tools have been used to fabricate CGI product advertisements from text prompts, in some cases generating full promotional videos that simulate high-value production at minimal cost. These models also support specialized content, such as Spanish-language horror videos set in 2026. Creators begin by writing the story in Spanish, dividing it into scenes with detailed visual descriptions, for example, "Un pasillo oscuro en 2026 con luces parpadeantes y sombras que se mueven" ("A dark hallway in 2026 with flickering lights and moving shadows"). Clips are then generated scene by scene using text-to-video tools that support Spanish prompts, including Kling AI for videos up to two minutes, Runway Gen-3 or Gen-4 for high-fidelity cinematic results, and Luma Dream Machine for realistic or stylized outputs. Prompts might specify, "En 2026, una figura siniestra emerge de la niebla en una ciudad abandonada, estilo terror psicológico, iluminación oscura, cámara lenta" ("In 2026, a sinister figure emerges from the fog in an abandoned city, psychological-horror style, dark lighting, slow motion"). Narration in Spanish with eerie tones is added via AI voice tools like ElevenLabs or PlayHT.
Final assembly involves editing the clips, incorporating royalty-free horror music from sources like Epidemic Sound or the YouTube Audio Library, and applying effects in software such as CapCut, DaVinci Resolve, or Adobe Premiere before exporting the combined video. For longer videos, multiple clips are generated and stitched together. As of February 2026, leading AI tools for generating animated cartoons from text prompts, scripts, or stories include Invideo AI, which produces cartoon videos featuring AI-generated scripts, multilingual voiceovers, subtitles, and text-based editing capabilities; Vyond, focused on animated character videos that convert prompts or scripts into scenes with character movements, voiceovers, and timeline editing; and Krikey AI, enabling text-to-3D animations with talking avatars, lip-synced voiceovers, and customizable 3D characters. Complementary options such as Luma Dream Machine provide character consistency in animations, while OpenAI's Sora delivers high-quality story-to-video generation, including animated styles through targeted prompts. E-commerce platforms leverage text-to-video generation for personalized product videos, automating script-to-visual workflows to enhance conversion rates through dynamic demonstrations. In broadcasting, Hour One's NVIDIA-accelerated platform converts text into videos featuring virtual humans for news and training content, streamlining production for outlets requiring rapid, scalable output. To produce videos exceeding typical clip durations, such as beyond 6 seconds, practitioners generate multiple sequential segments from extended or continued text prompts and combine them in video editing tools like CapCut or Adobe Premiere; models like Kling AI support up to two minutes natively, and Runway Gen-3 enables 10-second clips extendable through this method. These deployments highlight efficiency gains, though outputs often require human post-editing for narrative coherence and brand alignment.
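The segment-and-stitch workflow described above can be sketched as a pipeline: split the script into scenes, generate one short clip per scene, and concatenate the results. The blank-line scene delimiter and the `generate_clip` stub are illustrative assumptions standing in for a real text-to-video API call:

```python
def split_scenes(script, delimiter="\n\n"):
    """Split a story into scene descriptions (blank-line delimited here)."""
    return [s.strip() for s in script.split(delimiter) if s.strip()]

def generate_clip(prompt, seconds=5, fps=24):
    """Stand-in for a text-to-video API call; returns labeled placeholder frames."""
    return [f"{prompt[:20]}|frame{i}" for i in range(seconds * fps)]

def stitch(clips):
    """Concatenate per-scene clips into one frame sequence for final editing."""
    video = []
    for clip in clips:
        video.extend(clip)
    return video

script = "A dark hallway with flickering lights.\n\nA figure emerges from the fog."
scenes = split_scenes(script)
video = stitch(generate_clip(s) for s in scenes)  # two 5 s clips -> 240 frames
```

In a real workflow the stitching step would hand the clips to an editor such as CapCut or Premiere, where transitions, audio, and color grading smooth the seams between segments.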
In early 2026, popular AI video generation tools for monetized content creators on platforms like YouTube and TikTok include OpenAI's Sora for strong narrative coherence and platform integration, Google Veo 3 for high-fidelity and physics-aware motion, Runway Gen-4/4.5 for advanced controls and professional quality, Kling AI for realistic humans and lip-sync, Luma Dream Machine for fast cinematic output, and Pika for creative effects. These tools enable efficient creation of high-quality text-to-video or image-to-video content for shorts, long-form videos, and marketing to drive views and revenue.

Economic Productivity Gains and Job Market Dynamics

Text-to-video models streamline video production by automating the generation of footage from textual prompts, reducing the time and labor traditionally required for scripting, storyboarding, and initial rendering. Tools such as Runway ML and OpenAI's Sora enable creators to produce promotional videos or ads in minutes rather than days, facilitating rapid iteration and cost savings in content workflows. Generative AI applications, including text-to-video, are projected to lower production costs by 10% across media sectors and by up to 30% in some segments, allowing smaller teams to scale output without proportional increases in personnel or equipment. AI-assisted video scripting alone shortens scripting phases by approximately 53%, boosting overall efficiency in commercial deployments. These productivity enhancements, however, coincide with job market shifts, particularly in visual effects (VFX), animation, and other roles vulnerable to automation. A January 2024 report from The Animation Guild, based on surveys of industry professionals, estimated that generative AI could disrupt around 204,000 U.S. jobs over three years, with one-third of respondents anticipating displacement for 3D modelers, sound editors, and broadcast video technicians due to automated generation of assets and edits. Freelance markets provide empirical evidence of early effects: occupations highly exposed to generative AI, including video-adjacent work, saw a 2% decline in contracts and a 5% earnings reduction by mid-2025. Despite displacement risks in routine tasks, text-to-video fosters new roles in AI oversight, such as prompt engineering and output refinement, while expanding demand for high-level creative direction as cheaper production enables more content volume. Broader generative AI integration, encompassing video tools, is forecast to add 1.5 percentage points to annual labor growth, potentially offsetting losses through increased economic activity and content spend.
Empirical patterns from prior automation waves suggest net job creation in adjacent fields, though transition costs—evident in entry-level roles—underscore the need for reskilling amid uneven adoption across firm sizes.

Societal and Cultural Transformations

Text-to-video models have lowered barriers to video production, enabling non-experts to generate coherent, high-fidelity clips from textual prompts, thereby expanding access to visual storytelling beyond professional studios. This shift has accelerated content creation in domains like short-form video, educational tutorials, and independent filmmaking, with tools such as OpenAI's Sora facilitating outputs that mimic live-action footage without requiring cameras, actors, or editing software. By mid-2025, Sora's public app garnered over 1 million downloads in its launch week, reflecting rapid societal uptake for personal experimentation and viral content generation. Similarly, models like Runway Gen-3 and Kling AI have supported transitions from static images to dynamic sequences, compressing traditional production timelines from weeks to minutes. Culturally, these technologies foster emergent aesthetics emphasizing spectacle, novelty, and rapid iteration, paralleling the novelty-driven appeal of early cinema, where audiences embraced experimental visuals over narrative depth. This has manifested in novel art forms, such as AI-generated music videos and abstract animations shared on social platforms, where creators leverage text-to-video for hyper-personalized narratives unbound by physical constraints. However, the abundance of synthetic footage risks eroding distinctions between authentic and fabricated content, prompting cultural reevaluations of visual evidence in journalism and historical documentation. Misinformation experts have highlighted how lifelike outputs from Sora exacerbate challenges in discerning truth, potentially undermining public discourse. On a societal level, text-to-video amplifies inequalities in cultural production while promising broader participation; affluent users or those with prompt-engineering skills gain disproportionate influence, whereas marginalized creators may face amplified competition from automated outputs.
Empirical assessments indicate risks to creative labor markets, with generative video automating storyboarding and pre-visualization tasks historically performed by artists, as evidenced by concerns over Sora's encroachment on professional workflows. Yet this also catalyzes hybrid practices where AI augments human intent, potentially enriching global creative culture through accessible tools for underrepresented voices in regions with limited resources. Brookings analyses underscore that while productivity surges, unmitigated adoption could contract employment in animation and VFX by prioritizing efficiency over artisanal craft.

Democratization of Media Production

Text-to-video models enable individuals and small teams to produce complex video content from simple textual prompts, bypassing traditional requirements for cameras, lighting, actors, and crews. This shift reduces production costs dramatically; for instance, generating a short promotional video that once required thousands of dollars in equipment and labor can now be achieved on consumer hardware for under $100 in compute fees, depending on model access. Such accessibility empowers independent creators, marketers, and small businesses to compete with larger studios, fostering a proliferation of user-generated media on platforms like YouTube and TikTok. Adoption data underscores this trend: as of 2025, 85% of content creators have experimented with AI video tools, with 52% integrating them regularly into workflows, while 50% of small businesses report using AI-generated videos for tasks like product demos, which boost conversion rates by up to 40%. Tools like Runway ML, with its Gen-3 model released in 2024, provide intuitive interfaces for rapid iteration, allowing solo creators to output cinematic clips in minutes rather than days, thus leveling the playing field against resource-intensive traditional pipelines. Open-source alternatives, accessible via limited free tiers such as the Hugging Face Inference API for models like Zeroscope and ModelScope (with rate limiting) or Replicate's initial free credits for variants like Stable Video Diffusion, further amplify this by enabling customization without proprietary subscriptions. As of February 2026, several AI text-to-video generators offer free plans with no watermark, including Pixelbin (no signup required, limited to 3 videos per month using models like Google Veo and Kling), FlexClip (watermark-free with a 1-minute limit), and Canva (basic AI video creation without watermark), though these plans are constrained by credits, video length, features, and restrictions on violent content such as dynamic animal battles (e.g., crocodile vs. snake vs. tiger); advanced realistic generations remain possible with high-end models but are limited in quality and length on free tiers, with no fully unlimited free options due to compute demands. Proprietary models like OpenAI's Sora offer higher fidelity for polished outputs accessible via APIs. The market reflects surging demand, with the text-to-video AI sector valued at $250 million and projected to reach $2.48 billion by 2032 at a 33.2% CAGR, driven largely by non-professional users seeking efficient content creation. This extends to educators and non-profits, where low-barrier tools facilitate custom animations and explainers without hiring specialists, though empirical limitations in consistency and originality persist, requiring human oversight for professional viability. Overall, these models shorten the causal chain from idea to output, prioritizing speed and scale over artisanal craft, which has expanded media diversity but also intensified content saturation online.
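The market figures quoted above imply a growth horizon that can be checked with the standard compound-annual-growth-rate relation; the dollar values and rate come from the paragraph above, and the arithmetic is only a consistency check.

```python
import math

def implied_years(start_value, end_value, cagr):
    """Years needed for start_value to reach end_value at a given CAGR."""
    return math.log(end_value / start_value) / math.log(1.0 + cagr)

# $250M growing to $2.48B at a 33.2% CAGR
years = implied_years(0.25, 2.48, 0.332)
print(f"implied horizon: {years:.1f} years")  # about 8 years, consistent with a mid-2020s base for a 2032 target
```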

Technical Challenges and Empirical Limitations

Fidelity and Consistency Shortcomings

Text-to-video models, predominantly based on diffusion processes, frequently exhibit shortcomings in visual fidelity, manifesting as degraded quality such as blurring, artifacts, and insufficient detail retention in generated frames. These issues arise from the inherent challenges in scaling image-synthesis techniques to sequential frames, where denoising struggles to maintain sharp edges and textures under temporal constraints. For instance, models trained on limited high-resolution video datasets often produce outputs with over-smoothing effects, reducing perceptual realism compared to real footage. Temporal consistency represents a core limitation, with generated videos showing flickering objects, discontinuous motions, and erratic changes in entity appearances across frames when relying solely on text prompts. This stems from the autoregressive or frame-by-frame denoising in diffusion models, which lacks robust mechanisms for enforcing inter-frame coherence without auxiliary conditioning such as reference images. Empirical evaluations reveal that even advanced architectures fail to preserve logical flow in actions, such as stable trajectories for moving subjects, leading to unnatural deformation or morphing. Spatial inconsistencies compound these problems, where elements like backgrounds or character poses deform unpredictably within individual frames or sequences, undermining narrative continuity. Diffusion-based approaches exacerbate this due to probabilistic sampling, which introduces variability that current training paradigms—often optimized for static image metrics—do not fully mitigate for dynamic scenes. Benchmarks indicate that without specialized plug-in methods for motion disentanglement or spatiotemporal augmentation, outputs diverge significantly from prompt-specified compositions, particularly in complex interactions involving multiple entities.
These fidelity and consistency deficits persist across model scales, as larger parameter counts improve single-frame quality but demand disproportionate compute for video-length coherence, highlighting a gap between image and video generation paradigms. Real-world testing underscores that human evaluators rate such videos lower on alignment and realism metrics, with temporal artifacts reducing usability in applications requiring precise motion control.
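The flicker described above can be quantified crudely. The sketch below is not any benchmark's actual metric; it is a minimal stand-in that scores a clip by the mean absolute difference between consecutive frames, so a perfectly static clip scores zero and per-frame noise raises the score.

```python
import numpy as np

def flicker_score(video):
    """Mean absolute difference between consecutive frames.

    video: array of shape (frames, height, width). Higher scores mean
    more temporal instability; real benchmarks use stronger measures
    such as optical-flow-based warping error.
    """
    diffs = np.abs(np.diff(video.astype(np.float64), axis=0))
    return float(diffs.mean())

rng = np.random.default_rng(0)
base = rng.random((1, 32, 32))
stable = np.repeat(base, 8, axis=0)                 # identical frames
noisy = stable + rng.normal(0, 0.2, stable.shape)   # per-frame noise
print(flicker_score(stable), "<", flicker_score(noisy))
```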

Scalability and Resource Constraints

Training text-to-video models necessitates immense computational resources, primarily due to the high-dimensional nature of video data, which encompasses spatial and temporal dimensions across numerous frames. Proprietary models like OpenAI's Sora require access to specialized data centers with thousands of high-end GPUs; training costs for comparable open-source alternatives such as Open-Sora 2.0 amount to around $200,000, still 5-10 times lower than estimates for leading closed systems. This disparity arises from the need to process petabytes of video data, performing trillions of floating-point operations to learn coherent motion and scene dynamics, often leveraging architectures optimized for scaling but demanding proportional increases in hardware. Inference scalability remains constrained by per-generation compute intensity, where producing a single short video clip can require GPU hours equivalent to those for hundreds of text or image generations. Text-to-video tasks, involving frame-by-frame consistency via techniques like sliding windows on short-clip training data, amplify this burden, leading to generation times of minutes to hours even on optimized servers. Services from major providers enforce strict quotas and queues to manage demand, as unrestricted access would overwhelm available infrastructure; for example, early Sora deployments limited outputs to prevent server overload. Energy consumption poses a critical bottleneck, with inference dominating 80-90% of total AI compute in data centers and text-to-video emerging as particularly power-hungry due to its multimodal complexity. Projections suggest that scaling text-to-video generation could drive annual energy use to levels comparable to India's national consumption, far exceeding text-based models. The associated carbon footprint is estimated to be orders of magnitude higher than for static image synthesis, prompting scrutiny of sustainability in deployments reliant on fossil-fuel-powered grids.
Hardware availability further limits accessibility, as consumer-grade setups lack the VRAM (often 80+ GB per GPU) needed for viable local inference, confining advanced usage to cloud providers with escalating costs—potentially $10-100 per minute of output depending on resolution and length. Architectural efforts toward efficiency, such as distilled models or quantization, offer partial mitigation but trade off against quality, underscoring a fundamental tension between capability scaling laws and practical resource limits.
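The per-minute cost range quoted above follows from back-of-envelope arithmetic. All inputs below (GPU-hours per clip, clip length, hourly GPU rate) are illustrative assumptions, not measured figures.

```python
def inference_cost_per_minute(gpu_hours_per_clip, clip_seconds, gpu_hourly_rate):
    """Back-of-envelope cost to generate one minute of video by
    chaining short clips; all inputs are illustrative assumptions."""
    clips_per_minute = 60.0 / clip_seconds
    return clips_per_minute * gpu_hours_per_clip * gpu_hourly_rate

# e.g. 0.5 GPU-hour per 10-second clip on an $8/hour cloud GPU
print(f"${inference_cost_per_minute(0.5, 10, 8.0):.2f} per minute of output")  # $24.00
```

Varying the assumed GPU-hours per clip between roughly 0.2 and 2 reproduces the $10-100 range cited above.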

Evaluation Metrics and Real-World Testing Gaps

Common automatic metrics for text-to-video models include Fréchet Video Distance (FVD), which measures distributional similarity between generated and real videos; CLIP Score, assessing text-video alignment via embedding similarity; and Inception Score (IS), evaluating visual diversity and appeal. These metrics enable scalable comparisons but often prioritize frame-level or short-sequence properties over holistic video attributes. Their limitations stem from inadequate capture of temporal dynamics, semantic reasoning, and human-perceived quality, rendering them unreliable proxies for overall performance. For instance, FVD and CLIP Score underperform in assessing motion or factual consistency, prompting reliance on human evaluations despite their subjectivity and cost. Protocols like Text-to-Video Human Evaluation (T2VHE) address this by standardizing annotator training and dynamic modules, achieving higher reproducibility while reducing costs by nearly 50%. Emerging benchmarks introduce targeted metrics, such as DEVIL's dynamics scores across multiple temporal granularities, which correlate over 90% with human ratings by emphasizing multi-granularity temporal assessment. Similarly, T2VScore combines text-video alignment and expert-mixture evaluation on datasets like TVGE with 2,543 human-judged samples. EvalCrafter extends this across video quality (via aesthetics and technicality), alignment (e.g., Detection-Score for objects), motion (e.g., Flow-Score), and temporal consistency (e.g., Warping Error), using 700 real-user prompts. Real-world testing reveals gaps in models' adherence to physics, world knowledge, and diverse scenarios: benchmarks like PhyWorldBench demonstrate failures across 1,050 prompts spanning fundamental motion, interactions, and anti-physics cases, with state-of-the-art models exhibiting violations of basic physical laws such as rigid-body dynamics.
T2VWorldBench, spanning 1,200 prompts in categories such as culture and general world knowledge, shows advanced models producing semantically inconsistent outputs lacking factual accuracy, underscoring deficiencies in commonsense integration. Evaluations remain constrained to short clips and curated prompts, limiting insights into long-form generation, user-varied inputs, and deployment-scale robustness.
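FVD's core computation is the Fréchet distance between Gaussian fits of video-feature distributions. The sketch below is a simplified variant that assumes diagonal covariances to avoid a matrix square root; real FVD extracts features with a pretrained I3D network and uses full covariance matrices.

```python
import numpy as np

def frechet_distance_diag(feats_real, feats_gen):
    """Fréchet distance between Gaussian fits of two feature sets,
    simplified to diagonal covariances. Lower means more similar."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    var1, var2 = feats_real.var(axis=0), feats_gen.var(axis=0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, (512, 16))   # stand-in for real-video features
close = rng.normal(0.0, 1.0, (512, 16))  # same distribution
far = rng.normal(2.0, 1.0, (512, 16))    # shifted distribution
print(frechet_distance_diag(real, close), "<", frechet_distance_diag(real, far))
```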

Controversies, Risks, and Policy Debates

Intellectual Property Disputes and Training Data Sourcing

Text-to-video models, such as those developed by OpenAI and Runway, rely on expansive datasets comprising billions of video clips sourced primarily from public internet repositories like YouTube, often without explicit licensing from copyright holders. This practice has sparked disputes centering on whether the ingestion and analysis of copyrighted videos for training constitutes unauthorized reproduction under copyright law. Proponents of the models argue that training processes transform data into non-expressive parameters, akin to human learning, and qualify as fair use; critics contend that mass copying undermines creators' exclusive rights over their works, depriving them of potential licensing revenue in an emerging AI data market valued at billions. A prominent case involves Runway ML, where a leaked internal document from July 2024 revealed plans to systematically download, tag, and train on thousands of videos, including copyrighted content, without permission. The document outlined categorization by attributes like camera motion and scene type, highlighting deliberate sourcing strategies that bypassed platform terms of service prohibiting such scraping for commercial AI development. Runway has not faced a direct lawsuit over this leak as of October 2025, but it echoes broader class-action suits against video AI firms; for instance, artists and creators filed claims against Stability AI and other generative AI firms, alleging unauthorized use of visual works in training datasets that extend to video generation. OpenAI's Sora model has similarly drawn scrutiny, with reports indicating training on unlicensed internet videos contributing to outputs that replicate protected elements, prompting policy shifts. In September 2025, OpenAI announced an opt-out mechanism for Sora 2, allowing copyright holders to block generation of their characters unless explicitly permitted, reversing an initial opt-in approach amid backlash from studios and the Motion Picture Association. This followed accusations that Sora's training data ingestion violated copyrights, paralleling over 25 pending U.S. suits against AI firms for similar practices across modalities. The U.S. Copyright Office's May 2025 report on generative AI training emphasized that while models do not retain literal copies, the initial data copying phase implicates reproduction rights, recommending legislative clarity on opt-out systems and licensing to balance innovation with owner protections. These disputes underscore sourcing challenges: datasets derived from web crawls often inadvertently include pirated or licensed footage, amplifying infringement risks, while licensed alternatives remain scarce due to high costs. Some firms have pursued licensing deals—one reportedly paying $1.5 billion for training data access—suggesting viable paths forward, though most text-to-video developers continue relying on fair use defenses amid unresolved litigation. Courts have issued mixed rulings; a February 2025 decision rejected a fair use defense where training deprived licensing markets, signaling potential liability for video AI if outputs compete with originals. As of October 2025, no text-to-video-specific precedent has settled the core training question, leaving models exposed to claims that could reshape data acquisition norms.

Potential for Misuse (Deepfakes, Propaganda)

Text-to-video models, such as OpenAI's Sora and variants of Stable Video Diffusion, enable the generation of highly realistic videos from textual prompts, including depictions of specific individuals performing fabricated actions or delivering false statements, thereby lowering barriers to deepfake production compared to traditional techniques. These capabilities exploit diffusion-based architectures to synthesize coherent motion and facial expressions, often indistinguishable from authentic footage without forensic analysis. Following Sora's public release as an app in September 2025, users rapidly generated unauthorized deepfakes featuring celebrities' likenesses, including well-known actors, leading to widespread backlash over privacy violations and non-consensual portrayals. The app achieved 1 million downloads within its first week, amplifying the scale of such misuse, with reports of videos depicting deceased figures in fabricated scenarios raising additional ethical concerns about posthumous consent. In response, OpenAI imposed restrictions on likeness usage and deepfake outputs under external pressure, though enforcement relies on user opt-ins and prompt monitoring, which experts note are imperfect safeguards. For propaganda, text-to-video models heighten risks of disinformation by enabling scalable fabrication of political events or speeches, potentially eroding trust in visual media during elections or conflicts. In the 2024 global elections, AI-generated videos contributed to viral misinformation, though most instances involved low-fidelity "AI slop" or memes rather than sophisticated deepfakes capable of swaying outcomes, as evidenced by post-election analyses showing no decisive electoral impact from such content. Despite this, projections for 2025 onward warn of escalating threats, given models' improving fidelity and accessibility, with peer-reviewed studies highlighting vulnerabilities in detection systems against diffusion-generated forgeries.
Empirical limitations in real-world testing underscore that while current deepfake detection achieves up to 96% accuracy in controlled settings, generalization to novel text-to-video outputs remains inconsistent.

Bias Amplification from Training Datasets

Text-to-video models are trained on expansive datasets of video clips annotated with textual descriptions, frequently derived from web-scraped content that mirrors imbalances in media representation, such as disproportionate depictions of males in executive roles or Western-centric cultural narratives. These datasets propagate empirical correlations from real-world sources, including underrepresentation of non-Western ethnicities or of females in STEM professions, which models internalize during training. In diffusion-based architectures prevalent in text-to-video generation, such as those underlying models like Sora, bias amplification arises mechanistically: the iterative denoising process optimizes for high-likelihood trajectories in latent space, thereby exaggerating dataset imbalances as the model prioritizes frequently observed patterns over rarer, equally valid ones. This results in generated videos that intensify stereotypes; for instance, prompts for "a leader addressing a team" yield outputs where male figures dominate at rates exceeding their already skewed prevalence in training videos. Studies confirm this effect scales with model depth and dataset size, where deeper networks amplify variance in biased directions due to compounded error reinforcement in generative sampling. Empirical audits of Sora, conducted via systematic prompting with gender-neutral and stereotypical cues, reveal persistent associations—e.g., certain occupations linked to a single gender in over 80% of outputs despite neutral inputs—directly attributable to reflections of societal media patterns rather than algorithmic artifacts. Analogous amplification appears in racial portrayals, where generative outputs for neutral occupation prompts overrepresent lighter-skinned individuals in high-status roles, surpassing base rates in source videos by leveraging correlated visual cues like attire or settings.
Such dynamics stem from causal dependencies in the training data: prevalent co-occurrences (e.g., "CEO" with male attire in videos) become overfitted priors, sidelining underrepresented variants absent sufficient counterexamples. While proprietary datasets obscure full quantification, open analyses indicate amplification ratios can exceed 1.5-2x relative to input distributions, as measured in controlled generation experiments. Mitigation efforts, including targeted fine-tuning on debiased subsets, show partial efficacy but falter against entrenched latent encodings from initial training. This underscores a core limitation: without curated, balanced data reflecting causal diversity in real-world variance, models risk entrenching amplified distortions that misrepresent empirical realities.
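The amplification ratio discussed above can be illustrated with a toy calculation; the counts below are invented for illustration, not drawn from any real audit.

```python
from collections import Counter

def amplification_ratio(train_labels, gen_labels, attribute):
    """Ratio of an attribute's frequency in generated outputs to its
    frequency in the training data; values above 1 indicate the model
    amplifies the training imbalance rather than merely reflecting it."""
    train_freq = Counter(train_labels)[attribute] / len(train_labels)
    gen_freq = Counter(gen_labels)[attribute] / len(gen_labels)
    return gen_freq / train_freq

# Toy counts: training set shows 70% of "CEO" clips as male;
# the model's outputs show 95% male.
train = ["male"] * 70 + ["female"] * 30
generated = ["male"] * 95 + ["female"] * 5
print(round(amplification_ratio(train, generated, "male"), 2))  # 1.36
```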

Regulatory Approaches: Innovation vs. Precautionary Principles

The precautionary principle in AI regulation posits that potential harms from technologies like text-to-video models—such as amplified misuse or disinformation—should prompt preemptive restrictions until safety is demonstrably assured, prioritizing risk aversion over unproven benefits. This approach, rooted in environmental and health precedents, has been critiqued for historically delaying innovations without commensurate evidence of reduced harms, as seen in fields where regulatory burdens exceeded empirical justifications for caution. In the context of text-to-video generation, proponents argue it necessitates upfront compliance testing to mitigate societal risks, though empirical data on AI-specific harms remains sparse relative to modeled scenarios. The European Union's AI Act, effective from August 1, 2024, exemplifies a precautionary framework applied to generative models including text-to-video systems, classifying general-purpose AI (GPAI) like OpenAI's Sora under transparency mandates rather than outright high-risk bans. Providers must disclose training data summaries, watermark outputs for detectability, and conduct risk assessments for systemic threats, with fines up to 7% of global turnover for non-compliance; text-to-video tools face added scrutiny for copyrighted material ingestion, aligning with EU copyright directives. This regime aims to preempt deepfake proliferation—evidenced by incidents like AI-generated videos influencing public discourse—but critics, including U.S.-based policy analysts, contend it imposes asymmetric burdens on European innovators, potentially ceding global leadership to less-regulated jurisdictions. In contrast, advocates of permissionless innovation favor minimal barriers, allowing text-to-video deployment with post-hoc remedies for verifiable harms, arguing that adaptive governance better fosters empirical learning and economic gains; U.S. GDP projections from AI advancement estimate trillions in value by 2030 if unhindered.
The United States lacks comprehensive federal AI statutes as of October 2025, relying instead on targeted measures like the TAKE IT DOWN Act (signed May 22, 2025), which criminalizes non-consensual intimate imagery without broadly encumbering model development. State-level responses, such as California's 2019 election ad disclosure laws and over a dozen 2024 enactments restricting political synthetics, emphasize misuse over foundational tech constraints, reflecting a view that overregulation risks echoing past tech suppressions without proportional safety dividends. Policy debates highlight tensions: precautionary models may amplify biases in regulatory bodies toward risk exaggeration, as academic and media sources often overstate AI existential threats absent causal evidence, while innovation proponents cite historical precedents where light-touch policies accelerated diffusion and self-correction, yielding net societal benefits despite initial fears. For text-to-video, empirical gaps persist—deepfake detections improved 40% via watermarking standards in 2024 trials, suggesting targeted tools may suffice over blanket precaution—yet calls for harmonized global approaches intensify, with U.S. frameworks potentially influencing global norms via market dominance.
