AI-driven audio-to-video generation for dynamic content creation via stable diffusion and CNN-augmented transformers – Scientific Reports
References
-
Zhou, P. et al. A survey on generative AI and LLM for video generation, understanding, and streaming. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2404.16038 (2024).
-
Kim, D., Joo, D. & Kim, J. TiVGAN: text to image to video generation with step-by-step evolutionary generator. IEEE Access 8, 153113–153122 (2020).
-
Vondrick, C., Pirsiavash, H. & Torralba, A. Generating videos with scene dynamics. In Advances in Neural Information Processing Systems, pp. 613–621 (2016).
-
Singh, P. & Reibman, A. R. Task-aware image quality estimators for face detection. EURASIP J. Image Video Process. 2024 (1). https://doi.org/10.1186/s13640-024-00660-1 (2024).
-
Waseem, S. et al. Multiattention-based approach for deepfake face and expression swap detection and localization. J. Image Video Proc. 14. https://doi.org/10.1186/s13640-023-00614-z (2023).
-
Bain, M., Nagrani, A., Varol, G. & Zisserman, A. Condensed Movies: Story-based Video Generation with Sparse Annotations. In ECCV. (2022).
-
Soomro, K., Zamir, A. R. & Shah, M. UCF101: a dataset of 101 human action classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012).
-
Xu, J., Mei, T., Yao, T. & Rui, Y. MSR-VTT: a large video description dataset for bridging video and language. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5288–5296 (2016).
-
Yu, S. et al. Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks. arXiv preprint arXiv:2202.10571 (2022).
-
Mittal, G., Marwah, T. & Balasubramanian, V. N. Sync-DRAW: automatic video generation using deep recurrent attentive architectures. In Proceedings of the 25th ACM International Conference on Multimedia, pp. 1096–1104 (2017).
-
Goodfellow, I. J. et al. Generative adversarial networks. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.1406.2661 (2014).
-
Villegas, R., Yang, J., Hong, S., Lin, X. & Lee, H. Decomposing motion and content for natural video sequence prediction. arXiv preprint (2017). https://arxiv.org/abs/1706.08033
-
Höppe, T., Mehrjou, A., Bauer, S., Nielsen, D. & Dittadi, A. Diffusion models for video prediction and infilling. arXiv preprint (2022). https://arxiv.org/abs/2206.07696
-
Chen, B., Wang, W., Wang, J. & Chen, X. Video imagination from a single image with transformation generation. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.1706.04124 (2017).
-
Tulyakov, S., Liu, M. Y., Yang, X. & Kautz, J. MoCoGAN: decomposing motion and content for video generation. arXiv preprint arXiv:1707.04993 (2017).
-
Yan, W., Zhang, Y., Abbeel, P. & Srinivas, A. VideoGPT: video generation using VQ-VAE and transformers. arXiv preprint arXiv:2104.10157 (2021).
-
Berg, T. L., Berg, A. C. & Shih, J. Automatic attribute discovery and characterization from noisy web data. In ECCV (2010).
-
Huang, J. et al. Large language models can self-improve. arXiv preprint (2022). https://arxiv.org/abs/2210.11610
-
Mittal, A., Wang, Z. & Divakaran, A. Sync-Draw: Synchronizing sketches with audio using LSTMs. IEEE International Conference on Multimedia and Expo (ICME). (2017).
-
Fan, L., Chen, Y. & Cheng, Y. Federated learning for mitigating bias in generative models. International Conference on Artificial Intelligence and Statistics (AISTATS). (2024).
-
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. arXiv preprint (2021). https://arxiv.org/abs/2112.10752
-
Satapathy, S. K. & Parmar, D. Video generation by summarizing the generated transcript. In 2023 3rd Asian Conference on Innovation in Technology (ASIANCON), Ravet, India, pp. 1–5 (2023). https://doi.org/10.1109/ASIANCON58793.2023.10270304
-
Shankar, M. G. & Surendran, D. An effective video captioning based on Language description using a novel graylag deep Kookaburra reinforcement learning. J. Image Video Proc. 2025 (1). https://doi.org/10.1186/s13640-024-00662-z (2025).
-
Saito, M., Matsumoto, E. & Saito, S. Temporal generative adversarial nets with singular value clipping. In Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 2830–2839 (2017).
-
Abu Sufian. AI-generated videos and deepfakes: a technical primer. TechRxiv. https://doi.org/10.36227/techrxiv.172348990.01007128/v1 (2024).
-
Mansimov, E., Parisotto, E., Ba, J. L. & Salakhutdinov, R. Generating images from captions with attention. In ICLR (2016).
-
Pan, Y., Mei, T., Yao, T., Li, H. & Rui, Y. Jointly modelling embedding and translation to bridge video and text. Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
-
Zhu, X. et al. RMER-DT: robust multimodal emotion recognition in conversational contexts based on diffusion and Transformers. Inform. Fusion. 123, 103268. https://doi.org/10.1016/j.inffus.2025.103268 (2025).
-
Esser, P., Chiu, J., Atighehchian, P., Granskog, J. & Germanidis, A. Structure and content-guided video synthesis with diffusion models. arXiv preprint (2023). https://arxiv.org/abs/2302.03011
-
Wang, R. et al. RAFT: robust adversarial fusion transformer for multimodal sentiment analysis. Array 100445. https://doi.org/10.1016/j.array.2025.100445 (2025).
-
Wang, R. et al. CIME: contextual Interaction-Based multimodal emotion analysis with enhanced semantic information. IEEE Trans. Comput. Social Syst. 1–11. https://doi.org/10.1109/tcss.2025.3572495 (2025).
-
Wang, R. et al. Contrastive-Based removal of negative information in multimodal emotion analysis. Cogn. Comput. 17 (3). https://doi.org/10.1007/s12559-025-10463-9 (2025).
-
Huang, Y., Zhu, X., Wang, R., Xie, Y. & Fong, S. A dynamic Global–Local Spatiotemporal graph framework for Multi-City PM2.5 Long-Term forecasting. Remote Sens. 17 (16), 2750. https://doi.org/10.3390/rs17162750 (2025).
-
Wang, J. et al. Knowledge generation and distillation for road segmentation in intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 1–13. https://doi.org/10.1109/tits.2025.3577794 (2025).
-
Ye, Y. et al. Advancing federated domain generalization in ophthalmology: vision enhancement and consistency assurance for multicenter fundus image segmentation. Pattern Recogn. 111993. https://doi.org/10.1016/j.patcog.2025.111993 (2025).
-
Gao, M. et al. Towards trustworthy image super-resolution via symmetrical and recursive artificial neural network. Image Vis. Comput. 105519. https://doi.org/10.1016/j.imavis.2025.105519 (2025).
-
Zhu, X. et al. A client-server based recognition system: Non-contact single/multiple emotional and behavioral state assessment methods. Comput. Methods Programs Biomed. 260, 108564. https://doi.org/10.1016/j.cmpb.2024.108564 (2024).
-
Guo, S., Li, Q., Gao, M., Zhu, X. & Rida, I. Generalizable deepfake detection via Spatial kernel selection and halo attention network. Image Vis. Comput. 105582. https://doi.org/10.1016/j.imavis.2025.105582 (2025).
-
Song, W. et al. Deepfake detection via feature refinement and enhancement network. Image Vis. Comput. 105663. https://doi.org/10.1016/j.imavis.2025.105663 (2025).
-
Salimans, T. et al. Improved techniques for training GANs. In Proc. Adv. Neural Inf. Process. Syst., pp. 2234–2242 (2016).
-
Panayotov, V., Chen, G., Povey, D. & Khudanpur, S. Librispeech: an ASR corpus based on public domain audio books. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, pp. 5206–5210 (2015). https://doi.org/10.1109/ICASSP.2015.7178964
-
Balaji, Y. et al. eDiff-I: text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint (2022). https://arxiv.org/abs/2211.01324
-
Lin, T. Y. et al. Microsoft COCO: common objects in context. In Computer Vision – ECCV 2014. ECCV 2014 Vol. 8693 (eds Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. et al.) (Springer, 2014). https://doi.org/10.1007/978-3-319-10602-1_48.
-
Blattmann, A. et al. Stable video diffusion: scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023).
-
Nyame, L. & Staphord, B. Generative artificial intelligence trend on video generation. Preprints (2024). https://doi.org/10.20944/preprints202409.0195.v1
-
Wu, J. et al. Tune-A-Video: one-shot tuning of text-to-video diffusion models. In CVPR (2023).
-
Zhang, C., Zhang, C., Zhang, M. & Kweon, I. S. Text-to-image diffusion models in generative AI: a survey. arXiv preprint arXiv:2303.07909 (2023).
-
Fan, F., Luo, C., Gao, W. & Zhan, J. AIGCBench: comprehensive evaluation of image-to-video content generated by AI. arXiv preprint (2024). https://arxiv.org/abs/2401.01651
-
Weng, W. et al. ART-V: autoregressive text-to-video generation with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7395–7405 (2024).
-
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2006.11239 (2020).
-
Blattmann, A. et al. Align your latents: High-Resolution video synthesis with latent diffusion models. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2304.08818 (2023).
-
Ma, X. et al. Latte: latent diffusion transformer for video generation. arXiv preprint (2024). https://arxiv.org/abs/2401.03048v1
-
Li, C. et al. A survey on long video generation: challenges, methods, and prospects. ArXiv (Cornell University). https://doi.org/10.48550/arxiv.2403.16407 (2024).
-
Unterthiner, T., Nessler, B., Heigold, G., Aichbauer, M. & Hochreiter, S. Towards accurate generative models of video: a new metric & challenges. arXiv preprint arXiv:1812.01717 (2018).
-
Hessel, M. et al. AViTAR: Adversarial Video-to-Audio Retrieval. arXiv preprint arXiv:2107.06818. (2021).
-
Liu, Y. et al. EvalCrafter: benchmarking and evaluating large video generation models. arXiv preprint (2023). https://arxiv.org/abs/2310.11440
-
Patterson, D. et al. Carbon emissions and large neural network training. arXiv preprint (2021). https://arxiv.org/abs/2104.10350
