Cloud-Scale Data Engineering: Real-Time Streaming Pipelines and Intelligent Infrastructure
Abstract
Real-time data processing has become a strategic requirement for modern enterprises. This study investigates the performance, scalability, and resilience of cloud-scale data engineering architectures by comparing four real-time streaming pipelines: Kafka + Flink, Pulsar + Spark, Pub/Sub + Dataflow, and Kinesis + Lambda. Each configuration was deployed across leading cloud platforms using Kubernetes-based orchestration and evaluated under controlled load simulations ranging from 10,000 to 500,000 events per second. Five metrics were measured: latency, throughput, message loss rate, resource utilization, and system recovery time. They were analyzed using ANOVA, multivariate regression, and survival analysis. The results show that Pub/Sub + Dataflow delivers the best overall performance, with the lowest latency, highest throughput, and strongest fault tolerance, while Kinesis + Lambda trails because of higher latency and resource strain under load. Regression analysis identifies CPU usage and input load as the dominant predictors of performance. Kaplan-Meier survival curves are used to compare the operational resilience of each architecture under stress. These findings inform the design of scalable, intelligent data pipelines that use cloud-native features such as autoscaling, serverless processing, and predictive infrastructure management. The study contributes a validated framework for designing and optimizing real-time streaming systems tailored to dynamic enterprise environments.
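To make the survival-analysis step concrete, the following is a minimal sketch of a Kaplan-Meier estimator in plain Python. It is not the study's analysis code. It assumes the event of interest is "the pipeline finished recovering" after an injected fault, and that a trial is censored when observation stopped before recovery completed. The function name `kaplan_meier` and the sample durations are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Estimate the survival function S(t) at each distinct event time.

    durations: seconds until recovery, or until observation stopped
    observed:  True if recovery was seen, False if the trial was censored
    Returns a list of (t, S(t)) pairs, where S(t) is the estimated
    probability that a pipeline has NOT yet recovered by time t.
    """
    # Number of observed recoveries at each time point.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    # Number of trials leaving the risk set at each time (event or censored).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction still unrecovered.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical recovery times (seconds) for one architecture.
    times = [5, 8, 8, 12, 20]
    seen = [True, True, False, True, False]
    for t, s in kaplan_meier(times, seen):
        print(f"t={t:>3}s  S(t)={s:.2f}")
```

Under these assumptions, a curve that drops quickly toward zero indicates that most trials recovered early, which is one way to read "operational resilience" when comparing architectures.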
Letters in High Energy Physics (LHEP) is an open access journal. Articles in LHEP are distributed under the terms of the Creative Commons license CC-BY 4.0. Under this license, copyright is retained by the author, while use, distribution, and reproduction in any medium are permitted provided proper credit is given to the original authors and sources.