Cloud-Scale Data Engineering: Real-Time Streaming Pipelines and Intelligent Infrastructure

  • Soumya Banerjee et al.
Keywords: Real-Time Streaming, Cloud-Native Infrastructure, Data Engineering, Apache Kafka, Apache Flink, Performance Optimization, AIOps, Survival Analysis, Autoscaling, Regression Analysis

Abstract

Real-time data processing has become a strategic requirement for modern enterprises. This study investigates the performance, scalability, and resilience of cloud-scale data engineering architectures by comparing four real-time streaming pipelines: Kafka + Flink, Pulsar + Spark, Pub/Sub + Dataflow, and Kinesis + Lambda. Each configuration was deployed across leading cloud platforms using Kubernetes-based orchestration and evaluated under controlled load simulations ranging from 10,000 to 500,000 events per second. Latency, throughput, message loss rate, resource utilization, and system recovery time were analyzed using ANOVA, multivariate regression, and survival analysis. The results show that Pub/Sub + Dataflow delivers the best overall performance, with the lowest latency, highest throughput, and strongest fault tolerance, while Kinesis + Lambda trails because of higher latency and resource strain under load. Regression analysis identifies CPU usage and input load as the dominant predictors of performance. Kaplan-Meier survival curves are used to compare the operational resilience of the architectures under stress. These findings inform the design of scalable, intelligent data pipelines that use cloud-native features such as autoscaling, serverless processing, and predictive infrastructure management. The study contributes a validated framework for designing and optimizing real-time streaming systems for dynamic enterprise environments.
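
For readers unfamiliar with the survival analysis mentioned above, the following is a minimal sketch of the Kaplan-Meier estimator applied to pipeline stress runs. It is not the authors' analysis code; the function name `kaplan_meier` and the sample durations are hypothetical, and a "failure" is assumed to mean that a pipeline stopped meeting its service target before the observation window ended. Runs that were still healthy when observation stopped are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: time until failure or end of observation, one value per run
    observed:  True if the run ended in a failure, False if it was censored
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # failed and censored runs both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction of at-risk runs surviving t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: minutes under load until failure (True) or end of test (False)
durations = [5, 8, 8, 12, 15]
observed = [True, True, False, True, False]
for t, s in kaplan_meier(durations, observed):
    print(f"t={t:>3} min  S(t)={s:.2f}")
```

Each step of the curve is the estimated probability that a pipeline is still operating past time t; plotting one curve per architecture gives the kind of resilience comparison the abstract describes.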

Author Biography

Soumya Banerjee, Engineering Manager
Karan Luniya, Senior Software Engineer at DoorDash
Prakash Wagle, Senior Software Engineer

Published: 2025-01-09
Section: Regular Issue