SNCF DelayFlow — TGV Delay Analytics Pipeline
End-to-end ETL pipeline that processes TGV punctuality data (SNCF Open Data) with PySpark. It includes distributed data transformation with Spark SQL, structured Parquet storage, a Random Forest predictive model (scikit-learn) that forecasts average delay, and an interactive Streamlit dashboard that diagnoses the root causes of delays across the infrastructure, traffic, and rolling stock dimensions.
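
As a rough illustration of the forecasting stage only, the sketch below trains a scikit-learn Random Forest regressor on synthetic data. Everything specific in it is an assumption made for the example: the feature names (`infra_share`, `traffic_share`, `rolling_stock_share`), the function `train_delay_model`, the hyperparameters, and the generated numbers are hypothetical and are not taken from the project or from SNCF Open Data. The Spark, Parquet, and Streamlit stages are left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical feature names: share of delays attributed to each cause.
FEATURE_NAMES = ["infra_share", "traffic_share", "rolling_stock_share"]


def train_delay_model(n_rows=400, seed=42):
    """Fit a Random Forest on synthetic data and report its test error."""
    rng = np.random.default_rng(seed)

    # Synthetic stand-in features, NOT real SNCF figures.
    X = rng.uniform(0.0, 0.5, size=(n_rows, len(FEATURE_NAMES)))

    # Made-up target: average delay in minutes as a noisy linear mix.
    y = 5.0 + X @ np.array([20.0, 10.0, 15.0]) + rng.normal(0.0, 1.0, n_rows)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)

    # Compare against a naive baseline that always predicts the training mean.
    mae = mean_absolute_error(y_test, model.predict(X_test))
    baseline_mae = mean_absolute_error(
        y_test, np.full_like(y_test, y_train.mean())
    )

    # Impurity-based importances, one per feature.
    importances = {
        name: float(value)
        for name, value in zip(FEATURE_NAMES, model.feature_importances_)
    }
    return model, mae, baseline_mae, importances


if __name__ == "__main__":
    _, mae, baseline_mae, importances = train_delay_model()
    print(f"MAE: {mae:.2f} min (mean baseline: {baseline_mae:.2f} min)")
    print(importances)
```

In a setup like this one, the per-feature importances are the kind of signal a dashboard could surface when breaking delays down by cause, though impurity-based importances are only a coarse indicator and do not establish causation.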