SNCF DelayFlow — TGV Delay Analytics Pipeline
End-to-end ETL pipeline that processes TGV punctuality data from SNCF Open Data (3 sources, ~18,000 records) with PySpark. It covers Spark SQL transformations, Parquet storage, a Random Forest regressor (scikit-learn) for delay prediction, and an interactive Streamlit/Plotly dashboard that diagnoses the root causes of delays along three dimensions: infrastructure, traffic, and rolling stock.
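
As a rough illustration of the root-cause breakdown, the sketch below computes each cause's share of late trains per route, weighted by the number of late trains. It uses plain Python rather than PySpark so that it runs without dependencies; it is not the project's actual code. The field names (`route`, `late_trains`, `pct_infrastructure`, `pct_traffic`, `pct_rolling_stock`) and all values are hypothetical and do not come from the SNCF dataset.

```python
from collections import defaultdict

# Hypothetical records, one per route and month. Field names and values are
# illustrative assumptions, not taken from the SNCF Open Data files.
RECORDS = [
    {"route": "PARIS-LYON", "late_trains": 40,
     "pct_infrastructure": 50.0, "pct_traffic": 30.0, "pct_rolling_stock": 20.0},
    {"route": "PARIS-LYON", "late_trains": 60,
     "pct_infrastructure": 25.0, "pct_traffic": 50.0, "pct_rolling_stock": 25.0},
    {"route": "PARIS-LILLE", "late_trains": 20,
     "pct_infrastructure": 10.0, "pct_traffic": 30.0, "pct_rolling_stock": 60.0},
]

CAUSES = ("infrastructure", "traffic", "rolling_stock")


def cause_shares(records):
    """Return, per route, each cause's share (in %) of late trains.

    Monthly percentages are weighted by that month's late-train count, so a
    month with many late trains counts more than a quiet one.
    """
    late_by_cause = defaultdict(lambda: dict.fromkeys(CAUSES, 0.0))
    late_total = defaultdict(int)

    for row in records:
        route = row["route"]
        late_total[route] += row["late_trains"]
        for cause in CAUSES:
            # Convert the monthly percentage into an estimated train count.
            late_by_cause[route][cause] += (
                row["late_trains"] * row[f"pct_{cause}"] / 100.0
            )

    # Routes with no late trains are skipped to avoid dividing by zero.
    return {
        route: {
            cause: 100.0 * late_by_cause[route][cause] / total
            for cause in CAUSES
        }
        for route, total in late_total.items()
        if total > 0
    }


shares = cause_shares(RECORDS)
for route, by_cause in shares.items():
    print(route, {cause: round(pct, 1) for cause, pct in by_cause.items()})
```

Weighting by late-train count rather than averaging the monthly percentages directly is a design choice of this sketch; a simple mean would give a month with 20 late trains the same influence as one with 60.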