This project explores 10 years of U.S. airline performance data (2009–2018), derived from ~3GB of flight operation records. The dataset was processed using Hadoop in Cloudera, queried with Hive, and visualized using Power BI dashboards.
- Time Frame: 2009–2018
- Data Size: ~3GB (raw CSV format)
- Tools Used:
  - Hadoop (Cloudera)
  - Hive (SQL-like query execution)
  - Power BI (Visualization)
- Focus: Arrival/departure delays, on-time performance, diversions, carrier efficiency, distance vs. air time
- On-Time Arrival Rate (%)
- Average Arrival & Departure Delay (mins)
- % of Flights Delayed
- Longest Delay (hrs)
- Average Air Time (hrs)
- Carrier Delay Performance
- Scheduled vs. Actual Flight Time
- Taxi-in and Taxi-out times
- Diversion trends by airport
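The KPIs listed above can be illustrated with a minimal Python sketch. It is not the project's Hive or DAX logic: the field names (`arr_delay`, `dep_delay`, `air_time`, all in minutes), the 15-minute delay cutoff, and the sample records are assumptions made for illustration only, and the project's own definitions may differ.

```python
from statistics import mean

# Assumed cutoff in minutes; the threshold used in the project is not stated.
DELAY_THRESHOLD_MIN = 15


def summarize(flights):
    """Compute headline KPIs from a list of flight dicts.

    Each dict is assumed to hold 'arr_delay', 'dep_delay' and 'air_time'
    in minutes (hypothetical field names).
    """
    if not flights:
        raise ValueError("no flights to summarize")
    arr = [f["arr_delay"] for f in flights]
    dep = [f["dep_delay"] for f in flights]
    delayed = [d for d in arr if d >= DELAY_THRESHOLD_MIN]
    total = len(flights)
    return {
        "on_time_rate_pct": round(100 * (total - len(delayed)) / total, 1),
        "pct_delayed": round(100 * len(delayed) / total, 1),
        "avg_arr_delay_min": round(mean(arr), 1),
        "avg_dep_delay_min": round(mean(dep), 1),
        "longest_delay_hr": round(max(arr) / 60, 2),
        "avg_air_time_hr": round(mean(f["air_time"] for f in flights) / 60, 2),
    }


# Made-up records, for illustration only.
sample = [
    {"arr_delay": -5, "dep_delay": 0, "air_time": 120},
    {"arr_delay": 30, "dep_delay": 25, "air_time": 90},
    {"arr_delay": 10, "dep_delay": 5, "air_time": 150},
    {"arr_delay": 125, "dep_delay": 110, "air_time": 60},
]
kpis = summarize(sample)
print(kpis)
```

Here the on-time rate and the delayed share are complements of each other because both use the same cutoff; a dashboard that also tracks cancelled or diverted flights would need separate counts.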
Metric | Best Year | Worst Year |
---|---|---|
On-Time Rate | 2016 (75%) | 2014 (47%) |
Arrival Delay Avg | 2016 (21 min) | 2017 (46 min) |
Departure Delay Avg | 2016 (19 min) | 2014 (46 min) |
% Flights Delayed | 2016 (8%) | 2014 (38%) |
Longest Delay | 2017 (13.6 h) | 2010 (9.9 h) |
Avg Air Time | 2018 (3.16 h) | 2012 (1.21 h) |
Insight: 2016 stands out as the most efficient year. 2014 and 2017 were the most problematic for delays.
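As a rough cross-check of the "% Flights Delayed" row, the sketch below picks the best and worst year from the delayed percentages quoted in the yearly insights of this README. It assumes the ten yearly sections run in order from 2009 to 2018; the helper name `best_and_worst` is hypothetical.

```python
# Percentage of flights delayed per year, taken from the yearly insights
# in this README (assuming the sections are ordered 2009 through 2018).
pct_delayed_by_year = {
    2009: 15, 2010: 16, 2011: 28, 2012: 17, 2013: 28,
    2014: 38, 2015: 24, 2016: 8, 2017: 22, 2018: 13,
}


def best_and_worst(metric_by_year, lower_is_better=True):
    """Return (best_year, worst_year) for a metric keyed by year."""
    ranked = sorted(
        metric_by_year,
        key=metric_by_year.get,
        reverse=not lower_is_better,
    )
    return ranked[0], ranked[-1]


best, worst = best_and_worst(pct_delayed_by_year)
print(best, worst)
```

For metrics where a higher value is better, such as the on-time rate, the same helper can be called with `lower_is_better=False`.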
Dashboard:

Insights (2009):
- Flights: 3,503 | 15% delayed
- Avg Delays: Arrival 27m, Departure 30m
- Top Delayed Carriers: YV, OH
- Diversion Hotspots: IAH, SLC, DEN
Recommendations:
- Optimize carrier YV's scheduling and maintenance.
- Increase gate availability at IAH, SLC.
- Train ground staff on delay mitigation.
- Adjust scheduling buffers during peak periods.
- Monitor taxi-in/out times at high-volume hubs.
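Carrier and airport rankings like the ones in these insights could be derived along the following lines. This is a sketch under stated assumptions, not the project's actual query: the field names (`carrier`, `origin`, `arr_delay`, `diverted`), the 15-minute cutoff, attributing diversions to the origin airport, and the sample records are all hypothetical.

```python
from collections import Counter


def delayed_share_by_carrier(flights, threshold_min=15):
    """Share (%) of all delayed flights attributable to each carrier."""
    delayed = Counter(
        f["carrier"] for f in flights if f["arr_delay"] >= threshold_min
    )
    total = sum(delayed.values())
    if total == 0:
        return {}
    # most_common() yields carriers from most to fewest delayed flights.
    return {c: round(100 * n / total, 2) for c, n in delayed.most_common()}


def diversion_hotspots(flights, top_n=3):
    """Airports with the most diverted flights (by assumed 'origin' field)."""
    diverted = Counter(f["origin"] for f in flights if f["diverted"])
    return [airport for airport, _ in diverted.most_common(top_n)]


# Made-up records, for illustration only.
flights = [
    {"carrier": "YV", "origin": "IAH", "arr_delay": 45, "diverted": False},
    {"carrier": "YV", "origin": "IAH", "arr_delay": 20, "diverted": True},
    {"carrier": "OH", "origin": "SLC", "arr_delay": 60, "diverted": True},
    {"carrier": "DL", "origin": "DEN", "arr_delay": -3, "diverted": False},
    {"carrier": "YV", "origin": "IAH", "arr_delay": 5, "diverted": True},
]
shares = delayed_share_by_carrier(flights)
hotspots = diversion_hotspots(flights)
print(shares, hotspots)
```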
Dashboard:

Insights (2010):
- Flights: 9,576 | 16% delayed | Longest Delay: 9.9h
- Delayed carriers: MQ, US, DL
- Diversions from DEN, ATL, PHX
Recommendations:
- Improve carrier MQ's routing.
- Reevaluate ATL gate turnarounds.
- Predict peak delays with ML models trained on Hive query output.
- Optimize crew resourcing for UA and US.
- Coordinate taxiing schedules at DEN.
Dashboard:

Insights (2011):
- Flights: 2,383 | 28% delayed | Avg Delays: 35m arrival
- OO (62.57%) had the most flights with significant delays
Recommendations:
- Audit operations of OO and MQ.
- Reduce diversions at SFO and SEA.
- Improve weather-response turnaround.
- Introduce automated taxi routing.
- Enable dynamic gate reallocation.
Dashboard:

Insights (2012):
- Flights: 1,006 | 17% delayed | Shortest average air time
- Primary carriers: OO (70%), MQ
Recommendations:
- Consolidate OO as primary operator.
- Reduce turnaround time at SLC and DEN.
- Maintain short-haul consistency.
- Integrate data-driven scheduling using Hive.
- Use taxi-in analytics to limit ground hold.
Dashboard:

Insights (2013):
- Flights: 1,095 | 28% delayed | On-Time: 45%
- Key carriers: WN (85%), VX
Recommendations:
- Improve coordination between WN & VX.
- Reduce taxiing times at BWI, DEN.
- Optimize route planning from MCI.
- Review scheduling lags in short-hauls.
- Enhance delay notification systems.
Dashboard:

Insights (2014):
- Flights: 1,006 | 38% delayed | Worst year overall
- Avg Delays: 41m arrival, 46m departure
Recommendations:
- Redesign flight windows for EV.
- Reinforce maintenance pre-checks.
- Add gate buffer in peak slots.
- Install gate availability prediction tools.
- Focus on weather-based diversion strategy.
Dashboard:

Insights (2015):
- Flights: 1,009 | 24% delayed | Top carriers: OO, NK, MQ
Recommendations:
- Reschedule late-night flights.
- Automate diversion responses.
- Reduce LAX and SEA congestion.
- Align MQ flights with lower-delay corridors.
- Optimize HA ground operations.
Dashboard:

Insights (2016):
- Flights: 1,022 | 8% delayed | Best performing year
- Carriers: DL (66%), AS
Recommendations:
- Use DL's 2016 operations as a baseline.
- Replicate AS's air time consistency.
- Promote taxi-in strategies across hubs.
- Enable real-time ETL feedback loops.
- Expand on high-performance route modeling.
Dashboard:

Insights (2017):
- Flights: 1,011 | 22% delayed | Longest Delay: 13.58h
- Carriers: AA, B6, HA, NK
Recommendations:
- Audit 13h+ delays and implement escalation protocols.
- Reengineer AA and B6 flight planning.
- Improve taxi sequencing at BOS and JFK.
- Strengthen pre-boarding logistics.
- Apply AI to predict gate hold times.
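The audit of 13h+ delays recommended above could start from a simple filter like the one below. It is only a sketch: the `flight_id` and `arr_delay` fields (delay in minutes) and the sample records are assumptions, and an escalation protocol would sit on top of this list.

```python
def flag_extreme_delays(flights, threshold_hr=13.0):
    """Return flights delayed at least threshold_hr hours, worst first."""
    limit_min = threshold_hr * 60
    flagged = [f for f in flights if f["arr_delay"] >= limit_min]
    return sorted(flagged, key=lambda f: f["arr_delay"], reverse=True)


# Made-up records, for illustration only; delays are in minutes.
records = [
    {"flight_id": "A1", "arr_delay": 42},
    {"flight_id": "B2", "arr_delay": 815},   # about 13.58 hours
    {"flight_id": "C3", "arr_delay": 780},   # exactly 13 hours
    {"flight_id": "D4", "arr_delay": 779},
]
audit_list = [f["flight_id"] for f in flag_extreme_delays(records)]
print(audit_list)
```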
Dashboard:

Insights (2018):
- Flights: 1,939 | 13% delayed | Avg Air Time: 3.16 hrs
- Carriers: UA (89%), AS
Recommendations:
- Expand UA-AS scheduling sync.
- Reduce gate congestion at SFO, LAX.
- Add layover buffers for long-haul.
- Reinforce diversion handling protocol.
- Track long-route reliability KPIs.
Dashboard:

- Model operations on 2016, the best-performing year.
- Focus delay interventions on MQ, EV, YV, NK.
- Reduce diversions from IAH, SFO, LAX using Hive traffic patterns.
- Implement Hive-based delay prediction systems.
- Use Power BI alerts to flag SLA breaches in real time.
.
├── hive_queries/   # Hive queries for processing
├── dashboards/     # Power BI dashboards
├── data_config/    # Hadoop ingestion configs
├── dax_queries/    # DAX queries used in the project
├── insights/       # Year-wise detailed markdowns
└── README.md       # Main documentation