System Case Study
Production Systems & Provenance
Building a Market Strength Dashboard: The Full Story
From problem to production: temporal correctness, regime detection, fund-flow pipelines, and the operational failures that shaped the system.
Decision-grade gap
Why the dashboard exists and what ordinary market surfaces failed to make visible.
I built the Market Strength Dashboard because I was tired of checking six different sources every morning to answer one question: is the macro environment getting stronger or weaker? FRED has the data. Tiingo has the data. ETF flow providers have the data. But nobody combines them into a single surface that respects temporal boundaries and tells you how trustworthy the signal is. The gap was not a lack of charts. It was the absence of a decision-grade view.
Decision-grade means more than pretty visualizations. It means every number on the screen was computed using only information that was available at that point in time. It means stale inputs are flagged, not hidden. It means the system tells you when a data source is degraded rather than silently interpolating. Most financial dashboards do not make these promises. They show you the latest data and let you assume it was always that clean. The distinction matters if you are making real allocation decisions based on what you see.
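A minimal sketch of what "flagged, not hidden" can look like in code. The `Reading` type, the `freshness` function, and the three-day threshold are illustrative assumptions, not the dashboard's actual implementation; the values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Reading:
    source: str            # hypothetical source label, e.g. "FRED"
    value: float
    observed_at: datetime  # when the provider last published this value


def freshness(reading: Reading, now: datetime, max_age: timedelta) -> str:
    """Return "stale" when the reading is older than max_age, else "fresh"."""
    age = now - reading.observed_at
    return "stale" if age > max_age else "fresh"


# Made-up example: a value published a week before the dashboard is viewed.
now = datetime(2024, 3, 15, 21, 30, tzinfo=timezone.utc)
reading = Reading("FRED", 3.9, datetime(2024, 3, 8, 12, 30, tzinfo=timezone.utc))
status = freshness(reading, now, max_age=timedelta(days=3))
```

The point of returning an explicit status is that the UI can render the number and its trust state together, instead of showing the last known value as if it were current. In practice the threshold would differ per source, since a monthly macro series and a daily price feed go stale on very different clocks.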
System surface
How the first screen organizes macro risk, fragility, stress, breadth, rates, and flows.
When you open the dashboard, the first screen shows an action-matrix summary: recession risk, market fragility, systemic stress, economic-health breadth, rates and risk context, and a capital-flows panel with top inflows, outflows, and sector rotation. It answers four questions at a glance: should I be leaning risk-on or defensive, is the macro backdrop improving or deteriorating, are ETF flows confirming that view, and how stale is the data behind the answer.
Source provenance
Why provider effective dates, processing time, and build timestamps have to stay separate.
The system pulls from FRED and ALFRED for macroeconomic series, Tiingo and yfinance for market data, and Polygon for ETF fund flows. Each source has its own update frequency, revision history, and publication lag. FRED series revise silently. ETF flow providers report with varying delays. Market data has its own calendar gaps around holidays and quarter-end. Treating all of these as if they update on the same schedule is the first mistake most dashboards make. Separating provider effective date, processed date, and local build timestamp into distinct fields was one of the earliest and most important architectural decisions.
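One way to picture the three-field separation is a record that carries all three timestamps, plus a lookup that answers "what did the pipeline know as of time T". This is a sketch under assumptions: the `Observation` type, its field names, and the `as_of` helper are hypothetical, and the series values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:
    series_id: str
    value: float
    effective_date: date    # the period the provider says the value describes
    processed_at: datetime  # when the pipeline ingested this vintage
    build_at: datetime      # timestamp of the local build that stored it


def as_of(records: Sequence[Observation], effective_date: date,
          as_of_time: datetime) -> Optional[Observation]:
    """Latest vintage for effective_date that was processed by as_of_time."""
    known = [r for r in records
             if r.effective_date == effective_date
             and r.processed_at <= as_of_time]
    return max(known, key=lambda r: r.processed_at, default=None)


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Invented example: one quarter's value, first release and a later revision.
q4 = date(2023, 12, 31)
records = [
    Observation("EXAMPLE_SERIES", 3.2, q4, utc(2024, 1, 26), utc(2024, 1, 26)),
    Observation("EXAMPLE_SERIES", 3.4, q4, utc(2024, 2, 29), utc(2024, 2, 29)),
]

before_release = as_of(records, q4, utc(2024, 1, 1))   # nothing known yet
first_release = as_of(records, q4, utc(2024, 2, 1))    # sees 3.2
after_revision = as_of(records, q4, utc(2024, 3, 1))   # sees 3.4
```

Keeping the revision as a second record rather than overwriting the first is what makes a silent revision visible: the same effective date can be queried at two different knowledge times and return two different values.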
The backend is Python with Flask and Gunicorn. The data pipeline uses pandas and NumPy for transformation, DuckDB and PyArrow for analytical storage, and Parquet for intermediate artifacts. Modeling includes scikit-learn and XGBoost with SHAP for feature importance, hmmlearn for Hidden Markov Model regime inference, and custom code for nowcasts, historical analogs, revision risk scoring, absorption ratio, and transfer entropy. The entire pipeline rebuilds daily at 5:30 PM ET on Railway through one authoritative endpoint. There is exactly one path for data to enter the system, which makes debugging provenance issues tractable.
Temporal correctness
Where lookahead bias entered the pipeline and how the system was rebuilt around live knowledge.
The hardest engineering problem was not building the pipeline. It was making the pipeline honest. Temporal correctness means the system never uses information it would not have had at the time. This sounds simple until you try to enforce it across dozens of data series with different publication schedules. Label embargoes in recession backtests were initially too short, letting the model see outcomes it would not have known about in real time. Nowcasts were being fit on the full history instead of walk-forward. HMM regime estimates came from full-sequence decoding with the Viterbi algorithm, which conditions each state on the entire sequence, including observations that arrive later, instead of forward-only filtered probabilities suitable for live interpretation. Each of these was a form of lookahead bias. Each made the backtests look better than the live system would actually perform. Fixing them required rebuilding the pipeline around the distinction between what the system knows now and what it knew then.
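The difference between full-sequence decoding and forward-only filtering is easiest to see in code. Below is a dependency-free sketch of the normalized forward recursion, which computes P(regime at t | observations up to t). It is not the dashboard's implementation: the function name, the two-regime parameters, and the per-observation likelihoods are all assumed values chosen for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def filtered_probabilities(start: Vector, trans: Sequence[Vector],
                           likelihoods: Sequence[Vector]) -> List[List[float]]:
    """Forward-only regime probabilities.

    start[k]          prior probability of regime k before any observation
    trans[j][k]       probability of moving from regime j to regime k
    likelihoods[t][k] likelihood of observation t under regime k
    """
    n = len(start)
    predicted = list(start)
    history: List[List[float]] = []
    for lik in likelihoods:
        # Update: weight the one-step-ahead prediction by the new evidence.
        unnormalized = [predicted[k] * lik[k] for k in range(n)]
        total = sum(unnormalized)
        if total <= 0.0:
            raise ValueError("observation has zero likelihood under every regime")
        posterior = [u / total for u in unnormalized]
        history.append(posterior)
        # Predict: push the posterior through the transition matrix.
        predicted = [sum(posterior[j] * trans[j][k] for j in range(n))
                     for k in range(n)]
    return history


# Assumed two-regime model ("calm", "stress") with invented numbers.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
liks = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.8], [0.6, 0.3]]

live = filtered_probabilities(start, trans, liks[:3])  # what was known at t=3
full = filtered_probabilities(start, trans, liks)      # rerun with later data
```

The property that matters for live use is that `full[:3]` equals `live`: appending later observations never rewrites an earlier estimate. Viterbi paths and smoothed posteriors do not have that property, which is why they flatter a backtest.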
Operational failures
What partial provider outages and mounted storage taught about surfacing trust state.
The failures that taught me the most were not the obvious ones. Partial provider failures were silently interpreted as neutral fund flows. When a data source returned incomplete data instead of an error, the pipeline treated missing values as zero movement rather than flagging the gap. The fix required explicit completeness checks: if a provider returns fewer records than expected, the pipeline marks that source as degraded.
The most important operational failure was a Railway mounted-storage incident. The platform's storage volume served older archive data than the repo image contained because Railway mounts persist across deploys and the volume had not been invalidated after a schema change. The dashboard rendered without errors but displayed stale signals. Nothing in the UI indicated anything was wrong. The fix was adding provenance checks that verify archive vintage against the current build timestamp at startup.
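Both fixes reduce to small, explicit checks. The sketch below shows one plausible shape for them; `SourceHealth`, `check_completeness`, `archive_is_current`, and the rule that a mounted archive must be at least as new as the image's build timestamp are assumptions for illustration, not the production code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceHealth:
    source: str
    status: str    # "ok" or "degraded"
    received: int
    expected: int


def check_completeness(source: str, received: int, expected: int,
                       min_ratio: float = 1.0) -> SourceHealth:
    """Mark a source degraded when it returns fewer records than expected."""
    ratio = received / expected if expected > 0 else 0.0
    status = "ok" if expected > 0 and ratio >= min_ratio else "degraded"
    return SourceHealth(source, status, received, expected)


def archive_is_current(archive_built_at: datetime,
                       image_built_at: datetime) -> bool:
    """Startup check: the mounted archive must not predate the deployed image."""
    return archive_built_at >= image_built_at


# Invented example: 20 of 26 expected flow tickers arrive.
partial = check_completeness("flows", received=20, expected=26)
complete = check_completeness("flows", received=26, expected=26)

# Invented example: a volume left over from an earlier deploy.
stale_volume = archive_is_current(
    archive_built_at=datetime(2024, 2, 1, tzinfo=timezone.utc),
    image_built_at=datetime(2024, 3, 1, tzinfo=timezone.utc),
)
```

A degraded source should propagate as a missing value plus a status, never as a zero flow. A failed archive check is better surfaced on the operations page, or used to refuse startup, than left to a UI that renders cleanly on old data.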
Production lessons
What is proven now and what the system would formalize earlier next time.
If I were starting over, three things would change. First, I would formalize mounted-storage provenance checks from day one rather than discovering the need through a production incident. Second, I would separate the UI more cleanly into confirmed state, forecast, and experimental surfaces so that users can immediately tell which numbers are based on released data and which are model projections. Third, I would tighten the quarter-end market-calendar logic behind the fund-flow heuristics. Quarter-end rebalancing creates flow patterns that look like real rotation signals but are mechanical. The current system handles this, but the logic was added reactively rather than designed in from the start.
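As a rough illustration of the quarter-end logic, the sketch below flags dates that fall within a few days of a calendar quarter end, so a flow heuristic can discount them. It is deliberately simplified: it counts calendar days rather than trading days, the three-day window is an arbitrary assumption, and the function names are hypothetical.

```python
from datetime import date, timedelta


def quarter_end(d: date) -> date:
    """Last calendar day of the quarter containing d."""
    end_month = 3 * ((d.month - 1) // 3 + 1)
    if end_month == 12:
        return date(d.year, 12, 31)
    # First day of the following month, minus one day.
    return date(d.year, end_month + 1, 1) - timedelta(days=1)


def near_quarter_end(d: date, window_days: int = 3) -> bool:
    """True when d is within window_days of the nearest quarter end."""
    upcoming = quarter_end(d)
    quarter_start = date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    previous = quarter_start - timedelta(days=1)
    return ((upcoming - d).days <= window_days
            or (d - previous).days <= window_days)


just_before = near_quarter_end(date(2024, 3, 29))
just_after = near_quarter_end(date(2024, 4, 2))
mid_quarter = near_quarter_end(date(2024, 5, 15))
```

A production version would need an exchange calendar to count trading days and handle holidays. The window check itself stays the same once dates are mapped to trading-day offsets.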
The dashboard is deployed on Railway and in active daily use. The macro surface and fund-flows panel are production-verified. The system processes 31 macro series, 19 market tickers, 26 ETF flow tickers, and 15 fund-flow composites on a daily rebuild schedule. A 10-year fund-flow archive holds 56,720 records. An operations page surfaces build status, data freshness, and source health directly rather than hiding pipeline state. A newer bearish-earnings module has a narrow operator-only watchlist API proven in production, with a broader version in planning.
Building this system taught me that the interesting engineering problems in data work are rarely about the data itself. They are about time, provenance, and trust. When does a number become available? Has it been revised since you last saw it? Can you prove that your model did not see the future? These questions are not glamorous, but getting them wrong makes everything downstream unreliable. The dashboard works not because the models are clever but because the pipeline is honest about what it knows and when it knew it.
Argument index
Concepts
Decision-grade view
A dashboard surface that reports signal strength together with freshness and trust state.
Temporal correctness
Ensuring every computation uses only information available at the time being modeled.
Source provenance
Keeping provider effective dates, processing dates, and build timestamps distinct.
Operational trust
Surfacing source degradation and storage-version mismatches instead of hiding them.
Evidence
Pipeline composition
The body names FRED, ALFRED, Tiingo, yfinance, Polygon, DuckDB, Parquet, and model libraries used in production.
Lookahead repairs
The case study identifies label embargoes, walk-forward nowcasts, and filtered HMM probabilities as corrected failure modes.
Production footprint
The article reports daily rebuild timing, source-health surfaces, and concrete archive counts from the running system.
The continuation path points readers to the project surface for broader portfolio context.