Active · v2 in progress · Mar 2026 – ongoing

DSE Market Prediction

Forecasting next-day directional moves on the Dhaka Stock Exchange using XGBoost on engineered features — with honest, leakage-aware backtesting.

Python
XGBoost
Machine Learning
Financial Analytics
Data Science

01 — Problem

Why this project

Most published forecasting work on the DSE either uses leaky features (lookahead bias in technical indicators) or evaluates on a single train/test split, both of which inflate performance.

I wanted to see how much real, regime-robust predictive signal there is in publicly available daily data for the DSE Broad Index (DSEX) and a handful of large-caps — and how that signal degrades under walk-forward evaluation.

02 — Approach

How I tackled it

01
Pull daily OHLCV for DSEX and 10 large-caps from publicly archived end-of-day files. Align to business-day calendar.
02
Engineer features: lagged returns (1, 5, 10 days), realised volatility (5d, 20d), volume z-score, simple technical indicators (RSI, MACD), and sector dummies.
03
Frame the problem as binary classification: P(next-day return > 0). This avoids having to forecast return magnitudes, and direction is what most strategy code acts on anyway.
04
Walk-forward backtest with an expanding window — retrain every 60 trading days, evaluate on the next 60.
05
Optimise XGBoost hyperparameters via Bayesian search on the first training window only, then hold them fixed across all later windows for fairness across regimes.
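Steps 02 and 03 can be sketched roughly as below. This is a minimal illustration, not the project's actual feature code: the column names `close` and `volume`, the function name `build_features`, the reading of "lagged returns" as trailing k-day returns, realised volatility as a rolling standard deviation, and the 20-day window for the volume z-score are all assumptions. RSI, MACD and sector dummies are left out for brevity. Every feature uses only data up to the current close; only the target looks one day ahead.

```python
import numpy as np
import pandas as pd


def build_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """Trailing-only features plus a next-day binary target (illustrative)."""
    close, volume = ohlcv["close"], ohlcv["volume"]  # assumed column names
    out = pd.DataFrame(index=ohlcv.index)
    daily_ret = close / close.shift(1) - 1

    # Trailing returns over 1, 5 and 10 days, all ending at today's close.
    for k in (1, 5, 10):
        out[f"ret_{k}d"] = close / close.shift(k) - 1

    # Realised volatility, taken here as the rolling std of daily returns.
    for w in (5, 20):
        out[f"vol_{w}d"] = daily_ret.rolling(w).std()

    # Volume z-score against a trailing 20-day window (window is an assumption).
    out["volume_z"] = (volume - volume.rolling(20).mean()) / volume.rolling(20).std()

    # Target: 1.0 if tomorrow's close-to-close return is positive, else 0.0.
    # The last row has no "tomorrow", so its target is left as NaN.
    next_ret = daily_ret.shift(-1)
    out["target"] = (next_ret > 0).astype(float).where(next_ret.notna())

    # Drop warm-up rows and the final row with the unknown target.
    return out.dropna()


# Synthetic prices for demonstration only; not DSE data.
rng = np.random.default_rng(0)
idx = pd.bdate_range("2024-01-01", periods=120)
demo = pd.DataFrame(
    {
        "close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))),
        "volume": rng.integers(1_000, 5_000, len(idx)).astype(float),
    },
    index=idx,
)
feats = build_features(demo)
```

The first 20 rows are lost to the longest rolling window and the last row is lost to the target shift, so 120 input days yield 99 usable rows.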

03 — Data sources

Where the data came from

Source	Via	Rows
DSEX daily OHLCV	DSE archive CSV	~3,200 rows
Large-cap OHLCV (10 tickers)	DSE archive CSV	~32,000 rows
Sector classifications	Hand-curated mapping	—

04 — Pipeline

End-to-end flow

01
Ingest
DSE EOD CSV files → pandas DataFrame
02
Calendar alignment
join on business-day index
03
Feature engineering
lags, vol, RSI, MACD, sector
04
Target
binary: next-day return > 0
05
Walk-forward split
60-day windows, expanding train
06
XGBoost training
Bayesian-tuned hyperparams (held fixed)
07
Out-of-sample metrics
directional accuracy, AUC, log-loss
08
Backtest
long/short on signal, transaction costs included
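The last stage of the flow, long/short on the signal with costs, might look something like the sketch below. The function names, the 0.5 probability threshold, the `cost_bps` value and the 252-day annualisation factor are all assumptions for illustration; none of them are the project's actual settings or measured DSE costs.

```python
import numpy as np
import pandas as pd


def long_short_backtest(prob_up: pd.Series, next_ret: pd.Series,
                        cost_bps: float = 50.0) -> pd.Series:
    """Daily net returns for a sign-of-signal strategy (illustrative).

    prob_up  : model P(next-day return > 0), indexed by decision date
    next_ret : realised next-day return, aligned to the same dates
    cost_bps : assumed cost per unit of turnover, in basis points
    """
    # +1 (long) when the model leans up, -1 (short) otherwise.
    position = pd.Series(np.where(prob_up > 0.5, 1.0, -1.0), index=prob_up.index)

    # Turnover is the absolute change in position; the opening trade
    # from flat counts as one unit, a long/short flip counts as two.
    turnover = position.diff().abs().fillna(1.0)

    gross = position * next_ret
    return gross - turnover * cost_bps / 1e4


def annualised_sharpe(daily: pd.Series, periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio with a zero risk-free rate (assumption)."""
    return float(np.sqrt(periods_per_year) * daily.mean() / daily.std())


# Tiny hand-checkable example with made-up numbers.
dates = pd.bdate_range("2024-01-01", periods=4)
prob = pd.Series([0.6, 0.4, 0.7, 0.7], index=dates)
rets = pd.Series([0.01, -0.02, 0.01, -0.01], index=dates)
net = long_short_backtest(prob, rets, cost_bps=10.0)
```

Charging costs on turnover rather than per day means a signal that flips often pays for it, which is where much of a thin edge tends to go.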

05 — Code

A key snippet

Walk-forward training loop (simplified)

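import xgboost as xgb
from sklearn.metrics import roc_auc_score

# Assumes `df` is a date-indexed DataFrame holding the feature columns and a
# binary "target" column, and FEATURES is the list of feature column names.
WINDOW, STEP = 60, 60  # initial train length and retrain/test step, in trading days
results = []

for i in range(WINDOW, len(df), STEP):
    # Expanding window: train on everything before row i, test on the next STEP rows.
    train = df.iloc[:i]
    test  = df.iloc[i:i + STEP]

    X_train, y_train = train[FEATURES], train["target"]
    X_test,  y_test  = test[FEATURES],  test["target"]

    # AUC is undefined when a test window contains only one class
    # (possible in a short final window), so skip such windows.
    if y_test.nunique() < 2:
        continue

    # Hyperparameters are held fixed across all windows.
    model = xgb.XGBClassifier(
        n_estimators=400, max_depth=4, learning_rate=0.04,
        subsample=0.8, colsample_bytree=0.7,
        eval_metric="logloss", tree_method="hist",
    )
    model.fit(X_train, y_train)

    # Predicted probability of an up move for each test day.
    p = model.predict_proba(X_test)[:, 1]
    results.append({
        "start":     test.index[0],
        "auc":       roc_auc_score(y_test, p),
        "dir_acc":   ((p > 0.5) == y_test).mean(),
    })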

06 — Results

What it shipped

Metric	Value
Out-of-sample AUC (median)	{{TODO: 0.5x}}
Directional accuracy	{{TODO: 5x%}}
Backtest Sharpe (incl. costs)	{{TODO: x.x}}
Walk-forward windows	{{TODO}}
Top feature by importance	{{TODO}}

Caveat: Performance is regime-dependent — the model does best in trending periods and clearly worse in choppy, low-volume stretches. The directional accuracy on a single test split was misleadingly high (~62%); the walk-forward median is the honest number.
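To see why one window can flatter a model, it helps to look at how much accuracy varies by chance alone. The sketch below simulates a classifier with no signal at all over 60-day windows. It uses random coin flips, not this project's data, and the 50-window count is an arbitrary choice.

```python
import numpy as np

WINDOW = 60  # test-window length, matching the walk-forward setup

# Standard error of directional accuracy for a coin-flip classifier
# scored on a single window of WINDOW days.
se = np.sqrt(0.5 * 0.5 / WINDOW)

# Simulate 50 independent windows of pure guessing.
rng = np.random.default_rng(0)
hits = rng.random((50, WINDOW)) < 0.5  # True where a random guess is correct
window_acc = hits.mean(axis=1)

print(f"standard error per window: {se:.3f}")
print(f"median accuracy:           {np.median(window_acc):.3f}")
print(f"best single window:        {window_acc.max():.3f}")
```

The standard error on a 60-day window is about 6.5 percentage points, so a single window can sit well above 50% with no real edge behind it. The median across windows is far more stable.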

07 — Lessons

What I learned

If you only run a single train/test split, you will fool yourself. Walk-forward is the minimum bar for time-series finance work.
Most of my 'gains' from feature engineering disappeared once I closed every avenue for lookahead bias, such as using same-day VWAP as a feature.
Transaction costs and slippage eat a surprising amount of edge on DSE — bid-ask spreads on smaller stocks are wider than I expected.
Tree models beat LSTMs handily here, because the dataset is small (~3,200 days) and the structure is mostly tabular.
Next iteration (v2): adding macro and sector features, regime-conditional models, and a public live dashboard.
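One way to hunt for lookahead bias of the kind described above is a truncation check: recompute the features with all future rows removed and confirm that the value for the cut-off day does not change. The sketch below is one possible version of that idea, not the check used in this project. The helper name `has_lookahead` and the two toy feature functions are hypothetical. It applies to feature columns only, since the target looks ahead by design.

```python
import numpy as np
import pandas as pd


def has_lookahead(feature_fn, data: pd.DataFrame, n_checks: int = 5) -> bool:
    """Return True if any feature value changes when future rows are removed.

    feature_fn must map a DataFrame to a feature DataFrame on the same index.
    """
    full = feature_fn(data)
    # Test a few cut-off points spread over the second half of the sample.
    for cut in np.linspace(len(data) // 2, len(data) - 1, n_checks, dtype=int):
        truncated = feature_fn(data.iloc[: cut + 1])
        with_future = full.iloc[cut].to_numpy(dtype=float)
        without_future = truncated.iloc[-1].to_numpy(dtype=float)
        if not np.allclose(with_future, without_future, equal_nan=True):
            return True
    return False


def safe_features(d: pd.DataFrame) -> pd.DataFrame:
    # Trailing 5-day mean: uses only today and the four days before it.
    return pd.DataFrame({"ma5": d["close"].rolling(5).mean()})


def leaky_features(d: pd.DataFrame) -> pd.DataFrame:
    # Centred 5-day mean: silently includes the next two days.
    return pd.DataFrame({"ma5": d["close"].rolling(5, center=True).mean()})


# Synthetic prices for demonstration only.
rng = np.random.default_rng(1)
demo = pd.DataFrame({"close": 100 + np.cumsum(rng.normal(0, 1, 60))})

print(has_lookahead(safe_features, demo))   # trailing window
print(has_lookahead(leaky_features, demo))  # centred window
```

A check like this catches features that read rows after the cut-off, but not values that are only known after the decision time on the same day. That case still needs a manual review of when each input becomes available.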

08 — Links

References

Source on GitHub
