Evangelos

Building a Simple Momentum Portfolio in Python: From Market Data to Backtesting

Python, pandas, Financial Data, Backtesting

GitHub Repository

The full implementation for this project is available here: https://github.com/evangelos-com/momentum-portfolio-research

Introduction

This post walks through how a simple systematic investment idea can be implemented as a modular Python data pipeline.

The strategy here is a simple cross-sectional momentum approach (see Jegadeesh and Titman, 1993). The emphasis is on system design and implementation rather than the signal itself.

Specifically, the goal is to show how a clean engineering design can support:

  • time-series data processing
  • cross-sectional ranking
  • portfolio construction logic
  • backtesting workflows

System Overview

The system is structured into three main components:

  1. Signal generation
  2. Portfolio construction
  3. Backtesting engine

Each component is intentionally decoupled to improve testability and maintainability.

Data Model

The system operates on a panel dataset (a table where each row represents a specific asset at a specific point in time) with the following structure:

  • date
  • ticker
  • price
  • return

This allows both:

  • time-series operations (analysis of each ticker over time)
  • cross-sectional operations (comparison of all tickers at a given date)
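As a rough illustration of this layout, the sketch below reshapes a small wide price table into panel form and runs one operation of each kind. The tickers `AAA` and `BBB`, the prices, and the variable names are made up for the example and are not taken from the project.

```python
import pandas as pd

# Hypothetical wide price table: one column per ticker, one row per date.
wide = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 49.0, 51.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
wide.index.name = "date"

# Reshape to long (panel) form: one row per (date, ticker).
panel = (
    wide.reset_index()
    .melt(id_vars="date", var_name="ticker", value_name="price")
    .sort_values(["ticker", "date"])
    .reset_index(drop=True)
)

# Time-series operation: daily return within each ticker.
panel["return"] = panel.groupby("ticker")["price"].pct_change()

# Cross-sectional operation: average return across tickers on each date.
mean_return_by_date = panel.groupby("date")["return"].mean()
```

The first row of each ticker has no previous price, so its return is NaN. That is expected and is left in place here.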

Momentum Signal

The momentum signal is computed as the percentage change in price over a user-defined lookback window, measured in rows per ticker (so the data should be sorted by date within each ticker):

df["momentum"] = df.groupby("ticker")["price"].pct_change(window)

This produces a time-series feature representing recent price trends per asset.

A key design choice is that missing values are not removed at this stage. Instead, they are preserved for downstream handling.
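A minimal sketch of how this step might be wrapped in a function is shown below. The helper name `add_momentum` and the sample data are assumptions for illustration. Sorting inside the function is a defensive choice, since `pct_change` works on row order rather than on dates.

```python
import pandas as pd


def add_momentum(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Return a copy of df with a 'momentum' column (hypothetical helper)."""
    # pct_change counts rows, so each ticker must be in date order.
    out = df.sort_values(["ticker", "date"]).copy()
    # Compare each price with the price `window` rows earlier, per ticker.
    out["momentum"] = out.groupby("ticker")["price"].pct_change(window)
    return out


prices = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"] * 2),
        "ticker": ["AAA"] * 3 + ["BBB"] * 3,
        "price": [100.0, 110.0, 121.0, 50.0, 45.0, 40.0],
    }
)

signal = add_momentum(prices, window=2)
# The first `window` rows of each ticker have no lookback and stay NaN.
```

With a window of 2, only the last row of each ticker gets a value. The earlier rows keep NaN and are passed downstream unchanged.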

Cross-Sectional Ranking

At each date, assets are ranked by their momentum score:

df["rank"] = (
    df.groupby("date")["momentum"]
    .rank(ascending=False, method="first", na_option="bottom")
)

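# Require a valid momentum value so that NaN rows are never selected,
# even on dates with fewer than top_n valid signals
df["selected"] = (df["rank"] <= top_n) & df["momentum"].notna()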

A snapshot of the ranking output for 2020-05-27:

Ticker    Price  Momentum  Rank  Selected
TSLA      54.68     0.597     1  True
NVDA       8.49     0.371     2  True
AMZN     120.52     0.284     3  True
MSFT     173.18     0.100     4  False
HD       213.95     0.088     5  False
META     227.36     0.033     6  False
UNH      274.52     0.014     7  False
AAPL      77.07     0.014     8  False
GOOG      70.31    -0.023     9  False
JPM       86.53    -0.254    10  False

Important implementation details:

  • NaN momentum values are ranked last rather than dropped
  • selection is expressed as a boolean flag rather than filtering rows

NaN values occur where there is insufficient lookback history and are treated as a "pre-signal state", not missing data. They are retained, ranked last, and excluded from selection, allowing the dataset to remain structurally consistent.

As a result, the portfolio only becomes active once valid signals are available, which explains the flat section at the start of the equity curve.
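The pre-signal behaviour can be checked on a toy example. In the sketch below, the helper `rank_and_select` and the data are hypothetical. The explicit `notna()` guard is one assumed way to keep NaN rows out of the selection, because a rank threshold alone would still pick NaN rows on dates where fewer than `top_n` assets have a valid signal.

```python
import numpy as np
import pandas as pd


def rank_and_select(df: pd.DataFrame, top_n: int) -> pd.DataFrame:
    """Rank assets per date and flag the top_n with a valid signal."""
    out = df.copy()
    out["rank"] = out.groupby("date")["momentum"].rank(
        ascending=False, method="first", na_option="bottom"
    )
    # Assumed guard: a row is selected only if its momentum is not NaN.
    out["selected"] = (out["rank"] <= top_n) & out["momentum"].notna()
    return out


scores = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-02"] * 3 + ["2020-05-27"] * 3),
        "ticker": ["AAA", "BBB", "CCC"] * 2,
        # First date: no lookback history yet. Second date: one asset still NaN.
        "momentum": [np.nan, np.nan, np.nan, 0.10, 0.30, np.nan],
    }
)

ranked = rank_and_select(scores, top_n=2)
```

On the first date nothing is selected, so the portfolio stays empty. On the second date only the two assets with valid momentum are flagged, and every row is still present in the output.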

Portfolio Construction

The portfolio logic is implemented as a discrete rebalancing system.

At each rebalance date:

  • selected assets receive equal weights
  • weights remain constant until the next rebalance date

Step 1: Initial allocation

Take the selected stocks at this rebalance date and split the portfolio equally across them.

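# Assumes `mask` flags the selected rows on this rebalance date
# and `selected` is the corresponding subset of df
weight = 1.0 / len(selected)
df.loc[mask, "weight"] = weight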

Step 2: Weight persistence

Weights are carried forward until the next rebalance date.

The portfolio is only updated at discrete points in time and remains unchanged in between. In practice, this is equivalent to setting weights on rebalance dates and forward-filling them across subsequent days.

From an engineering perspective, this can be viewed as:

  • event-driven state updates (rebalance dates)
  • followed by state propagation over time
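The two steps can be sketched in a vectorised form as follows. This is an illustration under stated assumptions rather than the project's implementation: `build_weights`, the `rebalance_dates` argument, and the sample flags are hypothetical, and assets that are not selected on a rebalance date are assumed to get a weight of zero.

```python
import pandas as pd


def build_weights(df: pd.DataFrame, rebalance_dates) -> pd.DataFrame:
    """Equal-weight selected assets on rebalance dates, then carry forward."""
    out = df.sort_values(["ticker", "date"]).copy()
    is_rebalance = out["date"].isin(rebalance_dates)

    # Number of selected assets on each date (0 if none are selected).
    n_selected = out.groupby("date")["selected"].transform("sum")

    # Equal weight for selected rows; 0 elsewhere and on dates with no selection.
    target = out["selected"].astype(float) / n_selected.where(n_selected > 0)
    target = target.fillna(0.0)

    # Event-driven update: weights are only set on rebalance dates.
    out["weight"] = target.where(is_rebalance)
    # State propagation: forward-fill per ticker; 0 before the first rebalance.
    out["weight"] = out.groupby("ticker")["weight"].ffill().fillna(0.0)
    return out


dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"])
flags = pd.DataFrame(
    {
        "date": dates.tolist() * 2,
        "ticker": ["AAA"] * 4 + ["BBB"] * 4,
        "selected": [False, True, False, True, False, False, True, True],
    }
)

weights = build_weights(flags, rebalance_dates=dates[[1, 3]])
```

In this example `BBB` is flagged on 2020-01-06, but that is not a rebalance date, so the portfolio keeps holding only `AAA` until the next rebalance.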

Backtesting Engine

Portfolio returns are computed by aggregating weighted asset returns.

df["weighted_return"] = df["weight"] * df["return"]

portfolio_returns = df.groupby("date")["weighted_return"].sum()

This produces a daily time series of portfolio performance.
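One detail the snippet above leaves open is timing. The sketch below assumes that a weight decided on day t is applied to the return of day t+1, which is a common way to avoid using information that was not yet available. Whether the project lags weights this way is not stated, so the `lag` parameter and the helper `portfolio_returns` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def portfolio_returns(df: pd.DataFrame, lag: int = 1) -> pd.Series:
    """Aggregate weighted asset returns into one portfolio return per date."""
    out = df.sort_values(["ticker", "date"]).copy()
    # Assumption: the weight set on day t earns the return of day t + lag.
    out["held_weight"] = out.groupby("ticker")["weight"].shift(lag).fillna(0.0)
    out["weighted_return"] = out["held_weight"] * out["return"]
    # NaN products (for example the first return of a ticker) are skipped by sum().
    return out.groupby("date")["weighted_return"].sum()


dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
positions = pd.DataFrame(
    {
        "date": dates.tolist() * 2,
        "ticker": ["AAA"] * 3 + ["BBB"] * 3,
        "weight": [1.0, 1.0, 0.5, 0.0, 0.0, 0.5],
        "return": [np.nan, 0.02, -0.01, np.nan, 0.04, 0.10],
    }
)

daily = portfolio_returns(positions)
```

Here the portfolio holds only `AAA` on the last two days, because the switch to a 50/50 split decided on the final date would only take effect on the following day.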

Performance Aggregation and Benchmark Comparison

Returns are compounded to produce a cumulative performance curve:

portfolio_cum = (1 + portfolio_returns).cumprod()

The same transformation is applied to SPY (an ETF that tracks the S&P 500) to compare the strategy against simply buying and holding the market.

This ensures both the strategy and benchmark are evaluated using the same return calculation.
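A small sketch of the comparison, using one shared function for both series. All numbers are made up for the example (they are not real SPY prices or backtest results), and the helper name `cumulative_growth` is hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])

# Illustrative inputs only: daily strategy returns and benchmark prices.
strategy_returns = pd.Series([0.0, 0.02, -0.01], index=dates)
benchmark_prices = pd.Series([300.0, 303.0, 306.03], index=dates)


def cumulative_growth(returns: pd.Series) -> pd.Series:
    """Compound simple returns into the growth of 1 unit of capital."""
    # A missing return (such as the first day) is treated as 0 so the curve starts at 1.
    return (1 + returns.fillna(0.0)).cumprod()


benchmark_returns = benchmark_prices.pct_change()

comparison = pd.DataFrame(
    {
        "strategy": cumulative_growth(strategy_returns),
        "benchmark": cumulative_growth(benchmark_returns),
    }
)
```

Routing both series through the same function is what keeps the comparison fair: any convention, such as how the first day is handled, applies equally to both curves.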

Results

Below is the resulting equity curve from the backtest:

Figure 1

Observations

  • The system implements a full data pipeline from raw prices to portfolio simulation
  • The modular design allows each stage to be independently tested and modified
  • Performance depends heavily on:
    • the chosen asset set
    • signal window
    • rebalance frequency

Engineering Notes and Key Takeaways

The system is structured around three decoupled components with clear responsibilities. The SignalEngine handles signal generation and ranking, the PortfolioBuilder handles allocation logic, and the BacktestEngine focuses on performance computation. This makes each part easy to understand and modify independently.

Each step transforms a DataFrame into another DataFrame without maintaining internal state. This improves reproducibility and makes debugging more straightforward.
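To make the stateless style concrete, here is a sketch that chains two pure functions with `DataFrame.pipe`. The function names are hypothetical stand-ins and do not reflect the actual interfaces of SignalEngine, PortfolioBuilder, or BacktestEngine.

```python
import pandas as pd


def add_returns(df: pd.DataFrame) -> pd.DataFrame:
    """Pure step: returns a new frame with a 'return' column."""
    out = df.sort_values(["ticker", "date"]).copy()
    out["return"] = out.groupby("ticker")["price"].pct_change()
    return out


def add_momentum(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Pure step: returns a new frame with a 'momentum' column."""
    out = df.copy()
    out["momentum"] = out.groupby("ticker")["price"].pct_change(window)
    return out


raw = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"] * 2),
        "ticker": ["AAA"] * 3 + ["BBB"] * 3,
        "price": [100.0, 110.0, 121.0, 50.0, 45.0, 40.0],
    }
)

# Each stage takes a DataFrame and returns a new one; the input is not modified.
result = raw.pipe(add_returns).pipe(add_momentum, window=2)

# Running the same chain again on the same input gives the same output.
rerun = raw.pipe(add_returns).pipe(add_momentum, window=2)
```

Because no step mutates its input or keeps state between calls, any intermediate frame can be inspected or tested in isolation.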

Portfolio weights are carried forward between rebalance dates so that each day has an explicit view of the portfolio composition. This makes it easy to inspect how the portfolio evolves over time.

More broadly, this project highlights that financial workflows are essentially data engineering problems, which is fun for me! At their core, they involve transformations over time-series datasets, grouping operations across entities, and state propagation through time.

Running the full backtest shows that the pipeline is lightweight in practice. The process uses around 350 MB of RAM, which suggests that pandas is sufficient for this research-scale workload under the current design.
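For a rough sense of where memory goes, the footprint of a panel DataFrame can be inspected directly. The sketch below uses synthetic data of an arbitrary size, and it measures only the DataFrame itself, not total process memory, so its numbers are not comparable to the figure above.

```python
import numpy as np
import pandas as pd

# Synthetic panel: the sizes and ticker names are arbitrary, for illustration only.
n_dates, n_tickers = 1000, 10
panel = pd.DataFrame(
    {
        "date": pd.date_range("2018-01-01", periods=n_dates, freq="B").repeat(n_tickers),
        "ticker": np.tile([f"T{i}" for i in range(n_tickers)], n_dates),
        "price": np.random.default_rng(0).uniform(10, 500, n_dates * n_tickers),
    }
)

# deep=True also counts the memory held by string values.
footprint_mb = panel.memory_usage(deep=True).sum() / 1e6

# Storing the repeated ticker strings as a categorical usually shrinks the frame.
as_category = panel.assign(ticker=panel["ticker"].astype("category"))
category_mb = as_category.memory_usage(deep=True).sum() / 1e6
```

If memory ever became a constraint, converting repeated string columns such as the ticker to a categorical dtype would be one inexpensive option to try first.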

Conclusion

This project demonstrates how a simple momentum-based strategy can be implemented as a clean, modular Python system.

While the strategy itself is intentionally basic, the architecture reflects patterns commonly used in production financial data systems:

  • ETL-style preprocessing
  • signal generation and ranking pipelines
  • stateful portfolio construction
  • time-series aggregation

The full code is available here: https://github.com/evangelos-com/momentum-portfolio-research




Extra Note

This mini blog is built with Next.js, TypeScript, and React.