Insight into Returns: How to Build Price Prediction Models Using a Systematic Approach
This article systematically analyzes the entire process of constructing predictive signals in quantitative investing. Facing the extremely low signal-to-noise ratio of financial markets, it reveals a systematic approach to building effective predictive signals by deconstructing four core components: data preparation, feature engineering, machine learning modeling, and portfolio allocation. The content originates from an article by sysls, compiled and edited by Foresight News.
How can one construct effective predictive signals in the extremely low signal-to-noise environment of financial markets? This article provides a systematic answer.
By deconstructing the four core stages of a quantitative strategy—data preparation, feature engineering, machine learning modeling, and portfolio allocation—the article shows that most strategy failures stem from problems at the data and feature levels rather than from the models themselves. It emphasizes key techniques for handling high-dimensional financial features, the scenarios each model family suits, and a crucial insight: improving signal purity by “decomposing return sources and predicting specific signals.” This serves as a reference for quantitative researchers and investors aiming to build robust, interpretable predictive systems.
Introduction
In the field of systematic investing, a predictive signal refers to a mathematical model capable of forecasting future asset returns from input feature data. The core architecture of many quantitative strategies is essentially an automated process built around generating such signals, optimizing them, and allocating assets based on them.
This process appears straightforward: data collection → feature processing → machine learning prediction → portfolio construction. However, financial prediction is a textbook case of high noise and low signal-to-noise ratio: daily volatility often runs around 2%, while true daily predictability is only about 1 basis point.
Therefore, most information within models is essentially market noise. How to build robust and effective predictive signals in such a harsh environment becomes a fundamental capability of systematic investing.
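To make the scale of the problem concrete, the two figures above can be turned into a back-of-envelope estimate of how little variance even a perfect signal could explain (the numbers are the article's illustrative ones, not measured values):

```python
# Back-of-envelope arithmetic with the illustrative figures above:
# ~2% daily volatility (noise) versus ~1 bp of true daily predictability (signal).
daily_vol = 0.02
daily_edge = 0.0001

snr = daily_edge / daily_vol   # signal-to-noise ratio
r2 = snr ** 2                  # share of daily variance a perfect signal explains

print(f"SNR = {snr:.4f}, R^2 = {r2:.6f}")  # SNR = 0.0050, R^2 = 0.000025
```

An R² on the order of 0.0025% is why out-of-sample discipline matters so much: almost everything a model sees is noise.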
Core Process Framework
A complete machine learning system for return prediction usually follows a standardized four-stage process, with each stage tightly interconnected:
Stage One: Data Layer — The “Raw Material” of Strategies
Includes traditional data such as asset prices, trading volume, fundamental reports, as well as alternative data (e.g., satellite images, consumption trends). Data quality directly determines the upper limit of the strategy’s potential. Most strategy failures can be traced back to issues at the data source rather than the model itself.
Stage Two: Feature Layer — The “Refinery” of Information
Transforms raw data into structured features that models can consume. This is the key step in which domain knowledge is condensed.
The quality of feature construction often has a greater impact than model choice.
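As an illustration of this “refinery” step, the sketch below distills a hypothetical daily price series into two classic features, trailing momentum and realized volatility (the series and window are made up for illustration):

```python
import math

# Hypothetical daily closes; in practice these come from the data layer.
closes = [100.0, 101.0, 99.5, 102.0, 103.5, 103.0, 104.8, 106.0]

# Daily log returns
rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]

# Feature 1: trailing momentum (sum of log returns over the window)
momentum = sum(rets)

# Feature 2: realized volatility (sample std-dev of daily returns)
mean = sum(rets) / len(rets)
vol = math.sqrt(sum((r - mean) ** 2 for r in rets) / (len(rets) - 1))

print(round(momentum, 4), round(vol, 4))
```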
Stage Three: Prediction Layer — The “Engine” of Algorithms
Uses machine learning models to predict future returns based on features. The main challenge is balancing model complexity: capturing nonlinear patterns while avoiding overfitting to noise. Besides directly predicting returns, models can also target specific structured signals (e.g., event-driven returns) to obtain sources of alpha with low correlation.
Stage Four: Allocation Layer — The “Realizer” of Signals
Converts predicted values into actionable portfolio weights. Classic approaches include cross-sectional ranking, long-short pairs, etc. This stage must be closely coupled with transaction cost models and risk constraints.
The entire process is a chain dependency: a weakness in any link constrains the final performance. In practice, allocating more resources to data quality and feature engineering often yields higher returns.
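The chain dependency can be sketched end to end as four pure functions, one per stage; every ticker, price, and rule below is a toy stand-in, not a real strategy:

```python
def load_data():                      # Stage 1: data layer (toy prices)
    return {"AAA": [100, 102, 101], "BBB": [50, 49, 51]}

def build_features(prices):           # Stage 2: feature layer (last daily return)
    return {t: p[-1] / p[-2] - 1 for t, p in prices.items()}

def predict(features):                # Stage 3: prediction layer
    return dict(features)             # identity stand-in for a fitted ML model

def allocate(signals):                # Stage 4: allocation layer (rank -> weights)
    ranked = sorted(signals, key=signals.get, reverse=True)
    n = len(ranked)
    return {t: (n - i) / sum(range(1, n + 1)) for i, t in enumerate(ranked)}

weights = allocate(predict(build_features(load_data())))
print(weights)  # fully invested, higher weight on the higher-signal asset
```

Because the stages compose, a bug in `build_features` silently corrupts everything downstream, which is the article's point about where failures originate.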
Data Source Classification
Feature Engineering: The Art and Science Combined
Features are quantifiable attributes that can independently or jointly predict future returns. Their construction depends heavily on a deep understanding of market mechanisms. The academic and industry communities have established several classic factor systems, such as value, momentum, size, quality, and low volatility.
Key Techniques in Feature Processing
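The original list of techniques is not reproduced here, but two steps that almost always appear at this stage are winsorization (capping outliers) and cross-sectional standardization; a minimal sketch with made-up values:

```python
import statistics

def winsorize(xs, lo, hi):
    """Clip each value into [lo, hi] to limit outlier influence."""
    return [min(max(x, lo), hi) for x in xs]

def zscore(xs):
    """Cross-sectional z-score: mean 0, unit std across assets on one date."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

raw = [0.1, 0.2, -0.1, 5.0]     # one feature across four assets; 5.0 is an outlier
clean = zscore(winsorize(raw, -1.0, 1.0))
print([round(x, 2) for x in clean])  # → [-0.41, -0.21, -0.83, 1.45]
```

Standardizing per date keeps the model comparing assets against each other rather than against the level of the feature through time.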
Model Selection Guide
After preparing features, the next step is choosing algorithms. There is no universally best model; each has its advantages suited to different scenarios.
Linear Models
Advantages: Highly interpretable, computationally efficient, and relatively resistant to overfitting. Nonlinearities can be incorporated via interaction terms.
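A tiny illustration of the interaction-term trick: the target below depends only on the product x1·x2, which no line in x1 or x2 alone can fit, yet a one-variable linear fit on the engineered feature x1*x2 recovers it exactly (the data and the closed-form OLS helper are illustrative):

```python
def fit_ols_1d(x, y):
    """Closed-form slope/intercept for y ≈ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [a * c for a, c in zip(x1, x2)]        # target is purely the interaction

inter = [a * c for a, c in zip(x1, x2)]    # engineered feature x1*x2
a, b = fit_ols_1d(inter, y)
print(round(a, 6), round(b, 6))            # → 0.0 1.0
```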
Tree Ensemble Models
Random Forests and Gradient Boosting Trees (XGBoost, LightGBM) excel at capturing nonlinear relationships and interactions automatically.
When complex interactions and nonlinearities matter, these models are preferred. They are more computationally intensive, but modern interpretability tools (such as SHAP values) have improved their transparency.
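To show the split mechanics these ensembles build on, here is a single decision stump fitted by exhaustive threshold search; it captures a step function that no straight line can, though it is far simpler than what XGBoost or LightGBM actually do (the data are made up):

```python
def fit_stump(x, y):
    """Find the threshold minimizing squared error of a two-leaf model."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - lm) ** 2 for yi in left) + \
              sum((yi - rm) ** 2 for yi in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]                        # (threshold, left_mean, right_mean)

x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]      # step at 0.5: nonlinear in x
t, lm, rm = fit_stump(x, y)
print(t, lm, rm)                           # → 0.3 -1.0 1.0
```

Gradient boosting stacks thousands of such splits, which is where both the expressive power and the computational cost come from.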
Neural Networks
Neural networks offer powerful representation capabilities, capable of modeling highly complex patterns. However, they require large data volumes, are sensitive to hyperparameters, and tend to overfit noise in low signal-to-noise environments. Use only when data is abundant and the team has deep tuning expertise.
Core Modeling Recommendations
The Art of Designing Prediction Targets
The traditional approach predicts asset returns directly, but returns mix multiple factor signals, which makes them noisy and difficult to predict. A better approach is to decompose return sources and model the specific dominant logic:
For example, stock price reactions after earnings revisions are mainly driven by the revision event itself. Predicting the “revision magnitude” or “event-period return” directly can avoid unrelated noise. Flexible design of prediction targets is a key way to improve signal purity.
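The idea can be sketched numerically: instead of predicting the full-period return, measure only the move in a short window around a (hypothetical) revision date, discarding the unrelated drift on either side. The price path and event day below are invented for illustration:

```python
closes = [100, 101, 100, 99, 104, 105, 104, 103]  # toy daily closes
event_day = 4                                     # revision lands before day 4

# Full-period return mixes the event with unrelated noise on both sides.
total_return = closes[-1] / closes[0] - 1

# Event-window return: close before the event to close after it.
event_return = closes[event_day + 1] / closes[event_day - 1] - 1

print(round(total_return, 4), round(event_return, 4))  # → 0.03 0.0606
```

Here the event window isolates a ~6% move that the 3% full-period return dilutes, which is exactly the signal-purity gain the text describes.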
From Signal to Portfolio Implementation
Predictions must be converted into actual holdings.
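One common conversion is cross-sectional ranking into dollar-neutral long-short weights; the sketch below uses made-up tickers and predictions and omits the transaction-cost and risk constraints a real allocator would add:

```python
preds = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.007, "DDD": -0.010}

ranked = sorted(preds, key=preds.get, reverse=True)
n = len(ranked)
# Map rank to a demeaned score (symmetric around zero)...
scores = {t: (n - 1) / 2 - i for i, t in enumerate(ranked)}
# ...then scale so the gross exposure (|longs| + |shorts|) equals 1.
gross = sum(abs(s) for s in scores.values())
weights = {t: s / gross for t, s in scores.items()}

print({t: round(w, 3) for t, w in weights.items()})
```

Ranking rather than using raw predicted magnitudes makes the allocation robust to the (large) estimation error in the predictions themselves.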
Building a Robust System: Key Principles
Conclusion
Predictive signals are the cornerstone of systematic investing. Their effective construction relies on a systematic grasp of the entire chain—data, features, models, and allocation.
In the low signal-to-noise battlefield of financial data, simple models combined with rigorous out-of-sample validation often outperform overly complex black-box systems. Always start with concise, interpretable frameworks, and only increase complexity gradually when necessary.