TABLE 1.2 Common Pitfalls in Financial ML

#   Category         Pitfall                                Solution                                                  Chapter
1   Epistemological  The Sisyphus paradigm                  The meta-strategy paradigm                                1
2   Epistemological  Research through backtesting           Feature importance analysis                               8
3   Data processing  Chronological sampling                 The volume clock                                          2
4   Data processing  Integer differentiation                Fractional differentiation                                5
5   Classification   Fixed-time horizon labeling            The triple-barrier method                                 3
6   Classification   Learning side and size simultaneously  Meta-labeling                                             3
7   Classification   Weighting of non-IID samples           Uniqueness weighting; sequential bootstrapping            4
8   Evaluation       Cross-validation leakage               Purging and embargoing                                    7, 9
9   Evaluation       Walk-forward (historical) backtesting  Combinatorial purged cross-validation                     11, 12
10  Evaluation       Backtest overfitting                   Backtesting on synthetic data; the deflated Sharpe ratio  10–16

(c) 2018 by Marcos Lopez de Prado. Reprinted with permission. All rights reserved. Full version available at https://goo.gl/w6gMdq
work so well when you run your algorithms on financial data, this book will help
you. Sometimes you may not understand the financial rationale behind some structures
(e.g., meta-labeling, the triple-barrier method, fracdiff), but bear with me: Once
you have managed an investment portfolio long enough, the rules of the game will
become clearer to you, along with the meaning of these chapters.
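To make two of those structures a little more concrete, here is a minimal NumPy sketch of fractional differentiation (fracdiff) and of a simplified triple-barrier label. It rests on our own assumptions, not on the book's code: the weights of (1 - B)^d (B being the backshift operator) are truncated to a fixed window of `n_weights` terms, the two horizontal barriers sit at a symmetric fraction `band` above and below the entry price, and touching only the vertical (time) barrier yields a label of 0. All function and parameter names are hypothetical; Chapters 3 and 5 give the full treatments.

```python
from typing import Sequence

import numpy as np


def frac_diff_weights(d: float, n_weights: int) -> np.ndarray:
    """Weights of the expansion of (1 - B)^d, B being the backshift operator.

    Uses the recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    """
    w = [1.0]
    for k in range(1, n_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)


def frac_diff_fixed_window(series: Sequence[float], d: float, n_weights: int) -> np.ndarray:
    """Fractionally differentiate a series with a truncated, fixed-width window.

    Assumption: the expansion is cut after n_weights terms. The first
    n_weights - 1 outputs are NaN because their window is incomplete.
    """
    w = frac_diff_weights(d, n_weights)
    x = np.asarray(series, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(n_weights - 1, len(x)):
        # Reverse the window so that w_0 multiplies x_t and w_k multiplies x_{t-k}
        window = x[t - n_weights + 1 : t + 1][::-1]
        out[t] = np.dot(w, window)
    return out


def triple_barrier_label(prices: Sequence[float], entry: int,
                         band: float, max_holding: int) -> int:
    """Label one entry by the first barrier its price path touches (simplified).

    Upper barrier: entry price * (1 + band)  -> +1 (profit taking)
    Lower barrier: entry price * (1 - band)  -> -1 (stop loss)
    Vertical barrier: max_holding steps later, or the last observation -> 0
    """
    p0 = prices[entry]
    upper, lower = p0 * (1 + band), p0 * (1 - band)
    last = min(entry + max_holding, len(prices) - 1)
    for t in range(entry + 1, last + 1):
        if prices[t] >= upper:
            return 1   # profit-taking barrier touched first
        if prices[t] <= lower:
            return -1  # stop-loss barrier touched first
    return 0           # only the vertical (time) barrier was reached
```

With d = 1 the weights are [1, -1, 0, ...], so the filter reduces to ordinary first differencing, which keeps no memory beyond one lag; a fractional d such as 0.5 gives slowly decaying weights (1, -0.5, -0.125, ...), and the window width becomes a truncation choice. Likewise, because the triple-barrier label depends on the price path rather than on the price at one fixed date, two entries with the same end-of-horizon return can receive different labels.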
1.5 REQUISITES
Investment management is one of the most multidisciplinary areas of research, and
this book reflects that fact. Understanding the various sections requires practical
knowledge of ML, market microstructure, portfolio management, mathematical
finance, statistics, econometrics, linear algebra, convex optimization, discrete
math, signal processing, information theory, object-oriented programming, parallel
processing, and supercomputing.
Python has become the de facto standard language for ML, and I have to assume
that you are an experienced developer. You must be familiar with scikit-learn
(sklearn), pandas, numpy, scipy, multiprocessing, matplotlib, and a few other libraries.
Electronic copy available at: https://ssrn.com/abstract=3104847