

TABLE 1.2

Common Pitfalls in Financial ML

#   Category         Pitfall                                Solution                                                  Chapter
1   Epistemological  The Sisyphus paradigm                  The meta-strategy paradigm                                1
2   Epistemological  Research through backtesting           Feature importance analysis                               8
3   Data processing  Chronological sampling                 The volume clock                                          2
4   Data processing  Integer differentiation                Fractional differentiation                                5
5   Classification   Fixed-time horizon labeling            The triple-barrier method                                 3
6   Classification   Learning side and size simultaneously  Meta-labeling                                             3
7   Classification   Weighting of non-IID samples           Uniqueness weighting; sequential bootstrapping            4
8   Evaluation       Cross-validation leakage               Purging and embargoing                                    7, 9
9   Evaluation       Walk-forward (historical) backtesting  Combinatorial purged cross-validation                     11, 12
10  Evaluation       Backtest overfitting                   Backtesting on synthetic data; the deflated Sharpe ratio  10–16

(c) 2018 by Marcos Lopez de Prado. Reprinted with permission. All rights reserved. Full version available at https://goo.gl/w6gMdq

work so well when you run your algorithms on financial data, this book will help you. Sometimes you may not understand the financial rationale behind some structures (e.g., meta-labeling, the triple-barrier method, fracdiff), but bear with me: Once you have managed an investment portfolio long enough, the rules of the game will become clearer to you, along with the meaning of these chapters.

1.5 REQUISITES

Investment management is one of the most multi-disciplinary areas of research, and this book reflects that fact. Understanding the various sections requires a practical knowledge of ML, market microstructure, portfolio management, mathematical finance, statistics, econometrics, linear algebra, convex optimization, discrete math, signal processing, information theory, object-oriented programming, parallel processing, and supercomputing.

Python has become the de facto standard language for ML, and I have to assume that you are an experienced developer. You must be familiar with scikit-learn (sklearn), pandas, numpy, scipy, multiprocessing, matplotlib, and a few other libraries.
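As a quick way to confirm that the libraries named above are available, the following is a minimal sketch, not part of the original text. The helper name `missing_modules` and the `REQUIRED` list are illustrative assumptions; the check uses only the standard-library `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util

# Import names of the libraries listed in the text
# (scikit-learn is imported under the name "sklearn").
REQUIRED = ["sklearn", "pandas", "numpy", "scipy", "multiprocessing", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

Note that `multiprocessing` ships with the Python standard library, so only the other five names would normally need to be installed separately.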

Electronic copy available at: https://ssrn.com/abstract=3104847
