Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)
Chapter 3
Supervised Learning Using Python
In investment banking, an index is calculated as a weighted average of
instruments. Thus, when an index rises, the instruments in it with a
positive weight are expected to rise as well, and those with a negative
weight are expected to fall. The trader trades accordingly.
Generally, indices consist of a large number of instruments (more than
ten). In high-frequency algorithmic trading, it is difficult to send that
many orders in a fraction of a second. Using principal component analysis,
traders approximate the index with a smaller set of instruments and trade
on that set instead. Singular value decomposition is a popular algorithm
that is used in both principal component analysis and factor analysis. In
this chapter, I will discuss it in detail. Before that, I will cover the
Pearson correlation, which is simple to use and is therefore a popular
method of dimensionality reduction. Dimensionality reduction is also
required for categorical data. Suppose a retailer wants to know whether a
city is an important contributor to sales volume; this can be measured by
using mutual information, which will also be covered in this chapter.
Correlation Analysis
There are different measures of correlation. I will limit this discussion
to the Pearson correlation only. For two variables, x and y, the Pearson
correlation is as follows:
$$
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
$$
The value of r varies from -1 to +1. The formula shows that when x is
greater than its average at the same points where y is greater than its
average, the products in the numerator are positive, and therefore the r
value is bigger. In other words, if y tends to increase when x increases,
then r is greater. So, if r is nearer to 1, it means that x and y are
positively correlated. Similarly, if r is nearer to -1, it means that x and
y are negatively correlated.
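As a minimal sketch of the Pearson formula, the following pure-Python function computes r term by term. The function name pearson_r and the sample values are my own illustrative assumptions, not data from any real instrument; the two y series are chosen so that one rises with x and the other falls.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers.

    Hypothetical helper written for illustration only.
    Raises ZeroDivisionError if either sequence is constant,
    because the denominator is then zero.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each observation from its average
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of products of paired deviations
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the summed squared deviations
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

    return numerator / denominator


# Made-up values, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises whenever x rises
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls whenever x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

With these values, r_up should come out at (or within rounding error of) +1 and r_down at -1, matching the interpretation above. In practice a library routine such as numpy.corrcoef computes the same quantity.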