09.10.2023 Views

Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 3

Supervised Learning Using Python

In investment banking, different indices are calculated as a weighted

average of instruments. Thus, when an index goes high, it is expected

that instruments in the index with a positive weight will also go high and

those with a negative weight will go low. The trader trades accordingly.

Generally, indices consist of a large number of instruments (more than

ten). In high-frequency algorithmic trading, it is tough to send so many

orders in a fraction of a second. Using principal component analysis,

traders realize the index as a smaller set of instruments to commence with

the trading. Singular value decomposition is a popular algorithm that

is used both in principal component analysis and in factor analysis. In

this chapter, I will discuss it in detail. Before that, I will cover the Pearson

correlation, which is simple to use. That’s why it is a popular method of

dimensionality reduction. Dimensionality reduction is also required for

categorical data. Suppose a retailer wants to know whether a city is an

important contributor to sales volume; this can be measured by using

mutual information, which will also be covered in this chapter.

Correlation Analysis

There are different measures of correlation. I will limit this discussion

to the Pearson correlation only. For two variables, x and y, the Pearson

correlation is as follows:

r =

å

i

å

i

( x - x) ( y - y)

i

i

( x - x) ( y - y)

i

å

2 2

i

i

The value of r will vary from -1 to +1. The formula clearly shows that

when x is greater than its average, then y is also greater, and therefore the r

value is bigger. In other words, if x increases, then y increases, and then r is

greater. So, if r is nearer to 1, it means that x and y are positively correlated.

Similarly, if r is nearer to -1, it means that x and y are negatively correlated.

50

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!