Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)

Chapter 7

Analytics at Scale

        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.out.println(job.waitForCompletion(true));
    }
}

Spark

After Hadoop, Spark is the next major step in big data technology. Its
main advantage is that it gives a unified interface to the entire big data
stack. Previously, if you needed a SQL-like interface for big data, you
would use Hive. If you needed real-time data processing, you would use
Storm. If you wanted to build a machine learning model, you would use
Mahout. Spark brings all these facilities under one umbrella. In addition,
it supports in-memory computation on big data, which makes processing much
faster than approaches that write intermediate results to disk.
Figure 7-5 describes all the components of Spark.

Figure 7-5. The components of Spark
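
To make the in-memory point concrete, here is a toy sketch in plain Python. It does not use Spark or its API. The function names (map_stage, reduce_stage, disk_pipeline, memory_pipeline) and the sample lines are hypothetical and chosen only for illustration. It contrasts a two-stage word count that writes its intermediate pairs to disk between stages, roughly as chained MapReduce jobs do, with one that hands them to the next stage in memory.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical sample input, standing in for a large dataset.
LINES = ["spark keeps data in memory", "hadoop writes data to disk"]


def map_stage(lines):
    """Stage 1: split each line into (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_stage(pairs):
    """Stage 2: sum the counts for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def disk_pipeline(lines):
    """Persist stage 1 output to a temp file, then read it back for stage 2."""
    pairs = map_stage(lines)
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(pairs, f)
        path = f.name
    try:
        with open(path) as f:
            # JSON stores tuples as lists, so convert them back.
            pairs_from_disk = [tuple(p) for p in json.load(f)]
    finally:
        os.remove(path)
    return reduce_stage(pairs_from_disk)


def memory_pipeline(lines):
    """Pass stage 1 output straight to stage 2 without touching disk."""
    return reduce_stage(map_stage(lines))


disk_counts = disk_pipeline(LINES)
memory_counts = memory_pipeline(LINES)
print(disk_counts == memory_counts)  # both pipelines give the same counts
print(memory_counts["data"])
```

Both pipelines compute the same result. The difference is the extra write and read between stages, which on real cluster storage is far more expensive than on a local temp file and grows with every additional stage.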
