
Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)


Chapter 2

ETL with Python (Structured Data)

out = out + "," + str(email.get_payload()).replace("\n", " ").replace(",", " ")
print(out, "\n")

conf = open(sys.argv[1], 'w')
conf.write("path=" + config["path"] + "\n")
conf.write("folder=" + config["folder"] + "\n")
for usr in users.keys():
    conf.write("name=" + usr + ",value=" + users[usr] + "\n")
conf.close()

Sample config file for the above code:

path=/cygdrive/c/share/enron_mail_20110402/enron_mail_20110402/maildir

folder=Inbox

name=storey-g,value=142

name=ybarbo-p,value=775

name=tycholiz-b,value=602

Topical Crawling

Topical crawlers are intelligent crawlers that retrieve information from anywhere on the Web. They start from a URL, find the links present in the pages under it, and then visit those new URLs in turn. By distributing the crawling process across users, queries, and even client computers, they bypass the scalability limitations of universal search engines. Crawlers can use the available context to keep following links, with the goal of systematically locating highly relevant, focused pages.
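The start-from-a-URL, follow-links, stay-on-topic loop can be sketched as below. This is a minimal illustration under stated assumptions, not the book's implementation: the names topical_crawl, LinkParser, and FAKE_WEB are hypothetical, relevance is reduced to a plain keyword match, and pages come from an in-memory dictionary so the example runs without network access. A real crawler would pass a fetch function that downloads pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def topical_crawl(seed_url, fetch, keywords, max_pages=10):
    """Breadth-first crawl that only expands pages mentioning a keyword.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    keywords are expected in lowercase.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        text = html.lower()
        if not any(k in text for k in keywords):
            continue  # off-topic page: do not follow its links
        relevant.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant


# Tiny in-memory "web" (hypothetical pages) standing in for real downloads.
FAKE_WEB = {
    "http://example.com/": '<a href="/ml">ml</a> <a href="/cooking">food</a> python data',
    "http://example.com/ml": '<a href="/">home</a> python machine learning',
    "http://example.com/cooking": '<a href="/ml">ml</a> recipes',
}

pages = topical_crawl("http://example.com/", FAKE_WEB.get, ["python"])
print(pages)  # ['http://example.com/', 'http://example.com/ml']
```

The seen set keeps the crawler from revisiting pages that link back to each other, and max_pages caps the total number of fetches so the loop always terminates.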

Web searching is a complicated task. A large chunk of machine learning work is being applied to find the similarity between pages, and crawls are bounded by limits such as the maximum number of URLs fetched or visited.
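One simple way to score the similarity between two pages is cosine similarity over word counts. The sketch below is an assumption chosen for illustration, not a method the chapter prescribes; the function names term_counts and cosine_similarity are hypothetical, and practical systems usually add term weighting or learned representations.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is similar to nothing
    return dot / (norm_a * norm_b)


same = cosine_similarity("python data analytics", "Python data analytics")
different = cosine_similarity("python data analytics", "cooking recipes")
print(same, different)
```

A crawler could compare each fetched page against a description of the target topic and follow links only from pages whose score exceeds a chosen threshold.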
