09.10.2023 Views

Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)


Chapter 2

ETL with Python (Structured Data)

        start = istart,
    ).execute()

import json

with open('data1.json', 'w') as fp:
    json.dump(res, fp)

# pprint.pprint(type(res))
# pprint.pprint(res)
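To confirm that the dump worked, the file can be read back with json.load and compared with the original object. The following is a minimal sketch, not part of the original listing: the value of res is a hypothetical stand-in for the response returned by .execute(), and the file is written to the system temporary directory rather than the working directory.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the response object; the real keys depend on the API.
res = {"items": [{"title": "Example", "link": "http://example.com"}]}

# Write to the temp directory so the sketch does not clutter the working directory.
path = os.path.join(tempfile.gettempdir(), "data1.json")

with open(path, "w") as fp:
    json.dump(res, fp)

# Read the file back and check that nothing was lost in the round trip.
with open(path) as fp:
    loaded = json.load(fp)

print(loaded == res)
```

The round trip only preserves JSON-compatible types (dicts with string keys, lists, strings, numbers, booleans, and None), so a response that contains other objects would need converting before json.dump.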

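import re

def get_email_ph(link_text, pdf=None):
    # When pdf is true, link_text is a file path and textract extracts its text.
    if pdf:
        from textract import process
        # textract returns bytes, so decode before applying the str patterns
        text = process(link_text).decode('utf-8')
    else:
        text = link_text
    # print(text)
    email = []
    ph = []
    # 10-digit numbers that start with 7, 8, or 9
    valid_ph = re.compile(r"[789][0-9]{9}$")
    # simple letters-only addresses such as name@domain.com
    valid = re.compile(r"[A-Za-z]+[@]{1}[A-Za-z]+\.[a-z]+")
    for token in re.split(r'[,\s]', text):
    # for token in nltk.tokenize(text):
        # print(token)
        a = valid.match(token)
        b = valid_ph.match(token)
        if a is not None:
            print(a.group())
            email.append(a.group())
        if b is not None:
            print(b.group())
            ph.append(b.group())
    return email, ph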
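The token-by-token matching in the listing above can be tried on a short string without a PDF or textract. This is a minimal, self-contained sketch: extract_contacts and the sample text are hypothetical names introduced for illustration, and the two patterns are copied from the listing as raw strings.

```python
import re

# Same patterns as in the listing, written as raw strings.
PHONE_RE = re.compile(r"[789][0-9]{9}$")
EMAIL_RE = re.compile(r"[A-Za-z]+[@]{1}[A-Za-z]+\.[a-z]+")


def extract_contacts(text):
    """Hypothetical helper: return (emails, phones) found in plain text."""
    emails, phones = [], []
    # Split on commas and whitespace, then test each token against both patterns.
    for token in re.split(r"[,\s]", text):
        email_match = EMAIL_RE.match(token)
        phone_match = PHONE_RE.match(token)
        if email_match is not None:
            emails.append(email_match.group())
        if phone_match is not None:
            phones.append(phone_match.group())
    return emails, phones


sample = "Contact john@example.com or call 9876543210, office 0221234567"
emails, phones = extract_contacts(sample)
print(emails)
print(phones)
```

Because match anchors at the start of each token and the e-mail pattern allows only letters before the @ sign, an address such as john.doe@example.com is skipped entirely, and the phone pattern accepts only 10-digit tokens that begin with 7, 8, or 9. Real-world input would likely need broader patterns.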

