Web Libraries

Importing Data

Reading a CSV File

A CSV file can be downloaded from a web URL with urlretrieve from the standard library's urllib.request module:

from urllib.request import urlretrieve

urlretrieve('http://onlinefilepath', 'local_file_path.csv')
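Once the file is saved locally it still has to be parsed. The following is a minimal sketch of one way to do that, assuming the file is UTF-8 encoded and has a header row; download_csv is a hypothetical helper name, not part of urllib.

```python
import csv
from urllib.request import urlretrieve


def download_csv(url, local_path):
    """Download the CSV at `url` to `local_path` and return its rows as dicts."""
    # urlretrieve returns a (filename, headers) tuple; only the filename is used here.
    saved_path, _headers = urlretrieve(url, local_path)
    # The csv module recommends opening files with newline="".
    with open(saved_path, newline="", encoding="utf-8") as csv_file:
        # DictReader uses the first row as the keys of every following row.
        return list(csv.DictReader(csv_file))


# Example call, using the placeholder URL from above:
# rows = download_csv('http://onlinefilepath', 'local_file_path.csv')
```

Every value comes back as a string, so numeric columns need an explicit conversion such as int(row['score']).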

Reading a JSON File

import json

with open("file_path.json") as json_file:
    json_data = json.load(json_file)

This creates a Python object json_data (a dictionary when the top-level value in the file is a JSON object, a list when it is a JSON array) that we can loop over or manipulate.
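As a small illustration of looping over the loaded data, the sketch below assumes the top-level JSON value is an object. The helper names load_json and describe are hypothetical and exist only for this example.

```python
import json


def load_json(path):
    """Parse the JSON file at `path` and return the resulting Python object."""
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


def describe(json_data):
    """Return one 'key: value' line per top-level entry of a JSON object."""
    if not isinstance(json_data, dict):
        # A JSON array would arrive as a list and has no keys to iterate over.
        raise TypeError("expected a JSON object at the top level")
    return [f"{key}: {value}" for key, value in json_data.items()]


# Example call, using the placeholder path from above:
# for line in describe(load_json("file_path.json")):
#     print(line)
```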

Retrieving from APIs

import requests
r = requests.get('http://url')
json_data = r.json()
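The call above assumes the request succeeds and the body is valid JSON. A slightly more defensive sketch might look like the following; fetch_json is a hypothetical helper, and the 10-second timeout is an arbitrary choice for illustration.

```python
import requests


def fetch_json(url, timeout=10):
    """GET `url` and return the decoded JSON body, raising on HTTP errors."""
    # Without a timeout, requests can wait indefinitely on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.json()


# Example call, using the placeholder URL from above:
# json_data = fetch_json('http://url')
```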

Scraping the web

Using the 'requests' package

import requests
raw = requests.get('http://url')
text = raw.text

We can scrape a webpage with one line of code using the requests package. The text attribute of the response returned by requests.get() then gives us the page's HTML as a string.

Beautiful Soup

We will take the text object created in the previous code segment and parse it with Beautiful Soup to make it easier to navigate.

from bs4 import BeautifulSoup
# Name the parser explicitly; 'html.parser' ships with Python
soup = BeautifulSoup(text, 'html.parser')

Now that we have parsed the HTML into a soup object, we can use attributes like title and methods like get_text() and find_all() on it to extract more meaningful information.
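To make title, find_all() and get_text() concrete, here is a short sketch that runs on an inline HTML string rather than a downloaded page. The sample markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the `text` fetched with requests.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <a href="http://url/one">One</a>
    <a href="http://url/two">Two</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# The title attribute is the <title> tag; .string is the text inside it.
page_title = soup.title.string

# find_all() returns every matching tag; here we collect each link target.
links = [a.get("href") for a in soup.find_all("a")]

# get_text() drops the markup and keeps only the text content.
visible_text = soup.get_text(separator=" ", strip=True)

print(page_title)
print(links)
print(visible_text)
```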
