Web Libraries
Importing Data
Reading a CSV File
A CSV file can be imported from a web URL with the urllib package (part of the Python standard library) as follows:
from urllib.request import urlretrieve
urlretrieve('http://onlinefilepath', 'local_file_path.csv')
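urlretrieve only saves the file to disk, so a natural next step is to read it back. The following is a minimal sketch, assuming the file has a header row; the helper name download_csv is hypothetical and not part of urllib.

```python
import csv
from urllib.request import urlretrieve


def download_csv(url, local_path):
    """Download the CSV at `url` to `local_path` and return its rows.

    Hypothetical helper: each row comes back as a dict keyed by the
    column names in the header row, and every value is a string.
    """
    urlretrieve(url, local_path)
    # newline="" is what the csv module recommends when opening files.
    with open(local_path, newline="") as csv_file:
        return list(csv.DictReader(csv_file))
```

Calling download_csv('http://onlinefilepath', 'local_file_path.csv') would then give a list that can be looped over row by row.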
Reading a JSON file
import json
with open("file_path.json") as json_file:
    json_data = json.load(json_file)
json_data
Once loaded, json_data is an ordinary Python object (a dict or a list, depending on the file), allowing us to loop over or manipulate the data retrieved.
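As a small illustration of looping over loaded JSON, the sketch below parses an inline string with json.loads instead of opening a file, so it runs on its own. The sample data and its keys ("users", "name", "age") are made up for the example.

```python
import json

# Hypothetical sample standing in for the contents of file_path.json.
sample = '{"users": [{"name": "Ada", "age": 36}, {"name": "Linus", "age": 29}]}'
json_data = json.loads(sample)

# A JSON object becomes a dict and a JSON array becomes a list,
# so the usual indexing and iteration apply.
names = [user["name"] for user in json_data["users"]]

for user in json_data["users"]:
    print(user["name"], user["age"])
```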
Retrieving from APIs
import requests
r = requests.get('http://url')
json_data = r.json()
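In practice it helps to pass query parameters, set a timeout, and fail early on an error status before decoding the body. The following is a rough sketch; the helper name fetch_json and the 10-second default are assumptions, while params, timeout, raise_for_status() and json() are part of the requests API.

```python
import requests


def fetch_json(url, params=None, timeout=10):
    """Hypothetical helper: GET `url` and return the decoded JSON body."""
    r = requests.get(url, params=params, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # trying to decode an error page as JSON.
    r.raise_for_status()
    return r.json()
```

For example, fetch_json('http://url', params={'q': 'python'}) would request http://url?q=python.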
Scraping the web
Using the 'requests' package
import requests
raw = requests.get('http://url')
text = raw.text
We can scrape any webpage with one line of code using the requests package. The text attribute of the response returned by requests.get() then gives us the actual HTML of the page.
Beautiful Soup
We will use the text object created in the previous code segment and parse it with the Beautiful Soup library to make it easier to navigate.
from bs4 import BeautifulSoup
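# Name the parser explicitly; otherwise Beautiful Soup picks one and emits a warning.
soup = BeautifulSoup(text, 'html.parser')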
Now that we have extracted the HTML data into a soup object, we can use attributes like title and methods like get_text(), find_all(), etc. on it to extract more meaningful information.
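To make this concrete, here is a short sketch that uses title, find_all() and get_text() on a small inline HTML string. The HTML is a made-up stand-in for the text of a real response, and 'html.parser' is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the `text` of a real response.
html = """
<html><head><title>Example page</title></head>
<body>
<p>First paragraph.</p>
<a href="/one">One</a>
<a href="/two">Two</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string  # text inside the <title> tag
links = [a.get("href") for a in soup.find_all("a")]  # href of every <a> tag
paragraph = soup.p.get_text()  # text of the first <p> tag
```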