You're not training to be a software engineer. The way I teach you is different from how I teach computer science students.
Still, my technical background might bias me.
You might find the material challenging. Or possibly easy.
Note my biases and calibrate your expectations to what you need!
When I code, my language of choice is JavaScript, because it can be used very simply without any tooling, and it is native to working with web data.
Other languages, like Python or R, typically need more tooling setup, and their main use cases are different (data science, machine learning).
Besides coding, we can also achieve outcomes with low-code/no-code tools. Of course, they will often be less powerful or flexible.
For an editor, I often use Notepad++. I am old-school and like simple editors.
But I recommend Visual Studio Code (by Microsoft), because it integrates well with tools like GitHub, which Microsoft acquired.
Any text editor (Atom, Emacs, vi, etc.) is sufficient.
If you do not have an editor installed, please download one.
Please download the free version of ParseHub, a freemium scraping tool.
It takes time to download, and we'll be using it later.
CSV (comma-separated values) is a text-based file format.
Unlike Excel files (XLS/XLSX), which use binary formats, a CSV file is just text and can be opened in any text editor.
Example of CSV file
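Because CSV is plain text, a few lines of JavaScript can read it. This is a minimal sketch: the sample text and the function name `parseSimpleCsv` are hypothetical, and it assumes no field contains a comma, quote or line break. Real CSV files may quote such fields, so a proper CSV parser is safer in practice.

```javascript
// Hypothetical sample CSV text: a header line followed by two data rows
const csvText = 'name,score\nAlice,90\nBob,85';

// Minimal sketch: assumes no quoted fields, so every comma is a separator
function parseSimpleCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(',');
  // Turn each data line into an object keyed by the header names
  return lines.slice(1).map(line => {
    const values = line.split(',');
    const row = {};
    headers.forEach((header, i) => {
      row[header] = values[i];
    });
    return row;
  });
}

const records = parseSimpleCsv(csvText);
console.log(records);
// records is [{ name: 'Alice', score: '90' }, { name: 'Bob', score: '85' }]
// Note that every value is still a string; convert numbers yourself if needed.
```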
JSON (JavaScript Object Notation) is also a text-based file format.
It represents structured data using JavaScript notation, and it can encapsulate hierarchical data in a tree-like structure rather than just a flat table.
Example of JSON file
JSON consists of key-value pairs, combining objects and arrays to describe a data structure.
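To make the object/array distinction concrete, here is a small hypothetical JSON document (the field names are made up for illustration). Curly braces hold key-value pairs (objects), square brackets hold ordered lists (arrays), and the two can nest to form a tree. In JavaScript, `JSON.parse` turns the text into objects and arrays you can navigate with dot and bracket notation.

```javascript
// Hypothetical JSON text: an object containing an array of objects
const jsonText = `{
  "course": "Data Collection",
  "topics": [
    { "name": "CSV", "tabular": true },
    { "name": "JSON", "tabular": false }
  ]
}`;

// Parse the text into plain JavaScript objects and arrays
const data = JSON.parse(jsonText);

// Objects are accessed by key, arrays by index
console.log(data.course);          // → Data Collection
console.log(data.topics.length);   // → 2
console.log(data.topics[1].name);  // → JSON

// Collect one value from every element of the nested array
const topicNames = data.topics.map(topic => topic.name);
console.log(topicNames);           // → [ 'CSV', 'JSON' ]
```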
Use an online JSON validator to double-check the structure.
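Besides online validators, JavaScript can do a quick syntax check locally, because `JSON.parse` throws a `SyntaxError` when the text is not valid JSON. The helper name `isValidJson` is hypothetical, and it only checks syntax, not whether the data contains the fields you expect.

```javascript
// Returns true if the text parses as JSON, false otherwise
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (error) {
    // JSON.parse throws a SyntaxError on malformed input
    return false;
  }
}

console.log(isValidJson('{"a": [1, 2, 3]}')); // → true
console.log(isValidJson("{'a': 1}"));         // → false (JSON requires double quotes)
console.log(isValidJson('{"a": 1,}'));        // → false (trailing commas are not allowed)
```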
APIs (application programming interfaces) are basically how pieces of software "talk" to each other.
A large percentage of web APIs use JSON as the data format for exchanging information.
For example, the data.gov.sg API for hourly PM2.5 readings (Schema)
Some APIs are more complicated and require you to send specific headers or to authenticate.
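For APIs that need headers or authentication, `fetch` accepts a `Request` (or an options object) with a `headers` field. This sketch is hypothetical throughout: the URL `https://api.example.com/data`, the `Authorization: Bearer <key>` scheme and the helper names are placeholders, and each real API documents its own header names and authentication method.

```javascript
// Hypothetical helper: builds a GET request carrying an API key in a header.
// The "Bearer" scheme is an assumption; check the API's own documentation.
function buildAuthRequest(url, apiKey) {
  return new Request(url, {
    method: 'GET',
    headers: {
      'Accept': 'application/json',
      'Authorization': `Bearer ${apiKey}`
    }
  });
}

// Sends the request and parses the JSON body, failing on HTTP error codes
async function fetchWithAuth(url, apiKey) {
  const response = await fetch(buildAuthRequest(url, apiKey));
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json();
}

// Example call with a placeholder URL and key (not a real endpoint):
// fetchWithAuth('https://api.example.com/data', 'YOUR_API_KEY').then(console.log);
```

Keeping the request construction separate makes it easy to inspect the headers before anything is sent.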
There are many tools that can help with this, which we'll discuss later.
One of the easiest ways to test an API is curl, a command-line tool available on Windows, macOS and Linux.
Let's try this on Singapore's 2-hour weather forecast.
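The same request can also be made from JavaScript. This sketch assumes the v1 endpoint `https://api.data.gov.sg/v1/environment/2-hour-weather-forecast` and a response in which `items[0].forecasts` is an array of `{ area, forecast }` objects; check the API's schema, since the endpoint or shape may have changed. The function names are hypothetical, and the summary step is demonstrated on a small made-up sample rather than live data.

```javascript
// Assumed endpoint for Singapore's 2-hour weather forecast (v1 API)
const FORECAST_URL = 'https://api.data.gov.sg/v1/environment/2-hour-weather-forecast';

// Turns the assumed response shape into a simple { area: forecast } lookup
function summarizeForecasts(apiData) {
  const summary = {};
  for (const { area, forecast } of apiData.items[0].forecasts) {
    summary[area] = forecast;
  }
  return summary;
}

// Fetches the live data and prints the summary (needs network access)
function showForecasts() {
  return fetch(FORECAST_URL)
    .then(response => response.json())
    .then(apiData => console.log(summarizeForecasts(apiData)))
    .catch(error => console.error(error));
}

// Made-up sample in the assumed shape, for trying the summary offline
const sample = {
  items: [{ forecasts: [
    { area: 'Area A', forecast: 'Cloudy' },
    { area: 'Area B', forecast: 'Light Rain' }
  ] }]
};
console.log(summarizeForecasts(sample));
// → { 'Area A': 'Cloudy', 'Area B': 'Light Rain' }
```

Call `showForecasts()` in the browser console to see the live summary. On the command line, passing the same URL to curl prints the raw JSON instead.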
You can, of course, access the API using code, or even with cloud-based tools like Swagger and Postman.
A very simple, basic HTML template:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Hello World</title>
<style></style>
</head>
<body>
<h1>Hello World!</h1>
<script></script>
</body>
</html>
Hopefully you have seen this before.
A simple code snippet using JavaScript's fetch API:
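// Request the data, parse the JSON body, then log it (errors are logged too)
fetch('https://api.data.gov.sg/v1/transport/carpark-availability')
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error(error));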
Once you have the data, you can of course do more with it, e.g. show all car parks with no lots available.
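fetch('https://api.data.gov.sg/v1/transport/carpark-availability')
  .then(response => response.json())
  .then(data => {
    // All car parks in the most recent reading
    const rows = data.items[0].carpark_data;
    // Keep car parks whose first listed lot type has zero lots available
    // (lots_available is a string, so convert it before comparing)
    const results = rows.filter(d => parseInt(d.carpark_info[0].lots_available, 10) === 0);
    console.log(rows);
    console.log(results);
  })
  .catch(error => console.error(error));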
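The filter above only looks at `carpark_info[0]`, the first lot type listed for each car park. If a car park reports several lot types, a stricter reading of "no lots available" is that every entry shows zero. Here is a sketch of that variant as a pure function, run on a made-up sample in the same shape as `data.items[0].carpark_data`; the function name `fullCarparks` and the sample values are hypothetical, and the `carpark_number` and `lot_type` field names are assumptions about the API's schema.

```javascript
// Keeps car parks where every listed lot type has zero lots available.
// The length check matters: every() returns true for an empty array.
function fullCarparks(rows) {
  return rows.filter(d =>
    d.carpark_info.length > 0 &&
    d.carpark_info.every(info => parseInt(info.lots_available, 10) === 0)
  );
}

// Made-up sample in the same shape as data.items[0].carpark_data
const sampleRows = [
  { carpark_number: 'X1', carpark_info: [{ lot_type: 'C', lots_available: '0' }] },
  { carpark_number: 'X2', carpark_info: [
    { lot_type: 'C', lots_available: '0' },
    { lot_type: 'Y', lots_available: '4' }
  ] },
  { carpark_number: 'X3', carpark_info: [{ lot_type: 'C', lots_available: '12' }] }
];

console.log(fullCarparks(sampleRows).map(d => d.carpark_number));
// → [ 'X1' ]
```

With the `carpark_info[0]` check, X2 would also be listed, because its first lot type is full even though its second lot type still has lots.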
We often need the underlying raw data to do exploratory data analysis or a visualization project.
If I asked you to collect data from a site, would you know how?
Let's start with simpler examples.
If it is a simple table, collecting it is easy. In fact, you could copy and paste the text into an editor and use the editor to format it.
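As a sketch of the "use the editor to format" step: when you copy a table from a web page and paste it as plain text, the cells usually come out separated by tabs. Assuming that, a few lines of JavaScript can turn the pasted text into CSV. The function names and the sample text are hypothetical; fields containing commas, quotes or line breaks are wrapped in double quotes with inner quotes doubled, the common CSV convention.

```javascript
// Wraps a field in double quotes if it contains a comma, quote or newline,
// doubling any quotes inside it (the common CSV convention)
function toCsvField(field) {
  if (/[",\n]/.test(field)) {
    return '"' + field.replace(/"/g, '""') + '"';
  }
  return field;
}

// Converts tab-separated lines (e.g. a table pasted from a browser) into CSV
function tabsToCsv(text) {
  return text
    .split(/\r?\n/)
    .filter(line => line.trim() !== '')  // skip blank lines
    .map(line => line.split('\t').map(cell => toCsvField(cell.trim())).join(','))
    .join('\n');
}

// Hypothetical pasted text: a header row plus two data rows
const pasted = 'Name\tCity\nAlice\tSingapore\nBob\tKuala Lumpur, Malaysia';
console.log(tabsToCsv(pasted));
// → Name,City
//   Alice,Singapore
//   Bob,"Kuala Lumpur, Malaysia"
```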
But if I asked you to collect data from a more complicated site, would you know how to do so?
ParseHub is only one of many cloud-based low-code/no-code scraping tools that you can use to scrape websites.
Please download the free version of ParseHub.
ParseHub has an extremely well-thought-out, beautiful tutorial (and user interface) showing how to use their tool to scrape their mock movie listing site.
Please go through this.
Now, we'll go back to the earlier examples and scrape those sites.
Let's start with example 2, then example 1.
Chi-Loong | V/R