Pivot Tables are an easy way to analyze data using a spreadsheet.
They exist in Microsoft Excel (cloud and desktop), Google sheets and opensource office tools (like Libre Office).
For this bootcamp we'll be using Microsoft Excel. I'm using the free office365 version, and it should be similar to your SUTD version.
Please download a sample version of this sample sales data CSV file (source: Kaggle link) and open it up in Excel.
Open up the CSV file in your Excel.
Take a quick glance at the data - attributes (columns) and items (rows).
Create a pivot table on the data worksheet.
Now let's put everything together.
Based on graduate employment survey, which university in Singapore has the highest graduate employment rate in 2022?
What course and what's the mean monthly salary?
Higher order analytic tools like PowerBI and Tableau. Really useful for connectors to various data platforms.
SQL is useful to know if you're doing deeper dives into data.
ETL tools (Extract, transform and load) tools. Mainly if you have loads of data and it is a warehousing and architecture problem.
If you're merging tables on a column Excel can do it using power query or table wizard.
Tools like PowerBI or Tableau can probably do this as well.
For JSON you can combine your datasets in a GIS (Geographic Information System) tool like ESRI ArcGIS or the opensource QGIS. Or use cloud-based tools like Carto or Mapbox.
There are tons of libraries out there. Here is just a small sample of popular libraries for some languages.
We're going to do some very simple programming using JS.
We're going to merge a JSON (GeoJSON format) and a CSV file on a common field.
We're preparing data for a chloropleth visualization from 2 data sources
Download the sample HTML file, the JSON and CSV in the same directory.
Fire up an editor and take a look.
Oh no! A whole bunch of CORS (cross origin resource sharing) errors when trying to run the html file.
This is for security reasons, making sure that javascript doesn't inject some malicious code into your system when you load data.
The best way to get around this is to run a web server, which locks down the protocol to just web traffic.
(You can also disable security under developer mode but this is not recommended)
Two easy ways to get a local live server (amongst many, many others!)
Fire up VSC and download the live server extension and run it.
Alternatively, if you have NPM (node package manager) installed - just run npx http-server in the local directory.
Let's use the console.log to examine the data structure of the two files.
And then let's go through the merge step-by-step.
I'm not going through this but in Python you can use Pandas dataframe to merge the 2 files.
A version of the merged data file is here.
You might see this again in assignment 4.
The data for the exercise has been cleaned up already.
But if I asked you to prepare the latest 2024 data for the choropleth map, how would go about it?
Chi-Loong | V/R