Pivot Tables are an easy way to analyze data using a spreadsheet.
They exist in Microsoft Excel (cloud and desktop), Google sheets and opensource office tools (like Libre Office).
For this bootcamp we'll be using Microsoft Excel. I'm using the free office365 version, and it should be similar to your SUTD version.
Please download a sample version of this sample sales data CSV file (source: Kaggle link) and open it up in Excel.
Now let's put everything together.
Which university in Singapore has the highest graduate employment rate in 2022?
What course and what's the mean monthly salary?
Higher order analytic tools like PowerBI and Tableau. Really useful for connectors to various data platforms.
SQL is useful to know if you're doing deeper dives into data.
ETL tools (Extract, transform and load) tools. Mainly if you have loads of data and it is a warehousing and architecture problem.
If you're merging tables on a column Excel can do it using many various methods.
Tools like PowerBI or Tableau can probably do this as well.
We'll do one quick exercise using the Excel formula VLOOKUP. Download this ODS (Open Document Spreadsheet) sample class file and we'll go through in Excel.
For JSON you can combine your datasets in a GIS (Geographic Information System) tool like ESRI ArcGIS or the opensource QGIS. Or use cloud-based tools like Carto or Mapbox.
If you take my module on visualization we'll go through these perspectives in more detail when we talk about geospatial viz.
There are tons of libraries out there. Here is just a small sample of popular libraries for some languages.
We're going to do some very simple programming using JS.
We're going to merge a JSON (GeoJSON format) and a CSV file on a common field.
We're preparing data for a chloropleth visualization from 2 data sources
Download the sample HTML file, the JSON and CSV in the same directory.
Fire up an editor and take a look.
Oh no! A whole bunch of CORS (cross origin resource sharing) errors when trying to run the html file.
This is for security reasons, making sure that javascript doesn't inject some malicious code into your system when you load data.
The best way to get around this is to run a web server, which locks down the protocol to just web traffic.
(You can also disable security under developer mode but this is not recommended)
Two easy ways to get a local live server (amongst many, many others!)
Fire up VSC and download the live server extension and run it.
Alternatively, if you have NPM (node package manager) installed (comes prepackaged with Node) - just run npx http-server in the local directory.
Let's use the console.log to examine the data structure of the two files.
And then let's go through the merge step-by-step.
I'm not going through this but in Python you can use Pandas dataframe to merge the 2 files.
A version of the merged data file is here.
You'll see this again in assinment 4 if you're taking my module.
Let's use ChatGPT to merge both files instead of code.
Chi-Loong | V/R