SQL - CREDIT RISK ANALYSIS

Using a BigQuery public data set

Information on this data set:

| Column Name | Type | Description |
| --- | --- | --- |
| `id` | Float | An anonymised unique identifier for each client. |
| `limit_balance` | Float | Amount of credit extended to the client in New Taiwan (NT) dollars, covering both individual and family credit limits. |
| `sex` | String | Client’s gender (1 = male, 2 = female). |
| `education_level` | String | Highest education level attained by the client:<br> - 1: Graduate school<br> - 2: University<br> - 3: High school<br> - 4: Others<br> - 5, 6: Unknown |
| `marital_status` | String | Client’s marital status (1 = married, 2 = single, 3 = others). |
| `age` | Float | Client’s age in years. |
| `pay_0` to `pay_6` | Float | Client’s repayment status for each month:<br> - `-1`: Payment made on time<br> - `1+`: Number of months delayed (e.g., `1` = one month late, up to `9` for 9+). |
| `bill_amt_1` to `bill_amt_6` | Float | Bill statement amount in NT dollars for each respective month, from September 2005 (`bill_amt_1`) through April 2005 (`bill_amt_6`). |
| `pay_amt_1` to `pay_amt_6` | Float | Amount paid by the client each month in NT dollars, from September 2005 (`pay_amt_1`) through April 2005 (`pay_amt_6`). |
| `default_payment_next_month` | String | Indicates if the client defaulted on the payment the following month (1 = yes, 0 = no). |
| `predicted_default_payment_next_month.tables.score` | Float | Prediction score indicating the likelihood of default on the next payment. |
| `predicted_default_payment_next_month.tables.value` | Float | Prediction value, potentially indicating the model’s confidence or predicted default outcome. |

Basic SQL queries were run to check the data type of each column, list the unique values in each column, and look for NULL values and outliers.

Some basic SQL queries used to explore this data set:
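
The checks described above (data types, unique values, NULL values, outliers) could look roughly like the sketch below. The table path `bigquery-public-data.ml_datasets.credit_card_default` is an assumption, since the exact table is not named here, and the columns picked are only examples taken from the column list above.

```sql
-- Assumption: the data set is bigquery-public-data.ml_datasets.credit_card_default.
-- Swap in the real table path if it differs.

-- 1. Data type of each column
SELECT column_name, data_type
FROM `bigquery-public-data.ml_datasets.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'credit_card_default'
ORDER BY ordinal_position;

-- 2. Unique values in a categorical column, with how many rows carry each value
SELECT education_level, COUNT(*) AS clients
FROM `bigquery-public-data.ml_datasets.credit_card_default`
GROUP BY education_level
ORDER BY clients DESC;

-- 3. NULL counts for a few example columns
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(limit_balance IS NULL) AS null_limit_balance,
  COUNTIF(age IS NULL) AS null_age,
  COUNTIF(default_payment_next_month IS NULL) AS null_default_flag
FROM `bigquery-public-data.ml_datasets.credit_card_default`;

-- 4. Range and approximate quartiles, as a first look at possible outliers.
--    APPROX_QUANTILES(x, 4) returns an array: [min, Q1, median, Q3, max].
SELECT
  MIN(limit_balance) AS min_limit,
  MAX(limit_balance) AS max_limit,
  APPROX_QUANTILES(limit_balance, 4) AS limit_quartiles
FROM `bigquery-public-data.ml_datasets.credit_card_default`;
```

The same pattern from queries 2 to 4 can be repeated for other columns such as `sex`, `marital_status`, or `age`.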

More advanced SQL queries were then used to analyze the data:
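
As one illustration of a more advanced query, the sketch below computes the default rate per education level with a CTE and ranks the groups with a window function. It is an example only, not necessarily one of the queries used in this analysis. The table path is an assumption, and the string comparisons (`'1'`, `'2'`, ...) assume the String types listed in the column table above.

```sql
-- Assumption: table path as below; education_level and
-- default_payment_next_month are stored as strings.
WITH by_education AS (
  SELECT
    CASE education_level
      WHEN '1' THEN 'Graduate school'
      WHEN '2' THEN 'University'
      WHEN '3' THEN 'High school'
      WHEN '4' THEN 'Others'
      ELSE 'Unknown'  -- covers 5, 6 and any other code
    END AS education,
    COUNT(*) AS clients,
    COUNTIF(default_payment_next_month = '1') AS defaults,
    AVG(limit_balance) AS avg_limit
  FROM `bigquery-public-data.ml_datasets.credit_card_default`
  GROUP BY education
)
SELECT
  education,
  clients,
  defaults,
  -- SAFE_DIVIDE returns NULL instead of failing on a zero denominator
  ROUND(SAFE_DIVIDE(defaults, clients) * 100, 2) AS default_rate_pct,
  ROUND(avg_limit, 0) AS avg_limit,
  -- Rank 1 = highest default rate
  RANK() OVER (ORDER BY SAFE_DIVIDE(defaults, clients) DESC) AS risk_rank
FROM by_education
ORDER BY risk_rank;
```

Grouping by `sex`, `marital_status`, or an age bucket instead of `education_level` follows the same structure.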