Using a BigQuery public data set

GitHub Repo - Project:

Information on this data set:

| Column Name | Type | Description |
| --- | --- | --- |
| id | Float | An anonymised unique identifier for each client. |
| limit_balance | Float | Amount of credit extended to the client in New Taiwan (NT) dollars, covering both individual and family credit limits. |
| sex | String | Client’s gender (1 = male, 2 = female). |
| education_level | String | Highest education level attained by the client:<br> - 1: Graduate school<br> - 2: University<br> - 3: High school<br> - 4: Others<br> - 5, 6: Unknown |
| marital_status | String | Client’s marital status (1 = married, 2 = single, 3 = others). |
| age | Float | Client’s age in years. |
| pay_0 to pay_6 | Float | Client’s repayment status for each month:<br> - -1: Payment made on time<br> - 1+: Number of months the payment was delayed (e.g., 1 = one month late, up to 9 = nine months or more). |
| bill_amt_1 to bill_amt_6 | Float | Bill statement amount in NT dollars for each month, counting back from September 2005 (bill_amt_1) to April 2005 (bill_amt_6). |
| pay_amt_1 to pay_amt_6 | Float | Amount paid by the client each month in NT dollars, counting back from September 2005 (pay_amt_1) to April 2005 (pay_amt_6). |
| default_payment_next_month | String | Indicates whether the client defaulted on the payment the following month (1 = yes, 0 = no). |
| predicted_default_payment_next_month.tables.score | Float | Prediction score indicating the likelihood of default on the next payment. |
| predicted_default_payment_next_month.tables.value | Float | Prediction value, potentially indicating the model’s confidence or predicted default outcome. |

Basic SQL queries:

Basic SQL queries were run to check the data type of each column, list the unique values in each column, and look for NULL values and outliers.

Some basic SQL queries to explore this data set:

credit_dataset_basic_queries.sql

Basic SQL QUERIES
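
As a rough illustration of the kinds of checks described above (data types, unique values, NULLs and outliers), the queries might look like the sketch below. This is not the content of credit_dataset_basic_queries.sql. The table path `bigquery-public-data.ml_datasets.credit_card_default` is an assumption, since the table is not named here, and the columns picked for each check are only examples.

```sql
-- Assumption: the data set lives at
-- bigquery-public-data.ml_datasets.credit_card_default (not stated in this README).

-- 1. Data type of each column, read from the data set's INFORMATION_SCHEMA.
SELECT column_name, data_type
FROM `bigquery-public-data.ml_datasets.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'credit_card_default'
ORDER BY ordinal_position;

-- 2. Unique values in a categorical column, with how often each occurs.
SELECT education_level, COUNT(*) AS clients
FROM `bigquery-public-data.ml_datasets.credit_card_default`
GROUP BY education_level
ORDER BY clients DESC;

-- 3. NULL counts for a few example columns.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(limit_balance IS NULL) AS null_limit_balance,
  COUNTIF(age IS NULL) AS null_age,
  COUNTIF(default_payment_next_month IS NULL) AS null_default_flag
FROM `bigquery-public-data.ml_datasets.credit_card_default`;

-- 4. A first look at outliers: range and approximate quartiles of the credit limit.
SELECT
  MIN(limit_balance) AS min_limit,
  MAX(limit_balance) AS max_limit,
  APPROX_QUANTILES(limit_balance, 4) AS limit_quartiles  -- [min, Q1, median, Q3, max]
FROM `bigquery-public-data.ml_datasets.credit_card_default`;
```

The same pattern can be repeated for the other columns, for example swapping `education_level` for `sex` or `marital_status` in the second query.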

Advanced SQL queries, further exploring credit risk and credit default

credit_dataset_analysis.sql

Here, more advanced SQL queries were used to analyze credit risk and credit default:

Advanced Credit Analysis
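
One way such an analysis could be approached is sketched below: the default rate and average credit utilisation per education level. This is a hypothetical example, not the content of credit_dataset_analysis.sql. The table path is assumed as `bigquery-public-data.ml_datasets.credit_card_default`, and "utilisation" (latest bill divided by credit limit) is a metric introduced here purely for illustration.

```sql
-- Assumption: table path is not given in this README.
-- default_payment_next_month is a String column, so it is compared to '1'.
SELECT
  education_level,
  COUNT(*) AS clients,
  -- Share of clients who defaulted the following month.
  ROUND(SAFE_DIVIDE(COUNTIF(default_payment_next_month = '1'), COUNT(*)), 3) AS default_rate,
  -- Illustrative metric: September 2005 bill as a fraction of the credit limit.
  ROUND(AVG(SAFE_DIVIDE(bill_amt_1, limit_balance)), 3) AS avg_utilisation
FROM `bigquery-public-data.ml_datasets.credit_card_default`
GROUP BY education_level
ORDER BY default_rate DESC;
```

`SAFE_DIVIDE` returns NULL instead of raising an error when the denominator is zero, which keeps the query from failing on rows with a zero credit limit.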

VISUALISATIONS: