Information on this data set:
| Column Name | Type | Description |
|---|---|---|
| id | Float | An anonymised unique identifier for each client. |
| limit_balance | Float | Amount of credit extended to the client in New Taiwan (NT) dollars, covering both individual and family credit limits. |
| sex | String | Client’s gender (1 = male, 2 = female). |
| education_level | String | Highest education level attained by the client:<br> - 1: Graduate school<br> - 2: University<br> - 3: High school<br> - 4: Others<br> - 5, 6: Unknown |
| marital_status | String | Client’s marital status (1 = married, 2 = single, 3 = others). |
| age | Float | Client’s age in years. |
| pay_0 to pay_6 | Float | Client’s repayment status for each month:<br> - -1: Payment made on time<br> - 1+: Number of months delayed (e.g., 1 = one month late, up to 9 for 9+). |
| bill_amt_1 to bill_amt_6 | Float | Bill statement amount in NT dollars for each respective month, from September 2005 (bill_amt_1) through April 2005 (bill_amt_6). |
| pay_amt_1 to pay_amt_6 | Float | Amount paid by the client each month in NT dollars, from September 2005 (pay_amt_1) through April 2005 (pay_amt_6). |
| default_payment_next_month | String | Indicates if the client defaulted on the payment the following month (1 = yes, 0 = no). |
| predicted_default_payment_next_month.tables.score | Float | Prediction score indicating the likelihood of default on the next payment. |
| predicted_default_payment_next_month.tables.value | Float | Prediction value, potentially indicating the model’s confidence or predicted default outcome. |

Basic SQL queries were run to check the data type of each column, list the unique values in each column, and look for NULL values and outliers.
The basic SQL queries used to explore this data set are in:
credit_dataset_basic_queries.sql
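
The actual queries are in the file above. As a rough sketch of what checks of this kind might look like, the example below assumes a hypothetical table name, `credit_card_default`, and uses only column names from the data dictionary. It is illustrative and not taken from the file, and the `information_schema` lookup may need adjusting for your SQL engine (some engines require a dataset or schema prefix).

```sql
-- NOTE: credit_card_default is a hypothetical table name; replace it with your own.

-- 1. Data type of each column.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'credit_card_default';

-- 2. Unique values in a categorical column, with how often each occurs.
SELECT education_level, COUNT(*) AS n_clients
FROM credit_card_default
GROUP BY education_level
ORDER BY n_clients DESC;

-- 3. NULL counts for selected columns (COUNT(column) ignores NULLs).
SELECT
  COUNT(*) AS total_rows,
  COUNT(*) - COUNT(limit_balance) AS null_limit_balance,
  COUNT(*) - COUNT(age) AS null_age,
  COUNT(*) - COUNT(default_payment_next_month) AS null_default
FROM credit_card_default;

-- 4. Value ranges as a first look at possible outliers.
SELECT
  MIN(age) AS min_age,
  MAX(age) AS max_age,
  MIN(limit_balance) AS min_limit,
  MAX(limit_balance) AS max_limit,
  AVG(limit_balance) AS avg_limit
FROM credit_card_default;
```

Running the second query against each coded column (sex, education_level, marital_status) is a quick way to spot codes that fall outside the values documented in the table.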
More advanced SQL queries were then used for the analysis: