This repository has been archived on 2025-08-25.
ml/python/Assignment_tutorial_4.ipynb

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Tutorial 4 Assignment - Logistic Regression\n",
"\n",
"We have provided a preprocessed dataset; the first code cell downloads it and sets everything up for you.\n",
"Your objectives are as follows:\n",
"1. Implement the commented functions below on your own.\n",
"2. Every step that you must code is described in a comment; use these comments as hints.\n",
"\n",
"The last cell contains the code for training the model. Each of you is expected to train the model and note down the best accuracy you can achieve, together with the conditions required to achieve it."
],
"metadata": {
"id": "1k2vhsMVv0Pk"
}
},
{
"cell_type": "code",
"source": [
"!wget -O dataset.csv \"https://docs.google.com/spreadsheets/d/1RNtDIvisrnOmjJxS7aPm-45NtOH3qd5-mgd2bHeSOGA/export?format=csv&gid=1727131321\"\n",
"import pandas as pd\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# wget saves to the current working directory, so read the file from there.\n",
"df = pd.read_csv('dataset.csv')\n",
"\n",
"df.head()\n",
"X = df.drop(['RainTomorrow'], axis=1)\n",
"y = df['RainTomorrow']\n",
"scaler = StandardScaler()\n",
"X_scaled = scaler.fit_transform(X)\n",
"X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, stratify=y)\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nj3rHkttqucf",
"outputId": "0ff8d346-148a-4dd9-df8c-4696d2a002c3"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2024-08-30 09:47:10-- https://docs.google.com/spreadsheets/d/1RNtDIvisrnOmjJxS7aPm-45NtOH3qd5-mgd2bHeSOGA/export?format=csv&gid=1727131321\n",
"Resolving docs.google.com (docs.google.com)... 74.125.132.138, 74.125.132.139, 74.125.132.113, ...\n",
"Connecting to docs.google.com (docs.google.com)|74.125.132.138|:443... connected.\n",
"HTTP request sent, awaiting response... 307 Temporary Redirect\n",
"Location: https://doc-00-c8-sheets.googleusercontent.com/export/54bogvaave6cua4cdnls17ksc4/p30qcagdcmtcqd8ure4jdd8ec0/1725011230000/112261653790527273724/*/1RNtDIvisrnOmjJxS7aPm-45NtOH3qd5-mgd2bHeSOGA?format=csv&gid=1727131321 [following]\n",
"Warning: wildcards not supported in HTTP.\n",
"--2024-08-30 09:47:10-- https://doc-00-c8-sheets.googleusercontent.com/export/54bogvaave6cua4cdnls17ksc4/p30qcagdcmtcqd8ure4jdd8ec0/1725011230000/112261653790527273724/*/1RNtDIvisrnOmjJxS7aPm-45NtOH3qd5-mgd2bHeSOGA?format=csv&gid=1727131321\n",
"Resolving doc-00-c8-sheets.googleusercontent.com (doc-00-c8-sheets.googleusercontent.com)... 74.125.201.132, 2607:f8b0:4001:c01::84\n",
"Connecting to doc-00-c8-sheets.googleusercontent.com (doc-00-c8-sheets.googleusercontent.com)|74.125.201.132|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/csv]\n",
"Saving to: dataset.csv\n",
"\n",
"dataset.csv [ <=> ] 12.04M 5.00MB/s in 2.4s \n",
"\n",
"2024-08-30 09:47:17 (5.00 MB/s) - dataset.csv saved [12621972]\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# CODE BELOW"
],
"metadata": {
"id": "kiZ4TBA6wFea"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2A-RJ6Rscyoa"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    \"\"\"\n",
"    Compute the sigmoid function.\n",
"\n",
"    Parameters:\n",
"    z : numpy array\n",
"        Linear combination of weights and input features.\n",
"\n",
"    Returns:\n",
"    numpy array\n",
"        Sigmoid of input z.\n",
"    \"\"\"\n",
"    pass\n",
"\n",
"def initialize_weights(n_features):\n",
"    \"\"\"\n",
"    Initialize weights and bias to zero.\n",
"\n",
"    Parameters:\n",
"    n_features : int\n",
"        Number of features in the dataset.\n",
"\n",
"    Returns:\n",
"    tuple\n",
"        Initialized weights and bias.\n",
"    \"\"\"\n",
"    # initialize the weights and bias to zero (hint: make sure dimensions are correct)\n",
"\n",
"    return weights, bias\n",
"\n",
"def compute_cost(y, y_pred):\n",
"    \"\"\"\n",
"    Compute the cost function for logistic regression.\n",
"\n",
"    Parameters:\n",
"    y : numpy array\n",
"        Actual labels.\n",
"    y_pred : numpy array\n",
"        Predicted probabilities.\n",
"\n",
"    Returns:\n",
"    float\n",
"        The cost value.\n",
"    \"\"\"\n",
"    # compute the cost\n",
"\n",
"    return cost\n",
"\n",
"def compute_gradients(X, y, y_pred):\n",
"    \"\"\"\n",
"    Compute the gradients for weights and bias.\n",
"\n",
"    Parameters:\n",
"    X : numpy array\n",
"        Feature matrix.\n",
"    y : numpy array\n",
"        Actual labels.\n",
"    y_pred : numpy array\n",
"        Predicted probabilities.\n",
"\n",
"    Returns:\n",
"    tuple\n",
"        Gradients of weights and bias.\n",
"    \"\"\"\n",
"    m = X.shape[0]\n",
"\n",
"    # compute dw\n",
"\n",
"    # compute db\n",
"\n",
"    return dw, db\n",
"\n",
"\n",
"def optimize(X, y, weights, bias, learning_rate, num_iterations):\n",
"    \"\"\"\n",
"    Perform gradient descent to optimize weights and bias.\n",
"\n",
"    Parameters:\n",
"    X : numpy array\n",
"        Feature matrix.\n",
"    y : numpy array\n",
"        Actual labels.\n",
"    weights : numpy array\n",
"        Weights of the model.\n",
"    bias : float\n",
"        Bias of the model.\n",
"    learning_rate : float\n",
"        Learning rate for gradient descent.\n",
"    num_iterations : int\n",
"        Number of iterations for gradient descent.\n",
"\n",
"    Returns:\n",
"    tuple\n",
"        Optimized weights, bias, and the list of costs.\n",
"    \"\"\"\n",
"    costs = []\n",
"\n",
"    for i in range(num_iterations):\n",
"        # Compute linear model\n",
"\n",
"        # Apply sigmoid function\n",
"\n",
"        # Compute cost\n",
"\n",
"        # Compute gradients\n",
"\n",
"        # Update weights and bias\n",
"        pass\n",
"    return weights, bias, costs\n",
"\n",
"def predict(X, weights, bias):\n",
"    \"\"\"\n",
"    Predict the binary labels for a dataset.\n",
"\n",
"    Parameters:\n",
"    X : numpy array\n",
"        Feature matrix.\n",
"    weights : numpy array\n",
"        Weights of the model.\n",
"    bias : float\n",
"        Bias of the model.\n",
"\n",
"    Returns:\n",
"    numpy array\n",
"        Predicted binary labels (0 or 1).\n",
"    \"\"\"\n",
"    z = np.dot(X, weights) + bias\n",
"    y_pred = sigmoid(z)\n",
"    predictions = [1 if i > 0.5 else 0 for i in y_pred]\n",
"    return np.array(predictions)\n"
]
},
{
"cell_type": "markdown",
"source": [
"# COMPUTE ACCURACY"
],
"metadata": {
"id": "v9rwH83Rwrfk"
}
},
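{
"cell_type": "code",
"source": [
"# Placeholder hyperparameters (assumed values); tune them to improve accuracy.\n",
"learning_rate, num_iterations = 0.01, 1000\n",
"weights, bias = initialize_weights(X_train.shape[1])\n",
"weights, bias, costs = optimize(X_train, y_train.to_numpy(), weights, bias, learning_rate, num_iterations)\n",
"y_pred = predict(X_test, weights, bias)\n",
"matches = np.sum(y_test == y_pred)\n",
"mismatches = np.sum(y_test != y_pred)\n",
"print(f\"Accuracy: {matches/(matches+mismatches)}\")"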
],
"metadata": {
"id": "eyGEV4mWB-rW"
},
"execution_count": null,
"outputs": []
}
]
}