Daily Logs for May 30, 2026

Written by: Tushar Sharma


Dear Vishi, dear logs for today.

Deep Learning Lectures

I watched the MIT 15.773 lecture by Rama Ramakrishnan.

What is a Weight?

A weight in a model is a trainable parameter (a coefficient). We multiply the input features by these weights (and add a bias) to produce a prediction. The goal of training is to adjust these weights to minimize the Loss Function, which measures the error between our prediction and the actual target.
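To make the weight/bias/loss vocabulary concrete, here is a minimal sketch of a linear model's prediction and its error. The toy inputs, the specific weight values, and the choice of mean squared error (MSE) as the loss are my own illustrative assumptions, not taken from the lecture.

```python
# A minimal sketch: prediction = sum(w_i * x_i) + b, scored with MSE.
# The data, the weight values, and MSE as the loss are illustrative assumptions.

def predict(features, weights, bias):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse_loss(predictions, targets):
    """Mean squared error between predictions and true targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


inputs = [[1.0, 2.0], [3.0, 4.0]]   # two examples, two features each
targets = [5.0, 11.0]               # actual targets
weights = [1.0, 1.0]                # trainable parameters (not yet trained)
bias = 0.5

preds = [predict(x, weights, bias) for x in inputs]
print(preds)                        # [3.5, 7.5]
print(mse_loss(preds, targets))     # 7.25
```

Training would nudge `weights` and `bias` until this loss shrinks; with `weights = [1.0, 2.0]` and `bias = 0.0` these two predictions would match the targets exactly.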

Mental Model of Training

The training process is a feedback loop where we use the gradient of the loss to update our parameters.

graph LR
    Input["Input X"] --> Hidden["Hidden Layers"]
    Hidden --> Weights["Weights/Parameters"]
    Weights --> Pred["Prediction Y'"]
    Pred --> Loss["Loss Function"]
    Target["True Target Y"] --> Loss
    Loss -->|"Gradient"| Optimizer["Optimizer"]
    Optimizer -->|"Update"| Weights
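The loop in the diagram can be sketched end to end for the simplest possible model. This assumes a single-feature linear model y' = w * x + b (no hidden layers), an MSE loss with hand-derived gradients, toy data generated from y = 2x + 1, and plain gradient descent as the optimizer; all of these are illustrative choices, not the lecture's code.

```python
# A minimal sketch of the feedback loop: forward pass -> loss -> gradient ->
# optimizer update. Model, loss, data, and learning rate are assumptions.

xs = [0.0, 1.0, 2.0, 3.0]   # input X
ys = [1.0, 3.0, 5.0, 7.0]   # true target Y (from y = 2x + 1)

w, b = 0.0, 0.0             # weights/parameters, initialised to zero
learning_rate = 0.05


def loss_and_grads(w, b):
    n = len(xs)
    preds = [w * x + b for x in xs]                           # prediction Y'
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n                     # MSE loss
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n   # dLoss/dw
    grad_b = 2 * sum(errors) / n                              # dLoss/db
    return loss, grad_w, grad_b


history = []
for _ in range(2000):
    loss, grad_w, grad_b = loss_and_grads(w, b)
    history.append(loss)
    # Optimizer: move each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))   # approaches w = 2.0, b = 1.0
```

In a real network the gradient arrow is computed by backpropagation through the hidden layers rather than by hand, but the shape of the loop stays the same.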

Optimization: Gradient Descent

The most common optimization algorithm is Gradient Descent. The weights are updated by taking a step of size α (the learning rate) in the direction opposite to the gradient:

\[w \leftarrow w - \alpha \cdot \frac{\partial\, \text{Loss}}{\partial w}\]
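The update rule on its own can be tried on a toy one-parameter loss. Here I assume Loss(w) = (w - 3)^2, whose minimum is at w = 3 and whose gradient is 2(w - 3); the starting point and the two learning rates are illustrative.

```python
# A minimal sketch of w <- w - alpha * dLoss/dw on an assumed toy loss
# Loss(w) = (w - 3)^2. Starting point and learning rates are illustrative.

def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    return 2.0 * (w - 3.0)        # dLoss/dw


def gradient_descent(w, alpha, steps):
    trajectory = [w]
    for _ in range(steps):
        w = w - alpha * grad(w)   # step against the gradient
        trajectory.append(w)
    return trajectory


small = gradient_descent(0.0, alpha=0.1, steps=50)
large = gradient_descent(0.0, alpha=1.1, steps=50)

print(small[1])    # first step: 0 - 0.1 * 2 * (0 - 3), about 0.6
print(small[-1])   # close to 3.0
print(abs(large[-1] - 3.0) > abs(large[0] - 3.0))  # True: alpha too large, diverges
```

With α = 0.1 the distance to the minimum shrinks by a factor of 0.8 each step; with α = 1.1 each step overshoots and the distance grows by a factor of 1.2, which is why the learning rate has to be chosen with care.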

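Batch Gradient Descent: Computes the gradient over the entire dataset (one full pass, i.e., one epoch) before making a single update.
Stochastic Gradient Descent (SGD): Updates the weights after each small "mini-batch" of data (strictly, classic SGD uses one example at a time; in deep learning, "SGD" usually means the mini-batch variant). Each update is much cheaper, and the noise in mini-batch gradients can help the model escape shallow local minima.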
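The difference between the two variants comes down to how many examples feed each update. This sketch reuses an assumed linear model y' = w * x + b with an MSE loss on noiseless toy data from y = 2x + 1; the batch size of 4, learning rate, and epoch count are illustrative choices.

```python
# A minimal sketch contrasting batch gradient descent with mini-batch SGD.
# Model, loss, data, batch size, and learning rate are assumptions.
import random


def grads(w, b, batch):
    """MSE gradients (dLoss/dw, dLoss/db) averaged over a batch of (x, y) pairs."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(data, batch_size, epochs, lr, seed=0):
    rng = random.Random(seed)
    w, b, updates = 0.0, 0.0, 0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)                      # new example order each epoch
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            gw, gb = grads(w, b, batch)
            w -= lr * gw
            b -= lr * gb
            updates += 1                           # one update per batch
    return w, b, updates


data = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]   # 20 points on y = 2x + 1

# Batch GD: batch_size = len(data), so exactly one update per epoch.
w_b, b_b, n_b = train(data, batch_size=len(data), epochs=500, lr=0.1)
# Mini-batch SGD: batch_size = 4, so 20 / 4 = 5 updates per epoch.
w_s, b_s, n_s = train(data, batch_size=4, epochs=500, lr=0.1)

print(n_b, n_s)   # 500 2500
print(round(w_b, 2), round(b_b, 2), round(w_s, 2), round(b_s, 2))
```

Both reach roughly w = 2, b = 1 here, but mini-batch SGD makes five times as many updates for the same number of passes over the data, each one computed from only 4 examples.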

Overfitting vs. Underfitting

Finding the "sweet spot" in model complexity is crucial for generalization.

graph TD
    A["Model Complexity"] --> B{"Performance"}
    B -->|"Too Simple"| C["Underfitting: High Training Error"]
    B -->|"Just Right"| D["Sweet Spot: Low Validation Error"]
    B -->|"Too Complex"| E["Overfitting: Low Training Error, High Validation Error"]

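Underfitting: The model fails to capture the patterns in the training data (for example, because it is too simple or has not trained long enough), so its error is high even on the training set.
Overfitting: The model "memorizes" the training data, including its noise, so it performs poorly on new, unseen data.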
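The three regimes can be caricatured with toy models. The data (y = 2x + 1 plus small fixed "noise") and all three models are hypothetical stand-ins I chose for illustration: a constant predictor (too simple), a least-squares line (just right), and a lookup table that memorizes the training points (too complex).

```python
# A caricature of underfitting vs. overfitting on hypothetical toy data.
# The data and the three models are illustrative assumptions.

train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.1, -0.1, 0.2, -0.2, 0.0]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
val_x = [0.5, 1.5, 2.5, 3.5]                 # unseen inputs
val_y = [2 * x + 1 for x in val_x]


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Too simple: ignores x and always predicts the training mean.
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y


# Just right: closed-form least-squares fit of y = w * x + b.
mean_x = sum(train_x) / len(train_x)
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x)
b = mean_y - w * mean_x
def linear_model(x):
    return w * x + b


# Too complex: memorizes the training pairs; falls back to 0.0 elsewhere.
table = dict(zip(train_x, train_y))
def memorizer(x):
    return table.get(x, 0.0)


for name, model in [("underfit", constant_model), ("good fit", linear_model),
                    ("overfit", memorizer)]:
    print(f"{name:9s} train={mse(model, train_x, train_y):.3f} "
          f"val={mse(model, val_x, val_y):.3f}")
```

The memorizer gets zero training error but the worst validation error, the constant model is poor on both, and the line is low on both: the same pattern as the diagram above.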

Tensors: The Building Blocks

Tensors are N-dimensional arrays that flow through the network:

Rank 0: Scalar (a single number like 23)
Rank 1: Vector (a list of numbers)
Rank 2: Matrix (a 2D grid/table)
Rank N: N-dimensional array (e.g., a batch of color images is a rank-4 tensor, with axes for batch, height, width, and color channels)
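The rank of a tensor is simply its number of axes, i.e. the length of its shape. This sketch uses plain nested Python lists as stand-ins for tensors (real frameworks such as NumPy, PyTorch, or JAX store contiguous arrays); the `shape` helper is hypothetical and assumes non-ragged nesting. The image batch uses a channels-last layout, which is one common convention among several.

```python
# A minimal sketch: nested Python lists as stand-in tensors.
# `shape` is a hypothetical helper; it assumes every sub-list has equal length.

def shape(t):
    """Return the shape of a regular nested list; () for a scalar."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None   # descend along the first element
    return tuple(dims)


scalar = 23                                    # rank 0
vector = [1.0, 2.0, 3.0]                       # rank 1
matrix = [[1, 2, 3], [4, 5, 6]]                # rank 2
# A batch of 2 color images, each 4 x 4 pixels with 3 channels (RGB):
images = [[[[0.0] * 3 for _ in range(4)] for _ in range(4)] for _ in range(2)]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    s = shape(t)
    print(name, "rank", len(s), "shape", s)
```

This prints ranks 0, 1, 2, and 4, with the image batch having shape (2, 4, 4, 3).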