life is too short for a diary




Daily Logs for May 30, 2026

Tags: letters, deep learning, ml

Author
Written by: Tushar Sharma

Dear Vishi, here are my logs for today.

Deep Learning Lectures

I watched the MIT 15.773 lecture by Rama Ramakrishnan.

What is a Weight?

A weight is a trainable parameter (a coefficient) of a model. The model multiplies the input features by these weights and adds a bias to produce a prediction. Training adjusts the weights and the bias to minimize the loss function, which measures the error between the prediction and the actual target.
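To make this concrete for myself, here is a minimal sketch of a single linear prediction and its loss. The toy data, the starting values of `w` and `b`, and the choice of mean squared error as the loss are all my own assumptions for illustration, not something taken from the lecture.

```python
import numpy as np

# Hypothetical toy data: 3 examples with 2 input features each.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([5.0, 4.0, 8.0])  # actual targets

# Trainable parameters (arbitrary starting values, chosen by hand).
w = np.array([2.0, 1.0])  # one weight per input feature
b = 0.5                   # bias


def predict(X, w, b):
    """Multiply the features by the weights and add the bias."""
    return X @ w + b


def mse_loss(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    return np.mean((y_pred - y_true) ** 2)


y_pred = predict(X, w, b)
loss = mse_loss(y_pred, y)
print("predictions:", y_pred)
print("loss:", loss)
```

Training would then nudge `w` and `b` so that `loss` gets smaller.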

Mental Model of Training

The training process is a feedback loop where we use the gradient of the loss to update our parameters.

graph LR
    Input["Input X"] --> Hidden["Hidden Layers"]
    Hidden --> Weights["Weights/Parameters"]
    Weights --> Pred["Prediction Y'"]
    Pred --> Loss["Loss Function"]
    Target["True Target Y"] --> Loss
    Loss -->|"Gradient"| Optimizer["Optimizer"]
    Optimizer -->|"Update"| Weights

Optimization: Gradient Descent

The most common optimization algorithm is gradient descent. Each weight is updated by moving in the opposite direction of the gradient of the loss, with the step size scaled by a learning rate \(\alpha\):

\[w \leftarrow w - \alpha \cdot \frac{\partial Loss}{\partial w}\]
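The update rule above can be sketched as a short training loop. This is only an illustration under my own assumptions: a linear model, mean squared error as the loss, synthetic noise-free data, and hand-picked values for `alpha` and the number of steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from known parameters (assumed for this sketch).
X = rng.normal(size=(100, 2))
true_w = np.array([3.0, -2.0])
true_b = 1.0
y = X @ true_w + true_b

# Start from zero and let gradient descent recover the parameters.
w = np.zeros(2)
b = 0.0
alpha = 0.1  # learning rate

for step in range(500):
    error = X @ w + b - y                # prediction minus target
    grad_w = 2 * X.T @ error / len(y)    # d(MSE)/dw
    grad_b = 2 * error.mean()            # d(MSE)/db
    w -= alpha * grad_w                  # w <- w - alpha * dLoss/dw
    b -= alpha * grad_b

print("learned w:", w)
print("learned b:", b)
```

Each pass through the loop is one trip around the feedback loop: predict, measure the loss gradient, update the weights.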

Overfitting vs. Underfitting

Finding the "sweet spot" in model complexity is crucial for generalization.

graph TD
    A["Model Complexity"] --> B{"Performance"}
    B -->|"Too Simple"| C["Underfitting: High Training Error"]
    B -->|"Just Right"| D["Sweet Spot: Low Validation Error"]
    B -->|"Too Complex"| E["Overfitting: Low Training Error, High Validation Error"]
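One way to see this trade-off is to fit polynomials of increasing degree to the same noisy data and compare training and validation error. The sketch below is my own toy experiment, not from the lecture: the sine-shaped target, the noise level, the sample sizes, and the degrees 1, 3, and 9 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Sample noisy points from an assumed underlying curve."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # held-out validation set


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)


results = {}
for degree in (1, 3, 9):  # model complexity = polynomial degree
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_val, y_val))
    print(f"degree {degree}: train={results[degree][0]:.4f} "
          f"val={results[degree][1]:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree fit includes the lower-degree ones as special cases. Validation error is the number to watch: I would expect it to be poor for the straight line and to stop improving, or get worse, for the most flexible fit, though the exact values depend on the random draw.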

Tensors: The Building Blocks

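Tensors are N-dimensional arrays that flow through the network: a scalar is a rank-0 tensor, a vector is rank 1, a matrix is rank 2, and higher ranks add further axes.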
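A quick sketch of tensors of different ranks, using NumPy arrays as a stand-in for whatever framework the course uses. The shapes are hypothetical, picked only to illustrate the idea.

```python
import numpy as np

scalar = np.array(3.0)               # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])   # rank 1: e.g. one example's features
matrix = np.ones((2, 3))             # rank 2: e.g. a batch of 2 examples
images = np.zeros((4, 28, 28, 3))    # rank 4: e.g. 4 RGB images, 28x28 pixels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    # ndim is the rank (number of axes); shape is the size along each axis.
    print(f"{name}: rank={t.ndim}, shape={t.shape}")
```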

