Tags: letters deep learning ml
Dear Vishi, dear logs for today.
I watched the MIT 15.773 lecture by Rama Ramakrishnan.
A weight in a model is a trainable parameter (a coefficient). We multiply the input features by these weights (and add a bias) to produce a prediction. The goal of training is to adjust these weights to minimize the Loss Function, which measures the error between our prediction and the actual target.
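To make this concrete for myself, here is a minimal sketch in plain Python. It assumes mean squared error as the loss (my notes above don't name a specific one), and the toy features, weights, and targets are made up for illustration.

```python
def predict(features, weights, bias):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse_loss(predictions, targets):
    """Mean squared error between predictions and true targets (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical toy data: two examples, two features each.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 11.0]

# Untrained weights: deliberately not the best possible values.
weights = [1.0, 1.0]
bias = 0.0

preds = [predict(x, weights, bias) for x in X]
loss = mse_loss(preds, y)
print(preds, loss)
```

Training would then mean nudging `weights` and `bias` so that `loss` gets smaller.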
The training process is a feedback loop where we use the gradient of the loss to update our parameters.
```mermaid
graph LR
Input["Input X"] --> Hidden["Hidden Layers"]
Hidden --> Weights["Weights/Parameters"]
Weights --> Pred["Prediction Y'"]
Pred --> Loss["Loss Function"]
Target["True Target Y"] --> Loss
Loss -->|"Gradient"| Optimizer["Optimizer"]
Optimizer -->|"Update"| Weights
```
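The loop in the diagram can be sketched in a few lines of plain Python. This is only a rough sketch under my own assumptions: a one-feature linear model, mean squared error as the loss, and the simplest possible optimizer (subtract the learning rate times the gradient). The data, `learning_rate`, and `steps` are made-up values.

```python
def train(xs, ys, learning_rate=0.05, steps=1000):
    """Fit y ~ w * x + b by repeating predict -> loss gradient -> update."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: current predictions and their errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Optimizer step: move against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print(w, b)
```

With these settings the learned `w` and `b` end up close to the slope and intercept that generated the data.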
The most common optimization algorithm is Gradient Descent. The weights are updated by moving in the opposite direction of the gradient:
\[w \leftarrow w - \alpha \cdot \frac{\partial \mathrm{Loss}}{\partial w}\]
Here \(\alpha\) is the learning rate, which sets the size of each step.
Finding the "sweet spot" in model complexity is crucial for generalization.
```mermaid
graph TD
A["Model Complexity"] --> B{"Performance"}
B -->|"Too Simple"| C["Underfitting: High Training Error"]
B -->|"Just Right"| D["Sweet Spot: Low Validation Error"]
B -->|"Too Complex"| E["Overfitting: Low Training Error, High Validation Error"]
```
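One way to see the three regimes is to fit polynomials of increasing degree to noisy data and compare training and validation error. This is a small sketch assuming NumPy; the sine-shaped target, the noise level, and the degrees 1, 4, and 12 are my own illustrative choices, not something from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of a smooth curve (hypothetical ground truth)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # held-out validation set

results = {}
for degree in (1, 4, 12):          # too simple, moderate, very flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))

for degree, (train_err, val_err) in results.items():
    print(f"degree={degree:2d}  train={train_err:.4f}  val={val_err:.4f}")
```

The training error can only go down as the degree grows, since each higher-degree model contains the lower-degree ones. The validation error is the number to watch: it typically bottoms out at a moderate degree and gets worse again for the most flexible fit.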
Tensors are N-dimensional arrays that flow through the network:
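A quick sketch of tensors of different ranks, assuming NumPy arrays as a stand-in for framework tensors. The shapes (for example a batch of 32 RGB images of 28x28 pixels) and the dense-layer weights are made up for illustration.

```python
import numpy as np

scalar = np.array(3.0)               # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])   # rank 1, shape (3,)
matrix = np.ones((2, 3))             # rank 2, shape (2, 3)
images = np.zeros((32, 28, 28, 3))   # rank 4: batch, height, width, channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)

# A dense layer is a tensor operation: (batch, in) @ (in, out) + (out,)
batch = np.ones((4, 3))              # 4 examples, 3 features each
W = np.full((3, 2), 0.5)             # hypothetical weights: 3 inputs -> 2 outputs
b = np.zeros(2)                      # one bias per output
out = batch @ W + b                  # result has shape (4, 2)
```

The bias `b` has shape `(2,)` and is broadcast across all four rows of the batch.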