Chapter 5 (Notes)

5.2.2. Regression Metrics

  • Regression refers to a predictive modeling problem that involves predicting a continuous (numeric) value rather than a class label.
  • It is fundamentally different from classification tasks, which involve discrete labels or categories.
  • Regression models are common in real-world tasks such as:
    • Estimating prices (houses, cars, electronics)
    • Predicting drug dosages based on patient characteristics
    • Forecasting transportation demand
    • Predicting sales trends using historical and market data
  • Unlike classification, where accuracy can directly evaluate performance, regression requires error-based metrics.
  • These metrics provide an error score summarizing how close the model’s predictions are to the actual values.
  • Understanding and interpreting these scores is crucial for developing robust and interpretable regression models.
  • Four metrics are commonly used for evaluating and reporting the performance of a regression model:
    • Mean Squared Error (MSE)
    • Root Mean Squared Error (RMSE)
    • Mean Absolute Error (MAE)
    • R-squared $(R^2)$

Mean Squared Error (MSE)

  • Mean Squared Error (MSE) is one of the most widely used metrics for evaluating the performance of regression models.
  • It measures the average of the squares of the errors—that is, the average squared difference between the actual and predicted values.
  • MSE serves both as a performance metric and as the loss function minimized in least squares regression, the core of many regression algorithms (e.g., linear regression).
  • It highlights large errors, making it a good choice when penalizing significant mistakes is important.

  • Formula:
\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]
  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
  • Characteristics:
    • Always non-negative (since errors are squared).
    • Units: Squared units of the target variable, making it less interpretable in its raw form.
    • Sensitive to outliers: Squaring errors disproportionately penalizes large mistakes.
    • Interpretation: Lower MSE values indicate better model performance.
  • When to use MSE (a short computation sketch follows this list):
    • You want to penalize larger errors more heavily.
    • You’re optimizing with algorithms that rely on gradient descent.
    • You care more about overall performance than interpretability.
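
A minimal sketch of computing MSE directly from the formula and with scikit-learn's `mean_squared_error`; the arrays below are illustrative values, not data from the text.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

# MSE by the formula: mean of squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# Same value via scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both 0.875 for these values
```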

Root Mean Squared Error (RMSE)

  • RMSE is the square root of the Mean Squared Error (MSE).
  • It provides a measure of the average magnitude of the prediction error, but unlike MSE, it is in the same unit as the original data.
  • RMSE gives a higher weight to large errors due to the squaring step in MSE.
  • Formula:
\[\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }\]
  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
  • Here, squaring handles the magnitude of errors, and the square root brings the unit back to the original scale.
  • It is more interpretable than MSE because it is on the same scale as the target.
  • Like MSE, it penalizes larger errors more heavily.
  • Lower RMSE indicates better model performance (a short computation sketch follows).
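
A minimal sketch using the same illustrative arrays: taking the square root of the MSE yields RMSE in the target's original units, which works regardless of scikit-learn version.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

# RMSE: square root of MSE, back in the target's units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~0.935 for these values
```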

Mean Absolute Error (MAE)

  • MAE calculates the average absolute difference between predicted and actual values.
  • It is a linear score, meaning all errors are weighted equally in proportion to their size.
  • Formula:
\[\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
    • $\left| y_i - \hat{y}_i \right|$ : Absolute difference between actual and predicted values (always non-negative)
  • Easy to interpret and in the same unit as the original target.
  • Less sensitive to outliers than RMSE or MSE.
  • Good for general error analysis, especially when large errors aren’t critical (a short computation sketch follows).
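
A minimal sketch of MAE on the same illustrative arrays, both by hand and via scikit-learn's `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

# MAE by the formula: mean of absolute residuals
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # both 0.75 for these values
```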

R-squared (Coefficient of Determination)

  • R-squared $(R^2)$ measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model.
  • In simpler terms, it tells us how well the model fits the data.
  • Formula:
\[R^2 = 1 - \frac{ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^{n} (y_i - \bar{y})^2 }\]
  • Where,
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
    • $\bar{y}$: Mean of actual values
    • Numerator: Sum of squared errors (residual sum of squares, RSS)
    • Denominator: Total sum of squares (TSS)
  • Interpreting the value:

    | R-squared value | Interpretation |
    | --- | --- |
    | 1 | Perfect prediction (all variance explained) |
    | 0 | Model does not explain any variability |
    | < 0 | Model performs worse than a horizontal line (the mean baseline) |
  • A higher R-squared generally means a better fit, but not always.
  • R-squared doesn’t indicate causality, and a high value may still be misleading if the model is overfitting or improperly specified.
  • It is a relative measure: it compares the model against a baseline that always predicts the mean (a short computation sketch follows).
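
A minimal sketch computing $R^2$ from the RSS/TSS definition above and checking it against scikit-learn's `r2_score`; the arrays are again illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

# R^2 = 1 - RSS/TSS
rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - rss / tss
r2_sklearn = r2_score(y_true, y_pred)
print(r2_manual, r2_sklearn)  # both ~0.724 for these values
```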

5.3. Model Validation Techniques

  • Model validation is the process of assessing how well a machine learning (ML) or artificial intelligence (AI) model performs — especially on unseen (new) data.
  • It ensures that the model:
    • Achieves its design objectives
    • Produces accurate and trustworthy predictions
    • Generalizes well beyond the training data
    • Complies with regulatory and quality standards
  • Purpose:
    • Evaluate Model Performance: Confirms that the model works not just on training data, but on real-world, unseen datasets.
    • Support Model Selection: Helps compare multiple models, choose the most appropriate one, and select optimal hyperparameters.
    • Prevent Overfitting: Assesses the risk of overfitting by observing how model performance changes on validation vs training data.
    • Build Trust: Ensures transparency and reliability, especially if validated by third-party or independent teams.
    • Comply with ML Governance: Contributes to AI governance by enforcing policies, monitoring model activity, and validating data/tooling quality.
  • Benefits:
    • Ensures Generalization
    • Guides Model Selection
    • Improves Accuracy and Robustness
    • Builds Regulatory Confidence
    • Prevents Future Failures
  • Without model validation, we risk deploying models that look good on paper but fail in practice.

Train-Test vs. K-Fold Cross Validation

5.3.1. Train-Test Split

  • The train-test split is the most basic method of model validation.
  • It involves splitting the dataset into two parts:
    • Training set: Used to train the model
    • Test set: Used to evaluate model performance on unseen data
  • Common ratios include:
    • 70% training / 30% testing
    • 80% training / 20% testing
    • 60% training / 40% testing (for very small datasets)
  • Advantages:
    • Simple and fast
    • Good for large datasets
    • Helps quickly estimate model performance
  • Disadvantages:
    • High variance: Model evaluation may vary significantly based on how the data was split
    • Risk of under- or over-estimating model performance due to random splits
    • Not ideal for small datasets
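
A minimal sketch of an 80/20 train-test split with scikit-learn; the synthetic data, the linear model, and the chosen metric are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic regression data: 100 samples, 3 features
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 80/20 split; fixing random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))
```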

5.3.2. Cross Validation

  • Cross-validation is a more robust model validation technique where the model is trained and tested multiple times on different data subsets.
  • Helps to assess how the model generalizes to an independent dataset.

5.3.2.1. K-Fold Cross Validation

  • The dataset is split into K equally sized “folds”.
  • The model is trained on K−1 folds and tested on the remaining 1 fold.
  • This process is repeated K times, each time using a different fold as the test set.
  • The average performance over the K trials is used as the final evaluation metric.
\[\text{CV}_{\text{score}} = \frac{1}{K} \sum_{i=1}^{K} \text{score}_i\]
  • Where,
    • $K$: Number of folds
    • $\text{score}_i$: Evaluation metric (e.g., RMSE, MAE, R²) on the $i^{th}$ fold
  • Advantages:
    • Reduces variance in performance estimation
    • Uses the entire dataset for both training and testing
    • More reliable for small to medium-sized datasets
  • Disadvantages:
    • Computationally expensive, especially for large datasets or complex models
    • May not work well if data is not independently and identically distributed (i.i.d.)
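
A minimal sketch of 5-fold cross-validation with scikit-learn's `cross_val_score`; the synthetic data and the choice of MAE as the fold-level score are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

# Synthetic regression data, as in the train-test example
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# K = 5 folds; each fold serves as the test set exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_absolute_error", cv=cv)
print(-scores.mean())  # average MAE across the 5 folds
```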

5.4. Hyperparameter Tuning

  • Hyperparameter tuning (also called hyperparameter optimization) is the process of searching for the best set of hyperparameters that maximizes a machine learning model’s performance on a given task.
  • Unlike model parameters (which are learned from training data), hyperparameters are set before training and control the learning process itself.
  • It determines how well a model learns and generalizes to new data.
  • Poorly tuned hyperparameters can lead to:
    • Underfitting (high bias, poor performance)
    • Overfitting (high variance, poor generalization)
  • Good tuning results in:
    • Minimized loss
    • Improved accuracy
    • Robust performance
    • Balanced bias-variance tradeoff
  • It’s essential for real-world applications in healthcare, finance, autonomous driving, etc.
  • Common hyperparameters for a neural network:
    • Learning Rate (η): Controls step size in gradient descent.
      • High → fast but unstable
      • Low → stable but slow
    • Epochs: Number of passes over entire training dataset
    • Batch Size: Number of samples per gradient update
    • Hidden Layers / Neurons: Affects capacity and depth of the network
    • Activation Function: ReLU, Sigmoid, Tanh; adds non-linearity
    • Momentum: Helps accelerate training by smoothing gradients
    • Learning Rate Decay: Reduces learning rate over time
  • Objective: Minimize loss or maximize metric score
  • Process (a minimal sketch of this loop follows the formula below):
    1. Choose a set of hyperparameters
    2. Train the model
    3. Evaluate performance (e.g., via cross-validation)
    4. Repeat until best configuration is found
  • Mathematical Representation
\[\theta^* = \arg\min_{\theta \in \Theta} \ \mathcal{L}\big(f_{\theta}(X_{\text{val}}), y_{\text{val}}\big)\]

where,

  • $\theta$ : A candidate hyperparameter configuration from the search space $\Theta$
  • $\mathcal{L}$ : Loss function, evaluated on held-out (validation) data
  • $f_{\theta}$ : Model trained with configuration $\theta$
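
A minimal sketch of the tune-evaluate loop above, assuming an illustrative ridge regression with a handful of candidate `alpha` values scored by cross-validated MSE.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.2, size=200)

# theta* = the alpha with the lowest cross-validated loss
candidates = [0.01, 0.1, 1.0, 10.0]
best_alpha, best_mse = None, np.inf
for alpha in candidates:                # 1. choose hyperparameters
    mse = -cross_val_score(             # 2-3. train and evaluate via CV
        Ridge(alpha=alpha), X, y,
        scoring="neg_mean_squared_error", cv=5
    ).mean()
    if mse < best_mse:                  # 4. keep the best configuration
        best_alpha, best_mse = alpha, mse
print(best_alpha, best_mse)
```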

Grid Search vs. Random Search

Grid Search

  • Exhaustively tests all combinations in a hyperparameter grid
  • Best for small search spaces
  • Steps:
    • Choose the model and its hyperparameters to tune.
    • Specify a set of possible values for each hyperparameter.
    • Build a grid containing all combinations.
    • Train and validate the model for each combination.
    • Select the combination that produces the best score (e.g., lowest MSE or highest accuracy).
  • Advantages:
    • Guaranteed to find optimal combo (if within grid)
    • Simple to understand
  • Disadvantages:
    • Computationally expensive
    • Doesn’t scale well with more parameters or wider ranges
  • When to use (a sketch follows this list):
    • For small search spaces
    • When computational resources are not a constraint
    • When accuracy is critical and time is available
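
A minimal sketch with scikit-learn's `GridSearchCV`; the random-forest model, the candidate values, and the synthetic data are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.2, size=200)

# Grid of candidate values: all 3 x 2 = 6 combinations are evaluated
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```
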
Random Search

  • Randomly samples combinations from the search space
  • Efficient when not all hyperparameters are equally important
  • Sampling continues for a fixed number of iterations or until the computational budget is exhausted
  • Advantages:
    • Much faster than grid search
    • Can explore larger spaces with fewer evaluations
  • Disadvantages:
    • No guarantee of finding the absolute best configuration
    • Results may vary between runs (unless random seed is fixed)
    • Might miss promising combinations if not enough iterations
  • Steps:
    • Select model to tune
    • Define distributions for hyperparameters
    • Set number of iterations (e.g., 10–100)
    • Choose evaluation metric (e.g., accuracy, RMSE)
    • Randomly sample hyperparameter combinations
    • Train and validate model for each combination
    • Compare scores across all runs
    • Pick best combination
    • (Optional) Retrain model on full data
  • When to use (a sketch follows this list):
    • When hyperparameter space is large
    • When computational resources or time are limited
    • When exact optimal values are not critical, but good performance is sufficient
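
A minimal sketch with scikit-learn's `RandomizedSearchCV`; the sampling distributions, the iteration budget, and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.2, size=200)

# Distributions instead of fixed grids; n_iter caps the budget
param_distributions = {
    "n_estimators": randint(50, 300),  # integers sampled from [50, 299]
    "max_depth": randint(2, 12),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=20,
    scoring="neg_mean_squared_error",
    cv=5,
    random_state=42,  # fixed seed makes the sampled combinations reproducible
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```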