Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Learning Deutsch by learning Phrases and Sentences

1 minute read

Published:

When it comes to language learning, focusing on phrases and sentences rather than isolated words can make a significant difference. While memorizing vocabulary lists might seem like a straightforward approach, it often leaves learners struggling to use those words in real-life situations. Words alone rarely convey complete meaning; context is crucial. By learning phrases and sentences, you naturally absorb grammar, word order, and common expressions, making your speech sound more natural and fluent.

For example, knowing the word “book” is helpful, but learning the phrase “I’d like to book a table” is far more practical. Phrases provide ready-made building blocks for conversation, reducing the mental effort needed to construct sentences from scratch. This approach also helps with pronunciation and intonation, as you practice speaking in chunks rather than isolated syllables.

Moreover, sentences and phrases expose you to cultural nuances and idiomatic expressions that single words cannot convey. This leads to better comprehension when listening or reading, and more confidence when speaking. In summary, prioritizing phrases and sentences accelerates your ability to communicate effectively, making language learning more enjoyable and efficient.

Below are some of the Anki decks that can be used:

Deutsch:

  • German Sentences
    • Part 1 - A1 and A2: https://ankiweb.net/shared/info/785874566
    • Part 2 - B1 : https://ankiweb.net/shared/info/17323417
    • Part 3 - B2-C1 : https://ankiweb.net/shared/info/944971572
  • German 7000 Intermediate/Advanced Sentences w/ Audio
    • Part 1 : https://ankiweb.net/shared/info/1125602705

Japanese:

  • LTL Japanese Deck
    • Level 1 - Short: https://ankiweb.net/shared/info/1184395484
    • Level 2 - Short Medium: https://ankiweb.net/shared/info/187819699
    • Level 3 - Medium: https://ankiweb.net/shared/info/266834099
    • Level 4 - Medium Long: https://ankiweb.net/shared/info/660574631
    • Level 5 - Long: TBD

Deutsch Day 16: Compound Nouns

12 minute read

Published:

📘 MASTER PLAN: German Compound Noun Vocabulary Expansion

🔶 PHASE 1: Master the Most Useful Base Nouns (Top 50)

These are the “root” or “core” nouns you will see in countless combinations.

| # | Noun | Meaning | Article | Example Compound |
| - | ---- | ------- | ------- | ---------------- |
| 1 | Haus | house | das | Krankenhaus |
| 2 | Kind | child | das | Kindergarten |
| 3 | Arbeit | work | die | Hausarbeit |
| 4 | Schule | school | die | Sprachschule |
| 5 | Auto | car | das | Autounfall |
| 6 | Zeit | time | die | Freizeit |
| 7 | Tag | day | der | Feiertag |
| 8 | Bahn | rail/train | die | U-Bahn |
| 9 | Buch | book | das | Wörterbuch |
| 10 | Zimmer | room | das | Schlafzimmer |
| 11 | Stadt | city | die | Hauptstadt |
| 12 | Name | name | der | Nachname |
| 13 | Licht | light | das | Taschenlicht |
| 14 | Wasser | water | das | Trinkwasser |
| 15 | Luft | air | die | Luftqualität |
| 16 | Weg | path/way | der | Heimweg |
| 17 | Spiel | game/play | das | Kinderspiel |
| 18 | Reise | travel/trip | die | Dienstreise |
| 19 | Zeitung | newspaper | die | Bildzeitung |
| 20 | Gerät | device | das | Küchengerät |
| 21 | Mann | man | der | Geschäftsmann |
| 22 | Frau | woman | die | Hausfrau |
| 23 | Essen | food/eating | das | Essenszeit |
| 24 | Lehrer | teacher | der | Lehrerzimmer |
| 25 | Student | student | der | Studentenstadt |
| 26 | Bahn | train/rail | die | Eisenbahn |
| 27 | Eltern | parents | die (pl) | Elternabend |
| 28 | Körper | body | der | Körperpflege |
| 29 | Kopf | head | der | Kopfschmerzen |
| 30 | Zahn | tooth | der | Zahnarzt |
| 31 | Auge | eye | das | Augenarzt |
| 32 | Herz | heart | das | Herzenswunsch |
| 33 | Beruf | profession | der | Berufsleben |
| 34 | Unfall | accident | der | Autounfall |
| 35 | Polizei | police | die | Polizeiauto |
| 36 | Freund | friend | der | Freundschaft |
| 37 | Uhr | clock | die | Wanduhr |
| 38 | Sprache | language | die | Fremdsprache |
| 39 | Tier | animal | das | Haustier |
| 40 | Leben | life | das | Lebensstil |
| 41 | Welt | world | die | Weltkarte |
| 42 | Feuer | fire | das | Feuerzeug |
| 43 | Glas | glass | das | Weinglas |
| 44 | Straße | street | die | Hauptstraße |
| 45 | Fenster | window | das | Fensterrahmen |
| 46 | Schuh | shoe | der | Turnschuh |
| 47 | Tasche | bag | die | Handtasche |
| 48 | Lampe | lamp | die | Schreibtischlampe |
| 49 | Computer | computer | der | Computerprogramm |
| 50 | Tisch | table/desk | der | Esstisch |

✅ Goal: Learn gender, plural form, and 2-3 common compounds per base noun.

🔶 PHASE 2: Master Compound Prefix & Suffix Builders

These turn base nouns into real-life compound nouns.

🔹 Common PREFIX Nouns (go in front)

| Prefix | Meaning | Example |
| ------ | ------- | ------- |
| Kinder- | child | Kinderspiel, Kinderzimmer |
| Kranken- | sick | Krankenhaus |
| Haus- | house/home | Hausaufgabe |
| Sprach- | language | Sprachkurs, Sprachschule |
| Auto- | car | Autounfall |
| Haupt- | main | Hauptstraße |
| Fahr- | drive | Fahrkarte, Fahrplan |
| Schul- | school | Schulbuch, Schultasche |
| Arbeits- | work | Arbeitszeit, Arbeitsplatz |
| Taschen- | bag/pocket | Taschenlampe |
| Zug- | train | Zugfahrt |
| Essens- | eating | Essenszeit |
| Zeit- | time | Zeitplan, Zeitdruck |
| Stadt- | city | Stadtzentrum |
| Polizei- | police | Polizeiauto |
| Wasser- | water | Wasserglas, Wasserdampf |
| Lebens- | life | Lebensstil, Lebenszeit |
| Freund- | friend | Freundschaft |
| Luft- | air | Luftverschmutzung |
| Körper- | body | Körperpflege |
| Augen- | eye | Augenarzt |
| Kopf- | head | Kopfhörer |
| Familien- | family | Familienfest |
| Musik- | music | Musikschule |
| Internet- | internet | Internetverbindung |
| Lehrer- | teacher | Lehrerzimmer |
| Nach- | after/following | Nachname, Nachmittag |
| Vor- | before/front | Vorname, Vormittag |
| Geschäfts- | business | Geschäftsmann |
| Welt- | world | Weltkarte, Weltreise |

🔸 Common SUFFIX Nouns (go at the end)

| Suffix | Meaning | Example |
| ------ | ------- | ------- |
| -haus | building | Krankenhaus |
| -zimmer | room | Schlafzimmer |
| -schule | school | Sprachschule |
| -buch | book | Wörterbuch |
| -zeit | time | Freizeit, Arbeitszeit |
| -spiel | game/play | Fußballspiel |
| -gerät | device/equipment | Küchengerät |
| -fahrt | ride/trip | Zugfahrt |
| -weg | way/path | Heimweg |
| -karte | card/map | Fahrkarte, Weltkarte |
| -straße | street | Hauptstraße |
| -tag | day | Feiertag, Arbeitstag |
| -arbeit | work | Hausarbeit, Teamarbeit |
| -freundschaft | friendship | Freundschaft |
| -leben | life | Berufsleben |
| -plan | plan | Stundenplan |
| -programm | program | Computerprogramm |
| -aufgabe | task/assignment | Hausaufgabe |
| -zeitung | newspaper | Tageszeitung |
| -gerät | appliance/tool | Haushaltsgerät |
| -stelle | place/post | Arbeitsstelle |
| -name | name | Vorname, Nachname |
| -uhr | clock | Armbanduhr |
| -arzt | doctor | Zahnarzt, Augenarzt |
| -stadt | city | Studentenstadt |
| -lampe | lamp | Schreibtischlampe |
| -schuh | shoe | Handschuh |
| -tasche | bag | Handtasche, Schultasche |
| -fenster | window | Dachfenster |
| -rahmen | frame | Fensterrahmen |

✅ Goal: Learn 10 of each and how they behave when combined.

🔶 PHASE 3: Learn Noun Combination Patterns (Grouped by Theme)

Now combine your prefix + base/suffix using patterns and themes. Grouping by theme makes it easy to recall.

🏠 House & Furniture (Wohnen und Möbel)

| Compound Noun | Meaning |
| ------------- | ------- |
| Wohnzimmer | living room |
| Schlafzimmer | bedroom |
| Badezimmer | bathroom |
| Esstisch | dining table |
| Küchenschrank | kitchen cupboard |
| Hausnummer | house number |
| Haustür | front door |
| Fensterrahmen | window frame |
| Dachfenster | roof window |
| Haustier | pet (lit. house animal) |
| Haushaltsgerät | household appliance |
| Hausaufgabe | homework |

🧑‍⚕️ Health & Body (Gesundheit und Körper)

| Compound Noun | Meaning |
| ------------- | ------- |
| Krankenhaus | hospital |
| Zahnarzt | dentist |
| Augenarzt | eye doctor |
| Kopfschmerzen | headache |
| Rückenschmerzen | back pain |
| Körperpflege | body care |
| Krankenkasse | health insurance |
| Herzschlag | heartbeat |
| Blutdruck | blood pressure |
| Hausarzt | family doctor |
| Notaufnahme | emergency room |
| Krankenwagen | ambulance |

🎓 School & Learning (Schule und Lernen)

| Compound Noun | Meaning |
| ------------- | ------- |
| Schulbuch | school book |
| Lehrerzimmer | teacher's room |
| Sprachschule | language school |
| Hausaufgabe | homework |
| Klassenarbeit | class test |
| Stundenplan | schedule/timetable |
| Schultasche | school bag |
| Schulweg | way to school |
| Schülerausweis | student ID card |
| Schulzeit | school time |
| Unterrichtsstunde | lesson |
| Schulanfang | beginning of school |

🚗 Travel & Transport (Reisen und Verkehr)

| Compound Noun | Meaning |
| ------------- | ------- |
| Fahrkarte | ticket |
| Zugfahrt | train ride |
| Autofahrt | car trip |
| Weltreise | world trip |
| U-Bahn | subway |
| Hauptstraße | main street |
| Reisebüro | travel agency |
| Abfahrtzeit | departure time |
| Gepäckwagen | luggage cart |
| Straßenkarte | road map |
| Reisepass | passport |
| Urlaubsort | vacation destination |

⏱ Time & Work (Zeit und Arbeit)

| Compound Noun | Meaning |
| ------------- | ------- |
| Arbeitszeit | working time |
| Freizeit | free time |
| Essenszeit | mealtime |
| Feiertag | holiday |
| Arbeitstag | work day |
| Zeitplan | schedule |
| Stundenlohn | hourly wage |
| Arbeitsvertrag | work contract |
| Teilzeitjob | part-time job |
| Urlaubsantrag | vacation request |
| Terminkalender | appointment calendar |
| Wochenarbeitszeit | weekly working hours |

📱 Devices & Objects (Geräte und Gegenstände)

| Compound Noun | Meaning |
| ------------- | ------- |
| Taschenlampe | flashlight |
| Küchengerät | kitchen appliance |
| Haushaltsgerät | household device |
| Schreibtischlampe | desk lamp |
| Computerprogramm | software program |
| Fernsehapparat | television set |
| Handyhülle | phone case |
| Weckeruhr | alarm clock |
| Kopfhörer | headphones |
| Waschmaschine | washing machine |
| Kühlschrank | refrigerator |
| Mikrowellengerät | microwave |

📰 Media & Reading (Medien und Lesen)

| Compound Noun | Meaning |
| ------------- | ------- |
| Wörterbuch | dictionary |
| Schulbuch | school book |
| Tageszeitung | daily newspaper |
| Bildzeitung | tabloid |
| Lesebrille | reading glasses |
| Fernsehprogramm | TV program |
| Lieblingsbuch | favorite book |
| Zeitungsausschnitt | newspaper clipping |
| Sachbuch | nonfiction book |
| Bibliotheksausweis | library card |
| Buchhandlung | bookstore |
| Nachrichtenkanal | news channel |

💼 People & Professions (Personen und Berufe)

| Compound Noun | Meaning |
| ------------- | ------- |
| Geschäftsmann | businessman |
| Hausfrau | housewife |
| Zahnarzt | dentist |
| Augenarzt | eye doctor |
| Lehrerzimmer | teachers' lounge |
| Berufsberater | career counselor |
| Polizeibeamter | police officer |
| Busfahrer | bus driver |
| Berufserfahrung | professional experience |
| Tierarzt | veterinarian |
| Feuerwehrmann | firefighter |
| Kindergärtnerin | preschool teacher |

🐾 Nature & Environment (Natur und Umwelt)

| Compound Noun | Meaning |
| ------------- | ------- |
| Haustier | pet |
| Trinkwasser | drinking water |
| Luftqualität | air quality |
| Umweltproblem | environmental issue |
| Müllabfuhr | garbage collection |
| Solaranlage | solar panel system |
| Naturkatastrophe | natural disaster |
| Regenwasser | rainwater |
| Wetterbericht | weather report |
| Baumkrone | treetop |
| Sonnenaufgang | sunrise |
| Gewitterwolke | thundercloud |

📦 BONUS GROUP – Easy-to-Understand Compounds from A1/A2 Level

| Compound Noun | Meaning |
| ------------- | ------- |
| Handtasche | handbag |
| Handschuh | glove |
| Haustür | front door |
| Fußweg | footpath |
| Zahnbürste | toothbrush |
| Wasserflasche | water bottle |
| Sonnenbrille | sunglasses |
| Reisetasche | travel bag |
| Arbeitszimmer | workroom/office |
| Einkaufsliste | shopping list |

Deutsch Day 12: Subject + Verb

1 minute read

Published:

Nominative

Personal Pronomen in Nominative

Nominative pronouns are personal pronouns that replace the subject in a sentence. They show who or what is doing something, e.g., I am tired.

Deutsch: Pronomen im Nominativ sind Personalpronomen, die das Subjekt im Satz ersetzen. Sie zeigen, wer oder was etwas tut, z. B. Ich bin müde.

|            | Singular              | Plural        |
| ---------- | --------------------- | ------------- |
| 1st person | ich (I)               | wir (we)      |
| 2nd person | du/Sie (you)          | ihr (you all) |
| 3rd person | er/sie/es (he/she/it) | sie (they)    |

Sein (to be)

| Number   | Person              | Personalpronomen         | Sein (to be) |
| -------- | ------------------- | ------------------------ | ------------ |
| Singular | 1st person          | ich (I)                  | bin          |
| Singular | 2nd person          | du (you - informal)      | bist         |
| Singular | 3rd person          | er/sie/es (he/she/it)    | ist          |
| Singular | 2nd person (formal) | Sie (you - formal)       | sind         |
| Plural   | 1st person          | wir (we)                 | sind         |
| Plural   | 2nd person          | ihr (you all - informal) | seid         |
| Plural   | 3rd person          | sie (they)               | sind         |
| Plural   | 2nd person (formal) | Sie (you all - formal)   | sind         |

Haben (to have)

| Number   | Person              | Personalpronomen         | Haben (to have) |
| -------- | ------------------- | ------------------------ | --------------- |
| Singular | 1st person          | ich (I)                  | habe            |
| Singular | 2nd person          | du (you - informal)      | hast            |
| Singular | 3rd person          | er/sie/es (he/she/it)    | hat             |
| Singular | 2nd person (formal) | Sie (you - formal)       | haben           |
| Plural   | 1st person          | wir (we)                 | haben           |
| Plural   | 2nd person          | ihr (you all - informal) | habt            |
| Plural   | 3rd person          | sie (they)               | haben           |
| Plural   | 2nd person (formal) | Sie (you all - formal)   | haben           |

Deutsch Day 11: Ja/Nein Frage

less than 1 minute read

Published:

The structure for a Yes/No question (Ja/Nein-Frage) in German is as follows:

`Verb (konjugiert) + Subjekt + Rest`

For Example:

| Deutsch | Englisch |
| ------- | -------- |
| Bist du müde? | Are you tired? |
| Hast du ein Buch? | Do you have a book? |
| Kommt er aus Spanien? | Does he come from Spain? |
| Geht sie zur Schule? | Does she go to school? |
| Wohnst du in Berlin? | Do you live in Berlin? |

Other examples:

  • Ist das die Brille?
  • Ist das das Handy?
  • Ist das der Apfel?
  • Ist das die Tasse?

Deutsch Day 10: Nominative

2 minute read

Published:

German (Deutsch) grammar is divided into different cases so that it is easier to build up the language slowly but surely. Each case adds another part required for fluency, or at least for being able to make some sentences. The cases are as follows:

  1. Nominativ
  2. Akkusativ
  3. Dativ
  4. Genitiv (A2)

Nominative

|                            | Maskulin | Feminin | Neutrum | Plural |
| -------------------------- | -------- | ------- | ------- | ------ |
| bestimmte Artikel (the)    | der      | die     | das     | die    |
| unbestimmte Artikel (a/an) | ein      | eine    | ein     | -      |
| negative Artikel (no)      | kein     | keine   | kein    | keine  |
| Possessive Artikel (my)    | mein     | meine   | mein    | meine  |

Beispiele (Examples):

Was ist das? Wer ist das?

  1. Das ist der Tisch.
    • unbestimmte Artikel: Das ist ein Tisch.
    • negative Artikel: Das ist kein Tisch.
    • Possessive Artikel: Das ist mein Tisch.
  2. Das ist die Banane. 🍌
    • unbestimmte Artikel: Das ist eine Banane.
    • negative Artikel: Das ist keine Banane.
    • Possessive Artikel: Das ist meine Banane.
  3. Das ist das Handy. 📲🤳
    • unbestimmte Artikel: Das ist ein Handy.
    • negative Artikel: Das ist kein Handy.
    • Possessive Artikel: Das ist mein Handy.
  4. Das sind die Zeitungen. 🗞️📰
    • unbestimmte Artikel: (This is plural, so there is no indefinite article.)
    • negative Artikel: Das sind keine Zeitungen.
    • Possessive Artikel: Das sind meine Zeitungen.
  5. Das sind die Bücher.
    • unbestimmte Artikel: (none; plural)
    • negative Artikel: Das sind keine Bücher.
    • Possessive Artikel: Das sind meine Bücher.
  6. Das ist der Bus / das Auto.
    • unbestimmte Artikel: Das ist ein Auto.
    • negative Artikel: Das ist kein Auto.
    • Possessive Artikel: Das ist mein Auto.
  7. Das sind die Blumen.
    • unbestimmte Artikel: (none; plural)
    • negative Artikel: Das sind keine Blumen.
    • Possessive Artikel: Das sind meine Blumen.
  8. Das ist der Lehrer / die Lehrerin.
    • unbestimmte Artikel: Das ist ein Lehrer.
    • negative Artikel: Das ist kein Lehrer.
    • Possessive Artikel: Das ist mein Lehrer.
  9. Das ist die Katze.
    • unbestimmte Artikel: Das ist eine Katze.
    • negative Artikel: Das ist keine Katze.
    • Possessive Artikel: Das ist meine Katze.
  10. Das ist der Kugelschreiber.
    • unbestimmte Artikel: Das ist ein Kugelschreiber.
    • negative Artikel: Das ist kein Kugelschreiber.
    • Possessive Artikel: Das ist mein Kugelschreiber.
  11. Das ist die Schokolade.
    • unbestimmte Artikel: Das ist eine Schokolade.
    • negative Artikel: Das ist keine Schokolade.
    • Possessive Artikel: Das ist meine Schokolade.
  12. Das ist das Mädchen.
    • unbestimmte Artikel: Das ist ein Mädchen.
    • negative Artikel: Das ist kein Mädchen.
    • Possessive Artikel: Das ist mein Mädchen.
  13. Das ist der Elefant.
    • unbestimmte Artikel: Das ist ein Elefant.
    • negative Artikel: Das ist kein Elefant.
    • Possessive Artikel: Das ist mein Elefant.

Note:

To ask someone what the article of a Nomen (noun) is, we use the following sentence.

  • Was ist der Artikel von ____?

To ask what a particular word means, or to show someone an object and ask what it is called, we use the following sentences.

  • Was bedeutet das? (and pointing to the object) or

  • Was bedeutet ____?

Die Artikel im Nominativ

9 minute read

Published:

Nomen

A noun is called a Nomen in German, and it is always written with a capital first letter. This might not be intuitive for an English speaker, but it is the rule. Also, in most cases a noun is written with its Artikel. The Artikel depends on the case: Nominativ, Akkusativ, or Dativ. And in German, based on those Artikel, we have different words for saying "not", "one", and "my". Here, the Artikel im Nominativ are explained:

Der, Die, Das

| maskulin | feminin | neutrum | plural |
| -------- | ------- | ------- | ------ |
| der      | die     | das     | (die)  |

1. Maskulin (der)

Suffixes:

-ant  -ast  -är  -ent  -et  -eur  -iker  -ismus  -ist  -ling  -loge  -ner  -oph  -ör  -tor

Examples:

  • der Tourismus
  • der Millionär
  • der Kontinent

Categories:

  • Days and Months
  • Seasons
  • Occupations
  • Alcohol (der Wein, der Tequila, …)

But! (das Bier)


2. Feminine (die)

Suffixes:

-ade  -heit  -keit  -je  -i  -in  -ine  -ion  -ive  -schaft  -sis  -thek  -unft  -ung  -ur

Examples:

  • die Schokolade
  • die Diskussion
  • die Gesundheit

3. Neutrum (das)

Suffixes:

-chen  -em  -et  -ett  -lein  -ma  -ment  -nis  -o  -sal  -tum  -um

Examples:

  • das Mädchen
  • das Aquarium
  • das Experiment

Examples

🚆 Group 1: Travel & Transport

📘 Vocabulary Table

| Singular (DE) | Plural (DE) | Meaning (EN) |
| ------------- | ----------- | ------------ |
| das Auto | die Autos | car |
| das Baby | die Babys | baby |
| der Bahnhof | die Bahnhöfe | train station |
| der Zug | die Züge | train |
| die Bahn | die Bahnen | railway |
| die Blume | die Blumen | flower |
| der Brief | die Briefe | letter (written message) |
| die Fahrkarte | die Fahrkarten | ticket |
| die Flasche | die Flaschen | bottle |
| das Foto | die Fotos | photo |
| das Gespräch | die Gespräche | conversation |
| die Reise | die Reisen | travel, trip |
| die Stadt | die Städte | city |
| das Dorf | die Dörfer | village |
| das Ziel | die Ziele | destination / goal |

🧳 Situational Sentences (Color-Coded)

| German (A1) | English |
| ----------- | ------- |
| Ich plane meine erste Reise nach München. | I plan my first trip to Munich. |
| München ist eine große Stadt und mein Ziel. | Munich is a big city and my destination. |
| Ich fahre mit dem Auto zum Bahnhof. | I drive by car to the train station. |
| Am Bahnhof kaufe ich eine Fahrkarte. | At the station, I buy a ticket. |
| Der Zug kommt pünktlich an. | The train arrives on time. |
| Im Zug sitzt eine Frau mit einem Baby. | On the train, a woman sits with a baby. |
| Ich höre ein interessantes Gespräch über München. | I hear an interesting conversation about Munich. |
| Ich trinke Wasser aus meiner Flasche. | I drink water from my bottle. |
| Durch das Fenster sehe ich ein kleines Dorf mit schönen Blumen. | Through the window, I see a small village with beautiful flowers. |
| Ich mache ein Foto von der Landschaft. | I take a photo of the landscape. |
| In München schreibe ich einen Brief an meine Familie über die schöne Reise. | In Munich, I write a letter to my family about the beautiful trip. |

👪 Group 2: Family & Relationships

📘 Vocabulary Table

| Singular (DE) | Plural (DE) | Meaning (EN) |
| ------------- | ----------- | ------------ |
| die Mutter | die Mütter | mother |
| der Vater | die Väter | father |
| die Großmutter | die Großmütter | grandmother |
| der Großvater | die Großväter | grandfather |
| der Sohn | die Söhne | son |
| die Tochter | die Töchter | daughter |
| der Junge | die Jungen | boy |
| das Mädchen | die Mädchen | girl |
| der Freund | die Freunde | (male) friend |
| die Freundin | die Freundinnen | (female) friend |
| die Eltern | - | parents |
| die Beziehung | die Beziehungen | relationship |
| der Kuss | die Küsse | kiss |
| der Mensch | die Menschen | person / people |
| die Grüße | - | greetings |

🧳 Situational Sentences (Color-Coded) - Meeting My Girlfriend’s Family

| German (A1) | English |
| ----------- | ------- |
| Ich besuche heute meine Freundin Anna und ihre Familie. | I visit my girlfriend Anna and her family today. |
| Zuerst treffe ich ihre Mutter und ihren Vater. | First, I meet her mother and her father. |
| Die Eltern sind sehr nett zu mir. | The parents are very nice to me. |
| Dann kommt die Großmutter und der Großvater. | Then the grandmother and grandfather come. |
| Sie sagen freundliche Grüße zu mir. | They say friendly greetings to me. |
| Anna hat einen kleinen Bruder - er ist ein Junge von 8 Jahren. | Anna has a little brother - he is a boy of 8 years. |
| Sie hat auch eine Schwester - ein Mädchen von 12 Jahren. | She also has a sister - a girl of 12 years. |
| Der Vater fragt mich über unsere Beziehung. | The father asks me about our relationship. |
| Ich sage: "Ich liebe Anna sehr!" | I say: "I love Anna very much!" |
| Anna gibt mir einen Kuss vor der Familie. | Anna gives me a kiss in front of the family. |
| Alle Menschen in der Familie sind glücklich und lächeln. | All people in the family are happy and smile. |

| 🇩🇪 Deutsch | 🇬🇧 English |
| ---------- | ---------- |
| Ich bin Tourist und reise nach Deutschland. | I am a tourist and travel to Germany. |
| In Berlin fühle ich mich krank und gehe zum Arzt. | In Berlin, I feel sick and go to the doctor. |
| Im Krankenhaus hilft mir ein Krankenpfleger. | At the hospital, a nurse helps me. |
| Der Arzt untersucht mich und sagt: „Alles ist gut!" | The doctor examines me and says: "Everything is fine!" |
| Danach esse ich in einem Restaurant. Dort arbeitet ein Koch. | After that, I eat in a restaurant. A cook works there. |
| Der Kellner bringt mir das Essen. | The waiter brings me the food. |
| Am nächsten Tag besuche ich eine Bibliothek und lerne neue Wörter. | The next day I visit a library and learn new words. |
| Ein Bibliothekar zeigt mir deutsche Bücher. | A librarian shows me German books. |

🏥 Group 4: Health, Safety & Emergencies - Emergency at the Airport

📘 Vocabulary Table

| Singular (DE) | Plural (DE) | Meaning (EN) |
| ------------- | ----------- | ------------ |
| der Unfall | die Unfälle | accident |
| der Arzt | die Ärzte | doctor (already in group) |
| der Krankenwagen | die Krankenwagen | ambulance |
| die Polizei | - | police (institution) |
| der Polizist | die Polizisten | policeman |

🧳 Sentences - Airport Emergency

| 🇩🇪 Deutsch | 🇬🇧 English |
| ---------- | ---------- |
| Ich bin am Flughafen und warte auf meinen Flug. | I am at the airport and wait for my flight. |
| Plötzlich passiert ein Unfall vor mir. | Suddenly, an accident happens in front of me. |
| Eine Frau fällt und verletzt sich. | A woman falls and injures herself. |
| Ich rufe sofort den Krankenwagen. | I call the ambulance immediately. |
| Der Arzt kommt und hilft der Frau. | The doctor comes and helps the woman. |
| Auch die Polizei kommt zum Flughafen. | The police also come to the airport. |
| Ein Polizist fragt mich, was ich gesehen habe. | A policeman asks me what I saw. |
| Ich beschreibe den Unfall und helfe gern. | I describe the accident and gladly help. |
| Alles ist bald wieder ruhig. | Everything is calm again soon. |

🎤 Group 5: Media & Communication - The Interview

📘 Vocabulary Table

| Singular (DE) | Plural (DE) | Meaning (EN) |
| ------------- | ----------- | ------------ |
| der Reporter | die Reporter | reporter |
| das Radio | die Radios | radio |
| das Gespräch | die Gespräche | conversation/interaction |
| das Wort | die Wörter | word |
| der Satz | die Sätze | sentence |
| das Foto | die Fotos | photo |
| der Brief | die Briefe | letter (mail) |
| der Text | die Texte | text |

🧳 Sentences - Radio Interview Story

🎤 Meine Geschichte – Das Interview (Ich-Form, A1–A2 Niveau)

| 🇩🇪 Deutsch | 🇬🇧 English |
| ---------- | ---------- |
| Ich gehe zum Radio-Sender. | I go to the radio station. |
| Ein Reporter wartet auf mich. | A reporter is waiting for me. |
| Wir haben ein Gespräch über meine Reise. | We have a conversation about my trip. |
| Ich spreche viele Wörter, aber ich mache auch Fehler. | I say many words, but I make mistakes too. |
| Der Reporter lacht und sagt: „Dein Deutsch ist gut!" | The reporter laughs and says: "Your German is good!" |
| Nach dem Interview bekomme ich ein Foto. | After the interview, I get a photo. |
| Ich schreibe später einen Brief an meine Familie. | I write a letter to my family later. |
| In dem Brief erzähle ich den ganzen Text vom Interview. | In the letter, I tell the whole text from the interview. |

📦 Group 6: Everyday Items & Miscellaneous - Moving Day

📘 Vocabulary Table

| Singular (DE) | Plural (DE) | Meaning (EN) |
| ------------- | ----------- | ------------ |
| die Flasche | die Flaschen | bottle |
| der Koffer | die Koffer | luggage/bag |
| die Tasche | die Taschen | bag / purse |
| der Stock | die Stockwerke | floor (story) |
| das Erdgeschoss | die Erdgeschosse | ground floor |
| die Toilette | die Toiletten | toilet |
| das Jahr | die Jahre | year |
| das Alter | die Alter | age |
| der Apfel | die Äpfel | apple |
| der Ballon | die Ballons | balloon |
| das Wetter | - | weather |
| die Zukunft | - | future |
| die Vergangenheit | - | past |
| die Gegenwart | - | present |

🧳 Sentences - Moving to a New Apartment

| 🇩🇪 Deutsch | 🇬🇧 English |
| ---------- | ---------- |
| Heute ist mein Umzugstag. | Today is my moving day. |
| Ich packe meine Koffer und Taschen. | I pack my suitcases and bags. |
| Im Erdgeschoss warte ich auf den Aufzug. | On the ground floor, I wait for the elevator. |
| Ich ziehe in den dritten Stock. | I move to the third floor. |
| Mein neues Zimmer ist hell und groß. | My new room is bright and big. |
| Ich stelle eine Flasche Wasser in die Toilette. | I put a bottle of water in the toilet room. |
| Ich esse einen Apfel und trinke Tee. | I eat an apple and drink tea. |
| Draußen fliegt ein Ballon im Wetter. | Outside, a balloon flies in the weather. |
| Ich denke an die Vergangenheit und freue mich auf die Zukunft. | I think about the past and look forward to the future. |
| Die Gegenwart ist auch schön. | The present is also beautiful. |

Note: It still needs to be checked whether the current stories are at A1 level or not.

Deutsch Day 7: Jemanden kennenlernen

less than 1 minute read

Published:

  1. Wie heißen Sie? (formal) / Wie heißt du? (informal)
    • Ich heiße Bishnu.
  2. Wie ist Ihr Familienname? (formal) / Wie ist dein Familienname? (informal)
    • Mein Familienname ist Khadka.
  3. Woher kommen Sie? (formal) / Woher kommst du? (informal)
    • Ich komme aus Nepal.
  4. Wo wohnen Sie? (formal) / Wo wohnst du? (informal)
    • Ich wohne in Budhanilkantha, Kathmandu.
  5. Wie alt sind Sie? (formal) / Wie alt bist du? (informal)
    • Ich bin 25 Jahre alt.
  6. Welche Sprachen sprechen Sie? (formal) / Welche Sprachen sprichst du? (informal)
    • Ich spreche Nepalesisch, Englisch und Deutsch.
  7. Was sind Sie von Beruf? (formal) / Was bist du von Beruf? (informal)
    • Ich bin Forschungsassistent von Beruf.
  8. Sind Sie verheiratet? (formal) / Bist du verheiratet? (informal)
    • Nein, ich bin ledig.
    • Nein, ich bin nicht verheiratet.
  9. Haben Sie Kinder? (formal) / Hast du Kinder? (informal)
    • Nein, ich habe keine Kinder.
  10. Was sind Ihre Hobbys? (formal) / Was sind deine Hobbys? (informal)
    • Meine Hobbys sind Bücher lesen und Fußball spielen.

Sich vorstellen

1 minute read

Published:

(INTRODUCTION)

Name

Ich heiße Bishnu.

Ich bin Bishnu.

Mein Name ist Khadka Bishnu.

First and Last Name

Mein Familienname ist Khadka.

Mein Vorname ist Bishnu.

Mein Nachname ist Khadka.

Land/Länder (Where are you from?)

Ich komme aus Nepal.

Ich wohne in Budhanilkantha, Kathmandu.

Sprache/Sprachen

Ich spreche Nepalesisch, Englisch, und Deutsch.

Age

Ich bin 25 Jahre alt.

Work

Ich bin Forschungsassistent von Beruf.

Married?

Ich bin ledig.

Ich bin nicht verheiratet.

Children?

Ich habe kein Kind.

Hobby?

Mein Hobby ist Bücher lesen.

Fragen und Antworten

  1. Wie heißen Sie? (formal) / Wie heißt du? (informal)
    • Ich heiße Bishnu.
  2. Wie ist Ihr Familienname? (formal) / Wie ist dein Familienname? (informal)
    • Mein Familienname ist Khadka.
  3. Woher kommen Sie? (formal) / Woher kommst du? (informal)
    • Ich komme aus Nepal.
  4. Wo wohnen Sie? (formal) / Wo wohnst du? (informal)
    • Ich wohne in Budhanilkantha, Kathmandu.
  5. Wie alt sind Sie? (formal) / Wie alt bist du? (informal)
    • Ich bin 25 Jahre alt.
  6. Welche Sprachen sprechen Sie? (formal) / Welche Sprachen sprichst du? (informal)
    • Ich spreche Nepalesisch, Englisch und Deutsch.
  7. Was sind Sie von Beruf? (formal) / Was bist du von Beruf? (informal)
    • Ich bin Forschungsassistent von Beruf.
  8. Sind Sie verheiratet? (formal) / Bist du verheiratet? (informal)
    • Nein, ich bin ledig.
    • Nein, ich bin nicht verheiratet.
  9. Haben Sie Kinder? (formal) / Hast du Kinder? (informal)
    • Nein, ich habe keine Kinder.
  10. Was sind Ihre Hobbys? (formal) / Was sind deine Hobbys? (informal)
    • Meine Hobbys sind Bücher lesen und Fußball spielen.

Wie geht’s

1 minute read

Published:

In Deutsch (German), there are two ways of talking:

  1. Formal (formell)
  2. Informal (informell)

😁 Positive

| formell (Formal) | informell (Informal) |
| ---------------- | -------------------- |
| Wie geht es Ihnen? Mir geht es ________. | Wie geht es dir? Mir geht es ________. |

Translation: How are you? I am ________.

Possible answers (same for both registers):

  • gut - good
  • sehr gut - very good
  • super - super
  • wunderbar - wonderful
  • ausgezeichnet - excellent
  • toll - great
  • prima - great

😞 Negative

| formell (Formal) | informell (Informal) |
| ---------------- | -------------------- |
| Wie geht es Ihnen? Mir geht es ________. | Wie geht es dir? Mir geht es ________. |

Translation: How are you? I am ________.

Possible answers (same for both registers):

  • schlecht - bad
  • sehr schlecht - very bad
  • nicht gut - not good
  • müde - tired
  • krank - sick
  • traurig - sad

And you?


| formell (Formal) | informell (Informal) |
| ---------------- | -------------------- |
| Und Ihnen? Mir geht es auch gut. | Und dir? Mir geht es auch gut. |

Example:

| formell (Formal) | informell (Informal) |
| ---------------- | -------------------- |
| Sita: Hallo, guten Morgen, Herr Dulal. | Sita: Hallo Shämlal, wie geht's? |
| Shämlal: Guten Morgen, Frau Shah... Wie geht es Ihnen? | Shämlal: Hallo Sita, mir geht es gut, und dir? |
| Sita: Mir geht es gut, danke. Und Ihnen? | Sita: Mir geht es auch gut... danke!! |
| Shämlal: Mir geht es auch gut... danke, danke!!! | |

Deutsch Day 4: Slangs

less than 1 minute read

Published:

Common Phrases

  • Danke - Thanks
  • Danke schön - Thank you so much
  • Vielen Dank - Thank you very much
  • Bitte - Please
  • Entschuldigung - Excuse me
  • Entschuldigen Sie - Excuse me (formal)
  • Es tut mir leid - I am sorry
  • Noch einmal, bitte - Once again, please
  • Einen Moment, bitte - Just a moment, please
  • Wiederholen Sie bitte! - Please repeat
  • Ich verstehe (nicht) - I (don't) understand
  • Ich weiß (nicht) - I (don't) know
  • Keine Ahnung - No idea
  • Kein Problem - No problem
  • Vielleicht - Maybe
  • Ja - Yes
  • Nein - No
  • Okay - Okay
  • Kaputt - Broken

Die Zahlen

2 minute read

Published:

Die Zahlen

0 to 12 in Deutsch

  • 0 = null
  • 1 = eins
  • 2 = zwei
  • 3 = drei
  • 4 = vier
  • 5 = fünf
  • 6 = sechs
  • 7 = sieben
  • 8 = acht
  • 9 = neun
  • 10 = zehn
  • 11 = elf
  • 12 = zwölf

13-19 in Deutsch

  • 13 = dreizehn
  • 14 = vierzehn
  • 15 = fünfzehn
  • 16 = sechzehn
  • 17 = siebzehn
  • 18 = achtzehn
  • 19 = neunzehn

For numbers after 20, you use "und" (and): first write the ones place, then "und", then the tens place.

For example, 21 = einundzwanzig (ein+und+zwanzig) (1 and 20)

20-29 in Deutsch

  • 20 = zwanzig
  • 21 = einundzwanzig (ein+und+zwanzig) (1+and+20)
  • 22 = zweiundzwanzig
  • 23 = dreiundzwanzig
  • 24 = vierundzwanzig
  • 25 = fünfundzwanzig
  • 26 = sechsundzwanzig
  • 27 = siebenundzwanzig
  • 28 = achtundzwanzig
  • 29 = neunundzwanzig

30-39 in Deutsch

  • 30 = dreißig
  • 31 = einunddreißig
  • 32 = zweiunddreißig
  • 33 = dreiunddreißig
  • 34 = vierunddreißig
  • 35 = fünfunddreißig
  • 36 = sechsunddreißig
  • 37 = siebenunddreißig
  • 38 = achtunddreißig
  • 39 = neununddreißig

40-49 in Deutsch

  • 40 = vierzig
  • 41 = einundvierzig
  • 42 = zweiundvierzig
  • 43 = dreiundvierzig
  • 44 = vierundvierzig
  • 45 = fünfundvierzig
  • 46 = sechsundvierzig
  • 47 = siebenundvierzig
  • 48 = achtundvierzig
  • 49 = neunundvierzig

50-59 in Deutsch

  • 50 = fünfzig
  • 51 = einundfünfzig
  • 52 = zweiundfünfzig
  • 53 = dreiundfünfzig
  • 54 = vierundfünfzig
  • 55 = fünfundfünfzig
  • 56 = sechsundfünfzig
  • 57 = siebenundfünfzig
  • 58 = achtundfünfzig
  • 59 = neunundfünfzig

60-69 in Deutsch

  • 60 = sechzig
  • 61 = einundsechzig
  • 62 = zweiundsechzig
  • 63 = dreiundsechzig
  • 64 = vierundsechzig
  • 65 = fünfundsechzig
  • 66 = sechsundsechzig
  • 67 = siebenundsechzig
  • 68 = achtundsechzig
  • 69 = neunundsechzig

70-79 in Deutsch

  • 70 = siebzig
  • 71 = einundsiebzig
  • 72 = zweiundsiebzig
  • 73 = dreiundsiebzig
  • 74 = vierundsiebzig
  • 75 = fünfundsiebzig
  • 76 = sechsundsiebzig
  • 77 = siebenundsiebzig
  • 78 = achtundsiebzig
  • 79 = neunundsiebzig

80-89 in Deutsch

  • 80 = achtzig
  • 81 = einundachtzig
  • 82 = zweiundachtzig
  • 83 = dreiundachtzig
  • 84 = vierundachtzig
  • 85 = fünfundachtzig
  • 86 = sechsundachtzig
  • 87 = siebenundachtzig
  • 88 = achtundachtzig
  • 89 = neunundachtzig

90-99 in Deutsch

  • 90 = neunzig
  • 91 = einundneunzig
  • 92 = zweiundneunzig
  • 93 = dreiundneunzig
  • 94 = vierundneunzig
  • 95 = fünfundneunzig
  • 96 = sechsundneunzig
  • 97 = siebenundneunzig
  • 98 = achtundneunzig
  • 99 = neunundneunzig

100 in Deutsch

  • 100 = hundert

1000 in Deutsch

  • 1000 = tausend

Other examples

Here, from an English-speaking perspective, larger numbers follow this pattern:

[thousands digit]tausend + [hundreds digit]hundert + [ones digit]und[tens word]

Note: They are written together as one word.

  • 6791 = sechstausendsiebenhunderteinundneunzig
  • 7082 = siebentausendzweiundachtzig
  • 2125 = zweitausendeinhundertfünfundzwanzig
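
For readers who like to see the rule as an algorithm, here is a small Python sketch (not part of the original lesson) that spells out numbers up to 9999 using the ones-before-tens pattern described above; treat it purely as an illustration.

```python
def german_number(n: int) -> str:
    """Spell out 0-9999 in German, illustrating the ones-before-tens rule."""
    ones = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]
    teens = {10: "zehn", 11: "elf", 12: "zwölf", 13: "dreizehn", 14: "vierzehn",
             15: "fünfzehn", 16: "sechzehn", 17: "siebzehn", 18: "achtzehn", 19: "neunzehn"}
    tens = {20: "zwanzig", 30: "dreißig", 40: "vierzig", 50: "fünfzig",
            60: "sechzig", 70: "siebzig", 80: "achtzig", 90: "neunzig"}

    def below_100(m: int) -> str:
        if m == 0:
            return ""
        if m == 1:
            return "eins"                  # standalone 1 is "eins"
        if m < 10:
            return ones[m]
        if m < 20:
            return teens[m]
        t, o = (m // 10) * 10, m % 10
        # ones place + "und" + tens place, e.g. 21 -> ein + und + zwanzig
        return tens[t] if o == 0 else ones[o] + "und" + tens[t]

    word = ""
    if n >= 1000:
        word += ones[n // 1000] + "tausend"
    if (n % 1000) >= 100:
        word += ones[(n % 1000) // 100] + "hundert"
    word += below_100(n % 100)
    return word or "null"

# The examples from above:
for x in (6791, 7082, 2125):
    print(x, "=", german_number(x))
```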

Deutsch Alphabets

1 minute read

Published:

German Alphabet with Nepali Pronunciation

| German Letter | Nepali उच्चारण (Pronunciation) |
| ------------- | ------------------------------ |
| a | आ (ā) |
| b | बे (be) |
| c | च्से (tse) |
| d | डे (ḍe) |
| e | ए (e) |
| f | एफ (ef) |
| g | गे (ge) |
| h | हाह (hāh) |
| i | ई (ī) |
| j | यट् (yaṭ) |
| k | का (kā) |
| l | एल (el) |
| m | एम (em) |
| n | एन (en) |
| o | ओ (o) |
| p | पे (pe) |
| q | कू (kū) |
| r | एर् (er) |
| s | एस (es) |
| t | टे (ṭe) |
| u | ऊ (ū) |
| v | फौ (phau) |
| w | वे (ve) |
| x | इक्‍स (iks) |
| y | युप्सिलोन (yupsilon) |
| z | त्सेट (tset) |

Umlaut Letters with Nepali Pronunciation

| German Letter | Nepali उच्चारण (Ucchāraṇ) | Notes |
| ------------- | ------------------------- | ----- |
| ä | ए (e) or ऐ (ai) | Similar to "e" in English "bed" |
| ö | ओए (oe) | Like "ö" in German "schön" |
| ü | यू (yu) | Like French "u" or "ü" |
| ß | एस्स (ess) | Sharp "s" sound, like "ss" |

Some rules for the pronunciation of words in Deutsch.

Rule 1: S before vowels

If S is followed by a vowel (a, e, i, o, u), it is pronounced with a voiced sound (like English "z", written here as ज).

  • Examples:
    • Sa → जा (ja)
    • Si → जी (jī)
    • Se → जे (je)
    • So → जो (jo)
    • Su → जू (jū)

Rule 2: for "ch" After Vowels

  • If ch is preceded by a, o, u, or au, it gives a ख (kha) sound.

  • Examples:

    • cha → ख (kha)
    • cho → खो (kho)
    • chu → खू (khū)
    • chau → खौ (khau)

Rule 3: for Vowel Combinations

  1. ei is pronounced like English "eye"
  2. ie is pronounced like English "ee"
  3. word-final -ig is pronounced like "ich"

Deutsch #1

1 minute read

Published:

Time of the day greetings

  • Guten Morgen (Good Morning) 🌄
  • Guten Tag (Good Afternoon) ☀️
  • Guten Abend (Good Evening) 🌆
  • Gute Nacht (Good Night) 🌙

Others

  • Tschüß/Ciao (bye) 👋
  • bis morgen (see you tomorrow)
  • Auf Wiedersehen (goodbye; literally "until we see each other again")
  • Auf Wiederhören (goodbye on the phone; literally "until we hear each other again")
  • bis bald (see you soon)
  • bis später (see you later, e.g., later the same day)

Common Phrases

  • Danke - Thanks
  • Danke schön - Thank you so much
  • Vielen Dank - Thank you very much
  • Bitte - Please
  • Entschuldigung - Excuse me
  • Entschuldigen Sie - Excuse me (formal)
  • Es tut mir leid - I am sorry
  • Noch einmal, bitte - Once again, please
  • Einen Moment, bitte - Just a moment, please
  • Wiederholen Sie bitte! - Please repeat
  • Ich verstehe (nicht) - I (don't) understand
  • Ich weiß (nicht) - I (don't) know
  • Keine Ahnung - No idea
  • Kein Problem - No problem
  • Vielleicht - Maybe
  • Ja - Yes
  • Nein - No
  • Okay - Okay
  • Kaputt - Broken
  • Ja klar - Of course
  • Alles klar - All clear / all good

Some frequently used words

  • viel - much
  • etwas - something
  • manchmal - sometimes
  • oft - often
  • natürlich - naturally
  • weil - because
  • hier - here
  • diese - this, these
  • Es gibt - there is / there are

Chapter 5 (Notes)

10 minute read

Published:

5.2.2. Regression Metrics

  • Regression refers to a predictive modeling problem that involves predicting a continuous (numeric) value rather than a class label.
  • It is fundamentally different from classification tasks, which involve discrete labels or categories.
  • Regression models are common in real-world tasks such as:
    • Estimating prices (houses, cars, electronics)
    • Predicting drug dosages based on patient characteristics
    • Forecasting transportation demand
    • Predicting sales trends using historical and market data
  • Unlike classification, where accuracy can directly evaluate performance, regression requires error-based metrics.
  • These metrics provide an error score summarizing how close the model’s predictions are to the actual values.
  • Understanding and interpreting these scores is crucial for developing robust and interpretable regression models.
  • There are four error metrics that are commonly used for evaluating and reporting the performance of a regression model; they are:
    • Mean Squared Error (MSE).
    • Root Mean Squared Error (RMSE).
    • Mean Absolute Error (MAE)
    • R-squared $(R^2)$

Mean Squared Error (MSE)

  • Mean Squared Error (MSE) is one of the most widely used metrics for evaluating the performance of regression models.
  • It measures the average of the squares of the errors—that is, the average squared difference between the actual and predicted values.
  • MSE is a loss function used in least squares regression and also serves as a performance metric.
  • It is the basis for least squares optimization, the core of many regression algorithms (e.g., Linear Regression).
  • Highlights large errors, making it ideal when penalizing significant mistakes is important.

MSE

  • Formula:

[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2]

  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
  • Characteristics:
    • Always non-negative (since errors are squared).
    • Units: Squared units of the target variable, making it less interpretable in its raw form.
    • Sensitive to outliers: Squaring errors disproportionately penalizes large mistakes.
    • Interpretation: Lower MSE values indicate better model performance.
  • Use MSE when?
    • You want to penalize larger errors more heavily.
    • You’re optimizing with algorithms that rely on gradient descent.
    • You are more interested in the overall performance than interpretability.
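
As a quick illustration (not from the original notes), MSE can be computed directly with NumPy; the `y_true`/`y_pred` arrays below are made-up placeholder values.

```python
import numpy as np

# Hypothetical actual and predicted values (placeholders for illustration)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

# MSE: average of squared differences between actual and predicted values
mse = np.mean((y_true - y_pred) ** 2)
print(f"MSE = {mse:.3f}")  # reported in squared units of the target
```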

Root Mean Squared Error (RMSE)

  • RMSE is the square root of the Mean Squared Error (MSE).
  • It provides a measure of the average magnitude of the prediction error, but unlike MSE, it is in the same unit as the original data.
  • RMSE gives a higher weight to large errors due to the squaring step in MSE.
  • Formula

[\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }]

  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
  • Here, squaring handles the magnitude, and the square root brings the unit back to original scale.
  • It is more interpretable than MSE because it’s in the same scale as the target.
  • It penalizes larger errors more heavily (like MSE).
  • Here, Lower RMSE indicates better model performance.
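
Continuing the same placeholder example, RMSE is simply the square root of MSE, which brings the error back to the target's original units:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

# RMSE: square root of the mean squared error (same units as the target)
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE = {rmse:.3f}")
```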

Mean Absolute Error (MAE)

  • MAE calculates the average absolute difference between predicted and actual values.
  • It is a linear score, meaning all errors are weighted equally in proportion to their size.
  • Formula
[\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|]
  • Where:
    • $n$ : Total number of data points
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
    • $\left| y_i - \hat{y}_i \right|$ : Absolute difference (always non-negative)
  • Easy to interpret and in the same unit as the original target.
  • Less sensitive to outliers than RMSE or MSE.
  • Good for general error analysis, especially when large errors aren’t critical.
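
A matching sketch for MAE, here using scikit-learn's `mean_absolute_error` on the same placeholder values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

# MAE: average absolute difference; every error is weighted linearly
mae = mean_absolute_error(y_true, y_pred)
print(f"MAE = {mae:.3f}")
```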

R-squared (Coefficient of Determination)

  • R-squared $(R^2)$ measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model.
  • In simpler terms, it tells us how well the model fits the data.
  • Formula

[R^2 = 1 - \frac{ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^{n} (y_i - \bar{y})^2 }]

  • Where,
    • $y_i$ : Actual (true) value of the $i^{th}$ data point
    • $\hat{y}_i$ : Predicted value of the $i^{th}$ data point
    • $\bar{y}$: Mean of actual values
    • Numerator: Sum of squared errors (residual sum of squares, RSS)
    • Denominator: Total sum of squares (TSS)
  • Here,

    | R-squared Value | Interpretation |
    | --------------- | -------------- |
    | 1 | Perfect prediction (all variance explained) |
    | 0 | Model does not explain any variability |
    | < 0 | Model performs worse than a horizontal line |
  • A higher R-squared generally means a better fit, but not always.
  • R-squared doesn’t indicate causality, and a high value may still be misleading if the model is overfitting or improperly specified.
  • It is a relative measure: it compares the model against a baseline that always predicts the mean.
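
And a minimal sketch for R-squared using scikit-learn's `r2_score`, again on placeholder values:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

# R^2 = 1 - RSS/TSS: fraction of variance in y_true explained by the predictions
r2 = r2_score(y_true, y_pred)
print(f"R^2 = {r2:.3f}")
```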

5.3. Model Validation Techniques

  • Model validation is the process of assessing how well a machine learning (ML) or artificial intelligence (AI) model performs — especially on unseen (new) data.
  • It ensures that the model:
    • Achieves its design objectives
    • Produces accurate and trustworthy predictions
    • Generalizes well beyond the training data
    • Complies with regulatory and quality standards
  • Purpose:
    • Evaluate Model Performance: Confirms that the model works not just on training data, but on real-world, unseen datasets.
    • Support Model Selection: Helps compare multiple models, choose the most appropriate one, and select optimal hyperparameters.
    • Prevent Overfitting: Assesses the risk of overfitting by observing how model performance changes on validation vs training data.
    • Build Trust: Ensures transparency and reliability, especially if validated by third-party or independent teams.
    • Comply with ML Governance: Contributes to AI governance by enforcing policies, monitoring model activity, and validating data/tooling quality.
  • Benefits:
    • Ensures Generalization
    • Guides Model Selection
    • Improves Accuracy and Robustness
    • Builds Regulatory Confidence
    • Prevents Future Failures
  • Without model validation, we risk deploying models that look good on paper but fail in practice.

Train-Test vs. K-Fold Cross Validation

5.3.1. Train-Test Split

  • The train-test split is the most basic method of model validation.
  • It involves splitting the dataset into two parts:
    • Training set: Used to train the model
    • Test set: Used to evaluate model performance on unseen data
  • Common ratios include:
    • 70% training / 30% testing
    • 80% training / 20% testing
    • 60% training / 40% testing (for very small datasets)
  • Advantages:
    • Simple and fast
    • Good for large datasets
    • Helps quickly estimate model performance
  • Disadvantages
    • High variance: Model evaluation may vary significantly based on how the data was split
    • Risk of under- or over-estimating model performance due to random splits
    • Not ideal for small datasets
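
A minimal train-test split sketch with scikit-learn; the synthetic dataset and the 80/20 ratio are placeholder choices for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

# 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out test set only
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```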

5.3.2. Cross Validation

  • Cross-validation is a more robust model validation technique where the model is trained and tested multiple times on different data subsets.
  • Helps to assess how the model generalizes to an independent dataset.

5.3.2.1. K-Fold Cross Validation

  • The dataset is split into K equally sized “folds”.
  • The model is trained on K−1 folds and tested on the remaining 1 fold.
  • This process is repeated K times, each time using a different fold as the test set.
  • The average performance over the K trials is used as the final evaluation metric.

[\text{CV}_{\text{score}} = \frac{1}{K} \sum_{i=1}^{K} \text{score}_i]

  • Where,
    • $K$: Number of folds
    • $\text{score}_i$: Evaluation metric (e.g., RMSE, MAE, R²) on the $i^{th}$ fold
  • Advantages
    • Reduces variance in performance estimation
    • Uses the entire dataset for both training and testing
    • More reliable for small to medium-sized datasets
  • Disadvantages:
    • Computationally expensive, especially for large datasets or complex models
    • May not work well if data is not independently and identically distributed (i.i.d.)
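
A sketch of 5-fold cross-validation with scikit-learn, where `cross_val_score` handles the fold splitting, training, and scoring loop; the model and dataset here are placeholders:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

# K = 5 folds; each fold serves once as the test set
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error")

# The average over the K folds is the final CV estimate
print("Mean CV MSE:", -scores.mean())
```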

5.4 Hyperparameter Tuning

  • Hyperparameter tuning (also called hyperparameter optimization) is the process of searching for the best set of hyperparameters that maximizes a machine learning model’s performance on a given task.
  • Unlike model parameters (which are learned from training data), hyperparameters are set before training and control the learning process itself.
  • It determines how well a model learns and generalizes to new data.
  • Poorly tuned hyperparameters can lead to:
    • Underfitting (high bias, poor performance)
    • Overfitting (high variance, poor generalization)
  • Good tuning results in:
    • Minimized loss
    • Improved accuracy
    • Robust performance
    • Balanced bias-variance tradeoff
  • It’s essential for real-world applications in healthcare, finance, autonomous driving, etc.
  • Common hyperparameters for a Neural Network:
    • Learning Rate (η): Controls step size in gradient descent.
      • High → fast but unstable
      • Low → stable but slow
    • Epochs: Number of passes over entire training dataset
    • Batch Size: Number of samples per gradient update
    • Hidden Layers / Neurons: Affects capacity and depth of the network
    • Activation Function: ReLU, Sigmoid, Tanh; adds non-linearity
    • Momentum: Helps accelerate training by smoothing gradients
    • Learning Rate Decay: Reduces learning rate over time
  • Objective: Minimize loss or maximize metric score
  • Process:
    1. Choose a set of hyperparameters
    2. Train the model
    3. Evaluate performance (e.g., via cross-validation)
    4. Repeat until best configuration is found
  • Mathematical Representation

[\theta^* = \arg\min_{\theta \in \Theta} \ \mathcal{L}(f_{\theta}(X_{\text{train}}), y_{\text{train}})]

where,

  • $\theta$ : Set of hyperparameters
  • $\mathcal{L}$: Loss function
  • $f_{\theta}$: Model with configuration $\theta$

Grid Search vs. Random Search

Grid Search (see the combined sketch after this list):

  • Exhaustively tests all combinations in a hyperparameter grid
  • Best for small search spaces
  • Steps:
    • Choose the model and its hyperparameters to tune.
    • Specify a set of possible values for each hyperparameter.
    • Build a grid containing all combinations.
    • Train and validate the model for each combination.
    • Select the combination that produces the best score (e.g., lowest MSE or highest accuracy).
  • Advantages:
    • Guaranteed to find optimal combo (if within grid)
    • Simple to understand
  • Disadvantages:
    • Computationally expensive
    • Doesn’t scale well with more parameters or wider ranges
  • When to use:
    • For small search spaces
    • When computational resources are not a constraint
    • When accuracy is critical and time is available
Random Search:

  • Randomly samples combinations from the search space
  • Efficient when not all hyperparameters are equally important
  • This process continues for a fixed number of iterations or until computational budget is exhausted.
  • Advantages:
    • Much faster than grid search
    • Can explore larger spaces with fewer evaluations
  • Disadvantages:
    • No guarantee of finding the absolute best configuration
    • Results may vary between runs (unless random seed is fixed)
    • Might miss promising combinations if not enough iterations
  • Steps:
    • Select model to tune
    • Define distributions for hyperparameters
    • Set number of iterations (e.g., 10–100)
    • Choose evaluation metric (e.g., accuracy, RMSE)
    • Randomly sample hyperparameter combinations
    • Train and validate model for each combination
    • Compare scores across all runs
    • Pick best combination
    • (Optional) Retrain model on full data
  • When to use:
    • When hyperparameter space is large
    • When computational resources or time are limited
    • When exact optimal values are not critical, but good performance is sufficient
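
The sketch below illustrates both approaches with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`; the Ridge model and the alpha grids are placeholder choices, not a recommendation:

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

# Grid search: every combination in the grid is evaluated with 5-fold CV
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_)

# Random search: a fixed number of samples drawn from a distribution
rand = RandomizedSearchCV(Ridge(), param_distributions={"alpha": uniform(0.001, 10.0)},
                          n_iter=20, cv=5, random_state=42)
rand.fit(X, y)
print("Random search best params:", rand.best_params_)
```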

GPTopic: Dynamic and Interactive Topic Representations

less than 1 minute read

Published:

Notes

Background to the paper

Thielmann 2020

  • Unsupervised document classification for imbalanced data sets poses a major challenge, as it requires a carefully curated dataset.
  • The authors propose an integration of web scraping, one-class Support Vector Machines (SVM), and Latent Dirichlet Allocation (LDA) topic modelling as a multi-step classification rule that circumvents manual labelling.

Weisser 2023

  • Topic modeling with methods like Latent Dirichlet Allocation (LDA, the most commonly used model) works satisfactorily for long texts, but it is challenging for short texts such as tweets.
  • The author compared the performance of LDA, Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data.
  • They found that GSDMM and GPM are better for sparse data.

Notes

  • EMBEDDING → DIMENSIONALITY REDUCTION → CLUSTERING → TOP WORDS (I-TF-IDF and COSINE SIMILARITY) → LLM (GPT-4 and GPT-3.5) for topic naming and description
  • Answering questions using LLM (ChatGPT + RAG Implementation)
  • Topic Modification
    • splitting using keywords.
    • splitting using k-means
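
A rough structural sketch of that pipeline using generic scikit-learn components (TF-IDF embeddings, truncated SVD, k-means, and per-cluster top words). The actual paper uses neural sentence embeddings and an LLM for naming and description, so this only shows the shape of the flow; the documents below are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "cats purr and nap in the sun",
    "dogs bark and fetch the ball",
    "kittens and cats love warm laps",
    "puppies and dogs chase the ball",
]

# 1) Embed documents (here: TF-IDF instead of neural sentence embeddings)
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# 2) Reduce dimensionality
X_red = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# 3) Cluster the reduced vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_red)

# 4) Top words per cluster (a crude stand-in for I-TF-IDF scoring)
terms = np.array(vec.get_feature_names_out())
for c in set(labels):
    cluster_mean = np.asarray(X[labels == c].mean(axis=0)).ravel()
    print(f"cluster {c}:", terms[cluster_mean.argsort()[::-1][:3]])

# 5) In GPTopic, an LLM would then name and describe each cluster from its top words.
```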

Chapter Notes: LLM Engineer’s Handbook - RAG Feature Pipeline

5 minute read

Published:

Book:

Chapter 4: RAG Feature Pipeline

  • Retrieval-augmented generation (RAG) Feature Pipeline: Qdrant vector DB for online serving and ZenML artifacts for offline training.
  • Naive RAG
    • Chunking
    • Embedding
    • Vector DBs
  • Chapter teaches you what RAG is and how to implement it.
  • The main sections of this chapter are:
    • Understanding RAG
    • An overview of advanced RAG
    • Exploring the LLM Twin’s RAG feature pipeline architecture
    • Implementing the LLM Twin’s RAG feature pipeline
  • A RAG system is composed of three main modules independent of each other:
    • Ingestion pipeline: A batch or streaming pipeline used to populate the vector DB
    • Retrieval pipeline: A module that queries the vector DB and retrieves relevant entries to the user’s input
    • Generation pipeline: The layer that uses the retrieved data to augment the prompt and an LLM to generate answers

Ingestion Pipeline

  • For the ingestion pipeline, first we need to collect the data.
  • This can come from DBs, APIs, or web pages, and depending on the source, the cleaning step might differ.
  • The cleaned data is then chunked (depending on the model's input size).
  • Then the chunks are embedded.
  • The chunked data, along with its metadata, is taken by the loading module.

So the flow for the ingestion pipeline is: Collect -> Clean -> Chunk -> Embed -> Load
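
A minimal, framework-agnostic sketch of that Collect -> Clean -> Chunk -> Embed -> Load flow; every function here (`clean`, `chunk`, `embed`) is a placeholder, not the book's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

def clean(raw: str) -> str:
    # Placeholder cleaning: collapse whitespace; real pipelines strip markup, dedupe, etc.
    return " ".join(raw.split())

def chunk(text: str, size: int = 200) -> list[str]:
    # Naive fixed-size character chunking; chunk size depends on the embedding model
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> list[list[float]]:
    # Placeholder embedding; in practice, call a sentence-embedding model here
    return [[float(len(t))] for t in texts]

def ingest(raw_docs: list[dict]) -> list[Chunk]:
    chunks: list[Chunk] = []
    for doc in raw_docs:                      # Collect
        cleaned = clean(doc["text"])          # Clean
        pieces = chunk(cleaned)               # Chunk
        vectors = embed(pieces)               # Embed
        for piece, vec in zip(pieces, vectors):
            chunks.append(Chunk(piece, {"source": doc["source"]}, vec))
    return chunks                             # Load: these would be written to the vector DB

print(len(ingest([{"source": "blog", "text": "some raw   text to ingest"}])))
```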

Retrieval Pipeline

  • A retrieval pipeline uses the user input to return the most similar chunks of data.
  • For this, the user input first needs to be translated into the same vector space as the chunks of data.
  • Then we use a distance function to get the 'K' nearest elements to it.
  • Those elements are used to augment the prompt.

Here, cosine distance is one of the most popular distance functions used for retrieval. But it is said that the right distance function depends on the data and the embedding model we have. How do we decide on the best distance function?
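
A minimal cosine-similarity top-K retrieval sketch over in-memory vectors (a real vector DB does this at scale with approximate indexes); the embedding dimension and arrays below are placeholders:

```python
import numpy as np

def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(sims)[::-1][:k]   # indices of the K most similar chunks

chunks = np.random.rand(100, 384)        # placeholder chunk embeddings
query = np.random.rand(384)              # placeholder query embedding
print(top_k(query, chunks, k=3))
```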

Generation Pipeline

  • The final prompt results from a system and prompt template populated with the user’s query and retrieved context. You might have a single prompt template or multiple prompt templates, depending on your application. Usually, all the prompt engineering is done at the prompt template level.
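
A sketch of the prompt-template idea; the template wording and function names are illustrative, not taken from the book:

```python
SYSTEM = "You answer questions using only the provided context."

TEMPLATE = """Context:
{context}

Question: {question}

Answer:"""

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Augment the prompt with the retrieved context before calling the LLM
    return TEMPLATE.format(context="\n\n".join(retrieved_chunks), question=question)

print(build_prompt("What is RAG?", ["RAG augments prompts with retrieved context."]))
```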

Critical aspects affecting the accuracy of RAGs:

  • Embedding used,
  • Similarity function used.

Embeddings

  • Algorithms for creating vector indexes: Random Projection, Hierarchical Navigable Small World (HNSW), Product Quantization (PQ), and Locality Sensitive Hashing (LSH).

The vanilla RAG framework we just presented doesn’t address many fundamental aspects that impact the quality of the retrieval and answer generation, such as:

  • Are the retrieved documents relevant to the user’s question?
  • Is the retrieved context enough to answer the user’s question?
  • Is there any redundant information that only adds noise to the augmented prompt?
  • Does the latency of the retrieval step match our requirements?
  • What do we do if we can’t generate a valid answer using the retrieved information?

Therefore, for RAG we need two things;

  • robust evaluation of retrieval
  • retrieval limitation should be addressed in the algorithm itself.

Advanced RAG

The vanilla RAG design can be optimized at three different stages:

  • Pre-retrieval
  • Retrieval
  • Post-retrieval

Pre-retrieval

most of the data indexing techniques focus on better preprocessing and structuring the data to improve retrieval efficiency, such as:

  • Sliding Window
  • Enhancing Data Granularity
  • Metadata
  • Optimizing index structures
  • Small-to-big

For query optimization,

  • Query routing
  • Query rewriting
    • Paraphrasing
    • Synonym substitution
    • Sub-queries
    • Hypothetical document embeddings (HyDE)
  • Query Expansion
    • Self-Query

Retrieval Pipeline Optimization

There are two ways

  • Improve the Embedding model
    • by fine-tuning the pre-trained model (very computationally costly, even financially)
    • using Instruction models (less costly)
  • Leveraging the DB’s filter and search features

Post-Retrieval Pipeline Optimization

  • Re-ranking
  • Prompt compression

Exploring LLM Twin’s RAG feature pipeline

To implement the RAG feature pipeline, we have two design choices:

| Batch Pipeline | Streaming Pipeline |
| -------------- | ------------------ |
| runs at regular intervals | runs continuously |
| simple | complex |
| when data processing is not critical | when it is critical |
| handles large data efficiently | handles single data points |

Core steps for RAG feature pipeline

  • Data Extraction
  • Cleaning
  • Chunking
  • Embedding
  • Data Loading: Embedding + Metadata + Chunks

Change data capture (CDC)

  • a strategy that allows you to optimally keep two or more data storage types in sync without computing and I/O overhead.
  • It captures any CRUD operation done on the source DB and replicates it on a target DB.
  • Optionally, you can add preprocessing steps in between the replication.

The CDC (Change Data Capture) pattern addresses these issues using two main strategies:

  • Push: The source DB actively sends changes to targets, enabling real-time updates. A messaging system buffers changes to prevent data loss if targets are unavailable.
  • Pull: The source DB logs changes, and targets fetch them periodically. This reduces source load but introduces delays; a messaging buffer ensures reliability.

The main CDC patterns that are used in the industry:

  • Time-stamp based: overhead to the source as we have to query the whole table/dataset.
  • Trigger based: same overhead.
  • Log-based: no overhead to the source system, however since logs are not standardized, we have to implement vendor-wise implementations.

Why is the data stored in two snapshots?

  • After the data is cleaned: For fine-tuning LLMs
  • After the documents are chunked and embedded: For RAG

Why is Triton gaining popularity?

1 minute read

Published:

References:

In a typical machine learning (ML) workflow, we program the feature production, training, and inference. We mostly use frameworks to write high-level programs without having to manage the low-level details required for ML or deep learning (DL). Frameworks like PyTorch or TensorFlow call CUDA if a GPU is available, and the operations are then performed on the GPU. DL models have achieved state-of-the-art (SOTA) performance in multiple domains due to their hierarchical structure of parametric as well as non-parametric layers. Therefore, CUDA has to decide how to perform the operations. Libraries like cuBLAS, cuDNN, or PyTorch's built-in kernels are highly optimized for common operations (matrix multiply, convolutions, etc.). But if our applications have specialized algorithms, unique data layouts, and non-standard precision or formats, then these stock kernels might not perform well. Therefore, you write custom CUDA programs for faster execution.

However, CUDA programming is very manual and tedious. It follows the principle of “scalar program, blocked threads”: we have to define what each individual thread does and manage it ourselves, which makes it a low-level programming method. Triton was developed to make specialized algorithms fast while making GPU programming less tedious and manual. Triton is a higher-level approach that follows the principle of “blocked program, scalar threads”: instead of managing each thread, we write a program for a whole block of data, and Triton handles the thread-level work based on the memory and data flow, choosing an efficient way to perform the given operation. This makes it faster for specialized use cases. (A minimal kernel sketch is shown below.)

Therefore, Triton has gained popularity and is helping researchers and developers with GPU kernel programming.
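
To see what “blocked program, scalar threads” looks like in practice, here is a minimal vector-addition kernel in the spirit of Triton’s introductory examples: each program instance handles one block of elements, and Triton decides how to map that onto threads.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the last partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

Note how the kernel is written per block (BLOCK_SIZE elements at a time) rather than per thread, which is the key difference from plain CUDA.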

Book Notes: LLM Engineer’s Handbook

7 minute read

Published:

Book:

Notes

  • An LLM engineer should have the knowledge in the following:
    • Data preparation
    • Fine-tune LLM
    • Inference Optimization
    • Product Deployment (MLOps)
  • What the book will teach:
    • Data Engineering
    • Supervised Fine-tuning
    • Model Evaluation
    • Inference Optimization
    • RAG
  • For every project there must be planning, and the three planning steps the book talks about are as follows:
    1. Understand the problem
      • What we want to build ?
      • Why are we building it?
    2. A Minimum Viable Product (MVP) reflecting a real-world scenario.
      • Bridge the gap between the ideal and the reality of what can be built.
      • What are the steps required to build it?
      • not clear on this part
    3. System Design step
      • Core architecture and design choices
      • How are we going to build it?
  • What the book covers:

Chapter 1: Understanding

  • The chapter covers the following topics:
    • Understanding the LLM Twin concept
    • Planning the MVP of the LLM Twin product.
    • Building ML systems with feature/training/inference pipelines
    • Designing the system architecture of the LLM Twin
  • The key to the LLM Twin lies in the following:
    • What data we collect
    • How we preprocess the data
    • How we feed the data into the LLM
    • How we chain multiple prompts for the desired results
    • How we evaluate the generated content
  • We have to consider how to do the following (MLOps):
    • Ingest, clean, and validate fresh data
    • Training versus inference setups
    • Compute and serve features in the right environment
    • Serve the model in a cost-effective way
    • Version, track, and share the datasets and models
    • Monitor your infrastructure and models
    • Deploy the model on a scalable infrastructure
    • Automate the deployments and training
  • In every software architecture, the layers are Database -> Business Logic -> UI, and any layer can be as complex as required. But what is the equivalent for ML? That is the FTI architecture: Feature -> Training -> Inference.

FTI Architecture

To conclude, the most important thing you must remember about the FTI pipelines is their interface:

  • The feature pipeline takes in data and outputs the features and labels saved to the feature store.
  • The training pipeline queries the features store for features and labels and outputs a model to the model registry.
  • The inference pipeline uses the features from the feature store and the model from the model registry to make predictions.
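
As a purely illustrative toy (not the book's code), the interface boundaries can be sketched with in-memory dicts standing in for the feature store and model registry:

feature_store, model_registry = {}, {}

def feature_pipeline(raw_data):
    # writes features and labels to the feature store only
    feature_store["features"] = [len(text) for text in raw_data]
    feature_store["labels"] = [text.startswith("a") for text in raw_data]

def training_pipeline():
    # reads from the feature store, writes a trivial stand-in "model" to the registry
    features = feature_store["features"]
    threshold = sum(features) / len(features)
    model_registry["model"] = lambda x: x > threshold

def inference_pipeline(feature_value):
    # pulls the model from the registry to serve a prediction
    return model_registry["model"](feature_value)

feature_pipeline(["apple", "banana", "avocado"])
training_pipeline()
print(inference_pipeline(6))

The point is that each pipeline only talks to the feature store and the model registry, never directly to another pipeline.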

Requirements of the ML system from a purely technical perspective:

  • Data
    • collect
    • standardize
    • clean the raw data
    • create instruct database for fine-tuning an LLM
    • chunk and embed the cleaned data. Store the vectorized data into a vector DB for RAG.
  • Training
    • Fine-tune LLMs of various sizes
    • Fine-tune on instruction datasets of multiple sizes.
    • Switch between LLM types
    • Track and compare experiments.
    • Test potential production LLM candidates before deploying them.
    • Automatically start the training when new instruction datasets are available.
  • Inference
    • A REST API interface for clients to interact with the LLM
    • Access to the vector DB in real time for RAG.
    • Inference with LLMs of various sizes
    • Autoscaling based on user requests
    • Automatically deploy the LLMs that pass the evaluation step
  • LLMOPs
    • Instruction dataset versioning, lineage, and reusability
    • Model versioning, lineage, and reusability
    • Experiment tracking
    • Continuous training, continuous integration, and continuous delivery (CT/CI/CD)
    • Prompt and system monitoring

LLM Twin high-level architecture

Chapter 2: Tooling and Installation

  • The chapter covers:
    • Python ecosystem and project installation
    • MLOps and LLMOps tooling
    • Databases for storing unstructured and vector data
    • Preparing for AWS
  • Any Python project needs three fundamental tools: the Python interpreter, dependency management, and a task execution tool.
  • Poetry is one of the most popular dependency and virtual environment managers within the Python ecosystem.
  • An orchestrator is a system that automates, schedules, and coordinates all your ML pipelines. It ensures that each pipeline—such as data ingestion, preprocessing, model training, and deployment—executes in the correct order and handles dependencies efficiently.
  • ZenML is one such orchestrator.
    • It orchestrates through pipelines and steps, which are just Python functions: steps are called inside pipeline functions, so the code should be written in a modular way (see the short sketch after this list).
    • ZenML turns every step output into an artifact.
    • An artifact is any file produced during the ML lifecycle.
  • Experiment Tracker:
    • Training ML models is an entirely iterative and experimental process. Therefore, an experiment tracker is required.
    • CometML is one that helps us in this aspect.
  • Prompt monitoring
    • You cannot use standard logging tools, as prompts are complex and unstructured chains.
    • Opik is a simple-to-use prompt monitoring tool compared to other prompt monitoring tools.
  • MongoDB, a NoSQL database.
  • Qdrant, a vector database.
  • For our MVP, AWS is the perfect option, as it provides robust features for everything we need, such as S3 (object storage), ECR (container registry), and SageMaker (compute for training and inference).
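
Here is a minimal ZenML sketch of the pipeline/step idea mentioned above, assuming a recent ZenML version where pipeline and step are importable from the top-level package; the step bodies are toy placeholders.

from zenml import pipeline, step

@step
def extract() -> list[str]:
    return ["raw post 1", "raw post 2"]

@step
def clean(docs: list[str]) -> list[str]:
    # every step output is tracked by ZenML as an artifact
    return [doc.strip().lower() for doc in docs]

@pipeline
def feature_pipeline():
    docs = extract()
    clean(docs)

if __name__ == "__main__":
    feature_pipeline()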

Chapter 3: Data Engineering

In this chapter, we will study the following topics:

  • Designing the LLM Twin’s data collection pipeline
  • Implementing the LLM Twin’s data collection pipeline
  • Gathering raw data into the data warehouse

An ETL pipeline involves three fundamental steps:

  • We extract data from various sources. We will crawl data from platforms like Medium, Substack, and GitHub to gather raw data.
  • We transform this data by cleaning and standardizing it into a consistent format suitable for storage and analysis.
  • We load the transformed data into a data warehouse or database.

Collect and curate the dataset

  • From raw data, Extract -> Transform -> Load into MongoDB. (ETL)
    • crawling
    • standardizing data
    • load into the data warehouse (a minimal sketch follows below)
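
A minimal, hedged ETL sketch (the URL handling, cleaning rule, and database/collection names are illustrative assumptions, not the book's crawler):

import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

def etl(url):
    # Extract: fetch the raw HTML
    html = requests.get(url, timeout=10).text
    # Transform: strip markup and normalize whitespace
    text = " ".join(BeautifulSoup(html, "html.parser").get_text().split())
    # Load: store the standardized document in the data warehouse (MongoDB)
    client = MongoClient("mongodb://localhost:27017")
    client["llm_twin"]["raw_documents"].insert_one({"url": url, "content": text})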

Chapter 5: Supervised Fine-Tuning

  • SFT refines the model’s capabilities (here, “model” refers to a pre-trained model that can predict new sequences) by learning to predict instruction-answer pairs.
  • It narrows the general language-understanding ability of pre-trained LLMs to a specific application, or, in this case, makes the model more conversational.
  • In this chapter, the authors cover the following topics:
    • Creating a high-quality instruction dataset
    • SFT techniques
    • Implementing fine-tuning in practice

Chapter 6: Fine-Tuning with Preference Alignment

  • SFT cannot capture a human’s preference for how a conversation should flow; therefore we use preference alignment, specifically Direct Preference Optimization (DPO).
  • Authors cover the following topics in this chapter:
    • Understanding preference datasets
    • How to create our own preference dataset
    • Direct preference optimization (DPO)
    • Implementing DPO in practice to align our model

Chapter 7: Evaluating LLMs

  • There is no unified approach to measuring a model’s performance, but there are patterns and recipes that we can adapt to specific use cases.
  • The chapter covers:
    • Model evaluation
    • RAG evaluation
    • Evaluating TwinLlama-3.1-8B

Chapter 8: Inference Optimization

  • Some tasks, like document generation, take hours, while others, like code completion, need to feel instantaneous; this is why inference optimization is important. The quantities optimized are latency (the time to generate the first token), throughput (the number of tokens generated per second), and the memory footprint of the LLM.
  • The chapter covers:
    • Model optimization strategies
    • Model parallelism
    • Model quantization

Chapter 9: RAG Inference Pipeline

  • Where the magic happens for the RAG system.
  • The chapter covers the following topics:
    • Understanding the LLM Twin’s RAG inference pipeline
    • Exploring the LLM Twin’s advanced RAG techniques
    • Implementing the LLM Twin’s RAG inference pipeline

Chapter 10: Inference Pipeline Deployment

  • The chapter covers:
    • Criteria for choosing deployment types
    • Understanding inference deployment types
    • Monolithic versus microservices architecture in model serving
    • Exploring the LLM Twin’s inference pipeline deployment strategy
    • Deploying the LLM Twin service
    • Autoscaling capabilities to handle spikes in usage

Chapter 11: MLOps and LLMOps

  • This chapter covers:
    • The path to LLMOps: Understanding its roots in DevOps and MLOps
    • Deploying the LLM Twin’s pipelines to the cloud
    • Adding LLMOps to the LLM Twin

Dynamic Arrays

2 minute read

Published:

Resources:

Dynamic Array

  • should be able to change the size of the array dynamically.
    • should be able to add/delete elements fast
    • should be able to insert/delete an element in the middle.
  • Since we need to make this as efficient as possible, let’s think about what we would have done if we had to invent it ourselves.

  • First, we take the functionality of it and try to simplify it as much as possible.
  • Here, let’s take only the ‘adding dynamically’ part.

Adding Dynamically

  • Say we have a fixed array of 4 elements. How can we make it possible to add an element to it?
  • Here, we know that we need to declare the fixed amount of space required for our task beforehand in order to use memory (refer to how memory works).

Alternative #1: Make an array of 5 elements, then copy all the data into the new array.

  • Using this, we can make a dynamic array. However, it is very expensive for large amounts of data.
  • For example, growing an array one element at a time up to a length of 1M requires on the order of $N^2/2 \approx 5 \times 10^{11}$ (hundreds of billions of) copy operations in total.
  • Let’s assume we keep adding elements to the array. For the 5th element we need about 5 copy operations; for the 6th, we create a new array of size 6 and copy 6 elements, so the running total is 5 + 6; for the 7th it is 5 + 6 + 7; and so on. In big O notation this is $O(N^2)$.

Alternative #2: Make a new array of size equal to the current size + 8 (say).

  • This reduces the number of copy operations a lot; however, it is still $O(N^2)$.

Alternative #3: Make the new array double the size of the current array.

  • Here, the total number of copy operations needed to grow an array to size N is about N, so the overall complexity is $O(N)$ (see the short derivation below).
  • This is a very cool problem, so if you are math savvy, take out a piece of paper and do the math; it is quite fun to work out why this is $O(N)$.
  • This is how programming languages implement dynamic arrays.
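
A quick way to see why: with doubling, the resizes copy 1, 2, 4, … elements, and the last resize before reaching size $N$ copies about $N/2$ of them, so in total

$1 + 2 + 4 + \dots + N/2 = N - 1 = O(N)$

copy operations are needed, i.e. amortized $O(1)$ per append.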

For deletion, if the number of filled elements drops below half of the array’s size, we reduce the size of the array by half. This means our memory usage is also optimized.

Similarly, in this way we can also perform insertions and deletions at the middle or front of the array efficiently. A minimal sketch of such a dynamic array is shown below.
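
Here is a minimal Python sketch of a dynamic array using the doubling strategy described above; the starting capacity and the shrink rule follow the text and are otherwise arbitrary choices.

class DynamicArray:
    def __init__(self):
        self._capacity = 4
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, value):
        if self._size == self._capacity:
            self._resize(self._capacity * 2)        # double when full
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        self._size -= 1
        value = self._data[self._size]
        # shrink when at most half full, so memory usage stays proportional
        if 0 < self._size <= self._capacity // 2 and self._capacity > 4:
            self._resize(self._capacity // 2)
        return value

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self._size):                 # O(N) copy, amortized O(1) per append
            new_data[i] = self._data[i]
        self._data, self._capacity = new_data, new_capacity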

Youtube: Everything I Learned at Stanford Business School in 28 Minutes

2 minute read

Published:

Links:

Notes

1. Stanford MBA Module #1 : Business Strategy

  • The foundation of a successful company, and the most important thing for a business, is its business strategy. Business strategy is the game plan you employ to create a successful business.
  • The way to learn how to create a great business plan is to study the business plans of multi-billion-dollar companies.
  • And the most efficient way to study other companies’ business plans is to use Porter’s Five Forces.

1.1. Porter’s Five Forces

  • The five forces are:
    1. Competition: How competitive is your market?
    2. Substitutes: How many substitutes are there in the market?
    3. New Entry: How susceptible is your product to new entrants?
    4. Buyer's Power: How much power do buyers have over you?
    5. Supply Chain: How much of the supply chain do you control?
  • To understand this, John Ha, the maker of the video takes the example of Apple.
  • Apple has huge competition in the phone, laptop, and PC markets, among others, from the likes of Samsung, Microsoft, etc. Additionally, there are many substitutes for Apple products. From the first two forces alone, we would expect Apple to have a hard time. However, they do not; in fact, they thrive in this environment. But why?
  • The answer is their ecosystem lock-in. Once you are in the Apple ecosystem, it is very hard to get out of it. Therefore, even when a new product is launched, the tendency of an Apple user to switch to a different company is pretty low. This means that Apple is not very susceptible to new entrants. It does not matter whether your first Apple device is an iPhone, a MacBook, or an iPad; it is most likely the first of many Apple products you will own, and that is because of the ecosystem Apple has built.
  • In addition, the ecosystem itself pushes people toward Apple. For example, if there is an iMessage group with an Android user in it, messages to that person are sent as paid texts instead of over Wi-Fi, so there is social pressure to buy Apple. And once you have bought one Apple product, their superior inter-device functionality means it will most likely not be your last. So buyers’ power over Apple is low.
  • Apple also has a really good supply chain over which it has strong control. They tend to make most of the components they use, so prices can be controlled. And if another company tries to compete, Apple’s superior supply chain makes it really hard for that company to do so.
  • These are the reasons why Apple became the first trillion-dollar public company in the US (PetroChina had briefly reached that valuation earlier).

A Survey on DeepTabular Learning

less than 1 minute read

Published:

My initial thoughts and what I would like to get out of this?

  1. Can DL methods outperform decision-tree-based models on tabular data?
  2. What do the different models that work well on tabular data have in common?
  3. In terms of training time, do DL methods stand a chance against decision-tree-based models?

Notes

Abstract

The StatQuest Illustrated Guide to Machine Learning!!!

5 minute read

Published:

Chapter 1 : Fundamental Concepts of ML

What is Machine Learning (ML)?

  • According to the author, ML is a collection of tools and techniques that transforms data into decisions.
  • Basically, ML is about 2 things:
    1. Classifying things (Classification) and
    2. Quantifying Predictions (Regression).
  • Comparing ML Methods:
    • To choose which method to use for your application, we can compare the predictions of each method/model with the actual outcomes. This is called model evaluation, and the metrics used are called evaluation metrics.
    • For this, we first fit the model to the training data.
    • Then make predictions based on the trained model.
    • Then we evaluate the predictions made on test set with the actual outcome.
    • We can do this for different model/methods and based on the evaluation metrics we can select a suitable method for our application.
    • Here, just because a machine learning method fits the training data well, it doesn’t mean it will perform well on the testing data.
    • Fits the training data well but makes poor predictions = overfitting.
    • Doesn’t fit the training data well = underfitting.
  • Independent and Dependent Variables
    • variable: a value that varies from record to record.
    • Say we have two variables, ‘height’ and ‘weight’, and that predicting height depends on a person’s weight. Then ‘height’ is the dependent variable and ‘weight’ is an independent variable, since it is used to predict the dependent variable.
    • Here, the independent variables are also called features.
  • Discrete and Continuous Data
    • discrete data: countable values; it only takes specific values.
    • continuous data: measurable values that can take any value within a given range.

Chapter 2: Cross Validation

  • From Chapter 1 we learned that we train the model on the ‘train set’ and evaluate it on the ‘test set’.
  • But how do we decide which data points go into the ‘test set’ and which into the ‘train set’?
  • The answer is cross validation.
  • Say you have 10 data points and we choose an 80/20 train-test split. This means we randomly assign 8 points to the train set and the remaining 2 to the test set. The 2 points chosen will not be reused in the test set in the next iteration of cross validation, so for the second iteration we choose another 2 data points for the test set and the remaining ones for the train set. We can do this 5 times, since we have 10 data points in total and an 80/20 split.
  • Therefore, cross validation is a way of solving the problem of not knowing which points are best for testing, by using all of them in an iterative way.
  • You can also think of it as making 5 groups and, each time, using one group as the ‘test set’ and the rest as the ‘train set’.
  • The iterations/groups are also called folds. Therefore, this is an example of 5-fold cross validation.
  • But why can’t we use all the data as the ‘train set’?
    • Because the only way to determine whether a model has overfit is to evaluate it on new data.
    • Reusing the same data points for training and testing is called data leakage.
  • The main advantage of cross-validation is that it gives a proper measure of how well a model performs, instead of relying on a single, chance train-test split. If the test set happens to be easy by chance, the model will look better than it actually is.
  • When we have a lot of data, 10-Fold Cross Validation is commonly used.
  • Another commonly used cross validation is Leave-One-Out.
    • use all but one point for training, and the remaining point for testing.
    • iterate until every single point has been tested.
    • we usually use this for small datasets.
  • Sometimes one model performs better in some iterations and another model performs better in others. In such cases we use statistics to decide which model is better.
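
For reference, here is a small 5-fold cross-validation example with scikit-learn; the dataset and the two models are arbitrary illustrative choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for model in (DecisionTreeClassifier(), LogisticRegression(max_iter=1000)):
    scores = cross_val_score(model, X, y, cv=5)   # 5 folds -> 5 accuracy scores
    print(type(model).__name__, scores.mean())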

Chapter 3: Fundamental Concepts in Statistics!!!

  • Main Idea of Statistics:
    • Statistics provides us with a set of tools to quantify the variation that we find in everyday life.
    • For example, the number of fries you get in a bucket is not always the same. But say that we track it. Then, using statistics, we can predict how many fries we will get tomorrow, and we can also determine how confident we can be in that prediction.
    • Here, say that you predict a positive result but are not confident in it; then you will look for an alternative approach.
    • We know that to make a prediction, we need to understand the trend of the data.
      • A histogram is a good way of visualizing the trend of the data.
        • divide the range into a number of bins.
        • and stack the elements based on how many of them fall into each bin.
        • Here, the main question to think about when making a histogram is how many bins to use.
        • A Naive Bayes algorithm makes predictions using histograms.
        • Calculating probability:
          • The probability of something occurring is the number of occurrences divided by the total number of observations made.
          • The more observations we have, the more confident we can be in our predictions.
          • But we know that collecting more samples is expensive, both monetarily and time-wise.
          • We can solve this problem using a probability distribution (see the small example after this list).
  • Probability Distribution
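
A tiny example of the histogram-based probability estimate described above; the fry counts are made-up numbers for illustration.

import numpy as np

fries = np.array([19, 21, 20, 22, 18, 20, 21, 19, 20, 23])
counts, bin_edges = np.histogram(fries, bins=3)
# P(a count falls in a bin) = occurrences in that bin / total observations
probabilities = counts / counts.sum()
print(bin_edges, probabilities)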

Why do tree-based models still outperform deep learning tabular data?

6 minute read

Published:

My initial thoughts and what I would like to get out of this?

  • What makes it difficult for learning tabular-data for deep-learning algorithms or even Neural Networks?
  • What is the correct way for benchmarking?

Notes on the paper

  • There are benchmarks for deep learning; however, there are not many for tabular data.
  • The superiority of GBTs over NNs is explained by specific features of tabular data:
    • irregular patterns in the target function,
    • uninformative features, and
    • non rotationally-invariant data where linear combinations of features misrepresent the information.
  • the paper defines
    • a standard set of 45 datasets from varied domains with clear characteristics of tabular data and
    • a benchmarking methodology accounting for
      • fitting models and
      • finding good hyperparameters.
  • Results show that tree-based models remain SOTA on medium-sized data (~10K samples).
  • Inductive biases that lead to better performance for tree-based models for tabular data
    1. NNs are biased to overly smooth solutions
      • To test the effectiveness of learning smooth functions, the authors smoothed the train set using a Gaussian kernel.
      • Smoothing degrades the performance of the decision trees but does not affect the performance of the NNs.
      • NNs are biased towards low-frequency functions. What does this mean?
      • Regularization and careful optimization do help NNs learn irregular patterns/functions.
      • Periodic embeddings or PLE might help learn the high-frequency part of the target function.
      • This also explains why the ExU activation is better for tabular deep learning.
      • Now the question is: why do NNs fail to learn irregular patterns? And why does PLE help NNs learn better? Does it make the train set more regular?
    2. Uninformative features affect MLP-like NNs more.
      • MLP-like structures have a harder time filtering out uninformative features compared to GBTs (see the sketch after this list for a quick way to probe this).
      • What does “uninformative features” mean?
        • Features that do not provide meaningful or useful information to help make predictions or gain insights from the data.
      • For GBTs, even if we remove half of the features (informative as well as uninformative), the performance does not degrade much.
      • However, for NNs (ResNet, FT-Transformer), removing features negatively affects the performance of the model.
        • Therefore, they are less robust to uninformative features.
      • The inherent rotational invariance of MLP-like structures prevents them from easily isolating and ignoring uninformative features when features are mixed through linear transformations.
      • However, the weak learners in GBTs recursively partition the feature space by splitting on individual feature values. They are therefore not rotationally invariant and can easily filter out uninformative features.
      • The FT-Transformer requires an embedding layer because of its attention mechanism, and this embedding maps the features into a different embedding space, breaking the rotational-invariance bias of an MLP-like architecture.
    3. Data are non invariant by rotation, so should be learning procedures.
      • Why are MLPs much more hindered by uninformative features, compared to other models?
        • Randomly rotating the data does not lead to much difference in the performance of ResNet, leads to a slight degradation in FT-Transformer, but hugely disrupts the performance of GBTs.
        • This suggests that rotation invariance is not desirable, similarly to vision [Krizhevsky et al., 2012].
        • We note that a promising avenue for further research would be to find other ways to break rotation invariance which might be less computationally costly than embeddings.
  • Challenges to build tabular-specific NNs as per the authors
    1. be robust to uninformative features,

    2. preserve the orientation of the data, and

    3. be able to easily learn irregular functions.

  • Deep learning architectures have been crafted to create inductive biases that match the invariances and spatial dependencies of the data.
    • This means the model’s inherent assumptions about the structure of the input data are matched to the data’s invariances (for example, a CNN has translational invariance, i.e., it will detect an object wherever it is placed; there are other factors, but for now consider only objects smaller than the CNN’s window).
  • Benchmark
    • the code provides 45 datasets split across different settings
      • medium/large
      • with/without categorical features
    • accounts for hyperparameter tuning cost.
      • But how does it account for it?
      • Hyperparameter tuning leads to uncontrolled variance on a benchmark [Bouthillier et al., 2021], especially with a small budget of model evaluations.
        • Different hyperparameter configurations give different scores: a model might achieve its best score on the 3rd trial, or it might not reach it until the 300th. Since we do not know in advance which configuration yields the best result, there is uncontrolled variance in the results.
        • The authors design a benchmarking procedure that jointly samples the variance of hyperparameter tuning and explores increasingly large budgets of model evaluations.
    • choosing dataset.
      • what is “inter-sample” attention
    • preprocessing dataset?
  • Raw comparison of DL and tree based models.
  • Explanations of why Tree-based models work better than NNs.
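
A rough way to probe the uninformative-features claim at home (this is not the paper's protocol; the dataset, the 50 noise columns, and the model settings are arbitrary, and results will vary):

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = fetch_california_housing(return_X_y=True)
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 50))])   # 50 pure-noise columns

models = {
    "GBT": HistGradientBoostingRegressor(),
    "MLP": make_pipeline(StandardScaler(), MLPRegressor(max_iter=500)),
}
for name, model in models.items():
    clean = cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    noisy = cross_val_score(model, X_noisy, y, cv=3, scoring="r2").mean()
    print(f"{name}: R2 without noise={clean:.3f}, with noise={noisy:.3f}")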

BENCHMARKING

  • test

Questions that arose while reading the paper

  • What are the characteristics that led the authors to select a particular dataset?
    • The characteristics are as follows:
      • Heterogeneous columns: columns should correspond to features of a different nature, not signals or images.
      • Not high dimensional: Only dataset with a d/n ratio below 1/10. Note: I am not sure what d/n means in this context.
      • Dataset cannot have too little information.
      • Dataset cannot be stream-like or time series.
      • Should be real-world data.
      • Dataset cannot have features<4 or samples<3000.
      • Dataset cannot be too easy.
        • The authors use a different scoring system, but does it account for the different Bayes rates of different datasets? That is the real question.
        • They remove a dataset if a default Logistic Regression (or Linear Regression for regression tasks) reaches a score whose relative difference with the scores of both a default ResNet (from Gorishniy et al., 2021) and a default HistGradientBoosting model (from scikit-learn) is below 5%.
      • Datasets should be non-deterministic. That means removing datasets where the target is a deterministic function of the data.
    • The benchmark is built to make the learning tasks as homogeneous as possible. Therefore, challenges of tabular data that require a separate analysis have been omitted. (Here, the question is: are they omitted from the analysis or from the benchmark?)
      • Only medium-sized training sets are used for the analysis.
      • Remove all the missing data.
      • Balanced classes.
      • Categorical features with more than 20 items are removed.
      • Numerical features with fewer than 10 unique values are also removed.
  • Does tree-based models still remain SOTA for small- and large-sized data?

  • How do you account for

  • How can a model be robust to uninformative features?

  • Can NN models preserve the orientation of the data?

  • Can NNs learn irregular functions?

TabR

3 minute read

Published:

This article introduces the TabR model, a retrieval-augmented model designed for tabular data. It is part of a series on tabular deep learning using the Mambular library, which started with an introduction to using an MLP for these tasks.

Architecture Overview

TabR is a retrieval-augmented tabular deep learning method that leverages context from the rest of the dataset/database to enrich the representation of the target object, producing more accurate and up-to-date responses. It uses related data points to enhance the prediction. The TabR model consists of three main components: the encoder module, the retrieval module, and the predictor module. The architecture of the TabR model is illustrated in the figure below:

tabR-architecture

The model is a feed-forward network with a retrieval component located in the residual branch. First, a target object and its candidates for retrieval are encoded with the same encoder E. Then, the retrieval module R enriches the target object’s representation by retrieving and processing relevant objects from the candidates. Finally, predictor P makes a prediction. The bold path highlights the structure of the feed-forward retrieval-free model before the addition of the retrieval module R.

Model Fitting

Now that we have outlined the TabR model, let’s move on to model fitting. The dataset and packages are publicly available, so everything can be copied and run locally or in a Google Colab notebook, provided the necessary packages are installed. We will start by installing the mambular package, loading the dataset, and fitting TabR. Subsequently, we will compare these results with those obtained in earlier articles of this series.

Install Mambular

pip install mambular
pip install delu
pip install faiss-cpu # faiss-gpu for gpu

Prepare the Data

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
# Load California Housing dataset
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
# Drop NAs
X = X.dropna()
y = y[X.index]
# Standard normalize the target
y = StandardScaler().fit_transform(y.values.reshape(-1, 1)).ravel()
# Train-test-validation split
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.5, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

Train TabR with Mambular

from mambular.models import TabRRegressor
model = TabRRegressor()
model.fit(X_train, y_train, max_epochs=200)
preds = model.predict(X_test)
model.evaluate(X_test, y_test)
Mean Squared Error on Test Set:  0.1877

Compared to MLP-PLR and MLP-PLE, the performance is comparable. However, compared to XGBoost, it is not a good fit. Let’s try TabR with PLE as the numerical preprocessing, as already used in the FT Transformer article.

model = TabRRegressor(
                  numerical_preprocessing='ple' 
                  )
model.fit(X_train, y_train, max_epochs=200)
preds = model.predict(X_test)
model.evaluate(X_test, y_test)
Mean Squared Error on Test Set: 0.18666

Compared to XGBoost, this approach still does not seem to be a good fit. Let’s try TabR with PLR embeddings, as already used in the MLP article.

model = TabRRegressor(use_embeddings=True, 
                  embedding_type='plr', 
                  numerical_preprocessing='standardization' 
                  )
model.fit(X_train, y_train, max_epochs=200)
preds = model.predict(X_test)
model.evaluate(X_test, y_test)
Mean Squared Error on Test Set:  0.1877

Again, compared to XGBoost, this approach does not seem to be a good fit. Therefore, let’s try an alternative numerical preprocessing method: MinMax scaling.

model = TabRRegressor(use_embeddings=True, 
                  embedding_type='plr', 
                  numerical_preprocessing='minmax' 
                  )
model.fit(X_train, y_train, max_epochs=200)
preds = model.predict(X_test)
result=model.evaluate(X_test, y_test)

Mean Squared Error on Test Set:  0.1573

The Mean Squared Error (MSE) on the test set is 0.1573, making this our best-performing approach to date, outperforming the other deep learning models as well as tree-based methods like XGBoost.

Below, we have summarized the results from all articles so far. Try playing around with more parameters to improve performance even further. Throughout this series, we will add the results of each introduced method to this table:

Results

portfolio

publications

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.