Book Notes: LLM Engineer’s Handbook

7 minute read

Published: June 23, 2025

Book:

Amazon link

Notes

An LLM engineer should be knowledgeable in the following areas:
- Data preparation
- Fine-tuning LLMs
- Inference Optimization
- Product Deployment (MLOps)
What the book will teach:
- Data Engineering
- Supervised Fine-tuning
- Model Evaluation
- Inference Optimization
- RAG
Every project needs planning. The book describes three planning steps:
1. Understand the problem
  - What do we want to build?
  - Why are we building it?
2. Define a minimum viable product (MVP) that reflects a real-world scenario.
  - Bridge the gap between the ideal and what can realistically be built.
  - What steps are required to build it?
  - not clear on this part
3. System design
  - Core architecture and design choices
  - How are we going to build it?
What the book covers:

Chapter 1: Understanding

The chapter covers the following topics:
- Understanding the LLM Twin concept
- Planning the MVP of the LLM Twin product.
- Building ML systems with feature/training/inference pipelines
- Designing the system architecture of the LLM Twin
The key to the LLM Twin lies in the following:
- What data we collect
- How we preprocess the data
- How we feed the data into the LLM
- How we chain multiple prompts for the desired results
- How we evaluate the generated content
We have to consider how to do the following (MLOps):
- Ingest, clean, and validate fresh data
- Training versus inference setups
- Compute and serve features in the right environment
- Serve the model in a cost-effective way
- Version, track, and share the datasets and models
- Monitor your infrastructure and models
- Deploy the model on a scalable infrastructure
- Automate the deployments and training
Most software architectures follow a three-layer pattern: Database -> Business Logic -> UI, and any layer can be as complex as required. What is the equivalent for ML? That is the FTI architecture: Feature -> Training -> Inference.

FTI Architecture

To conclude, the most important thing you must remember about the FTI pipelines is their interface:

- The feature pipeline takes in data and outputs the features and labels saved to the feature store.
- The training pipeline queries the feature store for features and labels and outputs a model to the model registry.
- The inference pipeline uses the features from the feature store and the model from the model registry to make predictions.
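
The three interfaces above can be sketched in plain Python. This is only a toy illustration, not the book's implementation: `FeatureStore` and `ModelRegistry` are hypothetical in-memory stand-ins, and the "model" is a single slope fitted by least squares.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for a feature store."""
    rows: list = field(default_factory=list)  # each row is (features, label)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a model registry."""
    models: dict = field(default_factory=dict)


def feature_pipeline(raw_data, store):
    # Takes in raw data and saves features and labels to the feature store.
    for x, y in raw_data:
        store.rows.append(([float(x)], float(y)))


def training_pipeline(store, registry, name="toy-model"):
    # Queries the feature store and writes a model to the model registry.
    # The toy model is one slope fitted by least squares through the origin.
    num = sum(features[0] * label for features, label in store.rows)
    den = sum(features[0] ** 2 for features, _ in store.rows)
    registry.models[name] = num / den


def inference_pipeline(features, registry, name="toy-model"):
    # Combines features with the registered model to make a prediction.
    # In a real system the features would be read from the feature store.
    return registry.models[name] * features[0]


store, registry = FeatureStore(), ModelRegistry()
feature_pipeline([(1, 2), (2, 4), (3, 6)], store)
training_pipeline(store, registry)
prediction = inference_pipeline([4.0], registry)
print(prediction)
```

Because each pipeline only touches the store or the registry, any one of them could be rewritten or scaled independently as long as its interface stays the same.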

Requirements of the ML system from a purely technical perspective:

Data
- collect
- standardize
- clean the raw data
- create an instruction dataset for fine-tuning an LLM
- chunk and embed the cleaned data. Store the vectorized data into a vector DB for RAG.
Training
- Fine-tune LLMs of various sizes
- Fine-tune on instruction datasets of multiple sizes.
- Switch between LLM types
- Track and compare experiments.
- Test potential production LLM candidates before deploying them.
- Automatically start the training when new instruction datasets are available.
Inference
- A REST API interface for clients to interact with the LLM
- Access to the vector DB in real time for RAG.
- Inference with LLMs of various sizes
- Autoscaling based on user requests
- Automatically deploy the LLMs that pass the evaluation step
LLMOps
- Instruction dataset versioning, lineage, and reusability
- Model versioning, lineage, and reusability
- Experiment tracking
- Continuous training, continuous integration, and continuous delivery (CT/CI/CD)
- Prompt and system monitoring
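
The "chunk and embed" requirement from the Data list above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the word-based chunker, the `toy_embed` hashed bag-of-words vector (a real system would call an embedding model), and the plain list that stands in for the vector DB.

```python
import math
import zlib


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into chunks of chunk_size words that share overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text, dim=16):
    """Toy hashed bag-of-words vector, normalized to unit length."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query, index, top_k=1):
    """Rank stored chunks by cosine similarity to the query."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    return sorted(scored, reverse=True)[:top_k]


document = "one two three four five six seven eight nine ten eleven twelve"
chunks = chunk_text(document, chunk_size=5, overlap=1)
# A plain list of (chunk, vector) pairs stands in for the vector DB.
index = [(chunk, toy_embed(chunk)) for chunk in chunks]
print(chunks)
```

The overlap keeps a little shared context between neighboring chunks, so a sentence cut at a chunk boundary is still partly visible from both sides.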

LLM Twin high-level architecture

Chapter 2: Tooling and Installation

The chapter covers:
- Python ecosystem and project installation
- MLOps and LLMOps tooling
- Databases for storing unstructured and vector data
- Preparing for AWS
Any Python project needs three fundamental tools: the Python interpreter, dependency management, and a task execution tool.
Poetry is one of the most popular dependency and virtual environment managers within the Python ecosystem.
An orchestrator is a system that automates, schedules, and coordinates all your ML pipelines. It ensures that each pipeline—such as data ingestion, preprocessing, model training, and deployment—executes in the correct order and handles dependencies efficiently.
ZenML is one such orchestrator.
- It orchestrates work as pipelines and steps, which are just Python functions: steps are called inside pipeline functions, so the code should be written in a modular way.
- ZenML turns every step output into an artifact.
- An artifact is any file produced during the ML lifecycle.
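
To make the idea of an orchestrator concrete, here is a minimal sketch that runs steps in dependency order and keeps every step output as an artifact. It deliberately does not use the ZenML API; `run_pipeline` and the step names are hypothetical, and the ordering comes from the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps):
    """Run steps in dependency order and store each output as an artifact."""
    # Map each step name to the set of steps it depends on.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    artifacts = {}
    order = []
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        # Each step receives the artifacts of the steps it depends on.
        artifacts[name] = func(*(artifacts[dep] for dep in deps))
        order.append(name)
    return order, artifacts


# Each entry is: step name -> (function, list of upstream step names).
steps = {
    "ingest": (lambda: [3, 1, 2], []),
    "preprocess": (lambda data: sorted(data), ["ingest"]),
    "train": (lambda data: sum(data) / len(data), ["preprocess"]),
    "deploy": (lambda model: f"deployed model with mean={model}", ["train"]),
}
order, artifacts = run_pipeline(steps)
print(order)
print(artifacts["deploy"])
```

A real orchestrator adds scheduling, retries, caching, and artifact versioning on top of this basic "resolve dependencies, run, store outputs" loop.
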
Experiment Tracker:
- Training ML models is an entirely iterative and experimental process. Therefore, an experiment tracker is required.
- Comet ML is one tool that fills this role.
Prompt monitoring:
- Standard logging tools fall short because prompts are complex, unstructured chains.
- Opik is simpler to use than other prompt monitoring tools.
MongoDB: a NoSQL database.
Qdrant: a vector database.
For our MVP, AWS is the perfect option, as it provides robust features for everything we need, such as S3 (object storage), ECR (container registry), and SageMaker (compute for training and inference).

Chapter 3: Data Engineering

In this chapter, we will study the following topics:

- Designing the LLM Twin’s data collection pipeline
- Implementing the LLM Twin’s data collection pipeline
- Gathering raw data into the data warehouse

An ETL pipeline involves three fundamental steps:

1. We extract data from various sources. We will crawl data from platforms like Medium, Substack, and GitHub to gather raw data.
2. We transform this data by cleaning and standardizing it into a consistent format suitable for storage and analysis.
3. We load the transformed data into a data warehouse or database.

Collect and curate the dataset

From raw data, Extract -> Transform -> Load into MongoDB. (ETL)
- crawling
- standardizing data
- load into data warehouse
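
The Extract -> Transform -> Load flow described above can be sketched as three small functions. This is a hedged illustration only: the sample HTML strings replace real crawling, the cleaning rules are assumptions, and a plain list stands in for the MongoDB collection.

```python
import re


def extract(sources):
    """Extract: yield raw documents. Real crawlers would fetch these pages."""
    for platform, html in sources:
        yield {"platform": platform, "raw": html}


def transform(doc):
    """Transform: strip tags, collapse whitespace, standardize the fields."""
    text = re.sub(r"<[^>]+>", " ", doc["raw"])
    text = re.sub(r"\s+", " ", text).strip()
    return {"platform": doc["platform"].lower(), "content": text}


def load(docs, warehouse):
    """Load: append to a list that stands in for a MongoDB collection."""
    warehouse.extend(docs)


# Hypothetical raw inputs; no network access is involved.
sources = [
    ("Medium", "<h1>Hello</h1>\n<p>LLM   Twin</p>"),
    ("GitHub", "<p>README</p>"),
]
warehouse = []
load([transform(doc) for doc in extract(sources)], warehouse)
print(warehouse)
```

Keeping the three stages as separate functions means a new source only needs a new extractor, while the transform and load code stays unchanged.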

Chapter 5: Supervised Fine-Tuning

SFT refines a pre-trained model (one that can already predict the next token in a sequence) by teaching it to predict the answer in instruction-answer pairs.
It adapts the general language understanding of a pre-trained LLM to a specific application, in this case making it more conversational.
In this chapter, the authors cover the following topics:
- Creating a high-quality instruction dataset
- SFT techniques
- Implementing fine-tuning in practice
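
As a rough illustration of what an instruction-answer pair looks like once it is turned into training text, the sketch below formats one pair with a prompt template. The template, the `<eos>` marker, and the `build_sample` helper are all hypothetical; they are not taken from the book or from any specific library.

```python
# Hypothetical prompt template; real projects pick one and use it consistently.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_sample(pair, eos_token="<eos>"):
    """Turn an instruction-answer pair into prompt, completion, and full text."""
    prompt = PROMPT_TEMPLATE.format(instruction=pair["instruction"])
    # The end-of-sequence marker teaches the model where an answer stops.
    completion = pair["answer"] + eos_token
    return {"prompt": prompt, "completion": completion, "text": prompt + completion}


pair = {
    "instruction": "Summarize the FTI architecture in one sentence.",
    "answer": "It splits an ML system into feature, training, and inference pipelines.",
}
sample = build_sample(pair)
print(sample["text"])
```

During fine-tuning the model sees the prompt part as context and learns to produce the completion part, which is what "learning to predict instruction-answer pairs" means in practice.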

Chapter 6: Fine-Tuning with Preference Alignment

SFT cannot capture human preferences about how a conversation should go, so we use preference alignment, specifically Direct Preference Optimization (DPO).
The authors cover the following topics in this chapter:
- Understanding preference datasets
- How to create our own preference dataset
- Direct preference optimization (DPO)
- Implementing DPO in practice to align our model
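
The DPO objective for a single preference pair is short enough to sketch directly. The function and argument names and the `beta` value below are illustrative assumptions; in practice the log-probabilities come from the model being trained (the policy) and from a frozen reference model, and a training library computes this over batches.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) answer pair."""
    # Implicit rewards: how much more likely the policy makes each answer
    # compared with the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: the margin is 0, so the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy favors the chosen answer more than the reference does: lower loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

The loss falls as the policy raises the likelihood of the chosen answer and lowers that of the rejected one relative to the reference, which is how the preference signal shapes the model without a separate reward model.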

Chapter 7: Evaluating LLMs

There is no unified approach to measuring a model’s performance, but there are patterns and recipes that we can adapt to specific use cases.
The chapter covers:
- Model evaluation
- RAG evaluation
- Evaluating TwinLlama-3.1-8B

Chapter 8: Inference Optimization

Some tasks, like document generation, take hours, while others, like code completion, take only a short time, which is why inference optimization is so important. The quantities being optimized are latency (how long it takes to generate the first token), throughput (the number of tokens generated per second), and the memory footprint of the LLM.
The chapter covers:
- Model optimization strategies
- Model parallelism
- Model quantization
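
The latency and throughput definitions above can be measured with a small helper that wraps any token stream. This is a sketch under stated assumptions: `measure_generation` and `fake_llm` are hypothetical names, and `fake_llm` simply yields the words of its prompt instead of calling a real model.

```python
import time


def measure_generation(token_stream, clock=time.perf_counter):
    """Return time to first token, tokens per second, and the token count."""
    start = clock()
    first_token_time = None
    count = 0
    for _ in token_stream:
        count += 1
        if first_token_time is None:
            first_token_time = clock()  # latency: the first token has arrived
    end = clock()
    total = end - start
    return {
        "time_to_first_token": (
            first_token_time - start if first_token_time is not None else None
        ),
        "tokens_per_second": count / total if total > 0 else 0.0,
        "tokens": count,
    }


def fake_llm(prompt):
    # Hypothetical stand-in for a streaming LLM: yields one token per word.
    for token in prompt.split():
        yield token


stats = measure_generation(fake_llm("a tiny prompt used as fake output"))
print(stats["tokens"])
```

The `clock` parameter is injectable so the measurement can be tested with a fake clock; memory footprint is not covered here because it depends on the model and hardware rather than on the token stream.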

Chapter 9: RAG Inference Pipeline

This is where the magic happens for the RAG system.
The chapter covers the following topics:
- Understanding the LLM Twin’s RAG inference pipeline
- Exploring the LLM Twin’s advanced RAG techniques
- Implementing the LLM Twin’s RAG inference pipeline

Chapter 10: Inference Pipeline Deployment

The chapter covers:
- Criteria for choosing deployment types
- Understanding inference deployment types
- Monolithic versus microservices architecture in model serving
- Exploring the LLM Twin’s inference pipeline deployment strategy
- Deploying the LLM Twin service
- Autoscaling capabilities to handle spikes in usage

Chapter 11: MLOps and LLMOps

This chapter covers:
- The path to LLMOps: Understanding its roots in DevOps and MLOps
- Deploying the LLM Twin’s pipelines to the cloud
- Adding LLMOps to the LLM Twin


Learning Deutsch by learning Phrases and Sentences

1 minute read

Published: September 28, 2025

When it comes to language learning, focusing on phrases and sentences rather than isolated words can make a significant difference. While memorizing vocabulary lists might seem like a straightforward approach, it often leaves learners struggling to use those words in real-life situations. Words alone rarely convey complete meaning; context is crucial. By learning phrases and sentences, you naturally absorb grammar, word order, and common expressions, making your speech sound more natural and fluent.

For example, knowing the word “book” is helpful, but learning the phrase “I’d like to book a table” is far more practical. Phrases provide ready-made building blocks for conversation, reducing the mental effort needed to construct sentences from scratch. This approach also helps with pronunciation and intonation, as you practice speaking in chunks rather than isolated syllables.

Moreover, sentences and phrases expose you to cultural nuances and idiomatic expressions that single words cannot convey. This leads to better comprehension when listening or reading, and more confidence when speaking. In summary, prioritizing phrases and sentences accelerates your ability to communicate effectively, making language learning more enjoyable and efficient.

Below are some Anki decks that can be used:

Deutsch:

German Sentences
- Part 1 - A1 and A2: https://ankiweb.net/shared/info/785874566
- Part 2 - B1: https://ankiweb.net/shared/info/17323417
- Part 3 - B2-C1: https://ankiweb.net/shared/info/944971572
German 7000 Intermediate/Advanced Sentences w/ Audio
- Part 1: https://ankiweb.net/shared/info/1125602705

Japanese:

LTL Japanese Deck
- Level 1 - Short: https://ankiweb.net/shared/info/1184395484
- Level 2 - Short Medium: https://ankiweb.net/shared/info/187819699
- Level 3 - Medium: https://ankiweb.net/shared/info/266834099
- Level 4 - Medium Long: https://ankiweb.net/shared/info/660574631
- Level 5 - Long: TBD

Deutsch Day 16: Compound Nouns

12 minute read

Published: August 06, 2025

📘 MASTER PLAN: German Compound Noun Vocabulary Expansion

🔶 PHASE 1: Master the Most Useful Base Nouns (Top 50)

These are the “root” or “core” nouns you will see in countless combinations.

| # | Noun | Meaning | Article (base noun) | Example Compound |
| --- | --- | --- | --- | --- |
| 1 | Haus | house | das | Krankenhaus |
| 2 | Kind | child | das | Kindergarten |
| 3 | Arbeit | work | die | Hausarbeit |
| 4 | Schule | school | die | Sprachschule |
| 5 | Auto | car | das | Autounfall |
| 6 | Zeit | time | die | Freizeit |
| 7 | Tag | day | der | Feiertag |
| 8 | Bahn | rail/train | die | U-Bahn |
| 9 | Buch | book | das | Wörterbuch |
| 10 | Zimmer | room | das | Schlafzimmer |
| 11 | Stadt | city | die | Hauptstadt |
| 12 | Name | name | der | Nachname |
| 13 | Licht | light | das | Taschenlicht |
| 14 | Wasser | water | das | Trinkwasser |
| 15 | Luft | air | die | Luftqualität |
| 16 | Weg | path/way | der | Heimweg |
| 17 | Spiel | game/play | das | Kinderspiel |
| 18 | Reise | travel/trip | die | Dienstreise |
| 19 | Zeitung | newspaper | die | Bildzeitung |
| 20 | Gerät | device | das | Küchengerät |
| 21 | Mann | man | der | Geschäftsmann |
| 22 | Frau | woman | die | Hausfrau |
| 23 | Essen | food/eating | das | Essenszeit |
| 24 | Lehrer | teacher | der | Lehrerzimmer |
| 25 | Student | student | der | Studentenstadt |
| 26 | Bahn | train/rail | die | Eisenbahn |
| 27 | Eltern | parents | die (pl) | Elternabend |
| 28 | Körper | body | der | Körperpflege |
| 29 | Kopf | head | der | Kopfschmerzen |
| 30 | Zahn | tooth | der | Zahnarzt |
| 31 | Auge | eye | das | Augenarzt |
| 32 | Herz | heart | das | Herzenswunsch |
| 33 | Beruf | profession | der | Berufsleben |
| 34 | Unfall | accident | der | Autounfall |
| 35 | Polizei | police | die | Polizeiauto |
| 36 | Freund | friend | der | Freundschaft |
| 37 | Uhr | clock | die | Wanduhr |
| 38 | Sprache | language | die | Fremdsprache |
| 39 | Tier | animal | das | Haustier |
| 40 | Leben | life | das | Lebensstil |
| 41 | Welt | world | die | Weltkarte |
| 42 | Feuer | fire | das | Feuerzeug |
| 43 | Glas | glass | das | Weinglas |
| 44 | Straße | street | die | Hauptstraße |
| 45 | Fenster | window | das | Fensterrahmen |
| 46 | Schuh | shoe | der | Turnschuh |
| 47 | Tasche | bag | die | Handtasche |
| 48 | Lampe | lamp | die | Schreibtischlampe |
| 49 | Computer | computer | der | Computerprogramm |
| 50 | Tisch | table/desk | der | Esstisch |

✅ Goal: Learn gender, plural form, and 2-3 common compounds per base noun.

🔶 PHASE 2: Master Compound Prefix & Suffix Builders

These turn base nouns into real-life compound nouns.

✅ Goal: Learn 10 of each and how they behave when combined.

🔶 PHASE 3: Learn Noun Combination Patterns (Grouped by Theme)

Now combine your prefix + base/suffix using patterns and themes. Grouping by theme makes it easy to recall.

🏠 House & Furniture (Wohnen und Möbel)

🧑‍⚕️ Health & Body (Gesundheit und Körper)

🎓 School & Learning (Schule und Lernen)

🚗 Travel & Transport (Reisen und Verkehr)

⏱ Time & Work (Zeit und Arbeit)

📱 Devices & Objects (Geräte und Gegenstände)

📰 Media & Reading (Medien und Lesen)

| Compound Noun      | Meaning            |
| ------------------ | ------------------ |
| Wörterbuch         | dictionary         |
| Schulbuch          | school book        |
| Tageszeitung       | daily newspaper    |
| Bildzeitung        | tabloid            |
| Lesebrille         | reading glasses    |
| Fernsehprogramm    | TV program         |
| Lieblingsbuch      | favorite book      |
| Zeitungsausschnitt | newspaper clipping |
| Sachbuch           | nonfiction book    |
| Bibliotheksausweis | library card       |
| Buchhandlung       | bookstore          |
| Nachrichtenkanal   | news channel       |

💼 People & Professions (Personen und Berufe)

🐾 Nature & Environment (Natur und Umwelt)

📦 BONUS GROUP – Easy-to-Understand Compounds from A1/A2 Level

Deutsch Day 12: Subject + Verb

1 minute read

Published: July 29, 2025

Nominative

Personal Pronouns (Personalpronomen) in the Nominative

Nominative pronouns are personal pronouns that replace the subject in a sentence. They show who or what is doing something, e.g., I am tired.

Deutsch: Pronomen im Nominativ sind Personalpronomen, die das Subjekt im Satz ersetzen. Sie zeigen, wer oder was etwas tut, z. B. Ich bin müde.

| | Singular | Plural |
| --- | --- | --- |
| 1st person | ich (I) | wir (we) |
| 2nd person | du/Sie (you) | ihr (you all) |
| 3rd person | er/sie/es (he/she/it) | sie (they) |

Sein (to be)

| Number | Person | Personalpronomen | Sein (to be) |
| --- | --- | --- | --- |
| Singular | 1st person | ich (I) | bin |
| Singular | 2nd person | du (you - informal) | bist |
| Singular | 3rd person | er/sie/es (he/she/it) | ist |
| Singular | 2nd person (formal) | Sie (you - formal) | sind |
| Plural | 1st person | wir (we) | sind |
| Plural | 2nd person | ihr (you all - informal) | seid |
| Plural | 3rd person | sie (they) | sind |
| Plural | 2nd person (formal) | Sie (you all - formal) | sind |

Haben (to have)

| Number | Person | Personalpronomen | Haben (to have) |
| --- | --- | --- | --- |
| Singular | 1st person | ich (I) | habe |
| Singular | 2nd person | du (you - informal) | hast |
| Singular | 3rd person | er/sie/es (he/she/it) | hat |
| Singular | 2nd person (formal) | Sie (you - formal) | haben |
| Plural | 1st person | wir (we) | haben |
| Plural | 2nd person | ihr (you all - informal) | habt |
| Plural | 3rd person | sie (they) | haben |
| Plural | 2nd person (formal) | Sie (you all - formal) | haben |

Deutsch Day 11: Ja/Nein-Frage

less than 1 minute read

Published: July 28, 2025

The structure of a yes/no question (Ja/Nein-Frage) in German is as follows:

`Verb (konjugiert) + Subjekt + Rest` (conjugated verb + subject + rest of the sentence)

For example:

| Deutsch | English |
| --- | --- |
| Bist du müde? | Are you tired? |
| Hast du ein Buch? | Do you have a book? |
| Kommt er aus Spanien? | Does he come from Spain? |
| Geht sie zur Schule? | Does she go to school? |
| Wohnst du in Berlin? | Do you live in Berlin? |

Other examples:

- Ist das die Brille?
- Ist das das Handy?
- Ist das der Apfel?
- Ist das die Tasse?

Bishnu Khadka