Unsupervised document classification for imbalanced data sets poses a major challenge, as it requires a carefully curated dataset.
The authors propose an integration of web scraping, one-class Support Vector Machines (SVM), and Latent Dirichlet Allocation (LDA) topic modelling as a multi-step classification rule that circumvents manual labelling.
Topic modeling with methods such as Latent Dirichlet Allocation (LDA), the most commonly used model, works well for long texts but is challenging for short texts such as tweets.
The authors compared the performance of LDA with that of the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), both of which are specifically designed for sparse data.
They found that GSDMM and GPM are better for sparse data.
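For intuition on why GSDMM suits short texts: unlike LDA, it assigns each document to exactly one topic, and clusters that attract no documents simply stay empty. Below is a minimal pure-Python sketch of the collapsed Gibbs update commonly described for a Dirichlet multinomial mixture. It is not the implementation used in the comparison above; the function name `gsdmm`, the hyperparameter defaults, and the toy tweets are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    m = [0] * K                          # documents per cluster
    n = [0] * K                          # tokens per cluster
    nw = [Counter() for _ in range(K)]   # per-cluster word counts
    z = []                               # current label of each document

    def move(doc, k, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster k.
        m[k] += sign
        n[k] += sign * len(doc)
        for w in doc:
            nw[k][w] += sign

    for doc in docs:                     # random initial assignment
        z.append(rng.randrange(K))
        move(doc, z[-1], +1)

    for _ in range(iters):
        for i, doc in enumerate(docs):
            move(doc, z[i], -1)          # take the document out of its cluster
            log_p = []
            for k in range(K):
                # Prior term: larger clusters are more attractive.
                lp = math.log(m[k] + alpha)
                # Likelihood term: clusters sharing the document's words score higher.
                for w, count in Counter(doc).items():
                    for j in range(count):
                        lp += math.log(nw[k][w] + beta + j)
                for t in range(len(doc)):
                    lp -= math.log(n[k] + V * beta + t)
                log_p.append(lp)
            top = max(log_p)             # subtract the max for numerical stability
            weights = [math.exp(lp - top) for lp in log_p]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(doc, z[i], +1)
    return z


tweets = [
    ["rain", "storm", "wind"], ["storm", "rain", "flood"],
    ["goal", "match", "team"], ["team", "match", "win"],
]
labels = gsdmm(tweets, K=4)
print(labels)  # one integer label per tweet
```

`K` is only an upper bound on the number of topics here, so it can be set generously.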
Notes
EMBEDDING → DIMENSIONALITY REDUCTION → CLUSTERING → TOP WORDS (I-TF-IDF and COSINE SIMILARITY) → LLM (GPT 4.0 AND GPT 3.5) for topic naming and description
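The top-words step of this pipeline can be sketched in plain Python. The sketch assumes the documents are already tokenized and clustered, treats each cluster as one pseudo-document, and uses a smoothed IDF of `log(1 + n/df)`. The note does not specify the exact TF-IDF variant, so the weighting, the function names, and the toy clusters are all assumptions. Cosine similarity is shown comparing two clusters' weight vectors, which is only one plausible use of it at this stage.

```python
import math
from collections import Counter


def cluster_tfidf(clusters):
    """clusters: {label: [tokenized docs]} -> {label: {word: weight}}."""
    # Term counts per cluster, with each cluster treated as one long document.
    tf = {label: Counter(w for doc in docs for w in doc)
          for label, docs in clusters.items()}
    n = len(tf)
    # Number of clusters in which each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    weights = {}
    for label, counts in tf.items():
        total = sum(counts.values())
        weights[label] = {w: (c / total) * math.log(1 + n / df[w])
                          for w, c in counts.items()}
    return weights


def top_words(weights, k=3):
    """Highest-weighted words per cluster; ties broken alphabetically."""
    return {label: [w for w, _ in sorted(ws.items(),
                                         key=lambda kv: (-kv[1], kv[0]))[:k]]
            for label, ws in weights.items()}


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


clusters = {
    0: [["goal", "match", "team"], ["team", "win", "match"]],
    1: [["vote", "election", "party"], ["party", "vote", "policy"]],
}
weights = cluster_tfidf(clusters)
print(top_words(weights, k=2))         # candidate words to hand to the LLM
print(cosine(weights[0], weights[1]))  # similarity between the two clusters
```

The top words per cluster are what would then be passed to the LLM for topic naming and description.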
Answering questions using LLM (ChatGPT + RAG Implementation)
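The retrieval half of a RAG setup can be sketched without any external service. In this minimal sketch, bag-of-words cosine similarity stands in for a real embedding model, and the LLM call itself is left out, since the note does not say which client or endpoint is used. The function names, the prompt wording, and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two Counter bag-of-words vectors."""
    dot = sum(c * b[w] for w, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(passages,
                    key=lambda p: cosine(q, Counter(tokenize(p))),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Put the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


passages = [
    "LDA assumes each document mixes several topics.",
    "GSDMM assigns a single topic to each short document.",
    "Anki schedules flashcards with spaced repetition.",
]
question = "Which model assigns a single topic to each document?"
print(build_prompt(question, passages, k=2))
```

The resulting prompt string is what would be sent to the chat model. Swapping the bag-of-words vectors for dense embeddings changes only `retrieve`.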
When it comes to language learning, focusing on phrases and sentences rather than isolated words can make a significant difference. While memorizing vocabulary lists might seem like a straightforward approach, it often leaves learners struggling to use those words in real-life situations. Words alone rarely convey complete meaning; context is crucial. By learning phrases and sentences, you naturally absorb grammar, word order, and common expressions, making your speech sound more natural and fluent.
For example, knowing the word “book” is helpful, but learning the phrase “I’d like to book a table” is far more practical. Phrases provide ready-made building blocks for conversation, reducing the mental effort needed to construct sentences from scratch. This approach also helps with pronunciation and intonation, as you practice speaking in chunks rather than isolated syllables.
Moreover, sentences and phrases expose you to cultural nuances and idiomatic expressions that single words cannot convey. This leads to better comprehension when listening or reading, and more confidence when speaking. In summary, prioritizing phrases and sentences accelerates your ability to communicate effectively, making language learning more enjoyable and efficient.
Below are some Anki decks that can be used:
Deutsch:
German Sentences
Part 1 - A1 and A2: https://ankiweb.net/shared/info/785874566
Part 2 - B1: https://ankiweb.net/shared/info/17323417
Part 3 - B2-C1: https://ankiweb.net/shared/info/944971572
German 7000 Intermediate/Advanced Sentences w/ Audio
Part 1: https://ankiweb.net/shared/info/1125602705
| Compound Noun | Meaning |
| --- | --- |
| Krankenhaus | hospital |
| Zahnarzt | dentist |
| Augenarzt | eye doctor |
| Kopfschmerzen | headache |
| Rückenschmerzen | back pain |
| Körperpflege | body care |
| Krankenkasse | health insurance |
| Herzschlag | heartbeat |
| Blutdruck | blood pressure |
| Hausarzt | family doctor |
| Notaufnahme | emergency room |
| Krankenwagen | ambulance |

🎓 School & Learning (Schule und Lernen)

| Compound Noun | Meaning |
| --- | --- |
| Schulbuch | school book |
| Lehrerzimmer | teacher’s room |
| Sprachschule | language school |
| Hausaufgabe | homework |
| Klassenarbeit | class test |
| Stundenplan | schedule/timetable |
| Schultasche | school bag |
| Schulweg | way to school |
| Schülerausweis | student ID card |
| Schulzeit | school time |
| Unterrichtsstunde | lesson |
| Schulanfang | beginning of school |