
We’re excited to offer future sessions on this topic! If you’re interested in attending our upcoming seminar with updated content, sign up below, and we’ll notify you when the event is scheduled.
Basics of Language Modeling: Understanding the Fundamentals of AI/NLP
This event introduces participants to the essential concepts of Natural Language Processing (NLP) and Language Modeling. Learn how AI processes language, explore basic and advanced text representation techniques, and understand the role of word embeddings in modern NLP models such as BERT.
Feature Engineering / Text Representation
- Basic Vectorization Approaches: Techniques to convert text into numerical representations (see the sketch after this outline).
  - One-Hot Encoding: A simple approach that represents each word as a binary vector.
  - Bag of Words (BoW): Captures word occurrences in a document without considering word order.
  - Bag of N-Grams (BoN): Extends the BoW model by counting groups of words (n-grams) to capture some word-order information.
  - TF-IDF: A statistical method that weights words by how frequent they are within a document and how rare they are across documents.
- Distributed Representation: Understand the concept of distributed text representation, where words are mapped to dense vectors in a multi-dimensional space.
- Universal Language Representation: Explore approaches that aim to provide general-purpose text embeddings applicable across various NLP tasks.
- Hand-Crafted Features: Learn how to manually extract specific features from text to improve model performance (a small example follows this outline).
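To make the basic vectorization approaches concrete, here is a minimal sketch using scikit-learn and a made-up three-document corpus; the corpus and settings are illustrative only, not part of the seminar material.

```python
# A minimal sketch of the basic vectorization approaches, using scikit-learn
# and a made-up three-document corpus (illustrative only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the dog bites the man", "the man bites the dog", "dog eats meat"]

# One-Hot Encoding: each vocabulary word becomes a binary vector with a single 1.
vocab = sorted(set(" ".join(corpus).split()))
one_hot = {word: np.eye(len(vocab), dtype=int)[i] for i, word in enumerate(vocab)}
print(one_hot["dog"])

# Bag of Words: per-document counts of each vocabulary word, word order ignored.
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())

# Bag of N-Grams: count unigrams and bigrams to keep some word-order information.
bon = CountVectorizer(ngram_range=(1, 2))
print(bon.fit_transform(corpus).toarray())

# TF-IDF: weight each term by its frequency in a document and its rarity across documents.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray())
```

In the BoW, BoN, and TF-IDF outputs, each row corresponds to one document and each column to one vocabulary term or n-gram.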
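Hand-crafted features are simply values you compute yourself from the raw text. The sketch below shows the general pattern; the particular features chosen are arbitrary examples, not a prescribed set.

```python
# A small sketch of hand-crafted text features; the chosen features are arbitrary examples.
import string

def handcrafted_features(text: str) -> dict:
    tokens = text.split()
    return {
        "num_tokens": len(tokens),
        "avg_token_length": sum(len(t) for t in tokens) / max(len(tokens), 1),
        "num_uppercase_words": sum(t.isupper() for t in tokens),
        "num_punctuation_chars": sum(ch in string.punctuation for ch in text),
    }

print(handcrafted_features("NLP is fun, but feature engineering MATTERS!"))
```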
Word Embeddings
- Types of Embeddings
  - Frequency-Based Embeddings: Methods that rely on word frequency and co-occurrence counts in a corpus to generate embeddings.
  - Prediction-Based Embeddings: Techniques where word embeddings are learned through prediction tasks (see the training sketch after this outline):
    - CBOW (Continuous Bag of Words): Predict the target word from its surrounding context words.
    - Skip-Gram: Predict the surrounding context words from the target word.
- Pre-Trained Word Embeddings: Explore widely used pre-trained word embeddings (see the loading sketch after this outline):
  - Word2Vec by Google: A popular prediction-based word embedding method.
  - GloVe by Stanford: A frequency-based method that builds word vectors from global co-occurrence statistics.
  - fastText by Facebook: An extension of Word2Vec that captures subword information.
- BERT Word Embedding: Discover how BERT (Bidirectional Encoder Representations from Transformers) generates contextual word embeddings (see the sketch after this outline):
  - Token Embedding: Learn how individual words are broken down into tokens for embedding.
  - Segment Embedding: Understand how embeddings differ between the sentence segments of a paired input.
  - Position Embedding: See how BERT incorporates positional information into embeddings to capture word order.
- WordPiece Tokenizer: The tokenizer used in BERT, which breaks words into smaller subword units for more flexible representation (illustrated after this outline).
- Language Modeling Mechanism in BERT
  - Masked Language Modeling (MLM): Learn how BERT predicts masked words in a sentence to build an understanding of context (see the fill-mask example after this outline).
  - Next Sentence Prediction (NSP): Understand how BERT predicts the relationship between sentence pairs to support tasks like question answering and text classification.
- Subword Tokenization Methods: Explore various methods of tokenizing text into subwords (such as WordPiece, Byte-Pair Encoding, and SentencePiece) for improved representation in NLP models.
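As a sketch of the prediction-based embeddings above, the following assumes the gensim library, whose Word2Vec class trains CBOW when sg=0 and Skip-Gram when sg=1; the tiny corpus and hyperparameters are illustrative only.

```python
# A minimal sketch of training CBOW and Skip-Gram embeddings with gensim.
# The toy corpus and hyperparameters are illustrative only.
from gensim.models import Word2Vec

sentences = [
    ["the", "dog", "bites", "the", "man"],
    ["the", "man", "bites", "the", "dog"],
    ["dog", "eats", "meat"],
    ["man", "eats", "food"],
]

# sg=0 selects CBOW: predict the target word from its context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 selects Skip-Gram: predict the context words from the target word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["dog"][:5])             # first few dimensions of a dense word vector
print(skipgram.wv.most_similar("dog"))
```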
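Loading the pre-trained embeddings listed above can be sketched with gensim's downloader API; the model names below come from the public gensim-data catalogue and may change over time, and the larger downloads are left commented out.

```python
# A sketch of loading widely used pre-trained embeddings via gensim's downloader.
# Model names follow the gensim-data catalogue; availability and names may vary.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")               # GloVe by Stanford
# word2vec = api.load("word2vec-google-news-300")         # Word2Vec by Google (large download)
# fasttext = api.load("fasttext-wiki-news-subwords-300")  # fastText by Facebook (large download)

print(glove["king"][:5])                  # a dense 100-dimensional vector (first 5 values)
print(glove.most_similar("king", topn=3))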
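Here is a minimal sketch of BERT's token, segment, and position embeddings, assuming the Hugging Face transformers and PyTorch packages and the common bert-base-uncased checkpoint.

```python
# A minimal sketch of BERT's three input embeddings using Hugging Face transformers.
# Assumes the `transformers` and `torch` packages and the bert-base-uncased checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Encode a sentence pair; token_type_ids mark segment A (0) vs. segment B (1).
inputs = tokenizer("How old are you?", "I am six.", return_tensors="pt")

# BERT's input representation is the sum of token, segment, and position embeddings.
emb = model.embeddings
print(emb.word_embeddings)        # token embedding table (vocab_size x hidden_size)
print(emb.token_type_embeddings)  # segment embedding table (2 x hidden_size)
print(emb.position_embeddings)    # position embedding table (max_length x hidden_size)

# Contextual word embeddings: one hidden vector per input token.
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)    # (batch, sequence_length, hidden_size)
```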
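The WordPiece tokenizer can be inspected directly through the same checkpoint's tokenizer; the exact subword splits depend on the checkpoint's vocabulary, so the splits suggested in the comment are only indicative.

```python
# A short sketch of WordPiece subword tokenization with BERT's tokenizer.
# Exact splits depend on the vocabulary of the chosen checkpoint.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("The embeddings were pretrained on unlabelled text"))
# Words outside the vocabulary are split into subword units marked with "##",
# e.g. something like ['pre', '##train', '##ed'], so rare or unseen words can
# still be represented from known pieces.
```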
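Masked Language Modeling is easy to probe with the transformers fill-mask pipeline, sketched below with the bert-base-uncased checkpoint; Next Sentence Prediction can be explored in the same library through its BertForNextSentencePrediction model class.

```python
# A minimal sketch of BERT's Masked Language Modeling objective via the fill-mask pipeline.
# bert-base-uncased is one common checkpoint; the top predictions will vary by model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the token hidden behind [MASK] from its bidirectional context.
for candidate in fill_mask("The capital of France is [MASK].", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```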
Interested in attending this event?
Register on this page, and we’ll notify you when we schedule the event!