Build AI Apps with Python: How AI Understands Meaning — Embeddings | Episode 14
Video: Taught by Celeste AI - AI Coding Coach
Understanding how AI captures the meaning behind words is essential for building intelligent applications. This episode demonstrates how to convert text into numerical vectors called embeddings using the sentence-transformers library, then compare their meaning with cosine similarity — all in a few lines of Python, with no API keys required.
Code
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained model that converts sentences to 384-dimensional embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Define sentences to compare
sentences = [
    "cat sat on mat",
    "kitten rested on rug",
    "cat",
    "Python programming"
]

# Get embeddings for each sentence
embeddings = model.encode(sentences)

def cosine_similarity(vec1, vec2):
    # Compute cosine similarity between two vectors
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

# Compare semantic similarity between sentence pairs
sim_cat_kitten = cosine_similarity(embeddings[0], embeddings[1])
sim_cat_python = cosine_similarity(embeddings[2], embeddings[3])

print(f'Similarity between "cat sat on mat" and "kitten rested on rug": {sim_cat_kitten:.2f}')
print(f'Similarity between "cat" and "Python programming": {sim_cat_python:.2f}')
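Looping over pairs works for a handful of sentences, but once the rows of the embedding matrix are normalized to unit length, every pairwise cosine similarity falls out of a single matrix product. A minimal numpy sketch — the small 3-dimensional vectors below are made-up stand-ins for real 384-dimensional embeddings:

```python
import numpy as np

# Toy stand-ins for sentence embeddings (real ones from the model
# above are 384-dimensional); chosen here for illustration only.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])

# Normalize each row to unit length; then one matrix product
# yields the full matrix of pairwise cosine similarities.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms
similarity_matrix = unit @ unit.T

print(np.round(similarity_matrix, 2))
```

The diagonal is all ones (every vector is identical to itself), and the matrix is symmetric, so off-diagonal entries can be read from either triangle.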
Key Points
- Embeddings convert text into high-dimensional vectors that capture semantic meaning beyond keywords.
- The sentence-transformers library provides easy access to powerful pre-trained models without needing API keys.
- Cosine similarity measures how close two embedding vectors are, indicating semantic similarity between sentences.
- Similar sentences like "cat sat on mat" and "kitten rested on rug" score high, while unrelated pairs score low.
- This technique underpins retrieval-augmented generation (RAG) by helping find relevant information chunks based on meaning.
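The retrieval step that RAG builds on can be sketched in a few lines: embed the document chunks and the query, score each chunk against the query with cosine similarity, and keep the top matches. The tiny 3-dimensional vectors and chunk texts below are hypothetical stand-ins; a real pipeline would obtain them from model.encode(...):

```python
import numpy as np

# Hypothetical pre-computed chunk embeddings and query embedding;
# in a real RAG pipeline these would come from model.encode(...).
chunks = ["cat sat on mat", "Python programming", "kitten rested on rug"]
chunk_embeddings = np.array([
    [0.9, 0.1, 0.0],   # chunk 0: about cats
    [0.0, 1.0, 0.0],   # chunk 1: about programming
    [0.8, 0.2, 0.1],   # chunk 2: about kittens
])
query_embedding = np.array([1.0, 0.0, 0.0])  # embedding of the query "cat"

def cosine_similarity(vec1, vec2):
    # Same cosine similarity as in the episode's code
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Score every chunk against the query, then keep the 2 best matches
scores = [cosine_similarity(query_embedding, e) for e in chunk_embeddings]
top_k = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:2]

for i in top_k:
    print(f"{chunks[i]} (score {scores[i]:.2f})")
```

Both cat-related chunks outrank the programming chunk, which is exactly the behavior a RAG system relies on to hand the language model relevant context.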