
Build AI Apps with Python: How AI Understands Meaning — Embeddings | Episode 14

Celest Kim

Video: Build AI Apps with Python: How AI Understands Meaning — Embeddings | Episode 14 by Taught by Celeste AI - AI Coding Coach



Understanding how AI captures the meaning behind words is essential for building intelligent applications. This episode demonstrates how to convert text into numerical vectors called embeddings using the sentence-transformers library, enabling semantic similarity comparisons with cosine similarity—all implemented in pure Python.

Code

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained model that converts sentences to 384-dimensional embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Define sentences to compare
sentences = [
    "cat sat on mat",
    "kitten rested on rug",
    "cat",
    "Python programming",
]

# Get embeddings for each sentence
embeddings = model.encode(sentences)

def cosine_similarity(vec1, vec2):
    # Compute cosine similarity between two vectors
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

# Compare semantic similarity between sentence pairs
sim_cat_kitten = cosine_similarity(embeddings[0], embeddings[1])
sim_cat_python = cosine_similarity(embeddings[2], embeddings[3])

print(f'Similarity between "cat sat on mat" and "kitten rested on rug": {sim_cat_kitten:.2f}')
print(f'Similarity between "cat" and "Python programming": {sim_cat_python:.2f}')
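The `cosine_similarity` helper can be sanity-checked without downloading a model, using small hand-made 2-dimensional vectors (the values below are illustrative toys, not real embeddings):

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Same helper as above: dot product divided by the product of the norms
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

a = np.array([1.0, 0.0])  # points along the x-axis
b = np.array([0.0, 1.0])  # perpendicular to a
c = np.array([2.0, 0.0])  # same direction as a, different length

print(cosine_similarity(a, c))  # identical direction -> 1.0
print(cosine_similarity(a, b))  # orthogonal -> 0.0
```

Because cosine similarity measures the angle between vectors, `a` and `c` score a perfect 1.0 even though their lengths differ, which is exactly why it is a good fit for comparing embeddings.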

Key Points

  • Embeddings convert text into high-dimensional vectors that capture semantic meaning beyond keywords.
  • The sentence-transformers library provides easy access to powerful pre-trained models without needing API keys.
  • Cosine similarity measures how close two embedding vectors are, indicating semantic similarity between sentences.
  • Similar sentences like "cat sat on mat" and "kitten rested on rug" score high, while unrelated pairs score low.
  • This technique underpins retrieval-augmented generation (RAG) by helping find relevant information chunks based on meaning.
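That retrieval step can be sketched with the same cosine-similarity idea. A real RAG system would encode each chunk with a model such as `all-MiniLM-L6-v2`; the tiny hand-made 3-dimensional vectors below are stand-ins for those embeddings:

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Toy "embeddings" for three document chunks (hand-made stand-ins for
# real 384-dimensional model outputs)
chunks = {
    "cats are small pets":  np.array([0.9, 0.1, 0.0]),
    "python is a language": np.array([0.0, 0.2, 0.9]),
    "kittens love to play": np.array([0.7, 0.5, 0.2]),
}

# Pretend this is model.encode("cat") for the user's query
query_embedding = np.array([0.85, 0.2, 0.05])

# Rank chunks by similarity to the query and keep the best match
best_chunk = max(chunks, key=lambda text: cosine_similarity(query_embedding, chunks[text]))
print(best_chunk)  # -> cats are small pets
```

The chunk whose embedding points in nearly the same direction as the query wins, even though none of the words match exactly; that is the "meaning, not keywords" behavior the bullet points describe.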