
OpenSearch Serverless - Vector Search Guide

Deep dive into implementing vector search for semantic similarity, RAG applications, and document embeddings.

Overview

Vector search (also called k-NN or nearest neighbor search) enables semantic similarity search by representing documents as high-dimensional vectors (embeddings). Unlike traditional keyword search, vector search finds documents based on meaning rather than exact word matches.

Traditional Search (keyword-based):

  • Query: "car repair"
  • Matches: Documents containing "car" AND "repair"
  • Misses: "automobile maintenance", "vehicle service"

Vector Search (semantic):

  • Query: "car repair" → embedding vector
  • Matches: Documents with similar meaning (automobile maintenance, vehicle service, etc.)
  • Finds: Semantically similar content regardless of exact words
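The similarity notion above can be made concrete with a tiny cosine-similarity computation. The 3-dimensional vectors here are toy stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically related texts get nearby vectors
car_repair = [0.9, 0.1, 0.3]
vehicle_service = [0.85, 0.15, 0.35]  # close in meaning, close in space
chocolate_cake = [0.1, 0.9, -0.2]     # unrelated, far in space

print(cosine_similarity(car_repair, vehicle_service))  # ~0.996, near 1.0
print(cosine_similarity(car_repair, chocolate_cake))   # ~0.136, much lower
```

This is exactly the comparison the `cosinesimil` space type performs at scale: "car repair" and "vehicle service" score high even though they share no words.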

Use Cases

  1. Semantic Document Search - Find documents by meaning, not just keywords
  2. RAG (Retrieval Augmented Generation) - Retrieve relevant context for LLM prompts
  3. Recommendation Systems - Find similar products, articles, or content
  4. Question Answering - Match questions to similar answered questions
  5. Duplicate Detection - Find near-duplicate documents
  6. Anomaly Detection - Identify outliers in vector space

How Vector Search Works

┌──────────────────────────────────────────────────────────────┐
│                    Vector Search Pipeline                    │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  1. Document Ingestion                                       │
│     ┌─────────────┐      ┌──────────────┐                    │
│     │  Document   │─────►│  Embedding   │                    │
│     │ "Car needs  │      │    Model     │                    │
│     │   repair"   │      │  (OpenAI,    │                    │
│     └─────────────┘      │   Cohere)    │                    │
│                          └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                   [0.23, -0.45, 0.67, ...]                   │
│                  (1536-dimensional vector)                   │
│                                 │                            │
│                                 v                            │
│                          ┌──────────────┐                    │
│                          │  OpenSearch  │                    │
│                          │    Index     │                    │
│                          └──────────────┘                    │
│                                                              │
│  2. Query Processing                                         │
│     ┌─────────────┐      ┌──────────────┐                    │
│     │   Query     │─────►│  Embedding   │                    │
│     │  "vehicle   │      │    Model     │                    │
│     │  service"   │      │ (same model) │                    │
│     └─────────────┘      └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                   [0.21, -0.43, 0.69, ...]                   │
│                                 │                            │
│                                 v                            │
│  3. Similarity Search                                        │
│                          ┌──────────────┐                    │
│                          │ k-NN Search  │                    │
│                          │   (cosine    │                    │
│                          │  similarity) │                    │
│                          └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                          ┌──────────────┐                    │
│                          │    Top K     │                    │
│                          │   Results    │                    │
│                          └──────────────┘                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Creating a Vector Search Index

Step 1: Define Index Mapping

Create an index with vector field mapping:

from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3

# Connect to OpenSearch (via bastion port forward)
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, 'us-east-1', 'aoss')

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=False,
    connection_class=RequestsHttpConnection
)

# Create index with vector mapping
index_name = 'documents'
index_body = {
    'settings': {
        'index': {
            'number_of_shards': 2,
            'number_of_replicas': 0,
            'knn': True  # Enable k-NN
        }
    },
    'mappings': {
        'properties': {
            'title': {'type': 'text'},
            'content': {'type': 'text'},
            'content_vector': {
                'type': 'knn_vector',
                'dimension': 1536,  # OpenAI text-embedding-3-small
                'method': {
                    'name': 'hnsw',  # Hierarchical Navigable Small World
                    'engine': 'faiss',
                    'space_type': 'cosinesimil',  # Cosine similarity
                    'parameters': {
                        'ef_construction': 128,
                        'm': 16
                    }
                }
            },
            'metadata': {
                'properties': {
                    'source': {'type': 'keyword'},
                    'created_at': {'type': 'date'},
                    'category': {'type': 'keyword'}
                }
            }
        }
    }
}

client.indices.create(index=index_name, body=index_body)
print(f"Index '{index_name}' created successfully")

Vector Field Configuration

Key Parameters:

  • dimension: Vector size (must match embedding model)

    • OpenAI text-embedding-3-small: 1536
    • OpenAI text-embedding-3-large: 3072
    • Cohere embed-english-v3.0: 1024
  • method.name: Algorithm for k-NN search

    • hnsw: Fast, approximate search (recommended)
    • ivf: Inverted file index (good for large datasets)
  • method.engine: Vector search engine

    • faiss: Facebook AI Similarity Search (recommended)
    • nmslib: Non-Metric Space Library
  • space_type: Distance metric

    • cosinesimil: Cosine similarity (recommended for text)
    • l2: Euclidean distance
    • innerproduct: Inner product (dot product)
  • ef_construction: Build-time parameter (higher = better recall, slower indexing)

  • m: Number of connections per node (higher = better recall, more memory)
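Because the `dimension` in the mapping must exactly match what the embedding model emits, a mismatch only surfaces as an indexing error later. A small guard can catch it at ingest time; the `MODEL_DIMENSIONS` table and `validate_embedding` helper below are our own illustrative names, not part of any SDK:

```python
# Dimensions from the model list above
MODEL_DIMENSIONS = {
    'text-embedding-3-small': 1536,
    'text-embedding-3-large': 3072,
    'embed-english-v3.0': 1024,
}

def validate_embedding(model_name, embedding):
    """Raise early if an embedding's length won't match the index mapping."""
    expected = MODEL_DIMENSIONS[model_name]
    if len(embedding) != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dim vectors, got {len(embedding)}"
        )
    return embedding
```

Calling this before every `client.index(...)` turns a confusing mapper exception into an obvious configuration error.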

Generating Embeddings

Using OpenAI

from openai import OpenAI

client_openai = OpenAI(api_key="your-api-key")

def get_embedding(text, model="text-embedding-3-small"):
    """Generate embedding for text using OpenAI."""
    text = text.replace("\n", " ")
    response = client_openai.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding

# Example
text = "How to repair a car engine"
embedding = get_embedding(text)
print(f"Embedding dimension: {len(embedding)}") # 1536

Using Cohere

import cohere

co = cohere.Client("your-api-key")

def get_embedding_cohere(text, model="embed-english-v3.0"):
    """Generate embedding for text using Cohere."""
    response = co.embed(
        texts=[text],
        model=model,
        input_type="search_document"  # or "search_query" for queries
    )
    return response.embeddings[0]

# Example
embedding = get_embedding_cohere("How to repair a car engine")
print(f"Embedding dimension: {len(embedding)}") # 1024

Using Sentence Transformers (Open Source)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def get_embedding_local(text):
    """Generate embedding using local model."""
    return model.encode(text).tolist()

# Example
embedding = get_embedding_local("How to repair a car engine")
print(f"Embedding dimension: {len(embedding)}") # 384

Indexing Documents with Vectors

Single Document

def index_document(client, index_name, doc_id, title, content):
    """Index a document with its vector embedding."""
    # Generate embedding
    embedding = get_embedding(content)

    # Index document
    document = {
        'title': title,
        'content': content,
        'content_vector': embedding,
        'metadata': {
            'source': 'documentation',
            'created_at': '2025-01-01T00:00:00Z',
            'category': 'automotive'
        }
    }

    response = client.index(
        index=index_name,
        id=doc_id,
        body=document
    )

    return response

# Example
index_document(
    client,
    'documents',
    '1',
    'Car Engine Repair Guide',
    'Learn how to diagnose and repair common car engine problems...'
)

Bulk Indexing

from opensearchpy import helpers

def bulk_index_documents(client, index_name, documents):
    """Bulk index multiple documents with embeddings."""
    actions = []

    for doc in documents:
        # Generate embedding
        embedding = get_embedding(doc['content'])

        # Prepare action
        action = {
            '_index': index_name,
            '_id': doc['id'],
            '_source': {
                'title': doc['title'],
                'content': doc['content'],
                'content_vector': embedding,
                'metadata': doc.get('metadata', {})
            }
        }
        actions.append(action)

    # Bulk index
    success, failed = helpers.bulk(client, actions)
    print(f"Indexed {success} documents, {len(failed)} failed")

    return success, failed

# Example
documents = [
    {
        'id': '1',
        'title': 'Car Engine Repair',
        'content': 'How to repair car engines...',
        'metadata': {'category': 'automotive'}
    },
    {
        'id': '2',
        'title': 'Vehicle Maintenance',
        'content': 'Regular vehicle maintenance tips...',
        'metadata': {'category': 'automotive'}
    }
]

bulk_index_documents(client, 'documents', documents)

Vector Search Queries

def vector_search(client, index_name, query_text, k=10):
    """Perform k-NN vector search."""
    # Generate query embedding
    query_vector = get_embedding(query_text)

    # k-NN search
    search_body = {
        'size': k,
        'query': {
            'knn': {
                'content_vector': {
                    'vector': query_vector,
                    'k': k
                }
            }
        }
    }

    response = client.search(index=index_name, body=search_body)

    # Extract results
    results = []
    for hit in response['hits']['hits']:
        results.append({
            'id': hit['_id'],
            'score': hit['_score'],
            'title': hit['_source']['title'],
            'content': hit['_source']['content']
        })

    return results

# Example
results = vector_search(client, 'documents', 'vehicle service and maintenance', k=5)
for result in results:
    print(f"Score: {result['score']:.4f} - {result['title']}")

Combine vector search with filters:

def filtered_vector_search(client, index_name, query_text, category, k=10):
    """Vector search with metadata filters."""
    query_vector = get_embedding(query_text)

    search_body = {
        'size': k,
        'query': {
            'bool': {
                'must': [
                    {
                        'knn': {
                            'content_vector': {
                                'vector': query_vector,
                                'k': k
                            }
                        }
                    }
                ],
                'filter': [
                    {
                        'term': {
                            'metadata.category': category
                        }
                    }
                ]
            }
        }
    }

    response = client.search(index=index_name, body=search_body)
    return response['hits']['hits']

# Example
results = filtered_vector_search(
    client,
    'documents',
    'repair guide',
    category='automotive',
    k=5
)

Hybrid Search (Full-Text + Vector)

Combine traditional keyword search with vector search for best results:

def hybrid_search(client, index_name, query_text, k=10, text_weight=0.3, vector_weight=0.7):
    """Hybrid search combining full-text and vector search."""
    query_vector = get_embedding(query_text)

    search_body = {
        'size': k,
        'query': {
            'bool': {
                'should': [
                    # Full-text search
                    {
                        'multi_match': {
                            'query': query_text,
                            'fields': ['title^2', 'content'],
                            'boost': text_weight
                        }
                    },
                    # Vector search
                    {
                        'knn': {
                            'content_vector': {
                                'vector': query_vector,
                                'k': k,
                                'boost': vector_weight
                            }
                        }
                    }
                ]
            }
        }
    }

    response = client.search(index=index_name, body=search_body)
    return response['hits']['hits']

# Example
results = hybrid_search(
    client,
    'documents',
    'car repair tips',
    k=10,
    text_weight=0.3,   # 30% weight to keyword search
    vector_weight=0.7  # 70% weight to vector search
)

RAG (Retrieval Augmented Generation)

Use vector search to retrieve relevant context for LLM prompts:

from openai import OpenAI

openai_client = OpenAI(api_key="your-api-key")

def rag_query(opensearch_client, index_name, question, k=3):
    """Answer question using RAG with vector search."""
    # Step 1: Retrieve relevant documents
    results = vector_search(opensearch_client, index_name, question, k=k)

    # Step 2: Build context from top results
    context = "\n\n".join([
        f"Document {i+1}:\n{result['content']}"
        for i, result in enumerate(results)
    ])

    # Step 3: Generate answer with LLM
    prompt = f"""Answer the following question based on the provided context.

Context:
{context}

Question: {question}

Answer:"""

    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    answer = response.choices[0].message.content

    return {
        'answer': answer,
        'sources': results
    }

# Example
result = rag_query(
    client,
    'documents',
    'How do I change my car oil?',
    k=3
)

print(f"Answer: {result['answer']}\n")
print("Sources:")
for source in result['sources']:
    print(f"- {source['title']} (score: {source['score']:.4f})")

Performance Optimization

1. Batch Embedding Generation

Generate embeddings in batches to reduce API calls:

def batch_get_embeddings(texts, batch_size=100):
    """Generate embeddings in batches."""
    embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        response = client_openai.embeddings.create(
            input=batch,
            model="text-embedding-3-small"
        )
        embeddings.extend([item.embedding for item in response.data])

    return embeddings

2. Optimize Index Settings

# For large datasets
index_body = {
    'settings': {
        'index': {
            'knn': True,
            'knn.algo_param.ef_search': 100,  # Query-time parameter
            'refresh_interval': '30s'  # Reduce refresh frequency during bulk indexing
        }
    },
    'mappings': {
        'properties': {
            'content_vector': {
                'type': 'knn_vector',
                'dimension': 1536,
                'method': {
                    'name': 'hnsw',
                    'engine': 'faiss',
                    'space_type': 'cosinesimil',
                    'parameters': {
                        'ef_construction': 256,  # Higher for better recall
                        'm': 32  # Higher for better recall
                    }
                }
            }
        }
    }
}

HNSW search is already approximate; for very large datasets, you can trade recall for speed at query time by lowering ef_search:

search_body = {
    'size': k,
    'query': {
        'knn': {
            'content_vector': {
                'vector': query_vector,
                'k': k,
                'method_parameters': {
                    'ef_search': 100  # Lower = faster but less accurate
                }
            }
        }
    }
}

Best Practices

1. Embedding Model Selection

  • OpenAI text-embedding-3-small (1536 dims): Best balance of cost/performance
  • OpenAI text-embedding-3-large (3072 dims): Highest quality, more expensive
  • Cohere embed-english-v3.0 (1024 dims): Good for English text
  • Sentence Transformers (384-768 dims): Free, run locally, lower quality
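Embedding dimension also drives index memory: each float32 component costs 4 bytes, and the HNSW graph adds roughly 8 bytes per edge with `m` edges per vector. The helper below is our own back-of-the-envelope sketch under those assumptions, not an official sizing formula:

```python
def estimated_vector_memory_mb(num_docs, dimension, m=16):
    """Rough memory estimate for an HNSW index: 4 bytes per float32
    vector component plus ~8 bytes per graph edge (m edges per node).
    A back-of-the-envelope approximation, not official sizing guidance."""
    vector_bytes = num_docs * dimension * 4
    graph_bytes = num_docs * m * 8
    return (vector_bytes + graph_bytes) / (1024 * 1024)

# 1M text-embedding-3-small vectors vs 1M MiniLM vectors
print(f"{estimated_vector_memory_mb(1_000_000, 1536):.0f} MB")  # ~5981 MB
print(f"{estimated_vector_memory_mb(1_000_000, 384):.0f} MB")   # ~1587 MB
```

The roughly 4x gap is one practical reason to benchmark a smaller model before defaulting to the largest one.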

2. Chunking Strategy

For long documents, split into chunks:

def chunk_document(text, chunk_size=500, overlap=50):
    """Split document into overlapping chunks."""
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i+chunk_size])
        chunks.append(chunk)

    return chunks

# Example
long_document = "..."  # 5000 words
chunks = chunk_document(long_document, chunk_size=500, overlap=50)

for i, chunk in enumerate(chunks):
    index_document(client, 'documents', f'doc1_chunk{i}', f'Document 1 - Part {i+1}', chunk)

3. Metadata Enrichment

Add metadata for better filtering:

document = {
    'title': title,
    'content': content,
    'content_vector': embedding,
    'metadata': {
        'source': 'documentation',
        'created_at': '2025-01-01T00:00:00Z',
        'category': 'automotive',
        'tags': ['repair', 'maintenance', 'engine'],
        'author': 'John Doe',
        'version': '1.0'
    }
}

4. Reranking

Improve results with reranking:

def rerank_results(results, query_text, top_k=5):
    """Rerank results using cross-encoder."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

    # Score each result
    pairs = [[query_text, result['content']] for result in results]
    scores = model.predict(pairs)

    # Combine with original scores
    for i, result in enumerate(results):
        result['rerank_score'] = scores[i]

    # Sort by rerank score
    reranked = sorted(results, key=lambda x: x['rerank_score'], reverse=True)

    return reranked[:top_k]

5. Track Key Metrics

# Search latency
import time

start = time.time()
results = vector_search(client, 'documents', query_text, k=10)
latency = time.time() - start

print(f"Search latency: {latency*1000:.2f}ms")

# Result quality
avg_score = sum(r['score'] for r in results) / len(results)
print(f"Average relevance score: {avg_score:.4f}")

A/B Testing

Compare different embedding models or search strategies:

def compare_search_methods(query_text):
    """Compare vector vs hybrid search."""
    # Vector only
    vector_results = vector_search(client, 'documents', query_text, k=10)

    # Hybrid
    hybrid_results = hybrid_search(client, 'documents', query_text, k=10)

    print("Vector Search Top 3:")
    for r in vector_results[:3]:
        print(f"  {r['title']} ({r['score']:.4f})")

    print("\nHybrid Search Top 3:")
    # hybrid_search returns raw hits, so read fields from _source
    for r in hybrid_results[:3]:
        print(f"  {r['_source']['title']} ({r['_score']:.4f})")
