
OpenSearch Serverless - Vector Search Guide

Deep dive into implementing vector search for semantic similarity, RAG applications, and document embeddings.

Overview

Vector search (also called k-NN or nearest neighbor search) enables semantic similarity search by representing documents as high-dimensional vectors (embeddings). Unlike traditional keyword search, vector search finds documents based on meaning rather than exact word matches.

Traditional Search (keyword-based):

  • Query: "car repair"
  • Matches: Documents containing "car" AND "repair"
  • Misses: "automobile maintenance", "vehicle service"

Vector Search (semantic):

  • Query: "car repair" → embedding vector
  • Matches: Documents with similar meaning (automobile maintenance, vehicle service, etc.)
  • Finds: Semantically similar content regardless of exact words
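The similarity notion above can be made concrete with a tiny cosine-similarity computation. The 3-dimensional vectors here are toy stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically related texts get nearby vectors
car_repair = [0.9, 0.1, 0.3]
vehicle_service = [0.85, 0.15, 0.35]  # close in meaning, close in space
chocolate_cake = [0.1, 0.9, -0.2]     # unrelated, far in space

print(cosine_similarity(car_repair, vehicle_service))  # ~0.996, near 1.0
print(cosine_similarity(car_repair, chocolate_cake))   # ~0.136, much lower
```

This is exactly the comparison the `cosinesimil` space type performs at scale: "car repair" and "vehicle service" score high even though they share no words.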

Use Cases

  1. Semantic Document Search - Find documents by meaning, not just keywords
  2. RAG (Retrieval Augmented Generation) - Retrieve relevant context for LLM prompts
  3. Recommendation Systems - Find similar products, articles, or content
  4. Question Answering - Match questions to similar answered questions
  5. Duplicate Detection - Find near-duplicate documents
  6. Anomaly Detection - Identify outliers in vector space

How Vector Search Works

┌──────────────────────────────────────────────────────────────┐
│                    Vector Search Pipeline                    │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  1. Document Ingestion                                       │
│     ┌─────────────┐      ┌──────────────┐                    │
│     │  Document   │─────►│  Embedding   │                    │
│     │ "Car needs  │      │    Model     │                    │
│     │   repair"   │      │  (OpenAI,    │                    │
│     └─────────────┘      │   Cohere)    │                    │
│                          └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                   [0.23, -0.45, 0.67, ...]                   │
│                  (1536-dimensional vector)                   │
│                                 │                            │
│                                 v                            │
│                          ┌──────────────┐                    │
│                          │  OpenSearch  │                    │
│                          │    Index     │                    │
│                          └──────────────┘                    │
│                                                              │
│  2. Query Processing                                         │
│     ┌─────────────┐      ┌──────────────┐                    │
│     │   Query     │─────►│  Embedding   │                    │
│     │  "vehicle   │      │    Model     │                    │
│     │  service"   │      │ (same model) │                    │
│     └─────────────┘      └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                   [0.21, -0.43, 0.69, ...]                   │
│                                 │                            │
│                                 v                            │
│  3. Similarity Search                                        │
│                          ┌──────────────┐                    │
│                          │ k-NN Search  │                    │
│                          │   (cosine    │                    │
│                          │  similarity) │                    │
│                          └──────┬───────┘                    │
│                                 │                            │
│                                 v                            │
│                          ┌──────────────┐                    │
│                          │    Top K     │                    │
│                          │   Results    │                    │
│                          └──────────────┘                    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Creating a Vector Search Index

Step 1: Define Index Mapping

Create an index with vector field mapping:

from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3

# Connect to OpenSearch (via bastion port forward)
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, 'us-east-1', 'aoss')

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=False,
    connection_class=RequestsHttpConnection
)

# Create index with vector mapping
index_name = 'documents'
index_body = {
    'settings': {
        'index': {
            'number_of_shards': 2,
            'number_of_replicas': 0,
            'knn': True  # Enable k-NN
        }
    },
    'mappings': {
        'properties': {
            'title': {'type': 'text'},
            'content': {'type': 'text'},
            'content_vector': {
                'type': 'knn_vector',
                'dimension': 1536,  # OpenAI text-embedding-3-small
                'method': {
                    'name': 'hnsw',  # Hierarchical Navigable Small World
                    'engine': 'faiss',
                    'space_type': 'cosinesimil',  # Cosine similarity
                    'parameters': {
                        'ef_construction': 128,
                        'm': 16
                    }
                }
            },
            'metadata': {
                'properties': {
                    'source': {'type': 'keyword'},
                    'created_at': {'type': 'date'},
                    'category': {'type': 'keyword'}
                }
            }
        }
    }
}

client.indices.create(index=index_name, body=index_body)
print(f"Index '{index_name}' created successfully")

Vector Field Configuration

Key Parameters:

  • dimension: Vector size (must match embedding model)

    • OpenAI text-embedding-3-small: 1536
    • OpenAI text-embedding-3-large: 3072
    • Cohere embed-english-v3.0: 1024
  • method.name: Algorithm for k-NN search

    • hnsw: Fast, approximate search (recommended)
    • ivf: Inverted file index (good for large datasets)
  • method.engine: Vector search engine

    • faiss: Facebook AI Similarity Search (recommended)
    • nmslib: Non-Metric Space Library
  • space_type: Distance metric

    • cosinesimil: Cosine similarity (recommended for text)
    • l2: Euclidean distance
    • innerproduct: Inner product (dot product)
  • ef_construction: Build-time parameter (higher = better recall, slower indexing)

  • m: Number of connections per node (higher = better recall, more memory)
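Because the `dimension` in the mapping must exactly match what the embedding model emits, a mismatch only surfaces as an indexing error later. A small guard can catch it at ingest time; the `MODEL_DIMENSIONS` table and `validate_embedding` helper below are our own illustrative names, not part of any SDK:

```python
# Dimensions from the model list above
MODEL_DIMENSIONS = {
    'text-embedding-3-small': 1536,
    'text-embedding-3-large': 3072,
    'embed-english-v3.0': 1024,
}

def validate_embedding(model_name, embedding):
    """Raise early if an embedding's length won't match the index mapping."""
    expected = MODEL_DIMENSIONS[model_name]
    if len(embedding) != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dim vectors, got {len(embedding)}"
        )
    return embedding
```

Calling this before every `client.index(...)` turns a confusing mapper exception into an obvious configuration error.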

Generating Embeddings

Using OpenAI

from openai import OpenAI

client_openai = OpenAI(api_key="your-api-key")

def get_embedding(text, model="text-embedding-3-small"):
    """Generate embedding for text using OpenAI."""
    text = text.replace("\n", " ")
    response = client_openai.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding

# Example
text = "How to repair a car engine"
embedding = get_embedding(text)
print(f"Embedding dimension: {len(embedding)}") # 1536

Using Cohere

import cohere

co = cohere.Client("your-api-key")

def get_embedding_cohere(text, model="embed-english-v3.0"):
    """Generate embedding for text using Cohere."""
    response = co.embed(
        texts=[text],
        model=model,
        input_type="search_document"  # or "search_query" for queries
    )
    return response.embeddings[0]

# Example
embedding = get_embedding_cohere("How to repair a car engine")
print(f"Embedding dimension: {len(embedding)}") # 1024

Using Sentence Transformers (Open Source)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def get_embedding_local(text):
    """Generate embedding using local model."""
    return model.encode(text).tolist()

# Example
embedding = get_embedding_local("How to repair a car engine")
print(f"Embedding dimension: {len(embedding)}") # 384

Indexing Documents with Vectors

Single Document

def index_document(client, index_name, doc_id, title, content):
    """Index a document with its vector embedding."""
    # Generate embedding
    embedding = get_embedding(content)

    # Index document
    document = {
        'title': title,
        'content': content,
        'content_vector': embedding,
        'metadata': {
            'source': 'documentation',
            'created_at': '2025-01-01T00:00:00Z',
            'category': 'automotive'
        }
    }

    response = client.index(
        index=index_name,
        id=doc_id,
        body=document
    )

    return response

# Example
index_document(
    client,
    'documents',
    '1',
    'Car Engine Repair Guide',
    'Learn how to diagnose and repair common car engine problems...'
)

Bulk Indexing

from opensearchpy import helpers

def bulk_index_documents(client, index_name, documents):
    """Bulk index multiple documents with embeddings."""
    actions = []

    for doc in documents:
        # Generate embedding
        embedding = get_embedding(doc['content'])

        # Prepare action
        action = {
            '_index': index_name,
            '_id': doc['id'],
            '_source': {
                'title': doc['title'],
                'content': doc['content'],
                'content_vector': embedding,
                'metadata': doc.get('metadata', {})
            }
        }
        actions.append(action)

    # Bulk index
    success, failed = helpers.bulk(client, actions)
    print(f"Indexed {success} documents, {len(failed)} failed")

    return success, failed

# Example
documents = [
    {
        'id': '1',
        'title': 'Car Engine Repair',
        'content': 'How to repair car engines...',
        'metadata': {'category': 'automotive'}
    },
    {
        'id': '2',
        'title': 'Vehicle Maintenance',
        'content': 'Regular vehicle maintenance tips...',
        'metadata': {'category': 'automotive'}
    }
]

bulk_index_documents(client, 'documents', documents)

Vector Search Queries

def vector_search(client, index_name, query_text, k=10):
    """Perform k-NN vector search."""
    # Generate query embedding
    query_vector = get_embedding(query_text)

    # k-NN search
    search_body = {
        'size': k,
        'query': {
            'knn': {
                'content_vector': {
                    'vector': query_vector,
                    'k': k
                }
            }
        }
    }

    response = client.search(index=index_name, body=search_body)

    # Extract results
    results = []
    for hit in response['hits']['hits']:
        results.append({
            'id': hit['_id'],
            'score': hit['_score'],
            'title': hit['_source']['title'],
            'content': hit['_source']['content']
        })

    return results

# Example
results = vector_search(client, 'documents', 'vehicle service and maintenance', k=5)
for result in results:
    print(f"Score: {result['score']:.4f} - {result['title']}")

Combine vector search with filters:

def filtered_vector_search(client, index_name, query_text, category, k=10):
    """Vector search with metadata filters."""
    query_vector = get_embedding(query_text)

    search_body = {
        'size': k,
        'query': {
            'bool': {
                'must': [
                    {
                        'knn': {
                            'content_vector': {
                                'vector': query_vector,
                                'k': k
                            }
                        }
                    }
                ],
                'filter': [
                    {
                        'term': {
                            'metadata.category': category
                        }
                    }
                ]
            }
        }
    }

    response = client.search(index=index_name, body=search_body)
    return response['hits']['hits']

# Example
results = filtered_vector_search(
    client,
    'documents',
    'repair guide',
    category='automotive',
    k=5
)

Hybrid Search (Full-Text + Vector)

Combine traditional keyword search with vector search for best results:

def hybrid_search(client, index_name, query_text, k=10, text_weight=0.3, vector_weight=0.7):
    """Hybrid search combining full-text and vector search."""
    query_vector = get_embedding(query_text)

    search_body = {
        'size': k,
        'query': {
            'bool': {
                'should': [
                    # Full-text search
                    {
                        'multi_match': {
                            'query': query_text,
                            'fields': ['title^2', 'content'],
                            'boost': text_weight
                        }
                    },
                    # Vector search
                    {
                        'knn': {
                            'content_vector': {
                                'vector': query_vector,
                                'k': k,
                                'boost': vector_weight
                            }
                        }
                    }
                ]
            }
        }
    }

    response = client.search(index=index_name, body=search_body)
    return response['hits']['hits']

# Example
results = hybrid_search(
    client,
    'documents',
    'car repair tips',
    k=10,
    text_weight=0.3,   # 30% weight to keyword search
    vector_weight=0.7  # 70% weight to vector search
)

RAG (Retrieval Augmented Generation)

Use vector search to retrieve relevant context for LLM prompts:

from openai import OpenAI

openai_client = OpenAI(api_key="your-api-key")

def rag_query(opensearch_client, index_name, question, k=3):
    """Answer question using RAG with vector search."""
    # Step 1: Retrieve relevant documents
    results = vector_search(opensearch_client, index_name, question, k=k)

    # Step 2: Build context from top results
    context = "\n\n".join([
        f"Document {i+1}:\n{result['content']}"
        for i, result in enumerate(results)
    ])

    # Step 3: Generate answer with LLM
    prompt = f"""Answer the following question based on the provided context.

Context:
{context}

Question: {question}

Answer:"""

    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    answer = response.choices[0].message.content

    return {
        'answer': answer,
        'sources': results
    }

# Example
result = rag_query(
    client,
    'documents',
    'How do I change my car oil?',
    k=3
)

print(f"Answer: {result['answer']}\n")
print("Sources:")
for source in result['sources']:
    print(f"- {source['title']} (score: {source['score']:.4f})")

Performance Optimization

1. Batch Embedding Generation

Generate embeddings in batches to reduce API calls:

def batch_get_embeddings(texts, batch_size=100):
    """Generate embeddings in batches."""
    embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        response = client_openai.embeddings.create(
            input=batch,
            model="text-embedding-3-small"
        )
        embeddings.extend([item.embedding for item in response.data])

    return embeddings

2. Optimize Index Settings

# For large datasets
index_body = {
    'settings': {
        'index': {
            'knn': True,
            'knn.algo_param.ef_search': 100,  # Query-time parameter
            'refresh_interval': '30s'  # Reduce refresh frequency during bulk indexing
        }
    },
    'mappings': {
        'properties': {
            'content_vector': {
                'type': 'knn_vector',
                'dimension': 1536,
                'method': {
                    'name': 'hnsw',
                    'engine': 'faiss',
                    'space_type': 'cosinesimil',
                    'parameters': {
                        'ef_construction': 256,  # Higher for better recall
                        'm': 32  # Higher for better recall
                    }
                }
            }
        }
    }
}

HNSW search is already approximate; for very large datasets, you can trade recall for speed at query time by lowering ef_search:

search_body = {
    'size': k,
    'query': {
        'knn': {
            'content_vector': {
                'vector': query_vector,
                'k': k,
                'method_parameters': {
                    'ef_search': 100  # Lower = faster but less accurate
                }
            }
        }
    }
}

Best Practices

1. Embedding Model Selection

  • OpenAI text-embedding-3-small (1536 dims): Best balance of cost/performance
  • OpenAI text-embedding-3-large (3072 dims): Highest quality, more expensive
  • Cohere embed-english-v3.0 (1024 dims): Good for English text
  • Sentence Transformers (384-768 dims): Free, run locally, lower quality
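Embedding dimension also drives index memory: each float32 component costs 4 bytes, and the HNSW graph adds roughly 8 bytes per edge with `m` edges per vector. The helper below is our own back-of-the-envelope sketch under those assumptions, not an official sizing formula:

```python
def estimated_vector_memory_mb(num_docs, dimension, m=16):
    """Rough memory estimate for an HNSW index: 4 bytes per float32
    vector component plus ~8 bytes per graph edge (m edges per node).
    A back-of-the-envelope approximation, not official sizing guidance."""
    vector_bytes = num_docs * dimension * 4
    graph_bytes = num_docs * m * 8
    return (vector_bytes + graph_bytes) / (1024 * 1024)

# 1M text-embedding-3-small vectors vs 1M MiniLM vectors
print(f"{estimated_vector_memory_mb(1_000_000, 1536):.0f} MB")  # ~5981 MB
print(f"{estimated_vector_memory_mb(1_000_000, 384):.0f} MB")   # ~1587 MB
```

The roughly 4x gap is one practical reason to benchmark a smaller model before defaulting to the largest one.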

2. Chunking Strategy

For long documents, split into chunks:

def chunk_document(text, chunk_size=500, overlap=50):
    """Split document into overlapping chunks."""
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i+chunk_size])
        chunks.append(chunk)

    return chunks

# Example
long_document = "..."  # 5000 words
chunks = chunk_document(long_document, chunk_size=500, overlap=50)

for i, chunk in enumerate(chunks):
    index_document(client, 'documents', f'doc1_chunk{i}', f'Document 1 - Part {i+1}', chunk)

3. Metadata Enrichment

Add metadata for better filtering:

document = {
    'title': title,
    'content': content,
    'content_vector': embedding,
    'metadata': {
        'source': 'documentation',
        'created_at': '2025-01-01T00:00:00Z',
        'category': 'automotive',
        'tags': ['repair', 'maintenance', 'engine'],
        'author': 'John Doe',
        'version': '1.0'
    }
}

4. Reranking

Improve results with reranking:

def rerank_results(results, query_text, top_k=5):
    """Rerank results using cross-encoder."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

    # Score each result
    pairs = [[query_text, result['content']] for result in results]
    scores = model.predict(pairs)

    # Combine with original scores
    for i, result in enumerate(results):
        result['rerank_score'] = scores[i]

    # Sort by rerank score
    reranked = sorted(results, key=lambda x: x['rerank_score'], reverse=True)

    return reranked[:top_k]

5. Track Key Metrics

# Search latency
import time

start = time.time()
results = vector_search(client, 'documents', query_text, k=10)
latency = time.time() - start

print(f"Search latency: {latency*1000:.2f}ms")

# Result quality
avg_score = sum(r['score'] for r in results) / len(results)
print(f"Average relevance score: {avg_score:.4f}")

A/B Testing

Compare different embedding models or search strategies:

def compare_search_methods(query_text):
    """Compare vector vs hybrid search."""
    # Vector only
    vector_results = vector_search(client, 'documents', query_text, k=10)

    # Hybrid
    hybrid_results = hybrid_search(client, 'documents', query_text, k=10)

    print("Vector Search Top 3:")
    for r in vector_results[:3]:
        print(f"  {r['title']} ({r['score']:.4f})")

    print("\nHybrid Search Top 3:")
    # hybrid_search returns raw hits, so read fields from _source
    for r in hybrid_results[:3]:
        print(f"  {r['_source']['title']} ({r['_score']:.4f})")
