Amazon Elasticsearch Service recently added support for k-nearest neighbor search. It enables you to run high scale and low latency k-NN search across thousands of dimensions with the same ease as running any regular Elasticsearch query.
k-NN similarity search is powered by Open Distro for Elasticsearch, an Apache 2.0-licensed distribution of Elasticsearch.
In this post, I’ll show you how to build a scalable similarity questions search api using Amazon Sagemaker, Amazon Elasticsearch, Amazon Elastic File System (EFS) and Amazon ECS.
What we’ll cover in this example:
- Deploy and run a Sagemaker notebook instance in VPC.
- Mount EFS to notebook instance.
- Download Quora Question Pairs dataset, then map variable-length questions from dataset to fixed-length vectors using DistilBERT model.
- Create downstream task to reduce embedding dimensions and save sentence embedder to EFS.
- Transform questions text to vectors, and index all vectors to Elasticsearch.
- Deploy a containerized Flask rest api  to ECS. 
The following diagram shows the architecture of above steps:
Deploying and running Sagemaker notebook instance in VPC
First, Let’s create a Sagemaker notebook instance which connects to Elasticsearch and make sure then are in the same VPC. 
To configure VPC options in Sagemaker console, within Network section of Create notebook instance page, set VPC network configuration details such as VPC subnet IDs and security group IDs:
Mounting EFS to notebook instance
We will do all the necessary sentence transforming steps in a 
SageMaker notebook (code found 
here).
%%sh
mkdir model
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-xxxxxx.efs.ap-southeast-2.amazonaws.com:/ \
    ./model
NOTE:
fs-xxxxx.efs.ap-southeast-2.amazonaws.com
 is DNS name of EFS.
EFS Mount targets and Sagemaker are in same VPC.
Mapping variable-length questions to fixed-length vectors using DistilBERT model
To run nearest neighbor search, we have to get sentence and tokens embeddings. We can use 
sentence-transformers which is a sentence embeddings using BERT / RoBERTa / DistilBERT / ALBERT / XLNet with PyTorch. It allows us to map sentences into fixed-length representations in just a few lines of code.
We will use light weight DistilBERT model to generate sentence embeddings in this example, please note that the number of hidden units of DistilBERT is 768. This dimension seems too big to Elasticsearch index, we can reduce the dimension to 256 by adding a dense layer after the pooling:
from sentence_transformers import models, losses, SentenceTransformer
word_embedding_model = models.DistilBERT('distilbert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                            pooling_mode_mean_tokens=True,
                            pooling_mode_cls_token=False,
                            pooling_mode_max_tokens=False)
# reduce dim from 768 to 256
dense_model = models.Dense(in_features=768, out_features=256)
transformer = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])
Next, Save sentence embedder to EFS mounted directory:
transformer.save("model/transformer-v1/")
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files("quora/question-pairs-dataset", path='quora_dataset', unzip=True)
Next, extract full text of each question to dataframe:
import pandas as pd
pd.set_option('display.max_colwidth', -1)
df = pd.read_csv("quora_dataset/questions.csv", usecols=["qid1", "question1"], index_col=False)
df = df.sample(frac=1).reset_index(drop=True)
df_questions_imp = df[:5000]
Transforming questions text to vectors, and index all vectors to Elasticsearch
To start with, create a kNN index,
import boto3
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection
region = 'ap-southeast-2'
service = 'es'
ssm = boto3.client('ssm', region_name=region)
es_parameter = ssm.get_parameter(Name='/KNNSearch/ESUrl')
es_host = es_parameter['Parameter']['Value']
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)
                   
es = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
knn_index = {
    "settings": {
        "index.knn": True
    },
    "mappings": {
        "properties": {
            "question_vector": {
                "type": "knn_vector",
                "dimension": 256
            }
        }
    }
}
es.indices.create(index="questions",body=knn_index,ignore=400)
then transform and index question vectors to Elasticsearch.
def es_import(df):
    for index, row in df.iterrows():
        vectors = local_transformer.encode([row["question1"]])
        es.index(index='questions',
                 id=row["qid1"], 
                 body={"question_vector": vectors[0].tolist(), 
                       "question": row["question1"]})
        
es_import(df_questions_imp)
Questions in Elasticsearch have the following structure:
{'question_vector': [-0.06435434520244598, ... ,0.0726890116930008],
'question': 'How hard is it to learn to play piano as an adult?'}
We have embedded questions into fixed-length vectors and indexed all of vectors to Elasticsearch. Let’s create a rest api connects to Elasticsearch and test it out!
Deploying a containerized Flask rest API
We will use sample cloud formation template to create ECS Cluster and service in VPC (templates and bash scripts found 
here).
We will use EFS volumes with ECS, search flow is 
- Flask application loads saved sentence embedder in EFS volume, 
- transform input parameter sentence to vectors, 
- then query K-Nearest neighbors in Elasticsearch.
import json
import boto3
from flask import Flask
from flask_restful import reqparse, Resource, Api
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
from sentence_transformers import SentenceTransformer
app = Flask(__name__)
api = Api(app)
region = 'ap-southeast-2'
ssm = boto3.client('ssm', region_name=region)
es_parameter = ssm.get_parameter(Name='/KNNSearch/ESUrl')
host = es_parameter['Parameter']['Value']
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)
parser = reqparse.RequestParser()
parser.add_argument('question')
parser.add_argument('size')
parser.add_argument('min_score')
es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
transform_model = SentenceTransformer(
    'model/transformer-v1/')
class SimilarQuestionList(Resource):
    def post(self):
        args = parser.parse_args()
        sentence_embeddings = transform_model.encode([args
                                                      ["question"]])
        res = es.search(index="questions",
                        body={
                            "size": args.get("size", 5),
                            "_source": {
                                "exclude": ["question_vector"]
                            },
                            "min_score": args.get("min_score", 0.3),
                            "query": {
                                "knn": {
                                    "question_vector": {
                                        "vector": sentence_embeddings[0].tolist(),
                                        "k": args.get("size", 5)
                                    }
                                }
                            }
                        })
        return res, 201
api.add_resource(SimilarQuestionList, '/search')
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=8000)
We now have flask api running in ECS container, let’s use the basic search function to find similar questions when it comes to the query: “What is best way to make money online?”:
$curl --data 'question=What is best way to make money online?' --data 'size=5' --data 'min_score=0.3'  -X POST http://knn-s-publi-xxxx-207238135.ap-southeast-2.elb.amazonaws.com/search
Check out the result:
{
    "took": 10,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 15,
            "relation": "eq"
        },
        "max_score": 0.69955945,
        "hits": [
            {
                "_index": "questions",
                "_type": "_doc",
                "_id": "210905",
                "_score": 0.69955945,
                "_source": {
                    "question": "What is an easy way make money online?"
                }
            },
            {
                "_index": "questions",
                "_type": "_doc",
                "_id": "547612",
                "_score": 0.61820024,
                "_source": {
                    "question": "What is the best way to make passive income online?"
                }
            },
            {
                "_index": "questions",
                "_type": "_doc",
                "_id": "1891",
                "_score": 0.5624176,
                "_source": {
                    "question": "What are the easy ways to earn money online?"
                }
            },
            {
                "_index": "questions",
                "_type": "_doc",
                "_id": "197580",
                "_score": 0.46031988,
                "_source": {
                    "question": "What is the best way to download YouTube videos for free?"
                }
            },
            {
                "_index": "questions",
                "_type": "_doc",
                "_id": "359930",
                "_score": 0.45543614,
                "_source": {
                    "question": "What is the best way to get traffic on your website?"
                }
            }
        ]
    }
}
As you can see, the result was pretty amazing, you can also fine-tune your own sentence embedding methods, so that you get task-specific sentence embeddings for k-NN search.
Great! We have what we need! I hope you have found this article useful.