# Cohere Embedding Models Source: https://thenile.dev/docs/ai-embeddings/embedding_models/cohere ## Available models Cohere has multiple models in various sizes and languages; here's the one we use in our example. You can replace it with any embedding model supported by Cohere: | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metrics | | ------------------ | ---------- | ---------- | ------------------ | -------------- | ------------------ | | embed-english-v3.0 | 1024 | 512 | \$0.10 / 1M Tokens | 64.5 | cosine | ## Usage Cohere's API includes an `inputType` that allows you to specify the type of input you are embedding. For example, is it a document that you will later retrieve? A question that you use for searching documents? Texts for classification or clustering? ### Installing dependencies ```bash npm install @niledatabase/server cohere-ai ``` ### Generating embeddings with Cohere ```javascript const { CohereClient } = require("cohere-ai"); const COHERE_API_KEY = "your cohere api key"; const cohere = new CohereClient({ token: COHERE_API_KEY, }); const model = "embed-english-v3.0"; // replace with your favorite cohere model const input_text = `The future belongs to those who believe in the beauty of their dreams.`; const doc_vec_resp = await cohere.embed({ texts: [input_text], // you can pass multiple texts to embed in one batch call model: model, inputType: "search_document", // for embedding of documents stored in pg_vector }); const question = "Who does the future belong to?"; const question_vec_resp = await cohere.embed({ texts: [question], model: model, inputType: "search_query", // for embedding of questions used to search documents }); // Cohere's response is an object with an array of vector embeddings // The object has other useful info like the model used, input text, billing, etc. const doc_vec = doc_vec_resp.embeddings[0]; const question_vec = question_vec_resp.embeddings[0]; ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(doc_vec.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(question_vec.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - doc_vec[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` # Embedding Models served by DeepInfra Source: https://thenile.dev/docs/ai-embeddings/embedding_models/deepinfra ## Available models DeepInfra provides a large variety of models; here's the one we use in our example.
You can replace it with any embedding model supported by DeepInfra: | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric | | ------------------ | ---------- | ---------- | ------------------- | -------------- | ----------------- | | thenlper/gte-large | 1024 | 512 | \$0.010 / 1M tokens | 63.23 | cosine | ## Usage Note that DeepInfra doesn't have an SDK. Their documentation shows the use of the REST API directly with a client library in your language of choice, or you can use OpenAI's SDK - DeepInfra is compatible with OpenAI's API. In the examples below, we use OpenAI's SDK with DeepInfra's URL, API key, and models. ### Installing dependencies ```bash npm install @niledatabase/server openai ``` ### Generating embeddings with DeepInfra ```javascript // set up OpenAI SDK with DeepInfra configuration const { OpenAI } = await import("openai"); const DEEPINFRA_API_KEY = "your deepinfra api key"; const openai = new OpenAI({ apiKey: DEEPINFRA_API_KEY, baseURL: "https://api.deepinfra.com/v1/openai", }); const MODEL = "thenlper/gte-large"; // or any other model supported by DeepInfra const input_text = `The future belongs to those who believe in the beauty of their dreams.`; // generate embeddings let resp = await openai.embeddings.create({ model: MODEL, input: input_text, }); // OpenAI's response is an object with an array of // objects that contain the vector embeddings let embedding = resp.data[0].embedding; ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(embedding.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(embedding.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - embedding[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` ## Additional information ### Reducing dimensions Using larger embeddings generally costs more and consumes more compute, memory and storage than using smaller embeddings. This is especially true for embeddings stored with `pg_vector`. When storing embeddings in Postgres, it is important that each vector is stored in a row that fits in a single PG block (typically 8K). If this size is exceeded, the vector will be stored in TOAST storage which can slow down queries. In addition, vectors that are "TOASTed" are not indexed, which means you can't reliably use vector indexes. DeepInfra supports multiple models. `gte-large` and `nomic-embed-text-v1.5` are two of the models available. The `gte-large` model has 1024 dimensions and does not support scaling down. The `nomic-embed-text-v1.5` model has 768 dimensions and can scale down to 512, 256, 128 and 64.
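When creating the table that holds these embeddings, the vector size must match the model's output dimensions so the row stays within a single block. A minimal sketch for `thenlper/gte-large` (1024 dimensions), reusing the `nile` client and the `embeddings` table name from the example above:

```javascript
// create a table whose vector size matches the model output (1024 for thenlper/gte-large)
await nile.db.query(
  "CREATE TABLE IF NOT EXISTS embeddings (embedding vector(1024))"
);
```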
# Embedding Models served by Fireworks Source: https://thenile.dev/docs/ai-embeddings/embedding_models/fireworks ## Available models Fireworks provides a large variety of models; here are two of them. You can use any of the models supported by Fireworks in the examples below: | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric | | ------------------------------ | ----------------- | ---------- | ------------------- | -------------- | ----------------- | | thenlper/gte-large | 1024 | 512 | \$0.016 / 1M tokens | 63.23 | cosine | | nomic-ai/nomic-embed-text-v1.5 | 768 (scales down) | 8192 | \$0.008 / 1M tokens | 62.28 | cosine | ## Usage Note that Fireworks doesn't have an SDK. Their documentation shows the use of the REST API directly with a client library in your language of choice, or you can use OpenAI's SDK - Fireworks is compatible with OpenAI's API. In the examples below, we use OpenAI's SDK with Fireworks' URL, API key, and models. ### Installing dependencies ```bash npm install @niledatabase/server openai ``` ### Generating embeddings with Fireworks ```javascript // set up OpenAI SDK with Fireworks configuration const { OpenAI } = await import("openai"); const FIREWORKS_API_KEY = "your fireworks api key"; const openai = new OpenAI({ apiKey: FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const MODEL = "thenlper/gte-large"; // or nomic-ai/nomic-embed-text-v1.5 const input_text = `The future belongs to those who believe in the beauty of their dreams.`; // generate embeddings let resp = await openai.embeddings.create({ model: MODEL, input: input_text, // nomic-ai/nomic-embed-text-v1.5 can scale down, thenlper/gte-large can't // dimensions: 512, }); // OpenAI's response is an object with an array of // objects that contain the vector embeddings let embedding = resp.data[0].embedding; ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(embedding.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(embedding.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - embedding[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` ## Additional information ### Reducing dimensions Using larger embeddings generally costs more and consumes more compute, memory and storage than using smaller embeddings. This is especially true for embeddings stored with `pg_vector`. When storing embeddings in Postgres, it is important that each vector is stored in a row that fits in a single PG block (typically 8K). If this size is exceeded, the vector will be stored in TOAST storage which can slow down queries.
In addition, vectors that are "TOASTed" are not indexed, which means you can't reliably use vector indexes. Fireworks supports multiple models. `gte-large` and `nomic-embed-text-v1.5` are two of the models available. The `gte-large` model has 1024 dimensions and does not support scaling down. The `nomic-embed-text-v1.5` model has 768 dimensions and can scale down to 512, 256, 128 and 64. # Google Embedding Models Source: https://thenile.dev/docs/ai-embeddings/embedding_models/google Note that Google has two AI API products: * [Google Cloud AI](https://cloud.google.com/ai-platform), also known as Vertex AI. * [Gemini API](https://ai.google.dev/gemini-api/docs), also known as the Generative Language API. The Node.js SDK for the Gemini API is significantly nicer, so the example below uses it. However, Vertex AI has more embedding models - multi-lingual, multi-modal, images and video. ## Available models The example below is based on the new `text-embedding-preview-0409` model, also called `text-embedding-004` in the Gemini API.\ It is the best text embedding model available from Google and ranks well in the MTEB benchmark. | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric | | ------------------------------------------------- | ----------------- | ---------- | ------------------------------------------- | -------------- | ----------------- | | text-embedding-preview-0409 / text-embedding-004 | 768 (scales down) | 2048 | \$0.025/1M tokens in Vertex, free in Gemini | 66.31 | cosine, L2 | ## Usage To use Google's embedding models, you need a Google Cloud project. The example below uses Gemini, so you will need to have [Gemini Generative Language APIs enabled](https://console.developers.google.com/apis/api/generativelanguage.googleapis.com/overview). You will also need an API key with permissions to access the Generative Language API. You can get one by going to [APIs & Services -> Credentials](https://console.cloud.google.com/apis/credentials) in your Google Cloud Console. (You can also use Google's AI Studio to get an API key). Vertex has a separate API to enable, separate key permissions, separate pricing, and a different SDK (which we don't document here). ### Installing dependencies ```bash npm install @niledatabase/server @google/generative-ai ``` ### Generating embeddings with Google ```javascript const { GoogleGenerativeAI } = require("@google/generative-ai"); const GOOGLE_API_KEY = "my-google-api-key"; const model = "models/text-embedding-004"; // or your favorite Google model const genAI = new GoogleGenerativeAI(GOOGLE_API_KEY); const embed = genAI.getGenerativeModel({ model: model, outputDimensionality: 768, // optional, default 768 but can be scaled down }); const input_text = `The future belongs to those who believe in the beauty of their dreams.`; // Note, if you don't want to specify taskType and title, you can simply use: // const doc_vec_resp = await embed.embedContent(input_text); const doc_vec_resp = await embed.embedContent({ content: { parts: [{ text: input_text }] }, taskType: "RETRIEVAL_DOCUMENT", // for embeddings of documents stored in pg_vector // title of the document.
can be used with RETRIEVAL_DOCUMENT and // possibly improve the quality of the embeddings // title: "" }); const question = "Who does the future belong to?"; const question_vec_resp = await embed.embedContent({ content: { parts: [{ text: question }] }, taskType: "RETRIEVAL_QUERY", // for embeddings of questions used to search documents }); // Google returns a response object with an embedding field that // contains the embeddings in the values field const doc_vec = doc_vec_resp.embedding.values; const question_vec = question_vec_resp.embedding.values; ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // create table to store vectors - vector size must match the model dimensions await nile.db.query( "CREATE TABLE IF NOT EXISTS embeddings (embedding vector(768))" ); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(doc_vec.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(question_vec.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - doc_vec[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` ## Additional notes ### Scale down Google's `text-embedding-004` model has 768 dimensions, but you can scale it down to lower dimensions. The older model, `text-embedding-0001`, does not support scaling down. ### Task types Google's documentation about taskTypes is a bit confusing. Some documents say that `taskType` is only supported by the `text-embedding-0001` model, and others say that it works with `004` as well. My experiments showed that `taskType` works with `004`, so I have included it in the example above. I assume there are typos in the docs. ### Distance metrics Google's documentation doesn't mention the distance metric used for similarity search and doesn't mention anything about normalization either. However, the MTEB benchmark records for Google's model show the use of cosine and L2 distance.
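Since both cosine and L2 distance work for this model, you can use either operator in pg_vector: `<=>` is cosine distance (used in the example above) and `<->` is L2 (Euclidean) distance. A minimal sketch, reusing the `nile` client and `question_vec` from the examples above:

```javascript
const q = JSON.stringify(question_vec.map((v) => Number(v)));

// cosine distance (the <=> operator), as in the example above
const by_cosine = await nile.db.query(
  "select embedding from embeddings order by embedding <=> $1 limit 1",
  [q]
);

// L2 / Euclidean distance (the <-> operator)
const by_l2 = await nile.db.query(
  "select embedding from embeddings order by embedding <-> $1 limit 1",
  [q]
);
```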
# OpenAI Embedding Models Source: https://thenile.dev/docs/ai-embeddings/embedding_models/open_ai ## Available models | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric | | ---------------------- | ------------------ | ---------- | ------------------ | -------------- | ----------------------- | | text-embedding-3-small | 1536 (scales down) | 8191 | \$0.02 / 1M tokens | 62.3 | cosine, dot product, L2 | | text-embedding-3-large | 3072 (scales down) | 8191 | \$0.13 / 1M tokens | 64.6 | cosine, dot product, L2 | ## Usage ### Installing dependencies ```bash npm install @niledatabase/server openai ``` ### Generating embeddings with OpenAI ```javascript // set up OpenAI const { OpenAI } = await import("openai"); const OPENAI_API_KEY = "your openai api key"; const openai = new OpenAI({ apiKey: OPENAI_API_KEY, // not necessary if the OPENAI_API_KEY env variable is set }); const MODEL = "text-embedding-3-small"; // or text-embedding-3-large const input_text = `The future belongs to those who believe in the beauty of their dreams.`; // generate embeddings let resp = await openai.embeddings.create({ model: MODEL, input: input_text, dimensions: 1024, // OpenAI models scale down to lower dimensions }); // OpenAI's response is an object with an array of // objects that contain the vector embeddings let embedding = resp.data[0].embedding; ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(embedding.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(embedding.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - embedding[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` ## Additional information ### Reducing dimensions Using larger embeddings generally costs more and consumes more compute, memory and storage than using smaller embeddings. This is especially true for embeddings stored with `pg_vector`. When storing embeddings in Postgres, it is important that each vector is stored in a row that fits in a single PG block (typically 8K). If this size is exceeded, the vector will be stored in TOAST storage which can slow down queries. In addition, vectors that are "TOASTed" are not indexed, which means you can't reliably use vector indexes. Both of OpenAI's embedding models were trained with a technique that allows developers to trade off the performance and cost of using embeddings. Specifically, developers can shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties by passing in the `dimensions` API parameter.
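If you shorten an embedding yourself after generating it, rather than passing `dimensions` to the API, you need to re-normalize the truncated vector so its length is 1 (see the note below). A minimal sketch, assuming `embedding` holds a full-length vector from the call above and using 256 dimensions purely for illustration:

```javascript
// keep only the first 256 dimensions (illustrative target size)
const truncated = embedding.slice(0, 256);

// re-normalize so the shortened vector has length 1 again
const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
const normalized = truncated.map((v) => v / norm);
```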
Using the dimensions parameter when creating the embedding (as shown above) is the preferred approach. In certain cases, you may need to change the embedding dimension after you generate it. When you change the dimension manually, you need to normalize the truncated vector so its length is 1, as in the sketch above. ### Distance metrics OpenAI embeddings are normalized to length 1, which means that you can use L2, cosine, and dot product similarity metrics interchangeably. Dot product is the fastest to compute. # Voyage Embedding Models Source: https://thenile.dev/docs/ai-embeddings/embedding_models/voyage ## Available models Voyage has a collection of specialized models for embedding text from different domains: financial, legal (and large documents), code and medical. It also has a highly ranked general embedding model that can be used for a variety of tasks, a general model that is optimized for retrieval, and a smaller, cost-efficient retrieval model. We included a subset here for reference: | Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric | | ----------------------- | ---------- | ---------- | ------------------ | -------------- | ----------------------- | | voyage-large-2-instruct | 1024 | 16000 | \$0.12 / 1M tokens | 68.28 | cosine, dot product, L2 | | voyage-2 | 1024 | 4000 | \$0.1 / 1M tokens | | cosine, dot product, L2 | | voyage-code-2 | 1536 | 16000 | \$0.12 / 1M tokens | | cosine, dot product, L2 | | voyage-law-2 | 1024 | 16000 | \$0.12 / 1M tokens | | cosine, dot product, L2 | ## Usage Voyage has a Python SDK, but not a JavaScript one. Their REST API is **almost** compatible with OpenAI's API, but unfortunately, their powerful general-purpose model, `voyage-large-2-instruct`, requires an `inputType` parameter, which is not supported by OpenAI's SDK. Fortunately, LangChain has a nice community-contributed JS library for Voyage, which supports the `inputType` parameter. So we are going to use the LangChain community library in the example below.
### Installing dependencies ```bash npm install @niledatabase/server @langchain/community ``` ### Generating embeddings with Voyage ```javascript const VOYAGE_API_KEY = "your voyage api key"; const { VoyageEmbeddings } = await import( "@langchain/community/embeddings/voyage" ); // we need a separate model object for documents and queries // because the inputType is different and is in this object const embedDocs = new VoyageEmbeddings({ apiKey: VOYAGE_API_KEY, // In Node.js defaults to process.env.VOYAGEAI_API_KEY inputType: "document", // for the embedding of documents stored in pg_vector }); const embedQueries = new VoyageEmbeddings({ apiKey: VOYAGE_API_KEY, inputType: "query", // for the embedding of questions used to search documents }); const input_text = `The future belongs to those who believe in the beauty of their dreams.`; // embedDocuments is the batch API and can take multiple documents in one call const doc_vec_resp = await embedDocs.embedDocuments([input_text]); // it returns an array of embeddings, we have just one const doc_vec = doc_vec_resp[0]; const question = "Who does the future belong to?"; // embedQuery takes a single query string and returns a single vector const question_vec = await embedQueries.embedQuery(question); ``` ### Storing and retrieving the embeddings ```javascript // set up Nile as the vector store const { Nile } = await import("@niledatabase/server"); const NILEDB_USER = "you need a nile user with access to a nile database"; const NILEDB_PASSWORD = "and a password for that user"; const nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // store vector in a table await nile.db.query("INSERT INTO embeddings (embedding) values ($1)", [ JSON.stringify(doc_vec.map((v) => Number(v))), ]); // search for similar vectors let db_resp = await nile.db.query( "select embedding from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(question_vec.map((v) => Number(v)))] ); // Postgres returns an object with an array of rows; each column is a // property of the row, and the vector is represented as a string let similar_str = db_resp.rows[0].embedding; let similar = similar_str .substring(1, similar_str.length - 1) .split(",") .map((v) => parseFloat(v)); // check that we got the same vector back let same = true; for (let i = 0; i < similar.length; i++) { if (Math.abs(similar[i] - doc_vec[i]) > 0.000001) { same = false; } } console.log("got same vector? " + same); ``` ## Additional information ### Distance metrics Voyage embeddings are normalized to length 1, which means that you can use L2, cosine, and dot product similarity metrics interchangeably. Dot product is the fastest to compute. ### API Rate limits Note that on the free plan, the rate limits are quite strict. Voyage gives you only 3 API calls a minute, which means that after you embed some documents, you can only generate embeddings for 2 queries before hitting the limit and having to wait. # Generative AI & Embeddings Source: https://thenile.dev/docs/ai-embeddings/introduction Generative AI is truly transforming every industry and vertical in B2B. It significantly improves the product experience, the value users receive, and overall productivity. For example: 1. A corporate wiki (e.g. Confluence, Notion) where employees can perform semantic search on their company's data 2. A chatbot for a CRM (e.g. Salesforce, Hubspot) that sales reps can use to ask questions about past and future customer deals and have a back-and-forth conversation 3.
An autopilot for developers in their code repository (GitHub, GitLab) to improve productivity. The autopilot should run on the company's code as well, in addition to learning from public repositories. ## Challenges with AI in B2B *Separate databases for vector embeddings and customer data* In recent years, numerous vector databases have emerged. This trend separates customers' core data and metadata from their embeddings, forcing companies to manage multiple databases. Such separation increases costs, significantly complicates application development and operation, and leads to inefficient resource utilization between vector embeddings and customer metadata. Moreover, keeping these databases synchronized with customer changes adds yet another layer of complexity. *Lack of isolation for customer workloads* AI workloads demand significantly more memory and compute than traditional SaaS workloads. Customer adoption and growth are much faster with AI, though some of this can be attributed to a hype cycle. Moreover, rebuilding indexes for embeddings requires additional resources and may impact production workloads. The ability to isolate customer data and their AI workloads has a significant impact on the customer's experience. Isolation is a key customer requirement (no one wants their data mixed with anyone else’s) and also critical to performance - searching across 3 million embeddings is a very large workload, while 1000 tenants with 3000 embeddings each is very manageable - you get lower latency and 100% recall. *Scaling to billions of embeddings across customers* AI workloads scale to 50-100 million embeddings and in some cases even a billion embeddings. The biggest unlock with AI is the ability to search through unstructured data. All the data in different PDFs, images, and wikis is now searchable. In addition, this unstructured data needs to be chunked for better contextual search. The explosion of vector embeddings requires a scalable database that can store billions of embeddings at a really low cost. *Connecting all the customer’s data to the OLTP* 90% of AI use cases involve extracting data from customers' various SaaS services, making it accessible to LLMs, and allowing users to write prompts against this data. For instance, Glean, an AI-first company, aggregates data from issue trackers, wikis, and Salesforce, making it searchable in one central location using LLMs. Glean must offer a streamlined process for each customer to extract data from their SaaS APIs and transfer it to Glean's database. This data needs to be stored and managed on a per-customer basis. Vector embeddings must be computed during data ingestion. In the AI era, ETL pipelines from SaaS services to OLTP databases need to be reimagined for each customer. *Cost of computing, storing and querying customer vector embeddings* The sheer scale of vector embeddings and their associated workloads significantly increases the cost of managing AI infrastructure. The primary expenses stem from compute and storage, which typically align with customer activity. Ideally, you'd want to pay only for the exact resources a customer uses for compute. Similarly, you'd prefer cheaper storage options when embeddings aren't being accessed. By implementing per-customer cost management for their workloads, it should be possible to reduce expenses by 10 to 20 times. ## What are embeddings? In generative AI development, embeddings refer to numerical representations of data that capture meaningful relationships, semantics, or context within the data.
These representations are often used to convert high-dimensional, categorical, or unstructured data into lower-dimensional, continuous vectors that can be processed by machine learning models. 1. **Word Embeddings**: Word embeddings are one of the most common types of embeddings. They represent words from a vocabulary as dense numerical vectors in a lower-dimensional space. Word embeddings capture semantic and syntactic relationships between words. For example, words with similar meanings will have similar embeddings, and word arithmetic can be performed using embeddings (e.g., "king" - "man" + "woman" ≈ "queen"). Well-known word embedding methods include Word2Vec, GloVe, FastText, and BERT. 2. **Sentence and Document Embeddings**: Instead of representing individual words, sentence and document embeddings represent entire sentences, paragraphs, or documents as numerical vectors. These embeddings aim to capture the overall meaning and context of the text. They are useful for applications like text summarization, document classification, and sentiment analysis. Models like BERT and the Universal Sentence Encoder can generate sentence and document embeddings. 3. **Image Embeddings**: In computer vision, image embeddings represent images as vectors in a lower-dimensional space. Image embeddings capture visual features, allowing generative AI models to understand and generate images or perform tasks like image search and object detection. Convolutional Neural Networks (CNNs) are commonly used to generate image embeddings. There are many ways to compare embeddings. L2 distance (Euclidean distance), inner product, and cosine distance are different similarity or dissimilarity measures used to compare vectors in multi-dimensional spaces. Let's use sentence embeddings to explain how embeddings help with finding the similarity between sentences: **Example Sentences**: **Sentence 1**: "The sun rises in the east every morning." **Sentence 2**: "The moon sets in the west at night." **Sentence 3**: "Bananas are a source of potassium." **Sentence Embeddings** (Hypothetical Values in a 4-dimensional space): * Sentence 1 Embedding: \[2.2, 1.0, -0.8, 0.9] * Sentence 2 Embedding: \[2.0, 1.3, 0.9, 1.1] * Sentence 3 Embedding: \[0.6, 2.4, 2.1, 0.8] **Similarity Calculation**: We'll use cosine similarity to measure the similarity between sentence embeddings. The closer the cosine similarity value is to 1, the more similar the sentences are: * Cosine Similarity between Sentence 1 and Sentence 2 ≈ 0.979 * Cosine Similarity between Sentence 1 and Sentence 3 ≈ 0.089 * Cosine Similarity between Sentence 2 and Sentence 3 ≈ 0.083 In this example, we used sentence embeddings to represent the entire sentences in a four-dimensional space. The cosine similarity between Sentence 1 and Sentence 2 is approximately 0.979, indicating high similarity because both sentences share a similar context related to celestial objects and directions. Sentence 3, which discusses a different topic, has lower similarity with both Sentence 1 and Sentence 2. ## Support for embeddings in Nile - pg\_vector Embeddings in Nile are enabled using pg\_vector, the Postgres extension. This extension is enabled by default in Nile and is available to be used once you create a database. You can read more about how to use pg\_vector and build a real-world, AI-native B2B application in the pg\_vector section.
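For reference, cosine similarity between two embedding vectors can be computed in a few lines of JavaScript (pg_vector's `<=>` operator returns the cosine *distance*, which is 1 minus this similarity). A minimal sketch; `vecA` and `vecB` are placeholders for any two equal-length embedding vectors:

```javascript
// cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// closer to 1 means more similar
const similarity = cosineSimilarity(vecA, vecB);
```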
# Model Context Protocol Source: https://thenile.dev/docs/ai-embeddings/model_context_protocol Model Context Protocol for Nile The Model Context Protocol (MCP) is a standardized interface that enables Large Language Models (LLMs) like Claude to interact with external tools and services. For Nile, MCP provides a natural language interface to manage databases, execute queries, and handle database operations through AI assistants. Key benefits: * Natural language interaction with your databases * Standardized interface across different AI tools * Type-safe operations with input validation * Real-time database management and querying * Secure credential handling ## Installing Nile Model Context Protocol ### Stable Version ```bash npm install @niledatabase/nile-mcp-server ``` This will install @niledatabase/nile-mcp-server in your node\_modules folder. For example: node\_modules/@niledatabase/nile-mcp-server/dist/ ### Latest Alpha/Preview Version ```bash npm install @niledatabase/nile-mcp-server@alpha ``` This will install @niledatabase/nile-mcp-server in your node\_modules folder. For example: node\_modules/@niledatabase/nile-mcp-server/dist/ ### Manual Installation ```bash # Clone the repository git clone https://github.com/yourusername/nile-mcp-server.git cd nile-mcp-server # Install dependencies npm install # Build the project npm run build ``` ### Configuration Create a `.env` file in your project root: ```env NILE_API_KEY=your_api_key_here NILE_WORKSPACE_SLUG=your_workspace_slug ``` You can get the API key and workspace from [Nile console](https://console.thenile.dev/). Note the workspace name at the top and you can create an API key from the Security tab on the left. ![API Key Workspace](https://mintlify.s3.us-west-1.amazonaws.com/nile/images/apikeyworkspace.png) ## Available Tools ### Database Management * **create-database**: Create new databases * **list-databases**: List all databases in workspace * **get-database**: Get detailed database information * **delete-database**: Delete databases ### Credential Management * **list-credentials**: List database credentials * **create-credential**: Create new database credentials ### Region Management * **list-regions**: List available regions for database creation ### SQL Operations * **execute-sql**: Execute SQL queries with results as markdown tables ### Resource Management * **read-resource**: Get detailed schema information * **list-resources**: List all database resources ### Tenant Management * **list-tenants**: List all database tenants * **create-tenant**: Create new tenants * **delete-tenant**: Delete existing tenants ## Server Modes ### STDIN Mode (Default) STDIN mode uses standard input/output for communication, ideal for direct integration with AI tools. 1. Start the server: ```bash node dist/index.js ``` 2. The server automatically uses STDIN/STDOUT for communication ### SSE Mode Server-Sent Events (SSE) mode enables HTTP-based, event-driven communication. 1. Configure SSE mode in `.env`: ```env MCP_SERVER_MODE=sse ``` 2. Start the server: ```bash node dist/index.js ``` 3. Connect to SSE endpoint: ```bash # Listen for events curl -N http://localhost:3000/sse ``` 4. Send commands: ```bash # Send a command curl -X POST http://localhost:3000/messages \ -H "Content-Type: application/json" \ -d '{ "type": "function", "name": "list-databases", "parameters": {} }' ``` ## Claude Desktop Setup 1. Install [Claude Desktop](https://claude.ai/desktop) 2. 
Configure MCP Server: * Open Claude Desktop * Go to Settings > MCP Servers * Click "Add Server" * Add configuration: ```json { "mcpServers": { "nile-database": { "command": "node", "args": [ "/path/to/your/nile-mcp-server/dist/index.js" ], "env": { "NILE_API_KEY": "your_api_key_here", "NILE_WORKSPACE_SLUG": "your_workspace_slug" } } } } ``` 3. Replace: * `/path/to/your/nile-mcp-server` with your actual project path * `your_api_key_here` with your Nile API key * `your_workspace_slug` with your workspace slug 4. Click Save and restart Claude Desktop ## Cursor Setup 1. Install [Cursor](https://cursor.sh) 2. Configure MCP Server: * Open Cursor * Go to Settings (⌘,) > Features > MCP Servers * Click "Add New MCP Server" 3. Add server configuration: * Name: `nile-database` * Command: ```bash env NILE_API_KEY=your_key NILE_WORKSPACE_SLUG=your_workspace node /absolute/path/to/nile-mcp-server/dist/index.js ``` 4. Replace: * `your_key` with your Nile API key * `your_workspace` with your workspace slug * `/absolute/path/to` with your project path 5. Click Save and restart Cursor ## Windsurf Setup 1. Open Windsurf and go to the Cascade assistant 2. Click on the hammer icon (should show MCP on hover) and then on configure 3. Paste the configuration below into the mcp\_configuration.json file ```json { "mcpServers": { "nile-database": { "command": "node", "args": [ "/path/to/your/nile-mcp-server/dist/index.js" ], "env": { "NILE_API_KEY": "your_api_key_here", "NILE_WORKSPACE_SLUG": "your_workspace_slug" } } } } ``` 4. Replace: * `/path/to/your/nile-mcp-server` with your actual project path * `your_api_key_here` with your Nile API key * `your_workspace_slug` with your workspace slug 5. On saving, Windsurf will ask for confirmation. Once confirmed, it should show nile-database MCP active with a green active status ## Cline Setup 1. Click on MCP Servers in the Cline chat window 2. Click on the Installed tab and tap the "Configure MCP Servers" button 3. Paste the configuration below into the cline\_mcp\_settings.json file ```json { "mcpServers": { "nile-database": { "command": "node", "args": [ "/path/to/your/nile-mcp-server/dist/index.js" ], "env": { "NILE_API_KEY": "your_api_key_here", "NILE_WORKSPACE_SLUG": "your_workspace_slug" } } } } ``` 4. Replace: * `/path/to/your/nile-mcp-server` with your actual project path * `your_api_key_here` with your Nile API key * `your_workspace_slug` with your workspace slug 5. On saving, Cline will ask for confirmation. Once confirmed, it should show nile-database MCP active with a green active status ## Claude Code Setup 1. Install Claude Code ```bash npm install -g @anthropic-ai/claude-code ``` 2. Add the MCP Server to Claude Code (note that right now Claude Code only supports STDIO mode, not SSE): ```bash claude mcp add nile_local_mcp node /path/to/your/nile-mcp-server/dist/index.js -e NILE_API_KEY=your_api_key_here -e NILE_WORKSPACE_SLUG=your_workspace_slug ``` 3. You should see the following output: ```bash Added stdio MCP server nile_local_mcp with command: node /path/to/your/nile-mcp-server/dist/index.js to project config ``` 4.
You can now use the MCP Server in Claude Code: ![Claude Code MCP](https://mintlify.s3.us-west-1.amazonaws.com/nile/images/claude_code_w_nile_mcp.png) ## Example Usage ```bash # Create a database "Create a new database named 'my-app' in AWS_US_WEST_2" # List databases "Show me all my databases" # Execute SQL "Run this query on my-app: SELECT * FROM users LIMIT 5" # Get schema information "Show me the schema for the users table in my-app" ``` ## Troubleshooting ### Common Issues 1. **Connection Issues** * Verify API key and workspace slug * Check server path in configuration * Ensure project is built (`npm run build`) 2. **Permission Issues** * Verify API key permissions * Check workspace access * Ensure correct environment variables 3. **Tool Access Issues** * Restart AI tool (Claude/Cursor) * Check server logs for errors * Verify tool configuration ### Getting Help * Check server logs in the `logs` directory * Review error messages in AI tool * Ensure all environment variables are set * Verify network connectivity # Retrieval Augmented Generation (RAG) Source: https://thenile.dev/docs/ai-embeddings/rag RAG, or retrieval augmented generation, is one of the most popular methods of building applications on top of general large language models. It allows using the knowledge and language skills of a world-class pre-trained model, while delivering results specific to a vertical or a company. ## Why RAG? Generative AI models like GPT-3 are powerful, but they are not perfect. They can generate text that is incorrect, biased, or not relevant to the context. After all, they are trained on information from the internet, and people often post wrong, misleading or fake information. But the main limitation is that they do not have access to specific knowledge or data. ChatGPT can write an employee handbook for a company, but it cannot answer questions about the company's specific policies or procedures. There are two popular ways to address these core limitations, and they can be used separately or together. One is to **fine-tune** a model on a specific dataset or task. The other is **RAG**: for every question asked, retrieve the most relevant information from the preferred sources, and provide it to the generative model as context to use when responding to the question. RAG is generally cheaper, faster and lower-effort than fine-tuning, and it can be used to improve results from any generative model, so it is a popular choice for many applications. ## RAG Architecture From the short description above, you can see that the key to successful RAG is the ability to provide the generative model with the most relevant information. And this requires preparing and storing data in a way that makes it efficient (and even plain possible) to retrieve the most relevant information. Putting together the data preparation, storage, and retrieval methods with the generative model in order to have an informed conversation is the essence of retrieval-augmented generation (RAG). The general architecture of a RAG-based application is illustrated in the figure below: ![RAG Architecture](https://mintlify.s3.us-west-1.amazonaws.com/nile/images/rag-architecture.webp) As you can see, RAG-based applications typically have two independent phases: * **Ingestion phase:** In which they load the source documents, split them into chunks, generate vector embeddings, store them in a database, and optionally generate an index on the stored vectors.
* **Conversation phase:** In which they take the user question, generate a vector embedding, search the stored vectors for related chunks, and then send the question together with the related text and perhaps older questions and responses in the conversation and a creative prompt to ChatGPT’s conversation API. When ChatGPT responds, the question and answer are stored and displayed to the user. Let's discuss each of these tasks in more detail. We'll start with the last task, the retrieval of relevant information, since our choice of retrieval method will influence the data preparation and storage methods. ### Retrieval Methods For text retrieval, there are two popular methods, and a hybrid of the two: **Full-text search** Full-text search is the more traditional approach. Text search algorithms rely on word frequency (how often the word appears in each text vs in general) and lexical similarity (the word “ice cream” is similar to “ice” and “cream” but not to “scoop” and “vanilla”). These algorithms can handle typos, synonyms, partial words, and fuzzy matching. **Vector search (also called semantic search)** This method uses AI models, based on the transformer architecture, to find documents that are similar in meaning. These specially trained transformers (called **embedding models** or embedders) convert text to a vector (also called **embedding**). The vector doesn't represent exactly the words in the text but rather the semantic meaning of the text. The embedding model is trained on a giant corpus of texts (such as the entire internet), so it has “knowledge” of which terms are related. Once texts have been converted to vectors, you can check if any two texts are related by checking how close the two vectors are. When checking the similarity of texts using embeddings, it is expected that “ice cream” will be fairly close to “scoop” and “vanilla” since these words often show up next to each other. The popular embedders are based on transformer models, fine-tuned for the task of generating embeddings. There are also specialized embedding models, pre-trained or fine-tuned for specific subsets of the language such as code, legalese or medical jargon. Words like "function", "class" and "variable" will be close to each other in the vector space of a code embedding model, but not in a general English-language model. **Hybrid** In the hybrid approach, you use each method separately to find the closest matching text. Then, you combine the distance score from each method (the combination is typically weighted, and the scores have to be normalized first). Then, you re-rank the results based on the combined score and finally choose the highest-ranking documents. There are also a few embedding models (notably, [BGE-M3](https://arxiv.org/pdf/2402.03216)) that combine the two methods in the same model, and use relevancy scores from each method to re-rank the results themselves. ### Storing Documents and Embeddings Regardless of the method you choose, you will need a data store that can efficiently store and retrieve the vectors or the text based on these methods. Some data stores specialize in one of the two algorithms, and a few offer both. Postgres has built-in full text search and extensions for both vector and text search.
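As a rough illustration of what these two retrieval methods look like in Postgres, the sketch below assumes a hypothetical `documents` table (`id`, a `content` text column, and an `embedding` vector column), a `questionEmbedding` vector produced by your embedding model, and a `nile` client configured as in the example later on this page:

```javascript
// full-text search: rank documents by word-frequency relevance
const textMatches = await nile.db.query(
  `select id, ts_rank(to_tsvector('english', content),
                      plainto_tsquery('english', $1)) as score
     from documents
    where to_tsvector('english', content) @@ plainto_tsquery('english', $1)
    order by score desc limit 5`,
  ["ice cream"]
);

// vector (semantic) search: rank documents by cosine distance of embeddings
const vectorMatches = await nile.db.query(
  "select id, embedding <=> $1 as distance from documents order by distance limit 5",
  [JSON.stringify(questionEmbedding)]
);
```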
[Pg\_vector](https://www.thenile.dev/docs/ai-embeddings/pg_vector), which is built into Nile, has an efficient implementation of vector *distance metrics*, so you can order results by distance (also called ranking) or efficiently find vectors within a certain distance of another vector. There are many distance metrics, and different databases support different ones. The most common are **cosine distance**, the most popular distance metric for text search, and **dot product**, which is more efficient but only works for normalized vectors. In addition, vector databases have indexes that can make searching large collections of vectors even more efficient. These indexes are different from those you are familiar with because they are based on machine learning algorithms. These algorithms find close vectors very efficiently, even with millions of vectors. But they have accuracy and memory use tradeoffs. Because of the accuracy tradeoffs (also known as **recall tradeoffs**, since it looks like the database “forgot” some of the data), the algorithms used in the indexes are called **ANN - approximate nearest neighbors** (as opposed to **KNN** - the accurate results you get by fully scanning the vector collection). Because these indexes use machine learning techniques, creating them is often called training. Just don't get confused: this type of training fundamentally differs from the pre-training or fine-tuning of large language models. You can read more about the [pg\_vector extension](https://www.thenile.dev/docs/ai-embeddings/pg_vector) and how to use it in the Nile documentation. ### Preparing documents for storage and embedding Before you can generate embeddings or create text search indexes for documents, you need to prepare them. The preparation steps depend on the document type and the retrieval method you choose. For example, if you use PDFs or HTML documents, you will need to extract the text from them. If you use scanned documents, you will need to use OCR to extract the text. If you use full-text search, you will need to tokenize the text, remove stop words and possibly chunk it into smaller portions. If you use embeddings, you will need to chunk the text, possibly normalize it and generate embeddings for each chunk. The specific preparation steps you'll need also depend on the search or embedding model you use - size limits on the input, whether you need to pad short documents, how it handles common words, etc. Check the model documentation for the specific requirements. **Chunking** is the most common step in preparing documents. It is almost mandatory for two reasons: * Embedding models have a limit on input text size, although this is improving and popular models can often accept 4K or even 8K tokens (words or subwords). * Providing generative models with smaller chunks that have more closely related information usually works better than providing the model with larger, somewhat relevant chunks. [AI results degrade with large context windows](https://arxiv.org/abs/2307.03172). ## Simple RAG Example with OpenAI and Nile In this example, we will use OpenAI's embedding model and Nile's pg\_vector extension to demonstrate how the retrieval part of RAG works.
We will embed a few documents, and then search for the most relevant document to a user's question: ```javascript const { OpenAIEmbeddings } = await import("@langchain/openai"); const { Nile } = await import("@niledatabase/server"); const EMBEDDING_MODEL = "text-embedding-3-large"; const OPEN_API_KEY = "bring your own key"; const NILEDB_USER = "we use nile as the vector store, so we need a user"; const NILEDB_PASSWORD = "and their password"; let model = new OpenAIEmbeddings({ apiKey: OPEN_API_KEY, model: EMBEDDING_MODEL, dimensions: 1024, // scaled down - see the note on reducing dimensions in the OpenAI models page }); let nile = Nile({ user: NILEDB_USER, password: NILEDB_PASSWORD, }); // some documents: const documents = [ "JavaScript is a programming language commonly used in web development.", "Node.js is a runtime environment that allows you to run JavaScript on the server side.", "React is a JavaScript library for building user interfaces.", ]; // embed documents let vectors = await model.embedDocuments(documents); // store embeddings await nile.db.query( "CREATE TABLE IF NOT EXISTS embeddings (id integer, embedding vector(1024));" ); for (const [i, vec] of vectors.entries()) { await nile.db.query( "INSERT INTO embeddings (id, embedding) values ($1, $2)", [i, JSON.stringify(vec.map((v) => Number(v)))] ); } // now let's ask a question let question_vec = await model.embedDocuments(["Tell me about React"]); // search for the nearest document by cosine distance of embedding let answer_vec = await nile.db.query( "select id from embeddings order by embedding<=>$1 limit 1", [JSON.stringify(question_vec[0])] ); // return the answer: console.log( "based on your question, this document is relevant: " + documents[answer_vec.rows[0].id] ); ``` As you can see in the example, each step in RAG is simple and can be implemented in a single line of code - embedding, storing, and searching. But the result is powerful and allows you to build AI-native SaaS applications that can answer questions about specific topics with greater accuracy, like a human expert would. ## Use cases for RAG RAG performs best when: * You have more source material than can fit in the generative model's context window. * You have a large number of questions that can be answered by the source material. So you can embed and index the source material once and then use it to answer many questions. * The source material is structured or can be chunked into smaller parts that are relevant to the questions. Here are a few examples of use cases where RAG can be very useful: * **Customer Support Automation:** RAG can be used to automate customer support by providing answers to common questions. The source material can be product documentation and previous support tickets. * **Sales Enablement:** RAG can be used to generate custom responses based on a vast repository of sales documents, case studies, and product specifications. This allows sales representatives to quickly address potential customer questions and objections. * **Knowledge Management:** RAG can enhance a company's internal knowledge base with a bot that can generate answers by pulling information from internal documents, wikis, and databases. This helps employees find the information they need quickly. * **Legal Research:** RAG can be used to quickly find relevant legal documents, case law, and statutes to answer legal questions. * **Medical Diagnosis:** RAG can be used to provide doctors with relevant medical information, research papers, and case studies to help diagnose patients.
* **Code Generation:** RAG can be used to generate code snippets based on a repository of code examples and documentation. * **Content Creation:** RAG can be used to generate content based on a repository of articles, blog posts, and other written material. * **Product Development:** Use RAG to retrieve relevant information from past project documentation, market research reports, and customer feedback. ## Use cases where RAG is not the best choice RAG is a powerful tool, but it is not always the best choice. There are two main cases where RAG is not the best choice: **When the source material is small and will be used once:** For example, if you need the model to format a few paragraphs of text as a table or diagram, or if you want to turn a short blog post into a series of tweets. It will be faster and just as accurate to simply provide the text to the model. **Summarization:** If you need to summarize a large document, RAG is not the best choice. The retrieval methods will not be able to find relevant chunks with a question like "summarize this document", and the summary generated by the model will be based on a few random paragraphs from the text. There are summarization models that were specifically trained for this task. # When to use RAG and when to fine-tune a model Source: https://thenile.dev/docs/ai-embeddings/rag_vs_finetune Existing LLMs like GPT-4, Claude, Llama, Gemini and others are trained on a diverse range of data and can generate high-quality responses for a wide range of tasks. However, they may not be the best choice for tasks that require specialized knowledge, domain-specific information, or a type of problem that requires a different approach. In such cases, you have two options: use a model that is fine-tuned on a specific dataset or use a model that can retrieve information from a knowledge base. Comparing RAG (Retrieval Augmented Generation) to fine-tuning is a bit like comparing apples to turkeys. Yes, they are both foods and can be used when you are hungry. But they are different in many ways - how they are grown, packaged, cooked, served, and eaten - and also in their health benefits. With that in mind, we'll compare RAG and fine-tuning in terms of when to use them, their benefits, and their challenges, while highlighting the fundamental differences between the two approaches. ## When to use RAG **RAG is a form of prompt engineering.** It is a collection of techniques in which applications retrieve relevant documents and then include them in the prompt to the generative model. The goal of RAG is to provide the model with relevant information at the time a question is asked. This allows using an LLM with information that was not available during training (because it is private or new), reminds it of relevant facts, and reduces hallucinations. ### Example problems where RAG is a good fit: * **Customer support:** Automate customer support by providing answers to common questions. The source material can be product documentation and previous support tickets. * **Sales enablement:** Generate custom responses based on a vast repository of sales documents, case studies, and product specifications. * **Legal:** Quickly find relevant legal documents, case law, and statutes to answer legal questions. ### Challenges with RAG RAG's effectiveness depends on the quality of the knowledge base and the retrieval strategy. In many cases, the knowledge base does not exist and has to be created by painstakingly collecting and structuring data from various sources across a company.
Getting high quality data in a place where it can be easily retrieved isn't a new challenge - it has existed, under different names and disciplines, for decades. Many companies have entire data engineering or data platform teams dedicated to this task. The fact that we have LLMs now helps, but it's not a silver bullet. The main areas of challenge are: * Preparing the data for storage and retrieval: Cleaning and structuring the data, chunking large texts, enriching it with additional information, indexing it for fast retrieval, etc. * Retrieval strategy: How to retrieve the most relevant documents for a given prompt. Techniques include semantic similarity using vector embeddings, keyword matching, use of relation graphs, and combinations of these. And of course, each technique has a variety of algorithms and models with different strengths and different usage patterns. ## When to fine-tune a model **Fine-tuning a model is a form of training.** It is the process of taking a pre-trained model and training it further on a specific dataset. The goal of the training isn't to get the model to memorize new data, but rather to learn how to generalize better on a specific task. Because of the generalization process, a fine-tuned model can perform well on a new task when there isn't a lot of data available or when the relevant data is challenging to retrieve. ### Example problems where fine-tuning is a good fit: * **Code generation**: If a model wasn't trained on a specific language or framework, it won't perform well in that language, even if you provide examples using RAG. Code generation doesn't generalize well from a small number of examples, even when they are relevant. * **Asking customers clarifying questions**. Customers rarely share all available information when they open a support ticket. But most language models are "reluctant" to ask clarifying questions and weren't trained to do so. RAG isn't useful here because the model isn't missing the data; it is missing a behavior. A large set of example scenarios and which questions are appropriate can train the model with this new behavior. * **Writing in the style of a specific person**. With enough examples of a specific writing style, a model will generalize and learn to write in that style. While you could provide examples using RAG, a few examples in the prompt can be costly and hard to generalize from. * **Generating audio in a specific voice**. An audio-generating model can be trained to mimic specific voices. ### Challenges with fine-tuning Fine-tuning is a form of training, and most of the challenges result from the training process. * Training requires a large and high quality data set that was prepared specifically for training the model - separated into "good" and "bad" examples, and covering many variations of the types of tasks the model will perform. A large number of examples and high variety are important for generalization (to avoid overfitting). Creating these data sets is time-consuming and expensive. For some use-cases there are public datasets available, which helps. It is also worth mentioning that some companies use AI models to generate synthetic data for training. This is controversial, since it can lead to degradation of the model's performance on real data; however, using AI to add variations to real data can be a good way to create a larger data set and avoid overfitting. * Training requires a lot of compute power and time. GPUs are expensive and the latest models are hard to acquire at any price.
* Training requires not just GPUs, but also machine learning tools, languages and libraries - and expertise in using these.
* Training requires expertise beyond just the tools. There are many techniques for training and many hyperparameters to tune. Some techniques are specifically designed for efficiency and can be used even with a single consumer-grade GPU. These are known as PEFT (Parameter-Efficient Fine-Tuning), and the most famous of these is called [LoRA](https://arxiv.org/pdf/2106.09685). PEFT only trains a small number of parameters, and if the task requires deeper training, less efficient techniques like full or partial parameter fine-tuning are used.
* Training requires a good process. Knowing how to train a model, how to evaluate it, managing various versions, etc.
* Training can go very wrong very fast. There are a lot of cases where a model performs significantly worse after fine-tuning, and the training must be restarted from an older version of the model. This behavior is sometimes called "catastrophic forgetting" and is a well-known problem in machine learning, and it is just one example of how models can quickly and unexpectedly degrade in performance.
* To use a fine-tuned model, you need to deploy it (although some vendors, like OpenAI, allow you to fine-tune on their platform). This is a whole other set of challenges, including how to serve the model, how to monitor it, and how to update it. All this, on top of the GPU costs of serving the model.

## Conclusion

In many cases, choosing between RAG and fine-tuning is trivial. Some problems clearly require just relevant data or clearly require training on new tasks. However, there are cases where the choice isn't obvious. For example, if you look at online resources for generating SQL from text, you'll see many who advocate for each one of these approaches.

Generally speaking, RAG is simpler to implement and maintain. Many software engineers already have the skills required or can learn them quickly with some experimentation. If you get the results you need with RAG, you should use it. If you don't, or if you have the expertise and resources to fine-tune a model, you should consider it. This investment in fine-tuning could become your competitive edge, as many companies avoid fine-tuning or lack the skills to do it well.

Note that there are also quite a few cases where you could combine both approaches. For example, in a customer support scenario you can use RAG to retrieve relevant documents and use a model that was fine-tuned to ask customers clarifying questions. Or you can fine-tune a model to generate embeddings for your specific domain, and then use this model in a RAG architecture. In either case, there will be a lot of iterations and experimentation with various models, data sets, and techniques.

# Streaming model responses to the client in real-time
Source: https://thenile.dev/docs/ai-embeddings/streaming

Some generative models offer streaming responses. This greatly improves the user experience by providing ongoing interaction as the model generates the response.

## Getting streaming response from a model

Getting a streaming response from a model is as simple as setting the `stream` parameter to `true` in the model request. This tells the model to start streaming the response as soon as it starts generating the response.
Here is an example that uses OpenAI:

```javascript
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Say this is a test" }],
    stream: true,
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
}

main();
```

As you can see, it is straightforward to receive a streaming response from a model. The response is streamed in chunks, and you can process each chunk as it arrives using a simple iterator. In this case, we printed each chunk to the console.

## Streaming the response from the web framework to the client

Most of the time, however, you would want to stream the response to a client. This means converting the iterator to a ReadableStream (JavaScript's Streams API interface), and then sending the stream to the client via JavaScript's Response object. The conversion can be done with a simple function like this:

```javascript
function iteratorToStream(iterator: any) {
  return new ReadableStream({
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(value.content);
      }
    },
  });
}
```

Then you can use this function to send the model response to the client like this:

```javascript
const respStream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say this is a test" }],
  stream: true,
});

const stream = iteratorToStream(respStream);
return new Response(stream);
```

## Reading the streaming response in the client

The last step is for the client to display the response as it arrives. This involves:

* Calling the backend API that streams the response
* Reading the stream in chunks
* Calling a hook to update the UI with the new chunk

This is how you can read a streaming response in the client and call a hook to update the UI:

```javascript
const resp = await fetch("/api/ask-question", {
  method: "POST",
  body: JSON.stringify({ question: input, tenant_id: tenantid }),
});

// Reader to process the streamed response
if (resp.body) {
  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let done = false;
  // we need to read the stream until it's done
  while (!done) {
    const { value, done: doneReading } = await reader.read();
    done = doneReading;
    const chunkValue = decoder.decode(value);
    // dispatch function updates the UI with the new chunk
    dispatch({ type: "updateAnswer", text: chunkValue });
    if (done) {
      dispatch({ type: "done", text: "" });
    }
  }
}
```

## Updating the UI with the new chunk

When using React, you can update the UI as each chunk arrives by updating the state with the new chunk. This will cause the component to re-render and display the new chunk. In this example, we are using a reducer to update the state with the new chunk. The reducer is a function that takes the current state and an action, and returns the new state.

The reducer here has three actions: `addQuestion`, `updateAnswer`, and `done`. The `addQuestion` action adds a new question to the conversation, the `updateAnswer` action updates the answer with the new chunk, and the `done` action marks the end of the last answer. Note that we are creating a new array with the new chunk and updating the state with this new array; if you reuse the existing array, React will not re-render the component.
```javascript
type AppActions = {
  type: string,
  text: string,
};

interface AppState {
  messages: MessageType[] | [];
}

function reducer(state: AppState, action: AppActions): AppState {
  switch (action.type) {
    case "addQuestion":
      return {
        ...state,
        messages: [
          ...state.messages,
          { type: "question", text: action.text },
          { type: "answer", text: "" },
        ],
      };
    case "updateAnswer":
      const conversationListCopy = [...state.messages];
      const lastIndex = conversationListCopy.length - 1;
      conversationListCopy[lastIndex] = {
        ...conversationListCopy[lastIndex],
        text: conversationListCopy[lastIndex].text + action.text,
      };
      return {
        ...state,
        messages: conversationListCopy,
      };
    case "done":
      return {
        ...state,
      };
    default:
      return state;
  }
}
```

Now we need to tie the reducer to the component. We can do this using the `useReducer` hook. The `useReducer` hook takes the reducer function and the initial state, and returns the current state and a dispatch function.

The hook has two parts - the state and the dispatch function. The state is the current state of the component, and we use this to show the current conversation. The dispatch function is used to send actions to the reducer, which updates the state. As you saw in an earlier snippet, we call `dispatch` with the action when we read a new chunk from the stream.

The snippet below shows how to use the `useReducer` hook to update the state with the new chunk, and how to use the state in the component to display the conversation.

```javascript
const Chatbox: React.FC = () => {
  const [state, dispatch] = useReducer(reducer, { messages: [] });
  // ... lots of UI code here....
  return (
    <div>
      {state.messages.map((msg, index) => (
        // display the message, this will get updated as the response streams
        <div key={index}>{msg.text}</div>
      ))}
    </div>
  );
};
```

## Structured streaming responses

If you are used to writing web applications, you are probably used to sending structured responses like JSON. JSON allows you to package multiple pieces of information in a single response, and easily handle all the information in the client. Therefore, it is important to note that JSON and streaming responses don't mix well.

The problem is that the JSON structure is unparsable until the client receives the very last `}` of the response. This means that the client cannot parse the response until the very end, and therefore cannot display anything until the very end - the opposite of what we want with streaming responses.

To solve this, you need to use a slightly different format. A common way to handle this is to have structured metadata at the beginning of the response, and then stream the content. You also need a unique delimiter to separate the metadata from the content. This way, the client can parse the metadata and start displaying the content as it arrives.

Here is an example of how you can structure the response that you send from the backend. First, the `iteratorToStream` function needs to be updated to send the metadata. We'll modify it to accept a metadata string (we'll `stringify` the JSON to generate it), and send the metadata and the delimiter when the stream starts, before any of the iterator chunks are streamed:

```javascript
function iteratorToStream(iterator: any, metadata: string) {
  return new ReadableStream({
    start(controller) {
      controller.enqueue(metadata);
      // this is our separator. Streaming answer comes next
      controller.enqueue("EOJSON");
    },
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(value.content);
      }
    },
  });
}
```

Then, we need to update the way the client parses the response. We need to wait for all the metadata to arrive, parse it, and then start displaying the content:

```javascript
if (resp.body) {
  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let done = false;
  // accumulate the data as it comes in, so we can parse the metadata
  let partialData = "";
  let receivedMetadata = false; // track if we finished with the JSON yet
  while (!done) {
    const { value, done: doneReading } = await reader.read();
    done = doneReading;
    const chunkValue = decoder.decode(value);
    if (!receivedMetadata) {
      partialData += chunkValue;
      // first part of the response is json, second part is the answer
      const dataParts = partialData.split("EOJSON");
      if (dataParts.length > 1) {
        const metadata = JSON.parse(dataParts[0]);
        // use the metadata, for example to update the UI with the model name
        handleMetadata(metadata);
        // handle the first chunk of the answer
        dispatch({ type: "updateAnswer", text: dataParts[1] });
        receivedMetadata = true;
        partialData = "";
      }
    } else {
      // handle the rest of the answer
      dispatch({ type: "updateAnswer", text: chunkValue });
      if (done) {
        dispatch({ type: "done", text: "" });
      }
    }
  }
}
```

# DiskANN
Source: https://thenile.dev/docs/ai-embeddings/vectors/diskann

DiskANN index support for PostgreSQL with pgvectorscale

The `pgvectorscale` extension adds DiskANN index support for pgvector. This extension is useful in cases where `pgvector`'s `hnsw` index does not fit into available memory and, as a result, the ANN search does not perform as expected.

## Key Features

* StreamingDiskANN index - disk-backed HNSW variant.
* Statistical Binary Quantization (SBQ)
* Label-based filtering combined with the DiskANN index.

## Example: DiskANN index on shared table

To keep the example readable we'll work with **3-dimensional vectors**. Swap `VECTOR(3)` for `VECTOR(768)` or `VECTOR(1536)` in real apps.

```sql
-- 1. Shared data table
CREATE TABLE document_embedding (
    id        BIGSERIAL PRIMARY KEY,
    contents  TEXT,
    metadata  JSONB,
    embedding VECTOR(3)
);

-- 2. Seed with tiny sample data
INSERT INTO document_embedding (contents, metadata, embedding)
VALUES
    ('T-shirt',    '{"category":"apparel"}', '[0.10, 0.20, 0.30]'),
    ('Sweater',    '{"category":"apparel"}', '[0.12, 0.18, 0.33]'),
    ('Coffee mug', '{"category":"kitchen"}', '[0.90, 0.80, 0.70]');

-- 3. Build a DiskANN index (cosine distance)
CREATE INDEX document_embedding_diskann_idx
    ON document_embedding
    USING diskann (embedding vector_cosine_ops);

-- 4. k-NN query (top-2 similar items)
SELECT id, contents, metadata
FROM document_embedding
ORDER BY embedding <=> '[0.11, 0.21, 0.29]' -- query vector
LIMIT 2;
```

You should see the two apparel rows first - a good sanity check that the index works.

## Example: DiskANN index on tenant-aware table

```sql
-- 1. Tenant-aware table
CREATE TABLE tenant_embedding (
    tenant_id UUID NOT NULL,
    doc_id    BIGINT,
    embedding VECTOR(2), -- using tiny 2-dim vectors for demo
    metadata  JSONB,
    PRIMARY KEY (tenant_id, doc_id)
);

-- 2. Create some tenants
INSERT INTO tenants (id, name) VALUES ('11111111-1111-1111-1111-111111111111', 'Tenant A');
INSERT INTO tenants (id, name) VALUES ('22222222-2222-2222-2222-222222222222', 'Tenant B');

-- 3. Seed some data
INSERT INTO tenant_embedding (tenant_id, doc_id, embedding, metadata)
VALUES
    ('11111111-1111-1111-1111-111111111111', 1, '[0.05, 0.95]', '{"title":"DocA"}'),
    ('11111111-1111-1111-1111-111111111111', 2, '[0.04, 0.90]', '{"title":"DocB"}');

INSERT INTO tenant_embedding (tenant_id, doc_id, embedding, metadata)
VALUES
    ('22222222-2222-2222-2222-222222222222', 1, '[0.80, 0.20]', '{"title":"DocC"}');

-- 4. Create an index (Nile will partition by tenant_id)
CREATE INDEX tenant_embedding_diskann_idx
    ON tenant_embedding
    USING diskann (embedding vector_cosine_ops);

-- 5. Tenant-scoped ANN query
SET nile.tenant_id = '11111111-1111-1111-1111-111111111111';

SELECT doc_id, metadata
FROM tenant_embedding
ORDER BY embedding <=> '[0.06, 0.92]'
LIMIT 2;
```

## Example: Label-based filtering

Label-based filtering is a technique that allows you to filter the results of an ANN search based on a label while using the DiskANN index. Other filters are supported, but will use pgvector's post-filtering (i.e. after the ANN search).

In order to use label-based filtering, you need to:

* Create a label column in your table. It has to be an array of `smallint`s. Other types will revert to post-filtering.
* Create a diskann index that uses both the embedding and the label column.
* Use the `&&` (array intersection) operator in search queries.
* Optional, but recommended: Use a separate table and joins to translate smallint labels to meaningful descriptions.

```sql
-- 1. Create a table with a label column
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    embedding VECTOR(3),
    labels    SMALLINT[]
);

-- 2. Insert a couple of demo rows
INSERT INTO documents (embedding, labels)
VALUES
    ('[0.3,0.2,0.1]',    ARRAY[1]),   -- label 1 = science
    ('[0.35,0.25,0.05]', ARRAY[1,2]), -- label 2 = business
    ('[0.9,0.8,0.7]',    ARRAY[3]);   -- label 3 = art

-- 3. Create an index on the embedding and label columns
CREATE INDEX documents_ann_idx
    ON documents
    USING diskann (embedding vector_cosine_ops, labels);

-- 4. Query with label-based filtering
SELECT *
FROM documents
WHERE labels && ARRAY[1,2]
ORDER BY embedding <=> '[0.32,0.18,0.12]'
LIMIT 5;

-- 5. Optional: Translate labels to descriptions
CREATE TABLE labels (
    id          SMALLINT PRIMARY KEY,
    description TEXT
);

INSERT INTO labels (id, description) VALUES (1, 'Science'), (2, 'Business'), (3, 'Art');

-- 6. Query with label-based filtering and description
SELECT d.*
FROM documents d
WHERE d.labels && (
    SELECT array_agg(id) FROM labels WHERE description IN ('Science', 'Business')
)
ORDER BY d.embedding <=> '[0.32,0.18,0.12]'
LIMIT 5;
```

## Limitations

* The DiskANN index supports the `cosine`, `l2` and `inner_product` distance metrics, not pgvector's entire set of distance metrics.
* Label-based filtering is only supported for `smallint` arrays and the `&&` operator. Other types will revert to post-filtering.
* DiskANN is best suited for datasets where an `hnsw` index would be too large to fit into memory. For smaller datasets, `hnsw` is still a good choice.

## Additional Resources

[Pgvectorscale github repository](https://github.com/timescale/pgvectorscale)

# Getting started with pgvector
Source: https://thenile.dev/docs/ai-embeddings/vectors/pg_vector

The **`pgvector`** extension provides PostgreSQL with the ability to efficiently store, query, and perform operations on vector data directly within the database. Nile supports **`pgvector`** out of the box on the latest version - `0.8.0`.
Pgvector lets you store and query vectors directly within your usual Postgres database - with the rest of your data. This is both convenient and efficient. It supports:

* Exact and approximate nearest neighbor search (with optional HNSW and IVFFlat indexes)
* Single-precision, half-precision, binary, and sparse vectors
* L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
* Any language with a Postgres client

Plus ACID compliance, point-in-time recovery, JOINs, and all of the other great features of Postgres.

## Create tenant table with vector type

Vector types work like any other standard type. You can make them the type of a column in a tenant table and Nile will take care of isolating the embeddings per tenant.

```sql
-- creating a table to store wiki documents for a Notion-like
-- SaaS application with vector dimension of 3
CREATE TABLE wiki_documents(
    tenant_id uuid,
    id integer,
    embedding vector(3)
);
```

## Store vectors per tenant

Once you have the table defined, you would want to populate the embeddings. Typically, this is done by querying an embedding model (e.g. OpenAI, HuggingFace), retrieving the embeddings and storing them in the vector store. Once stored, the embeddings follow the standard tenant rules. They can be isolated, sharded and placed based on the tenant they belong to.

```sql
INSERT INTO wiki_documents (tenant_id, id, embedding)
VALUES ('018ade1a-7843-7e60-9686-714bab650998', 1, '[1,2,3]');
```

## Query vectors

Pgvector supports six vector similarity operators:

| Operator | Name | Description | Use Cases |
| -------- | ---- | ----------- | --------- |
| `<->` | `vector_l2_ops` | L2 distance. A measure of the straight-line distance between two points in a multi-dimensional space. It calculates the length of the shortest path between the points, which corresponds to the hypotenuse of a right triangle. | Used in clustering, k-means clustering, and distance-based classification algorithms. |
| `<#>` | `vector_ip_ops` | Inner product. The inner product, also known as the dot product, measures the similarity or alignment between two vectors. It calculates the sum of the products of corresponding elements in the vectors. | Used in similarity comparison or feature selection. Note that for normalized vectors, inner product will result in the same ranking as cosine distance, but is more efficient to calculate. So this is a good choice if you use an embedding algorithm that produces normalized vectors (such as OpenAI's). |
| `<=>` | `vector_cosine_ops` | Cosine distance. Cosine distance, often used as cosine similarity when measuring similarity, quantifies the cosine of the angle between two vectors in a multi-dimensional space. It focuses on the direction rather than the magnitude of the vectors. | Used in text similarity, recommendation systems, and any context where you want to compare the direction of vectors. |
| `<+>` | `vector_l1_ops` | L1 distance. The L1 distance, also known as the Manhattan distance, measures the distance between two points along a grid-like path (like a city block). It is the distance between two points measured along axes at right angles. | Less sensitive to outliers than L2 distance and, according to some research, better for high-dimensional data. |
| `<~>` | `bit_hamming_ops` | Hamming distance. The Hamming distance measures the number of positions at which the corresponding symbols are different. | Used with binary vectors, mostly for discrete data like categories. Also used for error-correcting codes and data compression. |
| `<%>` | `bit_jaccard_ops` | Jaccard distance. Measures similarity between sets by calculating the ratio of the intersection to the union of the two sets (how many positions are the same out of the total positions). | Used with binary vectors. Useful for comparing customers' purchase histories, recommendation systems, similarities in terms used in different texts, etc. |

You could use any one of them to find the distance between vectors. The choice of operator depends not only on the use-case, but also on the model used to generate the embeddings. For example, OpenAI's models return normalized embeddings, so using inner product or cosine distance will give the same results and inner product is more efficient. However, some models return non-normalized embeddings, so cosine distance should be used.

Real world vectors are quite large - 768 to 4096 dimensions are not uncommon. But for the sake of the example, we'll use small vectors:

To get the L2 distance

```sql
SELECT embedding <-> '[3,1,2]' AS distance FROM wiki_documents;
```

For inner product, multiply by -1 (since `<#>` returns the negative inner product)

```sql
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM wiki_documents;
```

For cosine similarity, use 1 - cosine distance

```sql
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM wiki_documents;
```

## Vector Indexes

`pgvector` supports two types of indexes:

* HNSW
* IVFFlat

Keep in mind that vector indexes are unlike other database indexes. They are used to perform efficient **approximate** nearest neighbor searches. Without vector indexes (also called **flat indexes** by other vector stores), queries sequentially scan through all vectors for the given query, and compute the distance to the query vector. This is computationally expensive, but is guaranteed to find the nearest neighbors (also called **exact nearest neighbors** or **KNN**). With vector indexes, the query will search a subset of the vectors that is expected to contain the nearest neighbors (but may not contain all of them). This is computationally efficient, but the results are not guaranteed to be the nearest neighbors. When using vector indexes, you can control the trade-off between speed and recall by specifying the index type and parameters.

### HNSW

An HNSW index creates a multilayer graph. It has slower build times and uses more memory than IVFFlat, but has better query performance (in terms of speed-recall tradeoff). Unlike IVFFlat, there is no training step, so the index can be created without any data in the table.

When creating an HNSW index, you can specify the maximum number of connections in a layer (`m`) and the number of candidate vectors considered when building the graph (`ef_construction`). More connections and more candidate vectors will improve recall but will increase build time and memory. If you don't specify the parameters, the default values are `m = 16` and `ef_construction = 64`.

Add an index for each distance function you want to use.

L2 distance

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_l2_ops);
```

Inner product

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_ip_ops);
```

Cosine distance

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_cosine_ops);
```

Specifying the parameters

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 100);
```

While querying, you can specify the size of the dynamic candidate list that will be searched (`hnsw.ef_search`):

```sql
SET hnsw.ef_search = 100;
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

Vectors with up to 2,000 dimensions can be indexed.

### IVFFlat

An IVFFlat index divides vectors into lists, and then searches a subset of those lists that are closest to the query vector.
It has faster build times and uses less memory than HNSW, but has lower query performance (in terms of speed-recall tradeoff).

Three keys to achieving good recall are:

1. Create the index **after** the table has some data
2. Choose an appropriate number of lists - a good place to start is `rows / 1000` for up to 1M rows and `sqrt(rows)` for over 1M rows.
3. When querying, specify an appropriate number of probes (higher is better for recall, lower is better for speed) - a good place to start is `sqrt(lists)`

Add an index for each distance function you want to use.

L2 distance

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```

Inner product

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
```

Cosine distance

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```

When querying, you can specify the number of probes (i.e. how many lists to search):

```sql
SET ivfflat.probes = 100;
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

More probes will improve recall but will slow down the query.

Vectors with up to 2,000 dimensions can be indexed.

## Filtering

Typically, vector search is used to find the nearest neighbors, which means that you would limit the number of results after ordering by distance:

```sql
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

It is a good idea to also filter by distance, so you won't include vectors that are too far away, even if they are the nearest neighbors. This will prevent you from getting nonsensical results in cases where there isn't much data in the vector store.

```sql
SELECT * FROM wiki_documents WHERE embedding <=> '[3,1,2]' < 0.9 ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

Nile will automatically limit the results to only the vectors that belong to the current tenant:

```sql
SET nile.tenant_id = '018ade1a-7843-7e60-9686-714bab650998';
SELECT * FROM wiki_documents WHERE embedding <=> '[3,1,2]' < 0.9 ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

And you can also filter results by additional criteria (some vector stores call this "metadata filtering" or "post-filtering"):

```sql
SELECT * FROM wiki_documents WHERE embedding <=> '[3,1,2]' < 0.9 AND category = 'product' ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

It is recommended to add indexes on the columns used for filtering, so the query can be optimized - especially when the filter can lead to result sets small enough that a sequential scan is faster than using the vector index. This improves not just performance but also recall.

With approximate indexes, filtering is applied after the index is scanned. If a condition matches 10% of rows, with HNSW and the default `hnsw.ef_search` of 40, only 4 rows will match on average. Starting in version `0.8.0`, you can enable **iterative index scans**, which will automatically scan more of the index when needed.

### Iterative Index scans

Using iterative index scans, Postgres will scan the approximate index for nearest neighbors, apply additional filters and, if the number of neighbors after filtering is insufficient, continue scanning until sufficient results are found.

Each index has its own configuration (GUC) for iterative scans: `hnsw.iterative_scan` and `ivfflat.iterative_scan`. By default, both configurations are set to `off`.

HNSW indexes support both relaxed and strict ordering for the iterative scans.
Strict order guarantees that the returned results are ordered by exact distance. Relaxed order allows results that are slightly out of order, but provides better recall (i.e. fewer missed results due to the approximate nature of the index).

```sql
SET hnsw.iterative_scan = strict_order;
-- or
SET hnsw.iterative_scan = relaxed_order;
```

IVFFlat indexes only allow relaxed ordering:

```sql
SET ivfflat.iterative_scan = relaxed_order;
```

Once you set these configs, you don't need to change your existing queries. You should immediately see the same queries return the correct number of results. However, if you use relaxed ordering, you can re-order the results using a materialized CTE:

```sql
WITH relaxed_results AS MATERIALIZED (
    SELECT id, embedding <-> '[1,2,3]' AS distance
    FROM items
    WHERE category_id = 123
    ORDER BY distance
    LIMIT 5
)
SELECT * FROM relaxed_results ORDER BY distance;
```

If you filter by distance (recommended, to avoid nonsensical results when there aren't many similar vectors), use a materialized CTE and place the distance filter outside the CTE in order to avoid overscanning:

```sql
WITH nearest_results AS MATERIALIZED (
    SELECT id, embedding <=> '[1,2,3]' AS distance
    FROM items
    ORDER BY distance
    LIMIT 5
)
SELECT * FROM nearest_results WHERE distance < 1 ORDER BY distance;
```

Even with iterative scans, pgvector limits the index scan in order to balance time, resource use and recall. Increasing these limits can increase query latency but potentially improve recall:

HNSW has a configuration that controls the total number of rows that will be scanned by a query across all iterations (20,000 is the default):

```sql
SET hnsw.max_scan_tuples = 20000;
```

IVFFlat lets you configure the maximum number of lists that will be checked for nearest neighbors:

```sql
SET ivfflat.max_probes = 100;
```

## Quantization

*Introduced in pgvector 0.7.0*

Quantization is a technique for optimizing vector storage and query performance by using fewer bits to store the vectors. By default, pgvector's `vector` type uses a 32-bit floating point format. The `halfvec` data type uses a 16-bit floating point format, which has the following benefits:

* Reduced storage requirements (half the memory)
* Faster query performance
* Reduced index size (both on disk and in memory)
* Can index vectors with up to 4096 dimensions (which covers the most popular embedding models)

To use `halfvec`, you can create a table with the `halfvec` type:

```sql
CREATE TABLE wiki_documents(
    tenant_id uuid,
    id integer,
    embedding halfvec(3) -- put the real number of dimensions here
);
```

You can also create quantized indexes on regular sized vectors:

```sql
CREATE INDEX ON wiki_documents USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);
```

In this case, you need to cast the vector to `halfvec` in the query, and use the distance operator that matches the index (L2 distance here):

```sql
SELECT * FROM wiki_documents ORDER BY embedding::halfvec(3) <-> '[3,1,2]' LIMIT 10;
```

Note that to create an index on `halfvec`, you need to specify the distance function as `halfvec_l2_ops` or `halfvec_cosine_ops`.

## Sparse Vectors

*Introduced in pgvector 0.7.0*

Sparse vectors are vectors in which the values are mostly zero. These are common in text search algorithms, where each dimension represents a word and the value represents the relative frequency of the word in the document - BM25, for example. Some embedding models, such as BGE-M3, also use sparse vectors. Pgvector supports the sparse vector type `sparsevec` and the associated similarity operators.
Because sparse vectors can be extremely large but most of the values are zero, pgvector stores them in a compressed format. ```sql CREATE TABLE wiki_documents( tenant_id uuid, id integer, embedding sparsevec(5) -- put the real number of dimensions here ); INSERT INTO wiki_documents (tenant_id, id, embedding) VALUES ('018ade1a-7843-7e60-9686-714bab650998', 1, '{1:1,3:2,5:3}/5'); SELECT * FROM wiki_documents ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5; ``` The format is `{index1:value1,index2:value2,...}/N`, where N is the number of dimensions and the indices start from 1 (like SQL arrays). Because the format is a bit unusual, it is recommended to use [pgvector's libraries for your favorite language](https://github.com/pgvector/pgvector?tab=readme-ov-file#languages) to insert and query sparse vectors. ## Summary You can read more about pgvector on their [github](https://github.com/pgvector/pgvector/blob/master/README.md) If you have any feedback or questions on building AI-native SaaS applications on Nile, please do reach out on our [Github discussion forum](https://github.com/orgs/niledatabase/discussions) or our [Discord community](https://discord.gg/8UuBB84tTy). # Vector Database Source: https://thenile.dev/docs/ai-embeddings/vectors/vector_database ### **What is a Vector Database?** A **vector database** is a specialized type of database designed to store and search **vectors**, which are numerical representations of data. These vectors capture the essential characteristics of complex data like text, images, or audio. For example, in a **natural language processing (NLP)** task, a sentence like "The quick brown fox jumps over the lazy dog" can be converted into a vector, such as `[0.12, -0.34, 0.67, -0.23, ...]`, where each number represents specific features of the sentence, like its semantic meaning. Similarly, in an **image recognition** task, an image of a dog can be represented as a vector, like `[0.45, 0.78, -0.32, 0.67, ...]`, where each value encodes key attributes of the image such as texture, color, or shape. Vector databases are optimized for **similarity searches**, meaning they help you find data points that are "close" to each other in terms of meaning or features, rather than exact matches. For instance, if you're running a **search engine** for articles, and you input the phrase "machine learning applications," the query is converted into a vector. The database then searches for articles whose vectors are similar to the query vector, retrieving content like "AI use cases in healthcare" or "Deep learning in robotics," even if the exact phrase “machine learning applications” isn’t present. Similarly, in an **e-commerce** recommendation system, when a user views a product like a red T-shirt, its vector representation is used to find and recommend similar products, such as other T-shirts with similar color or style. *** ### **Core Concepts and Examples** 1. **Vectors**: A vector is a list of numbers that represents the essential characteristics of data. In AI, vectors are created using models that capture relationships between elements of the data. * **Example in NLP**: A sentence like "The cat sat on the mat" can be converted into a vector by a language model such as BERT or GPT. The vector might look something like `[0.34, 0.67, -0.23, 0.88, ...]`. This vector contains semantic information about the sentence. 
* **Example in Image Recognition**: An image of a car could be transformed into a vector that represents its visual features, such as color, shape, and texture: `[0.23, -0.12, 0.98, 0.34, ...]`. 2. **Similarity Search**: The main task of a vector database is to find vectors that are "close" to each other in a high-dimensional space. For example, you may want to find images that are visually similar to a query image or articles that are semantically similar to a query sentence. * **Example in Product Recommendations**: If a user looks at a product on an e-commerce website, the system converts the product into a vector. The vector database is then queried to find vectors of similar products, which are displayed as recommendations. * **Example in Large Language Models (LLMs)**: After processing a user query like "What is quantum computing?", the LLM generates a vector representing the query’s meaning. The vector database is queried to find documents or articles with similar meaning, even if the exact words differ. 3. **Nearest Neighbor Search**: Vectors in a vector database are stored in such a way that when you search for one vector, the database can efficiently find the "nearest" vectors, based on a distance metric such as **cosine similarity** or **Euclidean distance**. * **Example in Music Recommendation**: A music streaming app converts songs into vectors based on their audio features (e.g., tempo, rhythm). When a user likes a song, the app searches the vector database to find similar songs. 4. **Indexing in Vector Databases**: To search efficiently in high-dimensional spaces, vector databases use special indexing methods such as **Hierarchical Navigable Small World (HNSW)** or **IVF (Inverted File Index)**. *** ### **How Vector Databases Work with Large Language Models (LLMs)** Large Language Models, such as GPT-4 or BERT, generate vectors (embeddings) from text that represent the semantic meaning of the input. These embeddings can be stored in vector databases for efficient retrieval in tasks like semantic search, question-answering, and more. ### **Example 1: Semantic Search with LLMs** When a user enters a query like "Best way to learn machine learning," an LLM transforms this query into a vector embedding that captures the essence of the question. This vector is then passed to the vector database, which searches for similar vectors (e.g., documents, articles, blog posts) related to machine learning education. * **How it works**: 1. LLM converts the query into a vector. 2. The vector database performs a similarity search to find the closest matching vectors. 3. The system retrieves and returns relevant documents, even if they don’t contain the exact words “best way to learn machine learning” but discuss similar concepts. ### **Example 2: Document Search and Retrieval** Imagine a legal firm storing thousands of legal documents. Each document is processed by a transformer model, generating a vector that represents its content. These vectors are then stored in a vector database. When a user searches for "case law on intellectual property infringement," the query is transformed into a vector, and the database retrieves documents that are semantically similar. * **How it works**: 1. The LLM transforms each document into a vector embedding at the time of ingestion. 2. When the user performs a search, the query is also converted into a vector. 3. The vector database searches for documents with vectors close to the query vector, effectively finding relevant documents. 
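
To make the flow above concrete, here is a minimal sketch of the retrieval step from Example 2 using `pgvector` SQL. It assumes the document embeddings (and the query embedding) were already generated by a model, and uses tiny 3-dimensional vectors purely for readability; the table and column names are illustrative.

```sql
-- store each legal document together with its embedding
CREATE TABLE legal_documents (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT,
    embedding VECTOR(3) -- use the real dimension of your embedding model in practice
);

INSERT INTO legal_documents (title, embedding)
VALUES
    ('IP infringement ruling, 2021', '[0.82, 0.11, 0.05]'),
    ('Trademark dispute case law',   '[0.79, 0.15, 0.09]'),
    ('Employment contract template', '[0.05, 0.90, 0.20]');

-- '[0.80, 0.12, 0.07]' stands in for the embedding of the user's query,
-- e.g. "case law on intellectual property infringement"
SELECT id, title
FROM legal_documents
ORDER BY embedding <=> '[0.80, 0.12, 0.07]' -- cosine distance
LIMIT 2;
```

The two intellectual-property documents should come back first, because their embeddings point in roughly the same direction as the query embedding.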
### **Example 3: Chatbots with LLMs and Vector Databases** In an LLM-based chatbot, user queries can be mapped into vector embeddings, and these embeddings can be stored for future use. If the user asks a similar question later, the chatbot can use the vector database to retrieve past responses based on vector similarity, enabling more coherent and contextually aware responses. * **How it works**: 1. User query gets converted into a vector by the LLM. 2. The chatbot stores this vector in a vector database alongside the generated response. 3. Future similar queries are compared with vectors in the database to retrieve the most relevant responses. *** ### **Use cases for Vector Databases** 1. **Recommendation Engines**: * **Use Case**: A movie streaming platform uses a vector database to store user preferences and movie vectors (e.g., genre, ratings). When a user watches or likes a movie, the system searches for movies with similar vectors. * **Example**: The user likes a drama film with vectors such as `[0.45, -0.32, 0.78, 0.11]`. The vector database returns films with vectors close to this one. 2. **Image Search**: * **Use Case**: A social media platform uses vector databases to store image embeddings. When users upload an image, the platform uses a deep learning model to generate a vector for the image. A similarity search is then performed to find images with similar vectors. * **Example**: A user uploads a picture of a sunset, and the vector database retrieves other sunset images, even though they are different in pixel composition. 3. **Fraud Detection**: * **Use Case**: In financial services, transaction histories are represented as vectors. Fraud detection systems use vector databases to find anomalous behavior by searching for transactions whose vectors deviate significantly from normal patterns. * **Example**: A fraudulent transaction might produce a vector that is distant from typical transaction vectors. The system flags it as suspicious based on vector distance. 4. **Search in Chatbots**: * **Use Case**: LLMs combined with vector databases are used in chatbots to improve user query resolution. When a user asks a question, the chatbot converts the query into a vector and searches a vector database to retrieve the most relevant answer. * **Example**: A user asks, "What are the top programming languages?" The LLM produces a vector for the query and retrieves responses that contain information about programming languages like Python, Java, and C++. *** ### **Challenges of Vector Databases** 1. **Curse of Dimensionality**: As the number of dimensions in vector data increases, it becomes harder to distinguish between similar and dissimilar vectors. This can make searches less effective unless carefully managed. 2. **Approximation vs. Accuracy**: Vector databases often use **approximate nearest-neighbor (ANN)** search techniques to improve performance. This can sacrifice accuracy, especially in highly sensitive applications, in favor of speed. 3. **Memory and Storage**: High-dimensional vector data can take up significant amounts of memory and storage. Managing this efficiently is crucial for scaling vector databases to handle millions or billions of vectors. # Create a database Source: https://thenile.dev/docs/api-reference/databases/create-a-database https://global.thenile.dev/openapi.yaml post /workspaces/{workspaceSlug}/databases Creates a database record in the control plane and triggers creation in the target region. 
Database names must be less than 64 characters, and unique within a workspace. Names may contain letters (lower or uppercase), numbers, and underscores, and must begin with a letter or underscore.

# Create a database credential
Source: https://thenile.dev/docs/api-reference/databases/create-a-database-credential
https://global.thenile.dev/openapi.yaml post /workspaces/{workspaceSlug}/databases/{databaseName}/credentials

Generates a new secure credential for a database. The generated password is returned in the response, and this is the only time the password is available in plaintext. No request body is required.

# Delete a database
Source: https://thenile.dev/docs/api-reference/databases/delete-a-database
https://global.thenile.dev/openapi.yaml delete /workspaces/{workspaceSlug}/databases/{databaseName}

Marks a database as deleted and queues its data for deletion. The database will no longer appear in your Nile dashboard. You will be able to connect to it and operate on it while it is queued for deletion, but compute costs still apply.

# Delete a database credential
Source: https://thenile.dev/docs/api-reference/databases/delete-a-database-credential
https://global.thenile.dev/openapi.yaml delete /workspaces/{workspaceSlug}/databases/{databaseName}/credentials/{credentialId}

Deletes an existing database credential. No request body is required.

# Get a database
Source: https://thenile.dev/docs/api-reference/databases/get-a-database
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/databases/{databaseName}

Gets details of a database.

# List available regions
Source: https://thenile.dev/docs/api-reference/databases/list-available-regions
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/regions

Retrieve the list of available region identifiers for a workspace. Identifiers are composed of a prefix indicating the underlying cloud provider followed by a region name in that provider. For example `AWS_US_WEST_2` is associated with the `us-west-2` region (Portland, OR) of AWS.

# List databases
Source: https://thenile.dev/docs/api-reference/databases/list-databases
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/databases

Lists all of the databases in a workspace.

# Lists credentials for a database
Source: https://thenile.dev/docs/api-reference/databases/lists-credentials-for-a-database
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/databases/{databaseName}/credentials

Lists all credentials that exist for a database. The id of the credential is included in the response, but not the password. Passwords are provided only once, at the time the credential is created, and are encrypted for storage.

# Deletes a previously issued invite
Source: https://thenile.dev/docs/api-reference/developers/deletes-a-previously-issued-invite
https://global.thenile.dev/openapi.yaml delete /workspaces/{workspaceSlug}/invites/{inviteId}

# Identify developer details
Source: https://thenile.dev/docs/api-reference/developers/identify-developer-details
https://global.thenile.dev/openapi.yaml get /developers/me

Returns information about the developer associated with the token provided, including the workspaces and database ids the developer is authorized to access, and the email address associated with the developer.
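
For example, a quick way to check which workspaces and databases a token can access - this is only a sketch, and it assumes the control-plane API is served from the same host as the OpenAPI spec above and that a developer access token is available in the `NILE_API_TOKEN` environment variable:

```javascript
// fetch details about the developer associated with the token
const resp = await fetch("https://global.thenile.dev/developers/me", {
  headers: { Authorization: `Bearer ${process.env.NILE_API_TOKEN}` },
});

// the response includes the developer's email and the workspaces and
// database ids the token is authorized to access
console.log(await resp.json());
```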
# Invite a developer to a workspace
Source: https://thenile.dev/docs/api-reference/developers/invite-a-developer-to-a-workspace
https://global.thenile.dev/openapi.yaml post /workspaces/{workspaceSlug}/invites

Sends an email to another developer with a link that allows them to join the workspace.

# List developer invites
Source: https://thenile.dev/docs/api-reference/developers/list-developer-invites
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/invites

Lists all developer invites in a workspace.

# Post oauth2token
Source: https://thenile.dev/docs/api-reference/post-oauth2token
https://global.thenile.dev/openapi.yaml post /oauth2/token

# Create a workspace
Source: https://thenile.dev/docs/api-reference/workspaces/create-a-workspace
https://global.thenile.dev/openapi.yaml post /workspaces

Creates a workspace for the authenticated developer. A workspace slug is generated from the workspace name and used as the workspace identifier. Workspace slugs must be globally unique. Generated slugs will only include ASCII characters. Spaces and non-ASCII characters are replaced with hyphens.

# Get workspaces
Source: https://thenile.dev/docs/api-reference/workspaces/get-workspaces
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}

Get details about a specific workspace.

# Get workspaces compute types
Source: https://thenile.dev/docs/api-reference/workspaces/get-workspaces-compute-types
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/compute-types

List dedicated compute types available for a workspace.

# Get workspaces metricscompute
Source: https://thenile.dev/docs/api-reference/workspaces/get-workspaces-metricscompute
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/metrics/compute

List dedicated compute metrics for a workspace.

# List developers with access to a workspace
Source: https://thenile.dev/docs/api-reference/workspaces/list-developers-with-access-to-a-workspace
https://global.thenile.dev/openapi.yaml get /workspaces/{workspaceSlug}/developers

List details about the developers authorized to access the workspace.

# List workspaces
Source: https://thenile.dev/docs/api-reference/workspaces/list-workspaces
https://global.thenile.dev/openapi.yaml get /workspaces

Lists all workspaces that an authenticated developer is authorized to use.

# Comparison
Source: https://thenile.dev/docs/auth/comparison

There are many auth solutions. A common question is how Nile Auth is different. We have tried to provide our most unbiased opinion about why and when you should consider using Nile Auth. Note that our opinions could still be biased, and we encourage our users to do their due diligence.

## Why use Nile Auth over other options?

Nile Auth is a 100% open-source auth solution for B2B apps. It supports multi-tenant workflows and drop-in UI modules, stores user data in Nile's Postgres database, and includes unlimited active users. You can self-host it or use the managed cloud solution.

### vs OSS libraries focused on a specific language or framework

1. Auth as a service. It provides centralized control, helps B2B companies roll out security fixes quickly across all their apps, and gives an easy way to audit.
2. Routes are auto-generated with Nile Auth. You have to write a lot less backend code.
3. Drop-in B2B components, which make end-to-end integration possible in a few minutes. Most libraries do not provide this.
4. Tightly integrated with Nile Postgres built-in tables, so no DB setup is required to bootstrap.
5. Multi-language support - A nice benefit is getting auth features across services in multiple languages.
6. With the hosted version, we manage the service and help scale to millions of users across the globe with Nile's Postgres. We offer unlimited active users.
7. If you are using NextAuth, you do not have any B2B feature support. Nile Auth also supports tenant-level overrides.
8. Unified customer dashboard showing both DB and auth usage with management features.

### vs closed-source auth service

1. Not open source.
2. You need to sync user data from a third-party service to your DB. Not required with Nile Auth.
3. We are biased, but Nile Auth UX is superior to most proprietary auth solutions.
4. There is no pricing on active users. For example, Auth0 costs \$1700 for 7500 active users. With Nile Auth, it is \$0.
5. Single customer dashboard showing both DB and auth usage with management features.
6. Routes are auto-generated with Nile Auth. You have to write a lot less backend code.

### vs OSS auth service

1. Routes are auto-generated with Nile Auth. You have to write a lot less backend code.
2. Drop-in B2B components make end-to-end integration possible in a few minutes, which most OSS auth-as-a-service solutions do not provide.
3. Tightly integrated with Nile Postgres built-in tables, so no DB setup is required to bootstrap.
4. The Nile Auth hosted solution offers unlimited active users and only charges for the database.
5. Full support for B2B features, such as tenant-level overrides, which may not be comprehensive or fully supported in other solutions.
6. Single customer dashboard showing both DB and auth usage with management features.

### vs build your own auth solution

1. Nile Auth will invest more time and people into security. It may not be the best use of time to build your own auth solution.
2. Nile Auth will be patched proactively, and vulnerabilities will be addressed quickly. It could be hard to do this while focusing on building your product.
3. Nile Auth is open source, and data is stored in your database. There is no lock-in, and you can always access your user data. This has the same benefits as building your own auth solution.
4. With the hosted version, we manage the service and help scale to millions of users across the globe with Nile's Postgres. We offer unlimited active users.
5. A lot of effort is needed to support B2B features. You can leverage Nile Auth for this.
6. Unified customer dashboard showing both DB and auth usage with management features.
7. You can self-host Nile Auth, which gives you all the benefits without having to write the code.

# Customize
Source: https://thenile.dev/docs/auth/components/customization

Learn how to style and customize the Nile Auth components

The Nile Auth components are imported from the `@niledatabase/react` package. They are built with Tailwind CSS and React and are designed to be easily customized. There are a few ways to customize the components, and the best approach depends on how much customization you need. In this document we will cover the different approaches and provide examples of how to customize the components. You will find more details in the documentation for the specific component you are using.

## Nile Auth Default CSS

Nile's React package includes a CSS file that you can use to style the components.

```jsx
import "@niledatabase/react/styles.css";
```

This will apply the default styles to all the components and is an easy way to get started if you just want things to look nice but don't need to match a specific design.
## Styling with Component Props

Some of the Nile Auth components are a single element, such as a button. This includes the ``, `` and all the social login buttons. These components can be customized using the `className` prop. For example, if you want your signout button to have a different color, you can do the following:

```jsx
```

More complex components, such as the ``, ``, `` and `` components, include child components and text. They can still be customized using the `className` prop, to an extent - for example, you can add margins to the form by adding a `className` to the component. But it isn't the best approach for achieving a specific design. Some components also have a `buttonText` prop that can be used to change the text of the button. Check the documentation for the specific component for more details.

## Theming with CSS Variables

All Nile Auth components use CSS variables for theming. This means that you can override the colors and other styles by setting the CSS variables. We support the same CSS variables that [Shadcn uses](https://ui.shadcn.com/docs/theming#list-of-variables).

If you are not using Shadcn, you can still use Tailwind CSS to style the components. This requires two steps:

1. Modify your Tailwind config to include the Nile Auth components and the theme variables. If you are using a Tailwind config file, add the following to your config:

```js tailwind.config.js [expandable]
// fontFamily is used for the `sans` font stack below
const { fontFamily } = require("tailwindcss/defaultTheme");

module.exports = {
  content: [
    "./src/**/*.{js,jsx,ts,tsx}",
    "./node_modules/@niledatabase/react/dist/**/*.js",
  ],
  theme: {
    extend: {
      colors: {
        border: 'hsl(var(--border))',
        input: 'hsl(var(--input))',
        ring: 'hsl(var(--ring))',
        background: 'hsl(var(--background))',
        foreground: 'hsl(var(--foreground))',
        primary: {
          DEFAULT: 'hsl(var(--primary))',
          foreground: 'hsl(var(--primary-foreground))',
        },
        secondary: {
          DEFAULT: 'hsl(var(--secondary))',
          foreground: 'hsl(var(--secondary-foreground))',
        },
        destructive: {
          DEFAULT: 'hsl(var(--destructive))',
          foreground: 'hsl(var(--destructive-foreground))',
        },
        muted: {
          DEFAULT: 'hsl(var(--muted))',
          foreground: 'hsl(var(--muted-foreground))',
        },
        accent: {
          DEFAULT: 'hsl(var(--accent))',
          foreground: 'hsl(var(--accent-foreground))',
        },
        popover: {
          DEFAULT: 'hsl(var(--popover))',
          foreground: 'hsl(var(--popover-foreground))',
        },
        card: {
          DEFAULT: 'hsl(var(--card))',
          foreground: 'hsl(var(--card-foreground))',
        },
      },
      borderRadius: {
        lg: 'var(--radius)',
        md: 'calc(var(--radius) - 2px)',
        sm: 'calc(var(--radius) - 4px)',
      },
      fontFamily: {
        sans: ['var(--font-sans)', ...fontFamily.sans],
        roboto: ['Roboto', 'sans-serif'],
      },
    },
  },
  plugins: [],
};
```

If you are using Tailwind v4 without a config file, add the following to your `globals.css` file:

```css global.css [expandable]
@source "../../node_modules/@niledatabase/react/**/*.{html,js,jsx,ts,tsx}";

@theme {
  --theme-background-color: hsl(var(--background));
  --theme-foreground-color: hsl(var(--foreground));
  --radius: 0.5rem;
  --color-border: hsl(var(--border));
  --color-input: hsl(var(--input));
  --color-ring: hsl(var(--ring));
  --color-background: hsl(var(--background));
  --color-foreground: hsl(var(--foreground));
  --color-primary: hsl(var(--primary));
  --color-primary-foreground: hsl(var(--primary-foreground));
  --color-secondary: hsl(var(--secondary));
  --color-secondary-foreground: hsl(var(--secondary-foreground));
  --color-destructive: hsl(var(--destructive));
  --color-destructive-foreground: hsl(var(--destructive-foreground));
  --color-muted: hsl(var(--muted));
  --color-muted-foreground: hsl(var(--muted-foreground));
  --color-accent: hsl(var(--accent));
  --color-accent-foreground: hsl(var(--accent-foreground));
  --color-popover: hsl(var(--popover));
  --color-popover-foreground: hsl(var(--popover-foreground));
  --color-card: hsl(var(--card));
  --color-card-foreground: hsl(var(--card-foreground));
  --radius-sm: calc(var(--radius) - 4px);
  --radius-md: calc(var(--radius) - 2px);
  --radius-lg: var(--radius);
  --font-family-poppins: var(--font-poppins);
  --font-family-inter: var(--font-inter);
}
```

2. Add the theme variables to the `:root` selector in the `globals.css` file. You can use the [shadcn theme generator](https://ui.shadcn.com/themes) to get a nice theme for your app. In the example below, I picked a purple theme.

```css [expandable]
@layer base {
  :root {
    --background: 276 100% 95%;
    --foreground: 276 5% 10%;
    --card: 276 50% 90%;
    --card-foreground: 276 5% 15%;
    --popover: 276 100% 95%;
    --popover-foreground: 276 100% 10%;
    --primary: 276 85% 48%;
    --primary-foreground: 0 0% 100%;
    --secondary: 276 30% 70%;
    --secondary-foreground: 0 0% 0%;
    --muted: 238 30% 85%;
    --muted-foreground: 276 5% 35%;
    --accent: 238 30% 80%;
    --accent-foreground: 276 5% 15%;
    --destructive: 0 100% 30%;
    --destructive-foreground: 276 5% 90%;
    --border: 276 30% 50%;
    --input: 276 30% 26%;
    --ring: 276 85% 48%;
    --radius: 0.5rem;
  }
  .dark {
    --background: 276 50% 10%;
    --foreground: 276 5% 90%;
    --card: 276 50% 10%;
    --card-foreground: 276 5% 90%;
    --popover: 276 50% 5%;
    --popover-foreground: 276 5% 90%;
    --primary: 276 85% 48%;
    --primary-foreground: 0 0% 100%;
    --secondary: 276 30% 20%;
    --secondary-foreground: 0 0% 100%;
    --muted: 238 30% 25%;
    --muted-foreground: 276 5% 60%;
    --accent: 238 30% 25%;
    --accent-foreground: 276 5% 90%;
    --destructive: 0 100% 30%;
    --destructive-foreground: 276 5% 90%;
    --border: 276 30% 26%;
    --input: 276 30% 26%;
    --ring: 276 85% 48%;
    --radius: 0.5rem;
  }
}
```

## Additional customizations

If you need more customization, if you are not using Tailwind CSS, or if you have your own UI component library, you can use the Nile Auth React hooks with any components you want and any styles you want. For example, to use Nile's SignIn hook with your own button, you can do the following:

```jsx
import { useSignIn } from "@niledatabase/react";

const SignInComponent = () => {
  const signIn = useSignIn({
    beforeMutate: (data) => ({ ...data, extraParam: "value" }),
    onSuccess: () => console.log("Login successful"),
    onError: (error) => console.error("Login failed", error),
    callbackUrl: "/dashboard",
  });

  const handleLogin = () => {
    signIn({ email: "user@example.com", password: "securepassword" });
  };

  // render your own button and trigger the sign in from its click handler
  return <button onClick={handleLogin}>Sign in</button>;
};
```

This snippet uses the `useSignIn` hook to sign in a user and the `beforeMutate` option to add an extra parameter to the sign in request. Note how customizable the hook is - you can add extra parameters, handle the success and error cases, and redirect the user after login. Individual component pages also include the relevant hooks and code snippets you need to use the component in your own code.

# Organization
Source: https://thenile.dev/docs/auth/components/organization

Learn how to use the Nile Auth Organization component

export const component_0 = ""

# Tenant Selector

The `TenantSelector` component provides an interactive way to manage and switch between tenant organizations within an application. It leverages React Query for fetching tenant data and includes functionality to create new tenants.

***

## Features

* **Tenant Selection**: Users can switch between available tenants using a dropdown.
* **Tenant Creation**: If no tenants exist, users can create a new organization.
* **Persistent Selection**: Selected tenants persist via cookies.
* **Server Communication**: Fetches tenant data from an API endpoint.

***

## Installation

Ensure you have the necessary dependencies installed:

```sh
npm install @niledatabase/react
```

***

## Usage

```tsx
import TenantSelector from '@niledatabase/react';

export default function App() {
  return <TenantSelector />;
}
```

***

#### Props

| Name        | Type          | Default | Description                       |
| ----------- | ------------- | ------- | --------------------------------- |
| `client`    | `QueryClient` | `null`  | React Query client instance.      |
| `baseUrl`   | `string`      | `''`    | Base API URL for tenant requests. |
| `className` | `string`      | `''`    | Additional CSS classes.           |

***

### `useTenants`

A hook to fetch tenant data.

```ts
const { data: tenants, isLoading, refetch } = useTenants();
```

Returns:

* `tenants`: List of available tenants.
* `isLoading`: Whether data is loading.
* `refetch()`: Function to refresh data.

***

### `useTenantId`

A hook for managing the currently selected tenant.

```ts
const [tenantId, setTenant] = useTenantId();
```

Returns:

* `tenantId`: The currently selected tenant ID.
* `setTenant(id: string)`: Function to change the active tenant.
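If the built-in dropdown doesn't fit your UI, these two hooks can drive a custom switcher. Below is a minimal sketch; it assumes `useTenants` and `useTenantId` are imported from `@niledatabase/react`, and that each tenant object exposes `id` and `name` fields (check the `Tenant` type in your version of the package).

```tsx
// Minimal custom tenant switcher built on the hooks documented above.
// Assumptions: both hooks are exported from @niledatabase/react, and each
// tenant object has `id` and `name` fields.
import { useTenants, useTenantId } from "@niledatabase/react";

export function MyTenantSwitcher() {
  const { data: tenants, isLoading } = useTenants();
  const [tenantId, setTenant] = useTenantId();

  if (isLoading) return <p>Loading tenants…</p>;
  if (!tenants?.length) return <p>No tenants yet.</p>;

  return (
    <select value={tenantId ?? ""} onChange={(e) => setTenant(e.target.value)}>
      {tenants.map((tenant) => (
        <option key={tenant.id} value={tenant.id}>
          {tenant.name}
        </option>
      ))}
    </select>
  );
}
```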
***

### `CreateTenant`

A component for creating a new organization.

```tsx
// `trigger` can be any React element; a plain button is used here for illustration
<CreateTenant
  onSuccess={(tenant) => console.log('Created:', tenant)}
  trigger={<button>Create Tenant</button>}
/>
```

#### Props

| Name        | Type                       | Description                        |
| ----------- | -------------------------- | ---------------------------------- |
| `trigger`   | `React.ReactElement`       | Element to trigger the modal.      |
| `onSuccess` | `(tenant: Tenant) => void` | Callback when a tenant is created. |
| `onError`   | `(error: Error) => void`   | Callback when creation fails.      |
| `fetchUrl`  | `string`                   | API endpoint for tenant creation.  |

***

## Behavior

1. **Tenant Selection**: Users can select an existing tenant from a dropdown.
2. **Tenant Creation**: If no tenant exists, users can create one.
3. **Persistent State**: The selected tenant is saved via cookies.
4. **Automatic Data Fetching**: Tenant data is fetched on component mount.

***

## Example UI

* If the user has tenants, they can switch between them.
* If the user has no tenants, they are prompted to create one.
* Clicking "Create Tenant" opens a form to enter an organization name.

## Theming

The `TenantSelector` component is styled using Tailwind. It can be customized using the `className` and `buttonText` props, or using Tailwind CSS theme variables. You can find more details and full examples in the [customization doc](/auth/components/customization). In addition, the hooks documented for each component can be used if you'd like to use Nile Auth with your own components.

If needed, you can peek at the [source code of the component](https://github.com/niledatabase/nile-js/tree/main/packages/react/src) and use it as a template to create your own component for maximum customization.

## Related Topics

* [User Management](/auth/concepts/users)
* [Tenants](/auth/concepts/tenants)
* [User Component](/auth/components/user)

# Sign In

Source: https://thenile.dev/docs/auth/components/signin

Learn how to use the Nile Auth Sign In component

export const component_0 = ""

Users can sign up and be entered into the database via email + password. A JWT is used (with a default expiry of 30 days). For Single Sign On, sessions are kept in the database in the `auth.session` table. `@tanstack/react-query` and `react-hook-form` are used to do much of the heavy lifting in the component. The component is styled via Tailwind.

## Installation

```bash
npm install @niledatabase/react @niledatabase/client
```

## Email + Password Sign In

Uses simple email and password fields, along with the `useSignIn()` hook, to sign a user in if the user exists. In most cases, you want to supply a `callbackUrl`, which is where the user will be redirected upon successful sign-in. Error handling works the same as it does in ``
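For a typical setup, you render the drop-in component and point `callbackUrl` at the page users should land on after signing in. A minimal sketch, assuming the component is exported as `SignIn` from `@niledatabase/react` (adjust the import if your version of the package exposes it under a different name):

```tsx
// Minimal sketch of an email + password sign-in page with a post-login redirect.
// Assumption: the component is exported as `SignIn` from @niledatabase/react.
import { SignIn } from "@niledatabase/react";

export default function SignInPage() {
  // callbackUrl is where the user is redirected after a successful sign-in
  return <SignIn callbackUrl="/dashboard" />;
}
```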