# Extending RediSearch

RediSearch supports an extension mechanism, much like Redis supports modules. The API is very minimal at the moment, and it does not yet support loading extensions while the server is running. Instead, extensions must be written in C (or a language that has an interface with C) and compiled into dynamic libraries that are loaded when RediSearch starts.

There are two kinds of extension APIs at the moment:

1. **Query Expanders**, whose role is to expand query tokens (e.g. stemmers).
2. **Scoring Functions**, whose role is to rank search results at query time.

## Registering and loading extensions

Extensions should be compiled into .so files, and loaded into RediSearch on initialization of the module.

* Compiling

    Extensions should be compiled and linked as dynamic libraries. An example Makefile for an extension [can be found here](https://github.com/RediSearch/RediSearch/blob/master/tests/ctests/ext-example/Makefile).

    That folder also contains an example extension that is used for testing and can be taken as a skeleton for implementing your own extension.

* Loading

    Loading an extension is done by appending `EXTLOAD {path/to/ext.so}` after the `loadmodule` configuration directive when loading RediSearch. For example:

    ```sh
    $ redis-server --loadmodule ./redisearch.so EXTLOAD ./ext/my_extension.so
    ```

    This causes RediSearch to automatically load the extension and register its expanders and scorers.

## Initializing an extension

The entry point of an extension is a function with the signature:

```c
int RS_ExtensionInit(RSExtensionCtx *ctx);
```

When loading the extension, RediSearch looks for this function and calls it. This function is responsible for registering and initializing the expanders and scorers.

It should return REDISEARCH_ERR on error or REDISEARCH_OK on success.

### Example init function

```c
#include <redisearch.h> //must be in the include path

/* MyCustomScorer and MyExpander are the extension's own functions, defined elsewhere */

int RS_ExtensionInit(RSExtensionCtx *ctx) {

  /* Register a scoring function with an alias my_scorer and no special private data and free function */
  if (ctx->RegisterScoringFunction("my_scorer", MyCustomScorer, NULL, NULL) == REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  /* Register a query expander */
  if (ctx->RegisterQueryExpander("my_expander", MyExpander, NULL, NULL) ==
      REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  return REDISEARCH_OK;
}
```

## Calling your custom functions

When performing a query, you can tell RediSearch to use your scorers or expanders by specifying the SCORER or EXPANDER arguments, with the given alias. For example:

```
FT.SEARCH my_index "foo bar" EXPANDER my_expander SCORER my_scorer
```

**NOTE**: Expander and scorer aliases are **case sensitive**.

## The query expander API

At the moment, we only support basic query expansion, one token at a time. An expander can decide to expand any given token with as many tokens as it wishes; these are union-merged with the original token at query time.

The API for an expander is the following:

```c
#include <redisearch.h> //must be in the include path

void MyQueryExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    ...
}
```

### RSQueryExpanderCtx

RSQueryExpanderCtx is a context that contains private data of the extension, and a callback method to expand the query.
It is defined as:

```c
typedef struct RSQueryExpanderCtx {

  /* Opaque query object used internally by the engine, and should not be accessed */
  struct RSQuery *query;

  /* Opaque query node object used internally by the engine, and should not be accessed */
  struct RSQueryNode **currentNode;

  /* Private data of the extension, set on extension initialization */
  void *privdata;

  /* The language of the query, defaults to "english" */
  const char *language;

  /* ExpandToken allows the user to add an expansion of the token in the query, that will be
   * union-merged with the given token in query time. str is the expanded string, len is its length,
   * and flags is a 32 bit flag mask that can be used by the extension to set private information on
   * the token */
  void (*ExpandToken)(struct RSQueryExpanderCtx *ctx, const char *str, size_t len,
                      RSTokenFlags flags);

  /* SetPayload allows the query expander to set GLOBAL payload on the query (not unique per token)
   */
  void (*SetPayload)(struct RSQueryExpanderCtx *ctx, RSPayload payload);

} RSQueryExpanderCtx;
```

### RSToken

RSToken represents a single query token to be expanded and is defined as:

```c
/* A token in the query. The expanders receive query tokens and can expand the query with more query
 * tokens */
typedef struct {
  /* The token string - which may or may not be NULL terminated */
  const char *str;
  /* The token length */
  size_t len;

  /* 1 if the token is the result of query expansion */
  uint8_t expanded:1;

  /* Extension specific token flags that can be examined later by the scoring function */
  RSTokenFlags flags;
} RSToken;
```

## The scoring function API

A scoring function receives each document being evaluated by the query, for final ranking.
It has access to all the query terms that brought up the document, and to metadata about the document such as its a priori score, length, etc.

Since the scoring function is evaluated for each document, potentially millions of times, and since Redis is single-threaded, it is important that it runs as fast as possible and is heavily optimized.

A scoring function is applied to each potential result (per document) and is implemented with the following signature:

```c
double MyScoringFunction(RSScoringFunctionCtx *ctx, RSIndexResult *res,
                         RSDocumentMetadata *dmd, double minScore);
```

RSScoringFunctionCtx is a context that implements some helper methods.

RSIndexResult is the result information, containing the document id, frequency, terms, and offsets.

RSDocumentMetadata is an object holding global information about the document, such as its a priori score.

minScore is the minimal score that will yield a result relevant to the search. It can be used to stop processing midway, or before we even start.

The return value of the function is a double representing the final score of the result. Returning 0 causes the result to be counted, but if there are results with a score greater than 0, they will appear above it. To completely filter out a result and not count it in the totals, the scorer should return the special value `RS_SCORE_FILTEROUT` (which is internally set to negative infinity, or -1/0).

### RSScoringFunctionCtx

This is an object containing the following members:

* **void *privdata**: A pointer to an object set by the extension at initialization time.
* **RSPayload payload**: A Payload object set either by the query expander or the client.
* **int GetSlop(RSIndexResult *res)**: A callback method that yields the total minimal distance between the query terms.
This can be used to prefer results where the "slop" is smaller and the terms are nearer to each other.

### RSIndexResult

This is an object holding the information about the current result in the index, which is an aggregate of all the terms that resulted in the current document being considered a valid result.

See redisearch.h for details.

### RSDocumentMetadata

This is an object describing global information, unrelated to the current query, about the document being evaluated by the scoring function.

## Example query expander

This example query expander expands each token with the term foo:

```c
#include <string.h>     // strdup, strlen
#include <redisearch.h> //must be in the include path

void DummyExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    ctx->ExpandToken(ctx, strdup("foo"), strlen("foo"), 0x1337);
}
```

## Example scoring function

This is an actual scoring function, calculating TF-IDF for the document, multiplying that by the document score, and dividing that by the slop:

```c
#include <redisearch.h> //must be in the include path

double TFIDFScorer(RSScoringFunctionCtx *ctx, RSIndexResult *h, RSDocumentMetadata *dmd,
                   double minScore) {
  // no need to evaluate documents with score 0
  if (dmd->score == 0) return 0;

  // calculate sum(tf-idf) for each term in the result
  double tfidf = 0;
  for (int i = 0; i < h->numRecords; i++) {
    // take the term frequency and multiply by the term IDF, add that to the total
    tfidf += (float)h->records[i].freq * (h->records[i].term ? h->records[i].term->idf : 0);
  }
  // normalize by the maximal frequency of any term in the document
  tfidf /= (double)dmd->maxFreq;

  // multiply by the document score (between 0 and 1)
  tfidf *= dmd->score;

  // no need to factor the slop if tfidf is already below minimal score
  if (tfidf < minScore) {
    return 0;
  }

  // get the slop and divide the result by it, making sure we prefer results with closer terms
  tfidf /= (double)ctx->GetSlop(h);

  return tfidf;
}
```
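
The example scorer divides by `dmd->maxFreq` and by the value returned from `GetSlop`. This document does not say whether either value can be zero, so a cautious extension might guard both divisions. The following is a minimal sketch under that assumption. `NormalizeTfidf` and its parameter names are hypothetical and not part of `redisearch.h`; the function takes plain values so that it can be compiled and tested without the RediSearch headers.

```c
#include <stdint.h>

/* Hypothetical helper, not part of redisearch.h: applies the same
 * normalization steps as TFIDFScorer, but skips any division whose
 * divisor is not positive. */
double NormalizeTfidf(double tfidf, uint32_t maxFreq, double docScore,
                      int slop, double minScore) {
  /* Assumption: maxFreq could be 0; leave tfidf unchanged in that case */
  if (maxFreq > 0) {
    tfidf /= (double)maxFreq;
  }

  /* multiply by the document score */
  tfidf *= docScore;

  /* below the minimal score: return 0, as the example scorer does */
  if (tfidf < minScore) {
    return 0;
  }

  /* Assumption: slop could be 0; only divide when it is positive */
  if (slop > 0) {
    tfidf /= (double)slop;
  }

  return tfidf;
}
```

Inside a scorer, the summed TF-IDF could then be passed through it as `NormalizeTfidf(tfidf, dmd->maxFreq, dmd->score, ctx->GetSlop(h), minScore)`. Keeping the arithmetic in a plain function also makes it straightforward to unit test separately from the engine.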