# Extending RediSearch

RediSearch supports an extension mechanism, much like Redis supports modules. The API is very minimal at the moment, and extensions cannot yet be loaded dynamically while the module is running. Instead, extensions must be written in C (or a language that has an interface with C) and compiled into dynamic libraries that are loaded when RediSearch starts.

There are two kinds of extension APIs at the moment:

1. **Query Expanders**, whose role is to expand query tokens (e.g. stemmers).
2. **Scoring Functions**, whose role is to rank search results at query time.

## Registering and loading extensions

Extensions should be compiled into `.so` files and loaded into RediSearch when the module is initialized.

* Compiling

    Extensions should be compiled and linked as dynamic libraries. An example Makefile for an extension [can be found here](https://github.com/RediSearch/RediSearch/blob/master/tests/ctests/ext-example/Makefile).

    That folder also contains an example extension that is used for testing and can serve as a skeleton for implementing your own extension.

* Loading

    To load an extension, append `EXTLOAD {path/to/ext.so}` after the `loadmodule` configuration directive when loading RediSearch. For example:

    ```sh
    $ redis-server --loadmodule ./redisearch.so EXTLOAD ./ext/my_extension.so
    ```

    This causes RediSearch to automatically load the extension and register its expanders and scorers.

## Initializing an extension

The entry point of an extension is a function with the signature:

```c
int RS_ExtensionInit(RSExtensionCtx *ctx);
```

When loading the extension, RediSearch looks for this function and calls it. This function is responsible for registering and initializing the expanders and scorers.

It should return `REDISEARCH_ERR` on error or `REDISEARCH_OK` on success.

### Example init function

```c
#include <redisearch.h> // must be in the include path

/* MyCustomScorer and MyExpander are assumed to be defined elsewhere in the extension */

int RS_ExtensionInit(RSExtensionCtx *ctx) {

  /* Register a scoring function with the alias my_scorer, with no private data and no free function */
  if (ctx->RegisterScoringFunction("my_scorer", MyCustomScorer, NULL, NULL) == REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  /* Register a query expander with the alias my_expander */
  if (ctx->RegisterQueryExpander("my_expander", MyExpander, NULL, NULL) ==
      REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  return REDISEARCH_OK;
}
```

## Calling your custom functions

When performing a query, you can tell RediSearch to use your scorers or expanders by specifying the `SCORER` or `EXPANDER` arguments with the alias given at registration. For example:

```
FT.SEARCH my_index "foo bar" EXPANDER my_expander SCORER my_scorer
```

**NOTE**: Expander and scorer aliases are **case sensitive**.

## The query expander API

At the moment, we only support basic query expansion, one token at a time. An expander can expand any given token with as many tokens as it wishes, and these are union-merged with the original token at query time.

The API for an expander is the following:

```c
#include <redisearch.h> // must be in the include path

void MyQueryExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    ...
}
```

### RSQueryExpanderCtx

`RSQueryExpanderCtx` is a context that contains private data of the extension, and a callback method to expand the query. It is defined as:

```c
typedef struct RSQueryExpanderCtx {

  /* Opaque query object used internally by the engine, and should not be accessed */
  struct RSQuery *query;

  /* Opaque query node object used internally by the engine, and should not be accessed */
  struct RSQueryNode **currentNode;

  /* Private data of the extension, set on extension initialization */
  void *privdata;

  /* The language of the query, defaults to "english" */
  const char *language;

  /* ExpandToken allows the user to add an expansion of the token in the query, that will be
   * union-merged with the given token in query time. str is the expanded string, len is its length,
   * and flags is a 32 bit flag mask that can be used by the extension to set private information on
   * the token */
  void (*ExpandToken)(struct RSQueryExpanderCtx *ctx, const char *str, size_t len,
                      RSTokenFlags flags);

  /* SetPayload allows the query expander to set GLOBAL payload on the query (not unique per token)
   */
  void (*SetPayload)(struct RSQueryExpanderCtx *ctx, RSPayload payload);

} RSQueryExpanderCtx;
```

### RSToken

`RSToken` represents a single query token to be expanded and is defined as:

```c
/* A token in the query. The expanders receive query tokens and can expand the query with more query
 * tokens */
typedef struct {
  /* The token string - which may or may not be NULL terminated */
  const char *str;
  /* The token length */
  size_t len;

  /* 1 if the token is the result of query expansion */
  uint8_t expanded:1;

  /* Extension specific token flags that can be examined later by the scoring function */
  RSTokenFlags flags;
} RSToken;
```

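Because the token string may not be terminated, an expander should bound every read by `len`. The following is a minimal sketch of that idea, not an implementation taken from RediSearch. It uses trimmed stand-in types so that it compiles on its own (a real extension would include `redisearch.h` instead), and `PluralExpander`, `RecordExpansion`, `lastExpansion` and the flag value `0x1` are hypothetical names introduced for illustration. It also assumes that the engine takes ownership of the string passed to `ExpandToken`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the redisearch.h types, trimmed to the members used here
 * (assumption: a real extension would include <redisearch.h> instead). */
typedef uint32_t RSTokenFlags;

typedef struct {
  const char *str;
  size_t len;
  RSTokenFlags flags;
} RSToken;

typedef struct RSQueryExpanderCtx {
  void *privdata;
  const char *language;
  void (*ExpandToken)(struct RSQueryExpanderCtx *ctx, const char *str, size_t len,
                      RSTokenFlags flags);
} RSQueryExpanderCtx;

/* Hypothetical expander: for a token ending in 's', also match the form without it. */
void PluralExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
  assert(ctx != NULL && token != NULL);

  /* Read at most token->len bytes; never rely on a terminating NUL. */
  if (token->len < 2 || token->str[token->len - 1] != 's') return;

  size_t n = token->len - 1;
  char *singular = malloc(n + 1);
  if (singular == NULL) return; /* out of memory: leave the token unexpanded */
  memcpy(singular, token->str, n);
  singular[n] = '\0';

  /* Assumption: the callee takes ownership of the heap-allocated string. */
  ctx->ExpandToken(ctx, singular, n, 0x1 /* hypothetical private flag */);
}

/* Test double standing in for the engine's ExpandToken: it records the last
 * expansion so the sketch can be exercised outside RediSearch. */
char lastExpansion[64];

void RecordExpansion(RSQueryExpanderCtx *ctx, const char *str, size_t len,
                     RSTokenFlags flags) {
  (void)ctx;
  (void)flags;
  if (len >= sizeof lastExpansion) len = sizeof lastExpansion - 1;
  memcpy(lastExpansion, str, len);
  lastExpansion[len] = '\0';
  free((void *)str); /* the double owns the string, as the engine is assumed to */
}
```

With `ExpandToken` pointed at `RecordExpansion`, passing a token whose `str` is `"cats and dogs"` and whose `len` is 4 records the expansion `cat`, even though the byte after the token is a space rather than a terminator.
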
## The scoring function API

A scoring function receives each document being evaluated by the query, for final ranking.
It has access to all the query terms that brought up the document, and to metadata about the
document such as its a priori score, length, etc.

Since the scoring function is evaluated for each document, potentially millions of times, and since
Redis is single-threaded, it is important that it runs as fast as possible and is heavily optimized.

A scoring function is applied to each potential result (per document) and is implemented with the following signature:

```c
double MyScoringFunction(RSScoringFunctionCtx *ctx, RSIndexResult *res,
                         RSDocumentMetadata *dmd, double minScore);
```

`RSScoringFunctionCtx` is a context that implements some helper methods.

`RSIndexResult` is the result information, containing the document id, frequency, terms, and offsets.

`RSDocumentMetadata` is an object holding global information about the document, such as its a priori score.

`minScore` is the minimal score that will yield a result relevant to the search. It can be used to stop processing midway, or before we even start.

The return value of the function is a double representing the final score of the result.
Returning 0 causes the result to be counted, but if there are results with a score greater than 0, they will appear above it.
To completely filter out a result and not count it in the totals, the scorer should return the special value `RS_SCORE_FILTEROUT` (which is internally set to negative infinity, or -1/0).

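The three kinds of return value described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than code from RediSearch: the types and the `RS_SCORE_FILTEROUT` macro are local stand-ins so that the block compiles without `redisearch.h`, the `score` member mirrors the `dmd->score` field used in the example scorer in this document, and `DocScoreOrFilter` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Stand-ins for the redisearch.h definitions (assumptions, trimmed for the sketch). */
#define RS_SCORE_FILTEROUT (-INFINITY)
typedef struct RSIndexResult RSIndexResult; /* opaque in this sketch */
typedef struct {
  float score; /* a priori document score */
} RSDocumentMetadata;
typedef struct RSScoringFunctionCtx {
  void *privdata;
} RSScoringFunctionCtx;

/* Hypothetical scorer that ranks purely by the a priori document score. */
double DocScoreOrFilter(RSScoringFunctionCtx *ctx, RSIndexResult *res,
                        RSDocumentMetadata *dmd, double minScore) {
  (void)ctx;
  (void)res;
  assert(dmd != NULL);

  /* Filtered out: the result is dropped and not counted in the totals. */
  if (dmd->score <= 0) return RS_SCORE_FILTEROUT;

  /* Below the minimal relevant score: still counted, but ranked below
   * every result with a positive score. */
  if (dmd->score < minScore) return 0;

  return dmd->score;
}
```
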
### RSScoringFunctionCtx

This is an object containing the following members:

* **void \*privdata**: A pointer to an object set by the extension at initialization time.
* **RSPayload payload**: A payload object set either by the query expander or the client.
* **int GetSlop(RSIndexResult \*res)**: A callback method that yields the total minimal distance between the query terms. This can be used to prefer results where the "slop" is smaller and the terms are nearer to each other.

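As a rough illustration of how `privdata` and `GetSlop` might be combined, here is a minimal sketch under stated assumptions. The structs are local stand-ins so that the block compiles without `redisearch.h`; the `slop` field on the stand-in `RSIndexResult`, the `ProximityConfig` type, `ProximityScorer` and `StubGetSlop` are all hypothetical and exist only to drive the example. The clamp on the slop value is a defensive choice of this sketch, not documented engine behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the redisearch.h types (assumptions, trimmed for the sketch). */
typedef struct RSIndexResult {
  int slop; /* stand-in field used by the stub below, not part of the real struct */
} RSIndexResult;
typedef struct {
  float score; /* a priori document score */
} RSDocumentMetadata;
typedef struct RSScoringFunctionCtx {
  void *privdata;
  int (*GetSlop)(RSIndexResult *res);
} RSScoringFunctionCtx;

/* Hypothetical private data that the extension would set when registering the scorer. */
typedef struct {
  double boost;
} ProximityConfig;

/* Hypothetical scorer: boosted document score, divided by the slop so that
 * results whose terms are closer together rank higher. */
double ProximityScorer(RSScoringFunctionCtx *ctx, RSIndexResult *res,
                       RSDocumentMetadata *dmd, double minScore) {
  (void)minScore;
  assert(ctx != NULL && ctx->privdata != NULL && ctx->GetSlop != NULL);
  const ProximityConfig *cfg = ctx->privdata;

  int slop = ctx->GetSlop(res);
  if (slop < 1) slop = 1; /* defensive: never divide by zero */

  return cfg->boost * dmd->score / slop;
}

/* Stub standing in for the engine's GetSlop callback. */
int StubGetSlop(RSIndexResult *res) { return res->slop; }
```

With a boost of 2.0, a document score of 0.5 and a slop of 4, the sketch returns 2.0 * 0.5 / 4 = 0.25.
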
### RSIndexResult

This is an object holding the information about the current result in the index, which is an aggregate of all the terms that resulted in the current document being considered a valid result.

See `redisearch.h` for details.

### RSDocumentMetadata

This is an object describing global information, unrelated to the current query, about the document being evaluated by the scoring function.

## Example query expander

This example query expander expands each token with the term `foo`:

```c
#include <string.h>     // strdup, strlen
#include <redisearch.h> // must be in the include path

void DummyExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    // pass a heap-allocated copy of the expansion, tagged with a private flag value
    ctx->ExpandToken(ctx, strdup("foo"), strlen("foo"), 0x1337);
}
```

## Example scoring function

This is an actual scoring function, calculating TF-IDF for the document, multiplying that by the document score, and dividing that by the slop:

```c
#include <redisearch.h> // must be in the include path

double TFIDFScorer(RSScoringFunctionCtx *ctx, RSIndexResult *h, RSDocumentMetadata *dmd,
                   double minScore) {
  // no need to evaluate documents with score 0
  if (dmd->score == 0) return 0;

  // calculate sum(tf-idf) for each term in the result
  double tfidf = 0;
  for (int i = 0; i < h->numRecords; i++) {
    // take the term frequency and multiply by the term IDF, add that to the total
    tfidf += (double)h->records[i].freq * (h->records[i].term ? h->records[i].term->idf : 0);
  }
  // normalize by the maximal frequency of any term in the document
  tfidf /= (double)dmd->maxFreq;

  // multiply by the document score (between 0 and 1)
  tfidf *= dmd->score;

  // no need to factor the slop if tfidf is already below minimal score
  if (tfidf < minScore) {
    return 0;
  }

  // get the slop and divide the result by it, making sure we prefer results with closer terms
  tfidf /= (double)ctx->GetSlop(h);

  return tfidf;
}
```

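To make the arithmetic of this scorer concrete, the same formula can be sketched on plain numbers. This is a standalone illustration, not part of the extension API: `TfidfFromValues` and its parameters are hypothetical, and plain arrays stand in for the per-term records of `RSIndexResult`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that mirrors the steps of the scorer on plain values:
 * sum(freq * idf) / maxFreq * docScore / slop, with the same early exits. */
double TfidfFromValues(const double *freq, const double *idf, size_t n, double maxFreq,
                       double docScore, int slop, double minScore) {
  assert(maxFreq > 0 && slop > 0);

  /* documents with score 0 are not evaluated */
  if (docScore == 0) return 0;

  double tfidf = 0;
  for (size_t i = 0; i < n; i++) {
    tfidf += freq[i] * idf[i];
  }
  tfidf /= maxFreq;  /* normalize by the maximal term frequency */
  tfidf *= docScore; /* weigh by the document score */

  /* skip the slop step when the score is already below the minimum */
  if (tfidf < minScore) return 0;

  return tfidf / slop;
}
```

For two terms with frequencies 2 and 1 and IDF values 1.5 and 0.5, the sum is 3.5. With a maximal frequency of 4 this becomes 0.875, a document score of 0.5 brings it to 0.4375, and a slop of 2 gives a final score of 0.21875.
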