# Scoring in RediSearch

RediSearch comes with a few very basic scoring functions to evaluate document relevance. They are all based on document scores and term frequency, and they work independently of sorting by [sortable fields](Sorting.md). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query.

If you prefer a custom scoring function, it is possible to add more functions using the [Extension API](Extensions.md).

These are the pre-bundled scoring functions available in RediSearch and how they work. Each function is listed under its registered name, which can be passed as the `SCORER` argument in `FT.SEARCH`.

## TFIDF (default)

Basic [TF-IDF scoring](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) with a few extra features thrown inside:

1. For each term in each result, we calculate the TF-IDF score of that term to that document. Frequencies are weighted based on field weights that are pre-determined, and each term's frequency is **normalized by the highest term frequency in each document**.

2. We multiply the total TF-IDF for the query term by the a priori document score given on `FT.ADD`.

3. We give a penalty to each result based on "slop" or cumulative distance between the search terms: exact matches will get no penalty, but matches where the search terms are distant see their score reduced significantly. For each 2-gram of consecutive terms, we find the minimal distance between them. The score is then divided by the square root of the sum of the squared distances, that is, multiplied by `1/sqrt(d(t2-t1)^2 + d(t3-t2)^2 + ...)`.
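
The slop penalty in step 3 can be illustrated in isolation. The following is a minimal sketch, not RediSearch's implementation: it assumes each query term comes with a list of token offsets inside the document, that distance means the absolute gap between offsets, and that distinct terms never share an offset. The names `min_distance` and `slop_penalty` are hypothetical.

```python
from math import sqrt


def min_distance(positions_a, positions_b):
    """Smallest absolute gap between any offset of one term and any of the other.

    Assumption: positions are token offsets within the same document.
    """
    return min(abs(a - b) for a in positions_a for b in positions_b)


def slop_penalty(term_positions):
    """Return the factor 1/sqrt(sum of squared minimal distances).

    term_positions holds one list of offsets per query term, in query order.
    """
    if len(term_positions) < 2:
        return 1.0  # a single term has no neighbour, so nothing to penalize

    total = sum(
        min_distance(prev, cur) ** 2
        for prev, cur in zip(term_positions, term_positions[1:])
    )
    return 1.0 / sqrt(total)


# Two adjacent terms: distance 1, so the factor is 1/sqrt(1).
adjacent = slop_penalty([[4], [5]])

# Three spread-out terms: distances 3 and 4, so the factor is 1/sqrt(9 + 16).
spread = slop_penalty([[4], [7], [11]])
```

Under these assumptions `adjacent` is 1.0 (no reduction) and `spread` is 0.2, which shows how quickly distant terms pull a score down.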

So for N terms in document D, `T1...Tn`, the resulting score could be described with this Python function:

```py
from math import log2, sqrt


def get_score(terms, doc, total_docs, docs_with_term, min_distance):
    """Illustrative only: the arguments stand in for index internals."""
    # the sum of tf-idf
    score = 0

    # the distance penalty for all terms
    dist_penalty = 0

    for i, term in enumerate(terms):
        # tf normalized by maximum frequency
        tf = doc.freq(term) / doc.max_freq

        # idf is global for the index, and not calculated each time in real life
        idf = log2(1 + total_docs / docs_with_term(term))

        score += tf * idf

        # sum up the distance penalty
        if i > 0:
            dist_penalty += min_distance(term, terms[i - 1]) ** 2

    # multiply the score by the document score
    score *= doc.score

    # divide the score by the root of the cumulative distance
    if len(terms) > 1 and dist_penalty > 0:
        score /= sqrt(dist_penalty)

    return score
```

## TFIDF.DOCNORM

Identical to the default TFIDF scorer, with one important distinction:

Term frequencies are normalized by the length of the document (expressed as the total number of terms). The length is weighted, so that if a document contains two terms, one in a field that has a weight of 1 and one in a field with a weight of 5, the total length counts as 6, not 2.

```
FT.SEARCH myIndex "foo" SCORER TFIDF.DOCNORM
```

## BM25

A variation on the basic TF-IDF scorer; see [this Wikipedia article for more info](https://en.wikipedia.org/wiki/Okapi_BM25).

We also multiply the relevance score for each document by the a priori document score and apply a penalty based on slop as in TFIDF.

```
FT.SEARCH myIndex "foo" SCORER BM25
```

## DISMAX

A simple scorer that sums up the frequencies of the matched terms; in the case of union clauses, it will give the maximum value of those matches. No other penalties or factors are applied.

It is not a one-to-one implementation of [Solr's DISMAX algorithm](https://wiki.apache.org/solr/DisMax) but follows it in broad terms.
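
As a rough illustration of the rule above, here is a minimal sketch that walks a small result tree. The `Term`, `Intersection` and `Union` classes are hypothetical and exist only for this example. Treating an intersection as the sum of its children is an assumption drawn from "sums up the frequencies of the matched terms", not a statement about RediSearch's internal structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    freq: int  # occurrences of the matched term in the document


@dataclass
class Intersection:
    children: List[object]  # all children matched (e.g. `hello world`)


@dataclass
class Union:
    children: List[object]  # alternatives (e.g. `world|earth`)


def dismax_score(node):
    """Sum frequencies, except under a union, where only the best child counts."""
    if isinstance(node, Term):
        return node.freq
    scores = [dismax_score(child) for child in node.children]
    if isinstance(node, Union):
        return max(scores, default=0)
    return sum(scores)


# hello (world|earth), where "hello" occurs twice, "world" three times
# and "earth" once in the document.
result = Intersection([Term(2), Union([Term(3), Term(1)])])
score = dismax_score(result)  # 2 + max(3, 1)
```

With these made-up frequencies the score is 5: the union contributes only its strongest alternative, and no document score or slop penalty is applied.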

```
FT.SEARCH myIndex "foo" SCORER DISMAX
```

## DOCSCORE

A scoring function that just returns the a priori score of the document without applying any calculations to it. Since document scores can be updated, this can be useful if you'd like to use an external score and nothing further.

```
FT.SEARCH myIndex "foo" SCORER DOCSCORE
```

## HAMMING

Scores documents by the (inverse) Hamming distance between each document's payload and the query payload. Since we are interested in the **nearest** neighbors, we invert the Hamming distance (`1/(1+d)`) so that a distance of 0 gives a perfect score of 1 and the highest rank.

This works only if:

1. The document has a payload.
2. The query has a payload.
3. Both are **exactly the same length**.

Payloads are binary-safe, and having payloads with a length that's a multiple of 64 bits yields slightly faster results.

Example:

```
127.0.0.1:6379> FT.CREATE idx SCHEMA foo TEXT
OK
127.0.0.1:6379> FT.ADD idx 1 1 PAYLOAD "aaaabbbb" FIELDS foo hello
OK
127.0.0.1:6379> FT.ADD idx 2 1 PAYLOAD "aaaacccc" FIELDS foo bar
OK

127.0.0.1:6379> FT.SEARCH idx "*" PAYLOAD "aaaabbbc" SCORER HAMMING WITHSCORES
1) (integer) 2
2) "1"
3) "0.5"   // hamming distance of 1 --> 1/(1+1) == 0.5
4) 1) "foo"
   2) "hello"
5) "2"
6) "0.25" // hamming distance of 3 --> 1/(1+3) == 0.25
7) 1) "foo"
   2) "bar"
```
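
The arithmetic in the example can be checked with a short sketch. It assumes the distance is counted bit by bit over the raw payload bytes, and it raises an error on mismatched lengths purely for illustration; what the scorer itself does in that case is not covered here. The function names are hypothetical.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must be exactly the same length")
    # XOR leaves a 1 bit wherever the two bytes disagree.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def hamming_score(doc_payload: bytes, query_payload: bytes) -> float:
    """Invert the distance so that identical payloads score 1.0."""
    return 1.0 / (1 + hamming_distance(doc_payload, query_payload))


query = b"aaaabbbc"
score_doc1 = hamming_score(b"aaaabbbb", query)  # one differing bit
score_doc2 = hamming_score(b"aaaacccc", query)  # three differing bits
```

The letters `b` (0x62) and `c` (0x63) differ in a single bit, so the first payload is at distance 1 and the second at distance 3, giving 0.5 and 0.25 as in the output above.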