# Scoring in RediSearch

RediSearch comes with a few very basic scoring functions to evaluate document relevance. They are all based on document scores and term frequency, and are independent of the ability to use [sortable fields](Sorting.md). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query.

If you prefer a custom scoring function, you can add one using the [Extension API](Extensions.md).

These are the pre-bundled scoring functions available in RediSearch and how they work. Each function is referenced by its registered name, which can be passed as the `SCORER` argument to `FT.SEARCH`.
8
## TFIDF (default)

Basic [TF-IDF scoring](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) with a few extra features thrown in:

1. For each term in each result, we calculate the TF-IDF score of that term for that document. Frequencies are weighted based on pre-determined field weights, and each term's frequency is **normalized by the highest term frequency in the document**.

2. We multiply the total TF-IDF for the query terms by the a priori document score given on `FT.ADD`.

3. We apply a penalty to each result based on "slop", or cumulative distance between the search terms: exact matches get no penalty, but matches where the search terms are far apart see their score reduced significantly. For each 2-gram of consecutive terms, we find the minimal distance between them. The penalty is the inverse of the square root of the sum of the squared distances: `1/sqrt(d(t2-t1)^2 + d(t3-t2)^2 + ...)`.

So for N terms in document D, `T1...Tn`, the resulting score could be described with this Python function:
20
```py
from math import log2, sqrt

def get_score(terms, doc):
    # the sum of tf-idf
    score = 0

    # the cumulative squared distance between consecutive terms
    dist_penalty = 0

    for i, term in enumerate(terms):
        # tf normalized by the maximum term frequency in the document
        tf = doc.freq(term) / doc.max_freq

        # idf is global for the index, and not calculated each time in real life
        idf = log2(1 + total_docs / docs_with_term(term))

        score += tf * idf

        # sum up the squared distance penalty
        if i > 0:
            dist_penalty += min_distance(term, terms[i - 1]) ** 2

    # multiply the score by the a priori document score
    score *= doc.score

    # divide the score by the root of the cumulative distance
    if len(terms) > 1:
        score /= sqrt(dist_penalty)

    return score
```
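The pseudocode above depends on index internals (`doc.freq`, `min_distance`, and so on). A fully self-contained toy version, with the document statistics passed in explicitly and the first occurrence position of each term standing in for the minimal pairwise distance, might look like this. It is a sketch of the idea, not RediSearch's actual implementation:

```python
from math import log2, sqrt

def tfidf_score(terms, term_freqs, max_freq, doc_score,
                total_docs, docs_with_term, positions):
    """Toy TF-IDF with the slop penalty described above.

    term_freqs:     per-term frequency in the document
    docs_with_term: per-term document frequency in the index
    positions:      first position of each term in the document,
                    used as a stand-in for the minimal pairwise distance
    """
    score = 0.0
    dist_penalty = 0.0
    for i, term in enumerate(terms):
        # tf normalized by the maximum term frequency in the document
        tf = term_freqs[term] / max_freq
        idf = log2(1 + total_docs / docs_with_term[term])
        score += tf * idf
        # accumulate the squared distance between consecutive query terms
        if i > 0:
            dist_penalty += (positions[term] - positions[terms[i - 1]]) ** 2
    # multiply by the a priori document score
    score *= doc_score
    # divide by the root of the cumulative squared distance
    if len(terms) > 1:
        score /= sqrt(dist_penalty)
    return score
```

For two adjacent terms the distance penalty is `sqrt(1) = 1`, so an exact phrase match keeps its full TF-IDF score; spreading the terms apart grows the divisor.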
51
## TFIDF.DOCNORM

Identical to the default TFIDF scorer, with one important distinction:

Term frequencies are normalized by the length of the document, expressed as the total number of terms. The length is weighted, so that if a document contains two terms, one in a field with a weight of 1 and one in a field with a weight of 5, the total frequency is 6, not 2.

```
FT.SEARCH myIndex "foo" SCORER TFIDF.DOCNORM
```
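The weighted-length normalization can be sketched as follows. The function and field names are illustrative, not part of the RediSearch API:

```python
def docnorm_tf(term_freq, field_freqs, field_weights):
    """Term frequency normalized by the weighted document length,
    as TFIDF.DOCNORM does (sketch; names are illustrative)."""
    # weighted length: each occurrence counts by its field's weight
    doc_len = sum(freq * field_weights[field]
                  for field, freq in field_freqs.items())
    return term_freq / doc_len

# The example from the text: one occurrence in a weight-5 field and one
# in a weight-1 field give a weighted length of 1*5 + 1*1 = 6, so a term
# appearing once is normalized to 1/6 rather than 1/2.
tf = docnorm_tf(1, {"title": 1, "body": 1}, {"title": 5, "body": 1})
```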
61
## BM25

A variation on the basic TF-IDF scorer; see [this Wikipedia article for more info](https://en.wikipedia.org/wiki/Okapi_BM25).

As with TFIDF, we multiply the relevance score of each document by the a priori document score and apply a slop-based penalty.

```
FT.SEARCH myIndex "foo" SCORER BM25
```
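The core Okapi BM25 formula, with the same a priori document-score multiplier applied at the end, can be sketched like this. The `k1` and `b` values are the textbook defaults, not necessarily what RediSearch uses internally:

```python
from math import log

def bm25(term_freqs, doc_len, avg_doc_len, total_docs,
         docs_with_term, doc_score=1.0, k1=1.2, b=0.75):
    """Okapi BM25 sketch: sum over query terms, scaled by the a priori
    document score (illustrative, not RediSearch internals)."""
    score = 0.0
    for term, tf in term_freqs.items():
        n = docs_with_term[term]
        # probabilistic idf, smoothed to stay positive
        idf = log(1 + (total_docs - n + 0.5) / (n + 0.5))
        # tf saturation, normalized by document length relative to average
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score * doc_score
```

Unlike plain TF-IDF, the `norm` term saturates: the second occurrence of a term adds less than the first, so repeating a keyword many times has diminishing returns.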
71
## DISMAX

A simple scorer that sums up the frequencies of the matched terms; in the case of union clauses, it will give the maximum value of those matches. No other penalties or factors are applied.

It is not a one-to-one implementation of [Solr's DISMAX algorithm](https://wiki.apache.org/solr/DisMax) but follows it in broad terms.

```
FT.SEARCH myIndex "foo" SCORER DISMAX
```
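The sum-and-max behavior can be sketched over a toy query tree. The tuple-based tree here is purely illustrative; it is not RediSearch's actual query AST:

```python
def dismax_score(node, term_freqs):
    """DISMAX-style scoring sketch: sum frequencies across intersection
    clauses, take the max within a union (OR) clause.

    `node` is a toy query tree: a term string, an ("AND", [...]) tuple,
    or an ("OR", [...]) tuple -- illustrative, not RediSearch's parser.
    """
    if isinstance(node, str):
        # leaf: the matched term's frequency in the document
        return term_freqs.get(node, 0)
    op, children = node
    scores = [dismax_score(child, term_freqs) for child in children]
    return max(scores) if op == "OR" else sum(scores)

# For the query `hello (world|earth)`: hello's frequency is summed
# with the maximum frequency among the union's alternatives.
q = ("AND", ["hello", ("OR", ["world", "earth"])])
```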
81
## DOCSCORE

A scoring function that simply returns the a priori score of the document without applying any further calculations. Since document scores can be updated, this is useful if you'd like to use an external score and nothing else.

```
FT.SEARCH myIndex "foo" SCORER DOCSCORE
```
89
## HAMMING

Scoring by the (inverse) Hamming distance between the document's payload and the query payload. Since we are interested in the **nearest** neighbors, we invert the Hamming distance (`1/(1+d)`) so that a distance of 0 gives a perfect score of 1 and is ranked highest.

This works only if:

1. The document has a payload.
2. The query has a payload.
3. Both are **exactly the same length**.

Payloads are binary-safe, and payloads with a length that is a multiple of 64 bits yield slightly faster results.
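The score computation can be sketched in Python, counting differing bits via popcount (consistent with the 64-bit alignment note above). This is a sketch of the formula, not the actual implementation:

```python
def hamming_score(payload_a, payload_b):
    """Inverse Hamming score 1/(1+d) between two equal-length payloads
    (bytes objects). Distance d is the number of differing bits."""
    if len(payload_a) != len(payload_b):
        return 0.0  # the scorer requires payloads of exactly the same length
    d = sum(bin(x ^ y).count("1") for x, y in zip(payload_a, payload_b))
    return 1.0 / (1.0 + d)

# Matches the example below: "aaaabbbb" vs "aaaabbbc" differ only in
# the last byte ('b' ^ 'c' == 0x01), so d == 1 and the score is 0.5.
score = hamming_score(b"aaaabbbb", b"aaaabbbc")
```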

Example:

```
127.0.0.1:6379> FT.CREATE idx SCHEMA foo TEXT
OK
127.0.0.1:6379> FT.ADD idx 1 1 PAYLOAD "aaaabbbb" FIELDS foo hello
OK
127.0.0.1:6379> FT.ADD idx 2 1 PAYLOAD "aaaacccc" FIELDS foo bar
OK

127.0.0.1:6379> FT.SEARCH idx "*" PAYLOAD "aaaabbbc" SCORER HAMMING WITHSCORES
1) (integer) 2
2) "1"
3) "0.5" // Hamming distance of 1 --> 1/(1+1) == 0.5
4) 1) "foo"
   2) "hello"
5) "2"
6) "0.25" // Hamming distance of 3 --> 1/(1+3) == 0.25
7) 1) "foo"
   2) "bar"
```