Similarity utils
Similarity utilities for search and discovery.
compute_bm25_score(query, document, additional_context=None)
Compute BM25 score manually for a query and document, optionally including additional context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | Query string to search for. | required |
| document | | Document string to search in. | required |
| additional_context | | Optional additional context string to consider. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| | BM25 similarity score between query and document (or best match with additional context). |
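The source of `compute_bm25_score` is not reproduced on this page, so the following is only a minimal sketch of how a single-document BM25 score could be computed by hand. The names `bm25_sketch` and `tokenize`, the parameters `k1`, `b`, and `avgdl`, the constant IDF for a one-document corpus, and the reading of "best match" as the higher of two scores are all assumptions made for illustration, not the library's actual behavior.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (an assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_sketch(query, document, additional_context=None,
                k1=1.5, b=0.75, avgdl=100.0):
    """Hypothetical single-document BM25; not the library's implementation."""
    query_terms = set(tokenize(query))
    # With a corpus of one document (N = 1, n = 1) the common IDF form
    # ln(1 + (N - n + 0.5) / (n + 0.5)) is the same constant for every term.
    idf = math.log(1 + 0.5 / 1.5)

    def score(text):
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Length normalization against an assumed average document length.
        length_norm = 1 - b + b * len(tokens) / avgdl
        total = 0.0
        for term in query_terms:
            f = tf[term]
            if f:
                total += idf * f * (k1 + 1) / (f + k1 * length_norm)
        return total

    candidates = [score(document)]
    if additional_context:
        # Assumption: "best match" means the higher of the document-only
        # score and the score of the document combined with the context.
        candidates.append(score(document + " " + additional_context))
    return max(candidates)
```

Under these assumptions, a query term that appears only in `additional_context` still yields a positive score, while a query with no matching terms anywhere scores 0.0.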
Source code in blue/utils/similarity_utils.py
compute_vector_score(query_vector, doc_vector, normalize_score=True)
Compute semantic similarity between two embedding vectors using cosine similarity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_vector | Union[bytes, ndarray] | Query embedding vector as bytes or numpy array. | required |
| doc_vector | Union[bytes, ndarray] | Document embedding vector as bytes or numpy array. | required |
| normalize_score | bool | Whether to normalize score to [0,1] range. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| float | Similarity score in [0,1] if normalize_score=True, otherwise [-1,1]. |
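As a rough illustration of the behavior described above, the sketch below computes cosine similarity for inputs given as either raw bytes or arrays. It is not the library's code: the helper names `to_array` and `cosine_score_sketch`, the `float32` byte layout, the `(cos + 1) / 2` mapping onto [0,1], and the choice to return 0.0 for a zero-length vector are all assumptions.

```python
import numpy as np


def to_array(vec, dtype=np.float32):
    """Decode bytes as a flat array (float32 layout is an assumption)."""
    if isinstance(vec, (bytes, bytearray)):
        return np.frombuffer(vec, dtype=dtype).astype(np.float64)
    return np.asarray(vec, dtype=np.float64)


def cosine_score_sketch(query_vector, doc_vector, normalize_score=True):
    """Hypothetical cosine similarity; not the library's implementation."""
    q = to_array(query_vector)
    d = to_array(doc_vector)
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    if denom == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    cos = float(np.dot(q, d) / denom)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    # Assumed normalization: shift and scale [-1, 1] onto [0, 1].
    return (cos + 1.0) / 2.0 if normalize_score else cos
```

With this mapping, identical vectors score 1.0, orthogonal vectors 0.5, and opposite vectors 0.0 when `normalize_score=True`.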
Source code in blue/utils/similarity_utils.py
normalize_bm25_scores(scores, method='minmax', max_score=20.0)
Normalize BM25 scores using different methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | | List of BM25 scores to normalize. | required |
| method | | Normalization method ('linear', 'log', 'minmax'). Defaults to 'minmax'. | 'minmax' |
| max_score | | Maximum score for linear normalization. Defaults to 20.0. | 20.0 |
Returns:
| Type | Description |
|---|---|
| | List of normalized scores in the same order as input scores. |
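The exact formulas behind the three methods are not shown on this page. The sketch below gives one plausible reading of each, purely as an assumption: 'linear' divides by `max_score` and clips to [0,1], 'log' uses `log1p` scaled by `log1p(max_score)`, and 'minmax' rescales by the observed range. The function name `normalize_scores_sketch` and the handling of empty or constant inputs are likewise hypothetical.

```python
import math


def normalize_scores_sketch(scores, method="minmax", max_score=20.0):
    """Hypothetical normalizer; the library's formulas may differ."""
    if not scores:
        return []
    if method == "linear":
        # Assumed: divide by max_score and clip to [0, 1].
        return [min(max(s / max_score, 0.0), 1.0) for s in scores]
    if method == "log":
        # Assumed: compress large scores with log1p, scaled by the cap.
        cap = math.log1p(max_score)
        return [min(math.log1p(max(s, 0.0)) / cap, 1.0) for s in scores]
    if method == "minmax":
        lo, hi = min(scores), max(scores)
        if hi == lo:
            # Assumed: constant input maps to 1.0 if positive, else 0.0.
            return [1.0 if hi > 0 else 0.0 for _ in scores]
        return [(s - lo) / (hi - lo) for s in scores]
    raise ValueError(f"unknown method: {method!r}")
```

One design point worth noting: 'minmax' depends on the other scores in the list, so the same raw score can normalize differently across result sets, whereas the 'linear' and 'log' variants sketched here depend only on `max_score`.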
Source code in blue/utils/similarity_utils.py