
fix: sort labels by str() key to handle mixed int/str label types #1872

Open
YousefZahran1 wants to merge 1 commit into evidentlyai:main from YousefZahran1:youssef/fix-classification-mixed-label-types


What

ClassificationQualityMetric and ClassificationConfusionMatrix crash with TypeError: '<' not supported between instances of 'str' and 'int' when classification labels mix integer-like strings (e.g. '101', '102') with non-numeric strings (e.g. 'foo', 'bar').

Why

Evidently's internal label_key_valudator coerces any label value that can be cast to int — including string labels like '101' — into an actual int. When other labels remain as str (e.g. 'foo', 'bar'), the labels list contains mixed types. Python 3's sorted() cannot compare int and str, so calculate_matrix crashes on line 324.
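
The failure mode can be reproduced without Evidently. The snippet below is a minimal sketch: `coerce_label` is a hypothetical stand-in for the coercion described above, not Evidently's actual validator, and it only assumes that int-castable strings become int while everything else stays str.

```python
from typing import List, Union

Label = Union[int, str]


def coerce_label(value: str) -> Label:
    """Hypothetical stand-in for the described coercion (not Evidently's code)."""
    try:
        return int(value)  # '101' becomes 101
    except ValueError:
        return value  # 'foo' stays 'foo'


raw_labels = ["foo", "bar", "fun", "101", "102"]
labels: List[Label] = [coerce_label(v) for v in raw_labels]

# A plain sorted() has to compare an int with a str at some point and raises.
try:
    sorted(labels)
    sort_error = None
except TypeError as exc:
    sort_error = str(exc)

print(labels)
print(sort_error)
```

The exact operand order in the error message ('str' and 'int' versus 'int' and 'str') depends on which pair the sort happens to compare first.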

Reproduce

from evidently.report import Report
from evidently.metrics import ClassificationQualityMetric, ClassificationConfusionMatrix
import pandas as pd

label_target  = ['foo', 'bar', 'fun', 'foo', 'fun', 'foo', '101', '102']
label_predict = ['foo', 'bar', 'fun', 'bar', 'fun', 'fun', '101', '101']
data_df = pd.DataFrame({'target': label_target, 'prediction': label_predict}, dtype='string')

report = Report(metrics=[ClassificationQualityMetric(), ClassificationConfusionMatrix()])
report.run(reference_data=None, current_data=data_df)
# TypeError: '<' not supported between instances of 'str' and 'int'

Fix

Use sorted(labels, key=str) instead of sorted(labels) in calculate_matrix. This sorts labels by their string representation, avoiding the cross-type comparison entirely while keeping the order deterministic. There is no API change, and the order is unchanged for all-string labels. All-integer labels, however, are now ordered lexicographically (e.g. 1, 10, 2) rather than numerically.
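
As a rough illustration of what the key function does, the sketch below sorts a mixed list and an all-integer list with key=str. The label values are made up for the example. Only the comparison key changes, so the returned elements keep their original types.

```python
# Mixed int/str labels, as they might look after coercion.
mixed_labels = ["foo", 101, "bar", 102, "fun"]

# Elements are compared via str(label), so int and str never meet in a '<'.
ordered = sorted(mixed_labels, key=str)
print(ordered)  # [101, 102, 'bar', 'foo', 'fun']

# With key=str, integers are ordered by their text form, not their value.
int_labels = [2, 10, 1]
print(sorted(int_labels))           # [1, 2, 10]
print(sorted(int_labels, key=str))  # [1, 10, 2]
```

The second case is worth keeping in mind when reading the axis order of a confusion matrix built from integer labels with different digit counts.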

Testing

# After fix — report runs without error:
report.run(reference_data=None, current_data=data_df)
print("SUCCESS")  # ✓
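
A regression check along these lines could complement the manual run. It is only a sketch: `sort_labels` is a hypothetical helper that mirrors the one-line change, not a function from Evidently, and the check exercises the sorting step in isolation rather than the full report.

```python
import itertools
from typing import Iterable, List, Union

Label = Union[int, str]


def sort_labels(labels: Iterable[Label]) -> List[Label]:
    """Hypothetical helper mirroring the change; not an Evidently function."""
    return sorted(labels, key=str)


def check_mixed_labels_sort() -> bool:
    """Return True if every input order of mixed labels sorts to the same result."""
    labels = ["foo", "bar", 101, 102]
    expected = [101, 102, "bar", "foo"]
    return all(
        sort_labels(perm) == expected
        for perm in itertools.permutations(labels)
    )


print(check_mixed_labels_sort())
```

Checking every permutation confirms that the result does not depend on the order in which labels are encountered, as long as no two labels share the same string form.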

Fixes #1085

When classification labels contain values like '101' or '102', the
internal label_key_valudator coerces them to int while other labels
(e.g. 'foo', 'bar') remain as str. Calling sorted() on this mixed-type
list raises TypeError in Python 3 because int and str are not orderable.

Fix: use sorted(labels, key=str) in calculate_matrix. Labels are then
always compared by their string representation, so the sort succeeds
regardless of whether individual labels happen to be int or str.

Fixes evidentlyai#1085
Development

Successfully merging this pull request may close these issues.

Bug: Classification metrics do not support label names containing numbers
