feat: add drop parameter to OneHotEncoder (#913) #928

Open
BALOGUN-DAVID wants to merge 2 commits into feature-engine:main from BALOGUN-DAVID:main
Conversation

@BALOGUN-DAVID

Closes #913

Description

This PR adds a drop parameter to the OneHotEncoder to allow users to control which dummy category is dropped when performing k-1 encoding. Previously, only the last category in the unique insertion order could be dropped via drop_last=True.

New drop strategies:

  • 'last': Drops the last category in alphabetical order.
  • 'first': Drops the first category in alphabetical order.
  • 'most_frequent': Drops the most frequent category found during fit(). If several categories tie for most frequent, a UserWarning is issued and the alphabetically first of the tied categories is dropped.
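
For readers who want to see the three strategies side by side, here is a minimal standalone sketch of the selection rule described above. It is not the code in this PR: `choose_dropped_category` is a hypothetical helper written only for illustration, it uses the standard library rather than pandas, and it assumes categories are plain strings.

```python
import warnings
from collections import Counter


def choose_dropped_category(values, drop):
    """Pick the category to drop (hypothetical helper, not the PR's code)."""
    if not values:
        raise ValueError("values must not be empty")

    # Alphabetical order of the distinct categories seen in the data.
    categories = sorted(set(values))

    if drop == "first":
        return categories[0]
    if drop == "last":
        return categories[-1]
    if drop == "most_frequent":
        counts = Counter(values)
        top = max(counts.values())
        # `categories` is sorted, so `tied` is in alphabetical order too.
        tied = [c for c in categories if counts[c] == top]
        if len(tied) > 1:
            warnings.warn(
                f"Categories {tied} are tied for most frequent; "
                f"dropping {tied[0]!r}.",
                UserWarning,
            )
        return tied[0]
    raise ValueError("drop must be 'first', 'last' or 'most_frequent'")


colours = ["red", "blue", "red", "green"]
print(choose_dropped_category(colours, "first"))          # blue
print(choose_dropped_category(colours, "last"))           # red
print(choose_dropped_category(colours, "most_frequent"))  # red
```

With a tie, for example `["red", "blue", "red", "blue"]`, the sketch warns and returns `"blue"`, the alphabetically first of the tied categories.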

Backward Compatibility:
The drop_last parameter is fully preserved for backward compatibility. If both drop_last and drop are specified, drop takes precedence and a FutureWarning is emitted.
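
The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `resolve_drop` is a hypothetical helper, the defaults `drop=None` and `drop_last=False` are assumed, and "both specified" is read as `drop` being set while `drop_last=True`. The string `"legacy_drop_last"` is only a placeholder for the existing `drop_last=True` behaviour.

```python
import warnings


def resolve_drop(drop=None, drop_last=False):
    """Decide which setting wins (hypothetical helper, assumed defaults)."""
    if drop is not None:
        if drop_last:
            # Both were set: `drop` wins and the caller is warned.
            warnings.warn(
                "Both `drop` and `drop_last` were set; `drop` takes precedence.",
                FutureWarning,
            )
        return drop
    # Only the legacy flag is in play: keep the old behaviour unchanged.
    return "legacy_drop_last" if drop_last else None


print(resolve_drop(drop="first"))                  # first
print(resolve_drop(drop_last=True))                # legacy_drop_last
print(resolve_drop(drop="first", drop_last=True))  # first (plus a FutureWarning)
```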

Checklist

  • Added drop parameter and implementation logic
  • Added FutureWarning when both drop and drop_last are set
  • Updated docstrings for the new parameter
  • Added comprehensive tests for all drop strategies and tie-breaking
  • All tests and estimator checks pass

@BALOGUN-DAVID
Author

Hi @solegalli, it's me again! 👋

I've pushed the fixes and the PR is now ready for your review.

This adds the drop parameter to OneHotEncoder to allow dropping the 'first', 'last', or 'most_frequent' category, while preserving full backward compatibility for drop_last (as requested in #913). All tests and estimator checks are passing locally.

Let me know if you need any adjustments or if this is good to merge!

Development

Successfully merging this pull request may close these issues.

Add drop parameter to OneHotEncoder to control which dummy category is dropped