Fix explanation by Abdoulaye-SAKHO · Pull Request #47 · artefactory/woodtapper

Abdoulaye-SAKHO · 2026-06-15T08:22:49Z

No description provided.

gemini-code-assist

Code Review

This pull request refactors the DataFrame creation in woodtapper/example_sampling/base.py by using np.column_stack to combine features and targets instead of concatenating separate DataFrames. Feedback highlights that using np.column_stack forces a homogeneous NumPy dtype, potentially causing unwanted upcasting (e.g., converting integer targets to floats). Additionally, the hardcoded 'target' column name could cause a shape mismatch for multi-output targets. A code suggestion was provided to assign target columns dynamically to preserve dtypes and handle multi-output targets correctly.

gemini-code-assist · 2026-06-15T08:23:46Z

+            feature_names = [f"feature_{j}" for j in range(self.train_X.shape[1])] + [
+                "target"
+            ]
            for i in range(most_similar_idx.shape[0]):
-                df_covariates = pd.DataFrame(
-                    self.train_X[most_similar_idx[i]],
-                    columns=[f"feature_{j}" for j in range(self.train_X.shape[1])],
+                combined = np.column_stack(
+                    (
+                        self.train_X[most_similar_idx[i]],
+                        self.train_y[most_similar_idx[i]],
+                    )
                )
-                df_target = pd.DataFrame(
-                    self.train_y[most_similar_idx[i]], columns=["target"]
-                )
-                df = pd.concat([df_covariates, df_target], axis=1)
+                df = pd.DataFrame(combined, columns=feature_names)
                list_.append(df)


Using np.column_stack to combine self.train_X and self.train_y before creating the DataFrame forces a single homogeneous NumPy dtype. If self.train_X contains floats and self.train_y contains integers (e.g., class labels) or objects, they will be upcast to a common dtype (such as float64 or object). This results in losing the original dtypes in the returned DataFrame (e.g., integer targets becoming floats).
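The upcasting can be reproduced in isolation. The sketch below is only an illustration: `train_X`, `train_y`, and `idx` are small hypothetical stand-ins for `self.train_X`, `self.train_y`, and `most_similar_idx[i]`, not data from the repository.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for self.train_X / self.train_y / most_similar_idx[i].
train_X = np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]])  # float64 features
train_y = np.array([0, 1, 1], dtype=np.int64)             # integer class labels
idx = np.array([0, 2])

columns = ["feature_0", "feature_1", "target"]

# Stacking first: NumPy must pick one dtype for the whole 2D array.
stacked = pd.DataFrame(
    np.column_stack((train_X[idx], train_y[idx])), columns=columns
)

# Assigning the target afterwards: each column keeps its own dtype.
assigned = pd.DataFrame(train_X[idx], columns=columns[:2])
assigned["target"] = train_y[idx]

print(stacked["target"].dtype)   # float64
print(assigned["target"].dtype)  # int64
```

The values are numerically the same in both frames; only the dtype of the `target` column differs.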

Additionally, if self.train_y is multi-output (2D), the hardcoded "target" column name in feature_names will cause a shape mismatch error when creating the DataFrame.
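A minimal sketch of the multi-output failure, again with hypothetical toy arrays standing in for the class attributes: with two target columns the stacked array has four columns, while `feature_names` only lists three.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 2 features and a 2-output target.
train_X = np.arange(6, dtype=float).reshape(3, 2)
train_y = np.array([[0, 1], [1, 0], [1, 1]])
idx = np.array([0, 2])

feature_names = ["feature_0", "feature_1", "target"]  # 3 names only

combined = np.column_stack((train_X[idx], train_y[idx]))
print(combined.shape)  # (2, 4): one more column than there are names

try:
    pd.DataFrame(combined, columns=feature_names)
    raised = False
except ValueError as exc:
    # pandas rejects the column list because its length does not match
    # the number of columns in the array.
    raised = True
    print(f"ValueError: {exc}")
```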

We can avoid both issues and improve performance by creating the DataFrame from self.train_X first, and then assigning the target column(s) directly.

Suggested change

            feature_names = [f"feature_{j}" for j in range(self.train_X.shape[1])]
            for i in range(most_similar_idx.shape[0]):
                df = pd.DataFrame(self.train_X[most_similar_idx[i]], columns=feature_names)
                if self.train_y.ndim == 1:
                    df["target"] = self.train_y[most_similar_idx[i]]
                elif self.train_y.shape[1] == 1:
                    df["target"] = self.train_y[most_similar_idx[i]][:, 0]
                else:
                    for j in range(self.train_y.shape[1]):
                        df[f"target_{j}"] = self.train_y[most_similar_idx[i]][:, j]
                list_.append(df)
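For trying the suggestion outside the class, here is a standalone sketch. The function name `build_explanation_frames` and the toy arrays are hypothetical and introduced only for illustration; the branching mirrors the suggested change, with the arrays passed in as arguments instead of read from `self`.

```python
import numpy as np
import pandas as pd


def build_explanation_frames(train_X, train_y, most_similar_idx):
    """Hypothetical helper: build one DataFrame per row of most_similar_idx."""
    feature_names = [f"feature_{j}" for j in range(train_X.shape[1])]
    frames = []
    for idx in most_similar_idx:
        df = pd.DataFrame(train_X[idx], columns=feature_names)
        y = train_y[idx]
        if y.ndim == 1:
            # 1D target: a single "target" column with its original dtype.
            df["target"] = y
        elif y.shape[1] == 1:
            # Column-vector target of shape (n, 1): flatten to one column.
            df["target"] = y[:, 0]
        else:
            # Multi-output target: one column per output.
            for j in range(y.shape[1]):
                df[f"target_{j}"] = y[:, j]
        frames.append(df)
    return frames


train_X = np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]])
most_similar_idx = np.array([[0, 2], [1, 2]])

single = build_explanation_frames(
    train_X, np.array([0, 1, 1], dtype=np.int64), most_similar_idx
)
multi = build_explanation_frames(
    train_X, np.array([[0, 1], [1, 0], [1, 1]], dtype=np.int64), most_similar_idx
)

print(list(single[0].columns))  # ['feature_0', 'feature_1', 'target']
print(list(multi[0].columns))   # ['feature_0', 'feature_1', 'target_0', 'target_1']
```

Both calls return one frame per row of `most_similar_idx`, and the integer targets stay integers because they are never stacked with the float features.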

Abdoulaye-SAKHO added 7 commits May 4, 2026 17:57

fix(refs):doi from sklearn

ccc515e

Merge branch 'main' of https://github.com/artefactory/rules

f6a7c07

Merge branch 'main' of https://github.com/artefactory/rules

3f11dc1

Merge branch 'main' of https://github.com/artefactory/rules

706a3d5

Merge branch 'main' of https://github.com/artefactory/rules

17d18ff

Merge branch 'main' of https://github.com/artefactory/rules

841a4b8

fix: faster explanation df option

e16eddf

gemini-code-assist Bot reviewed Jun 15, 2026

Abdoulaye-SAKHO added 2 commits June 15, 2026 10:41

feat(exp): add multiclass compatibility

db0d462

feat: add df explanation

57c1c5a

Abdoulaye-SAKHO merged commit 5f8ac78 into main Jun 15, 2026
5 checks passed

Abdoulaye-SAKHO deleted the fix_explanation branch June 15, 2026 08:47

Abdoulaye-SAKHO merged 9 commits into main from fix_explanation
