Fix explanation #47

Merged
Abdoulaye-SAKHO merged 9 commits into main from fix_explanation
Jun 15, 2026

Conversation

@Abdoulaye-SAKHO

Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors the DataFrame creation in woodtapper/example_sampling/base.py by using np.column_stack to combine features and targets instead of concatenating separate DataFrames. Feedback highlights that using np.column_stack forces a homogeneous NumPy dtype, potentially causing unwanted upcasting (e.g., converting integer targets to floats). Additionally, the hardcoded 'target' column name could cause a shape mismatch for multi-output targets. A code suggestion was provided to assign target columns dynamically to preserve dtypes and handle multi-output targets correctly.

Comment thread woodtapper/example_sampling/base.py Outdated
Comment on lines 167 to 178
+ feature_names = [f"feature_{j}" for j in range(self.train_X.shape[1])] + [
+     "target"
+ ]
  for i in range(most_similar_idx.shape[0]):
-     df_covariates = pd.DataFrame(
-         self.train_X[most_similar_idx[i]],
-         columns=[f"feature_{j}" for j in range(self.train_X.shape[1])],
+     combined = np.column_stack(
+         (
+             self.train_X[most_similar_idx[i]],
+             self.train_y[most_similar_idx[i]],
+         )
      )
-     df_target = pd.DataFrame(
-         self.train_y[most_similar_idx[i]], columns=["target"]
-     )
-     df = pd.concat([df_covariates, df_target], axis=1)
+     df = pd.DataFrame(combined, columns=feature_names)
      list_.append(df)

Contributor

high

Using np.column_stack to combine self.train_X and self.train_y before creating the DataFrame forces a single homogeneous NumPy dtype. If self.train_X contains floats and self.train_y contains integers (e.g., class labels) or objects, they will be upcast to a common dtype (such as float64 or object). This results in losing the original dtypes in the returned DataFrame (e.g., integer targets becoming floats).
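The upcasting described above can be reproduced outside the project with a small sketch. The arrays `X` and `y` below are hypothetical toy data, not taken from woodtapper; the `int64` dtype on `y` is set explicitly so the result does not depend on the platform default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: float features and integer class labels.
X = np.array([[0.5, 1.5], [2.5, 3.5]], dtype=np.float64)
y = np.array([0, 1], dtype=np.int64)

# Stacking first forces one common dtype for the whole array.
combined = np.column_stack((X, y))
df_stacked = pd.DataFrame(combined, columns=["feature_0", "feature_1", "target"])

# Assigning the target afterwards keeps each column's own dtype.
df_assigned = pd.DataFrame(X, columns=["feature_0", "feature_1"])
df_assigned["target"] = y

print(df_stacked.dtypes)
print(df_assigned.dtypes)
```

In the stacked frame the `target` column comes back as a float column, while the assigned frame keeps it as an integer column.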

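Additionally, if self.train_y is multi-output (2D with more than one column), the stacked array has more columns than feature_names has entries, because feature_names hardcodes a single "target" name. Creating the DataFrame then raises a ValueError for the shape mismatch.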
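A minimal sketch of the multi-output failure, again with hypothetical toy arrays rather than the project's data: two target columns are stacked next to two features, but only one "target" name is supplied.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 samples, 2 features, 2 target columns.
X = np.zeros((3, 2))
y_multi = np.zeros((3, 2))

# Mirrors the hardcoded naming scheme: one name per feature plus one "target".
feature_names = ["feature_0", "feature_1", "target"]

combined = np.column_stack((X, y_multi))  # 4 columns, but only 3 names

try:
    pd.DataFrame(combined, columns=feature_names)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```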

We can avoid both issues and improve performance by creating the DataFrame from self.train_X first, and then assigning the target column(s) directly.

Suggested change
- feature_names = [f"feature_{j}" for j in range(self.train_X.shape[1])] + [
-     "target"
- ]
- for i in range(most_similar_idx.shape[0]):
-     combined = np.column_stack(
-         (
-             self.train_X[most_similar_idx[i]],
-             self.train_y[most_similar_idx[i]],
-         )
-     )
-     df = pd.DataFrame(combined, columns=feature_names)
-     list_.append(df)
+ feature_names = [f"feature_{j}" for j in range(self.train_X.shape[1])]
+ for i in range(most_similar_idx.shape[0]):
+     df = pd.DataFrame(self.train_X[most_similar_idx[i]], columns=feature_names)
+     if self.train_y.ndim == 1:
+         df["target"] = self.train_y[most_similar_idx[i]]
+     elif self.train_y.shape[1] == 1:
+         df["target"] = self.train_y[most_similar_idx[i]][:, 0]
+     else:
+         for j in range(self.train_y.shape[1]):
+             df[f"target_{j}"] = self.train_y[most_similar_idx[i]][:, j]
+     list_.append(df)
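For trying the suggested logic in isolation, here is a sketch that wraps it in a standalone function. The name `build_neighbor_frames` and the toy inputs are hypothetical, and it assumes `most_similar_idx` is a 2D integer array with one row of training indices per query sample; the real method in woodtapper/example_sampling/base.py may differ.

```python
import numpy as np
import pandas as pd


def build_neighbor_frames(train_X, train_y, most_similar_idx):
    """Hypothetical helper: one DataFrame per row of most_similar_idx."""
    feature_names = [f"feature_{j}" for j in range(train_X.shape[1])]
    frames = []
    for idx in most_similar_idx:
        # Build the frame from the features only, so their dtype is untouched.
        df = pd.DataFrame(train_X[idx], columns=feature_names)
        targets = train_y[idx]
        if targets.ndim == 1:
            df["target"] = targets
        elif targets.shape[1] == 1:
            df["target"] = targets[:, 0]
        else:
            # Multi-output: one named column per target dimension.
            for j in range(targets.shape[1]):
                df[f"target_{j}"] = targets[:, j]
        frames.append(df)
    return frames


# Toy data (assumed): 6 samples, 2 float features, integer labels.
train_X = np.arange(12, dtype=np.float64).reshape(6, 2)
train_y = np.array([0, 1, 0, 1, 1, 0], dtype=np.int64)
most_similar_idx = np.array([[0, 2], [1, 5]])

frames = build_neighbor_frames(train_X, train_y, most_similar_idx)
print(frames[0].dtypes)
```

Passing a 2D `train_y` with several columns yields `target_0`, `target_1`, and so on instead of a single `target` column.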

Abdoulaye-SAKHO merged commit 5f8ac78 into main on Jun 15, 2026
5 checks passed
Abdoulaye-SAKHO deleted the fix_explanation branch on June 15, 2026 08:47