
[SPARK-56415][INFRA] Refactor create_spark_jira.py for LLM use and extract shared utilities #55281

Closed
cloud-fan wants to merge 12 commits into apache:master from cloud-fan:improve-create-jira-script

@cloud-fan commented Apr 9, 2026

What changes were proposed in this pull request?

Refactors dev/create_spark_jira.py into a lightweight LLM-friendly script, extracts shared JIRA utilities into dev/spark_jira_utils.py, and preserves the original interactive script as dev/create_jira_and_branch.py.

Changes:

  • dev/spark_jira_utils.py (new): shared module containing get_jira_client(), detect_affected_version(), list_components(), and create_jira_issue().
  • dev/create_spark_jira.py (simplified): stripped down to only create a JIRA ticket and print its key. Made -c (component) required, added a --list-components flag, added validation that --parent and --type are mutually exclusive, and improved the error messages shown when the jira library is not installed or JIRA_ACCESS_TOKEN is not set.
  • dev/create_jira_and_branch.py (new file, old behavior): the original script, which creates a JIRA ticket, checks out a branch, and creates an initial commit. It now imports the shared utilities from spark_jira_utils.py.
  • CLAUDE.md: updated instructions for LLM agents to use the simplified script.
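The CLI contract described for the simplified create_spark_jira.py (a required -c, a --list-components escape hatch, and --parent/--type exclusivity) could be expressed with argparse roughly as below. This is a minimal sketch under stated assumptions, not the script's actual code: the long option name --component, the helper names build_parser/parse_and_validate, and the use of an argparse mutually exclusive group are all hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the PR description."""
    parser = argparse.ArgumentParser(description="Create a Spark JIRA issue.")
    parser.add_argument("title", nargs="?", help="Title of the JIRA issue")
    # Assumed long name; the PR description only confirms the short flag -c.
    parser.add_argument("-c", "--component",
                        help="JIRA component (required unless listing components)")
    parser.add_argument("--list-components", action="store_true",
                        help="Print the available components and exit")
    # Assumed rationale: a subtask always uses the Sub-task issue type, so an
    # explicit --type would conflict with --parent. argparse rejects the pair
    # with exit code 2.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-p", "--parent", help="Parent JIRA ID for subtasks")
    group.add_argument("--type", help="Issue type for a top-level ticket")
    return parser


def parse_and_validate(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and enforce the 'required unless listing' rules."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.list_components:
        # Listing components needs neither a title nor a component.
        return args
    if not args.title:
        parser.error("a title is required")
    if not args.component:
        parser.error("-c/--component is required")
    return args
```

Making -c required through a manual check rather than `required=True` keeps --list-components usable on its own; `parser.error` prints the message and exits with status 2, which gives an automated caller a clear failure signal.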

Why are the changes needed?

Previously, CLAUDE.md told LLM agents to ask the user to create JIRA tickets manually. The create_spark_jira.py script existed but included interactive prompts and git side effects (branch creation, committing) that made it unsuitable for automated use. This change makes the script LLM-friendly while preserving the original interactive workflow in a separate script, with shared JIRA logic extracted to avoid duplication.

Does this PR introduce any user-facing change?

No. The original interactive workflow is preserved in dev/create_jira_and_branch.py.

How was this patch tested?

#55280 was created with this new prompt.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

- Remove CLI version flag; auto-detect latest unreleased version instead
- Split preflight into check_jira_access() and detect_affected_version()
- Hint in AGENTS.md to review versions after ticket creation

Co-authored-by: Isaac
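The preflight split and the "latest unreleased version" auto-detection described in these commit notes could look roughly like the following. This is a hedged sketch, not the PR's code: preflight_problems and latest_unreleased are hypothetical stand-ins for check_jira_access() and detect_affected_version(), and treating only plain x.y.z version names as candidates is an assumption.

```python
import importlib.util
import os
import re
from typing import Iterable, List, Mapping, Optional, Tuple

# Matches plain release versions such as "4.2.0"; names like
# "5.0.0-preview" are deliberately skipped in this sketch.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def preflight_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means ready to go."""
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when the top-level 'jira' package is absent.
    if importlib.util.find_spec("jira") is None:
        problems.append("The 'jira' library is not installed (pip install jira).")
    if not env.get("JIRA_ACCESS_TOKEN"):
        problems.append("JIRA_ACCESS_TOKEN is not set.")
    return problems


def latest_unreleased(versions: Iterable[Tuple[str, bool]]) -> Optional[str]:
    """Pick the highest unreleased x.y.z version from (name, released) pairs."""
    best_name, best_key = None, None
    for name, released in versions:
        match = _VERSION_RE.match(name)
        if released or match is None:
            continue
        # Compare numerically so that 4.10.0 sorts above 4.9.0.
        key = tuple(int(part) for part in match.groups())
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name
```

With the jira library, the pairs could come from something like `[(v.name, v.released) for v in client.project_versions("SPARK")]`, assuming each returned Version exposes `name` and `released` as the JIRA REST version resource does. The result is only a default, which is why the notes suggest reviewing versions after the ticket is created.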
Contributor Author


I can add back the old script if people still need it, but under a different name, as create_spark_jira.py should only create a ticket.

@cloud-fan
Contributor Author

cc @dongjoon-hyun @HyukjinKwon

@dongjoon-hyun
Member

Thank you for pinging me. Let me try today.


import argparse

parser = argparse.ArgumentParser(description="Create a Spark JIRA issue.")
parser.add_argument("title", nargs="?", help="Title of the JIRA issue")
# -p/--parent makes the new ticket a subtask of the given JIRA ID
parser.add_argument("-p", "--parent", help="Parent JIRA ID for subtasks")
Member


I always use this parent JIRA ID feature. Please restore it, @cloud-fan.

Contributor Author


fixed
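For context on what the restored --parent option implies at the JIRA level, here is a sketch of the issue fields a subtask and a top-level ticket might carry. The helper name build_issue_fields, the default issue type, and the exact field layout are assumptions based on standard JIRA REST conventions (`versions` is the Affects Version/s field), not the script's actual code.

```python
from typing import Any, Dict, Optional


def build_issue_fields(title: str, component: str, affected_version: str,
                       parent: Optional[str] = None,
                       issue_type: str = "Improvement") -> Dict[str, Any]:
    """Hypothetical helper building a JIRA REST 'fields' payload."""
    fields: Dict[str, Any] = {
        "project": {"key": "SPARK"},
        "summary": title,
        "components": [{"name": component}],
        # 'versions' holds the Affects Version/s values in JIRA REST.
        "versions": [{"name": affected_version}],
    }
    if parent:
        # A subtask is linked to its parent and uses the Sub-task type.
        fields["issuetype"] = {"name": "Sub-task"}
        fields["parent"] = {"key": parent}
    else:
        fields["issuetype"] = {"name": issue_type}
    return fields
```

With the jira library, the resulting dict could be passed as `client.create_issue(fields=fields)`; keeping payload construction in a pure function like this makes it easy to check without a network connection.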

Member

@dongjoon-hyun left a comment


I've been considering this improvement. It would be great to preserve all existing features, because this script is the fallback for environments without an LLM installed and for users who don't use one, @cloud-fan.

"I can add back the old script if people still need it, but with a different name, as create_spark_jira.py should only create ticket."

In addition, it would be great if we could keep the existing file name, because multiple committers across multiple Spark sub-projects already use create_spark_jira.py by convention.

It's the same for merge_spark_pr.py:

It would be better for this PR to narrow its scope to teaching LLMs to skip the existing human-oriented features that are not needed in an LLM environment.

@cloud-fan
Contributor Author

I'm fine with creating a new script dedicated to this LLM use case. But I'm confused about the naming suggestion. This script simply creates a Spark JIRA ticket, yet we can't name it create_spark_jira.py? The fact that multiple Spark sub-projects use it doesn't mean the name is correct.

Contributor Author


@dongjoon-hyun The old script is kept under a more accurate name, since it does more than JIRA creation. dev/spark_jira_utils.py was created to share code between the two scripts.

Contributor Author


PR description also updated.

@cloud-fan changed the title from "[SPARK-56415][INFRA] Simplify create_spark_jira.py for LLM-driven JIRA ticket creation" to "[SPARK-56415][INFRA] Refactor create_spark_jira.py for LLM use and extract shared utilities" on Apr 13, 2026
@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan closed this in aff735c on Apr 14, 2026