(#Towards 3420) Move OMPParallelTrans to create the data sharing attributes at transformation time by LonelyCat124 · Pull Request #3421 · stfc/PSyclone

LonelyCat124 · 2026-04-30T12:04:00Z

I have a draft implementation of moving OMPParallelTrans to do clause generation (using logic that is still located in the directive, so can be called independently if the transformation isn't used for whatever reason). Reduction logic isn't all available at transformation time, so I've not touched that yet (maybe when Kern <=> Call is done we can revisit that?)

@arporter @sergisiso If either of you could have a look and let me know i fyou think I've overlooked anything with this basic change before I look at some of the other potential improvements we can do as well. I'm going to set ITs going as well to see if this breaks any of the transformation scripts/runtime behaviour of real codes too.

…f at lowering time

codecov · 2026-04-30T12:16:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.96%. Comparing base (ca8610a) to head (5e1ceb2).
⚠️ Report is 209 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3421   +/-   ##
=======================================
  Coverage   99.96%   99.96%           
=======================================
  Files         391      391           
  Lines       54609    54627   +18     
=======================================
+ Hits        54590    54608   +18     
  Misses         19       19

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sergisiso · 2026-05-14T09:50:47Z

@LonelyCat124 I have a couple of questions

the NEMO integration tests passed by diffing with the previous version it seems that is missing the private attribute for loop variable indices:

>     !$omp teams loop collapse(2) default(shared) private(ji,jj)
870c870
<       !$omp teams loop collapse(2) default(shared)
---
>       !$omp teams loop collapse(2) default(shared) private(ji,jj)
888c888
<     !$omp teams loop collapse(2) default(shared)
---
>     !$omp teams loop collapse(2) default(shared) private(idx_4,idx_5)
904c904
<     !$omp teams loop collapse(2) default(shared)
---
>     !$omp teams loop collapse(2) default(shared) private(idx_6,idx_7)
922c922
<     !$omp teams loop collapse(2) default(shared)
---
>     !$omp teams loop collapse(2) default(shared) private(ji,jj)
1024c1024
<       !$omp teams loop collapse(2) default(shared)

Is this valid with a default(shared)? can we recover them?

The infer_sharing_attributes is still done in the lowering, could we avoid it? It is the larger contributor to the backend time:

…transformation option

LonelyCat124 · 2026-05-15T12:46:23Z

@sergisiso Ready for another look now.

I think 1) is fixed - I'd missed some cases that I now catch.

I think we can't yet - the reason is we need the need_sync variables in lowering for adding reduction clauses, which we can't do a transformation time due to LFRicKern (or Kern in general) and reproducible reductions I think.

sergisiso · 2026-05-21T14:03:46Z

the reason is we need the need_sync variables in lowering for adding reduction clauses, which we can't do a transformation time due to LFRicKern (or Kern in general) and reproducible reductions I think.

Is this because Kern does not report its used symbols in the reference_accesses that the infer_sharing_attributes uses to do its analysis? That may not be a problem (in the future) as providing them as virtual_children as we do in GOcean Kerns would fix it and is something that we plan to do anyway to make the DUC work with them to replace the old halo_exchange analysis.

But, is there a more fundamental problem that ANY symbol/reference that is created at lowering (if we don't repeat the infer_sharing_attributes) will become shared without synchronisation (the default)? I am not sure how I feel about this yet.

sergisiso · 2026-05-21T15:00:27Z

I think the list of aspects that we need to evaluate are:

Check if it makes the decisions regarding which loops are parallel and which variables are private easier if they are done together instead of in very different places.
Check if not repeating the same analysis in two different places (during transformation and after lowering) also prevents issues caused if we get different results (but we lose the ability to be smart about changes done latter, e.g in posterior transformations or new variables introduced during lowering ).
Check if doing it the associated reference_access analysis once is noticeably faster as it is potentially a expensive analysis (but with the current hoisting is not a dominating factor).

This won't be possible only we really can skip the lowering call we need for the reductions, so I would block this PR until we do the Kern reference_access children.

LonelyCat124 · 2026-05-27T10:35:02Z

the reason is we need the need_sync variables in lowering for adding reduction clauses, which we can't do a transformation time due to LFRicKern (or Kern in general) and reproducible reductions I think.

Is this because Kern does not report its used symbols in the reference_accesses that the infer_sharing_attributes uses to do its analysis? That may not be a problem (in the future) as providing them as virtual_children as we do in GOcean Kerns would fix it and is something that we plan to do anyway to make the DUC work with them to replace the old halo_exchange analysis.

But, is there a more fundamental problem that ANY symbol/reference that is created at lowering (if we don't repeat the infer_sharing_attributes) will become shared without synchronisation (the default)? I am not sure how I feel about this yet.

I'm not sure, I just went with the comment that says:

        # Now finish the LFRic reductions (which need lowering first)
        if reduction_kernels and not reprod_red_call_list:
            self._add_reduction_clauses()

Looking at this again now, I do have a slight concern with reproducible reductions:

        # Reproducible reduction will be done serially by accumulating the
        # partial results in an array indexed by the thread index
        reprod_red_call_list = self.reductions(reprod=True)
        if reprod_red_call_list:
            table = self.ancestor(Routine).symbol_table
            omp_lib = table.find_or_create(
                "omp_lib", symbol_type=ContainerSymbol)
            omp_get_thrd = table.find_or_create_tag(
                "omp_get_thread_num", symbol_type=RoutineSymbol,
                interface=ImportInterface(omp_lib))
            table.find_or_create_tag("omp_num_threads",
                                     root_name="nthreads",
                                     symbol_type=DataSymbol,
                                     datatype=INTEGER_TYPE)
            thread_idx = table.find_or_create_tag(
                "omp_thread_index", root_name="th_idx",
                symbol_type=DataSymbol, datatype=INTEGER_TYPE)
            assignment = Assignment.create(
                lhs=Reference(thread_idx),
                rhs=BinaryOperation.create(
                        BinaryOperation.Operator.ADD,
                        Call.create(omp_get_thrd),
                        Literal("1", INTEGER_TYPE))
            )
            self.dir_body.addchild(assignment, 0)

The omp_thread_index variable is not added to the private clause (if its not already present) despite being clearly thread private (omp_get_thread_num), which works correctly for subclasses of OMPParallel (maybe? But with default perhaps not), since they're also loop directives, but feels wrong on the OMPParallelDirective itself. We don't appear to test this anywhere - have we used reproducible reductions anywhere in production, as I'm concerned about this section of code.

From looking at Kern.reduction_sum_loop in general, it seems questionably defined/implemented for general OMPParallel as it is, but I'd need to test generating some code to be able to correctly see whats going on @sergisiso

sergisiso · 2026-05-28T10:16:49Z

@LonelyCat124 We have an issue to change this implementation, see #3409 , maybe we can do that first, as we need it regardless?

Move OMPParallelTrans to create the data sharing attributes instead o…

ba5b79c

…f at lowering time

LonelyCat124 requested review from arporter and sergisiso April 30, 2026 12:04

LonelyCat124 temporarily deployed to integration April 30, 2026 12:04 — with GitHub Actions Inactive

Update to master

1a198aa

LonelyCat124 added the ready for review label May 1, 2026

sergisiso added reviewed with actions and removed ready for review labels May 14, 2026

sergisiso and others added 3 commits May 14, 2026 15:30

#3420 Add documentation for the maximal_omp_parallel_region_trans sub…

9625dd1

…transformation option

Fixed not calling generate_data_clauses sometimes

b850318

Merge branch 'master' into 3420_clause_gen_transformation

bac067d

LonelyCat124 added 3 commits May 15, 2026 13:49

fix auto merge error

b6642f3

Mistake in transformation

5045292

Previous mistake fix

5e1ceb2

LonelyCat124 added ready for review and removed reviewed with actions labels May 15, 2026

sergisiso added reviewed with actions and removed ready for review labels May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(#Towards 3420) Move OMPParallelTrans to create the data sharing attributes at transformation time#3421

(#Towards 3420) Move OMPParallelTrans to create the data sharing attributes at transformation time#3421
LonelyCat124 wants to merge 8 commits into
masterfrom
3420_clause_gen_transformation

LonelyCat124 commented Apr 30, 2026

Uh oh!

codecov Bot commented Apr 30, 2026 •

edited

Loading

Uh oh!

sergisiso commented May 14, 2026

Uh oh!

LonelyCat124 commented May 15, 2026

Uh oh!

sergisiso commented May 21, 2026

Uh oh!

sergisiso commented May 21, 2026

Uh oh!

LonelyCat124 commented May 27, 2026

Uh oh!

sergisiso commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

LonelyCat124 commented Apr 30, 2026

Uh oh!

codecov Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sergisiso commented May 14, 2026

Uh oh!

LonelyCat124 commented May 15, 2026

Uh oh!

sergisiso commented May 21, 2026

Uh oh!

sergisiso commented May 21, 2026

Uh oh!

LonelyCat124 commented May 27, 2026

Uh oh!

sergisiso commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Apr 30, 2026 •

edited

Loading