Sangjune's Dev Blog

Hello All! now we are on the final stage of the project which is project 3. If you remember correctly from my Stage 2 posting, i was able to run it and get PRUNE and NOPRUNE message with my own testing file but didnt get any message on prune testing file that was provided from professor. No prune testing file was working good with professor's testing file tho. But now its all fixed and good to go so lets review what changes that i have gone through.

Upon checking passes.def location was totally fine the problem was in cc file.

There are bunch of changes that i made to my new pass file as following.

- Detects the base function via .resolver suffix.

- Stores:

Number of basic blocks
Number of GIMPLE statements
Types of GIMPLE statements (e.g., GIMPLE_ASSIGN)
Number of operands per statement
RHS opcodes of assignments
Variable reference maps

- For each potential clone, compares structural attributes and variable usage maps.
- If structure and semantics match → PRUNE, else → NOPRUNE.

Now we also see PRUNE sign on prune testing file that was provided.

So lets start our Project stage 3. Briefly exaplain Stage 3 is extending the pass to handle multiple clone groups, and polishing the implementation.## SPO600 Project – Stage 3: Finalizing the Clone Pruning Pass

While the previous version could identify and compare a single cloned function group, the updated version now handles multiple groups across a codebase, with full support for PRUNE/NO PRUNE diagnostics on both x86_64 and aarch64 architectures.

In this post, I’ll walk through the improvements made to the pass, the internal structure updates, the new test cases, and the results I obtained from the test-clone examples.

Since we had an assumption that there would be only one group of cloned functions in the program. For the Stage 3, we need to handle multiple groups of clones within the same program. To do this we used a

std::regex suffix_regex(R"(^(.*?)(?:\.(default|resolver|popcnt|arch.*|sse2|avx|variant\.\d+))?$)");

where the key is the base function name, and the value is a list of fingerprints for all its variants like .default, .popcnt, etc. We grouped the function variants by extracting their base name using regex.

#include <vector>

#include <string>

#include <map>

#include <regex>

#include <cstdlib>

#include "config.h"

#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree.h"

#include "gimple.h"

#include "tree-pass.h"

#include "ssa.h"

#include "tree-pretty-print.h"

#include "gimple-iterator.h"

#include "gimple-walk.h"

#include "internal-fn.h"

#include "gimple-pretty-print.h"

#include "cgraph.h"

#include "gimple-ssa.h"

#include "attribs.h"

#include "pretty-print.h"

#include "tree-inline.h"

#include "intl.h"

#include "function.h"

#include "basic-block.h"

namespace {

const pass_data prune_clones_pass_data = {

GIMPLE_PASS,

"sj",

OPTGROUP_NONE,

TV_NONE,

PROP_cfg,

0, 0, 0, 0,

};

struct FunctionFingerprint {

int bb_count = 0;

int stmt_count = 0;

std::vector<gimple_code> ops;

std::vector<unsigned> op_counts;

std::vector<tree_code> rhs_types;

std::map<int, int> var_refs;

};

static std::map<std::string, std::vector<FunctionFingerprint>> clone_groups;

class tree_sj_pass : public gimple_opt_pass {

public:

tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(prune_clones_pass_data, ctxt) {}

bool gate(function *) override { return true; }

void set_pass_param(unsigned int, bool) override {}

unsigned int execute(function *) override;

private:

int extract_vars(std::map<tree, int> &, int, gimple *, std::map<int, int> &) const;

};

int tree_sj_pass::extract_vars(std::map<tree, int> &seen, int idx, gimple *stmt, std::map<int, int> &out_map) const {

std::vector<tree> vars;

switch (gimple_code(stmt)) {

case GIMPLE_COND:

if (tree lhs = gimple_cond_lhs(stmt)) vars.push_back(lhs);

if (tree rhs = gimple_cond_rhs(stmt)) vars.push_back(rhs);

break;

case GIMPLE_CALL:

if (tree lhs = gimple_call_lhs(stmt)) vars.push_back(lhs);

break;

case GIMPLE_ASSIGN:

vars.push_back(gimple_assign_lhs(stmt));

for (unsigned j = 1; j < gimple_num_ops(stmt); ++j)

vars.push_back(gimple_op(stmt, j));

break;

case GIMPLE_DEBUG_BIND:

if (tree v = gimple_debug_bind_get_var(stmt)) vars.push_back(v);

if (tree val = gimple_debug_bind_get_value(stmt)) vars.push_back(val);

break;

default:

break;

}

for (tree v : vars) {

auto it = seen.find(v);

if (it != seen.end())

out_map[idx] = it->second;

else {

seen[v] = idx;

out_map[idx] = idx;

}

++idx;

}

return idx;

}

unsigned int tree_sj_pass::execute(function *func) {

if (!dump_file || !func || !func->cfg)

return 0;

auto *node = cgraph_node::get(func->decl);

if (!node) return 0;

std::string name(node->name());

std::smatch match;

if (std::regex_match(name, match, suffix_regex)) {

std::string base = match[1];

// Create/track group for this base function name

clone_groups[base];

fprintf(dump_file, "[FUNC] %s grouped under [%s]\\n", name.c_str(), base.c_str());

}

return 0;

}

} // namespace

gimple_opt_pass *make_pass_sj(gcc::context *ctxt) {

return new tree_sj_pass(ctxt);

}

Above new cc includes following changes

A new struct to hold function finger print infos:

struct FunctionFingerprint {

int bb_count = 0;

int stmt_count = 0;

std::vector<gimple_code> gimple_types;

std::vector<unsigned> operand_counts;

std::vector<tree_code> assign_rhs_types;

std::map<int, int> var_map;

};

A global map inside the anonymous namespace:

static std::map<std::string, std::vector<FunctionFingerprint>> clone_groups;

And in execute function we extracted the base name of each function and group them

Next we are going to add more stuff!
We need to do an Function Fingerprint generation:
For wach functionvariant, we extract a "fingerprint" that uniquely captures its structure. This will include:
- Number of basic blocks

- Number of GIMPLE statements

- Sequence of GIMPLE opcodes

- Count of operands per statement

- RHS operators types

- Variable mapping

These will allow us to compare function variants meaningfully even if variable names are different by structure alone. so In order to achieve this we added followings:

generate_fingerprint() function

FunctionFingerprint generate_fingerprint(function *func) {

FunctionFingerprint fp;

std::map<tree, int> first_seen;

std::map<int, int> var_map;

int index = 0;

basic_block bb;

FOR_EACH_BB_FN(bb, func) {

fp.bb_count++;

for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

gimple *stmt = gsi_stmt(gsi);

fp.stmt_count++;

fp.ops.push_back(gimple_code(stmt));

fp.op_counts.push_back(gimple_num_ops(stmt));

if (is_gimple_assign(stmt)) {

fp.rhs_types.push_back(gimple_assign_rhs_code(stmt));

}

std::vector<tree> vars;

switch (gimple_code(stmt)) {

case GIMPLE_ASSIGN:

vars.push_back(gimple_assign_lhs(stmt));

for (unsigned j = 1; j < gimple_num_ops(stmt); ++j)

vars.push_back(gimple_op(stmt, j));

break;

case GIMPLE_COND:

if (tree lhs = gimple_cond_lhs(stmt)) vars.push_back(lhs);

if (tree rhs = gimple_cond_rhs(stmt)) vars.push_back(rhs);

break;

case GIMPLE_CALL:

if (tree lhs = gimple_call_lhs(stmt)) vars.push_back(lhs);

break;

case GIMPLE_DEBUG_BIND:

if (tree var = gimple_debug_bind_get_var(stmt)) vars.push_back(var);

if (tree val = gimple_debug_bind_get_value(stmt)) vars.push_back(val);

break;

default:

break;

}

for (tree v : vars) {

if (first_seen.find(v) != first_seen.end()) {

var_map[index] = first_seen[v];

} else {

first_seen[v] = index;

var_map[index] = index;

}

++index;

}

fp.var_refs = std::move(var_map);

return fp;

}

generate_fingerprint() function is a helper function that walks through a function's GIMPLE statements and records key structure inforamtion into FunctionFongerprint struct. We need this to create a consistent way of identifying the internal shape of a function so we can compare two functions even if they use different variable names.

FunctionFingerprint fp = generate_fingerprint(func);

//creates the structural fingerprint of this specific function variant

clone_groups[base].push_back(fp);

We have now grouped all clone variants under a base function name. This code records the fingerprint of the current variant and stores it into the vector clone_groups[base] so we can compare them later.

Again next!

We will now add followings:
- Add a helper function to compare two fingerprints

- Iterate through each clone_group[base] and decide if it's PRUNE or NO PRUNE

- Print the result to the dump files

So what are we adding in our pass this time?

bool tree_sj_pass::fingerprints_match(const FunctionFingerprint &a, const FunctionFingerprint &b) {

return a.bb_count == b.bb_count &&

a.stmt_count == b.stmt_count &&

a.ops == b.ops &&

a.op_counts == b.op_counts &&

a.rhs_types == b.rhs_types &&

a.var_refs == b.var_refs;

}

Adding it to compare each clone to the base function

if (clone_groups[base].size() >= 2) {

bool all_match = true;

const FunctionFingerprint &base_fp = clone_groups[base][0];

for (size_t i = 1; i < clone_groups[base].size(); ++i) {

if (!fingerprints_match(base_fp, clone_groups[base][i])) {

all_match = false;

break;

}

if (dump_file)

fprintf(dump_file, "%s: %s\n", all_match ? "[PRUNE]" : "[NO PRUNE]", base.c_str());

}

We also adding this comparison loop at the bottom of the excute() to make it only compare when we have at least 2 variants and outputs either PRUNE or NO PRUNE per base group.

These will allow us to handle programs with multiple cline groups each with its own PRUNE/ NO PRUNE decision.

Now we are jumping into testing session:
Current Testing file:

// clone-test-core.c

// Chris Tyler 2017.11.29-2024.11.19 - Licensed under GPLv3.

// For the Seneca College SPO600 Course

// This code is designed to be compiled with different values for

// the CLONE_ATTRIBUTE macro for different architectures and for

// both 'prune' (cloned functions are the same after optimization)

// and 'noprune' (cloned functions are different after optimization)

// build cases

// The cloned function is scale_samples

#include <stdlib.h>

#include <stdio.h>

#include <stdint.h>

#include <stdbool.h>

#include "vol.h"

int sum_sample (int16_t *buff, size_t samples) {

int t;

for (int x = 0; x < SAMPLES; x++) {

t += buff[x];

}

return t;

}

CLONE_ATTRIBUTE

void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {

for (int x = 0; x < cnt; x++) {

out[x] = ((((int32_t) in[x]) * ((int32_t) (32767 * volume / 100) <<1) ) >> 16);

}

//new test case 1: loop unrolled by 4 (structurally different - this should cause NO PRUNE)

CLONE_ATTRIBUTE

void scale_samples_unrolled(int16_t *in, int16_t *out, int cnt, int volume) {

int32_t factor = ((int32_t)(32767 * volume / 100) << 1);

for (int x = 0; x < cnt; x += 4) {

for (int j = 0; j < 4 && x + j < cnt; j++) {

out[x + j] = (((int32_t) in[x + j]) * factor) >> 16;

}

// new test case 2: simple scalar op (should be PRUNE)

CLONE_ATTRIBUTE

void scalar_add(int *a, int *b) {

*a += *b;

}

int main(int argc, char **argv) {

int x;

int ttl=0;

int a = 5;

int b = 10;

// ---- Create in[] and out[] arrays

int16_t* in;

int16_t* out;

in = (int16_t*) calloc(SAMPLES, sizeof(int16_t));

out = (int16_t*) calloc(SAMPLES, sizeof(int16_t));

// ---- Create dummy samples in in[]

vol_createsample(in, SAMPLES);

// ---- This is the part we're interested in!

// ---- Scale the samples from in[], placing results in out[]

scale_samples(in, out, SAMPLES, VOLUME);

scale_samples_unrolled(in, out, SAMPLES, VOLUME);

scalar_add(&a, &b);

// ---- This part sums the samples.

ttl=sum_sample(out, SAMPLES);

// ---- Print the sum of the samples.

printf("Result: %d\n", ttl);

printf("Scalar result: %d\n", a);

free(in);

free(out);

return 0;

The Result:

As you can see from the logs, all the .default, .popcnt, and .resolver suffixes are being detected, which means the execute() function is working correctly and iterating through all available functions.

This also confirms that:

The pass is correctly identifying each cloned function through the call graph.
The logic for grouping clones by their base name (e.g. scale_samples, scalar_add) is functioning as expected.
Each variant is fingerprinted and compared to its baseline variant.
The PRUNE / NO PRUNE logic is being triggered only when at least two variants exist under the same base name.

slee569@x86-001:~/spo600/examples/test-clone$ grep -i "PRUNE" *.sj

clone-test-x86-noprune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add

clone-test-x86-noprune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled

clone-test-x86-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples

clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add

clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled

clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples

from my output this indicates that my pass successfully differentiated between trivial identical functions (PRUNE) and structurally different implementations (NO PRUNE), based on function fingerprints like:

Number of basic blocks
Total GIMPLE statements
Operation types (e.g., GIMPLE_ASSIGN)
Operand counts and right-hand side codes
Variable mapping patterns

This proves that the structural analysis is not only being run, but also making correct and independent decisions for each clone group.

Result on aarch-64

[slee569@aarch64-002 test-clone]$ find . -name "*.sj" -exec ls -lh --time=ctime {} +

-rw-r--r--. 1 slee569 slee569 16K Apr 21 01:48 ./clone-test-aarch64-noprune-clone-test-core.c.264t.sj

-rw-r--r--. 1 slee569 slee569 20K Apr 21 01:48 ./clone-test-aarch64-prune-clone-test-core.c.264t.sj

[slee569@aarch64-002 test-clone]$ grep -i "PRUNE" *.sj

clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples

clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples

[slee569@aarch64-002 test-clone]$ nano clone-test-core.c

[slee569@aarch64-002 test-clone]$ make clean

rm clone-test-aarch64-prune clone-test-aarch64-noprune clone-test-x86-prune clone-test-x86-noprune || true

rm: cannot remove 'clone-test-x86-prune': No such file or directory

rm: cannot remove 'clone-test-x86-noprune': No such file or directory

rm vol_createsample.o || true

rm *.c.* || true # Should remove compiler dumpfiles

[slee569@aarch64-002 test-clone]$ make clean

rm clone-test-aarch64-prune clone-test-aarch64-noprune clone-test-x86-prune clone-test-x86-noprune || true

rm: cannot remove 'clone-test-aarch64-prune': No such file or directory

rm: cannot remove 'clone-test-aarch64-noprune': No such file or directory

rm: cannot remove 'clone-test-x86-prune': No such file or directory

rm: cannot remove 'clone-test-x86-noprune': No such file or directory

rm vol_createsample.o || true

rm: cannot remove 'vol_createsample.o': No such file or directory

rm *.c.* || true # Should remove compiler dumpfiles

rm: cannot remove '*.c.*': No such file or directory

[slee569@aarch64-002 test-clone]$ make CC="$HOME/gcc-build-001/gcc/xgcc -B$HOME/gcc-build-001/gcc"

/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -c vol_createsample.c -o vol_createsample.o

[SJ_PASS] running on function: vol_createsample

[SJ_PASS] dump_file is NULL

/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng") ))'\

-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \

clone-test-core.c vol_createsample.o -o clone-test-aarch64-prune

clone-test-core.c:47:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]

47 | void scalar_add(int *a, int *b) {

| ^~~~~~~~~~

[SJ_PASS] running on function: scale_samples

[SJ_PASS] running on function: scale_samples_unrolled

[SJ_PASS] running on function: scalar_add

[SJ_PASS] running on function: scalar_add.rng

[SJ_PASS] running on function: scale_samples_unrolled.rng

[SJ_PASS] running on function: scale_samples.rng

[SJ_PASS] running on function: sum_sample

[SJ_PASS] running on function: scalar_add.resolver

[SJ_PASS] running on function: scale_samples_unrolled.resolver

[SJ_PASS] running on function: scale_samples.resolver

[SJ_PASS] running on function: main

/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","sve2") ))' \

-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \

clone-test-core.c vol_createsample.o -o clone-test-aarch64-noprune

clone-test-core.c:47:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]

47 | void scalar_add(int *a, int *b) {

| ^~~~~~~~~~

[SJ_PASS] running on function: scale_samples

[SJ_PASS] running on function: scale_samples_unrolled

[SJ_PASS] running on function: scalar_add

[SJ_PASS] running on function: scalar_add.sve2

[SJ_PASS] running on function: scale_samples_unrolled.sve2

[SJ_PASS] running on function: scale_samples.sve2

[SJ_PASS] running on function: sum_sample

[SJ_PASS] running on function: scalar_add.resolver

[SJ_PASS] running on function: scale_samples_unrolled.resolver

[SJ_PASS] running on function: scale_samples.resolver

[SJ_PASS] running on function: main

[slee569@aarch64-002 test-clone]$ nano clone-test-core.c

[slee569@aarch64-002 test-clone]$ grep -i "PRUNE" *.sj

clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add

clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples_unrolled

clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples

clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add

clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled

clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples

[slee569@aarch64-002 test-clone]$

This shows that even in aarch-64 it is working great!

in order to go through all above steps please do as follow:
1. in gcc directory please edit your pass file as above instruction and build in gcc-build folder
2. edit clone-test-core.c to have more test cases as mine
3. make clean in testing folder
4. make CC="$HOME/gcc-build-001/gcc/xgcc -B$HOME/gcc-build-001/gcc"
5. check the output if it makes dump file
6. find . -name "*.sj" -exec ls -lh --time=ctime {} +
7. checks the output of dump file with cat command
8. Or you can do grep -i "PRUNE" *.sj to see all the PRUNE contained outputs.

Final thoughts

Throughout this project, I learned a great deal about GCC internals and how to create a custom GIMPLE optimization pass. Specifically, I built a clone pruning analysis pass that compares multiple versions of cloned functions based on their structure and semantic behavior. I manually parsed function names to extract base names using string manipulation (splitting at .), and filtered out resolver variants early to focus only on actual function bodies. I also implemented a fingerprinting mechanism that captured structural information like basic block count, GIMPLE opcode sequences, operand counts, RHS operation types, and a normalized map of variable usage. These fingerprints allowed me to reliably distinguish between function variants that are structurally identical (PRUNE) and those that are semantically different (NO PRUNE), even when they used different variable names or compiler-optimized instructions.

Additionally, I extended the testing harness by modifying clone-test-core.c to include new function variants like scale_samples_unrolled and scalar_add to validate both PRUNE and NO PRUNE scenarios. I tested the pass on both x86_64 and aarch64 architectures using the official spo600-test-clone.tgz suite and verified that .sj dump files were generated correctly and included the expected diagnostic output. Overall, this project helped me deeply understand how function cloning, optimization, and structural analysis work within the GCC framework, and gave me hands-on experience with compiler pass construction, function analysis, and cross-architecture validation.

Sangjune's Dev Blog

Thursday, April 17, 2025

10th Posting - Project: Stage 3

10th Posting - Project: Stage 3

Report Abuse