Hello All! now we are on the final stage of the project which is project 3. If you remember correctly from my Stage 2 posting, i was able to run it and get PRUNE and NOPRUNE message with my own testing file but didnt get any message on prune testing file that was provided from professor. No prune testing file was working good with professor's testing file tho. But now its all fixed and good to go so lets review what changes that i have gone through.
Upon checking passes.def location was totally fine the problem was in cc file.
There are bunch of changes that i made to my new pass file as following.
- Detects the base function via .resolver
suffix.
- Stores:
Number of basic blocks
Number of GIMPLE statements
Types of GIMPLE statements (e.g., GIMPLE_ASSIGN
)
Number of operands per statement
RHS opcodes of assignments
Variable reference maps
- For each potential clone, compares structural attributes and variable usage maps.
- If structure and semantics match → PRUNE
, else → NOPRUNE
.
Now we also see PRUNE sign on prune testing file that was provided.
So lets start our Project stage 3. Briefly exaplain Stage 3 is extending the pass to handle multiple clone groups, and polishing the implementation.## SPO600 Project – Stage 3: Finalizing the Clone Pruning Pass
While the previous version could identify and compare a single cloned function group, the updated version now handles multiple groups across a codebase, with full support for PRUNE/NO PRUNE diagnostics on both x86_64 and aarch64 architectures.
In this post, I’ll walk through the improvements made to the pass, the internal structure updates, the new test cases, and the results I obtained from the test-clone examples.
Since we had an assumption that there would be only one group of cloned functions in the program. For the Stage 3, we need to handle multiple groups of clones within the same program. To do this we used a
where the key is the base function name, and the value is a list of fingerprints for all its variants like .default, .popcnt, etc. We grouped the function variants by extracting their base name using regex.
#include <vector>
#include <string>
#include <map>
#include <regex>
#include <cstdlib>
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
namespace {
const pass_data prune_clones_pass_data = {
GIMPLE_PASS,
"sj",
OPTGROUP_NONE,
TV_NONE,
PROP_cfg,
0, 0, 0, 0,
};
struct FunctionFingerprint {
int bb_count = 0;
int stmt_count = 0;
std::vector<gimple_code> ops;
std::vector<unsigned> op_counts;
std::vector<tree_code> rhs_types;
std::map<int, int> var_refs;
};
static std::map<std::string, std::vector<FunctionFingerprint>> clone_groups;
class tree_sj_pass : public gimple_opt_pass {
public:
tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(prune_clones_pass_data, ctxt) {}
bool gate(function *) override { return true; }
void set_pass_param(unsigned int, bool) override {}
unsigned int execute(function *) override;
private:
int extract_vars(std::map<tree, int> &, int, gimple *, std::map<int, int> &) const;
};
int tree_sj_pass::extract_vars(std::map<tree, int> &seen, int idx, gimple *stmt, std::map<int, int> &out_map) const {
std::vector<tree> vars;
switch (gimple_code(stmt)) {
case GIMPLE_COND:
if (tree lhs = gimple_cond_lhs(stmt)) vars.push_back(lhs);
if (tree rhs = gimple_cond_rhs(stmt)) vars.push_back(rhs);
break;
case GIMPLE_CALL:
if (tree lhs = gimple_call_lhs(stmt)) vars.push_back(lhs);
break;
case GIMPLE_ASSIGN:
vars.push_back(gimple_assign_lhs(stmt));
for (unsigned j = 1; j < gimple_num_ops(stmt); ++j)
vars.push_back(gimple_op(stmt, j));
break;
case GIMPLE_DEBUG_BIND:
if (tree v = gimple_debug_bind_get_var(stmt)) vars.push_back(v);
if (tree val = gimple_debug_bind_get_value(stmt)) vars.push_back(val);
break;
default:
break;
}
for (tree v : vars) {
auto it = seen.find(v);
if (it != seen.end())
out_map[idx] = it->second;
else {
seen[v] = idx;
out_map[idx] = idx;
}
++idx;
}
return idx;
}
unsigned int tree_sj_pass::execute(function *func) {
if (!dump_file || !func || !func->cfg)
return 0;
auto *node = cgraph_node::get(func->decl);
if (!node) return 0;
std::string name(node->name());
std::smatch match;
std::regex suffix_regex(R"(^(.*?)(?:\\.(default|resolver|popcnt|arch.*|sse2|avx|variant\\.\\d+))?$)");
if (std::regex_match(name, match, suffix_regex)) {
std::string base = match[1];
// Create/track group for this base function name
clone_groups[base];
fprintf(dump_file, "[FUNC] %s grouped under [%s]\\n", name.c_str(), base.c_str());
}
return 0;
}
} // namespace
gimple_opt_pass *make_pass_sj(gcc::context *ctxt) {
return new tree_sj_pass(ctxt);
}
Above new cc includes following changes
A new struct to hold function finger print infos:
struct FunctionFingerprint {
int bb_count = 0;
int stmt_count = 0;
std::vector<gimple_code> gimple_types;
std::vector<unsigned> operand_counts;
std::vector<tree_code> assign_rhs_types;
std::map<int, int> var_map;
};
A global map inside the anonymous namespace:
static std::map<std::string, std::vector<FunctionFingerprint>> clone_groups;
And in execute function we extracted the base name of each function and group them
std::regex suffix_regex(R"(^(.*?)(?:\\.(default|resolver|popcnt|arch.*|sse2|avx|variant\\.\\d+))?$)");
We need to do an Function Fingerprint generation:
For wach functionvariant, we extract a "fingerprint" that uniquely captures its structure. This will include:
- Number of basic blocks
generate_fingerprint() function
FunctionFingerprint generate_fingerprint(function *func) {
FunctionFingerprint fp;
std::map<tree, int> first_seen;
std::map<int, int> var_map;
int index = 0;
basic_block bb;
FOR_EACH_BB_FN(bb, func) {
fp.bb_count++;
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
gimple *stmt = gsi_stmt(gsi);
fp.stmt_count++;
fp.ops.push_back(gimple_code(stmt));
fp.op_counts.push_back(gimple_num_ops(stmt));
if (is_gimple_assign(stmt)) {
fp.rhs_types.push_back(gimple_assign_rhs_code(stmt));
}
std::vector<tree> vars;
switch (gimple_code(stmt)) {
case GIMPLE_ASSIGN:
vars.push_back(gimple_assign_lhs(stmt));
for (unsigned j = 1; j < gimple_num_ops(stmt); ++j)
vars.push_back(gimple_op(stmt, j));
break;
case GIMPLE_COND:
if (tree lhs = gimple_cond_lhs(stmt)) vars.push_back(lhs);
if (tree rhs = gimple_cond_rhs(stmt)) vars.push_back(rhs);
break;
case GIMPLE_CALL:
if (tree lhs = gimple_call_lhs(stmt)) vars.push_back(lhs);
break;
case GIMPLE_DEBUG_BIND:
if (tree var = gimple_debug_bind_get_var(stmt)) vars.push_back(var);
if (tree val = gimple_debug_bind_get_value(stmt)) vars.push_back(val);
break;
default:
break;
}
for (tree v : vars) {
if (first_seen.find(v) != first_seen.end()) {
var_map[index] = first_seen[v];
} else {
first_seen[v] = index;
var_map[index] = index;
}
++index;
}
}
}
fp.var_refs = std::move(var_map);
return fp;
}
generate_fingerprint() function is a helper function that walks through a function's GIMPLE statements and records key structure inforamtion into FunctionFongerprint struct. We need this to create a consistent way of identifying the internal shape of a function so we can compare two functions even if they use different variable names.
FunctionFingerprint fp = generate_fingerprint(func);
//creates the structural fingerprint of this specific function variant
clone_groups[base].push_back(fp);
We have now grouped all clone variants under a base function name. This code records the fingerprint of the current variant and stores it into the vector clone_groups[base] so we can compare them later.
Again next!
We will now add followings:
- Add a helper function to compare two fingerprints
- Iterate through each clone_group[base] and decide if it's PRUNE or NO PRUNE
- Print the result to the dump files
So what are we adding in our pass this time?
bool tree_sj_pass::fingerprints_match(const FunctionFingerprint &a, const FunctionFingerprint &b) {
return a.bb_count == b.bb_count &&
a.stmt_count == b.stmt_count &&
a.ops == b.ops &&
a.op_counts == b.op_counts &&
a.rhs_types == b.rhs_types &&
a.var_refs == b.var_refs;
}
Adding it to compare each clone to the base function
if (clone_groups[base].size() >= 2) {
bool all_match = true;
const FunctionFingerprint &base_fp = clone_groups[base][0];
for (size_t i = 1; i < clone_groups[base].size(); ++i) {
if (!fingerprints_match(base_fp, clone_groups[base][i])) {
all_match = false;
break;
}
}
if (dump_file)
fprintf(dump_file, "%s: %s\n", all_match ? "[PRUNE]" : "[NO PRUNE]", base.c_str());
}
We also adding this comparison loop at the bottom of the excute() to make it only compare when we have at least 2 variants and outputs either PRUNE or NO PRUNE per base group.
These will allow us to handle programs with multiple cline groups each with its own PRUNE/ NO PRUNE decision.
Now we are jumping into testing session:
Current Testing file:
// clone-test-core.c
// Chris Tyler 2017.11.29-2024.11.19 - Licensed under GPLv3.
// For the Seneca College SPO600 Course
//
// This code is designed to be compiled with different values for
// the CLONE_ATTRIBUTE macro for different architectures and for
// both 'prune' (cloned functions are the same after optimization)
// and 'noprune' (cloned functions are different after optimization)
// build cases
//
// The cloned function is scale_samples
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include "vol.h"
int sum_sample (int16_t *buff, size_t samples) {
int t;
for (int x = 0; x < SAMPLES; x++) {
t += buff[x];
}
return t;
}
CLONE_ATTRIBUTE
void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
for (int x = 0; x < cnt; x++) {
out[x] = ((((int32_t) in[x]) * ((int32_t) (32767 * volume / 100) <<1) ) >> 16);
}
}
//new test case 1: loop unrolled by 4 (structurally different - this should cause NO PRUNE)
CLONE_ATTRIBUTE
void scale_samples_unrolled(int16_t *in, int16_t *out, int cnt, int volume) {
int32_t factor = ((int32_t)(32767 * volume / 100) << 1);
for (int x = 0; x < cnt; x += 4) {
for (int j = 0; j < 4 && x + j < cnt; j++) {
out[x + j] = (((int32_t) in[x + j]) * factor) >> 16;
}
}
}
// new test case 2: simple scalar op (should be PRUNE)
CLONE_ATTRIBUTE
void scalar_add(int *a, int *b) {
*a += *b;
}
int main(int argc, char **argv) {
int x;
int ttl=0;
int a = 5;
int b = 10;
// ---- Create in[] and out[] arrays
int16_t* in;
int16_t* out;
in = (int16_t*) calloc(SAMPLES, sizeof(int16_t));
out = (int16_t*) calloc(SAMPLES, sizeof(int16_t));
// ---- Create dummy samples in in[]
vol_createsample(in, SAMPLES);
// ---- This is the part we're interested in!
// ---- Scale the samples from in[], placing results in out[]
scale_samples(in, out, SAMPLES, VOLUME);
scale_samples_unrolled(in, out, SAMPLES, VOLUME);
scalar_add(&a, &b);
// ---- This part sums the samples.
ttl=sum_sample(out, SAMPLES);
// ---- Print the sum of the samples.
printf("Result: %d\n", ttl);
printf("Scalar result: %d\n", a);
free(in);
free(out);
return 0;
The Result:
As you can see from the logs, all the .default
, .popcnt
, and .resolver
suffixes are being detected, which means the execute()
function is working correctly and iterating through all available functions.
This also confirms that:
-
The pass is correctly identifying each cloned function through the call graph.
-
The logic for grouping clones by their base name (e.g.
scale_samples
,scalar_add
) is functioning as expected. -
Each variant is fingerprinted and compared to its baseline variant.
-
The PRUNE / NO PRUNE logic is being triggered only when at least two variants exist under the same base name.
slee569@x86-001:~/spo600/examples/test-clone$ grep -i "PRUNE" *.sj
clone-test-x86-noprune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add
clone-test-x86-noprune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled
clone-test-x86-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples
clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add
clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled
clone-test-x86-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples
from my output this indicates that my pass successfully differentiated between trivial identical functions (PRUNE) and structurally different implementations (NO PRUNE), based on function fingerprints like:
-
Number of basic blocks
-
Total GIMPLE statements
-
Operation types (e.g.,
GIMPLE_ASSIGN
) -
Operand counts and right-hand side codes
-
Variable mapping patterns
This proves that the structural analysis is not only being run, but also making correct and independent decisions for each clone group.
Result on aarch-64
[slee569@aarch64-002 test-clone]$ find . -name "*.sj" -exec ls -lh --time=ctime {} +
-rw-r--r--. 1 slee569 slee569 16K Apr 21 01:48 ./clone-test-aarch64-noprune-clone-test-core.c.264t.sj
-rw-r--r--. 1 slee569 slee569 20K Apr 21 01:48 ./clone-test-aarch64-prune-clone-test-core.c.264t.sj
[slee569@aarch64-002 test-clone]$ grep -i "PRUNE" *.sj
clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples
clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples
[slee569@aarch64-002 test-clone]$ nano clone-test-core.c
[slee569@aarch64-002 test-clone]$ make clean
rm clone-test-aarch64-prune clone-test-aarch64-noprune clone-test-x86-prune clone-test-x86-noprune || true
rm: cannot remove 'clone-test-x86-prune': No such file or directory
rm: cannot remove 'clone-test-x86-noprune': No such file or directory
rm vol_createsample.o || true
rm *.c.* || true # Should remove compiler dumpfiles
[slee569@aarch64-002 test-clone]$ make clean
rm clone-test-aarch64-prune clone-test-aarch64-noprune clone-test-x86-prune clone-test-x86-noprune || true
rm: cannot remove 'clone-test-aarch64-prune': No such file or directory
rm: cannot remove 'clone-test-aarch64-noprune': No such file or directory
rm: cannot remove 'clone-test-x86-prune': No such file or directory
rm: cannot remove 'clone-test-x86-noprune': No such file or directory
rm vol_createsample.o || true
rm: cannot remove 'vol_createsample.o': No such file or directory
rm *.c.* || true # Should remove compiler dumpfiles
rm: cannot remove '*.c.*': No such file or directory
[slee569@aarch64-002 test-clone]$ make CC="$HOME/gcc-build-001/gcc/xgcc -B$HOME/gcc-build-001/gcc"
/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -c vol_createsample.c -o vol_createsample.o
[SJ_PASS] running on function: vol_createsample
[SJ_PASS] dump_file is NULL
/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng") ))'\
-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \
clone-test-core.c vol_createsample.o -o clone-test-aarch64-prune
clone-test-core.c:47:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
47 | void scalar_add(int *a, int *b) {
| ^~~~~~~~~~
[SJ_PASS] running on function: scale_samples
[SJ_PASS] running on function: scale_samples_unrolled
[SJ_PASS] running on function: scalar_add
[SJ_PASS] running on function: scalar_add.rng
[SJ_PASS] running on function: scale_samples_unrolled.rng
[SJ_PASS] running on function: scale_samples.rng
[SJ_PASS] running on function: sum_sample
[SJ_PASS] running on function: scalar_add.resolver
[SJ_PASS] running on function: scale_samples_unrolled.resolver
[SJ_PASS] running on function: scale_samples.resolver
[SJ_PASS] running on function: main
/home/slee569/gcc-build-001/gcc/xgcc -B/home/slee569/gcc-build-001/gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","sve2") ))' \
-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-all -fdump-ipa-all -fdump-rtl-all \
clone-test-core.c vol_createsample.o -o clone-test-aarch64-noprune
clone-test-core.c:47:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
47 | void scalar_add(int *a, int *b) {
| ^~~~~~~~~~
[SJ_PASS] running on function: scale_samples
[SJ_PASS] running on function: scale_samples_unrolled
[SJ_PASS] running on function: scalar_add
[SJ_PASS] running on function: scalar_add.sve2
[SJ_PASS] running on function: scale_samples_unrolled.sve2
[SJ_PASS] running on function: scale_samples.sve2
[SJ_PASS] running on function: sum_sample
[SJ_PASS] running on function: scalar_add.resolver
[SJ_PASS] running on function: scale_samples_unrolled.resolver
[SJ_PASS] running on function: scale_samples.resolver
[SJ_PASS] running on function: main
[slee569@aarch64-002 test-clone]$ nano clone-test-core.c
[slee569@aarch64-002 test-clone]$ grep -i "PRUNE" *.sj
clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add
clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples_unrolled
clone-test-aarch64-noprune-clone-test-core.c.264t.sj:[NO PRUNE]: scale_samples
clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scalar_add
clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples_unrolled
clone-test-aarch64-prune-clone-test-core.c.264t.sj:[PRUNE]: scale_samples
[slee569@aarch64-002 test-clone]$
This shows that even in aarch-64 it is working great!
in order to go through all above steps please do as follow:
1. in gcc directory please edit your pass file as above instruction and build in gcc-build folder
2. edit clone-test-core.c to have more test cases as mine
3. make clean in testing folder
4. make CC="$HOME/gcc-build-001/gcc/xgcc -B$HOME/gcc-build-001/gcc"
5. check the output if it makes dump file
6. find . -name "*.sj" -exec ls -lh --time=ctime {} +
7. checks the output of dump file with cat command
8. Or you can do grep -i "PRUNE" *.sj to see all the PRUNE contained outputs.
Final thoughts
Throughout this project, I learned a great deal about GCC internals and how to create a custom GIMPLE optimization pass. Specifically, I built a clone pruning analysis pass that compares multiple versions of cloned functions based on their structure and semantic behavior. I manually parsed function names to extract base names using string manipulation (splitting at .
), and filtered out resolver variants early to focus only on actual function bodies. I also implemented a fingerprinting mechanism that captured structural information like basic block count, GIMPLE opcode sequences, operand counts, RHS operation types, and a normalized map of variable usage. These fingerprints allowed me to reliably distinguish between function variants that are structurally identical (PRUNE) and those that are semantically different (NO PRUNE), even when they used different variable names or compiler-optimized instructions.
Additionally, I extended the testing harness by modifying clone-test-core.c
to include new function variants like scale_samples_unrolled
and scalar_add
to validate both PRUNE and NO PRUNE scenarios. I tested the pass on both x86_64 and aarch64 architectures using the official spo600-test-clone.tgz
suite and verified that .sj
dump files were generated correctly and included the expected diagnostic output. Overall, this project helped me deeply understand how function cloning, optimization, and structural analysis work within the GCC framework, and gave me hands-on experience with compiler pass construction, function analysis, and cross-architecture validation.