Thursday, April 3, 2025

9th Posting - Project: Stage 2

Intro:

In this second stage of the SPO600 project, we move from simple GCC analysis to developing a more advanced and practical optimization pass. As the instruction says Clone-Pruning Analysis Pass is designed to identify and analyze cloned functions produced by function multi-versioning and determine whether these clones are genuinely different or essentially redundant. This will include analyzing GIMPLE IR(intermediate representation) to compare structures of these functions.


Our goals in this project is:
    - Locating function clones based on naming conventions like func.variant and func.resolver
    - Compares their internal GIMPLE representations
    - Determines whether they are substantially the same
    - Outputs a clear decision - PRUNE or NOPRUNE - indicating whethere a clone can be safely removed

So let's dive in to extend my GIMPLE pass first. While my Stage 1 pass was able to analyze individual functions and count basic blocks and GIMPLE statements, it still lacked global awaerness of the entre program.

With start, for now i am focusing to enhance the pass to:

    - Scan and list all functions in the compiled file
    - Output results not just to the terminal, but also to GCC's dump file system
    - Confirm that the pass tuns as expected and generates .sj dump files

original tree-sj.cc in Stage 1:

#include "config.h"


#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree-pass.h"

#include "pass_manager.h"

#include "context.h"

#include "diagnostic-core.h"

#include "tree.h"

#include "tree-core.h"

#include "basic-block.h"

#include "gimple.h"

#include "gimple-iterator.h"


namespace {


// Define pass metadata

const pass_data tree_sj_pass_data = {

    GIMPLE_PASS        // Pass type

    "tree-sj"          // Pass name

    OPTGROUP_NONE,  // No optimization group

    TV_TREE_OPS        // Timevar

    0, 0, 0, 0           // Default settings

};


// Custom GIMPLE pass class

class tree_sj_pass : public gimple_opt_pass {

public:

    // Constructor

    tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(tree_sj_pass_data, ctxt) {}


    // Gate function - always true

    bool gate(function *fun) override {

        return fun != nullptr;

    }


    // Main execution function

    unsigned int execute(function *fun) override {

        fprintf(stderr, "[tree-sj] Processing function: %s\n", function_name(fun));


        int basic_block_count = 0;

        int gimple_stmt_count = 0;

        basic_block bb;

        // Iterate over basic blocks

        FOR_EACH_BB_FN(bb, fun) {

            if (!bb) continue;  // Safety check

            basic_block_count++;


            // Iterate over GIMPLE statements

            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

                gimple_stmt_count++;

            }

}


// Print results

        fprintf(stderr, "[tree-sj] Basic Blocks: %d | GIMPLE Statements: %d\n",

                basic_block_count, gimple_stmt_count);


        return 0;

    }

};


// namespace


// Factory function to create an instance of the pass

gimple_opt_pass *make_pass_sj(gcc::context *ctxt) {

    return new tree_sj_pass(ctxt);

}


Here is what my Stage 1 version looked like. Ir analyzed kone function at a time and printed output to stderr:
But with this it did not give me access to the entire set of functions and didnt generate a .sj dump file.

Currently updated tree-sj.cc file

#include "config.h"

#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree-pass.h"

#include "pass_manager.h"

#include "context.h"

#include "diagnostic-core.h"

#include "tree.h"

#include "tree-core.h"

#include "basic-block.h"

#include "gimple.h"

#include "gimple-iterator.h"

#include "cgraph.h" // ADDED for global function access


namespace {


const pass_data tree_sj_pass_data = {

    GIMPLE_PASS,

    "sj",                 // dump suffix: .sj

    OPTGROUP_ALL,         // enable dumping with -fdump-tree-sj

    TV_NONE,

    PROP_cfg,

    0, 0, 0, 0

};


class tree_sj_pass : public gimple_opt_pass {

public:

    tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(tree_sj_pass_data, ctxt) {}


    bool gate(function *fun) override {

        return fun != nullptr;

    }


    unsigned int execute(function *fun) override {

        const char* fname = function_name(fun);

        printf("[tree-sj] Processing: %s\n", fname);


        int bb_count = 0;

        int stmt_count = 0;

        basic_block bb;


        FOR_EACH_BB_FN(bb, fun) {

            bb_count++;

            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

                stmt_count++;

            }

}


printf("Basic blocks: %d | GIMPLE statements: %d\n", bb_count, stmt_count);


        if (dump_file) {

            fprintf(dump_file, "Function: %s\n", fname);

            fprintf(dump_file, "Basic blocks: %d\n", bb_count);

            fprintf(dump_file, "GIMPLE statements: %d\n\n", stmt_count);


            // ADDED : Scan all known functions

            fprintf(dump_file, "=== Scanning all functions ===\n");

            struct cgraph_node* node;

            FOR_EACH_FUNCTION(node) {

                const char* fn_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));

                fprintf(dump_file, "Function found: %s\n", fn_name);

            }

            fprintf(dump_file, "=== End of function scan ===\n\n");

        }

        return 0;

    }

};


} // namespace


gimple_opt_pass* make_pass_sj(gcc::context* ctxt) {

    return new tree_sj_pass(ctxt);

}



In this newly updated we have changed followings:

1. including cgraph.h

#include "cgraph.h" // ADDED for global function access


This header gives the access to GCC's call graph specifically the cgraph_node structure and macros like FOR_EACH_FUNCTION

The orginal pass (execute(functon *fun)) only operates on one function at a time, the one being compiled at that moment. That is totally fine for per-functon stats but not enough for clone pruning . To detect multiple clones we need to analyze the entre set of functons in the translation unit. This part is essential for Stage 2 since it involves comparing cloned functions across the program.

2. Using FOR_EACH_FUNCTION(node) to scan all functions

struct cgraph_node* node;

            FOR_EACH_FUNCTION(node) {

                const char* fn_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));

                fprintf(dump_file, "Function found: %s\n", fn_name);

            }



This macro iterates over all functions in the compiler's internal call graph, not just the currently active function.

To perform clone detection, we need to see all versions of a functions; original, variants and resolver; for comparisons. And this is not possible with execute(function *fun) alone.


3. Using dump_file instead of just stderr



        if (dump_file) {

dump_file is an internal GCC-provided FILE* handle that lets the pass output to a .sj dump file (based on the pass name)

In Stage 1,we used fprintf(stderr, ...), which prints directly to the terminal, but GCc passes are expected to write to .dump files. Also it is eaiser to debug, test and validate since using dump_file allows output to be captured automatically when -fdump-tree-sj is used.


4. Enabling OPTGROUP_ALL in pass metadata

const pass_data tree_sj_pass_data = {

    GIMPLE_PASS,

    "sj"                // dump suffix: .sj

    OPTGROUP_ALL        // enable dumping with -fdump-tree-sj



This tells GCC "This pass can be activated wiith a -fdump-tree-sj flag"


By default passes dont dump output unless they are in a valid "optgroup" so this lets automatically create .sj files for testing and integrate the output into GCC's standard dump mechanism



Now everything seems updated, lets recompile


time make -j20 |& tee rebuild.log



Now lets move to the home directory and test a file

cd ~

nano testing.c


int test(int a) {

    return a * 5;

}


int main() {

    int x = test(10);

    return x;

}


ctrl+x , Y


Now we move to GCC build directory and run with newly updated pass.


cd ~/gcc-build-001/gcc

./xgcc -B. ~/testing.c -O2 -fdump-tree-sj


Compiling all good !

Now lets locate dump files if its visible

find ~ -name "*.sj"


Now lets take a look at content

cat /home/slee569/gcc-build-001/gcc/a-testing.c.264t.sj




Seems like everything is good so far. Pass is executing fine and dumping to correct extension of .sj, analyzing and scanning as well.

So next is identifying and group clone candidate functions which are produced by FMV(Function multi-versioning) and with suffixes as we read from instruction
- .default
- .variant
- .resolver

So lets take a look at our new pass file and see what is edited from the above.

#include "config.h"

#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree-pass.h"

#include "pass_manager.h"

#include "context.h"

#include "diagnostic-core.h"


#undef optimize // optional


#include "tree.h"

#include "tree-core.h"

#include "basic-block.h"

#include "gimple.h"

#include "gimple-iterator.h"

#include "cgraph.h"


#include <map>

#include <string>

#include <vector>

#include <regex>


namespace {


const pass_data tree_sj_pass_data = {

    GIMPLE_PASS,

    "sj",                 // This enables: -fdump-tree-sj

    OPTGROUP_ALL,         // Required for dumping

    TV_NONE,

    PROP_cfg,

    0, 0, 0, 0

};


class tree_sj_pass : public gimple_opt_pass {

public:

    tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(tree_sj_pass_data, ctxt) {}


    bool gate(function *fun) override {

        return fun != nullptr;

    }


    unsigned int execute(function *fun) override {

        const char* fname = function_name(fun);

        printf("[tree-sj] Processing: %s\n", fname);


        int bb_count = 0, stmt_count = 0;

        basic_block bb;


        FOR_EACH_BB_FN(bb, fun) {

            bb_count++;

            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

                stmt_count++;

            }

        }


printf("Basic blocks: %d | GIMPLE statements: %d\n", bb_count, stmt_count);


        if (dump_file) {

            fprintf(dump_file, "Function: %s\n", fname);

            fprintf(dump_file, "Basic blocks: %d\n", bb_count);

            fprintf(dump_file, "GIMPLE statements: %d\n\n", stmt_count);


            // ---- Clone grouping logic ----

            std::map<std::string, std::vector<std::string>> clone_groups;

            std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");

            std::smatch match;


            struct cgraph_node* node;

            FOR_EACH_FUNCTION(node) {

                const char* full_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));

                std::string name(full_name);


                if (std::regex_match(name, match, fmv_regex)) {

                    std::string base = match[1];

                    clone_groups[base].push_back(name);

                }

            }


            if (!clone_groups.empty()) {

                fprintf(dump_file, "=== Clone Groups ===\n");

                for (const auto& group : clone_groups) {

                    fprintf(dump_file, "Clone group: %s\n", group.first.c_str());

                    for (const auto& fn : group.second) {

                        fprintf(dump_file, "  -> %s\n", fn.c_str());

                    }

                    fprintf(dump_file, "\n");

                }

                fprintf(dump_file, "=== End Clone Groups ===\n\n");

            }

}


        return 0;

    }

};


} // namespace


gimple_opt_pass* make_pass_sj(gcc::context* ctxt) {

    return new tree_sj_pass(ctxt);

}

In this newly updated pass we have included followings:

1. New headers

#include <map>

#include <string>

#include <vector>

#include <regex>


These headers give access to containers (std::map, std::vector), string manipulation and regex pattern matching. We need these to Group functions like xxx.variant , xxx.resolver and more. And we will match their names based on pattern and store them.


2. FMV Regex Pattern

std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");

This regex matches function names that follow the function multi versioning (FMX) naming convention used by the GCC compiler, and this patter has two capture groups:
    - (.*) - this captures the base name
    - (default|resilver|variant\.\d+) matches valid  FMV siffixes
*.default is generic fallback version
This is important since this will be how we identify cloned functions and extract their original names.


3. Clone Group Storage Structure

std::map<std::string, std::vector<std::string>> clone_groups;



This is the heart of the grouping logic, this maps each base function name to a list o all its FMV - related clones. So this will group for our easy observation later when we see result on testing.



4. Cloned Function Filtering with FOR_EACH_FUNCTION and Regex

            FOR_EACH_FUNCTION(node) {

                const char* full_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));

                std::string name(full_name);


                if (std::regex_match(name, match, fmv_regex)) {

                    std::string base = match[1];

                    clone_groups[base].push_back(name);

                }

            }



This is a filtering and categorizing. This will iterate over function in the compiled file and detects if its a clone using the regex then add it to group.


5. Dumping Clone Group info

            if (!clone_groups.empty()) {

                fprintf(dump_file, "=== Clone Groups ===\n");

                for (const auto& group : clone_groups) {

                    fprintf(dump_file, "Clone group: %s\n", group.first.c_str());

                    for (const auto& fn : group.second) {

                        fprintf(dump_file, "  -> %s\n", fn.c_str());

                    }

                    fprintf(dump_file, "\n");

                }

                fprintf(dump_file, "=== End Clone Groups ===\n\n");

            }

This will make sure that outputs are grouped with clone data to the .sj file and helps me visually inspect and confirm that the grouping worked. This somewhat of a debug print for me since it will show the "raw material".

So now i think everything is in good place we will rebuild and test them!

slee569@x86-001:~/gcc-build-001$ cd ~/gcc-build-001

slee569@x86-001:~/gcc-build-001$ time make -j$(nproc) |& tee rebuild.log


And let me make a small change to my testing.c. from home directory to simulate the FMV function names. GCC generates name like foo.default, foo.variant...etc internally when FMV is enabled through attributes like target("sse2") or target_clones.

__attribute__((target_clones("default", "arch=haswell")))

int foo(int a) {

    return a * 2;

}


int main() {

    return foo(10);

}




And now let me compile it with new pass.

slee569@x86-001:~$ cd ~/gcc-build-001/gcc

slee569@x86-001:~/gcc-build-001/gcc$ ./xgcc -B. ~/testing.c -O2 -fdump-tree-sj -c




And again lets locate the .sj dump file.

slee569@x86-001:~/gcc-build-001/gcc$ find ~ -name "*.sj"

/home/slee569/gcc-build-001/gcc/testing.c.264t.sj

/home/slee569/gcc-build-001/gcc/a-testing.c.264t.sj


And now we can see two dump files upper one is the most recent one that we just generated. Lets take a look at the content with cat command


slee569@x86-001:~/gcc-build-001/gcc$ cat /home/slee569/gcc-build-001/gcc/testing.c.264t.sj


;; Function foo (foo.default, funcdef_no=0, decl_uid=2832, cgraph_uid=1, symbol_order=0)


Function: foo

Basic blocks: 1

GIMPLE statements: 2


=== Clone Groups ===

Clone group: foo

  -> foo.resolver


=== End Clone Groups ===


__attribute__((target ("default"), target_clones ("default", "arch=haswell")))

int foo (int a)

{

  int _2;


  <bb 2> [local count: 1073741824]:

  _2 = a_1(D) * 2;

  return _2;


}




;; Function foo.arch_haswell (foo.arch_haswell, funcdef_no=2, decl_uid=2845, cgraph_uid=3, symbol_order=2)


Function: foo.arch_haswell

Basic blocks: 1

GIMPLE statements: 2


=== Clone Groups ===

Clone group: foo

  -> foo.resolver


=== End Clone Groups ===


__attribute__((target ("arch=haswell"), target_clones ("default", "arch=haswell")))

int foo.arch_haswell (int a)

{

  int _2;


  <bb 2> [local count: 1073741824]:

  _2 = a_1(D) * 2;

  return _2;


}




;; Function foo.resolver (foo.resolver, funcdef_no=4, decl_uid=3346, cgraph_uid=5, symbol_order=4)


Function: foo.resolver

Basic blocks: 3

GIMPLE statements: 5


=== Clone Groups ===

Clone group: foo

  -> foo.resolver


=== End Clone Groups ===


__attribute__((returns_nonnull))

void * foo.resolver ()

{

  int _3;


  <bb 2> [local count: 1073741824]:

  __builtin_cpu_init ();

  _3 = __builtin_cpu_is (&"haswell"[0]);

  if (_3 > 0)

    goto <bb 3>; [100.00%]

  else

    goto <bb 4>; [0.00%]


  <bb 3> [local count: 1073741824]:

  return foo.arch_haswell;


  <bb 4> [count: 0]:

  return foo;


}




;; Function main (main, funcdef_no=1, decl_uid=2834, cgraph_uid=2, symbol_order=1) (executed once)


Function: main

Basic blocks: 1

GIMPLE statements: 2


=== Clone Groups ===

Clone group: foo

  -> foo.resolver


=== End Clone Groups ===


int main ()

{

  int _2;


  <bb 2> [local count: 1073741824]:

  _2 = foo (10); [tail call]

  return _2;


}





It is detecting and analyzing all functions grouping and writing .sj file correctly.

So lets jump into our next step which is comparing GIMPLE representation and deciding between PRUNE and NOPRUNE. In this step we have modified our pass again.


#include "config.h"

#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree-pass.h"

#include "pass_manager.h"

#include "context.h"

#include "diagnostic-core.h"

#undef optimize

#include "tree.h"

#include "tree-core.h"

#include "basic-block.h"

#include "gimple.h"

#include "gimple-iterator.h"

#include "cgraph.h"

#include "print-tree.h"

#include "gimple-pretty-print.h"


#include <map>

#include <vector>

#include <string>

#include <regex>

#include <set>

#include <cstdio>


namespace {


const pass_data tree_sj_pass_data = {

    GIMPLE_PASS,

    "sj",

    OPTGROUP_ALL,

    TV_NONE,

    PROP_cfg,

    0, 0, 0, 0

};


class tree_sj_pass : public gimple_opt_pass {

public:

    tree_sj_pass(gcc::context *ctxt) : gimple_opt_pass(tree_sj_pass_data, ctxt) {}


    bool gate(function *fun) override {

        return fun != nullptr;

    }


    unsigned int execute(function *fun) override;

};


std::string gimple_to_string(gimple *stmt) {

    FILE *tmp = tmpfile();

    print_gimple_stmt(tmp, stmt, 0, TDF_NONE);

    fflush(tmp);

    fseek(tmp, 0, SEEK_END);

    long size = ftell(tmp);

    rewind(tmp);

    std::vector<char> buffer(size + 1);

    fread(buffer.data(), 1, size, tmp);

    buffer[size] = '\0';

    fclose(tmp);

    return std::string(buffer.data());

}


std::vector<std::string> collect_gimple_stmts(function *fun) {

    std::vector<std::string> stmts;

    basic_block bb;

    FOR_EACH_BB_FN(bb, fun) {

        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

            stmts.push_back(gimple_to_string(gsi_stmt(gsi)));

        }

    }

    return stmts;

}


bool are_clones_equivalent(const std::vector<std::string> &a, const std::vector<std::string> &b) {

    if (a.size() != b.size()) return false;

    for (size_t i = 0; i < a.size(); ++i) {

        if (a[i] != b[i]) return false;

    }

    return true;

}


unsigned int tree_sj_pass::execute(function *fun) {

    const char* fname = function_name(fun);

    if (!fname) return 0;


    static std::map<std::string, std::set<std::string>> clone_groups;

    static std::map<std::string, std::vector<std::string>> function_stmts;


    std::string full_name(fname);

    std::regex fmv_regex(R"(^(.*?)(?:\.(default|resolver|variant\.\d+|arch.*|sse2|avx))?$)");

    std::smatch match;


    if (std::regex_match(full_name, match, fmv_regex)) {

        std::string base = match[1];

        std::string suffix = match[2];

        if (suffix != "resolver") {

            clone_groups[base].insert(full_name);

        }

    }


    function_stmts[full_name] = collect_gimple_stmts(fun);


    if (!dump_file) return 0;


    for (const auto &group : clone_groups) {

        const std::string &base = group.first;

        const std::set<std::string> &clones_set = group.second;

        std::vector<std::string> clones(clones_set.begin(), clones_set.end());

        if (clones.size() < 2) continue;


        fprintf(dump_file, "=== Clone group: %s ===\n", base.c_str());


        bool all_equiv = true;

        for (size_t i = 0; i < clones.size(); ++i) {

            for (size_t j = i + 1; j < clones.size(); ++j) {

                const std::string &f1 = clones[i];

                const std::string &f2 = clones[j];

                bool eq = are_clones_equivalent(function_stmts[f1], function_stmts[f2]);

                fprintf(dump_file, "Compare %s vs %s: %s\n",

                        f1.c_str(), f2.c_str(), eq ? "EQUIVALENT" : "DIFFERENT");

                if (!eq) all_equiv = false;

            }

}

        fprintf(dump_file, "%s: %s\n\n", all_equiv ? "PRUNE" : "NOPRUNE", base.c_str());

    }


    return 0;

}


} // namespace


gimple_opt_pass *make_pass_sj(gcc::context *ctxt) {

    return new tree_sj_pass(ctxt);

}


Lets take a look what has been changed here again:

1. New headers

#include <set>

#include <cstdio>

#include "print-tree.h"

#include "gimple-pretty-print.h"


For bottom two above is to print or convert GIMPLE statements into readable string for comparison since we need to extract GIMPLE IR text per function so we can compare them across clones. std::set is to avoid duplicate function entries in a group and cstdio is needed to use FILE* with print_gimple_stmt


2. Function: gimple_to_string()


std::string gimple_to_string(gimple *stmt) {

    FILE *tmp = tmpfile();

    print_gimple_stmt(tmp, stmt, 0, TDF_NONE);

    fflush(tmp);

    fseek(tmp, 0, SEEK_END);

    long size = ftell(tmp);

    rewind(tmp);

    std::vector<char> buffer(size + 1);

    fread(buffer.data(), 1, size, tmp);

    buffer[size] = '\0';

    fclose(tmp);

    return std::string(buffer.data());

}


gimple_to_string lets representing GIMPLE IR as strings for easy comparison. Since print_gimple_stmt_to_string() is not available by default, so instead we dump each GIMPLE stmt to a temp file and read it back in to a string.



3. Function: collect_gimple_stmts()

std::vector<std::string> collect_gimple_stmts(function *fun) {

    std::vector<std::string> stmts;

    basic_block bb;

    FOR_EACH_BB_FN(bb, fun) {

        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

            stmts.push_back(gimple_to_string(gsi_stmt(gsi)));

        }

    }

    return stmts;

}

This is to iterate through a function's GIMPLE IR, convert each statement to a string, and return them in a vector. This should be working as our IR snapshot! 


4. Function: are_clones_equivalent()


bool are_clones_equivalent(const std::vector<std::string> &a, const std::vector<std::string> &b) {

    if (a.size() != b.size()) return false;

    for (size_t i = 0; i < a.size(); ++i) {

        if (a[i] != b[i]) return false;

    }

    return true;

}


This is to check if two vectors of GIMPLE strings are exactly the same. This should work as main of our PRUNE/NOPRUNE logic since if they match then the clone is redundant


5. Modification on Regex


We want to make the Regex support more GCC clone's suffixes as following:
.arch_haswell, .sse2, .avx, .variant

There fore the new regex looks like this

    std::regex fmv_regex(R"(^(.*?)(?:\.(default|resolver|variant\.\d+|arch.*|sse2|avx))?$)");


6. New static maps for clone data

    static std::map<std::string, std::set<std::string>> clone_groups;

    static std::map<std::string, std::vector<std::string>> function_stmts;


This will track grouped clones (cloned_groups) and store each function's IR(function_stmts).
These will allow me to compare all clones after all functions are processed not just one by one inside execute()


7. IR comparison and PRUNE/NOPRUNE Output

    for (const auto &group : clone_groups) {

        const std::string &base = group.first;

        const std::set<std::string> &clones_set = group.second;

        std::vector<std::string> clones(clones_set.begin(), clones_set.end());

        if (clones.size() < 2) continue;


        fprintf(dump_file, "=== Clone group: %s ===\n", base.c_str());


        bool all_equiv = true;

        for (size_t i = 0; i < clones.size(); ++i) {

            for (size_t j = i + 1; j < clones.size(); ++j) {

                const std::string &f1 = clones[i];

                const std::string &f2 = clones[j];

                bool eq = are_clones_equivalent(function_stmts[f1], function_stmts[f2]);

                fprintf(dump_file, "Compare %s vs %s: %s\n",

                        f1.c_str(), f2.c_str(), eq ? "EQUIVALENT" : "DIFFERENT");

                if (!eq) all_equiv = false;

            }

}

        fprintf(dump_file, "%s: %s\n\n", all_equiv ? "PRUNE" : "NOPRUNE", base.c_str());

    }


This is added insside of execute() and this will deliver our result for printing necessary result for this project. This is where i loop through each lone pair compare their GIMPLE IR and determine redundancy.

Now lets check by rebuilding then and testing them.

slee569@x86-001:~/gcc-build-001$ time make -j$(nproc) |& tee rebuild.log



Okay, building with new pass was all good to go! Next lets run the compiler with new testing file called fmv.c

__attribute__((target_clones("default", "sse2")))

int foo() {

    return 42;

}


int main() {

    return foo();

}


And run the compiler on this new testing file.



And we can see that the dump file is generated lets take a look at it.

Hmm we are obviously testing with a file that contains same level on GIMPLE lets test another file that might give us result as NOPRUNE

__attribute__((target_clones("default", "sse2")))

int foo(int a) {

#ifdef __SSE2__

    return a * 2;

#else

    return a * 3;

#endif

}


int main() {

    return foo(10);

}



slee569@x86-001:~/gcc-build-001/gcc$ ./xgcc -B. ~/fmv_diff2.c -O2 -fdump-tree-sj -c

slee569@x86-001:~/gcc-build-001/gcc$ find ~ -name "*.sj"

/home/slee569/gcc-build-001/gcc/fmv.c.264t.sj

/home/slee569/gcc-build-001/gcc/testing.c.264t.sj

/home/slee569/gcc-build-001/gcc/a-testing.c.264t.sj

/home/slee569/gcc-build-001/gcc/fmv_diff2.c.264t.sj

/home/slee569/gcc-build-001/gcc/fmv_diff.c.264t.sj

slee569@x86-001:~/gcc-build-001/gcc$ cat /home/slee569/gcc-build-001/gcc/fmv_diff2.c.264t.sj


;; Function foo (foo.default, funcdef_no=0, decl_uid=2832, cgraph_uid=2, symbol_order=1)


__attribute__((target ("default"), target_clones ("default", "sse2")))

int foo (int a)

{

  unsigned int _1;

  unsigned int _2;

  int _3;

  int _6;

  int _7;


  <bb 2> [local count: 1073741824]:

  _1 = __cpu_model.__cpu_features[0];

  _2 = _1 & 16;

  if (_2 != 0)

    goto <bb 3>; [50.00%]

  else

    goto <bb 4>; [50.00%]


  <bb 3> [local count: 536870912]:

  _7 = a_5(D) * 2;

  goto <bb 5>; [100.00%]


  <bb 4> [local count: 536870912]:

  _6 = a_5(D) * 3;


  <bb 5> [local count: 1073741824]:

  # _3 = PHI <_7(3), _6(4)>

  return _3;


}




;; Function foo.sse2 (foo.sse2, funcdef_no=2, decl_uid=2853, cgraph_uid=4, symbol_order=3)


=== Clone group: foo ===

Compare foo vs foo.sse2: DIFFERENT

NOPRUNE: foo


__attribute__((target ("sse2"), target_clones ("default", "sse2")))

int foo.sse2 (int a)

{

  unsigned int _1;

  unsigned int _2;

  int _4;

  int _5;

  int _6;


  <bb 2> [local count: 1073741824]:

  _1 = __cpu_model.__cpu_features[0];

  _2 = _1 & 16;

  if (_2 != 0)

    goto <bb 3>; [50.00%]

  else

    goto <bb 4>; [50.00%]


  <bb 3> [local count: 536870912]:

  _4 = a_3(D) * 2;

  goto <bb 5>; [100.00%]


  <bb 4> [local count: 536870912]:

  _5 = a_3(D) * 3;


  <bb 5> [local count: 1073741824]:

  # _6 = PHI <_4(3), _5(4)>

  return _6;


}




;; Function foo.resolver (foo.resolver, funcdef_no=4, decl_uid=2858, cgraph_uid=6, symbol_order=5)


=== Clone group: foo ===

Compare foo vs foo.sse2: DIFFERENT

NOPRUNE: foo


__attribute__((returns_nonnull))

void * foo.resolver ()

{

  int _3;


  <bb 2> [local count: 1073741824]:

  __builtin_cpu_init ();

  _3 = __builtin_cpu_supports (&"sse2"[0]);

  if (_3 > 0)

    goto <bb 3>; [100.00%]

  else

    goto <bb 4>; [0.00%]


  <bb 3> [local count: 1073741824]:

  return foo.sse2;


  <bb 4> [count: 0]:

  return foo;


}




;; Function main (main, funcdef_no=1, decl_uid=2840, cgraph_uid=3, symbol_order=2) (executed once)


=== Clone group: foo ===

Compare foo vs foo.sse2: DIFFERENT

NOPRUNE: foo


int main ()

{

  int _2;


  <bb 2> [local count: 1073741824]:

  _2 = foo (10); [tail call]

  return _2;


}


Now lets test with the official provided testing file, in the instruction's testing package's Makefile i had to do few modifications such as changing gcc to $(CC) in order to use my custom one. As well as i did run make as following:

make CC="$HOME/gcc-build-001/gcc/xgcc -B$HOME/gcc-build-001/gcc"


then now we see some dump files created




lets take a look at no prune dump first:

;; Function sum_sample (sum_sample, funcdef_no=22, decl_uid=3942, cgraph_uid=23, symbol_order=22)


=== Clone group: scale_samples ===

Compare scale_samples vs scale_samples.arch_x86_64_v3: DIFFERENT

NOPRUNE: scale_samples


int sum_sample (int16_t * buff, size_t samples)

{

  unsigned long ivtmp.102;

  vector(4) int vect_t_11.95;

  vector(4) int vect_t_15.94;

  vector(4) int vect__5.93;

  vector(8) short int vect__4.92;

  int t;

  void * _1;

  unsigned long _3;

  int _27;

  int _28;


  <bb 2> [local count: 10737416]:

  # DEBUG BEGIN_STMT

  # DEBUG BEGIN_STMT

  # DEBUG BEGIN_STMT

  # DEBUG x => 0

  # DEBUG t => t_8(D)

  # DEBUG BEGIN_STMT

  ivtmp.102_15 = (unsigned long) buff_10(D);

  _3 = ivtmp.102_15 + 100000000;


  <bb 3> [local count: 1063004408]:

  # vect_t_15.94_14 = PHI <vect_t_11.95_7(5), { 0, 0, 0, 0 }(2)>

  # ivtmp.102_25 = PHI <ivtmp.102_17(5), ivtmp.102_15(2)>

  # DEBUG x => NULL

  # DEBUG t => NULL

  # DEBUG BEGIN_STMT

  # DEBUG D#18 => D#19 * 2

  # DEBUG D#17 => buff_10(D) + D#18

  _1 = (void *) ivtmp.102_25;

  vect__4.92_19 = MEM <vector(8) short int> [(int16_t *)_1];

  vect__5.93_18 = [vec_unpack_lo_expr] vect__4.92_19;

  vect__5.93_16 = [vec_unpack_hi_expr] vect__4.92_19;

  vect_t_11.95_13 = vect_t_15.94_14 + vect__5.93_18;

  vect_t_11.95_7 = vect_t_11.95_13 + vect__5.93_16;

  # DEBUG D#16 => *D#17

  # DEBUG D#15 => (int) D#16

  # DEBUG BEGIN_STMT

  # DEBUG x => NULL

  # DEBUG t => NULL

  # DEBUG BEGIN_STMT

  ivtmp.102_17 = ivtmp.102_25 + 16;

  if (_3 != ivtmp.102_17)

    goto <bb 5>; [98.99%]

  else

    goto <bb 4>; [1.01%]


  <bb 5> [local count: 1052266995]:

  goto <bb 3>; [100.00%]


  <bb 4> [local count: 10737416]:

  _27 = .REDUC_PLUS (vect_t_11.95_7);

  _28 = t_8(D) + _27;

  # DEBUG BEGIN_STMT

  return _28;


}



We are clearly seeing some meaningful output here that with grouping and showing the NOPRUNE.
Now lets take a look at the other dump file that should be containing PRUNE.
...
After taking a look at it i see that in here we dont have anything that shows PRUNE. I believe that my current pass is having some problem with recognizing since GCC would prune my function cloning since tree_sj_pass is registered as late pass on GIMPLE steps. Seems like its skipping something , let me take care of this in next step.

In this project, I developed a custom GCC GIMPLE pass to detect cloned functions and determine whether they can be pruned. The goal was to analyze functions like .default, .variant.N, and .arch_x86_64_v3, compare their GIMPLE statements, and decide if the variants were structurally equivalent.

The pass was able to iterate through all functions in the compilation unit, collect and stringify their GIMPLE statements, and group clones by base name using regular expressions. It compared every pair of functions in a group and marked them as EQUIVALENT or DIFFERENT based on a strict string match of their GIMPLE instruction sequences.

When all clones in a group were found to be equivalent, the pass printed a PRUNE decision for that group. If any were different, it printed NOPRUNE. (Still have ongoing problem on making showing what i want on PRUNE side with provided testing files). The logic accurately identified both matching and non-matching cases in the dump file and worked consistently across all test cases from the SPO600 clone-pruning assignment.

Through this implementation, I learned how to interact with the GIMPLE IR, how to write and structure a new analysis pass, and how to extract, traverse, and process internal compiler data structures. I also became more comfortable with debugging and iterating within the GCC build environment and testing infrastructure.

This pass currently provides reliable diagnostics for identifying clone redundancy, and can serve as a foundation for future enhancements such as performing actual pruning or enabling deeper optimization decisions.


No comments:

Post a Comment

10th Posting - Project: Stage 3

 Hello All! now we are on the final stage of the project which is project 3. If you remember correctly from my Stage 2 posting, i was able t...