

A computer-implemented method for generating and ranking source code for performing a task is described. The method includes: receiving input data comprising a task description, a code generation prompt, and a test case generation prompt; processing the input data using at least one trained code generation neural network to generate a plurality of code solutions and a plurality of test cases; for each code solution, executing the code solution on the test inputs of the plurality of test cases to generate a plurality of execution outputs; clustering the plurality of code solutions into a plurality of clusters; computing an interaction matrix that specifies functional overlap between the plurality of clusters; determining, for each cluster, a score based on the interaction matrix; and ranking the plurality of clusters based on the scores of the plurality of clusters.
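
The execution, clustering, scoring, and ranking steps can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions that the abstract does not specify: candidates are grouped when their execution outputs are identical on every test input, functional overlap between two clusters is taken to be the fraction of test inputs on which their outputs agree, and a cluster's score is the sum of its overlaps weighted by cluster size. The names `run_solution` and `rank_clusters` are hypothetical, and plain Python callables stand in for the code solutions a neural network would generate.

```python
from collections import defaultdict
from typing import Callable, Sequence


def run_solution(solution: Callable, test_inputs: Sequence) -> tuple:
    """Execute one candidate on every test input; an exception becomes a marker value."""
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(solution(x))
        except Exception as exc:  # a crashing candidate still yields a comparable output
            outputs.append(("error", type(exc).__name__))
    return tuple(outputs)


def rank_clusters(solutions: Sequence[Callable], test_inputs: Sequence):
    """Return (member indices, score) pairs, highest score first."""
    # Assumption: two candidates share a cluster when all their outputs match.
    # Outputs must be hashable because the output tuple is used as a dict key.
    clusters = defaultdict(list)
    for idx, solution in enumerate(solutions):
        clusters[run_solution(solution, test_inputs)].append(idx)

    signatures = list(clusters)
    n, m = len(signatures), len(test_inputs)

    # Assumption: interaction[i][j] is the fraction of test inputs on which
    # clusters i and j produce the same output.
    interaction = [
        [
            sum(a == b for a, b in zip(signatures[i], signatures[j])) / m if m else 1.0
            for j in range(n)
        ]
        for i in range(n)
    ]

    # Assumption: a cluster's score sums its overlaps, weighted by cluster size.
    sizes = [len(clusters[sig]) for sig in signatures]
    scores = [sum(interaction[i][j] * sizes[j] for j in range(n)) for i in range(n)]

    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(clusters[signatures[i]], scores[i]) for i in order]


# Four toy candidates for "double the input"; the first two agree everywhere.
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2, lambda x: 0]
ranked = rank_clusters(candidates, [0, 1, 2, 3])
```

In this toy run the two agreeing candidates form the top-ranked cluster, since it is both the largest and the one that overlaps most with the others. A different overlap measure or score function could be substituted without changing the overall flow.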






