Project Dana

This is a research entry associated with a third-party research project or paper. We are not responsible for the contents of any files associated with this submission, or for the accuracy of any code / results. Any questions should be directed to the author(s) of the work.

Phenotypic Species Definitions for Genetic Improvement of Source Code

Abstract: Emergent software systems are composed of elementary building blocks, where many of those blocks have variations available which are better or worse in different deployment contexts. Genetic Improvement (GI) for source code has been proposed for creating and curating collections of such blocks, but the combination of new code synthesis with genetic mutation and crossover results in large, complex search spaces. A range of methods to aid such a search have been proposed, with the particular notion of species having appeared in the context of Genetic Algorithms (GAs) to identify individuals with similar genotypes for controlling competition, encouraging the exploration of distant local optima, maintaining diversity and avoiding premature convergence. In this paper we examine a species definition for GI for source code, a domain which has specific features: genotype similarity is largely irrelevant; distance between individuals is undefined; and the fitness landscape is extremely rugged. We propose a phenotypic species definition that captures an algorithm’s functional phenotypic characteristics, while excluding its non-functional phenotypic characteristics (and its particular representation in source code). We introduce our proposal in a hash table GI scenario, where species are characterised by divergence in probability distributions.
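The idea of characterising species by divergence in probability distributions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the candidate hash functions (hash_sum, hash_sum_loop, hash_len), the choice of the bucket distribution as the functional characteristic, the use of Jensen-Shannon divergence, and the threshold of 0.05. The abstract does not state which distribution or divergence measure the authors use.

```python
import math
from collections import Counter

def bucket_distribution(hash_fn, keys, n_buckets):
    """Empirical probability of each bucket when hash_fn is applied to keys."""
    counts = Counter(hash_fn(k) % n_buckets for k in keys)
    return [counts.get(b, 0) / len(keys) for b in range(n_buckets)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def same_species(p, q, threshold=0.05):
    """Assumed rule: two individuals share a species if their divergence is small."""
    return js_divergence(p, q) <= threshold

# Hypothetical candidates. The first two differ in source code (genotype)
# but behave identically; the third behaves differently.
def hash_sum(key):
    return sum(key.encode())

def hash_sum_loop(key):
    total = 0
    for byte in key.encode():
        total += byte
    return total

def hash_len(key):
    return len(key)

keys = [f"key{i}" for i in range(1000)]
N_BUCKETS = 16

dist_sum = bucket_distribution(hash_sum, keys, N_BUCKETS)
dist_loop = bucket_distribution(hash_sum_loop, keys, N_BUCKETS)
dist_len = bucket_distribution(hash_len, keys, N_BUCKETS)

print("sum vs loop:", js_divergence(dist_sum, dist_loop))
print("sum vs len: ", js_divergence(dist_sum, dist_len))
```

In this sketch the two summing variants fall into one species despite their different source text, while the length-based variant is placed in another, because only the distribution of observable behaviour is compared and neither source representation nor running time enters the comparison.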

Status: Paper has no replication package, or it's hosted elsewhere; see paper for details.

Venue: ALife 2024.