Research topics
Software supply chain
One-shot runners for continuous integration
Continuous integration consists of setting up a specific environment to build and test an application. For example, GitHub Actions spawns a container to run the continuous integration jobs. GitHub has recently introduced just-in-time runners to mitigate software supply chain attacks on CI [1].
In this work, we explore the combination of automatic software diversity [2] and just-in-time runners to build one-shot, unique versions of runners for CI. We will first look at the opportunity to reuse the natural diversity of containers [3], and then investigate transformations to automatically increase the diversity of runners. The student will explore both GitHub Actions and Nix [4] for automated builds.
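As a concrete starting point, GitHub's REST API exposes an endpoint to generate just-in-time runner configurations. The sketch below, which assumes a token with admin rights on a hypothetical repository, requests a single-use runner configuration; the payload fields follow the API documentation at the time of writing and may change.

```python
# Minimal sketch: request a just-in-time (single-use) runner configuration
# from the GitHub REST API. Assumes GITHUB_TOKEN has admin rights on the repo.
import os
import requests

OWNER, REPO = "example-org", "example-repo"  # hypothetical repository
url = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runners/generate-jitconfig"

response = requests.post(
    url,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    json={
        "name": "one-shot-runner-001",   # a fresh name per runner
        "runner_group_id": 1,            # default runner group
        "labels": ["self-hosted", "one-shot"],
    },
    timeout=30,
)
response.raise_for_status()
# The encoded JIT config is handed to the runner binary, which registers,
# executes exactly one job, then unregisters.
print(response.json()["encoded_jit_config"][:40], "...")
```

A diversified variant of the runner image could be provisioned right before this call, so that each one-shot runner is also a unique build environment.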
- Using just-in-time runners
- Building diverse computer systems
- Too many images on DockerHub! How different are images for the same system?
- Nix: A Safe and Policy-Free System for Software Deployment
Adversarial Maven builds
Reproducible builds are essential to build trust in the integrity of packaged artifacts. The Java and Maven communities address this challenge and maintain a list of packages whose reproducibility has been verified through independent builds. In this work, we design several strategies to perturb the build in order to determine the actual robustness of these reproducible builds. These strategies involve randomizing the order of dependencies in the configuration file and perturbing the build environment, for example by setting the clock in the future or changing the system's language or host name.
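To make one perturbation strategy concrete, here is a minimal sketch: the order of the `<dependency>` entries in a pom.xml is shuffled and the project is rebuilt under a modified locale and time zone. Paths and the choice of environment variables are assumptions about the study setup; perturbing the actual clock would additionally require a tool such as libfaketime.

```python
# Sketch: randomize dependency order in pom.xml, rebuild with a perturbed
# environment, and record the hash of the produced artifact.
import hashlib, os, random, subprocess
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
ET.register_namespace("", POM_NS)

tree = ET.parse("pom.xml")
deps = tree.getroot().find(f"{{{POM_NS}}}dependencies")
if deps is not None:
    children = list(deps)
    random.shuffle(children)          # perturbation 1: dependency order
    for child in children:
        deps.remove(child)
    deps.extend(children)
tree.write("pom.xml", xml_declaration=True, encoding="utf-8")

# perturbation 2: a different locale and time zone for the build
env = dict(os.environ, LANG="fr_FR.UTF-8", TZ="Pacific/Kiritimati")
subprocess.run(["mvn", "-q", "clean", "package"], env=env, check=True)

jar = next(f for f in os.listdir("target") if f.endswith(".jar"))
digest = hashlib.sha256(open(os.path.join("target", jar), "rb").read()).hexdigest()
print(jar, digest)  # compare against the digest of an unperturbed build
```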
- Reproducible builds: Increasing the integrity of software supply chains
- Reproducible central
- AROMA: Automatic Reproduction of Maven Artifacts
Build integrity with Diverse Build Pipelines
Software build pipelines transform source code into deployable artifacts. Relying on a single set of tools or environments throughout the build process creates significant risks. Software diversity in build pipelines can provide enhanced security and reliability. When SolarWinds' build system was compromised, the company decided to introduce software diversity at the core of its build pipelines in order to mitigate future attacks. In this work, we investigate the feasibility of introducing diversity in builds for Java projects. We study the automatic migration of Maven builds into Gradle builds, then compare the diversity this provides at each build step and its impact on the output artifact.
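One possible measurement is to compare the artifacts produced by the two pipelines entry by entry. The sketch below is a minimal comparison of two JAR files, assuming both builds have already been run and using placeholder output paths; in practice timestamps and manifest details would need to be normalized, as in reproducible-builds tooling.

```python
# Sketch: compare the JARs produced by a Maven build and a Gradle build
# of the same project, entry by entry.
import hashlib
import zipfile

def entry_digests(jar_path):
    """Map each archive entry name to the SHA-256 of its content."""
    with zipfile.ZipFile(jar_path) as jar:
        return {info.filename: hashlib.sha256(jar.read(info)).hexdigest()
                for info in jar.infolist() if not info.is_dir()}

maven = entry_digests("target/app-1.0.jar")        # assumed Maven output path
gradle = entry_digests("build/libs/app-1.0.jar")   # assumed Gradle output path

only_maven = sorted(set(maven) - set(gradle))
only_gradle = sorted(set(gradle) - set(maven))
differing = sorted(name for name in set(maven) & set(gradle)
                   if maven[name] != gradle[name])

print("entries only in the Maven jar:", only_maven)
print("entries only in the Gradle jar:", only_gradle)
print("entries with different content:", differing)
```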
- AROMA: Automatic Reproduction of Maven Artifacts
- Reproducible Builds and Insights from an Independent Verifier for Arch Linux
- Automatic building of java projects in software repositories: A study on feasibility and challenges
- Beyond SolarWinds: Principles for Securing Software Supply Chains
Airgapped reproducible builds
An air-gapped computer is one that is not connected to any outside network. This is used as a security measure to protect sensitive data or machines. With the growth of software supply chains, third-party dependencies and continuous integration workflows have become targets for malicious actors. In this project, we investigate the feasibility of building an environment that supports air-gapped software builds. In particular, we study to what extent current lockfiles support air-gapped builds and propose novel solutions to enhance package managers towards better support for air-gap security.
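As a first probe into lockfile support, the sketch below inspects an npm package-lock.json and reports entries that lack the integrity hash or resolved URL an offline mirror would need to pre-fetch them. It assumes the modern `packages` lockfile layout; a similar check can be written for Maven or Gradle lock mechanisms.

```python
# Sketch: check whether every locked npm dependency carries the metadata
# needed to pre-fetch it into an air-gapped mirror.
import json

with open("package-lock.json") as fh:
    lock = json.load(fh)

packages = lock.get("packages", {})   # lockfileVersion >= 2 layout
missing = []
for path, meta in packages.items():
    if path == "":                    # the root project itself
        continue
    if meta.get("link"):              # workspace symlinks have no tarball
        continue
    if "integrity" not in meta or "resolved" not in meta:
        missing.append(path)

print(f"{len(packages) - 1} locked packages, {len(missing)} not pinned for offline use")
for path in missing:
    print("  ", path)
```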
- Reproducible builds: Increasing the integrity of software supply chains
- Auditing the CI/CD Platform: Reproducible Builds vs. Hardware-Attested Build Environments, Which is Right for You?
- Software development challenges with air-gap isolation
- From Blueprint to Reality: Evaluating the Feasibility of Air-gapped Maven Builds
Environment diversity as code
Infrastructure as code is about provisioning execution resources through executable configuration files. In this context, the execution of a program provisions a whole environment in which to execute an application. A variation of that program will provision a different environment to run the same application. In this project, the student will explore transformations for infrastructure as code with the intention of creating a moving target at the environment level. We consider using Modus to define the infrastructure.
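To illustrate the intended moving target, the sketch below generates syntactically different but functionally equivalent container definitions by varying the base image and the installation order of packages. The image names and package list are placeholders, and in the project itself the variation would be expressed with Modus rather than with raw Dockerfiles.

```python
# Sketch: emit a randomized Dockerfile variant for the same application,
# so that every provisioning run yields a slightly different environment.
import random

BASE_IMAGES = ["debian:12-slim", "ubuntu:24.04"]          # assumed-equivalent bases
PACKAGES = ["curl", "git", "openjdk-17-jre-headless"]     # placeholder dependencies

def dockerfile_variant(seed):
    rng = random.Random(seed)
    packages = PACKAGES[:]
    rng.shuffle(packages)                                  # vary installation order
    lines = [
        f"FROM {rng.choice(BASE_IMAGES)}",
        "RUN apt-get update && \\",
        f"    apt-get install -y {' '.join(packages)}",
        "COPY app/ /opt/app/",
        'CMD ["java", "-jar", "/opt/app/app.jar"]',
    ]
    return "\n".join(lines) + "\n"

for seed in range(3):
    with open(f"Dockerfile.variant{seed}", "w") as fh:
        fh.write(dockerfile_variant(seed))
```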
- What is Infrastructure as Code (IaC)?
- Finding focus in the blur of moving-target techniques
- https://github.com/modus-continens/modus
Diversifying a package registry
Dependency confusion is a growing threat for software supply chains. This attack consists of uploading malicious packages to public repositories, which eventually get packaged into applications through dependency resolution mechanisms. In this work, we will explore the automatic randomization of instructions in private npm registries to mitigate dependency confusion. The student will deploy a local npm registry and an instruction randomization scheme, along with an adaptation of the JavaScript engine to correctly execute the randomized packages.
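The sketch below illustrates, in a deliberately simplified form, the kind of randomization envisioned (in the spirit of Polyscripting): JavaScript keywords in a package are rewritten with a per-registry mapping, so that only an engine that knows the mapping can execute the result. A real scheme would operate on the parsed grammar rather than on raw text; this regex version is only a toy with placeholder keywords.

```python
# Toy sketch of keyword randomization for packages served by a private
# npm registry. The mapping would be held by the registry and by the
# adapted JavaScript engine; everything here is illustrative.
import json
import random
import re
import string

KEYWORDS = ["function", "return", "const", "let", "var", "if", "else", "for", "while"]

def make_mapping(rng):
    return {kw: "".join(rng.choices(string.ascii_lowercase, k=10)) for kw in KEYWORDS}

def randomize(source, mapping):
    # NOTE: word-boundary replacement also hits strings and comments;
    # a real implementation would rewrite the AST instead.
    for kw, alias in mapping.items():
        source = re.sub(rf"\b{kw}\b", alias, source)
    return source

rng = random.Random(42)
mapping = make_mapping(rng)
original = "function add(a, b) { return a + b; }"
print(randomize(original, mapping))
print(json.dumps(mapping, indent=2))  # shipped only to the adapted engine
```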
- Dependency Confusion: Another Supply-Chain Vulnerability
- Polyscripting to mitigate dependency confusion
- Countering code-injection attacks with instruction-set randomization
- Internal interface diversification with multiple fake interfaces
Hardening transformations for Rust
Rust is growing as a language for systems programming, and so is the need for safety and security in Rust code. Source-to-source transformations have been used to harden programs through obfuscation, diversification, and randomization. Tigress is the state-of-the-art toolbox for source transformations in C. In this project, we experiment with Tigress transformations in Rust, focusing on transformations that randomize Rust control or data flow at runtime.
- A closer look at the security risks in the rust ecosystem
- tigress
- Source-to-Source Code Transformation on Rust for High-Level Stream Parallelism
- CRustS - Transpiling Unsafe C code to Safer Rust
- Loki: Hardening code obfuscation against automated attacks
- https://github.com/google/rerast
Dependencies diversification in Java
Software projects integrate a large number of third-party libraries. While this massive reuse is beneficial for software development, the reuse of a handful of libraries across millions of projects (e.g., Log4j) is a security and safety liability. One option to mitigate this risk consists of lifting software diversification to the level of dependencies. In this project, we develop novel transformations for projects that reuse very popular libraries so that they can randomly switch to compatible alternative libraries at build time.
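A minimal sketch of the intended build-time switch is shown below: the pom.xml of a project is rewritten so that a popular library is replaced by a randomly chosen alternative. The coordinate mapping is purely illustrative, and each swap assumes an API-compatibility adapter (for example generated with Spoon) that preserves the original interface.

```python
# Sketch: rewrite pom.xml to swap a popular dependency for a randomly chosen
# alternative at build time. The alternatives listed are illustrative and
# assume an API-compatibility adapter exists for each of them.
import random
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
ET.register_namespace("", POM_NS)

def q(tag):
    return f"{{{POM_NS}}}{tag}"

ALTERNATIVES = {  # hypothetical pools of interchangeable libraries
    ("org.json", "json"): [("com.google.code.gson", "gson"),
                           ("jakarta.json", "jakarta.json-api")],
}

tree = ET.parse("pom.xml")
for dep in tree.getroot().iter(q("dependency")):
    group = dep.find(q("groupId")).text
    artifact = dep.find(q("artifactId")).text
    if (group, artifact) in ALTERNATIVES:
        new_group, new_artifact = random.choice(ALTERNATIVES[(group, artifact)])
        dep.find(q("groupId")).text = new_group
        dep.find(q("artifactId")).text = new_artifact
        # the version element would be resolved against the chosen alternative
tree.write("pom-diversified.xml", xml_declaration=True, encoding="utf-8")
```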
- The behavioral diversity of java json libraries
- Darwinian data structure selection
- A large-scale empirical study on Java library migrations: prevalence, trends, and rationales
- SQLrand: Preventing SQL injection attacks
- https://github.com/INRIA/spoon
Bundler diversity for debloating JavaScript
JavaScript is the most used programming language for the development of web applications. As a web application grows, so does its bundle size, primarily due to all its third-party dependencies. A bundler is a tool that transforms all the JavaScript code and its dependencies into a new output file with everything merged (including other assets such as HTML, CSS, and PNG files). There are many production-ready JavaScript bundlers (e.g., Webpack, Rollup, Browserify, esbuild, and Parcel). They can perform optimizations on the bundle, such as tree shaking, scope hoisting, bundle splitting, and minification. However, the size reduction achieved by a bundler is limited by its own code minimization technique. The student will perform an experimental study to leverage the diversity of JavaScript bundlers in order to reduce the original code size of applications while keeping the functionality required to pass all test cases in their test suites.
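The experimental loop could start from something as simple as the sketch below: each bundler is run on the same application, the resulting bundle sizes are recorded, and the project's test suite is executed for each bundle. The command lines and output paths assume each bundler is already configured for the project and will differ in practice.

```python
# Sketch: produce one bundle per bundler, compare sizes, and check that the
# application still passes its test suite with each bundle.
import os
import subprocess

BUNDLERS = {  # assumed, project-specific invocations
    "webpack": ["npx", "webpack", "--mode", "production"],
    "rollup": ["npx", "rollup", "-c"],
    "esbuild": ["npx", "esbuild", "src/index.js", "--bundle", "--minify",
                "--outfile=dist/bundle.esbuild.js"],
}

results = {}
for name, cmd in BUNDLERS.items():
    subprocess.run(cmd, check=True)
    bundle = f"dist/bundle.{name}.js"          # assumed output path per config
    size = os.path.getsize(bundle)
    tests = subprocess.run(["npm", "test"], capture_output=True)
    results[name] = (size, tests.returncode == 0)

for name, (size, passing) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:10s} {size:10d} bytes  tests {'pass' if passing else 'FAIL'}")
```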
- Slimming JavaScript Applications: An Approach for Removing Unused Functions From JavaScript libraries
- Evolving JavaScript Code to Reduce Load Time
- Detecting and removing bloated dependencies in CommonJS packages
- https://webpack.js.org/guides/tree-shaking/
Tampering with test results
The large open source software supply chains of many applications have turned open source repositories into targets of choice for the introduction of malicious code. As mature open source projects use continuous integration, stealthy code tampering should also ensure that the test suite passes. While a modification of the test suite might appear as a red flag to the open source community, another solution consists of forging the test results. For example, a change in the continuous integration pipeline can turn some failing test cases into passing ones. In this work, we investigate different strategies to forge test suite results in order to mask ill-intended changes in the source code.
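One concrete strategy to be studied, sketched below, operates after the test run: the Surefire XML reports are rewritten so that failures disappear before the CI platform parses them. The report location and schema are those commonly produced by Maven Surefire; the sketch only illustrates the attack surface under investigation.

```python
# Sketch: forge JUnit/Surefire XML reports by stripping failures and errors,
# so that a CI system that parses the reports sees an all-green test run.
import glob
import xml.etree.ElementTree as ET

for report in glob.glob("target/surefire-reports/TEST-*.xml"):
    tree = ET.parse(report)
    suite = tree.getroot()
    for case in suite.iter("testcase"):
        for tag in ("failure", "error"):
            for child in case.findall(tag):
                case.remove(child)     # the test case now looks like a pass
    suite.set("failures", "0")
    suite.set("errors", "0")
    tree.write(report, xml_declaration=True, encoding="utf-8")
```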
- On Omitting Commits and Committing Omissions: Preventing Git Metadata Tampering That (Re)introduces Software Vulnerabilities
- in-toto: Providing farm-to-table guarantees for bits and bytes
Learning-based software substitutability
Software substitutability is a property which measures how readily a software component can be replaced by a different but equivalent component. In software supply chains, it is critical for faulty or vulnerable components to be replaced as quickly as possible. However, software substitutes might not be immediately available. Generative AI tools may be used to efficiently produce software substitutes in diverse programming languages and paradigms. In this work, we assess the feasibility of using generative AI tools to enhance the substitutability of components in software supply chains.
- Better Together? An Evaluation of AI-Supported Code Translation
- Formalization of Component Substitutability
- Galapagos: Automated N-Version Programming with LLMs
Dependencies-targeted test suite augmentation
Software developers dedicate considerable effort to implementing strong test suites that exercise and verify the behavior of their project. Yet, developer-written tests usually perform poorly when considering the coverage of third-party dependencies. This is because the intention of these tests is not to verify the behavior of dependencies. In this work, we explore novel test generation techniques that aim at increasing the coverage of a project's software supply chain. These new tests shall help enhance the dynamic transparency of the supply chain and improve reachability analysis.
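As a first measurement of the gap, the sketch below parses a JaCoCo XML report and lists the packages with the lowest line coverage, which become natural targets for generated tests. It assumes the coverage run has been configured to also instrument the classes of third-party dependencies, which is not the default setup.

```python
# Sketch: rank packages by line coverage from a JaCoCo XML report, in order
# to identify dependency code that the project's test suite barely exercises.
import xml.etree.ElementTree as ET

tree = ET.parse("target/site/jacoco/jacoco.xml")
coverage = []
for package in tree.getroot().iter("package"):
    counter = next((c for c in package.findall("counter")
                    if c.get("type") == "LINE"), None)
    if counter is None:
        continue
    missed, covered = int(counter.get("missed")), int(counter.get("covered"))
    total = missed + covered
    if total:
        coverage.append((covered / total, package.get("name")))

for ratio, name in sorted(coverage)[:20]:
    print(f"{ratio:6.1%}  {name}")   # prime targets for generated tests
```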
- Automated unit test improvement using large language models at meta
- Production Monitoring to Improve Test Suites
- Taming behavioral backward incompatibilities via cross-project testing and analysis
- A Snowballing Literature Study on Test Amplification
Software for the arts
Neuro-renovation of software-based artworks
Software-based artworks are performative: the tangible elements of such works are created on the fly when the code executes. If the runtime environment of the artwork evolves or even stops being maintained, then the artwork disappears. For example, many artworks from the 2000s relied on Java applets as a runtime and now need to be ported to a new environment. The Whitney Museum ported its CODeDOC artworks from applets to p5.js. In this work, we explore the ability of Large Language Models to renovate software-based artworks, starting with Java applet artworks to be renovated into pieces that can perform in modern web browsers.
- Revealing Hidden Processes: Instrumentation and Reverse Engineering in the Conservation of Software-based Art
- Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
- dataset of CODeDOC artworks ported to p5.js
- codedoc
- CheerpJ
Reproducibility of fx(hash) artworks
Many generative artists distribute their work online as NFTs. fx(hash) is a large platform where artists can publish their NFTs, either on Tezos or Ethereum. To do so, they publish the source code as well as various metadata for their artwork. The source code is stored on IPFS and is executed each time someone wants to view the piece. When a buyer acquires one specific instance of a piece, all the parameters to rerun this exact instance are stored on chain. This software architecture is prone to different risks: the link between on-chain data and IPFS can be lost, and the JavaScript environment of the buyer can evolve. In this project, we investigate different techniques to mitigate these risks and improve the reproducibility and preservation of artworks.
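One simple mitigation to prototype is a local preservation snapshot: the artwork's code is fetched from IPFS through a public gateway and archived alongside a digest, so that the copy can later be checked against drift or loss. The gateway URL and file layout below are assumptions, and the CID is a placeholder; verifying the content against the CID itself would additionally require an IPFS client library.

```python
# Sketch: snapshot an fx(hash) artwork's code from IPFS and record enough
# metadata to detect loss or drift later on.
import hashlib
import json
import time
import requests

CID = "Qm..."                              # placeholder: the artwork's IPFS CID
GATEWAY = "https://ipfs.io/ipfs/"          # any public or self-hosted gateway

payload = requests.get(GATEWAY + CID, timeout=60)
payload.raise_for_status()

with open(f"{CID}.snapshot", "wb") as fh:
    fh.write(payload.content)

manifest = {
    "cid": CID,
    "sha256": hashlib.sha256(payload.content).hexdigest(),
    "fetched_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "gateway": GATEWAY,
}
with open(f"{CID}.manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```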
- Reading between the lines: Source code documentation as a conservation strategy for software-based art
- Do NFTs’ Owners Really Possess their Assets? A First Look at the NFT-to-Asset Connection Fragility
Code as cool, shareable medium
Generative artists write code, run code, and tweak code in order to generate artworks. Many artists share the artworks on online platforms, such as Instagram or Bandcamp. Sometimes, artists also share the code. Sharing code can have multiple meanings: cool medium, transparency, sharing, traceability. In this project, we explore the code that is shared on these platforms as well as the different motivations of artists for sharing code.
- Bandcamp, SoundCloud, and the Digital Underground: Exploring Curatorial Practice Across Independent Music Platforms
- open source software alternatives for art & design
- live code on insta
- live coding on bandcamp
Automatic documentation of generative artworks
Many algorithmic works rely on third-party libraries (e.g., p5.js), system-level interfaces (e.g., GLSL), low-level drivers to connect to diverse hardware devices, or online APIs (e.g., translation or geolocation services). These rich assemblies of various software packages support setting up interactive, immersive artworks. Using advanced software observability and runtime monitoring, we investigate how to instrument live artworks and produce precise digital documentation. This documentation will capture intricate parametric design workflows, hardware interactions, and dynamic environmental responses, ensuring detailed preservation of algorithmic methods, not merely their final outputs.
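As a first instrumentation experiment, the sketch below runs an artwork's process under strace and condenses the trace into the list of files, libraries, and device nodes it touched, one ingredient of the intended documentation. It assumes a Linux host with strace installed, and the launch command for the artwork is a placeholder.

```python
# Sketch: trace the file and device accesses of a running artwork and turn
# them into a first, automatically generated documentation record.
import re
import subprocess

ARTWORK_CMD = ["./artwork"]        # placeholder: however the piece is launched
TRACE_FILE = "artwork.strace"

subprocess.run(
    ["strace", "-f", "-e", "trace=openat,open,ioctl", "-o", TRACE_FILE] + ARTWORK_CMD,
    check=False,                   # the artwork may be interrupted by hand
)

accessed = set()
pattern = re.compile(r'open(?:at)?\(.*?"([^"]+)"')
with open(TRACE_FILE) as fh:
    for line in fh:
        match = pattern.search(line)
        if match:
            accessed.add(match.group(1))

print("Resources touched by the artwork:")
for path in sorted(accessed):
    print("  ", path)              # libraries, shaders, device nodes, data files
```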
- Reading between the lines: Source code documentation as a conservation strategy for software-based art
- Production monitoring to improve test suites
- Preservation of software-based art at Tate
- Low-level I/O Monitoring for Scientific Workflows
Energy harvesting for portable pen plotter
Pen plotters are programmable machines that can draw, and they are currently mostly used by artists who use code as a medium. Plotters are usually operated in an artist's studio, where a computer sends instructions to the machine to draw and where power is readily available. For field artists who like to practice and perform in outdoor environments, we wish to build a portable pen plotter. In order to offer the best portability experience, the artist should not need to carry batteries for the plotter or the controller. This is why we also explore energy harvesting technology to power the portable pen plotter.
- Programming Art With Drawing Machines
- Energy harvesting systems
- Generate Power, Generate Art
- Automatic Miniature CNC Plotter Machine
- Design of an Arduino-Based Pen Plotter System
The diversity of live coding practices
Digital art is a kind of performance. As long as the software runs, art is performed. Yet, generative artists approach this performative medium in different ways. Some artists craft the code in their studio and let the machine perform the artwork in a gallery, in a web browser, or on the blockchain. Other artists perform the act of coding itself. This practice is known as live coding. It can be an individual practice as well as a group practice, from Seoul to Bogota. The code is written live, projected on screen as an overlay on top of generated images and sound. In this project, we study the programming paradigms and abstractions that live coders use, as well as the code sharing practices and communities that are involved in this artistic practice.
- Digital art is a kind of performance
- Diverse live coding communities
- Diverse live coding libraries
- Live coding in laptop performance
Live pen plotting
The practice of generative art is performative: an artist develops a program, and only when the program executes, i.e., performs, does the artwork exist. Consequently, the artwork never exists on physical storage and one instance can never be seen again. In the current practice of pen plotting and generative art, it is usually necessary to save the generated image on disk before it is passed to the pen plotter. This storage of the performed artwork is not elegant in the realm of generative art. In this project, we design and build a system where an image that is performed as part of a generative artwork is streamed live to the plotter, avoiding the generation of the image file and preserving the performing mindset of generative art. Eventually, the technology could recreate the DiceGL setup.
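The sketch below shows the core of such a system: points are produced by the generative program and converted on the fly into motion commands written directly to the plotter's serial port, so no intermediate image or G-code file ever exists. It assumes a GRBL-style pen plotter on a serial connection and the pyserial package; the generative function is a placeholder.

```python
# Sketch: stream pen-plotter commands straight from a generative process to
# the machine over a serial link, without ever writing an image file.
import math
import serial  # pyserial, assumed to be installed

def generate_points():
    """Placeholder generative process: yields (x, y) positions in mm."""
    for i in range(2000):
        t = i / 100.0
        yield 100 + 60 * math.cos(3 * t), 100 + 60 * math.sin(5 * t)

with serial.Serial("/dev/ttyUSB0", 115200, timeout=2) as plotter:  # assumed port
    for cmd in (b"G21\n", b"G90\n"):        # millimetres, absolute positioning
        plotter.write(cmd)
        plotter.readline()                  # a GRBL controller answers "ok"
    for x, y in generate_points():
        plotter.write(f"G1 X{x:.2f} Y{y:.2f} F3000\n".encode())
        plotter.readline()                  # wait for the controller's "ok"
```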
- Live coding in laptop performance
- Automatic Miniature CNC Plotter Machine
- Programmable Analogue Drawing Machines, 1952–2023
- Playing the Print: MIDI-Based Fabrication Interfaces to Explore and Document Material Behavior
Keeping it old: backporting updates to legacy artworks
Digital art is made to perform in an environment that evolves: OS patches, drivers for interaction, libraries that evolve or even disappear, etc. It is not always possible to update this environment, as some elements such as screen resolution, hardware architecture, or I/O might not be compatible with the latest versions. In order for the artwork to benefit from the latest software patches, these patches need to be backported to the legacy project. In this project, we explore backporting practices in the specific context of legacy interactive artworks.
- Recommending code changes for automatic backporting of Linux device drivers
- Transforming C++11 Code to C++03 to Support Legacy Compilation Environments
The software supply chain of generative art
Artists use advanced software technology to produce, distribute, and generate artworks. Such software technology includes libraries for sound synthesis, visual art, and augmented reality, as well as platforms to distribute artworks. In this work, we dive deep into this software ecosystem to draw a systematic landscape of the software supply chain for generative art. Our goal is to reveal the open source software foundations of this artistic practice, credit the key contributors, and recognize the specificities of open source communities in the arts.
- Supercollider
- OpenFrameworks
- Processing
- fxhash
- The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache
- Myriad People Open Source Software for New Media Arts
Software journeys
Automatic generation of 1 Million libc
libc is at the core of most software stacks, but it is fragile and prone to critical vulnerabilities [1]. In this work, we explore a combination of techniques to generate large amounts of diverse implementations of libc [2]. The student will combine the abundant combinations of flags of C compilers [3] with state-of-the-art code transformation and obfuscation techniques [4] to generate many libc variants.
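The sketch below illustrates the first ingredient, compiler-flag diversity [3]: a libc source tree is built repeatedly under randomly drawn flag combinations and the digest of each resulting library is recorded as a rough measure of binary diversity. The musl source layout and flag pool are assumptions; transformation and obfuscation passes [4] would be layered on top of this loop.

```python
# Sketch: build many libc variants by sampling compiler flags, and record a
# digest per variant to measure how much binary diversity is obtained.
import hashlib
import os
import random
import subprocess

FLAG_POOL = ["-O1", "-O2", "-O3", "-Os", "-fno-inline", "-funroll-loops",
             "-fomit-frame-pointer", "-fstack-protector-strong"]
SRC_DIR = "musl-1.2.5"           # assumed: an unpacked musl source tree

digests = {}
for variant in range(10):         # scale up towards 1 million variants
    rng = random.Random(variant)
    cflags = " ".join(sorted(rng.sample(FLAG_POOL, k=rng.randint(1, 4))))
    subprocess.run(["make", "distclean"], cwd=SRC_DIR, check=False)
    subprocess.run(["./configure"], cwd=SRC_DIR, check=True,
                   env=dict(os.environ, CFLAGS=cflags))
    subprocess.run(["make", "-j"], cwd=SRC_DIR, check=True)
    lib = os.path.join(SRC_DIR, "lib", "libc.so")
    digests[cflags] = hashlib.sha256(open(lib, "rb").read()).hexdigest()

for cflags, digest in digests.items():
    print(digest[:16], cflags)
print(f"{len(set(digests.values()))} distinct binaries out of {len(digests)} builds")
```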
Superdiversifying SHA256
Software diversity increases the robustness of software systems [1]. Through various transformations and randomization, it is possible to automatically generate variants of a program. These variants should have minimal impact on convenience, usability, and efficiency. Meanwhile, the variants should not all be susceptible to the same bug or vulnerability. In this project, we explore the large-scale diversification of SHA256 [2]. This family of hashing functions is essential for cryptography, and hence a critical feature for security. The student will investigate superdiversification [3] and the composition of multiple diversification techniques in order to synthesize large amounts of variants for an implementation of SHA256.
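Whatever diversification pipeline is used, every variant must remain functionally identical to SHA-256. The sketch below assumes each variant is compiled into a small executable that reads bytes on stdin and prints the hex digest, and differentially tests all variants against Python's hashlib on random inputs.

```python
# Sketch: differential testing of diversified SHA-256 variants against a
# reference implementation, on random inputs of random lengths.
import glob
import hashlib
import os
import subprocess

VARIANTS = glob.glob("./variants/sha256_*")   # assumed: one executable per variant

for trial in range(1000):
    data = os.urandom(trial % 4096)
    expected = hashlib.sha256(data).hexdigest()
    for variant in VARIANTS:
        result = subprocess.run([variant], input=data, capture_output=True, check=True)
        digest = result.stdout.decode().strip()
        if digest != expected:
            raise SystemExit(f"{variant} diverges on a {len(data)}-byte input")

print(f"{len(VARIANTS)} variants agree with hashlib on 1000 random inputs")
```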
- Building diverse computer systems
- On the Secure Hash Algorithm family
- The superdiversifier: Peephole individualization for software protection
GitHub repositories with literary references
GitHub repositories are rich sources of code, documentation, and discussions. They also contain amazing resources such as images, sound snippets, texts, or references. A recent study has analyzed the presence of links to academic papers in GitHub repositories. This study reveals the critical importance of linking code, data, and publications to improve replication in computational science. In this work, we wish to explore literary references in GitHub: for example, references to Bob Dylan cited in C code, novel quotes in comments, or one-liners such as perl -le'$_=`perldoc -T perlfaq4`,s/^.*N;(.*?)E.*$/$1/s,print'. The study seeks to unveil the deep connection of GitHub with culture and society and to analyze the role of literature in software development.
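A first mining pass can stay deliberately simple, as in the sketch below: locally cloned repositories are scanned for a small, illustrative watchlist of authors and works appearing in source-code comments. The watchlist and the comment heuristic are placeholders for the more systematic detection the study would develop.

```python
# Sketch: scan cloned GitHub repositories for literary references that
# appear inside source-code comments.
import pathlib
import re

REPOS_DIR = pathlib.Path("repos")            # assumed: local clones
WATCHLIST = ["Bob Dylan", "Tolkien", "Moby Dick", "Borges", "Pratchett"]  # illustrative
COMMENT = re.compile(r"(//|#|/\*|\*).*")      # crude comment heuristic

hits = []
for path in REPOS_DIR.rglob("*"):
    if path.suffix not in {".c", ".h", ".py", ".js", ".java", ".pl"}:
        continue
    try:
        lines = path.read_text(errors="ignore").splitlines()
    except OSError:
        continue
    for lineno, line in enumerate(lines, start=1):
        comment = COMMENT.search(line)
        if comment and any(name.lower() in comment.group().lower() for name in WATCHLIST):
            hits.append((path, lineno, line.strip()))

for path, lineno, line in hits:
    print(f"{path}:{lineno}: {line}")
```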
- GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
- This AI researcher is trying to ward off a reproducibility crisis
Easter egg VM flag
Easter eggs, sometimes called the final frontier of software development [10]. (Except that of course you can’t have a final frontier, because there’d be nothing for it to be a frontier to, but as frontiers go, it’s pretty penultimate . . .) [269696]. And against the wash of continuous integration a commit hangs, bloated and poetic, one single, cool contribution, gleaming like the madness of gods. Nearly unreal. Reality is not digital, an on-off state, but analog. Easter eggs are for lovers and for the mind. Not enterprise, nor a resurrection, they cherish enchantment and freedom. In the quest for technology and Mastery, you will add an extra mile to the frontier with a new Easter flag for an extraordinary virtual machine [42].
[42] java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
[10] Curated list of all the easter eggs and hidden jokes in Python
[0] Long Live Software Easter Eggs!
[269696] Moving Pictures. T. Pratchett. 1990, on Monday afternoon, just before tea.
Paint Splatters & Perl Programs (remix)
In 2019, Colin McMillen and Tim Toady ran an experiment to answer one question: is it possible to smear paint on the wall without creating valid Perl? This is an essential question at the forefront of the art/computing frontier. In this project, we will reproduce McMillen's experiment, starting with the curated dataset provided by the authors. We will then elaborate on the findings with original splatters and an exploration of Perl's diverse ecosystem. We will eventually settle, thoroughly, the question of whether coffee stains are better Perl programs than paint splatters.
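To reproduce the experimental core, the validity check itself can be automated as below: each text extracted from a splatter (or coffee stain) is handed to the Perl interpreter in syntax-check mode and the fraction that compiles is reported. The directory layout is an assumption, and the step that turns images into candidate text is left out of the sketch.

```python
# Sketch: measure which fraction of splatter-derived text files compiles as
# valid Perl, using the interpreter's syntax-check mode.
import pathlib
import subprocess

candidates = sorted(pathlib.Path("splatters-as-text").glob("*.txt"))  # assumed layout
valid = 0
for candidate in candidates:
    check = subprocess.run(["perl", "-c", str(candidate)],
                           capture_output=True)   # exit code 0 means it compiles
    if check.returncode == 0:
        valid += 1

if candidates:
    print(f"{valid}/{len(candidates)} candidates are valid Perl "
          f"({100 * valid / len(candidates):.1f}%)")
```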