Software hardening
Fixing yanked releases in npm
Debloating Rust programs
Adversarial rebuild
Air-gapped software builds
Software integrity with runtime SBOMs
Embedding the software supply chain at runtime with Java classloaders
Ultra small code with GraalVM and debloating
Full-stack debloating for a video conferencing system
Reproducible builds for Maven
Leveraging the diversity of bundlers for debloating JavaScript applications
Automatic specialization of the Java Runtime (JRE)
Systematic decompilation in the CI to mitigate supply chain attacks
Detecting superfluous conflicts in Java projects
API specialization in Kotlin
Software testing
Automatic test generation for Rust
Forging test results to tamper with open-source projects
Test Generation for Ethereum Clients Using Production Data
Code Coverage in Production
Live analysis of WebAssembly in the browser
NumPy: boosting the test suite of the Python numerical analysis package
Automatic synthesis of Java Mock objects based on Production observations
Effectiveness of diverse coverage tools for Java
Amplifying Kotlin library test suites with client usages
From JSON to Java records
Software diversification
Diverse-double compilation for JIT compilers
Neural diversification
Using generative AI to improve software substitutability
Build Integrity with N-Version Continuous Integration
Diverse execution environments with infrastructure as code
Github copilot for automatic diversity
Diversifying an npm registry
Polymorphing GraphQL queries
Diverse Multi-compilation for Trusting trust
Java – Kotlin translation to diversify bytecode
Automatic generation of 1 Million libc
Automatic synthesis of diverse replacements for Java expressions
Superdiversifying SHA256
Automatic diversification of Kafka
Off the beaten track
The software supply chain of creative coding
Code by singing in eso-lang
Github repositories with literary references
The anatomy of the most Enterprise email client
Easter egg VM flag
Web stalker: deconstructing modern browser technology (remix)
Paint Splatters & Perl Programs (remix)
Software hardening
Fixing yanked releases in npm
Package managers such as npm, RubyGems, or Cargo support ‘yanking’ a specific release of a package. A release can be yanked for security or legal reasons, or even as a form of protest [3]. When a release is yanked, the projects that depend on it may fail to build, which can have catastrophic consequences when the release is widely used [3]. A recent study shows that 9.6% of the packages in Cargo have at least one yanked release [1]. In this project, we analyze the top 10,000 npm packages by downloads to determine the prevalence of yanked releases in the npm ecosystem. We also analyze how dependent projects fix their build when these releases are yanked.
- 1. An Empirical Study of Yanked Releases in the Rust Package Registry
- 2. Deprecation of packages and releases in software ecosystems: A case study on npm
- 3. How one developer just broke Node, Babel and thousands of projects in 11 lines of JavaScript
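npm has no command literally called ‘yank’. The closest mechanisms are unpublishing a version and deprecating it [2]. A first step of the analysis could therefore classify the releases of each package from its registry metadata. The sketch below assumes the JSON metadata document that the public registry serves for a package: a `versions` map from version to manifest, and a `time` map from version to publication date. It also assumes that an unpublished version keeps its entry in `time` but disappears from `versions`. Both assumptions should be checked against the registry documentation. The function name is hypothetical.

```python
# Keys of the "time" map that are not version numbers (assumption about the metadata format).
TIME_METADATA_KEYS = {"created", "modified", "unpublished"}


def yank_candidates(packument):
    """Classify the releases of one npm package from its registry metadata document."""
    published = set(packument.get("versions", {}))
    timeline = {k for k in packument.get("time", {}) if k not in TIME_METADATA_KEYS}
    return {
        # versions that were once published but are no longer installable
        "removed": sorted(timeline - published),
        # versions still installable but flagged by the maintainers
        "deprecated": sorted(
            version
            for version, manifest in packument.get("versions", {}).items()
            if manifest.get("deprecated")
        ),
    }
```

The versions are sorted as strings, not by semantic version. A real study would sort them with a semver library before relating removals to dependents' fixes.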
Debloating Rust programs
The Rust and Cargo ecosystem is growing in low-level programming, thanks to Rust’s memory safety guarantees and its thorough compile-time checks. Rust is now a language of choice for developing kernel features and embedded systems. These applications have strict size constraints, so it is important that Rust programs be as small as possible.
In this work, we develop novel techniques to reduce bloat in Rust programs, starting with the removal of unnecessary crates at build time.
- Guided Feature Identification and Removal for Resource-constrained Firmware
- Set the configuration for the heart of the OS: on the practicality of operating system kernel debloating
- A comprehensive study of bloated dependencies in the maven ecosystem
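A first approximation of crate-level debloating is to flag dependencies whose crate name never appears in the source code. The sketch below is a naive text search, not a sound analysis. It assumes the dependency names have already been read from the `[dependencies]` table of `Cargo.toml`. It ignores renamed dependencies and features, and it counts a mention in a comment as a use. The function name `unused_crates` is hypothetical.

```python
import re


def unused_crates(declared, sources):
    """Return declared dependencies whose crate name never appears in the Rust sources.

    declared: dependency names as written in Cargo.toml (e.g. "serde-json")
    sources: mapping from file path to Rust source text
    """
    text = "\n".join(sources.values())
    unused = []
    for dep in declared:
        ident = dep.replace("-", "_")  # Cargo exposes `foo-bar` to the code as `foo_bar`
        if not re.search(rf"\b{re.escape(ident)}\b", text):
            unused.append(dep)
    return unused
```

Each candidate reported this way would still have to be confirmed by removing it and rebuilding.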
Adversarial rebuild
Reproducible builds are an essential concept to ensure the integrity of the software supply chain [1]. Yet, setting up an automated pipeline for reproducible builds is extremely challenging because of the numerous platform specificities and sources of randomness that can occur in a build. Lamb and Zacchiroli introduced the concept of ‘adversarial rebuild’, which aims at assessing whether a build is actually reproducible. This concept has been implemented for Debian with the reprotest tool [2], which builds the same source code twice in different environments and then checks the binaries produced by each build for differences.
The objective of this work is to determine which environmental changes are most likely to break a reproducible build. We will collect a set of projects that have set up a reproducible build pipeline. Then, we will explore diverse environmental changes and study their effect on build reproducibility.
- 1. Reproducible Builds: Increasing the Integrity of Software Supply Chains
- 2. reprotest for Debian
- 3. Reproducible build Maven
- 4. Bit-for-bit deterministic / reproducible builds for rustc
- 5. Improving Trust in Software through Diverse Double-Compiling and Reproducible Builds
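The adversarial rebuild idea can be sketched in three steps:

1. Build once in a baseline environment.
2. Rebuild under one perturbation at a time.
3. Report the perturbations that change the hash of the artifact.

This is a minimal illustration in the spirit of reprotest, not its implementation. The variation names and environment values are hypothetical. Each build runs in a fresh temporary directory, so a build that embeds its own path will differ under every variation.

```python
import hashlib
import os
import subprocess
import tempfile

# Hypothetical perturbations, one environment change per entry.
VARIATIONS = {
    "timezone": {"TZ": "Asia/Tokyo"},
    "locale": {"LANG": "fr_FR.UTF-8", "LC_ALL": "fr_FR.UTF-8"},
    "home": {"HOME": "/tmp/other-home"},
}


def build_digest(command, artifact, extra_env):
    """Run the build in a fresh directory with a modified environment; hash the artifact."""
    env = dict(os.environ, **extra_env)
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(command, cwd=workdir, env=env, check=True)
        with open(os.path.join(workdir, artifact), "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()


def sensitive_variations(command, artifact, variations=VARIATIONS):
    """Names of the variations whose build output differs from the baseline build."""
    baseline = build_digest(command, artifact, {})
    return [
        name
        for name, env in variations.items()
        if build_digest(command, artifact, env) != baseline
    ]
```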
Air-gapped software builds
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Air-gapped software development is practiced by the military and in similarly sensitive environments. Modern software builds typically require Internet connectivity, and a typical build involves thousands of network requests. How can these opposing requirements be reconciled? In this thesis, you will design, implement and evaluate an infrastructure for air-gapped software builds.
- 1. Software development challenges with air-gap isolation
- 2. Building a virtually air-gapped secure environment in AWS: with principles of devops security program and secure software delivery
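One building block of an air-gapped build is a check run before disconnecting. It verifies that every artifact pinned in a lock file is present in a local mirror and has not been tampered with. The sketch below assumes a hypothetical lock file already flattened into `(file_name, sha256)` pairs, and a mirror that stores artifacts as flat files. Real package managers use richer layouts.

```python
import hashlib
import os


def check_offline_mirror(lockfile, mirror_dir):
    """Split locked artifacts into ready, missing (would need the network) and corrupted.

    lockfile: list of (file_name, expected_sha256) pairs (hypothetical flattened format)
    """
    ready, missing, corrupted = [], [], []
    for name, expected in lockfile:
        path = os.path.join(mirror_dir, name)
        if not os.path.exists(path):
            missing.append(name)
            continue
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        (ready if digest == expected else corrupted).append(name)
    return {"ready": ready, "missing": missing, "corrupted": corrupted}
```

The build can proceed offline only when both `missing` and `corrupted` are empty.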
Software integrity with runtime SBOMs
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
A software bill of materials (SBOM) [1] is an inventory of the software components that are reused in an application, e.g., third-party libraries. With the growing awareness of the risk of software supply chain attacks, several standards have emerged to compute the static SBOM of an application. This is essential to identify the presence of risky components in the supply chain. Yet, malicious components can be introduced during the compilation and deployment phases. In this project, we investigate the feasibility of collecting the runtime SBOM of an application to mitigate this risk. The student will experiment with and contribute to jbom [2] to provide a sound technique to detect discrepancies between the static and the dynamic SBOM.
- [1] SBOM. https://ntia.gov/page/software-bill-materials
- [2] runtime SBOM for Java apps. https://github.com/eclipse/jbom
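Detecting discrepancies between the static and the runtime SBOM comes down to comparing two sets of component identifiers. The sketch below assumes both SBOMs are CycloneDX-style JSON documents whose top-level `components` entries carry a `purl` (package URL). Whether jbom's output matches this shape exactly is an assumption. Nested components are ignored, and the coordinates in the test are hypothetical.

```python
def component_ids(sbom):
    """Set of package URLs (purl) listed in the top-level components of an SBOM."""
    return {c["purl"] for c in sbom.get("components", []) if "purl" in c}


def sbom_discrepancies(static_sbom, runtime_sbom):
    static_ids, runtime_ids = component_ids(static_sbom), component_ids(runtime_sbom)
    return {
        # loaded at runtime but never declared: a candidate for injected code
        "undeclared_at_runtime": sorted(runtime_ids - static_ids),
        # declared but never loaded: possible bloat, or a code path not exercised
        "unused_at_runtime": sorted(static_ids - runtime_ids),
    }
```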
Embedding the software supply chain at runtime with Java classloaders
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
In Java, class loading refers to retrieving the binary form of a class or interface and constructing, from that binary form, a class object to represent the class or interface [1]. Different subclasses of `ClassLoader` may implement different loading policies [2]. For example, a class loader may cache the binary representation of a class, prefetch it based on expected usage, or load a group of related classes together. These activities may not be completely transparent to a running application. In this context, determining the third-party suppliers of the classes loaded at runtime makes it possible to control and harden the software supply chain of the third-party components used during program execution. Monitoring the origin of the code that is actually executed is a critical task for building more reliable and secure systems. The student will design and implement a novel software tool to build a representation of the software supply chain at runtime.
- [1] The Java Virtual Machine Specification. Chapter 5
- [2] Sharing the runtime representation of classes across class loaders
Ultra small code with GraalVM and debloating
GraalVM [1] compiles Java code to native executables, boosting deployment and runtime performance. Meanwhile, code debloating [2] removes unnecessary code from applications, reducing code size and attack surface. Both techniques are actively researched in the Java ecosystem [2,3]. In this work, we will use both techniques in conjunction to take code reduction one step further. We will experiment with debloating both before and after GraalVM compilation, to understand where the largest code size savings can be achieved. Quarkus [4] might be used to reduce code size even further.
- [1] GraalVM
- [2] A Longitudinal Analysis of Bloated Java Dependencies
- [3] Enhancing Performance of Cloud-based Software Applications with GraalVM and Quarkus
- [4] Quarkus
Full-stack debloating for a video conferencing system
Software bloat is data and code that accumulates over time and yet is not necessary for an application to behave correctly. Several techniques have been proposed in recent years to detect and remove bloat. These techniques complement each other since they analyze bloat at different levels of the software stack (libraries, containers, kernel, etc.). Yet, no previous work has studied the combined effect of these techniques.
For this thesis, you will apply different debloating techniques such as DepClean [1], docker-slim [2] and unikernels [3,4]. You will measure the effects of each technique and of their combinations on the Jitsi video conferencing system.
- [1] A comprehensive study of bloated dependencies in the Maven ecosystem
- [2] docker-slim
- [3] unikernels
- [4] It’s Time to Debloat the Cloud with Unikraft
Reproducible builds for Maven
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Reproducible builds are an essential property for secure software supply chains [1]. There is an ongoing effort in some Linux distributions, in particular Debian, to ensure reproducible builds [2]. In the Java world, there is little work on this topic and no clear understanding of the problem. You will design, perform and analyze an experiment to assess the status quo of reproducible builds in Java, and develop a tool to improve build reproducibility.
- [1] Reproducible Builds: Increasing the Integrity of Software Supply Chains
- [2] Reproducible builds in Debian with diffoscope
- [3] Maven and reproducible builds
- [4] Towards Build Verifiability for Java-based Systems
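A first measurement of the status quo is to build the same Maven project twice and compare the resulting JARs. This follows the spirit of diffoscope [2] but is much coarser. The sketch below hashes each archive entry with `zipfile` and `hashlib`. It separately checks whether the archives are byte-identical, because entry timestamps can differ even when every `.class` file matches. The function names are hypothetical.

```python
import hashlib
import zipfile


def jar_digest(path):
    """Map each file entry of a JAR (a zip archive) to the SHA-256 of its content."""
    with zipfile.ZipFile(path) as jar:
        return {
            info.filename: hashlib.sha256(jar.read(info.filename)).hexdigest()
            for info in jar.infolist()
            if not info.is_dir()
        }


def file_sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def compare_jars(first, second):
    """Report what prevents two builds of the same project from being identical."""
    a, b = jar_digest(first), jar_digest(second)
    return {
        "identical_bytes": file_sha256(first) == file_sha256(second),
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "content_differs": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

If `identical_bytes` is false while the three lists are empty, the difference lies in archive metadata such as timestamps or entry order.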
Leveraging the diversity of bundlers for debloating JavaScript applications
JavaScript is the most used programming language for the development of web applications. As a web application grows, so does its bundle size, primarily due to its third-party dependencies [1,2]. A bundler is a tool that transforms all the JavaScript code and its dependencies into a new output file with everything merged (including other files such as HTML, CSS, and PNG). There are many production-ready JavaScript bundlers (e.g., Webpack, Rollup, Browserify, esbuild, and Parcel). They can perform optimizations on the bundle, such as tree shaking, scope hoisting, bundle splitting, and minification [4]. However, the size reduction achieved by a bundler is limited by its own code minimization technique [3]. The student will perform an experimental study that leverages the diversity of JavaScript bundlers to reduce the code size of applications while keeping the functionality required to pass all test cases in their test suites.
- [1] Slimming JavaScript Applications: An Approach for Removing Unused Functions From JavaScript libraries (JSS), 2019
- [2] Evolving JavaScript Code to Reduce Load Time (TSE), 2021
- [3] Stubbifier: Debloating Dynamic Server-Side JavaScript Applications (ArXiv), 2021
- [4] https://webpack.js.org/guides/tree-shaking/
Automatic specialization of the Java Runtime (JRE)
The Java Runtime Environment (JRE) is a great, general-purpose execution engine, which provides the standard Java libraries.
Because it is general purpose, it offers too much functionality when considering a single Java application running on the JRE.
You will design and experiment with a system that automatically specializes the JRE for a specific Java application, using JCov [1] to identify the parts that are necessary and the parts that can be removed. This topic contributes to hardening the software supply chain through debloating [2] and specialization of the software stack [3].
- [1] JCov
- [2] JRed: Program Customization and Bloatware Mitigation Based on Static Analysis
- [3] Unikraft: Unikernels Made Easy
Systematic decompilation in the CI to mitigate supply chain attacks
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Supply chain attacks [1] represent a growing threat to software systems, as illustrated by the SolarWinds attack in late 2020 [2].
One such attack consists in tampering with the code at some point in the automated build pipeline, in order to inject malicious code into the binary.
In this work, we investigate the systematic disassembly of binaries [3] at the end of the build pipeline, to detect the injection of malicious code.
- [1] Supply chain attacks
- [2] Preventing Supply Chain Attacks like SolarWinds
- [3] An in-depth analysis of disassembly on full-scale x86/x64 binaries
API specialization in Kotlin
Supervisors: Benoit Baudry, Cesar Soto-Valero, KTH Royal Institute of Technology
Software applications rely on numerous third-party APIs to reuse existing features (e.g., data processing, security, network, etc.).
Yet, applications use only a small part of the APIs.
The unused parts represent unnecessary risks for the security and reliability of the application.
In this project, we investigate API specialization to mitigate these risks [1].
This technique first determines the legitimate usages of an API, to build a sense of self [3] for the application's API usage.
Then, the specialization consists in building a proxy that blocks all other API usages at runtime.
This project focuses on specialization for Kotlin APIs [2].
- [1] Shredder: Breaking Exploits through API Specialization
- [2] Why did developers migrate Android Applications from Java to Kotlin
- [3] A sense of self for Unix processes
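API specialization has two phases: learn the legitimate usages, then block all others at runtime. Both can be illustrated with a dynamic proxy. The sketch below is written in Python for brevity. It is neither Shredder's mechanism [1] nor a Kotlin implementation, and the class names are hypothetical. A Kotlin version could rely on interface delegation or on a JVM dynamic proxy instead.

```python
class UsageProfiler:
    """Phase 1: wrap the real API during legitimate runs and record which members are used."""

    def __init__(self, target):
        self._target = target
        self.used = set()

    def __getattr__(self, name):
        # Only called for names not found on the profiler itself, i.e. API members.
        self.used.add(name)
        return getattr(self._target, name)


class SpecializedProxy:
    """Phase 2: expose only the recorded members; any other usage is blocked at runtime."""

    def __init__(self, target, allowed):
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_allowed", frozenset(allowed))

    def __getattr__(self, name):
        if name not in self._allowed:
            raise PermissionError(f"API usage '{name}' is outside the specialized profile")
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        raise PermissionError("the specialized API cannot be modified")
```

The proxy only intercepts attribute lookups. Implicit special-method calls such as `len(proxy)` bypass `__getattr__`, so a complete scheme would need to handle them separately.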
Software testing
Automatic test generation for Rust
The popularity of the Rust programming language is constantly growing in various sectors, from embedded systems to creative coding. Meanwhile, there is little support for automatic test generation in Rust. In this work, we evaluate the robustness of state-of-the-art solutions [1,2,3]. Then, we develop novel techniques to automatically enhance the test suites of Rust programs with test case variants written in the idiomatic style of Rust automated tests [4].
- [1] Search-Based Test Suite Generation for Rust
- [2] Syrust: automatic testing of rust libraries with semantic-aware program synthesis
- [3] RULF: Rust library fuzzing via API dependency graph traversal
- [4] Writing Automated Tests in Rust
Forging test results to tamper with open-source projects
The large open source software supply chains of many applications have turned open source repositories into targets of choice for the introduction of malicious code [1]. As mature open source projects use continuous integration, stealthy code tampering must also ensure that the test suite passes. While a modification of the test suite might appear as a red flag to the open source community, another solution consists in forging the test results [2]. For example, a change in the continuous integration pipeline can turn some failing test cases into passing ones.
In this work, we investigate different strategies to forge test suite results in order to mask ill-intended changes in the source code.
- [1] On Omitting Commits and Committing Omissions: Preventing Git Metadata Tampering That (Re)introduces Software Vulnerabilities. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/torres-arias
- [2] in-toto: Providing farm-to-table guarantees for bits and bytes. https://www.usenix.org/system/files/sec19-torres-arias.pdf
Test Generation for Ethereum Clients Using Production Data
Supervisors: Martin Monperrus, Benoit Baudry
Description: Unit testing is one of the essential ways to improve the quality of software. It is also helpful for checking correctness when there are different implementations of the same software specification. Consider Ethereum clients: there are thousands of common tests [1] shared by all Ethereum client projects. Although these tests already cover various cases, some corner cases that occur in production are missing from the test suite [2]. In this thesis project, you will design, implement and evaluate a prototype that collects production data and generates new valuable test cases for Ethereum clients.
Code Coverage in Production
Supervisors: Martin Monperrus, Benoit Baudry
Description: Code coverage usually relates to test code. Production code coverage is the coverage achieved by the real interactions of users in production. Obtaining and analyzing production code coverage makes it possible to identify useless code as well as relevant test data and values. It enables testers and developers to better align the test intentions with what matters for users. The student will compare and analyze techniques for automatically collecting code coverage in production for Java software.
- Code Pulse: Real-time code coverage for penetration testing activities
- Measuring production code coverage with JaCoCo
- Perpetual testing
Live analysis of WebAssembly in the browser
Supervisors: Benoit Baudry, Javier Cabrera-Arteaga, KTH Royal Institute of Technology
WebAssembly is rapidly conquering the world of web technology [1].
Its safe and compact binary format provides great support to consolidate existing applications and to boost the migration of legacy apps to the browser [3].
In this project, we will investigate which WebAssembly binaries reach web browsers.
The project includes the development of efficient technology to collect Wasm files live in the browser.
The second part consists in analyzing the live coverage of these files, as well as their purpose.
- [1] New Kid on the Web: A Study on the Prevalence of WebAssembly in the Wild.
- [2] Scalable comparison of javascript V8 bytecode traces
- [3] Adobe Photoshop in the browser thanks to Emscripten
NumPy: boosting the test suite of the Python numerical analysis package
NumPy is a fundamental package for scientific computing with Python, as well as an excellent illustration of state-of-the-art software engineering [1]. For example, the NumPy community uses four different continuous integration systems [2]. Its crucial importance for science calls for a rock-solid test suite, in order to ensure the validity and reproducibility of scientific experiments.
You will dive deep into the test suite of NumPy and aim at making it stronger through a systematic assessment of the test cases. You will investigate the presence of pseudo-tested methods [3] and contribute test improvements to NumPy’s test suite.
- [1] Developing open source scientific practice
- [2] Science-Changing Code
- [3] A Comprehensive Study of Pseudo-tested Methods
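Pseudo-tested methods [3] are usually detected with extreme mutation. The whole body of a method is replaced by a trivial one, for example one returning a default value. The method is pseudo-tested if the test suite still passes for every such mutant. The sketch below applies this idea to a Python function through monkeypatching. The helper names and the choice of default return values are assumptions. A real study on NumPy would run its pytest suite rather than plain callables.

```python
DEFAULT_VALUES = (None, 0, "")  # assumption: a few trivial results used as extreme mutants


def make_mutant(value):
    """Extreme mutant: ignores its arguments and returns a fixed value."""
    def mutant(*args, **kwargs):
        return value
    return mutant


def suite_passes(tests):
    """Run plain callables as tests; any exception counts as a failure."""
    for test in tests:
        try:
            test()
        except Exception:
            return False
    return True


def is_pseudo_tested(owner, name, tests, values=DEFAULT_VALUES):
    """True if the suite passes for every extreme mutant of owner.name."""
    original = getattr(owner, name)
    try:
        for value in values:
            setattr(owner, name, make_mutant(value))
            if not suite_passes(tests):
                return False  # a mutant is detected: the method is genuinely tested
        return True
    finally:
        setattr(owner, name, original)  # always restore the real implementation
```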
Automatic synthesis of Java Mock objects based on Production observations
Mock objects are highly valuable to create predictable test environments, which speed up test execution and limit flaky tests. Yet, the development of relevant mock objects is challenging, since there is currently no support to determine the validity or value of the values manually selected for mocks.
You will design a system that observes an application in production in order to collect real program state values that will then be turned into mock objects. This system will leverage efficient observability technology [3] in order to contribute to the state of the art of automated test generation [1,2].
- [1] GenUTest: a unit test and mock aspect generation tool
- [2] Dygen: automatic generation of high-coverage tests via mining gigabytes of dynamic traces
- [3] Glowroot
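The record-then-replay principle behind production-based mocks can be sketched with `unittest.mock`. A recording wrapper observes the calls made to a real dependency. The observations are then turned into a `Mock` that returns the observed values. The class and function names are hypothetical, and only positional, hashable arguments are supported. An actual system would collect observations through the kind of efficient observability technology mentioned above rather than a Python wrapper.

```python
from unittest import mock


class Recorder:
    """Wrap a real dependency and record (method, args) -> return value."""

    def __init__(self, target):
        self._target = target
        self.observations = {}

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def recording_call(*args):
            result = attr(*args)
            self.observations[(name, args)] = result
            return result

        return recording_call


def synthesize_mock(observations):
    """Build a Mock whose methods replay the observed return values."""
    fake = mock.Mock()
    by_method = {}
    for (name, args), result in observations.items():
        by_method.setdefault(name, {})[args] = result
    for name, table in by_method.items():
        # An unobserved argument tuple raises KeyError, exposing gaps in the observations.
        getattr(fake, name).side_effect = lambda *args, _table=table: _table[args]
    return fake
```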
Effectiveness of diverse coverage tools for Java
Code coverage is a key metric to assess test suite quality as well as to perform dynamic analyses [1]. Yet, there exist a variety of test coverage tools, each with their strengths and quirks [1,2].
You will design and perform a systematic analysis of the main coverage tools for a specific programming language, e.g., Java [2,3,4], in order to determine the most appropriate combination of tools for the most accurate measurement of full coverage.
- [1] Trace-based Debloat for Java Bytecode
- [2] Code Pulse: Real-time code coverage for penetration testing activities
- [3] jacoco
- [4] JCov
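Once each tool's report has been converted to a common format, comparing tools amounts to set operations on the covered lines. The sketch below assumes a hypothetical normalized input: a set of `(file, line)` pairs per tool. It reports:

- the lines on which all tools agree,
- the lines that only one tool reports,
- the Jaccard index of agreement over the union.

The tool names in the test are only labels.

```python
def compare_coverage(reports):
    """Compare coverage reports normalized as {tool: set of (file, line) covered}."""
    tools = sorted(reports)
    union = set().union(*reports.values())
    agreement = set.intersection(*(set(lines) for lines in reports.values()))
    exclusive = {
        tool: reports[tool] - set().union(*(reports[other] for other in tools if other != tool))
        for tool in tools
    }
    return {
        "agreement": agreement,
        "exclusive": exclusive,
        "jaccard": len(agreement) / len(union) if union else 1.0,
    }
```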
From JSON to Java records
Pankti records program states in production in order to generate differential unit tests that can improve the original test suite of an application [1]. Currently, the states are serialized in JSON, and the generated tests include instructions to deserialize the objects. In this thesis, you will investigate how to generate Java records [2] as part of the test harness. This will produce more readable test cases that are not overloaded with deserialization instructions. Java records were introduced as a preview feature in Java 14 and finalized in Java 16. They aim to simplify the way we create a POJO (Plain Old Java Object).
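A minimal generator could turn a serialized state into two pieces of Java: record declarations, and a constructor expression that replaces the deserialization code in the test. The sketch below is written in Python and only emits Java source text. It makes several simplifying assumptions, none of which describe Pankti's behavior:

- A nested object becomes a record named after its capitalized key.
- JSON numbers map to `long` or `double`.
- Arrays and `null` are rejected.

```python
import json

SCALAR_TYPES = {bool: "boolean", int: "long", float: "double", str: "String"}


def type_name(key):
    """Hypothetical naming scheme: the object under key "customer" becomes record Customer."""
    return key[:1].upper() + key[1:]


def record_declarations(name, obj, out):
    """Append one `record` declaration per JSON object (nested ones first) to `out`."""
    components = []
    for key, value in obj.items():
        if isinstance(value, dict):
            record_declarations(type_name(key), value, out)
            java_type = type_name(key)
        elif type(value) in SCALAR_TYPES:
            java_type = SCALAR_TYPES[type(value)]
        else:
            raise TypeError(f"unsupported JSON value for '{key}': {value!r}")
        components.append(f"{java_type} {key}")
    out.append(f"record {name}({', '.join(components)}) {{}}")
    return out


def java_literal(name, value):
    """Java expression that rebuilds `value`, replacing the JSON deserialization in a test."""
    if isinstance(value, dict):
        args = ", ".join(java_literal(type_name(k), v) for k, v in value.items())
        return f"new {name}({args})"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}L"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escapes match Java's for common cases
    raise TypeError(f"unsupported JSON value: {value!r}")


if __name__ == "__main__":
    state = json.loads('{"id": 7, "paid": true, "customer": {"name": "Ada"}}')
    print("\n".join(record_declarations("Order", state, [])))
    print(java_literal("Order", state))
```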
Amplifying Kotlin library test suites with client usages
Third-party libraries are at the core of the software supply chain [1]. Their test suites are essential to ensure the quality of this infrastructure.
One solution to consolidate these test suites consists in carving additional test cases by running the clients of these libraries [2].
You will design, implement and evaluate a test carving tool for Java libraries [3].
- [1] Surviving Software Dependencies
- [2] Carving Differential Unit Test Cases from System Test Cases
- [3] Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
Software diversification
Diverse-double compilation for JIT compilers
Just-in-Time (JIT) compilation plays a crucial role in optimizing the performance of modern software programs. However, JIT compilers are also targets for trusting trust attacks. This thesis aims to investigate the benefits of a diverse double-compiling (DDC) approach to mitigate such attacks. You will design, implement and evaluate DDC for a Java JIT compiler.
- Reflections on Trusting Trust
- Countering trusting trust through diverse double-compiling
- Diverse Double-Compiling to Harden Cryptocurrency Software (Master’s thesis KTH 2023)
Neural diversification
Supervisors: Javier Cabrera, Benoit Baudry, Martin Monperrus
Automatic code generation is boosted by generative AI and large language models [1]. These new abilities are used daily, letting software developers focus on the design and creative parts of development. In this work, we are interested in the ability of these models to generate multiple variants of the same functionality. The goal is to revisit program synthesis for automatic software diversification [2] through the lens of generative AI.
- 1. MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation
- 2. Searching for software diversity: attaining artificial diversity through program synthesis
- 3. Building diverse computer systems
Using generative AI to improve software substitutability
Supervisors: Javier Ron, Benoit Baudry, Martin Monperrus
Software substitutability is a property which measures how readily a
software component can be replaced by a different but equivalent component [1].
In software supply chains it is critical for faulty or vulnerable
components to be replaced as quickly as possible. However, software
substitutes might not be immediately available.
Generative AI tools like ChatGPT may be used to efficiently produce
software substitutes in diverse programming languages/paradigms [2].
In this work, we assess the feasibility of using generative AI tools to
enhance the substitutability of components in software supply chains.
- [1] Formalization of Component Substitutability
- [2] Better Together? An Evaluation of AI-Supported Code Translation
Build Integrity with N-Version Continuous Integration
Some software projects with strong reliability and security constraints build their product with more than one build pipeline. This is also an approach to address the challenge of trusting trust [1]. For example, the NumPy open source project for scientific computing uses four continuous integration systems [2]. Following an attack against its Orion product, the SolarWinds company started using diverse build systems [3]. In this work, the student will experiment with integrating diversity into existing build pipelines. For example, the student will investigate duplicating a Travis CI pipeline with GitHub Actions and assess the impact of this diversity of build technology.
- [1] Reflections on Trusting trust
- [2] Developing open source scientific practice
- [3] Orion build system
- [4] Improving the n-version programming process through the evolution of a design paradigm
Diverse execution environments with infrastructure as code
Infrastructure as code is about provisioning execution resources through executable configuration files [1]. In this context, the execution of a program provisions a whole environment to execute an application. A variation of the same program will provision a different environment to run the same application. In this project, the student will explore transformations of infrastructure as code with the intention of creating a moving target at the environment level [2]. We consider using Modus to define the infrastructure [3].
- [1] What is Infrastructure as Code (IaC)?
- [2] Finding focus in the blur of moving-target techniques
- [3] https://github.com/modus-continens/modus
Github copilot for automatic diversity
GitHub Copilot, an AI pair programmer, generates suggestions for lines of code or entire functions [1]. It relies on an immense body of code written by human developers to synthesize new code in a new context. In this work, we wish to experiment with these techniques in order to replace existing code snippets written by developers with synthetic ones. The objective is to generate program variants that are semantically similar but whose executions differ.
- [1] https://copilot.github.com/
- [2] CodeHint: Dynamic and Interactive Synthesis of Code Snippets
- [3] Oracle-guided component-based program synthesis
Diversifying an npm registry
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Dependency confusion is a growing threat to the software supply chain [1]. This attack consists in uploading malicious packages to public repositories, which are eventually packaged into applications through dependency resolution mechanisms. In this work, we will explore the automatic randomization of instructions [3] in private npm registries to mitigate dependency confusion [2]. The student will deploy a local npm registry and an instruction randomization scheme, along with an adaptation of the JavaScript engine to correctly execute the randomized packages.
- [1] Dependency Confusion: Another Supply-Chain Vulnerability
- [2] Polyscripting to mitigate dependency confusion
- [3] Countering code-injection attacks with instruction-set randomization
- [4] Internal interface diversification with multiple fake interfaces
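The core of instruction-set randomization for JavaScript packages can be illustrated by rewriting keywords with a secret, per-registry mapping. The scheme works in three parts:

- The private registry serves randomized packages.
- The adapted engine only understands randomized keywords.
- A package pulled from the public registry is rejected, because it still uses plain keywords.

The sketch below is a deliberately naive text rewrite that ignores strings, comments and most keywords. The keyword list, suffix scheme and function names are hypothetical.

```python
import random
import re

# Deliberately tiny keyword set; a real scheme would cover the whole grammar.
KEYWORDS = ["function", "return", "var", "let", "const", "if", "else", "for", "while", "new"]


def make_key(seed):
    """Secret per-registry mapping from each keyword to a randomized spelling."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return {kw: f"{kw}_{''.join(rng.choices(letters, k=8))}" for kw in KEYWORDS}


def _rewrite(source, mapping):
    # \b keeps identifiers such as "format" or "newValue" untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


def randomize(source, key):
    """Applied by the private registry when it serves a package."""
    return _rewrite(source, key)


def derandomize(source, key):
    """Applied by the adapted engine front end, after checking the package origin."""
    return _rewrite(source, {v: k for k, v in key.items()})


def uses_plain_keywords(source):
    """A package that still contains plain keywords did not come from the private registry."""
    return re.search(r"\b(?:" + "|".join(KEYWORDS) + r")\b", source) is not None
```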
Polymorphing GraphQL queries
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
GraphQL is increasingly adopted for web APIs [1], making it a good target for exploits [2]. In this work, we investigate polymorphing to harden GraphQL APIs [3]. The student will develop a randomization scheme for the API and the corresponding adaptation of the client queries in order to build an effective protection against injection attacks.
- [1] Semantics and complexity of GraphQL
- [2] The 5 Most Common GraphQL Security Vulnerabilities
- [3] SQLrand: Preventing SQL Injection Attacks
Diverse Multi-compilation for Trusting trust
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution is to use multiple diverse compilers to mitigate the problem [3]. For instance, one can compile a C program with both GCC and Clang. You will design, implement and evaluate a multi-compiler scheme for C.
- [1] Reflections on Trusting Trust, 1983
- [2] Defending Against Compiler-Based Backdoors
- [3] Countering trusting trust through diverse double-compiling
Automatic generation of 1 Million libc
libc is at the core of most software stacks, but it is fragile and prone to critical vulnerabilities [1]. In this work, we explore a combination of techniques to generate large numbers of diverse implementations of libc [2]. The student will combine the abundant combinations of C compiler flags [3] with state-of-the-art code transformation and obfuscation techniques [4] to generate many libc variants.
- [1] The C standard library
- [2] Building diverse computer systems
- [3] gcc flags
- [4] Tigress
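The number of variants grows multiplicatively with each independent dimension of variation. The sketch below enumerates combinations of a few real GCC flags and emits one hypothetical compilation command per combination. The selection of flags is an arbitrary assumption, and not every combination yields a distinct binary.

```python
import itertools
import math

# Each inner list holds mutually exclusive alternatives; "" means "flag omitted".
FLAG_DIMENSIONS = [
    ["-O0", "-O1", "-O2", "-O3", "-Os"],
    ["-fomit-frame-pointer", "-fno-omit-frame-pointer"],
    ["-funroll-loops", ""],
    ["-fstack-protector-strong", "-fno-stack-protector"],
]


def variant_count(dimensions, transformation_seeds=1):
    """Flag combinations times the number of seeds given to a source-level transformer."""
    return math.prod(len(d) for d in dimensions) * transformation_seeds


def variant_commands(source_file, dimensions):
    """Yield one hypothetical compilation command per flag combination."""
    for index, combo in enumerate(itertools.product(*dimensions)):
        flags = [flag for flag in combo if flag]
        yield ["gcc", "-c", *flags, source_file, "-o", f"variant_{index}.o"]
```

With these four dimensions there are 5 × 2 × 2 × 2 = 40 flag combinations. Combined with, say, 25,000 seeds for a source-level transformation tool, this would reach one million variants.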
Java – Kotlin translation to diversify bytecode
Supervisors: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
The transition from Java to Kotlin is a timely and hard problem [1].
In this work, we explore the natural diversity of translation strategies from Java to Kotlin [2], as well as the diversity of compilation options of kotlinc [3] and javac [4]. The goal is to assess the ability of these strategies to generate diverse versions of Java bytecode for the same piece of source code.
- [1] Measuring Kotlin Build Performance at Uber
- [2] On the adoption, usage and evolution of Kotlin features in Android development
- [3] kotlinc compilation flags
- [4] javac compilation flags
Superdiversifying SHA256
Software diversity increases the robustness of software systems [1]. Through various transformations and randomization, it is possible to automatically generate variants of a program. These variants should have minimal impact on convenience, usability, and efficiency. Meanwhile, different variants should not be susceptible to the same bug or vulnerability.
In this project, we explore the large-scale diversification of SHA256 [2]. SHA256 belongs to the SHA-2 family of hash functions, which is essential for cryptography and hence a critical feature for security. The student will investigate superdiversification [3] and the composition of multiple diversification techniques, in order to synthesize large numbers of variants of an implementation of SHA256.
- [1] S. Forrest, A. Somayaji and D. H. Ackley, “Building diverse computer systems” Proceedings. The Sixth Workshop on Hot Topics in Operating Systems
- [2] On the Secure Hash Algorithm family
- [3] Jacob, M., Jakubowski, M. H., Naldurg, P., Saw, C. W. N., & Venkatesan, R. (2008, November). The superdiversifier: Peephole individualization for software protection. In International Workshop on Security
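Superdiversification [3] relies on superoptimization to find equivalent instruction sequences. A simpler illustration of composing diversification choices uses hand-written equivalent rewrites of three SHA256 primitives: rotate-right, `Ch` and `Maj`. The sketch below picks one rewrite per primitive from a seed and plugs the choices into a plain Python SHA256. This yields 3 × 3 × 3 = 27 variants, whose digests must all match the reference. The constants are computed from their definition (fractional parts of square and cube roots of primes) rather than copied. The function names are hypothetical.

```python
import math
import random
import struct

MASK = 0xFFFFFFFF

# Three hand-written, equivalent implementations of each primitive.
ROTR = [
    lambda x, n: ((x >> n) | (x << (32 - n))) & MASK,
    lambda x, n: ((x >> n) ^ (x << (32 - n))) & MASK,  # the two halves never overlap
    lambda x, n: (((x << 32) | x) >> n) & MASK,  # rotate a doubled word
]
CH = [
    lambda x, y, z: ((x & y) ^ (~x & z)) & MASK,  # FIPS 180-4 form
    lambda x, y, z: z ^ (x & (y ^ z)),
    lambda x, y, z: ((x & y) | (~x & z)) & MASK,
]
MAJ = [
    lambda x, y, z: (x & y) ^ (x & z) ^ (y & z),  # FIPS 180-4 form
    lambda x, y, z: (x & y) | (z & (x | y)),
    lambda x, y, z: (x & (y | z)) | (y & z),
]


def _primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    """Integer cube root: largest r with r**3 <= n."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = _primes(64)
K = [_icbrt(p << 96) & MASK for p in PRIMES]  # fractional bits of cube roots
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # fractional bits of square roots


def make_variant(seed):
    rng = random.Random(seed)
    return rng.choice(ROTR), rng.choice(CH), rng.choice(MAJ)


def sha256_variant(message, variant):
    rotr, ch, maj = variant
    length = struct.pack(">Q", 8 * len(message))
    data = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64) + length
    h = list(H0)
    for offset in range(0, len(data), 64):
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            t1 = (hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ch(e, f, g) + K[t] + w[t]) & MASK
            t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + maj(a, b, c)) & MASK
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return struct.pack(">8I", *h)
```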
Automatic diversification of Kafka
Automatic software diversity consists in generating multiple variants of an application, which provide the same functionality, with diverse implementations.
The goal is to minimize the risks of having a single point of failure.
In this project, we aim at automatically synthesizing diverse variants of applications that stream data with Kafka [1]. Diversification will target Kafka itself, e.g., by building the application with different versions of Kafka. We will also leverage the natural emergence of Kafka-compatible streaming libraries such as Redpanda [2].
- [1] Kafka
- [2] redpanda
- [3] The multiple facets of software diversity: Recent developments in year 2000 and beyond
Off the beaten track
The software supply chain of creative coding
Artists use advanced software technology to produce, distribute and generate artworks. Such software technology includes libraries for sound synthesis [1], visual art [2,3], augmented reality [4], as well as platforms to distribute artworks [5,6]. In this work, we dive deep into this software ecosystem to draw a systematic landscape of the software supply chain [7] for creative coding.
- [1] SuperCollider
- [2] openFrameworks
- [3] Processing
- [4] Augmented Reality APIs
- [5] Art Blocks
- [6] fxhash
- [7] The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache
Code by singing for eso-lang
The progress of voice recognition and speech-to-text technology [3] is fabulous. It opens the way towards coding by voice, a very promising advance that could open the world of programming to a wider population [1].
In this thesis, we will explore the possibilities of writing code by singing. This master's thesis, at the intersection of software technology, signal processing, and rickrolling, will be disseminated as part of a growing eso-lang [2].
- [1] Speaking in code: how to program by voice. Nature, 2018
- [2] rickroll-lang. Github, 2022
- [3] Listen and translate: A proof of concept for end-to-end speech-to-text translation. arXiv, 2018
Github repositories with literary references
GitHub repositories are rich sources of code, documentation, and discussions. They also contain remarkable resources such as images, sound snippets, texts, or references. A recent study has analyzed the presence of links to academic papers in GitHub repositories [1]. This study highlights the critical importance of linking code, data, and publications to improve replication in computational science [2]. In this work, we wish to explore literary references in GitHub: for example, references to Bob Dylan in C code, quotes from novels in comments, or text buried in documentation, such as what the one-liner perl -le'$_=`perldoc -T perlfaq4`,s/^.*N;(.*?)E.*$/$1/s,print' extracts from the Perl FAQ.
This study seeks to unveil the deep connections between GitHub, culture, and society, and to analyze the role of literature in software development.
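Detecting literary references at scale requires, at minimum, locating comments and matching them against a corpus of quotations. The following naive Python sketch does this for C sources; the phrase catalogue and the function name literary_references are hypothetical, and a real study would need a curated literary corpus, fuzzy matching, and a proper lexer (this regex also matches "//" or "/*" inside string literals).

```python
import re

# Hypothetical, tiny catalogue of literary phrases (lowercase) and their authors.
LITERARY_PHRASES = {
    "the times they are a-changin": "Bob Dylan",
    "so long, and thanks for all the fish": "Douglas Adams",
    "call me ishmael": "Herman Melville",
}

# Line comments (// ...) and block comments (/* ... */, possibly multi-line).
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def literary_references(c_source):
    """Return (author, comment) pairs for C comments that contain a known phrase."""
    hits = []
    for match in COMMENT_RE.finditer(c_source):
        comment = match.group(0)
        for phrase, author in LITERARY_PHRASES.items():
            if phrase in comment.lower():
                hits.append((author, comment))
    return hits


source = "/* Call me Ishmael. */\nint main(void) { return 0; } // So long, and thanks for all the fish\n"
for author, comment in literary_references(source):
    print(author, "->", comment)
```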
- [1] GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
- [2] This AI researcher is trying to ward off a reproducibility crisis.
Anatomy of Outlook mail
Every day we use extraordinary software objects. Examples include the Android mobile operating system that runs on billions of devices, the Domain Name System that underpins the web, and the Outlook email client that lets millions of workers communicate efficiently. These objects are extraordinary in several respects: they are large, they are composed of hundreds of diverse software parts, they evolve fast, and they exist in many versions tailored to various needs. Their ubiquity and their sheer size intrigue software developers and users alike. One approach to unveiling the extraordinary nature of these objects consists of breaking them down into all of their components, turning the study into an anatomical analysis of the object [1,2].
In this work, we aim at building a fine-grained anatomy of an extraordinary, extremely popular software object: the Outlook email client.
- [1] Anatomy of an AI system
- [2] Anatomy of a URL and the DNS process
- [3] Microsoft Research Detours Package
Easter egg VM flag
Easter eggs, sometimes called the final frontier of software development [10]. (Except that of course you can’t have a final frontier, because there’d be nothing for it to be a frontier to, but as frontiers go, it’s pretty penultimate . . .) [269696]. And against the wash of continuous integration a commit hangs, bloated and poetic, one single, cool contribution, gleaming like the madness of gods. Nearly unreal. Reality is not digital, an on-off state, but analog. Easter eggs are for lovers and for the mind. Not enterprise, nor a resurrection, they cherish enchantment and freedom. In the quest for technology and Mastery, you will add an extra mile to the frontier with a new Easter flag for an extraordinary virtual machine [42].
[42] java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
[10] Curated list of all the easter eggs and hidden jokes in Python.
[269696] Moving Pictures. T. Pratchett. 1990, on Monday afternoon, just before tea.
Web stalker: deconstructing modern browser technology (remix)
In 1998, Simon Pope, Colin Green and Matthew Fuller designed the Web Stalker, an alternative web browser that displays the structure of web pages instead of their content [1]. The work was driven by a strong desire to understand what happens beyond the screen and to let web users experience this understanding. Twenty years later, the web has permeated all aspects of our lives, and the complexity of web browsers has exploded.
This project is about rethinking the Web Stalker in the era of modern web browsers, from the design of a solution that leverages the architecture of these browsers [2] to the implementation, based on Electron [3], of an artistic representation of web page content.
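To make the "structure instead of content" idea concrete, here is a minimal Python sketch, neither the original Web Stalker nor the Electron implementation the project targets, that walks an HTML document with the standard library's html.parser, records the tag hierarchy and the outgoing links, and deliberately ignores all text. The class name StructureStalker is hypothetical. Unlike a browser, it performs no error recovery, so unclosed non-void elements (e.g., a p without its end tag) skew the indentation.

```python
from html.parser import HTMLParser


class StructureStalker(HTMLParser):
    """Records the tag hierarchy and outgoing links of a page, discarding its text."""

    # HTML void elements never have an end tag, so they must not increase the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # one indented line per start tag
        self.links = []    # href values of <a> elements

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + tag)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth = max(0, self.depth - 1)

    # handle_data is intentionally not overridden: the page's text is ignored.


page = "<html><body><h1>Hello</h1><p>See <a href='https://example.org'>this</a><br></p></body></html>"
stalker = StructureStalker()
stalker.feed(page)
print("\n".join(stalker.outline))
print("links:", stalker.links)
```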
- [1] The Web stalker
- [2] How browsers work
- [3] Electron JS
Paint Splatters & Perl Programs (remix)
In 2019, Colin McMillen and Tim Toady ran an experiment to answer one question: is it possible to smear paint on a wall without creating valid Perl? This is an essential question at the frontier between art and computing.
In this project, we will reproduce McMillen’s experiment [1], starting with the curated dataset provided by the authors [2]. We will then elaborate on the findings with original splatters and an exploration of Perl’s diverse ecosystem [3].
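Reproducing the experiment requires an automated validity oracle. Assuming each splatter has already been transcribed to text, a minimal Python sketch can ask the Perl interpreter itself, through perl -c (compile only), whether the text is a valid program. The function name is_valid_perl is hypothetical; it returns None when no perl binary is on the PATH, and subprocess.TimeoutExpired propagates if compilation hangs. Note that perl -c still executes BEGIN blocks and use statements, so untrusted input should be checked inside a sandbox.

```python
import shutil
import subprocess


def is_valid_perl(source, timeout=10):
    """True/False if `perl -c` accepts/rejects `source`; None if perl is not installed.

    Caution: `perl -c` still runs BEGIN blocks and `use` statements,
    so only run it on trusted input or inside a sandbox.
    """
    perl = shutil.which("perl")
    if perl is None:
        return None
    # "-" tells perl to read the program from standard input.
    result = subprocess.run([perl, "-c", "-"], input=source,
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0


for text in ["print 1;", "print (;"]:
    print(repr(text), is_valid_perl(text))
```

Under this oracle, the reproduction boils down to counting the fraction of transcribed splatters for which is_valid_perl returns True.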