Projects | swat.engineering

Privacy, anonymization and data cleaning

The arrival of GDPR has sharpened the focus on privacy. Who has access to which personal data of personnel or clients? How are personal data filtered from research data sets? How is internet traffic anonymized for the detection of DDoS attacks?

In all these cases, there is a major gap between the legal requirements and their technical implementation.

We have designed and implemented Nescio, a domain-specific language for describing anonymization policies. A policy describes which parts of the data should be anonymized and how the sensitive parts should be hidden. The actual anonymization is carried out by the Nescio implementation: it parses the data, finds the relevant parts, and anonymizes them.
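
The separation between a policy (what to hide, and how) and the engine that applies it can be illustrated with a minimal Python sketch. This is not Nescio syntax or the Nescio implementation: the `POLICY` mapping, the strategy names `redact` and `pseudonymize`, and the `anonymize` function are all hypothetical and only show the general idea on already-parsed, dictionary-shaped data.

```python
import copy
import hashlib
from typing import Any, Callable, Dict

# Hypothetical anonymization strategies; names are illustrative only.
def redact(value: Any) -> str:
    """Replace the value with a fixed placeholder."""
    return "***"

def pseudonymize(value: Any) -> str:
    """Replace the value with a truncated SHA-256 digest.

    Equal inputs map to equal outputs. A real tool would likely use a
    keyed hash, since a plain digest of a guessable value can be reversed.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]

STRATEGIES: Dict[str, Callable[[Any], str]] = {
    "redact": redact,
    "pseudonymize": pseudonymize,
}

# Hypothetical policy: dotted path to a sensitive field -> strategy name.
POLICY: Dict[str, str] = {
    "patient.name": "redact",
    "patient.email": "pseudonymize",
}

def anonymize(record: dict, policy: Dict[str, str]) -> dict:
    """Return a copy of `record` with every field named in `policy` anonymized.

    Paths that do not occur in the record are silently skipped.
    """
    result = copy.deepcopy(record)
    for path, strategy in policy.items():
        *parents, leaf = path.split(".")
        node: Any = result
        # Walk down to the dictionary that holds the sensitive field.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            node[leaf] = STRATEGIES[strategy](node[leaf])
    return result
```

Calling `anonymize(record, POLICY)` leaves the input untouched and returns a new record in which only the fields listed in the policy have changed, so the policy can be reviewed independently of the code that applies it.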

Impact: Lawyers can formulate and review anonymization policies and they have a guarantee that the specified policy is actually implemented.
See also: Nescio Site
We offer: Tailored anonymization tools.

Forensics and binary data analysis

Forensic evidence is becoming more and more digital. Files on confiscated devices (laptops, cameras, mobile phones) need to be analyzed and searched for relevant data. Unfortunately, the number of different data and message formats (JPEG, MPEG, Word, PDF, …) is already overwhelming and growing rapidly, so manually implementing analysis tools for each format becomes prohibitive. We have therefore created BIRD, a domain-specific language for describing and parsing binary data. From a description of a data format, we automatically generate the corresponding parsing tool to analyze that specific data format.

An extensive overview of this approach is described in the dissertation by Jeroen van den Bos: Gathering Evidence: Model-driven software engineering in automated digital forensics.
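
The idea of deriving a parser from a format description, rather than hand-coding it, can be sketched in a few lines of Python. This is not BIRD syntax and not how BIRD generates parsers: the `HEADER_FORMAT` description, the `make_parser` function, and the toy header layout (a 4-byte magic value, a 16-bit version, and a 32-bit payload length, all big-endian) are assumptions made purely for illustration, built on the standard `struct` module.

```python
import struct
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical format description: (field name, struct format code) pairs,
# read in order with big-endian byte order and no padding.
HEADER_FORMAT: List[Tuple[str, str]] = [
    ("magic", "4s"),           # 4 raw bytes
    ("version", "H"),          # unsigned 16-bit integer
    ("payload_length", "I"),   # unsigned 32-bit integer
]

def make_parser(fields: List[Tuple[str, str]]) -> Callable[[bytes], Dict[str, Any]]:
    """Build a parsing function from a declarative field list."""
    layout = struct.Struct(">" + "".join(code for _, code in fields))
    names = [name for name, _ in fields]

    def parse(data: bytes) -> Dict[str, Any]:
        if len(data) < layout.size:
            raise ValueError("input is shorter than the described header")
        values = layout.unpack_from(data, 0)
        return dict(zip(names, values))

    return parse

# The parser is derived from the description; no format-specific code is written.
parse_header = make_parser(HEADER_FORMAT)
```

Supporting another fixed-layout header then only requires a new field list, which is the property that makes the description-driven approach scale to many formats.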

Impact: Parse binary data, according to its format specification, without coding
See also: https://github.com/SWAT-engineering/bird/
We offer: Tailored tools for parsing binary data.

Finance

Today’s financial software is usually the result of decades of software development and evolution. That makes it hard to maintain, and hard to guarantee functional and non-functional requirements. We have designed and implemented several domain-specific languages aimed at the generation, coordination, querying and testing of financial products. They are characterized by complete separation of desired behaviour and implementation; independent type checking, validation, verification and testing; and efficient code generation that integrates into target platforms.

For more details, see Stoel, J., van der Storm, T., Vinju, J.J., & Bosman, J.W. (2016). Solving the bank with Rebel: on the design of the Rebel specification language and its application inside a bank. In ITSLE 2016 - Proceedings of the 1st Industry Track on Software Language Engineering, co-located with SPLASH 2016 (pp. 13–20).

Impact: Enable data analytics on data that could not be combined until now
We offer: Faster development of financial products with guaranteed functionality and behaviour

Healthcare logistics

Timely arrival of ambulances at the location of an incident can be a matter of life and death. Based on mathematical modelling and stochastic simulation, we have built Seconds, a system for optimizing ambulance response times. Seconds will soon be deployed in several security regions in the Netherlands.

Impact: Lives are saved by more timely ambulances
See also: https://www.stokhos.eu/
We offer: Turnkey projects that fit our technical profile.

Data analytics becomes possible on all available business data

Domain Data Analytics on Polystores

Essential business data is usually scattered over heterogeneous (relational, graph, elastic) databases. We are involved in designing and implementing a high-level query language for such polystores: a single query can be dispatched to the polystore, and the results from the various databases are retrieved, integrated and returned in a uniform, standardized way. Bottom line: data analytics becomes possible on all available business data.
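
The dispatch-and-integrate pattern described above can be illustrated with a small Python sketch. It does not use the project's query language or any real database driver: the in-memory `ORDERS_TABLE` and `REVIEWS_DOCS` stand in for a relational and a document store, and `query_orders`, `query_reviews`, and `query_polystore` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Backend = Callable[[int], List[Record]]

# Stand-ins for two heterogeneous stores with different field names.
ORDERS_TABLE: List[Record] = [
    {"customer_id": 1, "order_id": 10, "total": 25.0},
    {"customer_id": 2, "order_id": 11, "total": 40.0},
]
REVIEWS_DOCS: List[Record] = [
    {"customer": 1, "text": "Fast delivery"},
]

def query_orders(customer_id: int) -> List[Record]:
    """Hypothetical adapter for the relational store."""
    return [
        {"source": "relational", "kind": "order", "data": row}
        for row in ORDERS_TABLE
        if row["customer_id"] == customer_id
    ]

def query_reviews(customer_id: int) -> List[Record]:
    """Hypothetical adapter for the document store."""
    return [
        {"source": "document", "kind": "review", "data": doc}
        for doc in REVIEWS_DOCS
        if doc["customer"] == customer_id
    ]

def query_polystore(customer_id: int, backends: List[Backend]) -> List[Record]:
    """Dispatch one query to every backend and merge the uniform results."""
    results: List[Record] = []
    for backend in backends:
        results.extend(backend(customer_id))
    return results
```

Each adapter hides the schema of its own store and returns records in one shared shape, so the caller issues a single query, for example `query_polystore(1, [query_orders, query_reviews])`, and never has to know which database held which piece of data.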

Impact: Enable data analytics on data that could not be combined until now
See also: TYPHON project
We offer: Specialized, tailored tools for data analytics on polystores

Improve the quality of embedded systems

Embedded systems

Many embedded systems have been implemented in C or C++ and have become harder to maintain and modernize. We have been involved in various projects to address this: analyzing C++ code to extract component structure, analyzing C code to extract embedded state machines, and generating code from hybrid C++/DSL state machines.

Impact: Maintainable embedded systems of higher quality
We offer: Tailored analysis of software systems in a variety of languages.

Toward a common, open-source, Web-native language workbench

Next Generation Parsing for the Internet (NGPI)

The success of the internet was enabled by the development of new languages and data formats. Classic examples are HTML, CSS, and JSON; modern ones are TypeScript, WASM, and JSX. Language development has thus driven internet innovation. However, developing new languages requires large investments and considerable expertise, which makes language development for the internet inaccessible to most. Our long-term vision is to change this by democratizing language development technology. In particular, the NGPI project reinvents parsing in JavaScript (JS): transforming strings into JS data structures (parse trees), which are crucial for syntax highlighting, linting, compilation, etc.

Current parser generators that target JS engines are hard to learn and use. To overcome this, the NGPI project builds the first such tool based on the expressive GLL (Generalized LL) algorithm. It supports a unique combination of context-free grammars, type-safe parse trees, and error recovery. This enables users to easily define grammars for their own languages, generate parsers in WASM, and integrate them in clients and servers alike. Beyond the internet, this technology can also be repurposed, via LLVM, for other instruction sets (e.g., x86-64).

Impact: Democratize parser generation technology for the internet
We offer: Extensive expertise on parsers, language workbenches, and tool building

We have experience