Data Science Centre Member Spotlight: Leon van Wissen

4 April 2023

The Spotlight introduces a different Data Science Centre Affiliate Member every month. This month: Leon van Wissen, Data Engineer at the Faculty of Humanities, University of Amsterdam. Leon is also a member of the CREATE Lab, which offers digital research support services for researchers in the faculty.

How do you apply data science to your projects?

My background in Dutch literature and computational linguistics helps me to extract, organise, and analyse large amounts of information from Dutch cultural heritage sources, and allows me to bridge the gap between 'traditional' humanities research and data science.

Is there a project from this past year that you are most proud of?

I am currently part of the GLOBALISE project, which started last year. The project's main objective is to open up the archives of the Dutch East India Company (VOC) with the help of computer vision, natural language processing, and knowledge engineering techniques.

When completed, this project will be an invaluable resource for scholars and anyone interested in the history of the VOC. Not only will it enhance the discoverability of people, places, and other entities in the archives, but it will also facilitate a more data-driven approach to studying these in their context. Given the current debates around contested history and decolonisation initiatives, this project is particularly timely.

What do you like most about being a DSC member?

The methods we use to extract, structure, and analyse data remain largely the same regardless of the material you work with. Being part of a community that can offer assistance when I run into methodological roadblocks, or that helps me discover new ways to handle data, is useful, as are the DSC's events and workshops.

To give an example, I recall the keynote presentation on 'interfaces' given by Anne Beaulieu at the Data Science Day 2022, which helped me to think beyond developing traditional data browsers and to look for more creative and meaningful solutions for end-users.

What is your favourite data science method?

As someone who primarily works with textual sources, I rely heavily on Natural Language Processing (NLP) techniques. However, applying these can be more challenging when dealing with materials from the 17th to 19th centuries, due to digitisation errors such as OCR/HTR mistakes or inconsistencies in orthography. Nevertheless, I get excited when I can link my findings to a knowledge base, contribute to one, or make my data available to others in another way.

Are you camp Python, camp R, or something else?

Definitely Python, as I get the creeps from R's syntax. I also think that Python gives you a much more elegant way of programming, which helps its readability and its adoption in the (Digital) Humanities. Recently, though, I came to appreciate building web applications again when I discovered Svelte and TypeScript. Trying out more Go and Julia is on my wish list.