Synthetic data is data produced by computational processes, such as simulations or generative models, rather than recorded from processes in the world. Steinhoff’s research argues that while synthetic data has diverse interesting dimensions, including the epistemological, ontological and ethical, it should be fundamentally understood through the lens of Marxist political economy. Synthetic data has come to prominence amid unprecedented enthusiasm for AI, capital’s dream automation technology, and its requisite hunger for training data. Even as a new data work industry grows to feed this demand, AI capital complains of a data shortage. Because capital is defined by the infinite goal of valorization, by its logic there can never be enough data. Synthetic data is positioned as a replacement for humans in two respects: a) as data source, b) as data labour. Steinhoff contends that synthetic data is the automation of the production of conditions for production in data-intensive capitalism. However, it simultaneously calls new categories of labour, such as the technical artist, into AI production. Synthetic data is thus the real subsumption of data production, reconstituting it as a process immanent to the digital, but this has unintended side effects that run counter to this goal. While synthetic data is unlikely to live up to the delirious aspirations that capital holds for it, it is nonetheless leading to mutations in both the technical and social stacks of the AI industry which deserve further research.