Details:
Supervisor: Santiago Andrés Azcoitia, Departamento de Señales, Sistemas y Radiocomunicaciones
Start date: 1 February 2025
Requirements: Student of the Grado en Ingeniería y Sistemas de Datos (BSc in Data Engineering and Systems).
Applications: Send a CV and academic transcript to santiago.andres@upm.es before 7 January 2025.
Background:
Spurred by the widespread adoption of AI/ML, data is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. Despite ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of data as an economic good: a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value. As a result, most of these valuable assets remain unexploited in corporate silos today.
However, there is already an ecosystem of companies that trade data over the Internet [1]. Some analysts have estimated the potential value of the data economy at $2.5 trillion globally by 2025 [2, 3], and the development of healthy data markets would be key to making the most of AI/ML, which is expected to reach a market of $15–20 trillion by 2030 [4, 5]. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which has estimated the size of the data economy at €827 billion for the EU27 over the same period. Within the scope of the European Data Strategy, the European Commission is also steering initiatives aimed at identifying relevant cross-industry use cases involving different verticals and at enabling sovereign data exchanges to realise them.
Objective
This Master Thesis aims to create a scraping tool that crawls commercial data marketplaces such as AWS, Snowflake, or Datarade, and downloads, structures, and stores information about the data assets they offer in a central repository. The tool will be developed in Python. The student will also carry out a first processing and analysis of the downloaded information to provide insights into commercial data markets, answering questions such as: what kind of data is being offered, how sellers price their data and at what prices, how many data providers use commercial data marketplaces, etc.
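As a rough illustration of the "structure and store in a central repository" step, scraped listings could be normalised into a flat record and appended to an SQLite database. All field names below are assumptions for the sketch, not a fixed schema:

```python
import sqlite3
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical normalised record for one data-product listing;
# the fields are illustrative, not a final schema.
@dataclass
class Listing:
    marketplace: str            # e.g. "AWS", "Snowflake", "Datarade"
    provider: str               # seller offering the data product
    title: str                  # product name as shown on the platform
    category: str               # e.g. "financial", "geospatial"
    pricing_model: str          # e.g. "subscription", "one-off"
    price_usd: Optional[float]  # None when the price is not public

def store(listings, db_path="marketplace.db"):
    """Append normalised listings to a central SQLite repository."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(marketplace TEXT, provider TEXT, title TEXT, "
        " category TEXT, pricing_model TEXT, price_usd REAL)"
    )
    con.executemany(
        "INSERT INTO listings VALUES "
        "(:marketplace, :provider, :title, :category, "
        " :pricing_model, :price_usd)",
        [asdict(listing) for listing in listings],
    )
    con.commit()
    con.close()
```

Keeping one flat table like this makes the downstream quantitative analysis (counting providers, comparing prices across platforms) a matter of plain SQL or pandas queries.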
Methodology
This research will involve the design and development of a modular scraping tool able to target at least three commercial data marketplaces and to accommodate new platforms in the future. We will use statistics and quantitative analysis methods to analyse the downloaded information and, comparing it to the state of the art [6], characterise the evolution of data marketplaces and the products they offer.
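One way to obtain the modularity described above is a small adapter/registry pattern: each marketplace gets its own adapter class, so adding a platform means writing one new class rather than touching the core crawler. The class and method names here are illustrative assumptions, and the adapter body is stubbed (a real one would fetch and parse the platform's pages):

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type

class MarketplaceAdapter(ABC):
    """One adapter per marketplace; subclasses do the actual scraping."""
    name: str

    @abstractmethod
    def fetch_listings(self) -> Iterator[dict]:
        """Yield raw listing dictionaries scraped from the platform."""

# Central registry the crawler consults; new platforms register here.
REGISTRY: Dict[str, Type[MarketplaceAdapter]] = {}

def register(cls):
    """Class decorator that makes an adapter known to the crawler."""
    REGISTRY[cls.name] = cls
    return cls

@register
class DataradeAdapter(MarketplaceAdapter):
    name = "datarade"

    def fetch_listings(self):
        # A real implementation would paginate through the site with
        # an HTTP client and an HTML parser; stubbed for illustration.
        yield {"title": "Example weather dataset", "provider": "ACME"}

def crawl(platforms):
    """Run every requested adapter and collect its listings."""
    return {p: list(REGISTRY[p]().fetch_listings()) for p in platforms}
```

With this structure, supporting a fourth or fifth marketplace later only requires a new `@register`-decorated subclass.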
Expected results
This Master Thesis is expected to produce a modular tool to scrape and collect information from data marketplaces, and to provide empirical evidence and insights into the current state of data markets. Optionally, the student will participate in writing a research paper to disseminate the results of the project.
References
[1] S. Andrés Azcoitia and N. Laoutaris. A Survey of Data Marketplaces and Their Business Models. ACM SIGMOD Record, 51(3), Sep. 2022. ACM, New York, NY, USA.
[2] N. Henke, J. Bughin, M. Chui, J. Manyika, T. Saleh, B. Wiseman, and G. Sethupathy. The Age of Analytics: Competing in a Data-Driven World. McKinsey Global Institute. Dec. 2016.
[3] G. Micheletti, N. Raczko, C. Moise, D. Osimo, and G. Cattaneo. European Data Market Study 2021–2023. IDC & The Lisbon Council. May 2023.
[4] PwC. Sizing the Prize: What's the Real Value of AI for Your Business and How Can You Capitalise? 2017.
[5] J. Bughin, J. Seong, J. Manyika, M. Chui, and R. Joshi. Notes from the AI Frontier: Modeling the Impact of AI on the World Economy. McKinsey Global Institute. 2018.
[6] S. Andrés Azcoitia, C. Iordanou, and N. Laoutaris. Measuring the Price of Data in Commercial Data Marketplaces. First ACM Data Economy Workshop at CoNEXT'22 (2022).