{"id":4336,"date":"2024-10-04T13:27:51","date_gmt":"2024-10-04T11:27:51","guid":{"rendered":"https:\/\/ssr.upm.es\/?p=4336"},"modified":"2026-01-16T14:09:59","modified_gmt":"2026-01-16T13:09:59","slug":"information-retrieval-from-data-products-in-commercial-data-marketplaces","status":"publish","type":"post","link":"https:\/\/ssr.upm.es\/en\/2024\/10\/04\/information-retrieval-from-data-products-in-commercial-data-marketplaces\/","title":{"rendered":"Information retrieval from data products in commercial data marketplaces"},"content":{"rendered":"<div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element \" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<div align=\"right\">Contact: <strong>Santiago Andr\u00e9s Azcoitia\u00a0<\/strong>[&#x73;&#x61;&#x6e;&#x74;&#x69;&#x61;&#x67;&#x6f;&#x2e;&#x61;&#x6e;&#100;&#114;&#101;&#115;&#64;&#117;&#112;m&#46;es]<\/div>\n\n\t\t<\/div>\n\t<\/div>\n<div class=\"vc_empty_space\"   style=\"height: 64px\"><span class=\"vc_empty_space_inner\"><\/span><\/div><\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element \" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p><strong>Details<\/strong><br \/>\nSupervisor: Santiago Andr\u00e9s Azcoitia, Departamento de Se\u00f1ales, Sistemas y Radiocomunicaciones<br \/>\nStart date: 1 February 2025<br \/>\nRequirements: Student of the Grado en Ingenier\u00eda y Sistemas de Datos degree.<br \/>\nApplications: Send your CV and academic transcript to <a 
href=\"m&#97;&#105;&#x6c;&#x74;&#x6f;&#x3a;s&#97;&#110;&#116;&#x69;&#x61;&#x67;o&#46;&#97;&#110;&#x64;&#x72;&#x65;s&#64;&#117;&#112;&#x6d;&#x2e;&#x65;s\">&#x73;&#x61;&#x6e;&#x74;&#x69;&#x61;&#x67;&#x6f;&#46;&#97;&#110;&#100;&#114;&#101;&#115;&#64;upm&#x2e;&#x65;&#x73;<\/a> before 7 January 2025.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Background:<\/strong><br \/>\nSpurred by the widespread adoption of AI\/ML, \u2018data\u2019 is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. Despite ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of \u2018data\u2019 as an economic good (a freely replicable, non-depletable asset whose value is highly combinatorial and context-specific). As a result, most of these valuable assets remain unexploited in corporate silos.<br \/>\nHowever, there is already an ecosystem of companies that trade data over the Internet [1]. Some analysts have estimated the potential value of the data economy at $2.5 trillion globally by 2025 [2, 3], and the development of healthy data markets would be key to making the most of AI\/ML, which is expected to reach a market of $15-20 trillion in 2030 [4, 5]. Recent studies revealed more than 2,000 data providers offering data products in commercial data marketplaces [6]. Although standards such as W3C\u2019s DCAT v3.0 already exist, the metadata describing data products in commercial data marketplaces neither follows any standard nor respects a common structure. As a result, many features describing data assets (e.g., update frequency, delivery methods, volume of data being offered, etc.) 
are found in the plain-language descriptions attached to data products in marketplaces.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Objective<\/strong><br \/>\nThis Master Thesis aims to use NLP models and techniques, including LLMs, to build a tool that structures the information contained in the descriptions of data products in commercial data marketplaces. The tool will be developed in Python. The student will also analyse the resulting information to provide insights into the data products offered across commercial data markets, answering questions such as what kind of data is being offered, how sellers price the data and at what prices, and how many data providers use commercial data marketplaces.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Methodology<\/strong><br \/>\nThis research will involve the design and development of an information retrieval tool to structure information about data products based on their descriptions [6]. We will use prompt engineering to refine the queries sent to LLMs fed with data product descriptions, in order to structure information on the key features buyers need to know when purchasing data.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Expected results<\/strong><br \/>\nThis Master Thesis is expected to produce a modular tool to structure information about data products in data marketplaces, and to provide empirical evidence and insights into the state of data markets. Optionally, the student will participate in writing a research paper to disseminate the results of the project.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>[1] S. Andr\u00e9s Azcoitia and N. Laoutaris. A Survey of Data Marketplaces and Their Business Models. ACM SIGMOD Record, 51(3), Sep. 2022. ACM, New York, NY, USA.<br \/>\n[2] N. Henke, J. Bughin, M. Chui, J. Manyika, T. Saleh, B. Wiseman, and G. Sethupathy. The Age of Analytics: Competing in a Data-Driven World. McKinsey Global Institute. Dec. 2016.<br \/>\n[3] G. Micheletti, N. Raczko, C. Moise, D. Osimo, and G. 
Cattaneo. European DATA Market Study 2021\u20132023. IDC &amp; The Lisbon Council. May 2023.<br \/>\n[4] PWC Consulting. Sizing the prize: What\u2019s the real value of AI for your business and how can you capitalise? 2017.<br \/>\n[5] J. Bughin, J. Seong, J. Manyika, M. Chui, and R. Joshi. Notes from the AI frontier: Modeling the impact of AI on the world economy. McKinsey Global Institute. 2018.<br \/>\n[6] S. Andr\u00e9s Azcoitia, C. Iordanou and N. Laoutaris, \u00abUnderstanding the Price of Data in Commercial Data Marketplaces, 2023 IEEE 39th Internatio<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Contact: Santiago Andr\u00e9s Azcoitia\u00a0[san&#116;&#105;&#97;&#103;&#x6f;&#x2e;&#x61;&#x6e;&#x64;&#x72;&#x65;s&#64;u&#112;&#109;&#46;&#101;&#115;] Details Supervisor: Santiago Andr\u00e9s Azcoitia, Departamento de Se\u00f1ales, Sistemas y Radiocomunicaciones Start date: 1 February 2025 Requirements: Student of the Grado en Ingenier\u00eda y Sistemas de Datos degree. Applications: Send your CV and academic transcript to &#115;&#x61;n&#116;&#x69;a&#103;&#x6f;&#46;&#x61;&#x6e;&#100;&#x72;e&#115;&#x40;u&#112;&#x6d;&#46;&#x65;&#x73; before 7 January 2025. 
&nbsp; Background: Spurred by the widespread adoption&hellip;<\/p>\n","protected":false},"author":10,"featured_media":4337,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[36],"tags":[],"_links":{"self":[{"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/posts\/4336"}],"collection":[{"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/comments?post=4336"}],"version-history":[{"count":2,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/posts\/4336\/revisions"}],"predecessor-version":[{"id":4339,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/posts\/4336\/revisions\/4339"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/media\/4337"}],"wp:attachment":[{"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/media?parent=4336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/categories?post=4336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ssr.upm.es\/en\/wp-json\/wp\/v2\/tags?post=4336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}