Bachelor Theses on the Data Economy

Yesterday, two bachelor students successfully defended their theses on the monitoring of the emergent data economy. Congratulations!

Miguel Eleno García developed and implemented web scraping tools to monitor data markets. He updated my previous work on measuring the price of data in commercial data marketplaces, and built new scrapers to download information about data products and data providers on AWS, Snowflake, and DataRade. He received an 8.5/10. Well done!

Figure 1. Number of products downloaded from commercial data marketplaces

Overall, we found that both the number of data products and the number of data providers grew significantly from Nov 2021 to Mar 2025, as the chart above shows. Unsurprisingly, data markets proved highly dynamic: only a few of the providers and data products listed in 2021 are still available today. We also compared statistics on data product prices. More information will be presented at the upcoming Data Economy Workshop@VLDB in London.
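To make the idea of market dynamism concrete, here is a minimal sketch of how the share of providers surviving between two snapshots could be computed. The function name `retention_rate` and the provider identifiers are hypothetical placeholders, not the thesis code or its data.

```python
def retention_rate(earlier: set[str], later: set[str]) -> float:
    """Fraction of items in the earlier snapshot that still appear in the later one."""
    if not earlier:
        return 0.0
    return len(earlier & later) / len(earlier)


# Hypothetical provider identifiers, for illustration only (not real measurements).
providers_2021 = {"provider_a", "provider_b", "provider_c", "provider_d"}
providers_2025 = {"provider_a", "provider_e", "provider_f", "provider_g", "provider_h"}

rate = retention_rate(providers_2021, providers_2025)
print(f"Providers from the earlier snapshot still listed: {rate:.0%}")
```

The same function would work for data products, assuming each product carries an identifier that stays stable across scraping runs.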

Álvaro Pérez Saldaña carried out an interesting project on designing and developing a tool to retrieve information on commercial data products. Álvaro even created a video demonstration of the tool, and received a 9/10 for his excellent work.

Using the dataset generated during my research as ground truth, Álvaro worked with open-source large language models (LLMs) to automate the extraction of relevant information about data products from their descriptions, returning a structured JSON file covering 67 metadata fields. He created specific prompts to extract these fields and evaluated the performance of six LLMs on this task. The figure below shows histograms of the models’ performance across metadata fields. We found that accuracy varies across fields, and that performance on the fields with the lowest accuracy can be improved by applying advanced prompting techniques.

Figure 2. Histograms of LLM performance on the different metadata fields
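As an illustration of what evaluating accuracy per metadata field can look like, here is a minimal sketch that compares extracted JSON records against ground-truth records. It assumes exact-match scoring, and the field names and values are made up for the example; the metric actually used in the thesis may differ.

```python
from collections import defaultdict


def per_field_accuracy(predictions: list[dict], ground_truth: list[dict]) -> dict[str, float]:
    """Exact-match accuracy per field (assumed metric, for illustration only)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total[field] += 1
            # A missing field in the prediction counts as an error.
            if pred.get(field) == expected:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical records: two data products, two metadata fields each.
truth = [
    {"price": "100", "currency": "USD"},
    {"price": "250", "currency": "EUR"},
]
extracted = [
    {"price": "100", "currency": "USD"},
    {"price": "200", "currency": "EUR"},
]

scores = per_field_accuracy(extracted, truth)
print(scores)
```

Running this once per model gives one accuracy value per field, which is the kind of data a histogram like the one above would summarize.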

In conclusion, it was a pleasure working with both Álvaro and Miguel over the past few months. Their contributions will support the development of a Data Economy Observatory at UPM, and I am very much looking forward to presenting their work in a full research paper soon.
