I’m thrilled to share that I finally had the chance to distill the essence of our data pricing tool into a research paper. Our work, titled «An Interpretable Market-based Data Price Prediction Tool», has been accepted for presentation at the Data Economy Workshop DEC’25, co-located with the prestigious VLDB Conference. See a demo of the data pricing tool here.
The paper is already accessible at the 3rd Data Economy (DEC) Workshop proceedings webpage. I’ll be presenting it next Friday, September 5, at the Queen Elizabeth II Centre (QEII Centre) in London. It is a nice opportunity to engage with the data economy and database communities and to hold insightful discussions on the topic.
The paper outlines the architecture and key technical insights behind our tool, which provides a market-based price reference for datasets using only their metadata. A key contribution of this work is the introduction of new price regressors, developed with the invaluable collaboration of Alicia Cabrero as part of her Master’s thesis «Development and Evaluation of an Explainable Neural Network to Predict the Price of Data». In this work, we managed to improve the accuracy and robustness of state-of-the-art price predictions and established a reliable architecture to accommodate new data in the future. Alicia successfully defended her Master’s thesis in July 2025 and earned a grade of 9/10.
This paper marks the first step in a broader initiative to build and operate a long-term observatory of the data economy. We’re currently focused on expanding and enriching our training data, streamlining data ingestion processes, and ensuring that our models stay updated with the latest market dynamics. We hope this workshop paper is just the beginning—stay tuned for a more comprehensive follow-up, with updated data, enhanced functionality, and deeper insights.
Abstract:
Artificial intelligence (AI) and machine learning (ML) are having a profound impact on the economy but require huge amounts of data, which is partially generated by increasingly digitalised organisations but often acquired from third parties. This has resulted in a rampant demand for data in emerging data markets that face daunting challenges derived from the nature of data as an economic good (freely replicable, non-rival) and its elusive value. Despite the appearance of data marketplaces (AWS, Snowflake, Nokia DM) aimed at facilitating data transactions, data holders find it difficult to set a price for their data assets in their go-to-market process, and data consumers have trouble estimating a fair price to pay for data.
This paper presents an interpretable market-based data pricing tool designed to help with these tasks by estimating the price of data based on the prices of real data products observed in commercial data marketplaces. Resorting to sentence transformers, neural networks, sensitivity analysis and a novel two-step SHapley Additive exPlanations (SHAP), not only does our tool provide insightful user-friendly reporting and interpretation of price predictions using different price schemes, but it also improves the accuracy, the robustness and the generalisability of state-of-the-art (SOTA) models.
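For readers unfamiliar with Shapley-based explanations, here is a minimal sketch of the underlying idea: splitting a predicted price into per-feature contributions. It is not the tool's implementation and not the two-step SHAP procedure from the paper. The linear `toy_price` model, its weights, the three metadata feature names and the baseline dataset are all hypothetical, chosen only so the example runs with the Python standard library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for a handful of them.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical metadata features and weights (assumptions for illustration only)
FEATURES = ["num_records_millions", "updates_per_month", "num_columns"]
WEIGHTS = [12.0, 3.5, 0.8]
BASE_PRICE = 100.0


def toy_price(features):
    """Stand-in price regressor: a plain linear model, not a neural network."""
    return BASE_PRICE + sum(w * v for w, v in zip(WEIGHTS, features))


dataset = [5.0, 30.0, 40.0]    # the dataset being priced
baseline = [1.0, 1.0, 10.0]    # a reference dataset to compare against

contributions = shapley_values(toy_price, dataset, baseline)
for name, value in zip(FEATURES, contributions):
    print(f"{name}: {value:+.2f}")
```

The contributions add up to the difference between the predicted price of the dataset and that of the baseline, which is the property that makes this kind of attribution easy to present in a price report.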
Acknowledgements
This work was partially developed during my time at IMDEA Networks and partially supported by the European Union’s HORIZON project UPCAST (101093216).
