What’s the price of my data?

The first prototype of my data pricing tool is ready! Are you a data owner who wants to know the value of your assets? Are you a data consumer sizing a sourcing operation? The data pricing tool estimates the market price of a data asset based on its metadata. It’s user-friendly and simple:

1) Just type in a description of the data product and some metadata, such as volume metrics (# people, # companies, # records…), the time scope, the delivery methods, the format, etc.

2) The data pricing tool can return an exportable list of similar datasets offered in public commercial data marketplaces or by specific data providers. This list can feed in-depth market analysis, provide more reference prices, or identify sellers of a specific type of asset.

3) The tool can estimate the asset’s market price based on the prices observed in the market and on ML models trained with data stemming from my prior research. As the figure below shows, the prototype provides SHAP explanations of the features affecting the prediction, including the relevant words in the description that appear to be driving the price of the asset up (in green) or down (in red).

The SHAP explanation charts compare the average prediction for the training set, E[f(X)], with the actual prediction for this product, f(X). The tool can display SHAP explanations either by individual features (the top-K most relevant features affecting the price) or by groups of features that relate to the same aspect of the metadata. In the chart, the X-axis shows the groups of features, and the bars reflect their impact on the asset price. Green bars show groups of features that multiply the final prediction, whereas red bars show groups of features that divide it.
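To make the grouped view more concrete, here is a minimal sketch of how per-feature SHAP values could be aggregated into groups. It relies only on the additive property of SHAP (the prediction equals E[f(X)] plus the sum of the SHAP values). Everything else is an assumption for illustration: the feature names, the group mapping, the numbers, and in particular the idea that the model predicts the logarithm of the price, which is one way a positive contribution would become a multiplicative factor and a negative one a divisive factor. It is not the prototype's actual code.

```python
import math
from collections import defaultdict


def group_shap_values(shap_values, feature_groups):
    """Sum per-feature SHAP values into one value per metadata group."""
    grouped = defaultdict(float)
    for feature, value in shap_values.items():
        # Features without a declared group fall into a catch-all bucket.
        grouped[feature_groups.get(feature, "other")] += value
    return dict(grouped)


# Hypothetical values, in log-price units (assumption: the model predicts log price).
base_value = 6.0  # E[f(X)], the average prediction over the training set
shap_values = {
    "n_records": 0.75,
    "n_companies": 0.25,
    "word:real-time": 0.5,
    "word:sample": -0.25,
    "format_csv": -0.25,
}
feature_groups = {
    "n_records": "volume",
    "n_companies": "volume",
    "word:real-time": "description",
    "word:sample": "description",
    "format_csv": "delivery",
}

grouped = group_shap_values(shap_values, feature_groups)

# Additivity: f(X) = E[f(X)] + sum of all SHAP values.
prediction = base_value + sum(shap_values.values())

# Under the log-price assumption, each group scales the price by exp(value):
# a factor above 1 multiplies the price, a factor below 1 divides it.
factors = {group: math.exp(value) for group, value in grouped.items()}

for group, factor in factors.items():
    colour = "green" if factor >= 1 else "red"
    print(f"{group:<12} x{factor:.2f} ({colour})")
```

Grouping by summation keeps the explanation consistent: the grouped bars still add up to the gap between the average prediction and the prediction for this product.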

4) The tool can also display charts of the sensitivity of the predicted price to specific parameters, such as the volume of data supplied, as the figure below shows:
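A sensitivity chart of this kind can be produced by varying one metadata parameter while holding the rest fixed and recording the predicted price at each step. The sketch below illustrates the idea only. The `price_sensitivity` helper, the `toy_predict` function (a stand-in for the trained model) and the `n_records` field are hypothetical names, not part of the prototype.

```python
import math


def price_sensitivity(predict, asset, parameter, values):
    """Return (value, predicted price) pairs, varying a single parameter."""
    curve = []
    for value in values:
        variant = dict(asset)       # copy, so the original metadata is untouched
        variant[parameter] = value  # change only the parameter under study
        curve.append((value, predict(variant)))
    return curve


def toy_predict(asset):
    """Hypothetical stand-in for the model: price grows with the square root of volume."""
    return 10.0 * math.sqrt(asset["n_records"])


asset = {"n_records": 10_000, "format": "csv"}
curve = price_sensitivity(toy_predict, asset, "n_records", [100, 10_000, 1_000_000])

for volume, price in curve:
    print(f"{volume:>9,} records -> {price:,.2f}")
```

Passing the model in as a plain callable keeps the sweep independent of the model type, so the same loop would work for any parameter the model accepts.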

Even though the prototype is based on my previous research on understanding the price of data in commercial data marketplaces, it includes new functionalities and technical artifacts. First, it trains new, more accurate models and embeds sentence transformers to make the tool more generalizable. Second, a comprehensive XAI layer and an operational log were added to facilitate debugging, allow for audits, and eventually pursue fairness, accountability, and transparency of this tool.

Still, there is a huge amount of work ahead to make this first version of the prototype more robust and commercially feasible, not to mention open issues related to the business model and to ensuring trustworthy and ethical AI. Not only am I scraping more up-to-date data, but I am also testing some ideas to drastically improve and streamline the data collection process and to make the most of the data readily available. Moreover, I am looking forward to publishing more scientific papers on the topic.

See a demonstration video here! And feel free to contact me for more details.


Comments

6 responses to “What’s the price of my data?”

  1. […] wait for the event? See the blog entry for the first prototype, or a demonstration video here! And feel free to contact me for more […]


  2. […] The second session dealt with the data economy. Prof. Raul Castro Fernández presented a novel overarching philosophical definition of the value of data, stressing the importance of specifying precisely what we understand by that value, a philosophical yet timely and needed presentation when many people still confuse it with the value of placing an online ad in an advertising slot on a webpage. The talk clearly separated the value of data from the value of the documents representing them, and distinguished between instrumental and intrinsic, prior and post, absolute and relative value of data. In the roundtable that followed, I found the market trends in the online advertising industry very interesting, as well as the challenges it faces as a result of corruption in the measured KPIs, as Álvaro Mayol (TapTap Digital) pointed out. I loved the concept of data swamps (as opposed to data lakes) and the need to clean them to build new reliable use cases. Participants also pointed to fairness and bias as key challenges for data markets and AI/ML models. It was in this session that I showed a demo of my data pricing tool. […]


  3. […] thrilled to present a live demo of the data pricing tool at Big Data Value Association Activity Group 66 meeting next Wednesday 30 April 2025. This meeting […]


  4. […] I gave a brief overview about the research I conducted during my PhD and a demo of our data pricing tool as a first step towards an observatory of the data economy. Glad to see the talk triggered an […]


  5. […] Data Price Prediction Tool», which outlines its architecture and key technical insights. The data pricing tool provides a market-based price reference for datasets using only their metadata, and aims to […]


  6. […] I’m thrilled to share that I finally had the chance to distill the essence of our data pricing tool into a research paper. Our work, titled «An Interpretable Market-based Data Price Prediction Tool», has been accepted for presentation at the Data Economy Workshop DEC’25, co-located with the prestigious VLDB Conference. See a demo of the data pricing tool here. […]

