
Are open bibliometric data sources better than proprietary ones?


With the advancement of open science, the reliability of open bibliometric data providers compared with proprietary ones is becoming an increasingly important topic. Proprietary providers such as Scopus, SciVal and Web of Science have been criticised for their profit-oriented nature, their opaqueness and their lack of inclusiveness, notably of authors and works from the Global South.

At the same time, non-commercial open science infrastructures and open-source software and standards are increasingly being recommended (cf. the UNESCO Recommendation on Open Science and the EU Council conclusions of May 2023). Some institutions, such as CNRS, Sorbonne University and CWTS Leiden, have started transitioning from proprietary data sources to open ones.

The question often asked is: how do open bibliometric databases compare with their well-established commercial counterparts? To contribute to this debate, we share our recent experience at cOAlition S.

Last year we commissioned scidecode science consulting to carry out a study on the impact of Plan S on scholarly communication. We required that the results of the work, including the data, be disseminated under an open licence. Our partners at scidecode therefore had to ensure that the bibliometric data used in their impact study was both open and of sufficient quality to carry out the required analyses.

To assess this, scidecode science consulting conducted an interesting experiment aimed at demonstrating that the quality of bibliometric references from open metadata sources is at least as good as, if not better than, that provided by commercial entities. For this, scidecode selected a deliberately challenging benchmark: a set of publications from Fiocruz, a funder from the Global South with many non-English publications, many of them lacking DOIs. They then compared the coverage provided by OA.Works, a non-profit data provider drawing on open bibliometric sources such as OpenAlex, Crossref and Unpaywall, with the coverage provided by a commercial database.

The authors concluded that the data based on open bibliometric sources was more comprehensive and of higher quality than the data supplied by the commercial provider. In addition, using open data sources allowed scidecode to comply with the cOAlition S requirement to openly license their results, which would not necessarily have been possible had they used a commercial provider.

Read the full details and results of the study on the scidecode website.

Nora Papp-Le Roy

Nora is the cOAlition S Programme Manager, coordinating the work of the cOAlition S office at the European Science Foundation and supporting research funders to implement Plan S and make full and immediate Open Access to scholarly publications a reality. She has 20 years of experience in promoting international scientific cooperation, global policies, membership engagement and advocacy. She has previously worked at the International Science Council in the field of global science for policy, and at the OECD in the fields of Green Growth and Policy Coherence for Development. She holds degrees in Economic Sciences, International Relations, EU affairs and Advocacy.