Mapping the structure of science through clustering in citation networks : granularity, labeling and visualization
The science system is large, and millions of research publications appear each year. Within the field of scientometrics, the features and characteristics of this system are studied using quantitative methods. Research publications constitute a rich source of information about the science system and a means to model and study science on a large scale. The classification of research publications into fields is essential to answer many questions about the features and characteristics of the science system.
Comprehensive, hierarchical, and detailed classifications of large sets of research publications are not easy to obtain. One solution to this problem is to use network-based approaches to cluster research publications based on their citation relations. Clustering approaches have been applied to large sets of publications at the level of individual articles (in contrast to the journal level) for about a decade. Such approaches are addressed in this thesis. I call the resulting classifications “algorithmically constructed, publication-level classifications of research publications” (ACPLCs).
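As an illustration of what clustering publications by their citation relations can look like, the following is a minimal sketch in Python. It uses a simple, deterministic variant of label propagation, which is an assumption made for brevity and is not stated here to be the method used in the thesis. The function name `cluster_by_citations`, the tie-breaking rule and the toy publication identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_by_citations(citations, max_rounds=20):
    """Group publications with a simple label propagation over citation links.

    `citations` is an iterable of (citing, cited) pairs. Direction is ignored,
    which is a simplifying assumption of this sketch.
    """
    # Build an undirected neighbour map from the citation pairs.
    neighbours = defaultdict(set)
    for citing, cited in citations:
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)

    # Every publication starts in its own cluster, labelled by its own id.
    labels = {pub: pub for pub in neighbours}

    for _ in range(max_rounds):
        changed = False
        # A fixed visiting order keeps the result reproducible.
        for pub in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[pub])
            best = max(counts.values())
            # Adopt the most common neighbour label; ties go to the largest label
            # (an arbitrary but deterministic choice).
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[pub]:
                labels[pub] = new_label
                changed = True
        if not changed:
            break

    # Collect publications by their final label.
    clusters = defaultdict(set)
    for pub, label in labels.items():
        clusters[label].add(pub)
    return sorted(sorted(members) for members in clusters.values())


# Toy network: two densely linked groups joined by a single citation (D cites C).
toy_citations = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("D", "E"), ("E", "F"), ("F", "D"),
    ("D", "C"),
]
print(cluster_by_citations(toy_citations))
```

On this toy network the two densely linked groups end up in separate clusters. Real classifications of millions of publications require scalable algorithms and a way to control granularity, neither of which this sketch provides.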
The aim of the thesis is to improve the interpretability and utility of ACPLCs. I focus on some issues that have so far received little attention in the literature: (1) Conceptual framework. Such a framework is elaborated throughout the thesis. Drawing on social science citation theory, I argue that citations contextualize and position publications in the science system. Citations may therefore be used to identify research fields, defined as focus areas of research at various granularity levels. (2) Granularity levels corresponding to the conceptual framework. In Articles I and II, a method is proposed for adjusting the granularity of ACPLCs in order to obtain clusters corresponding to research fields at two granularity levels: topics and specialties. (3) Cluster labeling. Article III addresses the labeling of clusters at different semantic levels, from broad and large to narrow and small, and compares the use of data from various bibliographic fields and different term weighting approaches. (4) Visualization. The methods resulting from Articles I-III are applied in Article IV to obtain a classification of about 19 million biomedical articles. I propose a visualization methodology that provides an overview of the classification, using clusters at coarse levels, as well as the possibility to zoom into details, using clusters at a granular level.
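To make the idea of term weighting for cluster labeling concrete, here is a minimal sketch in Python. It scores title terms with a generic TF-IDF-style weight computed across clusters. This weighting is an assumption chosen for illustration and is not claimed to be one of the approaches evaluated in Article III. The function name `label_clusters`, the whitespace tokenization and the example titles are hypothetical.

```python
import math
from collections import Counter


def label_clusters(cluster_titles, top_n=2):
    """Pick the top-weighted title terms for each cluster.

    `cluster_titles` maps a cluster id to a list of publication titles.
    A term scores high when it is frequent within a cluster and rare across clusters.
    """
    # Term frequency within each cluster (naive lowercase whitespace tokenization).
    term_counts = {
        cluster: Counter(word for title in titles for word in title.lower().split())
        for cluster, titles in cluster_titles.items()
    }
    n_clusters = len(term_counts)

    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter()
    for counts in term_counts.values():
        cluster_freq.update(counts.keys())

    labels = {}
    for cluster, counts in term_counts.items():
        scores = {
            term: tf * math.log(n_clusters / cluster_freq[term])
            for term, tf in counts.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[cluster] = ranked[:top_n]
    return labels


# Made-up titles for two small clusters.
example = {
    "cluster_1": ["gene expression in cancer", "cancer gene therapy"],
    "cluster_2": ["citation analysis of journals", "citation networks in science"],
}
print(label_clusters(example))
```

Terms that occur in every cluster, such as "in" in this example, receive a weight of zero and therefore never rank above terms specific to a single cluster. The same function could be fed terms from other bibliographic fields (for example keywords or journal names) in place of titles.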
In conclusion, I have improved interpretability and utility of ACPLCs by providing a conceptual framework, adjusting granularity of clusters, labeling clusters and, finally, by visualizing an ACPLC in a way that provides both overview and detail. I have demonstrated how these methods can be applied to obtain ACPLCs that are useful to, for example, identify and explore focus areas of research.
List of scientific papers
I. Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics. Journal of Informetrics. 12(1), 133–152.
https://doi.org/10.1016/j.joi.2017.12.006
II. Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of specialties. Quantitative Science Studies. 1(1), 207–238.
https://doi.org/10.1162/qss_a_00004
III. Sjögårde, P., Ahlgren, P., & Waltman, L. (2021). Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches. Journal of the Association for Information Science and Technology. 72(7), 853–869.
https://doi.org/10.1002/asi.24452
IV. Sjögårde, P. (2022). Improving overlay maps of science: Combining overview and detail. Quantitative Science Studies. 3(4), 1097–1118.
https://doi.org/10.1162/qss_a_00216
History
Defence date
2023-06-09
Department
- Department of Learning, Informatics, Management and Ethics
Publisher/Institution
Karolinska Institutet
Main supervisor
Koch, Sabine
Co-supervisors
Sundberg, Carl Johan; Ahlgren, Per; Waltman, Ludo
Publication year
2023
Thesis type
- Doctoral thesis
ISBN
978-91-8017-025-3
Number of supporting papers
4
Language
- eng