Springer Nature

CodeOntology OpenJDK8 Dataset

dataset
Posted on 2017-09-12, 14:18. Authored by Mattia Atzeni, Maurizio Atzori

Dataset extracted from the source code of OpenJDK 8 (http://openjdk.java.net/), generated using the CodeOntology parser.

This dataset splits the dataset available at https://doi.org/10.5281/zenodo.579977 into 4 separate files:

structuralInformation.nt - Structural information on source code: 1981108 triples

annotations.nt - DBpedia links: 309688 triples

sourceCodeLiterals.nt - Actual source code as literals: 134757 triples

comments.nt - Literal Comments: 105881 triples

The dataset includes four kinds of triples: structural information extracted from source code, DBpedia links generated from Javadoc comments, actual source code as literals, and literal comments.
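The per-file triple counts listed above add up to 2,531,434. As a minimal sketch for checking a local download against those figures, the Python snippet below counts statement lines in each .nt file. It assumes the files are plain N-Triples with one triple per line and sit in the current directory; the function names are illustrative and not part of CodeOntology. Duplicate statements, if any, would be counted once per line.

```python
from pathlib import Path

# Triple counts as stated on this page.
EXPECTED = {
    "structuralInformation.nt": 1981108,
    "annotations.nt": 309688,
    "sourceCodeLiterals.nt": 134757,
    "comments.nt": 105881,
}


def count_triples(lines):
    """Count N-Triples statement lines, skipping blank lines and '#' comments."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def check_dataset(directory):
    """Return {filename: (expected, actual)} for the files found in directory."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if path.exists():
            # Stream the file line by line; the largest file has ~2M lines.
            with open(path, encoding="utf-8", errors="replace") as handle:
                results[name] = (expected, count_triples(handle))
    return results


if __name__ == "__main__":
    print("stated total:", sum(EXPECTED.values()))
    for name, (expected, actual) in check_dataset(".").items():
        status = "ok" if expected == actual else "MISMATCH"
        print(f"{name}: expected {expected}, found {actual} [{status}]")
```

Files that are missing from the directory are skipped, so the check can be run on a partial download.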

Background:

The associated publication describes the development of CodeOntology as a community-shared software framework supporting expressive queries over source code. This dataset is the output of the CodeOntology parser, which analyzes Java source code and serializes it into RDF triples. Applied to the source code of OpenJDK 8, the parser yields a structured dataset of more than 2 million RDF triples. CodeOntology can generate Linked Data from any Java project, which makes it possible to run highly expressive queries over source code using a language such as SPARQL.
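This page does not list the vocabulary used in the triples, so a useful first step before writing SPARQL queries is to see which predicates actually occur in a file. The sketch below builds a predicate histogram with the Python standard library only. It assumes well-formed N-Triples lines (subject, predicate, object, terminating dot); the helper name is hypothetical, and a full RDF library would be needed to run SPARQL itself.

```python
import re
from collections import Counter

# Matches the subject (<IRI> or _:blankNode) and captures the predicate IRI
# of one N-Triples statement. Comment and blank lines do not match.
_STATEMENT = re.compile(r"^\s*(?:<[^>]*>|_:[^\s<]+)\s*<([^>]*)>")


def predicate_histogram(lines):
    """Count how often each predicate IRI occurs in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = _STATEMENT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import os

    path = "structuralInformation.nt"  # assumed to be in the current directory
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for iri, n in predicate_histogram(handle).most_common(20):
                print(f"{n:>10}  {iri}")
```

The most frequent predicates give a quick picture of how classes, methods, and fields are related in the data, which in turn suggests the triple patterns to use in a query.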

A tutorial video is available at https://youtu.be/bd6pvUDy8kA

More information at the CodeOntology website: http://codeontology.org/

Research Data Support

Research data support provided by Springer Nature.