CodeOntology OpenJDK8 Dataset
This dataset was extracted from the source code of OpenJDK 8 (http://openjdk.java.net/) using the CodeOntology parser.
It splits the dataset published at https://doi.org/10.5281/zenodo.579977 into 4 separate files:
structuralInformation.nt - Structural information on source code: 1981108 triples
annotations.nt - DBpedia links: 309688 triples
sourceCodeLiterals.nt - Actual source code as literals: 134757 triples
comments.nt - Literal Comments: 105881 triples
The dataset includes different kinds of triples: structural information extracted from source code, DBpedia links generated from javadoc comments, actual source code as literals and literal comments.
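Since the files are in N-Triples format (one triple per line), they can be inspected with very little tooling. The following is a minimal sketch, not part of the CodeOntology tooling, that counts triples and tallies predicates using only the Python standard library. The function names `iter_triples` and `summarize` are hypothetical, and the splitting logic is a simplification that assumes well-formed lines rather than a full N-Triples parser.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings from N-Triples lines.

    Simplified sketch: it relies on the subject and predicate containing
    no whitespace (true for IRIs and blank nodes in N-Triples) and treats
    everything up to the final '.' as the object.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and lines without the closing '.'
        if not line or line.startswith("#") or not line.endswith("."):
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # malformed line; ignored in this sketch
        subject, predicate, rest = parts
        yield subject, predicate, rest[:-1].rstrip()


def summarize(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return the number of triples and a per-predicate frequency table."""
    count = 0
    predicates: Counter = Counter()
    for _, predicate, _ in iter_triples(lines):
        count += 1
        predicates[predicate] += 1
    return count, predicates
```

Assuming one of the files, for example comments.nt, has been downloaded to the working directory, opening it with `open("comments.nt", encoding="utf-8")` and passing the file object to `summarize` should give a triple count that can be compared against the figure listed above. The file is read line by line, so memory use stays low even for the largest file.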
Background:
The associated publication describes the development of CodeOntology as a community-shared software framework supporting expressive queries over source code. The CodeOntology parser analyzes Java source code and serializes it into RDF triples. Applied to the source code of OpenJDK 8, it produced this structured dataset of more than 2 million RDF triples. CodeOntology can generate Linked Data from any Java project, enabling highly expressive queries over source code through a query language such as SPARQL.
A tutorial video is available at https://youtu.be/bd6pvUDy8kA
More information at the CodeOntology website: http://codeontology.org/