8 files

SocialLink: knowledge transfer between social media and linked open data

posted on 2017-10-18, 13:16 authored by Yaroslav Nechaev, Francesco Corcoglioniti, Claudio Giuliano

This dataset contains canonical citations (DOIs) for the SocialLink dataset (15th May 2017 release), together with alignment data, code, and entity data in .csv and .json format.

SocialLink is a publicly available Linked Open Data dataset that matches social media accounts on Twitter to the corresponding entities in multiple language chapters of DBpedia. By effectively bridging the Twitter social media world and the Linked Open Data cloud, SocialLink enables knowledge transfer between the two: on the one hand, it supports Semantic Web practitioners in better harvesting the vast amounts of valuable, up-to-date information available in Twitter; on the other hand, it permits Social Media researchers to leverage DBpedia data when processing the noisy, semi-structured data of Twitter.

The SocialLink dataset is created by the SocialLink Pipeline, which aligns 271,000 DBpedia persons and organisations to their Twitter profiles via data acquisition, candidate acquisition and candidate selection phases.

Data files are stored in compressed .gz format and can be uncompressed using standard compression utilities. Diagrams are provided in .pdf format; .csv, .json and .java files can be opened in any text editor; .tql files can be accessed via MS SQL Server.
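As an alternative to a command-line utility, the .gz files can be uncompressed with a few lines of Python. This is a minimal sketch using only the standard library; the helper name gunzip and the file name in the usage note are hypothetical and not part of the dataset.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(source, target=None):
    """Uncompress a .gz file and return the path of the uncompressed copy.

    If no target is given, the trailing ".gz" suffix is dropped from the
    source name (e.g. "data.json.gz" becomes "data.json").
    """
    source = Path(source)
    target = Path(target) if target else source.with_suffix("")
    # Stream the content so that large files are never held fully in memory.
    with gzip.open(source, "rb") as compressed, open(target, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return target
```

For example, gunzip("alignments.json.gz") (a placeholder file name) would write the uncompressed copy next to the original archive.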

Format descriptions:


The JSON file is a single array containing one object per DBpedia entity; all objects share a similar structure.

The candidates property contains the list of candidate IDs for the entity, while the scores property contains a confidence score for each candidate, as reported by our candidate selection algorithm.

The twitter_id property is present if a certain threshold is met (thresholds are selected according to the high-F1 setup from our paper).
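The JSON description above can be turned into a short reading routine. The sketch below is illustrative only and rests on assumptions that this description does not confirm: that candidates and scores are parallel lists in the same order, and that a higher score means higher confidence. The function names are hypothetical.

```python
import json


def load_entities(path):
    """Load the uncompressed JSON file: a single array of entity objects."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def ranked_candidates(entity):
    """Pair each candidate ID with its score, highest score first.

    Assumption: "candidates" and "scores" are parallel lists.
    """
    pairs = zip(entity.get("candidates", []), entity.get("scores", []))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def aligned_twitter_id(entity):
    """Return the selected Twitter ID, or None when the property is absent."""
    return entity.get("twitter_id")
```

Entities without a twitter_id property yield None, which corresponds to the case where the threshold was not met.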


Each row of our CSV file contains information about a single entity. Each row looks like this:


The columns contain the same data as in the JSON format. If the Twitter ID cannot be determined, 0 is used in the last column instead.
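The 0 placeholder in the last column can be handled when reading the CSV file. The following sketch relies only on the two facts stated above (one entity per row, Twitter ID in the last column); it makes no assumption about the remaining columns, and it assumes the file has no header row. The function names are hypothetical.

```python
import csv


def twitter_id_from_row(row):
    """Return the Twitter ID from the last column, or None for the 0 placeholder.

    IDs are kept as strings so that they are passed through unchanged.
    """
    value = row[-1].strip()
    return None if value == "0" else value


def count_aligned(lines):
    """Count (aligned, total) entities in an iterable of CSV lines."""
    aligned = total = 0
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        total += 1
        if twitter_id_from_row(row) is not None:
            aligned += 1
    return aligned, total
```

Because csv.reader accepts any iterable of lines, count_aligned can be passed an open file object directly.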

approach.pdf and rdf.pdf provide visual representations of the SocialLink pipeline and RDF alignments.

For more detailed information on the RDF modeling choices, see the associated publication. Extensive documentation is available via the SocialLink website (URL below), covering: (i) dataset scope, format, statistics, and access mechanisms; (ii) instructions for deploying and running the SocialLink pipeline to recreate the resource; (iii) example applications using the dataset; and (iv) links to external resources like the GitHub repository and issue tracker.

Code: https://github.com/Remper/sociallink

SocialLink Website: http://sociallink.futuro.media/


Research Data Support

Research data support provided by Springer Nature.