The Wikipedia social network is a weighted, large-scale network of roughly 800k nodes and 70M edges that is constructed from person mentions in the English Wikipedia corpus. The network is built from co-occurrences of person mentions across the entire corpus. Each co-occurrence is weighted by the distance between the two mentions in the text, and the weights are then aggregated over all co-occurrences of a pair of persons, which collapses parallel edges and yields a simple network. The network is designed as a stand-in for large-scale real-world social networks that are otherwise unavailable at this time. Additionally, the network is enriched with community information (e.g., Wikipedia category membership of persons) and additional node attributes (e.g., Wikidata information about professions or the date of birth / death). The network has been shown to possess characteristic structural features of traditional small-scale social networks, while its nodes correspond to contemporary and historic figures, unlike the anonymized large-scale social media data sets currently in use. The data set consists of four files:
Contents
wsn_edgelist-dicos.txt
Contains all edges of the Wikipedia Social Network. Note:
This is the only file that uses whitespace as the separator instead of a tab.
Format: <source person id> <target person id> <edge weight>
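As an illustration of the format above, the following is a minimal sketch of a streaming reader for the edge list. It assumes that ids can be kept as opaque strings and that the weight parses as a floating-point number; neither is stated explicitly in the description. The function name parse_edges and the sample rows are hypothetical and not part of the data set.

```python
from typing import Iterable, Iterator, Tuple


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (source id, target id, weight) from whitespace-separated lines."""
    for line in lines:
        parts = line.split()  # splits on any run of whitespace
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = parts
        # Assumption: ids stay as strings, the weight is a decimal number.
        yield source, target, float(weight)


# Made-up sample rows, for illustration only.
sample = ["1 2 0.5", "2 3 1.25", ""]
edges = list(parse_edges(sample))
```

Because the function is a generator, it can be passed an open file handle for wsn_edgelist-dicos.txt and will process the edges one line at a time, which matters for a file with about 70M rows.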
wsn_category-person.txt
Contains category membership information for persons in the network.
Format: <category id> \t <category name> [\t <person id>]*
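Each row of this file has a variable number of fields: two fixed ones followed by zero or more person ids. A minimal sketch of a row parser is shown below. It assumes the separator is a single tab and that stray spaces around fields can be stripped; the function name parse_category_line and the sample row are hypothetical.

```python
from typing import List, Tuple


def parse_category_line(line: str) -> Tuple[str, str, List[str]]:
    """Split one row into (category id, category name, list of person ids)."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) < 2:
        raise ValueError("expected at least a category id and a category name")
    category_id, category_name = fields[0], fields[1]
    # Everything after the second field is a person id; drop empty leftovers.
    person_ids = [f for f in fields[2:] if f]
    return category_id, category_name, person_ids


# Made-up sample row, for illustration only.
sample_line = "10\tExample category\t1\t2\t3\n"
cat_id, cat_name, members = parse_category_line(sample_line)
```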
wsn_person-name-gender-birth-death.txt
Contains additional information for persons in the network.
Format: <person id> \t <full name> \t <gender> \t <year of birth> \t <year of death>
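The five fields above map naturally onto a small record type. The sketch below is one possible reading, under the assumption that a missing year (for example, for a living person) appears as an empty or non-numeric field; the description does not say how missing values are encoded, nor which values the gender field takes. The names Person and parse_person_line, and the sample row, are hypothetical.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    person_id: str
    name: str
    gender: str
    birth_year: Optional[int]
    death_year: Optional[int]


def _year(field: str) -> Optional[int]:
    """Return the year as an int, or None if the field is not a number."""
    try:
        return int(field)
    except ValueError:
        # Assumption: empty or non-numeric means the year is unknown.
        return None


def parse_person_line(line: str) -> Person:
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    fields += [""] * (5 - len(fields))  # pad short rows with empty fields
    person_id, name, gender, birth, death = fields[:5]
    return Person(person_id, name, gender, _year(birth), _year(death))


# Made-up sample row with an empty year of death, for illustration only.
person = parse_person_line("1\tJane Example\tfemale\t1900\t\n")
```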
wsn_person-occupation.txt
Occupation information for persons in the network. Note:
If a person has multiple occupations, that person appears in multiple rows.
Format: <person id> \t <occupation>
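Since a person with several occupations is spread over several rows, a reader will usually want to group the rows by person id. A minimal sketch, assuming a single tab as the separator; the function name group_occupations and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_occupations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect all occupations per person id, preserving row order."""
    occupations = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        person_id, occupation = fields[0].strip(), fields[1].strip()
        occupations[person_id].append(occupation)
    return dict(occupations)


# Made-up sample rows, for illustration only.
sample = ["1\tpoet\n", "1\tpainter\n", "2\tphysicist\n"]
by_person = group_occupations(sample)
```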
Download data as tar.gz (963 MB compressed, 2.7 GB uncompressed)
Detailed information about the construction of the network can be found in the original article below. If you use the data for research purposes, please cite this as the source:
Johanna Geiß, Andreas Spitz, and Michael Gertz.
Beyond Friendships and Followers: The Wikipedia Social Network.
In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, Paris, France, August 25–28, 2015, pp. 472–479.
DOI: 10.1145/2808797.2808840