This is a collection of event triples extracted from news outlets that publish in English. For this collection, news articles from the period 2016-01-01 to 2016-12-31 were retrieved. Entities of type actor and location were automatically annotated in the text of the news articles with the Stanford NER toolset; dates were annotated using HeidelTime. Triples were constructed from co-occurrences of entities within a window of at most 3 consecutive sentences.
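The co-occurrence rule can be illustrated with a minimal sketch. It assumes that "a window of at most 3 consecutive sentences" means the sentence numbers of the three mentions differ by at most 2; the `Mention` type, the `build_triples` function, and the sample mentions are hypothetical and are not taken from the actual extraction pipeline.

```python
from itertools import product
from typing import NamedTuple


class Mention(NamedTuple):
    """An annotated entity mention (hypothetical helper type)."""
    text: str
    sent: int  # sentence number within the news article


def build_triples(geos, actors, times, window=3):
    """Combine one geo, one actor, and one time mention into a triple
    whenever all three fall within `window` consecutive sentences.

    Assumption: the window is measured as max(sent) - min(sent) < window.
    """
    triples = []
    for geo, actor, time in product(geos, actors, times):
        sents = (geo.sent, actor.sent, time.sent)
        if max(sents) - min(sents) < window:
            triples.append((geo, actor, time))
    return triples


# Invented mentions for illustration only.
geos = [Mention("Paris", 0), Mention("Berlin", 7)]
actors = [Mention("Angela Merkel", 1)]
times = [Mention("on Monday", 2), Mention("last year", 8)]

triples = build_triples(geos, actors, times)
for geo, actor, time in triples:
    print(geo.text, "|", actor.text, "|", time.text)
```

Only the mentions in sentences 0, 1, and 2 are close enough to form a triple here; every other combination spans more than three sentences.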
Each extracted news event triple is stored with the following fields:
field | description |
e_id | internal event ID |
e_url | news article url |
e_geo_text | text of extracted geo location |
e_geo_id | Wikidata ID of extracted geo location |
e_geo_sent | sentence number of the geo location mention in the news article |
e_actor_text | text of extracted actor |
e_actor_id | Wikidata ID of extracted actor |
e_actor_sent | sentence number of the actor mention in the news article |
e_time_text | text of extracted time |
e_time_norm | normalized time |
e_time_sent | sentence number of the time mention in the news article |
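As a rough sketch of how one such record might be loaded, the snippet below maps the eleven fields onto a Python dataclass. The delimiter (tab) and the column order (same as in the table above) are assumptions, since the on-disk layout is not specified here; `EventTriple`, `parse_triple`, and the sample row are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class EventTriple:
    """One event triple; field names follow the table above."""
    e_id: str
    e_url: str
    e_geo_text: str
    e_geo_id: str
    e_geo_sent: int
    e_actor_text: str
    e_actor_id: str
    e_actor_sent: int
    e_time_text: str
    e_time_norm: str
    e_time_sent: int


def parse_triple(line, sep="\t"):
    """Parse one delimited line into an EventTriple.

    Assumption: values are separated by `sep` and appear in table order.
    """
    values = line.rstrip("\n").split(sep)
    names = [f.name for f in fields(EventTriple)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    record = dict(zip(names, values))
    # The three *_sent fields hold sentence numbers, so convert them to int.
    for key in ("e_geo_sent", "e_actor_sent", "e_time_sent"):
        record[key] = int(record[key])
    return EventTriple(**record)


# Invented sample row for illustration only; not taken from the dataset.
sample_line = "\t".join([
    "42", "http://example.org/article", "Paris", "Q90", "0",
    "Angela Merkel", "Q567", "1", "on Monday", "2016-03-14", "2",
])
sample = parse_triple(sample_line)
print(sample.e_actor_text, sample.e_geo_text, sample.e_time_norm)
```

If the files turn out to use a different separator, passing it as `sep` should be the only change needed.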
The table below lists the news outlets. The number of news articles for each outlet is given for the year 2016. Because new outlets are added continuously as part of the project, some outlets have a start date partway through 2016.
Outlet | Number of News Articles 2016 | Country | Start Date |
Reuters | 100349 | England | Jan, 2016 |
The Los Angeles Times | 51071 | USA | May, 2016 |
The Washington Post | 50674 | USA | May, 2016 |
Fox News | 43770 | USA | May, 2016 |
The Sydney Morning Herald | 31887 | Australia | May, 2016 |
The Independent | 30405 | England | Jan, 2016 |
The Huffington Post – US | 30049 | USA | May, 2016 |
The Telegraph | 29248 | England | Jan, 2016 |
USA Today | 23923 | USA | Jan, 2016 |
New York Times | 23114 | USA | Jan, 2016 |
The Internat. Business Times | 20178 | USA | Jan, 2016 |
The Guardian | 19531 | England | Jan, 2016 |
The Associated Press | 18133 | USA | May, 2016 |
CBS News | 17291 | USA | May, 2016 |
The Huffington Post – UK | 17014 | England | May, 2016 |
CNN | 15642 | USA | May, 2016 |
BBC | 9202 | England | Jan, 2016 |
Sky News | 8773 | England | May, 2016 |
SBS | 4084 | Australia | Nov, 2016 |
Deutsche Welle | 2384 | Germany | Nov, 2016 |
Deutsche Welle | 1989 | Germany | Nov, 2016 |
CBS News | 1791 | USA | May, 2016 |
United Press International | 1627 | USA | Nov, 2016 |
Al Jazeera | 1566 | Qatar | Nov, 2016 |
New Yorker | 1367 | USA | May, 2016 |
Die Basler Zeitung | 969 | Switzerland | May, 2016 |
The Atlantic | 264 | Canada | Nov, 2016 |
National Public Radio | 84 | USA | Nov, 2016 |
The figure below shows the number of news articles per month in the year 2016.
The following table gives the number of news articles in each of the top 25 categories for the articles collected in 2016.
Category | Number of Articles |
world | 80558 |
business | 72636 |
politics | 63902 |
companyNews | 61861 |
finance | 45068 |
newsWorld | 39291 |
national | 34455 |
newsUK | 21031 |
political | 18057 |
money | 16026 |
news | 12626 |
US | 12295 |
local | 8357 |
washington | 8118 |
newsUS | 6942 |
national(US) | 6565 |
worldnews | 6230 |
USNational | 5874 |
politicsUS | 4754 |
technology | 3986 |
environment | 2996 |
europe | 2940 |
all | 2800 |
internat. Business | 2607 |
UK | 2538 |
Download
Because the complete News Event Triples dataset is large, we also provide the triples split into ten separate files. The partitioning of the complete dataset follows no particular ordering or organization:
Event_News_Triples_all.7z
(129 MB compressed, 27 GB uncompressed, 172,521,530 event instances)
Separate files:
Event_News_Triples1v10.7z
(23 MB compressed, 2.6 GB uncompressed, 17,252,153 event instances)
Event_News_Triples2v10.7z
(10 MB compressed, 2.5 GB uncompressed, 17,252,153 event instances)
Event_News_Triples3v10.7z
(9 MB compressed, 2.5 GB uncompressed, 17,252,153 event instances)
Event_News_Triples4v10.7z
(11 MB compressed, 2.5 GB uncompressed, 17,252,153 event instances)
Event_News_Triples5v10.7z
(10 MB compressed, 2.6 GB uncompressed, 17,252,153 event instances)
Event_News_Triples6v10.7z
(9 MB compressed, 2.7 GB uncompressed, 17,252,153 event instances)
Event_News_Triples7v10.7z
(14 MB compressed, 2.7 GB uncompressed, 17,252,153 event instances)
Event_News_Triples8v10.7z
(14 MB compressed, 2.7 GB uncompressed, 17,252,153 event instances)
Event_News_Triples9v10.7z
(12 MB compressed, 2.8 GB uncompressed, 17,252,153 event instances)
Event_News_Triples10v10.7z
(18 MB compressed, 3.2 GB uncompressed, 17,252,153 event instances)
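As a quick sanity check after downloading, the figures listed above can be compared against each other. The sketch below only restates the numbers from this page: ten partitions of equal size should add up to the instance count of the complete file, and the uncompressed partition sizes should roughly match the 27 GB total. The variable names are illustrative.

```python
# Figures copied from the download list above.
TOTAL_INSTANCES = 172_521_530
PART_INSTANCES = [17_252_153] * 10
PART_UNCOMPRESSED_GB = [2.6, 2.5, 2.5, 2.5, 2.6, 2.7, 2.7, 2.7, 2.8, 3.2]

# The ten partitions together should contain every event instance exactly once.
instances_match = sum(PART_INSTANCES) == TOTAL_INSTANCES

# Sum of the rounded per-file sizes; expected to be close to the stated 27 GB.
total_gb = round(sum(PART_UNCOMPRESSED_GB), 1)

print("instance counts consistent:", instances_match)
print("sum of partition sizes (GB):", total_gb)
```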