4.1, December 19, 2013
Wim Peters, University of Sheffield; email: w.peters@dcs.shef.ac.uk
http://www.arcomem.eu
Stefan Dietze, University of Hannover; email: dietze@l3s.de; David Dupplaw, University of Southampton; email: dpd@ecs.soton.ac.uk
ARCOMEM data model
This class is used for capturing term dynamics in ARCOMEM. It represents a contextual snapshot of a sense at a particular point in time.
This class covers the semantic characterization of extracted terminology for the purpose of sense disambiguation and sense dynamics.
InformationObject refers to the superclass of ARCOMEM's information objects Entity, Event, Opinion, Sense and Document.
InformationRealization is a concrete realization of an InformationObject, for instance a web page in which an event occurs or a photo of an event.
An object that is significant within the conceptual domain covered by the web documents under consideration. Entities come in different flavours, each of which requires the application of specific extraction techniques, namely named entity recognition and term extraction.
A Term is a concept that is considered important for the conceptual coverage of the target domain, and is expressed by nouns, verbs or phrases. Terms are the product of term extraction or recognition techniques, which identify and filter term candidates and assign each a termhood score reflecting its importance.
Examples from the ARCOMEM domains: “line up”, “approval of report”, “Βουλευτές” (Greek: “Members of Parliament”).
Topical identification, expressed either by means of explicit labels (e.g. Law, Politics, Music) or by clusters derived from statistical analysis, e.g. Latent Dirichlet Allocation.
A conceptual entity representing a web page. It may contain information objects, and can be the target of an opinion.
This is the informational counterpart of InformationRealization.
same as CIDOC: http://erlangen-crm.org/120111/Document
A role classifies an Entity by describing the nature of its participation in the predicate that is indicative of a (sub-)event.
This can be expressed in various ways, e.g.
- linguistic functions such as subject and direct object,
- semantic relationships such as subject and object as expressed in RDF triples,
- case roles such as Agent and Patient.
This class is used for capturing the dynamics of a hashtag.
NamedEntity refers to (a mention of) an instance of a person, organization or location name, e.g. “Turkey”, “Austrian Parliament”, “David Cameron”.
Other types of named entity that we can provide play a supporting role by adding information about InformationObjects: temporal expressions (dates and times) such as “2010”, “May 27 2011”, “Tuesday 4pm”, and certain types of numerical expressions (monetary values and percentages).
It is possible that we will add new types of named entities, which reflect the conceptual structure of the domains under consideration in ARCOMEM, such as MusicFestival, Legislation, and so on. These can be defined as necessary.
This class is used for capturing the dynamics of a directed association between hashtags.
This concept expresses the syntactic or semantic relation that connects entities within events.
For instance, “European Parliament - approve - Turkey’s Progress Report” and “MusicalArtist - play - MusicFestival” are examples of Relation.
It forms part of InformationObject in that it captures predicates covered by events, and thus links entities into events.
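As an illustration of how a Relation links entities into an event, the first example above can be modelled as a predicate with two participating entities. This is only a minimal sketch: the Python class and field names are hypothetical and are not part of the ARCOMEM data model, and the Agent/Patient role labels are an assumed reading of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str
    role: str  # assumed case role, e.g. "Agent" or "Patient"


@dataclass(frozen=True)
class Relation:
    predicate: str
    subject: Entity
    obj: Entity

    def as_triple(self):
        # Flatten to a (subject, predicate, object) triple of labels.
        return (self.subject.label, self.predicate, self.obj.label)


relation = Relation(
    predicate="approve",
    subject=Entity("European Parliament", role="Agent"),
    obj=Entity("Turkey's Progress Report", role="Patient"),
)
print(relation.as_triple())
```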
Captures sentiments about Information Objects.
Score associated with an InformationRealization on the basis of the online analysis.
Reference to the URI of an element from a particular conceptual taxonomy or thesaurus, which functions as a (near-)equivalent.
Range: the entities, events etc. the opinion is about. Targets are of the class InformationObject.
Reference to the URI of an element from a particular conceptual taxonomy or thesaurus, which functions as a superclass.
twitter
youtube
googleplus
flickr
other
forum
image
video
blog
facebook
Aggregated opinion score
Text-only content for the resource
context in which an InformationObject occurs, where possible limited to one sentence.
URL or OXPath expression.
Termhood score according to Bosma, W. and Vossen, P. (2010), Bootstrapping language neutral term extraction, in: Proceedings of the 7th international conference on Language Resources and Evaluation (LREC2010), Malta, May 17-23, 2010, page 2277-2882, ed. N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, D. Tapias, publ. European Language Resources Association (ELRA), ISBN 2-9517408-6-7.
This property specifies the granularity of the timespan for which a hashtag or a hashtag association is valid.
TODO: -
blog
socialnetwork
forum
other
wiki
XPath expression, which identifies the WebObject inside the WebResource.
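A small sketch of how such an expression can pick out objects inside a page is given here. The page content and the expression are invented for illustration. Python's standard `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so real web pages would typically need an HTML-aware parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a web resource.
page = ET.fromstring(
    "<html><body>"
    "<div id='post'><p>Main article text</p></div>"
    "<div id='comments'><p>First comment</p><p>Second comment</p></div>"
    "</body></html>"
)

# Expression selecting the paragraphs inside the comments block.
xpath = ".//div[@id='comments']/p"
web_objects = [p.text for p in page.findall(xpath)]
print(web_objects)
```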
A statement that describes a concept and permits its differentiation from other concepts within a system of concepts.
Termhood score based on the term frequency/inverse document frequency (TF/IDF) calculation, a technique widely used in information retrieval and text mining. It yields a score that indicates the salience of each term candidate for each document in the corpus.
(http://en.wikipedia.org/wiki/Tf-idf)
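To make the calculation concrete, one common TF/IDF variant is sketched below: relative term frequency multiplied by the natural logarithm of the inverse document frequency. The exact weighting used in ARCOMEM is not stated here, so this variant, the function name and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document in a tokenised corpus."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (invented): each document is a list of term candidates.
corpus = [
    ["parliament", "approve", "report"],
    ["parliament", "debate", "report"],
    ["festival", "line", "up"],
]
scores = tf_idf(corpus)
```

In this toy corpus, "approve" occurs in one document only and so scores higher in the first document than "parliament", which occurs in two.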
ugc
organization
news
A conceptual entity representing a web document, which may contain information objects, and, as an entity, can be the target of an opinion. This is the informational counterpart of InformationRealization.