Nano-publications – Review and Comparison to the Research Analysis model

By Scott Needham
Posted in How to use Research Analysis and FAQ

Apart from the scientific knowledge management models of micropublications (Clark 2014) and the Biological Expression Language (http://www.openbel.org), the concept of nano-publications (Mons 2009, Groth 2010) is most aligned with the goals of Research Analysis and has provided valuable insights. In this article we provide an overview of the nano-publication model and discuss how it relates to the Research Analysis (RA) model.

The authors propose 5 steps required to create and adopt nano-publications (Mons 2009, Groth 2010):

  1. Terms to Concepts: This step requires that all terms in a research article are mapped to non-ambiguous identifiers. In nano-publications this is referred to as a Concept, where a Concept is the smallest, unambiguous unit of thought. A concept is uniquely identifiable (Groth 2010). This is similar to, but more ambitious than, the Medical Subject Headings (MeSH) database. Using MeSH as an example, MeSH Headings are the equivalent of Concepts and the Entry Terms for each MeSH Heading are equivalent to the Terms or synonyms for each Concept. We agree with this general goal and strongly promote the use of standard language. We promote the use of MeSH terms in Research Analysis. In Research Analysis users can use the MeSH Entry Term (synonym) they are most comfortable with and the system then ensures that this is mapped to the main concept or MeSH Heading. This allows the user to work with the terminology that is most comfortable for them and their peers, rather than being forced to use an ideal concept. In a separate article, we discuss the challenges associated with defining an unambiguous unit of thought.
  2. Concepts to Statements: Here they propose that each smallest insight in exact sciences is a ‘triple’ of three concepts, though conditions are required to put the insight in context. The triple is in the form subject > predicate > object. For example, cholesterol > increases > atherosclerosis. A Statement is a uniquely identifiable triple, which can be achieved through the assignment of a unique identifier to the triple by annotation. RA initially implemented a cause-effect model for statements, which is a special case of the triple where the predicate must be a cause-effect predicate. We currently only offer the predicates increases, decreases and not significant. We chose to initially restrict the options for triples to allow for the collection of a consistent database that would allow for analysis.
  3. Annotation of Statements with Context and Provenance: It is not enough to store statements just in the form of their basic components, three concepts in a specific sequence. A statement only ‘makes sense’ in a given context and taking a statement out of a research publication strips it of this context. The context in a nano-publication is defined by another set of concepts. The annotation is achieved technically through a triple such that the subject of the triple is a statement. For the example above, the species should be specified. Mice do not get atherosclerosis, even on a high cholesterol diet, but humans do. Provenance, e.g. author and source, is also associated with Statements by annotation. Claims in RA by default require that the user provide organ/cell model, genetic model and species annotations that are relevant for each specific claim. RA automatically assigns unique identifiers to claims and requires that at least one supporting quotation is provided from a publication. Each supporting quotation requires the PubMed ID (PMID) of its source publication. Additional conditions and context can be provided by users appending tags to the claim.
  4. Treating Richly Annotated Statements as Nano-Publications: Statements with conditional annotation are treated as nano-publications via proper attribution so they can be cited and the authors can be credited. A nano-publication is a set of annotations that refer to the same statement and contains a minimum set of (community) agreed upon annotations (Groth 2010). This concept is similar to the claim model in RA: claims can be cited using their unique identifier, and viewing a claim provides details of all of the quoted statements and associated PMIDs that support the claim, along with any other context provided via tags.
  5. Removing Redundancy, Meta-analyzing Web-Statements: Where statements are identical they would be removed to simplify the database. The goal is to reduce “undue repetition” and to help improve the identification of new statements. Groth et al. define S-Evidence as all the nano-publications that refer to the same statement (Groth 2010) and, as implied by the name, provide evidence for the statement. The original model for nano-publications focused more on the removal of redundancy, but the concept of S-Evidence provides more respect for the importance of replication and the potential for meta-analysis. In complex sciences like biology, the likelihood of a statement being true based on the evidence of one publication is surprisingly low. An example is the uncertain reproducibility and re-usability of results in therapeutic development in the cancer field (Begley 2012). No single experiment, or for that matter any number of experiments, can fully demonstrate the truth of a statement. However, the collection of results that support a statement can, in a Popperian sense, provide some guidance to the level to which a scientific statement has had its mettle tested. It can also allow for the bridging of knowledge between subfields where different terms for the same concepts are regularly used.
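
To make the five steps more concrete, here is a minimal Python sketch of how they could fit together. It is an illustration only, not the schema used by the nano-publication authors or by RA. The synonym table, the class and function names (TERM_TO_CONCEPT, to_concept, Statement, NanoPublication, s_evidence) and the placeholder source labels are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical synonym table: Entry Term -> Concept (in the style of MeSH Headings).
TERM_TO_CONCEPT = {
    "cholesterol": "Cholesterol",
    "atherosclerosis": "Atherosclerosis",
    "atheroscleroses": "Atherosclerosis",
}


def to_concept(term: str) -> str:
    """Step 1: map a free-text term to its unambiguous concept."""
    return TERM_TO_CONCEPT[term.strip().lower()]


@dataclass(frozen=True)
class Statement:
    """Step 2: a subject > predicate > object triple of concepts."""
    subject: str
    predicate: str
    obj: str


@dataclass
class NanoPublication:
    """Steps 3 and 4: a statement plus context and provenance annotations."""
    statement: Statement
    context: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)


def s_evidence(pubs):
    """Step 5: group nano-publications by the statement they refer to."""
    groups = defaultdict(list)
    for pub in pubs:
        groups[pub.statement].append(pub)
    return dict(groups)


# Two publications that word the same insight differently.
stmt_a = Statement(to_concept("Cholesterol"), "increases", to_concept("atherosclerosis"))
stmt_b = Statement(to_concept("cholesterol"), "increases", to_concept("Atheroscleroses"))

pubs = [
    # The source labels are placeholders, not real identifiers.
    NanoPublication(stmt_a, {"species": "human"}, {"source": "publication-1"}),
    NanoPublication(stmt_b, {"species": "human"}, {"source": "publication-2"}),
]

evidence = s_evidence(pubs)
for statement, supporting in evidence.items():
    print(statement, "is supported by", len(supporting), "nano-publications")
```

Because both statements resolve to the same concepts, they compare equal and end up in a single S-Evidence group, rather than one of them being discarded as redundant.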

Figure 0: The Nano-publication Model taken from Groth 2010.

The goal of nano-publications

The nano-publications authors propose the goal of having scientific authors structure their data in such a way that computers understand them, and we support this goal. However, we feel that the formalisation of scientific knowledge is likely to become a specialist task, like the coding of software design specifications into software code. It is not clear how the nano-publications authors see the knowledgebase of nano-publications being used. We strongly believe that, while knowledge coding may be a specialist activity, most researchers in the biological fields will use tools based on such knowledgebases to help direct their research by identifying gaps, conflicts and opportunities in the current research.
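
As a rough illustration of what identifying conflicts could look like, the Python sketch below flags subject/object pairs that have been reported with more than one predicate. The claims list and the find_conflicts function are hypothetical and do not describe how RA implements this. The concept names are placeholders.

```python
from collections import defaultdict

# Hypothetical claims as (subject, predicate, object) triples,
# restricted to the three predicates currently offered in RA.
claims = [
    ("Concept A", "increases", "Concept B"),
    ("Concept A", "decreases", "Concept B"),
    ("Concept A", "increases", "Concept B"),
    ("Concept C", "not significant", "Concept D"),
]


def find_conflicts(claims):
    """Return the subject/object pairs reported with more than one predicate."""
    predicates = defaultdict(set)
    for subject, predicate, obj in claims:
        predicates[(subject, obj)].add(predicate)
    return {pair: preds for pair, preds in predicates.items() if len(preds) > 1}


conflicts = find_conflicts(claims)
for (subject, obj), preds in conflicts.items():
    print(subject, "->", obj, "has conflicting predicates:", sorted(preds))
```

An apparent conflict found this way may simply reflect different context, such as species or cell model, so a real tool would need to compare the annotations before reporting it.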

Main differences between the nano-publications model and the Research Analysis models

We have discussed some similarities and differences between the nano-publication and RA models above, but here we go into a little more detail.

At the Concept level:

At the Statement level (Claims in RA):

Figure 1. Research Analysis Knowledge Management Model

Both the nano-publication and micro-publication models have a substantial focus on the technical aspects of encoding the data in semantic web schemas. The reason for this focus is the importance of making the data available in an open and semantically rich format. We respect this goal, but have chosen to hide as much of this detail from our users as possible. While the biological research community has become computer savvy over the past decades, the majority of the community are not trained in any computer programming languages and would find these schemas very unpleasant. I have a computer science degree from before web technology was popular and I still find it very unpleasant. This is a little unfair, as these are articles in informatics journals and certainly we acknowledge that the micropublication authors have built user oriented applications.

One of the key goals discussed in the article on the Research Analysis Mission is that our focus is on making knowledge management tools available to normal biological and medical scientists in an easy to use and powerful way – not making the knowledge available to a few hard-core geeks and their supercomputers. For this reason, none of the bare bones schemas are visible to the users and the frontend terminology is focused on usability rather than theoretical correctness. This is also the reason why we provide a number of standard models for capturing scientific claims in RA. These models will not be flexible enough for some, but for the rest they will be much easier and more straightforward to use. Adoption and the actual acceleration of discovery by normal scientists is our top priority.

References

Mons, B., & Velterop, J. (2009, October). Nano-Publication in the e-science era. In Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009).

Groth, P., Gibson, A., & Velterop, J. (2010). The anatomy of a nano-publication. Information Services and Use, 30(1-2), 51-56.

Clark, T., Ciccarese, P., & Goble, C. (2014). Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5(1), 28.

Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533.