Representation of the meaning of scientific claims

By Scott Needham
Posted in Scientific Knowledge Management

Researchers in medical science work hard to express their views and findings in language which is unambiguous and consistent, but natural language is not well suited to the task, for example:

(1)    a. “Statin therapy can safely reduce the 5-year incidence of major coronary events, coronary revascularisation, and stroke by about one fifth per mmol/L reduction in LDL cholesterol, largely irrespective of the initial lipid profile or other presenting characteristics.” [2]

b. “Clinical trials in patients with and without coronary heart disease and with and without high cholesterol have demonstrated consistently that statins reduce the relative risk of major coronary events by ≈30% and produce a greater absolute benefit in patients with higher baseline risk.” [3]

c. “Statins can lower LDL cholesterol concentration by an average of 1.8 mmol/l which reduces the risk of IHD events by about 60% and stroke by 17%.” [4]

Are all of the scientific claims in (1) above expressing the same meaning? No, there are significant differences that affect the specific meaning of each sentence. If we assume that “IHD events” has the same meaning or denotation as “coronary events”, then can we say that all of the claims in (1) mean at a high level that “Statins reduce coronary events”? (see this claim in Research Analysis: http://researchanalysis.com/claim/2360) These are the sorts of questions that it would be useful to be able to answer reliably using semantic methods and tools.

Focusing on the high level claim, consider the following:

(2)    a. Statins reduce coronary events

b. Coronary events are reduced by statins

Sentences (2a, b) have different written forms but the same truth condition. A way is needed to represent meaning that is unambiguous and consistent, and logic can be used for this.

Logic is a system for reasoning. Below some of the high level terms and elements of logic are briefly outlined:

For example:

(2)    Statins are drugs

Predicates can be represented using notation that uses the main word, without tense, without the copula be (for example “are” in this case) and some prepositions. The entities (people or things) that the predicate is related to are its arguments. The standard notation uses upper case for the predicate and lower case letters for names. For the example above, the notation would be as follows:

(3)    DRUG(statins) = TRUE

In this case the name statins has been used as the argument for the proposition, but any name can be inserted into the proposition. The proposition will be true or false depending on the argument.
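As a rough illustration, a one-place predicate can be modelled as a function from a name to a truth value. The Python sketch below assumes a small, invented set KNOWN_DRUGS as the extension of the predicate; it is not data from Research Analysis.

```python
# Hypothetical extension of the predicate DRUG: the names for which it is true.
KNOWN_DRUGS = {"statins", "aspirin"}


def DRUG(name: str) -> bool:
    """One-place predicate: true when the named entity is a drug."""
    return name in KNOWN_DRUGS


print(DRUG("statins"))  # True
print(DRUG("bananas"))  # False
```

The same function yields a different truth value for each argument, which is the sense in which the proposition depends on the name inserted.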

Propositions can have more than one argument, for example:

(4)    Statins reduce coronary events

This is an example of a two place predicate. In natural language, predicates can occur with three or possibly four arguments. Adicity is the number of arguments that a predicate can take. Each predicate has a fixed adicity or number of arguments. Some predicates from natural language can have different or extended meanings with different numbers of arguments, but each of these is considered to be a unique predicate in logic with a fixed number of arguments.
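The idea of a fixed number of arguments can be sketched in Python as follows. The Predicate class is a hypothetical helper introduced only for illustration: it records the adicity and rejects any application with the wrong number of arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str
    adicity: int  # the fixed number of arguments

    def __call__(self, *args: str) -> tuple:
        # Reject applications with the wrong number of arguments.
        if len(args) != self.adicity:
            raise TypeError(
                f"{self.name} takes {self.adicity} argument(s), got {len(args)}"
            )
        return (self.name, *args)


DRUG = Predicate("DRUG", 1)
REDUCE = Predicate("REDUCE", 2)

print(REDUCE("statins", "coronary events"))
# ('REDUCE', 'statins', 'coronary events')
```

Under this sketch, a natural language verb used with a different number of arguments would need its own Predicate object, mirroring the treatment of such uses as distinct predicates in logic.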

In natural language, predicates are sometimes used with an argument missing. These are known as elliptical sentences. Elliptical sentences are regularly used in common language, e.g. “Give statins.” Give to whom? From the context of the conversation it is assumed that the statins are to be given to the patient. From a syntactic point of view this sentence is incomplete as it is missing the indirect object noun phrase. The way to understand these sentences is to assume that the elided element (in this case the patient) is supplied by the context of the situation to fill the required argument of the predicate.

In the case of “Give statins to”, it is obvious to a native English speaker that this sentence is incorrect. In the case of our high level claim (4), the sentence sounds fine because it is syntactically complete. But it could be asked: reduced compared to what? To assess the truth of the predicate, a comparison set is needed to assess whether there was a reduction. In science, and in phrases like (4), the comparison set is usually taken to be the pre-treatment population sample that is represented by the control group in the experiment. The control group is a group of entities (e.g. people, mice, cells) that are the same as the experimental group, but where the experimental condition has not been applied (e.g. the drug has not been administered). The term control group will be used to refer to the comparison set.

An analysis of the full sentences in (1), but without detailed linguistic analysis, provides the following control group for each of the three sentences:

  1. “largely irrespective of the initial lipid profile or other presenting characteristics” suggests that the reduction would apply to any human receiving the treatment.
  2. “patients with and without coronary heart disease and with and without high cholesterol” again suggests that the reduction would apply to any human receiving the treatment.
  3. The sentence doesn’t provide a control group, so the publication must be investigated further to find the context of the sentence. In the method it is found that “We included all double blind trials, irrespective of participants’ age or disease. Participants in most trials were healthy with above average lipid concentrations.” While the authors note the bias towards above average lipid concentrations, they also note elsewhere that the results are adjusted to take into account variations in the participant groups. So it is reasonable to assume that at a high level they are also making the claim relative to the group of all humans, though in this case it would also be reasonable to take healthy humans with above average lipid concentrations as the control group.

Taking the control group as “normal humans” (we could have a second claim for hyperlipidemic humans), (4) is updated as follows:

(5)    Statins reduce coronary events in normal humans

Notation: REDUCE(statins, coronary events, normal humans)

Is the reduce predicate a two place or three place predicate? Syntactically, reduce with two arguments is sufficient. Scientists strive for the maximum generality possible in their theories, and reduce with two arguments may suggest that the relationship applies to all sets of things. But our example is clearly ridiculous when applied to machines or plants that have no concept of coronary events. In the case of a plant, the denotation of the coronary artery would be false due to the non-existence of the organ in plants, and thus the claim would be false based on Russell’s theory of descriptions [6] or meaningless on a classical analysis. So while the two argument reduce predicate is syntactically correct and may have a meaning, the broad meaning is not implied by the sentences in (1) and is false or meaningless outside the set of animals with coronary arteries. I believe that to represent the sentences in (1) and to be a scientifically useful claim, the reduce predicate must be combined with a control group. In semantics, the identification of arguments and non-arguments is not straightforward and remains an unresolved topic [1].

There are some alternative means of linking the control group to the reduce predicate without making it an argument:

  1. Consider the control group as a noun modifier for the object of the reduce predicate.
  2. Consider the control group as an adverbial phrase attached to the reduce predicate that provides a location for the reduction.

 

Control group as noun modifier

If the control group is treated as a noun modifier for the object, then “coronary events in normal humans” is the object that is reduced and this would be represented by (5.1) below

(5.1) REDUCE(statins, coronary events in normal humans)

This doesn’t appear to be obviously wrong, since on a simple reading coronary events do occur in normal humans. However, I do not believe that this representation is logically correct or that it correctly represents the claims in (1). The concept in (1) is that statins are given to normal humans and that this results in a reduction in coronary events. The predicate in (5.1) suggests that the reduction is seen in normal humans, but humans that have received statins are actually no longer “normal humans” because statins alter their metabolism. It is only in these humans with a metabolism altered by statins that the reduction in coronary events is claimed to occur. In addition, the predicate in (5.1) does not make clear that the statins or the metabolic effects of statins occur in the humans, because “in normal humans” only modifies the coronary events argument. The interpretation of the control group as a noun modifier to the object argument does not appear to be correct.

Control group as an adverbial phrase

I believe that the correct interpretation of the adjunct “in normal humans” is that the process or event of reduction occurs within the normal human. This is a word group that qualifies the main reduce predicate and is not a required argument of the reduce predicate. This is an example of an adverbial phrase that is attached to the verb, reduce, and provides a location for the event(s) or process. In the example, the statin is taken by the patient and the effect of the statin occurs within the patient by altering their metabolism and leads to a reduction of coronary events. In this example the location is within the body of any human included in the set of “normal humans”. I believe that this representation allows the following interpretation: the process that the statins trigger, which alters the metabolism and results in reduced coronary events, starts from a normal human, even though the result of the process is an altered human with reduced coronary events.

Davidsonian Analysis

The semantic concepts introduced by Davidson in his 1967 paper “The Logical Form of Action Sentences” [8] can be used for the analysis of the reduce predicate and its associated adverbial phrases. Before analysing the sentences in (1), the Davidsonian approach will be reviewed with a simpler example:

“In 1976 we published two papers reporting the discovery and characterization of compactin, the first statin.” [5]

The following simpler sentence that takes some information from elsewhere in the paper will be used:

(6)      Akira Endo discovered compactin in 1976 in Japan

Before Davidson’s work, the standard logical analysis would have included the adverbials as arguments of the predicate [1].

(7)     DISCOVERED(Akira Endo, compactin, 1976, Japan)

Because a predicate has a fixed number of arguments, we can see that this analysis results in a number of DISCOVERED predicates with different numbers of arguments:

(8)     a. Akira Endo discovered compactin

DISCOVERED’(Akira Endo, compactin)

b. Akira Endo discovered compactin in 1976

DISCOVERED’’(Akira Endo, compactin, 1976)

The argument structure in (7) can be represented generally as (9):

(9)    DISCOVERED’’’(discoverer, discoveree, time, place)

There are problems with this analysis:

  1. The arguments of the predicate should be necessary to give it meaning. The discoverer and discoveree appear to be essential to the meaning of the DISCOVERED predicate. But the adverbial expressions are more loosely connected to the predicate.
  2. Action verbs can be modified by a variable number and type of adverbials in different combinations. The different combinations of adverbials would express different predicates, which results in a very complex group of related predicates.

It does not seem to make sense that the multitude of predicates that this produces relate to different verb meanings. Intuitively it seems that there is one core predicate that appears in all of the sentences, with the adverbials specifying additional information.

Davidson pointed out that sentences like (7) and (8a&b) are related by entailment in a way which seems to require that the same predicate appears in all sentences [1, This example paraphrases Kearns’ example in chapter 11 and it should be referred to for a general introduction]. For example:

The entailments of (6) include:

(10)    Akira Endo discovered compactin in 1976

(11)    Akira Endo discovered compactin in Japan

And the entailment of both (10) and (11) is:

(12)    Akira Endo discovered compactin

Davidson observed that every entailment of this kind resembles entailments which ‘drop conjuncts’, in this case adverbial phrases. He proposed that the parts of the entailing sentence which can be dropped to produce the entailed sentence should be represented as logical conjuncts [1]. The central basic proposition can then be conjoined with the adverbials to provide a proposition that expresses the meaning of the whole sentence.

(13)    Akira Endo discovered compactin in 1976 in Japan

DISCOVERED (Akira Endo, compactin) & p & q

p expresses “in 1976”

q expresses “in Japan”
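The conjunct-dropping pattern can be sketched in Python by treating each sentence as a set of conjuncts, so that one conjunction entails another whenever the second uses a subset of the first's conjuncts. The string labels below are illustrative placeholders, and the sketch covers only this one kind of entailment.

```python
# Each sentence is modelled as a set of conjuncts; the labels are illustrative.
CORE = "DISCOVERED(Akira Endo, compactin)"
s6 = frozenset({CORE, "in 1976", "in Japan"})  # sentence (6)
s10 = frozenset({CORE, "in 1976"})             # sentence (10)
s11 = frozenset({CORE, "in Japan"})            # sentence (11)
s12 = frozenset({CORE})                        # sentence (12)


def entails(premise: frozenset, conclusion: frozenset) -> bool:
    """A conjunction entails any conjunction built from a subset of its conjuncts."""
    return conclusion <= premise


print(entails(s6, s10), entails(s6, s11), entails(s10, s12))  # True True True
print(entails(s12, s6))  # False
```

The entailment runs in one direction only: the bare sentence (12) does not entail the fuller sentence (6).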

The next step is to link the propositions p and q to the central proposition. Davidson argued that the adverbials refer to the action or event described by the central basic proposition, e.g. “The discovery event occurred in 1976”, “The discovery event occurred in Japan”. Davidson proposed that the event itself should be included as an additional argument and that the propositions should all refer to this event. Using our example:

(14)    a. Akira Endo discovered compactin in 1976 in Japan

b. ∃e (DISCOVERED(Akira Endo, compactin, e) & INYEAR(e, 1976) & INCOUNTRY(e, Japan))

c. “There was an event, which was the discovery of compactin by Akira Endo, and the discovery was in 1976 and the discovery was in Japan.”

The variable e is a restricted variable in logic and ranges over all events. The existential binding of the event variable means that the whole proposition is true only if there is at least one event for which the remainder of the proposition is true. The adverbials (e.g. time, place, manner) are now expressed as logical conjuncts, each represented by its own predicate with the event as an argument. The result of the Davidsonian approach is just one core predicate for the action verb (in this case DISCOVERED), to which can be added as many or as few adverbials as needed by using separate predicates for each and conjoining them to the core action predicate.
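A minimal Python sketch of (14b) follows, assuming a toy domain of events held in a list. The DiscoveryEvent record and the second event are invented for illustration; the existential quantifier is evaluated with any() over that domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryEvent:
    discoverer: str
    discovered: str
    year: int
    country: str


# A toy domain of events; the second record is invented for contrast.
EVENTS = [
    DiscoveryEvent("Akira Endo", "compactin", 1976, "Japan"),
    DiscoveryEvent("Hypothetical Person", "hypothetical compound", 1980, "France"),
]


def DISCOVERED(x: str, y: str, e: DiscoveryEvent) -> bool:
    return e.discoverer == x and e.discovered == y


def INYEAR(e: DiscoveryEvent, year: int) -> bool:
    return e.year == year


def INCOUNTRY(e: DiscoveryEvent, country: str) -> bool:
    return e.country == country


# (14b): there exists an event e satisfying all three conjuncts.
holds_14b = any(
    DISCOVERED("Akira Endo", "compactin", e)
    and INYEAR(e, 1976)
    and INCOUNTRY(e, "Japan")
    for e in EVENTS
)
# Dropping the adverbial conjuncts gives (12).
holds_12 = any(DISCOVERED("Akira Endo", "compactin", e) for e in EVENTS)

print(holds_14b, holds_12)  # True True
```

Adding a further adverbial, such as manner, would only require one more predicate over e; the DISCOVERED predicate itself would not change.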

Davidsonian Analysis of Medical Science Claims

Returning to the analysis of the medical science claim (5), Davidsonian analysis would produce the following:

(15)      a. Statins reduce coronary events in normal humans

b. ∃e (REDUCE(statin, coronary events, e) & IN(e, normal human))

The word “events” in “coronary events” is potentially misleading here and should not be confused with the reduction event represented by e in the predicate REDUCE. “Coronary events” here is the object of the predicate and refers to a category of adverse medical events related to the coronary artery, for example a myocardial infarction. To avoid confusion, from here on “myocardial infarction”, as the most common coronary event, will be used in place of “coronary events”.

(16)    a. Statins reduce myocardial infarction in normal humans

b. ∃e (REDUCE(statin, myocardial infarction, e) & IN(e, normal human))

c. “There is at least one event, such that statin administration reduced myocardial infarction, where the event was in a normal human”

Note that the plural has been removed from “statins” and from “normal humans”. This is because in the Davidsonian analysis the predicate REDUCE now applies to a single event e. In each case there is only one statin administered to one human (we’ll ignore here the case where multiple statins are administered).

The concept of an event in common language gives the impression that the statin administration reduced the myocardial infarction over a short period of time. The effect of a single statin administration on the metabolism of cholesterol in humans does occur over a short period of time, but this altered cholesterol metabolism needs to persist for a long period of time to result in a significant reduction in myocardial infarction. Heart disease patients are often administered statins for the remainder of their life. In this way the statin administration has its reduction effect on myocardial infarction via a long term process rather than over the short period implied by the word event. Process may be a better word to use than event. I would like to suggest “case” as an alternative to event, as a word that is better suited to the medical field. I believe that the concept of a case suggests that the time period and medical process will be appropriate to the specific disease treatment paradigm. I will continue to use the variable e for event as it is standard in logic, but will use case and event interchangeably in the verbal descriptions.

(16)    d. “There is at least one case, such that statin administration reduced myocardial infarction, where the case was in a normal human”

While the research claims in (1) certainly do suggest that there is at least one event or case where statins reduced coronary events, the existential quantifier doesn’t give the full strength of the claims made. In the introduction to Davidsonian analysis above, the past tense of discover, discovered, was used to refer to Endo’s discovery. This signifies the historical nature of the discovery and implies that it isn’t expected to occur again. Rediscoveries can occur due to lost knowledge, but in modern science there is an assumption that there is a first discoverer and discovery of a scientific claim. In contrast, the claim in (15a) uses the present tense, reduce, which implies that the reduction will occur in all or many events. A primary goal of science is to identify models or rules that allow for the reliable compression of large amounts of evidential information about events in the world into a much more compact form that allows us to make good predictions about future events. To this end scientific claims ideally would apply to all events, or at least most events, that fit the conditions of the claim. The nature of empirical knowledge is that we can’t ever know anything with certainty, but we seek to verify claims that have as broad a scope as possible. Based on this goal of science, I would interpret the claim (16a) with the following logical proposition:

(17)    a. ∀e (REDUCE(statin, myocardial infarction, e) & IN(e, normal human))

b. “In all cases, statin administration reduces myocardial infarction, where the case is in a normal human.”

I believe that (17a) is the correct logical representation of the claim in (5) & (16a) and is a good representation of the core claim in the sentences in (1).
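The difference in strength between the existential and universal readings can be sketched in Python. The sketch assumes that the variable e ranges only over a small list of recorded cases of statin administration; the Case record and its contents are invented for illustration and are not trial data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    treatment: str
    outcome: str
    reduced: bool  # whether the outcome was reduced in this case
    subject: str   # the group the subject belongs to


# Invented cases for illustration only; these are not trial data.
CASES = [
    Case("statin", "myocardial infarction", True, "normal human"),
    Case("statin", "myocardial infarction", True, "normal human"),
    Case("statin", "myocardial infarction", False, "normal human"),
]


def REDUCE(x: str, y: str, e: Case) -> bool:
    return e.treatment == x and e.outcome == y and e.reduced


def IN(e: Case, group: str) -> bool:
    return e.subject == group


def body(e: Case) -> bool:
    """The quantified part shared by the existential and universal forms."""
    return REDUCE("statin", "myocardial infarction", e) and IN(e, "normal human")


exists_claim = any(body(e) for e in CASES)  # existential reading
forall_claim = all(body(e) for e in CASES)  # universal reading

print(exists_claim, forall_claim)  # True False
```

In this toy domain a single case without a reduction is enough to make the universal form false while the existential form stays true, which shows how much stronger the universal claim is.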

It was shown that considering the control group, “in normal humans”, as a noun modifier to the myocardial infarction (or coronary events) resulted in the scope of the control group only applying to the myocardial infarction. Davidsonian analysis applies the control group to the whole event. This has the effect of placing the reduction event, the statins and the myocardial infarction in a normal human. I believe that this is the correct logical interpretation.

One of the primary reasons for considering the Davidsonian approach was to remove the need for additional arguments in the REDUCE predicate, yet here this approach adds an additional argument. However, the benefit of adding the event argument is that it can be used to introduce other adjuncts to the REDUCE predicate without the need for any arguments beyond three. For example: “reduced by how much?”. Each of the sentences in (1) gives a description of the size of the reduction. For (1b), the following summary sentence could be made: Statins reduce myocardial infarction in normal humans by 30%, and the logical proposition could be made:

(18)    ∀e (REDUCE(statin, myocardial infarction, e) & IN(e, normal human) & BY(e, 30%))

This BY adjunct does seem to be optional, unlike the control group IN predicate, as without it the truth value of the statements (16) and (17) could be assessed by assuming that any reduction greater than zero will be sufficient. I believe that there are some adjuncts that are essential to have a valid scientific meaning and others that may improve the specificity of the claim, but are not essential to the core scientific claim. Interestingly, the addition of the BY predicate introduces clear conflict between the three claims in (1). For (1b) we have the claim in (18) and for (1c) we would have:

(19)    ∀e (REDUCE(statin, myocardial infarction, e) & IN(e, normal human) & BY(e, 60%))

The differences may be due to differences in the meaning of coronary events, as (1c) refers specifically to IHD events, though generally they are assumed to be the same concept. This clear identification of contradiction or conflict between claims is one of the key reasons for representing claims in this logical structure and one of the key goals for Research Analysis. The contradictions raise questions about the structure of the claims and the descriptions of medical objects and events, and I believe that these clearly defined questions are what drive science forward. Arguments are constantly happening in medical science, but often they occur without the parties having a clear understanding of the other parties’ claims. The use of standardised language and formal descriptions of claims will improve the quality of these arguments and of science.
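Once claims share a structure, this kind of conflict can be found mechanically. The Python sketch below assumes a hypothetical Claim record holding the core predicate, its arguments, the control group and the BY value; the exact-inequality test is a simplification, and a real comparison would probably need a tolerance or confidence intervals.

```python
from typing import NamedTuple, Optional


class Claim(NamedTuple):
    predicate: str
    subject: str
    obj: str
    control_group: str
    by_percent: Optional[float]  # size of the effect, if stated
    source: str


claims = [
    Claim("REDUCE", "statin", "myocardial infarction", "normal human", 30.0, "(1b)"),
    Claim("REDUCE", "statin", "myocardial infarction", "normal human", 60.0, "(1c)"),
]


def conflicts(a: Claim, b: Claim) -> bool:
    """Same core claim and control group, but different stated effect sizes."""
    same_core = a[:4] == b[:4]
    both_sized = a.by_percent is not None and b.by_percent is not None
    return same_core and both_sized and a.by_percent != b.by_percent


print(conflicts(claims[0], claims[1]))  # True
```

Two claims that omit the BY value would not be flagged, which is consistent with treating BY as an optional adjunct.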

Standard claims used in Research Analysis

The sentences in (1) use the predicate reduce as the relationship between the subject and object. Reduce is clearly just one of many predicates that could be used in medical science. Increase, as the opposite of a reduction, would be another obvious example. In Research Analysis, at the date of writing, three predicates have been proposed as a summary of all predicates. They are:

  1. Increases: The subject has the effect of increasing the object
  2. Decreases: The subject has the effect of decreasing the object
  3. Not Significant: There is no significant relationship between the subject and object. From a statistical point of view, the relationship is not present at a level that is sufficient to exceed the threshold of significance as defined by the authors.

Decrease is a synonym for reduce, as well as other verbs like lower, shrink, etc. In each case, the words may have slightly different usage in natural language, but they share a common concept. In Research Analysis, we use decrease to represent all of these verbal predicates. For the opposite concept we use increase to represent all verbs like raise, expand, etc. Finally, we use the term Not Significant to represent relationships where the subject has no significant effect on the object based on the statistical measure used by the authors of the scientific claim.
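The mapping from natural language verbs to the standard predicates can be sketched in Python as a lookup table. The table below contains only the verbs mentioned above and is an illustrative assumption, not the vocabulary actually used by Research Analysis.

```python
# Illustrative synonym table; not the vocabulary used by Research Analysis.
STANDARD_PREDICATE = {
    "reduce": "Decreases",
    "decrease": "Decreases",
    "lower": "Decreases",
    "shrink": "Decreases",
    "increase": "Increases",
    "raise": "Increases",
    "expand": "Increases",
}


def standardise(verb: str) -> str:
    """Map a natural-language verb to one of the standard predicates."""
    key = verb.strip().lower()
    if key not in STANDARD_PREDICATE:
        raise KeyError(f"no standard predicate known for {verb!r}")
    return STANDARD_PREDICATE[key]


print(standardise("Reduce"))  # Decreases
print(standardise("raise"))   # Increases
```

Raising an error for unknown verbs, rather than guessing, keeps the decision about new synonyms with a human curator.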

Using this model, we can represent (17) as:

(20)    a. ∀e (DECREASE(statin, myocardial infarction, e) & IN(e, normal human))

b. “In all cases, statin administration decreases myocardial infarction, where the case is in a normal human.”

This claim can be found in Research Analysis here http://researchanalysis.com/TreatmentEffectClaim/3efec17001aaebc801e2c41c4f4e9164 where it is linked with the sentences and references in (1).

Causation and Confidence/Determinism

The use of “reduce” in the claims in (1) appears to suggest a relationship of causation between the statins and the reduction in coronary events. Most people would also take verbs like increase and decrease to suggest a causal link between the subject and object of the sentence. Bertrand Russell in his paper, “On the notion of cause” [7], states “the word “cause” is so inextricably bound up with misleading associations as to make its complete extrusion from the philosophical vocabulary desirable”. This is a very strong view, but it is clear that causation is a complicated concept and that it should be used with care. The concept of causation and its formal representation will be reviewed in more detail in a later article.

Through the practical use of the Research Analysis model, examples have been found that do not suit a causal model. For example, there may be an association between increased c-reactive protein and myocardial infarction; however, the researchers don’t believe that the c-reactive protein causes the myocardial infarction. Instead they believe that both are caused by some other common cause, e.g. late stage atherosclerosis. In these cases, the use of increase in a claim could be interpreted as “c-reactive protein is found to increase with myocardial infarction”. This use of increase does not imply causation. Increase and decrease in Research Analysis are intended to be used to represent scientific claims that propose causation as well as those that do not.

The formal representations of claims in (20a) and in Research Analysis are declarative statements that don’t include any hedging of the claim. The claim “statins reduce myocardial infarction in normal humans” suggests that the reduction will occur in all normal humans, not that it will occur most of the time or 95% of the time. This is representative of the claims made in (1), where the core claim in (20a) is made without any hedging, though there is some uncertainty on the quantum of the effect. Is this type of declarative statement appropriate given that modern science relies on statistical tools (e.g. t-testing) that acknowledge that we can never have complete certainty about any real world associations? I believe that unhedged declarative statements are the appropriate representation of scientific claims, as Research Analysis seeks to represent the clearly stated core of the hypothesis being tested by the research, which becomes the claim found to be supported by the results of the scientific process. Modern science is largely based on the philosophy of Karl Popper [9], which requires a scientist to begin their work with a hypothesis that can at least theoretically be falsified by experimental findings. A hypothesis that involves any hedging cannot ever be falsified definitively, as the defender of the hypothesis can always claim that a negative result was only due to chance. The claims in Research Analysis aim to capture the core of these hypotheses, which could have been falsified but were found to be supported, and which remain open to having their mettle tested.

Through this investigation, it was identified that the terminology of “not significant” does not align with the declarative and unhedged statements that are intended to be collected in Research Analysis. For this reason, it will be replaced with “no relation” in the next version, as in “hair colour has no relation to myocardial infarction”. This language and concept removes the hedging that is implied by the concept of “not significant”. With this new language, a claim from another publication that found an increase or decrease relationship between the subject and object would provide evidence against or falsify the claim of “no relation”, whereas the old language leaves open the possibility of statistical luck.
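The way an unhedged “no relation” claim can be contradicted by another claim can be sketched in Python. The triple format and the “No Relation” label are assumptions made for illustration, and the “Increases” claim about hair colour is invented purely to show the check.

```python
def contradicts(a: tuple, b: tuple) -> bool:
    """Claims are (predicate, subject, object) triples using the standard predicates."""
    (pa, sa, oa), (pb, sb, ob) = a, b
    if (sa, oa) != (sb, ob):
        return False  # different subject or object: no direct conflict
    return pa != pb   # same pair, different standard predicate


no_relation = ("No Relation", "hair colour", "myocardial infarction")
increase = ("Increases", "hair colour", "myocardial infarction")  # invented claim

print(contradicts(no_relation, increase))     # True
print(contradicts(no_relation, no_relation))  # False
```

Under this sketch any two different standard predicates over the same subject and object count as contradictory, so an “Increases” claim would also conflict with a “Decreases” claim.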

Conclusion

Is the model presented in this article a perfect representation of the semantics of scientific claims in medicine? Of course not. We expect to improve the model over time as we learn from application and the input of peers. But we do believe that scientific claims collected using the existing model can provide value through clarification of meaning, more efficient search and tools that can compare and contrast claims.

The writing of this article raised a number of issues for us and we hope to explore them in future articles. They include:

References

  1. Kearns, Kate. Semantics. 2nd ed. Palgrave Modern Linguistics. England: Palgrave Macmillan (2011).
  2. Cholesterol Treatment Trialists. “Efficacy and safety of cholesterol-lowering treatment: prospective meta-analysis of data from 90 056 participants in 14 randomised trials of statins.” The Lancet 366.9493 (2005): 1267-1278.
  3. Maron, David J., Sergio Fazio, and MacRae F. Linton. “Current perspectives on statins.” Circulation 101.2 (2000): 207-213.
  4. Law, Malcolm R., Nicholas J. Wald, and A. R. Rudnicka. “Quantifying effect of statins on low density lipoprotein cholesterol, ischaemic heart disease, and stroke: systematic review and meta-analysis.” BMJ 326.7404 (2003): 1423.
  5. Endo, Akira. “A historical perspective on the discovery of statins.” Proceedings of the Japan Academy, Series B 86.5 (2010): 484-493.
  6. Russell, Bertrand. “On denoting.” Mind 14.56 (1905): 479-493.
  7. Russell, Bertrand. “On the notion of cause.” Proceedings of the Aristotelian Society. Vol. 13. Aristotelian Society, Wiley, 1912.
  8. Davidson, Donald. “The logical form of action sentences.” (1967).
  9. Popper, Karl. The logic of scientific discovery. Routledge, 2005.

Version 1.3, 20th September, 2016

Previous versions can be provided on request.