Very necessary: the meaning of non-gradable modal adjectives in discourse contexts

In this paper we provide a quantitative and qualitative analysis of the meaning of allegedly non-gradable modal adjectives in different discourse contexts. The adjectives studied are essential, necessary, crucial and vital, which are compared with the gradable modal adjective important. In our study, sentences containing these adjectives were chosen from a large corpus together with their contexts. Then 120 native English speakers evaluated the meaning of these adjectives in a crowd-sourced study. Different types of contexts were chosen for this purpose. In some, the adjectives were used as gradable, with the modifier very, while in others they were used as non-gradable, without a modifier. We also modified the contexts by adding or removing the modifier very. The task for evaluators was to provide a replacement for the adjective in each of the resulting contexts. From the replacements we are able to quantitatively evaluate the semantic potential of these contexts and what kind of adjectives they license.


Modality and adjectives
As a broad linguistic term, modality has recently attracted increasing interest and has been defined in different ways by linguists. Huddleston and Pullum (2002) argue that modality is mainly concerned with the speaker's attitude towards the factuality or actualisation of the situation expressed by the rest of the clause. Compare the two utterances "She wrote it herself" and "She must have written it". The first, a declarative main clause, is considered non-modalised, since the speaker makes no qualification or special emphasis regarding the factuality of the proposition. By contrast, the second utterance is modalised, since the truth of the proposition can only be indirectly inferred. Actualisation concerns utterances that relate to a future situation, as in "She must help her friend." The two modalised utterances above belong to two different kinds of modality, but both express "necessity" as the core idea. Other core concepts in the domain of modality besides "necessity" include "possibility", "obligation" and "permission".

Linguistic Elements for Expressing Modality
Matthewson (2016) states that languages vary in how they express and categorise modal meanings. For example, in the Salish language St'át'imcets (Lillooet) the same morpheme (the enclitic =ka) can express either permission or obligation, while a different morpheme (the circumfix ka-...-a) is used to express ability. In English, modality can be expressed by several parts of speech, for example, auxiliary verbs, verbs, adjectives, adverbs and nouns referred to as "Lexical Modals". There are other ways to express modality in English, but these are beyond the scope of the current paper. Van Linden (2012), who argues for a new definition of modal adjectives, suggests that speakers or writers apply a kind of desirability scale to choose among adjectives from the same semantic domain in a specific situation. This desirability scale can also be applied when speakers or writers need to add weight to the chosen adjective by using modifiers. The ability of a modal adjective to take a degree modifier is one criterion for evaluating gradability. Gradability is discussed in the next section.

Degrees in Modal Adjectives
Among non-modal adjectives, one class is known as extreme adjectives: for example, big has the extreme counterpart huge, and smart has brilliant (Paradis, 2001; Rett, 2008). This distinction also applies to modal adjectives: crucial and certain are extreme or strong modal adjectives, compared with non-extreme or weak ones such as important and likely. Portner and Rubinstein (2016) argue that strong modals cannot be gradable. On this basis we refer to extreme or strong modal adjectives as non-gradable modal adjectives, and to non-extreme or weak modal adjectives as gradable modal adjectives. The examples below, taken from Portner and Rubinstein (2016), illustrate the distinction:
• Non-gradable modal adjectives:
A: It is crucial that our uninsured citizens get insurance.
B: And it's crucial that we allow people to make their own choices.
A: So we're stuck.
• Gradable modal adjectives:
A: It is important that our uninsured citizens get insurance.
B: It's also important that people make their own choices.
A: So how do we balance these things?
In the first example, with the adjective crucial, Portner and Rubinstein point out that A and B are arguing that both of the following have the highest priority: uninsured citizens must get insurance, and we must allow people to make their own choices. This leads to an impasse. In the second example, with important, they argue that this impasse does not occur. However, it is questionable whether the idea of non-gradability is as clear-cut as this, because we find examples like the following: "It is now widely apparent that the future of the earth as a living system is in many ways threatened, and that the basic cause is modern alienation from nature. There is a very essential difference between the present scientific way of regarding the earth, as a mass of inert matter, and the traditional view of it as a living, spiritual entity."
This suggests that modal adjectives cannot be straightforwardly classified as gradable or non-gradable. We argue that the gradability of modal adjectives is flexible and negotiable within the communicative context. Modifiers can coerce non-gradable modal adjectives into gradable ones. This view assumes that the meaning of lexical items is not fixed but fluid, and depends on the contexts in which they are used. Non-gradable modal adjectives may therefore have the potential to be coerced into gradable ones in different contexts (see, for example, Pustejovsky, 1995; Clark, 1996; Cooper and Kempson, 2008).
Two research questions are considered in this study:
Q1 To what extent can "non-gradable" modal adjectives be used as gradable?
Q2 What is the meaning of non-gradable modal adjectives when they co-occur with degree modifiers?
To answer the first research question we perform a corpus study of such usages, in order to examine to what degree modification of allegedly non-gradable adjectives is found in general language use, and to what degree we should trust the linguists' intuitions cited in previous work. For the second research question, we specifically examine how non-gradable modal adjectives behave when they co-occur with the degree modifier "very" and how their meanings vary across different contexts.
The contributions of our study are to both theoretical linguistics and language technology. Using a corpus study, it investigates to what degree structures that are traditionally left out of semantic analyses, on the grounds that they do not exist, occur in corpora of free text. We demonstrate that they are found in corpora and that their semantics can be captured by information-theoretic measures. Knowing the semantic properties of these constructions gives important insights into how such structures should be modelled and represented in feature-based annotation and rule-based approaches to language technology, and also into what meaning representations we expect unsupervised language models to capture.

Q1: Gradable use?
We draw our examples of adjective use from the ukWaC dataset (Baroni et al., 2009). This is a large corpus of British English containing more than two billion words (N = 2,283,659,645) sampled from websites in the UK domain. To answer the first research question, whether modifiers occur with "non-gradable" adjectives, we calculate log likelihood ratios, as shown in Figure 1. With these we can test the hypothesis that a particular modifier and adjective are collocated (h2) against the hypothesis that the two words are independent (h1). Looking first at the co-occurrence counts for "very A", we see that such examples do occur in the ukWaC dataset. We also include "important", which is commonly agreed to be a gradable adjective. In most cases the statistical test confirms h2, that the words are collocated (see column p < 0.05). The last column shows how many times more likely the collocation hypothesis is than the independence hypothesis for that word combination. The associations are very strong, e.g. 4.38e+7.
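The collocation test described above can be illustrated with a short sketch. It assumes the standard binomial formulation of the log likelihood ratio for bigrams, which may differ in detail from the computation behind Figure 1. The function names and all counts in the usage lines are hypothetical; only the corpus size N is taken from the text.

```python
import math

def _log_binom(k, n, p):
    """Log-likelihood of k successes in n trials with success probability p.
    The binomial coefficient is omitted because it cancels in the ratio."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def log_likelihood_ratio(c_mod, c_adj, c_pair, n):
    """Return -2 log(lambda) for the bigram 'modifier adjective'.

    c_mod  : corpus frequency of the modifier (e.g. "very")
    c_adj  : corpus frequency of the adjective (e.g. "vital")
    c_pair : frequency of the bigram "modifier adjective"
    n      : corpus size in words
    """
    p = c_adj / n                          # h1: adjective independent of modifier
    p1 = c_pair / c_mod                    # h2: P(adjective | modifier)
    p2 = (c_adj - c_pair) / (n - c_mod)    # h2: P(adjective | other word)
    log_h1 = (_log_binom(c_pair, c_mod, p)
              + _log_binom(c_adj - c_pair, n - c_mod, p))
    log_h2 = (_log_binom(c_pair, c_mod, p1)
              + _log_binom(c_adj - c_pair, n - c_mod, p2))
    return -2.0 * (log_h1 - log_h2)

# Hypothetical counts, for illustration only (not the paper's figures).
N = 2_283_659_645
score = log_likelihood_ratio(c_mod=2_000_000, c_adj=150_000, c_pair=900, n=N)
# Asymptotically chi-square distributed with 1 degree of freedom,
# so values above 3.84 reject independence at p < 0.05.
collocated = score > 3.84
```

Under this formulation, exp(score / 2) is the factor by which the collocation hypothesis is more likely than the independence hypothesis.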

Q2: Meaning variation of gradable and non-gradable use
Our second research question addresses the semantics of allegedly non-gradable modal adjectives when they are used with and without a degree modifier. Following the discussion in the previous section, we have already rejected the possibility that all of them are non-gradable: if they were, why would they occur with a degree modifier? Another possibility is that they are all gradable and it makes no difference whether they are used with a degree modifier or not. A third possibility is that they can be both gradable and non-gradable, with gradability determined by context. In that case, we would expect modifiers to be associated with certain contexts more than others.
To test these hypotheses, we implemented the following steps. From the ukWaC corpus we sampled sentences containing the "non-gradable" adjectives (essential, crucial, necessary and vital) as well as the gradable adjective important, together with their contexts. Each context consists of a target sentence containing one of the adjectives plus one preceding and one following sentence, $S_{t-1}\ S_t\ S_{t+1}$, where $S_t$ is the target sentence. For example: "As soon as you can, you should arrange further supplies by contacting your GP surgery. It is very vital that you never run out of drugs. For information about each of the drugs named below, click on each link." Two sets of contexts were sampled, 50 in total: one set in which the adjective in the target sentence co-occurred with the degree modifier "very", and another in which the adjective occurred without a degree modifier. From these, another 50 contexts were created by modifying the target sentences, either removing or adding the degree modifier. The contexts are distributed as follows:
• 25 target sentences containing a modal adjective and a modifier (very A)
• 25 target sentences containing only a modal adjective (A)
• 25 modified target sentences from the first set, with very removed (−very A)
• 25 modified target sentences from the second set, with very added (+very A)
Participants were randomly assigned to one of the tasks. They were asked to provide the closest synonym for each adjective in the target sentence. In this way, we can analyse the meaning variation of the provided synonyms in each context in order to test the hypothesis of context-dependent meaning.
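As an illustration of how the two modified conditions could be derived from the original target sentences, here is a minimal sketch. The helper names are hypothetical, and the procedure is an assumption rather than the one actually used in the study, where the modification may well have been done by hand.

```python
import re

def remove_very(sentence, adjective):
    """Turn 'very A' into 'A' (the condition in which very is removed)."""
    pattern = re.compile(r"\bvery\s+(" + re.escape(adjective) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(r"\1", sentence, count=1)

def add_very(sentence, adjective):
    """Turn 'A' into 'very A' (the +very A condition)."""
    pattern = re.compile(r"\b(" + re.escape(adjective) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(r"very \1", sentence, count=1)

def build_context(previous, target, following):
    """Join the preceding, target and following sentences into one context."""
    return " ".join([previous, target, following])

# Target sentences taken from the examples quoted in the paper.
original_very = "It is very vital that you never run out of drugs."
original_bare = "Its application to the cyberworld is vital."

minus_very = remove_very(original_very, "vital")
plus_very = add_very(original_bare, "vital")
```

The sketch does not repair article agreement ("an essential" would become "an very essential"), so sentences with a preceding indefinite article would still need manual checking.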
In particular, we expect the semantic similarity of synonyms within a context to be stronger than across contexts. Equally, we expect more semantic similarity between synonyms in the original contexts than in the modified contexts.
The tasks were presented to 120 native English speakers in two ways: as a crowd-sourcing task run on the Semant-o-matic tool, and on Amazon Mechanical Turk (AMT). Semant-o-matic was designed for the online collection of linguistic data and can be targeted at particular informants. AMT also allows us to collect a large number of judgements more quickly, but less is known about the background of its participants: for example, we can only restrict our task to English-speaking countries. To further check that our participants were native speakers, we asked them, somewhat indirectly, to list the languages that they speak, from best to worst. If a participant reported English as their first language, we considered them a native speaker. The same interface was used in both data collection experiments.
The collected data was assessed for quality. We selected the high quality answers from AMT for our analysis by removing answers that were nonsensical. We removed all data from participants who provided more than 33% irrelevant answers.
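The two filtering steps can be sketched as follows. The sketch assumes that each answer has already been judged relevant or irrelevant, presumably by hand; the data structure, function name and example answers are hypothetical.

```python
def filter_participants(answers, max_irrelevant=0.33):
    """Keep only relevant answers from sufficiently reliable participants.

    answers: dict mapping participant id -> list of (answer, is_relevant)
    Participants whose share of irrelevant answers exceeds max_irrelevant
    are dropped entirely; for the rest, irrelevant answers are removed.
    """
    kept = {}
    for participant, items in answers.items():
        if not items:
            continue
        irrelevant = sum(1 for _, relevant in items if not relevant)
        if irrelevant / len(items) > max_irrelevant:
            continue  # remove all data from this participant
        kept[participant] = [answer for answer, relevant in items if relevant]
    return kept

# Hypothetical annotated answers, for illustration only.
raw = {
    "p1": [("important", True), ("key", True), ("asdf", False), ("central", True)],
    "p2": [("xx", False), ("yes", False), ("vital", True)],
}
clean = filter_participants(raw)
```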
Some of the instances are presented below, each followed by a discussion.

Qualitative analysis
Very vital and −very vital. As a non-gradable adjective, "vital" means "absolutely necessary", the highest point on the scale of desirability. However, in the original context examined here, "vital" is used as gradable. Figure 2 shows the synonyms provided by the annotators for both the original context and the modified context in which the modifier very was removed (−very vital). From the range of answers, we see that "vital" in its gradable form can mean "important", "necessary", "essential", "central" and "consequential". All of these adjectives can be placed in the same range of meanings. However, when the modified version of the context is considered, other senses are added to "vital". When the local context of the adjective is modified by removing very, there is a slight meaning shift so that the adjective can fit into the remaining context of the sentence. This results in a larger number of possible synonym choices, indicating a more dynamic interpretation of the adjective. In effect, the modified sentences become more difficult to interpret, and the results become less congruent as individual participants attempt different interpretations. The modified version seems to force the context to bear other senses such as "engaging", "intrinsic", "integral", "chief", "substantial", "cornerstone", "big" and "key".
Vital and +very vital. The variety of replacements was highly noticeable in the modified context where "vital" was used with a modifier, as Figure 3 shows. This is in line with what we observed in the previous context and suggests that gradability is linked to context. It appears that some contexts allow more or less gradability, as seen for example in the slight difference in replacements for the original contexts (very A vs A) between Figures 2 and 3. The context is as follows: "The pen/trap statute protects privacy and is an important investigative tool. Its application to the cyberworld is vital. Also, this legislation was passed in an era when telecommunication networks were configured in such a way that, in most cases, the information sought could be obtained by issuing an order to a single carrier."

Figure 2: Answers obtained for "vital" in the original ("very vital") and modified ("−very vital") contexts. The results are ranked by counts (C).

Figure 3: Answers obtained for "vital" in the original ("vital") and modified ("+very vital") contexts.
Necessary and +very necessary. Here is another context, with the adjective necessary: "The bathroom is fully tiled and has a bath with overhead shower, bidet, w.c and wash hand basin. All the necessary bedding, bath and hand towels are provided. A useful store cupboard is located just inside the front door where the boiler is fitted." In this particular context, where necessary was originally used without a modifier, the replacement options are "needed", "required", "essential" and "requisite", as shown in Figure 4. "Necessary" here therefore conveys a fixed range of meanings in the area of requirements. The degree of requirement can be determined from the context. The meaning shift in the modified version is clearly observable: using "necessary" in its gradable form instead of its non-gradable form in this context moves from a fixed range of meanings to a wider range of possible interpretations, which increases ambiguity. Other senses were added by the human evaluators.

Figure 4: Answers obtained for "necessary" in the original ("necessary") and modified ("+very necessary") contexts.
Very necessary and −very necessary. Here is an example of a context with very necessary: "It exists to further speleology and that means discovering, exploring and recording caves and other underground sites wherever they may be found. A very necessary, I would say essential, part of this is the recording. The club has two log books where members can write up their exploits and achievements." As shown in Figure 5, in this specific context the adjective necessary has synonyms ranging from "mandatory" to "important", with a limited number of other adjectives such as "required", "needed" and "essential". However, the meaning variation is highly noticeable in the modified version, in which "necessary" was used without the modifier (−very necessary). Other meanings are added, such as "useful" and "basic". The degree to which adjectives can be replaced in the modified context seems to depend on the context itself, that is, on the number of interpretations that can reasonably be constructed from it.
Figure 5: Answers obtained for "necessary" in the original ("very necessary") and modified ("−very necessary") contexts.

Very crucial and −very crucial. However, our experiment also shows that the distinction between the original and modified versions is not always so clear. Consider the following example: "At beginning of course, when considering dialect, we looked at the relationship between social group identity and language. We considered the very crucial role that language plays in the formation and representation of identity. However, this account is limited in many senses."
As shown in Figure 6, in this specific context, when crucial is used originally with a modifier, the fluidity of meaning is observed to a higher degree than when it is used without a modifier.
In the next section we analyse this variation quantitatively using entropy, which gives a clearer picture of the extent to which such variation is possible in the contexts and with the adjectives chosen for this study.

Entropy as a measure of variation
To quantify the degree of variation among the replacement adjectives, we calculate the entropy of their list W for each ground-truth adjective and context as follows:

$$H(W) = -\sum_{w \in W} p(w) \log_2 p(w)$$

where the sum ranges over the distinct words w in W and p(w) is the likelihood of a word w being used as a replacement in a particular context by an AMT worker. Since different contexts result in different numbers of replacements, we normalise the obtained entropies by the maximal attainable entropy, which is $\log_2 n$ (equivalently $-\log_2 (1/n)$), where n is the size of the set. If the normalised entropy of the replacement synonyms is close to 1, we are approaching maximum variation of answers and randomness (all items are equally probable); when it approaches 0, all the answers are the same and therefore completely predictable.

very crucial    C    −very crucial    C
important       5    important        2
essential       2    essential        5
critical        5    critical         5
vital           3    vital            2
significant     1    significant      1
paramount       1    paramount        1
central         1    central          1
serious         1    deciding         1
decisive        1    large            1
determining     1    key              2
necessary       1    all important    1
imperative      1    fundamental      1
prominent       1    pivotal          1
substantial     1    appropriate      1
big             1    mandatory        1

Figure 6: Answers obtained for "crucial" in the original ("very crucial") and modified ("−very crucial", with very removed) contexts, with counts (C).

Figure 7 shows the meaning variation in very A/−very A compared to A/+very A in our experiment. The red line shows the original contexts and the blue line shows the modified contexts. The horizontal axis groups the adjectives under study in ranges of 5, since 5 questions were devoted to each adjective. The vertical axis shows the entropy. The two plots show that the adjectives do not behave consistently: their behaviour differs depending on the context, and the original and modified contexts sometimes yield higher and sometimes lower entropy.
The details of the entropy results are discussed in the next section.
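The normalised entropy measure can be sketched as follows. The text does not say whether n counts distinct replacement words or all answers, so this sketch assumes distinct words by default and accepts an explicit n for the other reading. The function name and example answers are hypothetical.

```python
import math
from collections import Counter

def normalised_entropy(replacements, n=None):
    """Entropy of the replacement words divided by log2(n).

    replacements: list of words given by participants for one context
    n: size used for the maximal entropy; defaults to the number of
       distinct replacement words (an assumption, see the lead-in)
    """
    counts = Counter(replacements)
    total = sum(counts.values())
    if n is None:
        n = len(counts)
    if n <= 1:
        return 0.0  # a single answer type is completely predictable
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return entropy / math.log2(n)

# Hypothetical answers for two contexts, for illustration only.
uniform = normalised_entropy(["needed", "required", "essential", "requisite"])
skewed = normalised_entropy(["important"] * 8 + ["key", "central"])
```

A context in which every participant gives a different word scores 1, and a context in which all participants give the same word scores 0.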

Entropy Result over Original and Modified Contexts
As discussed in the previous section, the meaning of modal adjectives is fluid across different contexts. The entropy results support this idea of fluidity. Figure 8 shows the entropy results for the very A condition and the −very A condition (very removed). Important, a commonly accepted gradable modal adjective, is added for comparison. We can see that for important the difference in entropy of the answers between the original and the modified contexts is very small, whereas for the other adjectives ("necessary", "crucial", "vital" and "essential") it is larger. A two-tailed paired t-test found a significant difference between very A and −very A (t(19)=2.179, p=0.042) for these adjectives (excluding important). Looking at individual adjectives more closely, necessary appears to pattern with important, as both receive lower answer entropy in the modified version, which is not the case for essential, crucial and vital.
Next we compare the answers obtained from the A and +very A contexts shown in Figure 9. Excluding important, the two-tailed paired t-test found no significant difference between A versus +very-A (t(19)=1.003, p=0.3283) which is in contrast to the previous condition. In the second condition only essential got a lower average entropy in the original version.
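The paired comparisons reported here can be illustrated with a small sketch that computes the paired t statistic and its degrees of freedom from two matched lists of entropies. The function name and the entropy values are hypothetical. The p-value is not computed, since it requires the t distribution; a statistics package such as SciPy (`scipy.stats.ttest_rel`) could be used for that.

```python
import math
from statistics import mean, stdev

def paired_t(original, modified):
    """Return (t, df) for a paired t-test on two matched samples."""
    if len(original) != len(modified):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(original, modified)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical normalised entropies for four matched contexts.
t_value, df = paired_t([0.90, 0.80, 0.95, 0.70],
                       [0.80, 0.75, 0.85, 0.70])
```

With 20 matched contexts (4 adjectives with 5 questions each, excluding important) the test has 19 degrees of freedom, which matches the reported t(19).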

Entropy Results across Contexts
We used the entropy analysis to compare the answers obtained from the original very A and A contexts as shown in Figure 10. The two-tailed paired t-test found no significant difference between very A versus A (t(19)=-0.4688, p=0.6445) for all adjectives excluding important.
Finally, we used the entropy analysis to compare the two modified contexts, −very A (very removed) and +very A (very added), as shown in Figure 11. A two-tailed paired t-test found a significant difference between +very A and −very A (t(19)=2.2808, p=0.0342) for all modal adjectives excluding important. This is expected, since all of them are different contexts.

Discussion and Conclusion
Our findings, in particular, the log likelihood ratios, support the use of "non-gradable" adjectives with modifiers in a large corpus of British English. This demonstrates that the traditional distinction between gradable and non-gradable adjectives is not that straightforward.
However, the semantics of adjectives with and without a modifier is less straightforward when we consider the analysis of synonym replacements. A possible explanation for our results is as follows. From research on semantic coordination we know that the meaning of words shifts across contexts. Removing very (−very A) increases the entropy of synonym replacements, while adding very (+very A) does not change it. Hence, if the entropy of replacements corresponds to ambiguity, our explanation is that without a modifier an adjective is ambiguous between a gradable and a non-gradable reading, a form of underspecification. The interpretation is resolved within the context in which the adjective is used, including the communicative intent of the speaker. A context with very A will be unambiguously gradable by virtue of the presence of the modifier, and the gradable semantics must also be supported by the context, otherwise the utterance would not be well-formed. If we remove very, we therefore create a non-congruence with the gradable context, since a non-gradable interpretation is now also in play. This leads to an increase in the ambiguity of the sentence and an increase in entropy. On the other hand, original contexts without a modifier are ambiguous between gradable and non-gradable interpretations. Adding very simply selects a preference for a gradable interpretation which was already available, and therefore there is no change in entropy. The natural contexts very A and A do not differ significantly in entropy, while the modified contexts −very A and +very A do, which means that they are affected differently by the modification.

Figure 11: Average entropies and ranking over modified contexts for +very A and −very A.

This provides further support for the hypothesis that modification is linked either to a loss of congruence with the context (−very A), and therefore an increase in ambiguity, or to a resolution of ambiguity (+very A) towards gradable use.
By analysing synonym replacements with the information-theoretic measure of entropy, we tried to evaluate the semantic potential of contexts containing the adjectives and a potential degree modifier very. We have linked the variation to the ambiguity of the contexts: the higher the ambiguity of a context, the wider the range of adjectives that can be used in it. In future work we intend to compare the potential replacements at a more fine-grained level by comparing their contextual word embeddings (Devlin et al., 2018) with the word embeddings of the original adjective. We hope that this exercise will also contribute to the evaluation of contextual word embeddings, since the task we are interested in is a highly fine-grained semantic task that tries to evaluate semantic differences within a particular part-of-speech class.