A Survey on Nonconformance Classification by Java/JML Developers

In this page we provide more details on the survey study with Java/JML Developers.

We make publicly available a replication package containing data for replication purposes and to support future studies. The package contains: (i) the source code of nonconformance presented in the survey study for the JML practitioners, (ii) the source code of a test case that reveals each nonconformance; (iii) JML practitioners answers and the map performed between practitioners' answers and our set of likely causes for nonconformances.

Research Questions

The goal of this experiment is to analyze two approaches (JML practitioners and JmlOk2) for the purpose of evaluation with respect to identifying nonconformances from the point of view of researchers in the context of real Java/JML projects. In particular, our experiment addresses the following research questions:

How diverse are the causes provided by developers for a given nonconformance? What are the possible reasons for the answers?

We discuss the range of answers for each nonconformance and speculate over their reasons.

Q2. How do answers compare to the baseline?

For each nonconformance, we discuss the number of coincident answers between practitioner answer and baseline.


From 84 nonconformances reported by Milanez et al. for JML systems, we randomly select three (labelled NC1, NC2, NC3), which are showed in Listings 1 to 3.

The survey instrument, available as an online form, first presents an overview of the classification model, along with guidelines for analysis. Next, each nonconformance is presented with location (package, class, and method) and type, along with links to the source code and the smallest test case revealing the nonconformance, emulating a debugging scenario. The form prompts the practitioner to suggest a category for the nonconformance: Specification error, Code error or Undefined; the likely cause must be provided as free text. We made this form available to the jml-interest mailing list -
 - , the main forum for discussions on JML. A JML practitioner has some experience with JML and subscribed to the discussion list.

    For creating the baseline, two researchers initially examine source code, contracts and the smallest test case that shows the nonconformance, and determine the error and likely cause. Next, they meet to reach an agreement. Their verdict was sent to developers of the three systems (Bomber, JAccounting, and HealthCard), and they all agreed on the classification.

Table 1. Randomly selected nonconformances for the survey. The Label corresponds to the name of each nonconformance. The System corresponds to the contract-based program in which the nonconformance was detected. The Class and Method columns provide the nonconformance location within the program. Column Type gives the type of the nonconformance. Column LOC shows the number of lines of Java code in each class. Column #CC presents the number of JML contract clauses of each class. Finally, Column #Method displays the number of methods of each class.

These nonconformances were made available for JML experts by means of the following form:


Fifteen voluntary JML practitioners answered the survey instrument, as showed in Figure 1. Each subject’s answer is mapped into our classification model – weak or strong precondition, weak or strong postcondition, strong invariant or code error. While several answers are straightforward to map – “Method <int> division(int, int) has a STRONG PRECONDITION (requires y> 0)”–, in others this is not the case; for instance, “Invariant is very complex, I could not, from the context, infer which one is wrong” is classified into Strong invariant. Also, we map to Code error answers such as “Code doesn’t treat values default for parameters to respect invariant”. The mapping performed is available on Table 2.

Figure 1. Survey answers, by nonconformance. The green bar indicates the baseline result.

Table 2. Mapping manually performed between JML experts' answers and our set of likely causes.

As likely cause for each nonconformance, four distinct answers are given. Overall, most answers differed from the baseline. For NC2 and NC3, the baseline cause was among the most mentioned causes. Nevertheless, it is visible that answers are highly diverse. Concerning individual answers, almost one third never agree with the Baseline, while another third agree twice. Almost half of the Subjects present only one agreement. In general, there is at least one agreement between subjects and baseline in 73.3% of the cases.


The response rate for the survey has been approximately 15% (15 out of 103 email recipients). The results show considerable diversity in answers, providing data for detailed analysis of Q1.

    Regarding NC1 (Listing 1), a precondition error is revealed by testDiv(); a call to distPointLine induces an internal call to div with y negative. Participant answers vary among four likely causes: (i) code error in distPointLine, since it does not scale by 10 the arguments, as done within other Division methods; (ii) strong precondition in div, since y should be possibly negative (baseline); (iii) weak precondition in distPointLine, allowing calls when vx and vy are small, resulting in calls to div with nonpositive y, or (iv) cause is undefined. Answers (i) and (ii) totalize 10 out of 15 answers, 6 and 4, respectively, indicating balance between code and contract. It is reasonable to assume developers think the precondition in distPointLine might have to be reinforced to avoid a call with low values – y must be greater than zero, otherwise a division by zero occurs. Domainspecific knowledge normally drive debugging; nevertheless, in contract-based programs, establishing the cause of a nonconformance is heavily dependent on the trade-off between weak and strong contracts. Code error is the most frequent answer; this might be evidence that developers tend to seek patterns when debugging – distPointLine, at first sight, deal with numbers differently from other methods, like square or div. In fact, code error is often answered for all nonconformances in the study, although none is baseline.

Listing 1. NC1. Class Division from system Bomber.
class Division {

    //@ requires true;
  //@ ensures \result == (x * y >> 10);

  int mul(long x, long y) {
    return (x * y >> 10);

  //@ requires y > 0;
    Strong precondition
  //@ ensures \result == ((x<<20)/y) >> 10;
int div(int x, int y){
    return ((x<<20)/y) >> 10;

int sqrRoot(long n) {
    if (n <= 0)
return 0;
long s = (n + 1024) >> 1;
int i=0;i<8;i++){
        s = (s+(((n<<20)/s)>>10))>>1;

return s;

int sqr(long n) {
return ((n * n) >> 10);

int distPointLine(int x, int y, int vx, int vy){
int dist = sqrRoot(sqr(vx) + sqr(vy));
return div((-mul(vx, y)+mul(x, vy)), dist); 

 public void testDiv(){

    For NC2, a postcondition violation is exposed by testCount. Subjects mentioned as causes: (i) weak precondition in getName (Listing 2), since name may be null, the precondition should forbid this state (baseline); (ii) code error because name is not initialized; (iii) strong postcondition in getName, forbidding a null return; and (iv) an undefined cause as well. In this case, several developers ignore clause nullable by default – weak precondition, chosen by six subjects. getName could be less restrictive as well, by weakening its postcondition (given the nullable clause), so three subjects answered strong postcondition.

Listing 2. NC2. Class Count from system JAccounting.
class Count {
String name;   

  //@ requires true;
  //@ ensures true;
  Count() {


  //@ ensures \result instanceof java.lang.String;
  //@ ensures \result.equals(this.name);
String getName(){    Weak precondition
return name;

public void testCount(){
    Count c = new Count();
    String s = c.getName();

    Regarding NC3, there is an invariant violation, for which subjects also suggested four likely causes: (i) weak constructor precondition, as it should restrict the permitted values for aID and dID (baseline); (ii) code error, allowing a call to the constructor that breaks the class invariant; (iii) strong class invariant, complex to be understood; and (iv) undefined cause. Choosing code error is reasonable if developers have defensive programming in mind – usually avoided in DBC. The class invariant constrains all fields, and they might be necessary indeed – medical care systems are critical. However, once the constructor has the default precondition, its weakness could be also considered. Therefore, both (i) and (iii) are acceptable, and only domain knowledge could determine the actual fault.

Listing 3. NC3.  Class Diagnostic from system HealthCard.
class Diagnostic {
    final short DESC = 160;
    final short TITLE= 30;

    byte appID,diagID;
    byte[] tit,d;

    /*@ invariant d != null && d.length == DESC;

    @ invariant tit != null && tit.length == TITLE;
    @ invariant 0 <= appID && appID <= 127;
    @ invariant 0 <= diagID && diagID <= 127;

byte aID, byte dID){    Weak precondition
        this.appID = aID;
        this.diagID = dID;

        this.tit = new byte[TITLE];
        this.d = new byte[DESC];

public void testDiag(){
    new Diagnostic((-1),10);
    For all nonconformances, there are four distinct answers. These results highlight the need of tool support to help developers. Furthermore, the need for tool support is also enforced by the percentage of Undefined answers (we found 13.33% in Likely Cause) when developers manually classifying small examples – each example has only a class and few methods (the longest example has five methods, as in – Listing 1); in a more complex context, the percentage of Undefined could increase considerably.

    Regarding Q2, the highest agreement with the baseline occurred in NC2, perhaps due to its simplicity (Listing 2). The low concordance for NC1 is probably due to distPointLine’s complexity, if compared with NC2 and NC3. There are several method calls into distPointLine’s body, which guided answers into a source code cause, or undefined (9 out of 15 answers).

Threats to Validity

Internal: subjects were not asked to fix the fault; without such commitment, they may have not put so much analysis effort over the answers. This problem is frequently related to voluntary-based surveys; we tried to make instructions as clear as possible, and offered a reasonable time (one month) for them to answer. In addition, the survey guidelines on the form may have influenced the practitioners’ answers.

Conclusion: there were 15 answers to our survey, so the sample is not representative for the universe of JML practitioners.

External: these results are valid only for the nonconformances presented in this section; they probably differ in nonconformances from other systems, from diverse domains.

Construct: the mapping performed between the free text answers and the set of likely causes considered in our approach may be subjective; the mapping was performed and reviewed by two researchers, separately.


In this work, we perform a survey of 15 JML practitioners on previously detected nonconformances in real JML systems, in order to compare practitioners answers with a baseline manually established, in the light of practitioners intuition over likely causes. There is a diversity of answers, most disagreeing with the baseline.

This result highlights how difficult it can be to manually establish a likely cause for nonconformances. In addition, our qualitative analysis of causes can be used as guidelines by developers in order to avoid nonconformances and when establishing causes for the detected ones.

As future work, we intend to investigate metrics for contracts, in order to further analyze the relationship between nonconformances and contract properties.