A Survey on Nonconformance Classification by Java/JML Developers

On this page we provide more details on the survey study with Java/JML developers.

We make publicly available a replication package containing data to support replication and future studies. The package contains: (i) the source code of the nonconformances presented in the survey to JML practitioners; (ii) the source code of a test case that reveals each nonconformance; and (iii) the practitioners' answers, together with the mapping between those answers and our set of likely causes for nonconformances.

Research Questions

The goal of this experiment is to analyze two approaches (JML practitioners and JmlOk2) for the purpose of evaluation with respect to identifying nonconformances from the point of view of researchers in the context of real Java/JML projects. In particular, our experiment addresses the following research questions:


Q1.
How diverse are the causes provided by developers for a given nonconformance? What are the possible reasons for the answers?

We discuss the range of answers for each nonconformance and speculate over their reasons.

Q2. How do answers compare to the baseline?

For each nonconformance, we discuss the number of coincident answers between the practitioners' answers and the baseline.

Q3. How do practitioners suggest a cause for a nonconformance?

For each nonconformance, we analyze the guidelines a developer follows when establishing a cause for a nonconformance, based on their subjective answers.

Methodology

From the 84 nonconformances reported by Milanez et al. for JML systems, we randomly selected three (labelled NC1, NC2, and NC3), which are shown in Listings 1 to 3.

The survey instrument, available as an online form, first presents an overview of the classification model, along with guidelines for analysis. Next, each nonconformance is presented with its location (package, class, and method) and type, along with links to the source code and to the smallest test case revealing the nonconformance, emulating a debugging scenario. The form prompts the practitioner to suggest a category for the nonconformance: Specification error, Code error, or Undefined; the likely cause must be provided as free text. We made this form available to the jmlspecs-interest mailing list (jmlspecs-interest@lists.sourceforge.net), the main forum for discussions on JML. A JML practitioner is someone who has some experience with JML and is subscribed to the discussion list.

    To create the baseline, two researchers independently examined the source code, the contracts, and the smallest test case exposing each nonconformance, and determined the error and its likely cause. Next, they met to reach an agreement. Their verdict was sent to the developers of the three systems (Bomber, JAccounting, and HealthCard), who all agreed with the classification.

Table 1. Randomly selected nonconformances for the survey. The Label corresponds to the name of each nonconformance. The System corresponds to the contract-based program in which the nonconformance was detected. The Class and Method columns provide the nonconformance location within the program. Column Type gives the type of the nonconformance. Column LOC shows the number of lines of Java code in each class. Column #CC presents the number of JML contract clauses of each class. Finally, Column #Method displays the number of methods of each class.

These nonconformances were made available to JML experts by means of the online form described above.

Results

Fifteen voluntary JML practitioners answered the survey instrument, as shown in Figure 1. Each subject’s answer is mapped into our classification model: weak or strong precondition, weak or strong postcondition, strong invariant, or code error. While several answers are straightforward to map – “Method <int> division(int, int) has a STRONG PRECONDITION (requires y > 0)” –, others are not; for instance, “Invariant is very complex, I could not, from the context, infer which one is wrong” is classified as Strong invariant. We also map to Code error answers such as “Code doesn’t treat default values for parameters to respect invariant”. The mapping is available in Table 2.

Figure 1. Survey answers, by nonconformance. The green bar indicates the baseline result.

Table 2. Mapping manually performed between JML experts' answers and our set of likely causes.
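As an illustration only, the mapping idea behind Table 2 can be sketched as a keyword heuristic; in the study the mapping was performed manually by the researchers, so the class AnswerMapping and its heuristic below are hypothetical:

```java
import java.util.Locale;

public class AnswerMapping {
    // The six likely causes of our classification model, plus Undefined.
    enum LikelyCause {
        WEAK_PRECONDITION, STRONG_PRECONDITION,
        WEAK_POSTCONDITION, STRONG_POSTCONDITION,
        STRONG_INVARIANT, CODE_ERROR, UNDEFINED
    }

    // Naive keyword-based heuristic; the actual mapping was done by hand.
    static LikelyCause map(String answer) {
        String a = answer.toLowerCase(Locale.ROOT);
        if (a.contains("strong precondition")) return LikelyCause.STRONG_PRECONDITION;
        if (a.contains("invariant"))           return LikelyCause.STRONG_INVARIANT;
        if (a.contains("code"))                return LikelyCause.CODE_ERROR;
        return LikelyCause.UNDEFINED;
    }

    public static void main(String[] args) {
        // The three example answers quoted in the text above:
        System.out.println(map("Method division has a STRONG PRECONDITION (requires y > 0)"));
        System.out.println(map("Invariant is very complex, I could not infer which one is wrong"));
        System.out.println(map("Code doesn't treat default values for parameters"));
    }
}
```

Running this prints STRONG_PRECONDITION, STRONG_INVARIANT, and CODE_ERROR, mirroring how the three quoted answers were classified.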


For each nonconformance, four distinct likely causes were given. Overall, most answers differed from the baseline. For NC2 and NC3, the baseline cause was among the most frequently mentioned causes. Nevertheless, the answers are clearly highly diverse. Concerning individual answers, almost one third of the subjects never agree with the baseline, while another third agree twice; almost half of the subjects present only one agreement. In general, there is at least one agreement between subjects and baseline in 73.3% of the cases.

Discussion

The response rate for the survey was approximately 15% (15 out of 103 e-mail recipients). The results show considerable diversity in the answers, providing data for a detailed analysis of Q1.

    Regarding NC1, a precondition error is revealed by testDiv() in class Division: a call to distPointLine induces an internal call to div with a negative y. This problem may happen whenever small values are given to distPointLine, due to the internal calls to sqr, sqrRoot, and mul, in which the received values are shifted by at least 10 bits. The developer may have chosen this structure for the methods considering Bomber's context: a mobile game in which distances are calculated from screen points (which can be large values); in that case, the bit shifts performed would not produce a negative (or even zero) value when calling the div method.

    Participants' answers vary among four likely causes: (i) code error in distPointLine, since it does not scale the arguments by 10 bits, as done within other Division methods; (ii) strong precondition in div, since y should possibly be negative (baseline); (iii) weak precondition in distPointLine, allowing calls when vx and vy are small, resulting in calls to div with a nonpositive y; or (iv) undefined cause. Answers (i) and (ii) total 10 out of 15 answers (6 and 4, respectively), indicating a balance between code and contract. A closer analysis of div’s precondition offers a few clues as to how the alternative answers may have come up. It is reasonable to assume developers think the precondition in distPointLine might have to be reinforced to avoid a call with low values: y must be greater than zero, otherwise a division by zero occurs. Domain-specific knowledge normally drives debugging; nevertheless, in contract-based programs, establishing the cause of a nonconformance depends heavily on the trade-off between weak and strong contracts. Code error is the most frequent answer; this might be evidence that developers tend to seek patterns when debugging: distPointLine, at first sight, deals with numbers differently from other methods, such as sqr or div. Therefore, options (ii) and (iii) are more likely to be the real cause of this problem than an error in distPointLine’s body. In fact, code error was frequently answered for all nonconformances in the study, although it is never the baseline.


Listing 1. NC1. Class Division from system Bomber.
class Division {

  //@ requires true;
  //@ ensures \result == (x * y >> 10);
  int mul(long x, long y) {
    return (x * y >> 10);
  }

  //@ requires y > 0;                        // <- Strong precondition
  //@ ensures \result == ((x<<20)/y) >> 10;
  int div(int x, int y) {
    return ((x<<20)/y) >> 10;
  }

  int sqrRoot(long n) {
    if (n <= 0)
      return 0;
    long s = (n + 1024) >> 1;
    for (int i = 0; i < 8; i++) {
      s = (s + (((n<<20)/s) >> 10)) >> 1;
    }
    return s;
  }

  int sqr(long n) {
    return ((n * n) >> 10);
  }

  int distPointLine(int x, int y, int vx, int vy) {
    int dist = sqrRoot(sqr(vx) + sqr(vy));
    return div((-mul(vx, y) + mul(x, vy)), dist);
  }
}

public void testDiv() {
  Division.distPointLine(32, -4, 10, 1);
}
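The arithmetic collapse described above can be reproduced in plain Java. The following standalone sketch (the class name NC1Repro is ours) replicates only the fixed-point helpers from Listing 1 and shows that testDiv's small direction vector makes the distance collapse to zero, so div is then called with y == 0, violating its precondition:

```java
public class NC1Repro {
    // Fixed-point helpers copied from Listing 1 (widened to long
    // so the sketch compiles standalone).
    static long sqr(long n) { return (n * n) >> 10; }

    static long sqrRoot(long n) {
        if (n <= 0) return 0;
        long s = (n + 1024) >> 1;
        for (int i = 0; i < 8; i++) {
            s = (s + (((n << 20) / s) >> 10)) >> 1;
        }
        return s;
    }

    public static void main(String[] args) {
        // testDiv's vector (vx, vy) = (10, 1): both squares are
        // shifted right by 10 bits and collapse to zero.
        long dist = sqrRoot(sqr(10) + sqr(1));
        System.out.println(dist); // 0 -> div would receive y == 0
    }
}
```

This confirms the explanation above: for small vx and vy, the 10-bit shifts in sqr discard the entire value, and div’s requires y > 0 is violated.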

    For NC2, a postcondition violation is exposed by testCount: after a call to Count's constructor, the first call to getName occurs while field name is still null. In this case, the developer may have forgotten to remove, or to add an implementation to, the default constructor of the class; or s/he may have forgotten to update the clauses in getName's postcondition after adding the nullable-by-default clause to the class. Thus, any call to the default constructor followed by a call to getName will break the method's postcondition (the second clause, \result.equals(this.name), will produce a NullPointerException because the field name is still null).
    In this postcondition problem, participants mentioned as causes: (i) weak precondition in getName (Listing 2): since name may be null, the precondition should forbid this state (baseline); (ii) code error, because name is not initialized; (iii) strong postcondition in getName, forbidding a null return; and (iv) an undefined cause. Answers (i) and (ii) encompass 11 out of 15 answers (6 and 5, respectively), again indicating a balance between code and contract. We may assume that the code is correct, because there is a clause declaring class Count as nullable by default; in this case, the developer does not need to initialize all the fields. Several developers seem to disregard the nullable-by-default clause; weak precondition was chosen by six participants. getName could also be made less restrictive by weakening its postcondition (given the nullable clause), so three participants answered strong postcondition.

Listing 2. NC2. Class Count from system JAccounting.
class Count {
  String name;

  //@ requires true;
  //@ ensures true;
  Count() {
  }

  //@ ensures \result instanceof java.lang.String;
  //@ ensures \result.equals(this.name);
  String getName() {                         // <- Weak precondition
    return name;
  }
}

public void testCount() {
  Count c = new Count();
  String s = c.getName();
}
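The runtime effect of checking getName's second postcondition clause can be sketched in plain Java; the class NC2Repro and the explicit try/catch are ours (a JML runtime checker evaluates the clause automatically):

```java
public class NC2Repro {
    static String name; // never initialized: remains null, as in Count

    static String getName() { return name; }

    public static void main(String[] args) {
        String result = getName(); // result is null
        // Evaluating the clause \result.equals(this.name) at runtime
        // dereferences the null result:
        try {
            boolean holds = result.equals(name);
            System.out.println("postcondition evaluated: " + holds);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while checking postcondition");
        }
    }
}
```

As described above, the check itself throws NullPointerException before any truth value can be produced.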

   
    Regarding NC3, there is an invariant violation when the constructor of Diagnostic is called with one of its parameters (aID) as a negative value, breaking the invariant clause related to the appID field. For this violation, participants also suggested four likely causes: (i) weak constructor precondition, as it should restrict the permitted values for aID and dID (baseline); (ii) code error, allowing a call to the constructor that breaks the class invariant; (iii) strong class invariant, too complex to understand; and (iv) undefined cause. Answers (i) and (ii) make up 10 out of 15 answers (5 and 5, respectively), again indicating a balance between code and contract.
    Choosing code error is reasonable if developers have defensive programming in mind, although that is usually avoided in DBC. Even though our sample consists of JML programmers, this may still be the case. The class invariant constrains all fields, and those constraints might indeed be necessary: medical care systems are critical, since human lives may be involved. However, since the constructor has the default precondition (the implicit requires true;), its weakness could also be considered. Therefore, both (i) and (ii) are acceptable, and only domain knowledge could determine the actual fault. Assuming that, in contract-based programs, restrictions on parameters are usually added to the precondition, and no further inspection of those parameters is present in the method body, cause (ii) could be discarded.


Listing 3. NC3. Class Diagnostic from system HealthCard.
class Diagnostic {
    final short DESC = 160;
    final short TITLE = 30;

    byte appID, diagID;
    byte[] tit, d;

    /*@ invariant d != null && d.length == DESC;
      @ invariant tit != null && tit.length == TITLE;
      @ invariant 0 <= appID && appID <= 127;
      @ invariant 0 <= diagID && diagID <= 127;
      @*/

    Diagnostic(byte aID, byte dID) {         // <- Weak precondition
        super();
        this.appID = aID;
        this.diagID = dID;

        this.tit = new byte[TITLE];
        this.d = new byte[DESC];
    }
}

public void testDiag() {
    new Diagnostic((-1), 10);
}
 
    For all nonconformances, participants provided four distinct answers. These results highlight the need for tool support to help developers. Furthermore, this need is reinforced by the percentage of Undefined answers (13.33% for Likely Cause), even though developers were manually classifying small examples: each example has a single class and few methods (the longest, in Listing 1, has five methods); in a more complex context, the percentage of Undefined answers could escalate.

    Regarding Q2, the highest agreement with the baseline occurred for NC2, perhaps due to its simplicity (Listing 2). The low agreement for NC1 is probably due to distPointLine’s complexity compared with NC2 and NC3. There are several method calls in distPointLine’s body, which guided answers towards a source-code cause or an undefined one (9 out of 15 answers; six and three, respectively). Again, code error is the most frequent answer, which might be evidence that developers tend to seek patterns when debugging: distPointLine, at first sight, deals with numbers differently from other methods, such as sqr or div.

    With respect to Q3, practitioners' answers enabled us to gather guidelines for the process of nonconformance classification: practitioners look at the code first, then at the method’s precondition (and postcondition); only in the end do they proceed to the requirement. For precondition nonconformances, most participants start by investigating the code (the caller’s body), checking for statements that could break the precondition; then they investigate whether the caller’s precondition should be reinforced, e.g., analyzing the requirement in comparison to the precondition (the definition must be stronger than or equivalent to the precondition); next, they analyze the context to decide whether the precondition is stronger than needed.

    For postcondition problems, participants started the analysis with the method’s body, checking for statements in the source code that could break the postcondition; then they investigated whether the method’s precondition should be reinforced; next, they analyzed the context to decide whether the postcondition is stronger than needed.

    Finally, for invariants, the analysis seems to start by checking the code that changes state fields (fields referenced in the invariant); then participants investigate whether the method’s precondition or postcondition should be reinforced, e.g., analyzing the problem’s definition in comparison to the precondition (the definition must be stronger than or equivalent to it); next, they analyze the context to decide whether the invariant is stronger than needed.

    As an example, let us use the guidelines for suggesting a likely cause for the invariant problem (Listing 3). The first step is to check the constructor’s body:


Diagnostic(byte aID, byte dID){
    super();
    this.appID = aID;
    this.diagID = dID;

    this.tit = new byte[TITLE];
    this.d = new byte[DESC];
}


    Since there are only initializations of the class’s fields, the code seems to be right. Then we look at the constructor’s precondition. The absence of an explicit precondition implies the JML default, requires true;, meaning that the parameters aID and dID can receive any value; however, those parameters are used to initialize the fields appID and diagID, which are mentioned in the class invariant (see Listing 3). Thus, a precondition should be added to rule out values outside the valid range (according to the invariant’s restrictions), and the likely cause suggested is Weak precondition.
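A minimal sketch of this repair, assuming the invariant bounds from Listing 3, follows; the class NC3Fix and the helper preconditionHolds are hypothetical, used here only to exercise the proposed precondition:

```java
public class NC3Fix {
    // Proposed repair for NC3 (baseline: weak precondition): the
    // constructor would carry
    //   //@ requires 0 <= aID && aID <= 127;
    //   //@ requires 0 <= dID && dID <= 127;
    // so the invariant on appID and diagID cannot be broken by
    // construction. This helper evaluates that condition.
    static boolean preconditionHolds(byte aID, byte dID) {
        return 0 <= aID && aID <= 127 && 0 <= dID && dID <= 127;
    }

    public static void main(String[] args) {
        // testDiag's input (-1, 10) is now rejected up front:
        System.out.println(preconditionHolds((byte) -1, (byte) 10)); // false
        // A value within the invariant's range is accepted:
        System.out.println(preconditionHolds((byte) 3, (byte) 10));  // true
    }
}
```

With this precondition in place, the violation moves from an invariant failure inside Diagnostic to a precondition failure at the call site, which correctly blames the caller.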

Threats to Validity

We now discuss the threats to validity we identified when performing this survey and how we managed to overcome them.

    With respect to Internal validity, participants were not asked to fix the fault; without such a commitment, they may not have put much analysis effort into their answers. This problem is frequently related to voluntary surveys; we tried to make the instructions as clear as possible and offered a reasonable time (one month) to answer. In addition, the survey guidelines on the form may have influenced the practitioners’ answers. Furthermore, the contracts, code, and tests were presented to the respondents without any reference to an intended context; in this situation, respondents need to "guess" what the correct answer is.

   
From the point of view of Conclusion validity, there were 15 answers to our survey, so the sample is not representative of the universe of JML practitioners. In addition, the number of nonconformances is not representative of the whole set of problems a developer may face when debugging contract-based programs; however, due to the time required to manually analyze each nonconformance, we chose three in order to maximize the number of participants in our survey. Regarding External validity, these results hold only for the nonconformances presented in this section; they will probably differ for nonconformances from other systems and domains.

   
Finally, concerning Construct validity, the mapping between the free-text answers and the set of likely causes considered in our approach may be subjective; to mitigate this limitation, the mapping was performed and reviewed separately by two researchers.

Answers to the Research Questions

The results of this study indicate the following conclusions for the research questions:

    Q1. How diverse are the causes provided by developers for a given nonconformance? What are the possible reasons for the answers?

    For all nonconformances, no agreement was significant: there are four distinct answers for each of them. Furthermore, the need for analysis tool support is reinforced by the percentage of Undefined answers (13.33%) even when developers manually classify small examples: each example has a single class and few methods (the longest, in Listing 1, has five methods); in a more complex context, the percentage of Undefined answers tends to rise.

    Since a defect causing a nonconformance may lie in the source code, in the contract, or in both, it is expected that developers provide some variety in their answers, as they may direct their analysis towards code problems, contract problems, or both.

    Q2. How do answers compare to the baseline?

    Regarding Q2, the highest agreement with the baseline occurred for NC2, perhaps due to its simplicity (Listing 2). The lack of consensus on NC1 is probably due to distPointLine’s complexity compared with NC2 and NC3: there are several method calls in distPointLine’s body, which guided answers towards a source-code cause or an undefined one (9 out of 15 answers).

    Q3. How do practitioners suggest a cause for a nonconformance?

    Practitioners look at the code first, then at the method’s precondition, postcondition, and the class invariant; only in the end do they look at the requirement.

Conclusions

In this work, we present and discuss results from surveying 15 JML practitioners who, by examining three examples of defective JML fragments within Java methods in real systems, tried to classify contract failures (nonconformances) in terms of their actual defect. Our aim is to gather evidence on how developers disagree in establishing causes for nonconformances with formal contracts embedded into program code. Their answers are compared with a manually-established defect baseline, which had been previously confirmed by actual system developers.

    Despite all participants having moderate to high experience with JML, the answers are, in general, significantly diverse, mostly disagreeing with the baseline. Participants suggested at least four distinct causes for each nonconformance. This result highlights how difficult it can be to manually establish a likely cause for a nonconformance. As an additional task, we asked participants to comment on the method they used to make their decisions, and then qualitatively analyzed those answers to provide a set of guidelines for avoiding nonconformances and determining their causes: in general, practitioners inspect the code first, then the method’s precondition or postcondition; they only return to the method’s requirement to compare it with the actual contracts. For precondition problems, the first step a developer should take is to analyze the source code, looking for a cause in the caller’s body; then analyze the caller’s precondition; next, analyze the method calls made before the call to the method where the precondition fails; and finally, analyze the context of the system to decide whether the method’s precondition is too strong. Moreover, the requirement can be compared to the precondition to verify whether they are equivalent; otherwise, the requirement may need to be reinforced.

    As future work, we intend to carry out additional studies with DBC practitioners, especially regarding contract reading and debugging, in several language environments (plain Java assertions, JML, or Code Contracts). In addition, we aim to investigate contract metrics, in order to analyze in depth the relationship between nonconformances and contract properties; for instance, it is desirable to look for evidence for hypotheses on how more complex contracts make it harder to find the causes of nonconformances.