Comparison Automatic x JML Experts

From the 84 nonconformances that we discovered in our experimental units we randomly selected 10 (using the sample command from the R statistical tool (version 3.0.1)). These nonconformances are presented on Table 1. The detected nonconformances were ordered in the following manner: first we ordered the detected in sample programs – starting with BoundedStack package, then stacks, dbc, misc and finally list; then Bomber, HealthCard, JAccounting, Mondex and TransactedMemory. In each experimental unit the nonconformances were ordered alphabetically by the name of the methods where the nonconformances were detected.

Table 1. Randomly selected nonconformances released to JML experts categorize. The nonconformance number corresponds to the position of the nonconformance considering our 84 nonconformances – the counting starts in sample programs (BoundedStack) and continues until the last nonconformance discovered in TransactedMemory unit. Experimental Unit gives the name of the experimental unit. Class and Method columns give information about location of the nonconformance into the experimental unit. Finally, column Type gives the type of the nonconformance.

The form that we use to ask the JML experts is available online. The form has the following structure: first, we present the proposed three-level model for categorization and an overview of this model; next, a methodology to performs the manual categorization of nonconformances is suggested; then, for each nonconformance selected we present some details about the nonconformance – information about location (experimental unit, package, class, and method) and the nonconformance’s type –, a link for the contract-based program corresponding, and a test case that reveals the nonconformance; finally, we ask to JML expert give a categorization for the nonconformance, choosing a category for the nonconformance – between Specification error, Code error or Undefined; and suggesting a likely cause.


There were three voluntary JML experts that answered our form. The results are presented on Table 2.

Considering the matches metric we found that our approach is similar to JML experts categorization; furthermore, the most similar result occurred with Subject 1 – matches = 0.40; with Subject 2 and Subject 3 the matches (0.30) was the same. Therefore, our automatic categorization approach is similar to categorization performed by voluntary JML experts.