ContractJDoc - A new approach for documenting Java code

ContractJDoc

We propose and implement a Javadoc-like language extension, together with its compiler, that we call ContractJDoc. ContractJDoc provides a new way of documenting source code: a developer can turn Javadoc comments into runtime-checkable contracts by using a few new tags and following a specific pattern for writing comments (putting a boolean-valued expression in brackets). With ContractJDoc we try to fill the gap between informal documentation (such as Javadoc) and formal specification (such as JML) by allowing the developer to write contracts, in a controlled way, with default Javadoc tags (such as @param) and some new tags (such as @inv). ContractJDoc supports preconditions, postconditions, and invariants.

    To create a contract in ContractJDoc, the developer writes a boolean-valued expression in brackets attached to specific tags (such as @param). The expression can contain method calls, provided that the called methods carry the tag @pure (indicating that they are free of side effects), and quantified expressions (either universal or existential).

    ContractJDoc tries to benefit from the best features of Javadoc, JML, and Code Contracts. We mix natural language with controlled expressions to describe a method’s behavior. The controlled expressions, which appear as boolean-valued expressions in brackets, allow runtime checking of the methods being developed (similar to what is achieved with JML or Code Contracts).
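
    As a minimal sketch of the comment pattern described above, the class below attaches bracketed boolean expressions to the standard @param and @return tags and marks the methods used inside expressions as @pure. The class BoundedCounter, its methods, and the exact placement of the brackets are assumptions made for illustration; the published grammar is the authoritative reference. The if-statements in the method bodies are hand-written stand-ins for the checks a contract-aware build would perform; a plain Java compiler treats the comments as ordinary Javadoc.

```java
/** Hypothetical counter, used only to illustrate the comment pattern. */
class BoundedCounter {
    private final int limit;
    private int value;

    /**
     * Creates an empty counter.
     *
     * @param limit the largest value the counter may hold [limit > 0]
     */
    BoundedCounter(int limit) {
        // Hand-written stand-in for the precondition check.
        if (!(limit > 0)) {
            throw new IllegalArgumentException("precondition violated: limit > 0");
        }
        this.limit = limit;
    }

    /**
     * @pure
     * @return the current value
     */
    int getValue() {
        return value;
    }

    /**
     * @pure
     * @return the largest value the counter may hold
     */
    int getLimit() {
        return limit;
    }

    /**
     * Adds units to the counter.
     *
     * @param amount units to add [amount >= 0 && amount <= getLimit() - getValue()]
     * @return the updated value [getValue() <= getLimit()]
     */
    int add(int amount) {
        // Stand-in for the precondition attached to @param.
        if (!(amount >= 0 && amount <= getLimit() - getValue())) {
            throw new IllegalArgumentException("precondition violated for amount");
        }
        value += amount;
        // Stand-in for the postcondition attached to @return.
        if (!(getValue() <= getLimit())) {
            throw new IllegalStateException("postcondition violated");
        }
        return value;
    }

    public static void main(String[] args) {
        BoundedCounter counter = new BoundedCounter(3);
        System.out.println(counter.add(2)); // stays within the limit
        try {
            counter.add(5); // exceeds the remaining capacity
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    The expressions only call methods tagged @pure, in line with the restriction stated above.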


Grammar

The grammar of ContractJDoc is available online:

Supporting infrastructure
We define ContractJDoc constructs as extensions to traditional Javadoc block comments. The ContractJDoc language is supported by the ajmlc-ContractJDoc compiler. The compiler, which is based on ajmlc, translates comments written in ContractJDoc notation into aspect-oriented assertions that check, at runtime, whether the program’s behavior conforms to the contracts.

    We divided our language into two levels: one closer to Java developers without experience in contract-based languages (such as JML), and one closer to JML expressions, for developers with experience in contract-based programming. Level zero is composed of conventional Javadoc tags, such as @param and @return, which enable the writing of pre- and postconditions, respectively. In addition, the developer can use @throws or @exception to define the exceptional behavior of a method. Level one adds further tags: @pre and @requires for preconditions; @post and @ensures for postconditions; @inv for invariants; @forall and @exists for universally and existentially quantified expressions; @doc_open and @doc_public for changing the visibility of fields or methods (for example, a private field in Java becomes publicly visible to a contract); @nothrows for declaring that a method will not throw an exception; @pure for declaring that a method is free of side effects; @also for refining an inherited specification; and @old for referring to the pre-state of an object.
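
    To make the level-one tags more concrete, here is a hedged sketch that uses @inv, @requires, @ensures, @pure, and @old on a small class. The tag names come from the list above, but the class Account, the bracket placement, and in particular the way @old is written inside an expression are assumptions; the grammar may spell these differently. The explicit checks in the bodies only imitate, by hand, what runtime assertions for such contracts would verify.

```java
/**
 * Hypothetical account class illustrating level-one tags.
 *
 * @inv the balance never goes negative [getBalance() >= 0]
 */
class Account {
    private long balance;

    /**
     * @param initial the opening balance [initial >= 0]
     */
    Account(long initial) {
        if (!(initial >= 0)) {
            throw new IllegalArgumentException("precondition violated: initial >= 0");
        }
        this.balance = initial;
    }

    /**
     * @pure
     * @return the current balance
     */
    long getBalance() {
        return balance;
    }

    /**
     * Withdraws money from the account.
     *
     * @requires a positive amount covered by the balance [amount > 0 && amount <= getBalance()]
     * @ensures the balance shrinks by the amount [getBalance() == @old(getBalance()) - amount]
     */
    void withdraw(long amount) {
        // Precondition (assumed reading of @requires).
        if (!(amount > 0 && amount <= getBalance())) {
            throw new IllegalArgumentException("precondition violated for amount");
        }
        long oldBalance = getBalance(); // pre-state snapshot, the role @old plays
        balance -= amount;
        // Postcondition (assumed reading of @ensures).
        if (!(getBalance() == oldBalance - amount)) {
            throw new IllegalStateException("postcondition violated");
        }
        // Invariant (assumed reading of @inv), re-checked after the state change.
        if (!(getBalance() >= 0)) {
            throw new IllegalStateException("invariant violated");
        }
    }

    public static void main(String[] args) {
        Account account = new Account(10);
        account.withdraw(4);
        System.out.println(account.getBalance());
    }
}
```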

    A runnable version of ContractJDoc with its supporting compiler is available online. In addition, the source code of the ContractJDoc compiler is available at SourceForge.

Evaluation
    Here we discuss in detail how we evaluated the proposed approach. We performed three studies: a case study, applying ContractJDoc to six open-source Java systems; an empirical study, asking 24 Java developers to complete an implementation task according to a provided interface; and an exploratory survey, evaluating the comprehensibility of Java interfaces documented with three approaches: Javadoc, JML, and ContractJDoc.

Case Study
In this case study, we apply ContractJDoc to six Javadoc-annotated open source systems to evaluate compilation and generation infrastructure, assessing ContractJDoc’s applicability and effectiveness.

    Research Questions
    Our case study addresses the following research questions:
   
    Q1. Is the preciseness of ContractJDoc, when compared to Javadoc, useful for contract verification?
    We collected a few open source systems based on their use of Javadoc, applied ContractJDoc to each system, and evaluated the results in terms of detected conformance errors.

    Q2. What is the cost of applying ContractJDoc to systems with Javadoc?
    We discuss the problems faced when applying ContractJDoc in the described context.

    Systems
    The case study was performed on a convenience sample: six Javadoc-rich open source systems available on GitHub. They were selected based on the coverage of method-level Javadoc annotations. We searched GitHub for projects whose Javadoc comments contain key phrases such as "must be", "must not be", "should be", "should not be", "greater than", "not be null", and "less than". After some visual filtering, we collected the five (5) most important classes in each system, based on overall dependence, and checked whether those classes contain method-level Javadoc comments for most of their methods. If so, the system was selected.
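
    The key-phrase filter can be approximated locally with a few lines of Java. The sketch below is not the search used in the study (which was run on GitHub); it is a hypothetical helper, JavadocPhraseScan, that counts the Javadoc comments in a piece of source text containing at least one of the phrases quoted above. It matches phrases case-insensitively and does not normalize phrases that are split across comment lines.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavadocPhraseScan {
    // Key phrases quoted in the text above.
    static final List<String> PHRASES = List.of(
        "must be", "must not be", "should be", "should not be",
        "greater than", "not be null", "less than");

    // Matches one Javadoc block comment, including multi-line ones.
    private static final Pattern JAVADOC =
        Pattern.compile("/\\*\\*.*?\\*/", Pattern.DOTALL);

    /** Counts the Javadoc comments that contain at least one key phrase. */
    static int countMatchingComments(String source) {
        int hits = 0;
        Matcher matcher = JAVADOC.matcher(source);
        while (matcher.find()) {
            String comment = matcher.group().toLowerCase(Locale.ROOT);
            if (PHRASES.stream().anyMatch(comment::contains)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "/** @param n must not be null */",
            "void a(Object n) {}",
            "/** Returns the name. */",
            "String b() { return \"\"; }");
        System.out.println(countMatchingComments(source)); // prints 1
    }
}
```
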
All systems with the contracts added in this study are available in a replication package. To run the ContractJDoc compiler, please copy the folder aspectjml-lib into the folder of each system. Table 1 characterizes the systems in terms of lines of code and contract clauses.
Table 1. Case study systems.
    Research method
    Three researchers applied ContractJDoc to six existing open-source systems available on GitHub. They followed a bottom-up approach: the researchers started applying ContractJDoc to the simplest methods and classes (or interfaces) and continued up to the most complex ones. The written contracts followed the Javadoc comments available in natural language (in English), and some of them were inferred from the context of the experimental unit and the source code of the methods. As a result, we were able to write 3,890 contract clauses: 1,892 preconditions, 1,979 postconditions, and 19 invariants. Figure 1 presents the steps performed by the researchers when applying ContractJDoc to the systems. The process is composed of four steps: 1) the contracts are created based on the available natural language comments; 2) the contracts are compiled with the ajmlc-ContractJDoc compiler to generate bytecode enriched with assertions; 3) the test suite available in each system is run over the contract-aware bytecode; 4) the results of the test suite execution are analyzed and conformance errors are investigated. All systems with the contracts added in this study are available in our companion website.

Figure 1. Steps for applying ContractJDoc to Javadoc-annotated systems.


    Results
    We grouped the contracts we wrote into three kinds: common-case, repetitive with code, and application-specific (see Table 2). Column System shows the name of the experimental unit used in the case study. Column #Clauses displays the number of clauses manually added to each system. Column #Errors presents the number of errors detected by the system’s test suite after compiling the source code enhanced with ContractJDoc contracts. Column Time to run gives the time needed to compile the whole project with its dependencies after applying ContractJDoc contracts. Columns #CommonCase to #Repetitive show the contract clauses added to each system, grouped by kind.

Table 2. Case study results.


    Concerning the kinds of contracts, the only unit in which application-specific contracts predominate is OOP Aufgabe3. In this system, the majority of the written contracts are application-specific (55%), although common-case and repetitive contracts are still frequent (30% and 15%, respectively). In ABC-Music-Player, on the other hand, more than 90% of the contracts are either common-case or repetitive with code. For Dishevelled, the majority of the written contracts are classified as common-case (57.51%), another 36.92% are repetitive with code, and only 5.57% are application-specific. All contracts written for Jenerics check whether parameters or return values are null, so all of them are either common-case or repetitive with code. In SimpleShop, the written contracts are distributed as follows: common-case 60%, repetitive with code 19%, and application-specific 21%; again, common-case and repetitive contracts outnumber application-specific ones. Finally, in WebProtégé, the contracts are distributed as follows: common-case 77.51%, repetitive with code 14.38%, and application-specific 8.11%.

    When applying ContractJDoc to ABC-Music-Player, we found inconsistencies between the Javadoc comments and the source code. The problems occurred in the class Utilities (package sound): the comments for a parameter declare that its value must not be greater than or equal to zero; however, the body of the methods contains an if-clause that throws an exception when the value received by the parameter is less than zero. Those inconsistencies are problematic because a client written according to the comments will always enter the if-clause and trigger the exception. We also found problems in the Dishevelled and WebProtégé projects concerning exceptions that are named in the Javadoc tag @throws but not declared in the method’s signature.
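
    The kind of mismatch described above can be reproduced in a few lines. The sketch below is not the code of the Utilities class; the class UtilitiesSketch and the method twice are hypothetical and only mirror the reported pattern: the comment forbids non-negative values, while the body rejects negative ones.

```java
class UtilitiesSketch {
    /**
     * Hypothetical method reproducing the reported mismatch.
     *
     * @param a must not be >= 0
     * @return twice the value of a
     */
    static int twice(int a) {
        // The body enforces the opposite of what the comment states.
        if (a < 0) {
            throw new IllegalArgumentException("a must not be negative");
        }
        return 2 * a;
    }

    public static void main(String[] args) {
        try {
            twice(-1); // a client obeying the comment passes a negative value
        } catch (IllegalArgumentException e) {
            System.out.println("documented input rejected: " + e.getMessage());
        }
        System.out.println(twice(3)); // input the comment forbids is accepted
    }
}
```

    Once the comment is turned into a checkable precondition, every call made by the existing tests either violates the contract or triggers the exception, which is how such a conformance error surfaces.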

    Discussion
    We wrote more postconditions and preconditions than invariants. Concerning pre- and postconditions, in ABC-Music-Player and WebProtégé we wrote almost twice as many postconditions as preconditions. For ABC-Music-Player, this is related to the number of getter methods in the unit and to the Javadoc available; moreover, based on the comment and the body of a method, defining a postcondition appears to be more direct than defining a precondition, as we do not know the clients of the methods. For WebProtégé, the difference is also related to the available comments and to the contracts we were able to infer from method bodies.

    As a proof of concept, ContractJDoc and its compiler (ajmlc-ContractJDoc) enabled us to write runtime-checkable contracts for third-party systems based on the available comments. As expected, the quality and variety of the contracts depended strongly on the available comments; nevertheless, we were able to detect and correct inconsistencies and missing expressions between source code and comments. One instance is the inconsistency between comment and source code in the Utilities class of ABC-Music-Player: the comments declare that a parameter must not be greater than or equal to zero, whereas the body of the methods contains an if-clause that throws an exception when the value received by the parameter is less than zero. Given what is expected of the methods in this case, the source code seems to be right; so, when writing contracts for the class, we followed the condition in the method’s body.

    With respect to the process of applying the approach, in general it is simple and easy, mainly for small systems (such as OOP Aufgabe3 and SimpleShop). In large systems, such as Dishevelled and WebProtégé, applying ContractJDoc can become time-consuming because of the number of classes and methods that a developer needs to analyze. As usual in a contract-based context, the contracts added to the code are only as expressive as the comments available in the Javadoc: the more comments to which we can apply the approach, the more expressive the contracts will be.

    Due to its size, the results of this study cannot be generalized; its purpose is to evaluate applicability and relative usefulness. The sample is not representative: since there is no available estimate of the population of Javadoc-rich projects on GitHub, probability sampling is impossible. Our approach is as systematic as feasible in selecting the evaluated projects; because manual translation does not scale, the sample contains only six projects. The sizes of Dishevelled and WebProtégé set them apart from the other systems. For instance, Dishevelled is more than 56 times bigger than ABC-Music-Player, 43 times bigger than Jenerics, 313 times bigger than OOP Aufgabe3, and 234 times bigger than SimpleShop.

    Threats to validity
    Concerning construct validity, in order to reduce the threat posed by manually defined contracts, all systems were annotated and reviewed by three researchers, separately. Regarding external validity, the systems we used to evaluate the applicability of our language and compiler may not be representative of the use of Javadoc in real systems; however, we were able to detect inconsistencies between Javadoc comments and source code, as occurred in the Utilities class (ABC-Music-Player experimental unit), in which the comment for a parameter of the methods is the exact opposite of the behavior expected by the source code. The comment declares “param a - must not be >= 0”, but the body of the methods contains an if-clause that throws an exception when the parameter value is less than zero. A developer writing client code for this class based on the available documentation (Javadoc comments) will write code that always throws an exception.

Empirical Study
In this study, we aim to evaluate the impact of mixing natural language with lightweight formal contracts in ContractJDoc.

Definition

The goal of this empirical study is to investigate ContractJDoc, for the purpose of evaluation with respect to readability and understandability, from the point of view of developers in the context of the Java programming language. We group our research questions by the factors evaluated. We have two factors: the task performed by the developer (task) and the documenting approach used (approach). Those factors have the following treatments: client and supplier for task; and Javadoc, ContractJDoc, and JML for approach. In particular, we address the following research questions:
    Q3. With respect to the task performed by each developer:
    (a) How difficult is the required task?
We ask the developers to perform a task based on a given documented interface. Then, we measure the difficulty by means of a Likert-type scale.

    (b) How correct is the source code produced in a task?
We check the source code by using some manually produced test cases in order to investigate how well the developer followed the rules from the comments.

    Q4. Regarding the documenting approach used by the developer:
    (a) How difficult is it for a developer to perform the required task by using the available documented interface?
By using a Likert-type scale, we measure the difficulty for each approach considered in this experiment.

    (b) How correct is the source code produced by a developer?
We check the source code by using some manually produced test cases in order to investigate how well the developer followed the restrictions available in the comments.

    Q5. Concerning the experience of the Java developers:
    (a) How difficult is it for a developer to perform the required task?
By using a Likert-type scale, we measure the difficulty for each experience level considered in this experiment.

    (b) How correct is the source code produced by a developer?
We check the source code by using some manually produced test cases in order to investigate how well the developer followed the rules from the comments.


    Participants
    The participants of our experiment are grouped by experience level, as follows:
  • Low
    • a developer who has used or is using Java in classes and has no experience with contract-based languages (such as JML);
    • a developer who has used or is using Java in personal projects and has no experience with contract-based languages.
  • Medium
    • a developer who has used or is using Java in an academic environment and has no experience with contract-based languages (such as JML);
    • a developer who has used or is using Java in classes or in personal projects and has some experience with contract-based languages;
    • a developer who has used or is using Java in an academic environment and has some experience with contract-based languages;
    • a developer who has used or is using Java in an industrial environment and has no experience with contract-based languages.
  • High
    • a developer who has used or is using Java in an industrial environment and has some experience with contract-based languages (such as JML).
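
    The grouping above amounts to a small decision rule over two attributes: where the developer has used Java and whether the developer has experience with contract-based languages. The sketch below encodes that rule; the names ExperienceLevels, JavaContext, and classify are hypothetical, and the sketch assumes each participant is described by a single context.

```java
class ExperienceLevels {
    enum JavaContext { CLASSES, PERSONAL_PROJECTS, ACADEMIC, INDUSTRIAL }

    enum Level { LOW, MEDIUM, HIGH }

    /** Maps a participant profile to the experience level listed above. */
    static Level classify(JavaContext context, boolean contractExperience) {
        boolean classesOrPersonal = context == JavaContext.CLASSES
            || context == JavaContext.PERSONAL_PROJECTS;
        if (classesOrPersonal && !contractExperience) {
            return Level.LOW;
        }
        if (context == JavaContext.INDUSTRIAL && contractExperience) {
            return Level.HIGH;
        }
        // Every remaining combination falls into the medium group.
        return Level.MEDIUM;
    }

    public static void main(String[] args) {
        for (JavaContext context : JavaContext.values()) {
            System.out.println(context + ", no contracts -> " + classify(context, false));
            System.out.println(context + ", contracts -> " + classify(context, true));
        }
    }
}
```
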

    Study Methodology
    In this study, we addressed two factors, the approach for commenting source code and the task to be performed, with the following treatments: Javadoc, ContractJDoc, and JML for approach; client and supplier for task. Moreover, we use two equivalent Java interfaces in this experiment: Stack and Queue. We use a factorial design, in which subjects are randomly assigned to each combination of the treatments. For the purpose of this experiment, each triple <approach, task, interface> is called a trial (a combination of the treatments). Since we have three documenting approaches, two tasks, and two Java interfaces, we have a total of 12 trials in the experiment.

    The experiment uses a balanced design, which means that each group (block) has the same number of participants. Participants are assigned to trials by a completely randomized design in order not to bias the results.
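
    The design can be sketched in code: enumerate the 3 × 2 × 2 = 12 trials, shuffle the participants, and deal them out evenly. This is an illustration under assumptions, not the procedure used in the study; the class TrialAssignment, the numeric participant ids, and the fixed random seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class TrialAssignment {
    static final List<String> APPROACHES = List.of("Javadoc", "ContractJDoc", "JML");
    static final List<String> TASKS = List.of("Client", "Supplier");
    static final List<String> INTERFACES = List.of("Stack", "Queue");

    /** Enumerates every <approach, task, interface> combination. */
    static List<String> trials() {
        List<String> result = new ArrayList<>();
        for (String approach : APPROACHES) {
            for (String task : TASKS) {
                for (String iface : INTERFACES) {
                    result.add(approach + "/" + task + "/" + iface);
                }
            }
        }
        return result;
    }

    /** Shuffles participant ids and deals them out evenly over the trials. */
    static Map<String, List<Integer>> assign(int participants, long seed) {
        List<String> trials = trials();
        if (participants % trials.size() != 0) {
            throw new IllegalArgumentException("a balanced design needs a multiple of "
                + trials.size() + " participants");
        }
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= participants; id++) {
            ids.add(id);
        }
        Collections.shuffle(ids, new Random(seed));
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int k = 0; k < ids.size(); k++) {
            String trial = trials.get(k % trials.size());
            groups.computeIfAbsent(trial, key -> new ArrayList<>()).add(ids.get(k));
        }
        return groups;
    }

    public static void main(String[] args) {
        // 24 participants over 12 trials gives two participants per trial.
        assign(24, 42L).forEach((trial, ids) -> System.out.println(trial + " -> " + ids));
    }
}
```

    The divisibility check shows why a participant count that is not a multiple of 12 cannot be balanced across the trials.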

    The experiment was executed offline, i.e., participants received the experimental material via an online survey platform that we used to collect the results. An example of the survey sent to the participants can be found online. Each participant received an experiment package consisting of (i) a statement of consent, (ii) a pretest questionnaire, (iii) instructions and materials to perform the experiment, and (iv) a post-test questionnaire. Before the study, we explained to participants what we expected them to do during the experiment: perform an implementation task (supplier or client code) for the provided interface. Each participant received one of the following tasks: create supplier code for an interface, or create client code that uses the interface methods.

    Before starting the experiment, we asked each participant to fill in a pre-study questionnaire reporting their programming experience (with respect to Java and contract-based programming). After the questionnaire was filled in, we randomly selected a task for each participant.

    The first part of the experiment consists of the following activities: (1) apply a pre-experiment questionnaire, in order to collect information on the developers’ experience; (2) give some training on the documenting approaches, such as JML and ContractJDoc; (3) ask the developers to execute the tasks, where each developer is given one task for one approach with one Java interface; (4) apply a post-experiment questionnaire, in order to collect qualitative information on the developers’ view of each task.

    Results
    In total, 27 Java developers participated in our experiment. We needed to discard three developers in order to maintain a balanced number of participants in each trial. Thus, we kept 24 developers: one with a low level of experience, 19 with a medium level, and four with a high level.

    Concerning the task performed by the developers, Table 3 presents the answers on difficulty (the answers follow a five-point Likert-type scale, ranging from Very difficult to Very easy). Implementing the documented interface (Supplier task) appeared to be easier than creating a Client class: four Participants answered Very easy for the Supplier task, whereas just two answered Very easy for the Client task.

Table 3. Answers on difficulty, grouped by the task performed, the approach used, and the experience level of each Participant.

 

    In addition, when analyzing the source code produced by the Participants, the task of implementing the interface (Supplier) yielded more correct source code (correct using the documentation as oracle): seven correct implementations, whereas the Client task yielded just four.

    When grouped by documenting approach, the results present Javadoc as the easiest approach to understand (see Table 3). This is expected, since Javadoc is a well-established approach for documenting Java code. In addition, ContractJDoc was perceived as easier than JML: the former received three Very easy answers, whereas JML did not receive any.

    Javadoc and ContractJDoc were the only documenting approaches with which all the Participants were able to produce code satisfying the oracle (respecting the restrictions available in the comments). JML, on the other hand, had one case in which the contract was not satisfied by the implementation.

    Summarizing the answers on the difficulty of each trial by the experience level of each Participant, we found that Participants with a high level of experience tend to consider the task easier than those with a medium or low level (see Table 3). Half of the Participants with a high level rated the trial Very easy, whereas just 21% of those with a medium level did.

    When the source code produced is grouped by the experience level of the Participants, those with a medium level of experience most often produced code satisfying the restrictions available in the comments (100%). In contrast, just 75% of the Participants with a high level produced code satisfying the restrictions.

    Discussion
    We proceed with the discussion over the research questions.

    Q3. (a) We asked each developer to perform one task: either implement a given interface (the task we called Supplier) or implement client code that uses the methods provided by an interface, as when using an API (the task we called Client). Analyzing the answers grouped by task (Table 3), more people assigned Very easy to the Supplier task (four, 33.33%). Furthermore, the other eight developers who performed the Supplier task assigned Easy to it. This means that, based on the comments of an interface, it is easier for a developer to write code for the methods of the interface than to use those methods (as with an API).

    Q3. (b) The Client task was the most correct: all developers wrote client code in accordance with the contracts from the provided interface. This means that, when writing client code, developers tend to pay attention to the available documentation.

    Q4. (a) The easiest approach for implementing code based on the provided comments is Javadoc: all answers are either Easy (5) or Very easy (3). This is an expected result, since Javadoc is widely known (in contrast to JML and ContractJDoc). Nevertheless, ContractJDoc received the same number of Very easy answers as Javadoc and more than JML, which did not receive any Very easy answer. This result places ContractJDoc at an intermediate level between Javadoc and JML. The proposed approach is easier than JML and can be used for the same purpose, conformance checking of Java code, since the comments in both ContractJDoc and JML enable runtime checking. Therefore, the proposed approach has the benefits of being easy to understand (creating code based on the comments), with 75% of the answers on difficulty being Easy or Very easy, and of enabling the runtime checking of the comments by using the ajmlc-ContractJDoc compiler.

    Q4. (b) Concerning code correctness, all Participants using interfaces documented with Javadoc and ContractJDoc produced code in accordance with the contracts available in the provided interfaces. Only one developer, using an interface with JML contracts, was not able to satisfy all the contracts: in one method, the source code produced does not conform to the provided contracts.

    Q5. (a) From the answers, we found evidence that more experienced developers (classified with a high experience level in this work) tend to have less difficulty with programming tasks (either Supplier or Client tasks) than developers with a lower level of experience (namely, the medium level).

    We had two people in each trial, so we were able to analyze the similarities and differences within each trial. With respect to experience level, we had eight pairs with the same experience level: medium. In those pairs, we analyzed the answers on difficulty in each trial performed and found the same number of occurrences (four) for distinct and for equal answers. Therefore, we have no evidence to claim that people with the same experience level will or will not have the same level of difficulty when performing the same task. In our experiment, the groups of equal and unequal answers on difficulty have the same size.

    On the other hand, when evaluating the pairs with distinct experience levels ([high, low] and [high, medium]), we found that [high, medium] pairs tend to have more distinct answers than [high, low] pairs. The [high, low] result with the same difficulty was a surprise, since people with a low experience level would probably have more difficulty when performing a given programming task. The [high, medium] result, however, was expected, since more experience is likely to be related to less difficulty with programming tasks.

    Q5. (b) The Participants classified as having a medium experience level were those who most often produced code according to the restrictions available in the documentation. This result may be related to the fact that the number of Participants at this experience level was much higher than at the high or low levels (19 versus four and one, respectively).


    Threats to validity
    Concerning internal validity, all material for the empirical study is available only in English, whereas the native language of all participants is Portuguese. Therefore, the participants’ proficiency in English may have affected their performance in this study. Regarding external validity, we had only 24 Participants in our experiment, and this sample is probably not representative of the real population of Java developers. In addition, we used only two similar data structure interfaces: queue and stack. In other domains with more complex structures, the results can vary considerably.

Comprehensibility Survey
We conducted an exploratory study that involved data collection through a survey with Java development professionals. This section describes the survey design, participants, results and discussion. The goal of this survey is to compare three documentation approaches (Javadoc, ContractJDoc, and JML) with respect to comprehensibility, from the point of view of developers. In particular, our survey addresses the following research question:

Q6. In the developer’s opinion, which approach, among Javadoc, ContractJDoc and JML, is the most effective in communicating a routine’s behavior?

    Survey Design
    For this study, we followed a quantitative method based on a web-based survey instrument, suited to measuring opinions and behaviors in response to specific questions in a non-threatening way. The questions were available as an online form.

    The survey instrument begins with a statement of purpose along with a consent term. Then the respondent is characterized through questions on Java experience and experience with contract-based programming. Next, the survey itself is presented: links to three Java interfaces, each documented with a different approach, are shown, followed by questions on the understanding of the behavior of a class implementing the interfaces, based on the available comments.

    We used Likert-scale questions. In two questions, we asked the developers to choose the simplest, most understandable documentation approach: one more specific question, related to the provided interface, and one more general question, concerning the use of the approach in a general context.

    Participants
    In order to generalize the results to the desired population, the selection of subjects (a sample from the population) must be representative of that population. In our case, we used a non-probability sampling technique called convenience sampling: the nearest and most convenient persons are selected as subjects. We sent the survey link to the email lists of our universities, to some project lists, and to the jml-interests list, the main forum for discussions on JML implementation and design. In addition, our contacts applied a snowball approach, forwarding the survey to their own Java contact lists, which increased the sample and the number of participants in our study.

    The survey was open for three weeks (from June to July 2016) and received 142 answers (from an estimated total of 700 who received the link, 20% response rate). The participants of our survey (called Subjects henceforth) are also grouped by experience level.

    Results
We had the participation of 142 Java developers in our survey. From those, 61% are classified as Medium experience level, 24.1% as Low and 14.9% as High.


With respect to the survey answers, 50.7% of the Subjects chose Javadoc as the simplest approach to understand when using it in a general context. In addition, for 38.0% of the Subjects Javadoc is also the most understandable approach with regard to the provided interface – see Table 4.

Table 4. Subjects’ answers on easiness to understand each documenting approach grouped by the context: general and the context of the experiment performed.

The survey results confirmed the results from the empirical study: Javadoc is the most understandable documentation approach. Moreover, ContractJDoc results are intermediate between JML and Javadoc. When evaluated individually for the provided interface, 73.1% of the Subjects considered the Javadoc-documented interface easy or very easy to understand and 60.3% considered the ContractJDoc interface easy or very easy to understand, whereas only 39.7% considered the JML interface easy or very easy to understand (see Table 5).

Table 5. Subjects’ answers to the individual evaluation of comprehensibility for each documentation approach.


In order to evaluate the comprehensibility of the documentation approaches over the Subjects’ experience levels, Table 6 groups the results for low, medium, and high experience for each language.

Table 6. Subjects’ answers on comprehensibility of each documentation approach grouped by experience level.


    Discussion
    Javadoc is the most understandable documentation approach. Moreover, ContractJDoc results are intermediate between JML and Javadoc. In addition, as we can see in Table 6, at all levels Javadoc is considered the most understandable documentation approach, as its accumulated result for Easy and Very easy exceeds that of the other approaches. Nevertheless, an interesting result arises when comparing the answers from Subjects with low and high experience: no Subject with a low experience level chose Difficult for Javadoc, whereas 5% of those with a high level did. In addition, Subjects with high experience considered ContractJDoc easier than JML, with an accumulated rate of Easy and Very easy of 76% against 57% for JML.

    Overall, this survey enabled us to confirm the results from the experiment: ContractJDoc is easier to understand than JML, but harder than Javadoc. Furthermore, the results showed Javadoc as the easiest approach with respect to comprehending the behavior of a documented interface.

    Threats to validity
    Concerning internal validity, all material for the survey is available only in English, whereas the native language of some Subjects is Portuguese. Therefore, the Subjects’ proficiency in English may have affected their comprehension of the behavior of the provided interface.

    Related to conclusion validity, the majority of our sample (84 Subjects – 59.15%) did not know contract-based languages, so the result of Javadoc being the most understandable approach with respect to the methods’ behavior is expected. In order to overcome this threat, we made the survey available to the JML community; however, the rate of answers was low.

    The order in which we display the documented interfaces on the survey form, the questions used for evaluating comprehensibility, the kind of questions used, and the absence of open questions can threaten construct validity. To deal with these threats, we ran a pilot before applying the survey and used its results to improve the survey structure as a whole.

    Regarding external validity, even though we received a satisfactory number of answers (142) to our survey, the results are not representative of the community of Java developers. In addition, we used only one data structure interface, stack, when asking about the comprehensibility of the interface behavior. In other domains with more complex structures, the results can vary considerably.
