Nonconformance between Programs and Contracts: A Study on C#/Code Contracts Open Source Systems

We evaluate our approach in 12 C#/Code Contracts real projects with respect to automatic detection and manual classification of nonconformances. Moreover, we investigate how practitioners write contracts in Code Contracts by analyzing

all 12 systems evaluated with ContractOk.

We make publicly available a replication package containing data for replication purposes and to support future studies. The package contains: (i) the source code of ContractOk tool with a runnable version of the tool; (ii) the source code of all experimental units used in this study; (iii) all test cases generated by ContractOk that enabled us to detect the 63 nonconformances reported in this page.

Definition

The goal of the first case study is to analyze one approach (implemented for ContractOk) for the purpose of evaluation with respect to detection and classification of nonconformances from the point of view of researchers in the context of contract-based programming. In particular, we address the following research questions:

Q1 How many nonconformances is the approach able to detect in real contract-based programs for a given time limit and which are the most common types and likely causes of those nonconformances?

We measure the number of detected nonconformances (#NCs, henceforth) in each system within a given test generation time limit, and analyze the frequency of nonconformances by types and likely causes.

Q2 What is the testing cost for detecting a nonconformance?

We measure and summarize metrics that define the complexity of the first failing test case, in terms of B and D.

The goal of the second case study is to discover how practitioners write contracts in Code Contracts for the purpose of evaluation with respect to kinds of contract clauses written from the point of view of researchers in the context of contractbased programming. In particular, these additional questions are defined:

Q3 Which are the most common contract types written by developers?

We measure the frequency of contract clauses (#CC) for each contract type in each system and discuss their differences.

Q4 Are there any relationship between contracts complexity and the number of detected nonconformances? And what is the relationship between contract type and nonconformance type?

We measure contract complexity (CCo) and relate this metric with #NC for each system, by using a correlation test. We also relate the most common contract type with the most common nonconformance type.

Q5 Which are the most common nonconformance types for the units with the most complex contracts?

We rank the top five systems for CCo and group the detected nonconformances in the respective unit according to the nonconformance type.

Q6 What is the relationship between contract type and number of nonconformances per contract clauses (nonconformance ratio)?

We discuss the relationship between contract type and number of nonconformances.

Planning

Experimental units

We evaluated ContractOk in 12 real C#/Code Contracts projects, summing up to 122 KLOC and 8 K Contract Clauses (KCC, henceforth). While AutoDiff is a library that provides fast, accurate and automatic differentiation (computes derivative/gradient) of mathematical functions: http://autodiff.codeplex.com/. Boogie is an Intermediate Verification Language (IVL) for describing proof obligations to be discharged by a reasoning engine, typically an SMT solver: http://boogie.codeplex.com/. Contractor is a tool developed to construct contract specifications with type-state information which can be used for verification of client code. Contractor uses and extends Code Contracts to provide stronger contract specifications: http://lafhis.dc.uba.ar/contractor/contractor.net-web/. DotNet Component Oriented Programming (DotNetCOP for short) is an example project of component oriented programming in .Net: https://github.com/marcel-valdez/dot_net_cop_example. DotNetExtensions contains some dot net extensions to the dictionary type, some numerical convenience functions, a couple of combinators for ArgMax: https://github.com/rebcabin/DotNetExtensionsImproved. DRail is a project related to parsing routes and tracks: https://github.com/david-laundav/d-rail. EuroManager is a football manager on-line with graphical simulation: https://euromanager.codeplex.com/. Frost is a hardware accelerated drawing and composition library written in C#: https://github.com/fealty/Frost. On the other hand, MonoMobileFacebook is a Facebook C# SDK for MonoMobile: https://github.com/anujb/MonoMobile.Facebook. NeuroFlowOverhaul is a Workflow Foundation based Machine Learning Algorithm Library by using GPGPU for computation back-end: https://github.com/unbornchikken/Neuroflow. ProgrammingWithCC is a set of exercises using C# and Code Contracts developed by Henrik Nørgaard: https://github.com/Pimple/Visual-Studio/tree/master/2015-03-26\%20Programming\%20with\%20Code\%20Contract. Finally, Yandex is the library that provides strong-typed and modern .NET interface to the JSON-based Yandex.Direct API with support of all available authentication scenarios: https://yandexdirect.codeplex.com/.

Table 1 makes available the program versions we used in the evaluation of ContractOk and a summary of all units in terms of lines of code and contract clauses.

Table 1. Experimental units characterization in terms of lines of code (LOC)* and contract clauses (CC)* - the contract clauses are also presented grouped by type.

ExperimentalUnitsCodeContracts

*The program used for counting lines of code and contract clauses is available online: ContractCounter.

Study Setup

We detect nonconformances with ContractOk. After detection, three researchers analyzed each nonconformance, and, after discussion, assigned a likely cause. Figure 1 presents this procedure: in the first step, we run ContractOk over each contract-based program in a given time limit (the time limit is the time used for tests generation); then, occurs the manual classification of each detected nonconformance: first, each researcher individually examines the source code with contracts and performs a classification of each nonconformance; then, there is a discussion session in order to establish the final classification of likely cause for each nonconformance.

Figure 1. First, we use ContractOk for detecting nonconformances in each system. Then, we perform the manual classification of likely causes for each detected nonconformance.

We performed the case study on a 3.6 GHz core i7 with 16 GB RAM running Windows 8.1 Enterprise, backed by Visual Studio Community 2015, and Code Contracts 1.9.10714.2, using time limits varying from 60s to 600s (for stabilization on the number of nonconformances detected, we applied limits until 1,100s for Boogie unit -- see Figure 2).

Figure 2. Number of nonconformances detected on Boogie with ContractOk over time limits. After 600s (10 minutes), the number of nonconformances detected stabilizes on 17.

In addition, we collected two metrics for each nonconformance: breadth (B) and depth (D). The first is related to the number of calls into a test method realized until the nonconformance exposure. This metric is defined in Equation 1; the calls(tm) returns the sequence of method calls into the test method received as parameter. The second, is related to the depth required to find each nonconformance the internal calls performed until the contract have been violated. This metric is defined in Equation 2. Given the method that corresponds to the position in which the nonconformance was revealed, if the nonconformance is into the body of this method D receives 1, otherwise, its value is recursively increased until the method that reveals the nonconformance.

R Script

The R script and data of this study is available online: RScript, Contract types data, Correlation data.

Results

Table 2 presents a summary of the case study on nonconformances detection and classification. The table presents, for each experimental unit, the following information: LOC - lines of code, CC - the number of contract clauses, the metrics breadth (B) and depth (D). Furthermore, the table also presents the CCo and NCsD metrics; the first is the ratio between contract clauses and lines of code; the last is the ratio between the number of nonconformances and the number of contract clauses. Concerning to the tests generated for each experimental unit, columns GT and TC shows the time given for tests generation and the average of the number of test cases generated in a given time limit, respectively. Column NCs gives the number of nonconformances detected in a given time limit. Regarding the nonconformances as a whole, columns PRE to EVAL presents the detected nonconformances in number and grouped by type -- PRE is an acronym for precondition, POST for postcondition, and INV for invariant. The group of columns Manual Classification, presents the manual likely causes assigned for the nonconformances detected grouped by likely cause -- following the classification model presented here. In this table we use the following acronyms: WPre for Weak Precondition, SPre for Strong Precondition, WPost for Weak Postcondition, SPost for Strong Postcondition, SInv for Strong Invariant, and CodeErr for code error.

SummaryCC

Now, we present a detailed result for each research question of this study: For Q1 and Q2, Table 3 presents the whole set of nonconformances detected, and the metrics B and D.

Nonconformances-B--D-metrics--CodeContracts

ContractOk detected 63 nonconformances in 12 C#/Code Contracts systems. The precondition type was the most recurrent (28) followed by postcondition (25). Regarding likely causes, Weak precondition (40) was the most frequent likely cause, followed by Code error – 18. All test generated for discovering those nonconformances are available online: Code Contracts.

Average breadth ranges from 2.00 (DRail) to 14.00 (AutoDiff); average depth varies from 1.50 (ProgrammingWithCC) to 11.00 (MonoMobileFacebook). Moreover, for 10 out of 12 systems, precondition clauses are the most common contract clause. On the other hand, invariant clauses are less common, they are present in only six systems. B and D are in general greater than 1: from 63 nonconformances, B is greater than 1 in 62 cases, whereas D is greater than 1 in 54 cases.

The complete manual classification of the nonconformances with an explanation of each value for B and D are available in our technical report: "A Baseline for Classifying Nonconformances in C#/Code Contracts Programs". This tech report also presents a set of suggestions for fixing the detected nonconformances.

In Figure 3, we present the contract types grouped by types.

Figure 3. Contract clauses grouped by type in all Code Contracts systems evaluated.

Developers of Code Contracts systems appearently prefer to use pre- and postcondition in comparison with invariant clauses (see Figure 3). Moreover, the number of precondition clauses in the projects is twice bigger than the number of postconditions.

Regarding the relationship between CCo and number of nonconformances detected, Figure 4 presents a plot correlating CCo metric and number of nonconformances detected for each experimental unit.

Figure 4. Graphic relating CCo with the number of nonconformances detected.

By performing correlation tests of Spearman and Kendall, both tests indicate a weak positive correlation between contract complexity (measured by means of CCo metric) and the number of nonconformances detected. According to these results, when the CCo metric increases there is a small increment in the number of nonconformances detected. Therefore, more complex contracts (measured by means of CCo) imply in a bigger number of nonconformances detected.

Table CCo--NonconformanceType relates the CCo metric with the most common nonconformance type in each system. According to this table, precondition is the most common nonconformance type for three out of the top five systems in terms of CCo metric. Table NC-ratio--ContractType shows the ratio between the number of nonconformances and the number of contract clauses (#NC/#CC) and the most common contract type for each system. Precondition is the contract type more common for three out of five the top-five systems in terms of nonconformance ratio: #NC/#CC. Finally, Table Contract-and-Nonconformance--type presents the relationship between contract and nonconformance type for each considered system. Systems that present more postcondition clauses usually have more postcondition nonconformances – indicating the postconditions may be strong to be satisfied. Considering the systems with more precondition clauses, more precondition clauses implies more precondition nonconformances – showing that those preconditions might be strong to be satisfied by the method callers.

RelationsCC

Discussion

Q1. ContractOk detected 63 nonconformances. From the 63 detected nonconformances in Code Contracts systems, most are precondition – 28 (44%), and 25 are postcondition (39%). These numbers indicate that violations occur both at the entry and the exit of operations, even if the cause is not in the operation itself. There are several possible explanations: specifiers expect certain behavior to which the code fails to comply, or a chain of previous calls fails to avoid certain undesirable states.

Regarding likely causes, our manual classification found Weak precondition as the overall main cause (with 40 instances). In this case, a method allows values that should be denied for the correct execution of the method. Weak precondition is the main likely cause due to the complexity to specify preconditions; once the specifier ignores the clients of a procedure, an overly strong precondition denies access to several clients; in contrast, given an overly weak precondition, several clients are unable to get the expected result. This represents an important tradeoff for contract-based programs. Nevertheless, Code error is also common (18 instances). This result might be due to the disparity in abstraction between programs and contracts, in which the coder may misunderstand a contract specification, or even contracts may be complex to implement; or there might as well be an evolution mismatch.

Q2. breadth and depth are, in average, higher than one, for all systems, which suggest that a typical nonconformancerevealing test present sequences of at least two calls (horizontal complexity) and more than one indirect call to a problematic method (vertical complexity). After a manual analysis of the test cases revealing nonconformances with the purpose of getting the minimal test case that is able to detect the nonconformance, we get the highest average values in AutoDiff (14.00) and MonoMobileFacebook (11.00) for breadth and depth, respectively. Conversely, the smallest mean for breadth (2.00) was obtained on DRail and for depth (1.50) on ProgrammingWithCC. In summary, ContractOk was able to detect nonconformances in 12 with average breadth of 6.14 and depth of 4.05.

Q3. Developers apparently prefer to write pre- and postcondition clauses in comparison with invariants. For further evidence, we performed a paired t test, getting a p-value of 0.002, allowing us to reject the null hypothesis samples do not differ, in average, with a confidence level of 95%. As a conclusion, the use of precondition clauses in the studied systems outperform the use of postcondition clauses, which corroborates with Estler et al.: there is no preference for certain contract type, however preconditions, tend to have more clauses than postconditions.

Q4. We found evidence that when contract complexity grows, the number of nonconformances detected also grows. This result corroborates with Polikarpova et al.: more contract clauses allow the detection of more nonconformances.

Regarding the relationship between contract and nonconformance type, systems with postconditions, mostly, tend to present, more often, postcondition and invariant nonconformances – indicating the postconditions may be harder to satisfy, however they are often insufficient to avoid invariant errors. Considering systems in which preconditions predominate, more numerous preconditions imply more precondition nonconformances – showing that those preconditions might be strong to be satisfied by the method callers. This result may be related to the trade-off in establishing preconditions: if the precondition is permissive, the clients may be not able to get the expected value, however, a restrictive precondition can rule out valid clients. Moreover, since precondition problems occur when a method internally calls other methods, the place in which the preconditions stay may not be the most suitable or the body of more external methods should deal with the values received as parameters in order to maintain the preconditions of the called methods.

Q5. Precondition nonconformances are preponderant in three of the top five complex-contract systems. This result is expected because the precondition clauses are the most common and in accordance to the results of Polikarpova et al., more contract clauses allow the detection of more nonconformances. And, as the preconditions act as filters of input values, they are able to avoid calls to methods in situations in which a postcondition or an invariant clause could be violated.

Q6. Postcondition clauses were the kind of contract more common in systems with the highest nonconformance ratios (ProgrammingWithCC and DotNetComponentOrientedProgramming). This is expect, since preconditions are like a filter for avoiding method executions with forbidden values, reducing the nonconformance ratio. Therefore, more postcondition clauses imply more nonconformances per contract clause.