Segmented Evolutionary Testing of Android Apps (EvoDroid)

EvoDroid is an evolutionary approach to system testing of Android apps. EvoDroid overcomes a key shortcoming of applying evolutionary techniques to system testing, namely the inability to pass on the genetic makeup of good individuals in the search. To that end, EvoDroid combines two novel techniques: (1) an Android-specific program analysis technique that identifies the segments of the code amenable to being searched independently, and (2) an evolutionary algorithm that, given such segments, performs a step-wise search for test cases reaching deep into the code.
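The step-wise search idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not EvoDroid's actual implementation: each segment is modeled as an integer target a test input must match, and the search fixes the genes of each reached segment before evolving inputs for the next one.

```python
import random

# Illustrative model only: segments are integer targets, a test case is a
# list of integers (one gene per segment). These are assumptions for the
# sketch, not EvoDroid's real representation.
SEGMENTS = [7, 3, 9]
POP_SIZE, GENERATIONS = 20, 200

def fitness(test, depth):
    """Count how many of the first `depth` segments the test reaches."""
    reached = 0
    for target, gene in zip(SEGMENTS[:depth], test):
        if gene != target:
            break
        reached += 1
    return reached

def evolve():
    """Step-wise search: lock in the genes of each reached segment,
    then search only the next segment's gene."""
    prefix = []  # genes fixed by previously reached segments
    for depth in range(1, len(SEGMENTS) + 1):
        pop = [prefix + [random.randrange(10)] for _ in range(POP_SIZE)]
        for _ in range(GENERATIONS):
            best = max(pop, key=lambda t: fitness(t, depth))
            if fitness(best, depth) == depth:
                prefix = best  # segment reached; keep its genetic makeup
                break
            # mutate only the gene for the segment being searched
            pop = [best[:-1] + [random.randrange(10)] for _ in range(POP_SIZE)]
    return prefix

test_case = evolve()
print(test_case)
```

Because each segment's gene is preserved once found, the genetic makeup of good individuals carries forward instead of being lost to crossover and mutation, which is the shortcoming the step-wise search is meant to address.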

A more detailed description of EvoDroid can be found in our publication below:
Riyadh Mahmood, Nariman Mirzaei, and Sam Malek. “EvoDroid: Segmented Evolutionary Testing of Android Apps.” In Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014), Hong Kong, November 2014.

Evaluation Results

Android Monkey is developed by Google and represents the state of practice in automated testing of Android apps; it sends random inputs and events to the app under test. Dynodroid is a recently published tool from researchers at Georgia Tech.

We selected 10 open-source apps to compare the line coverage achieved by EvoDroid, Monkey, and Dynodroid. We were not able to run Dynodroid on two of the subject apps, and thus do not report Dynodroid's results for those. As shown below, EvoDroid consistently achieves significantly higher coverage than both Monkey and Dynodroid. On average, EvoDroid achieves 47% and 27% higher coverage than Monkey and Dynodroid, respectively.

Line Code Coverage

Some of the reasons for not achieving complete coverage are unsupported emulator functions, such as the camera, and spawned asynchronous tasks that fail or do not finish before the test ends, and thus are not included in the coverage results. Other reasons include code for handling external events, such as receiving a text message; dependence on other apps, such as calendars and contact lists; and custom exception classes that are never encountered or thrown. Additionally, some of the applications contained dead code or unreachable test code, so the generated EvoDroid model was not fully connected. Indeed, in many of these apps, achieving 100% coverage is not possible regardless of the technique.

Synthetic Apps Experiment Setup

The limitations of the emulator, peculiarities of the third-party apps, and incomplete models made it very difficult to assess the characteristics of EvoDroid independently. In other words, it was not clear whether the observed accuracy and performance were due to the aforementioned issues, and thus an orthogonal concern, or due to fundamental limitations in EvoDroid’s approach to test generation. We therefore complemented our evaluation on real apps with a benchmark using synthetic apps.

To control the characteristics of the subjects (i.e., the apps under test), we developed an Android app generator that synthesizes apps with different levels of complexity for our experiments. To ensure the synthetic apps were representative of real apps, we first conducted an empirical study of 100 real-world apps chosen randomly from the open source repository F-Droid. We analyzed these apps according to four complexity metrics that could impact EvoDroid.
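As a sketch of how such a study might tabulate metrics over a corpus, the snippet below averages four hypothetical complexity metrics. The metric names, the AppModel structure, and the numbers are illustrative assumptions for demonstration, not the actual metrics or data from the study.

```python
from dataclasses import dataclass

# Hypothetical per-app metrics; the real study's four metrics may differ.
@dataclass
class AppModel:
    activities: int        # number of activities
    event_handlers: int    # number of registered event handlers
    max_branch_depth: int  # deepest nesting of conditional branches
    gui_inputs: int        # number of GUI input widgets

def complexity_profile(apps):
    """Average each metric over a corpus of app models."""
    n = len(apps)
    return {
        "activities": sum(a.activities for a in apps) / n,
        "event_handlers": sum(a.event_handlers for a in apps) / n,
        "max_branch_depth": sum(a.max_branch_depth for a in apps) / n,
        "gui_inputs": sum(a.gui_inputs for a in apps) / n,
    }

# Toy two-app corpus; values are made up for illustration.
corpus = [AppModel(3, 12, 4, 7), AppModel(5, 20, 6, 11)]
print(complexity_profile(corpus))
# → {'activities': 4.0, 'event_handlers': 16.0, 'max_branch_depth': 5.0, 'gui_inputs': 9.0}
```

A profile like this, computed over the 100 sampled real apps, gives the generator target values so that the synthetic apps fall within a realistic range of complexity.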

Synthetic Apps Results

We benchmarked EvoDroid against Android Monkey and Dynodroid in terms of code coverage and execution time.

Line Code Coverage

Execution Time (minutes)