Dr. Michael Decker’s proposed software infrastructure, srcDiff (a name derived from Source Code Differencer), identifies changes made to code more accurately and comprehensively than existing methods by modeling a programmer’s view of a software change.
The three-year project is being funded by a $750,000 grant from the National Science Foundation.
Decker, an associate professor in the BGSU Computer Science Department, is helping to develop the infrastructure and soliciting feedback through community outreach, which is a substantial focus of the project.
“The unique part about this grant is its focus on community outreach,” Decker said. “We’re developing open-source software that will enable other researchers and industry practitioners to further their research or apply it to their projects or industry.”
BGSU has been a pioneer in computer science for more than 50 years, leading the way as Ohio’s first public university with an undergraduate computer science program.
The differencing process
Developers constantly change code to improve functionality, fix bugs, add new features, merge software and detect clones, among many other things.
Software developers need to be able to find and understand what has changed within the code. To do that, they use a process called differencing, which compares two files side by side to determine how they differ.
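To make the idea of differencing concrete, here is a minimal sketch using the `difflib` module from Python's standard library. It shows a conventional line-based comparison, not srcDiff itself; the two file versions and the `line_diff` helper are invented for illustration.

```python
import difflib

# Two hypothetical versions of the same file, invented for this example.
old_version = """def total(prices):
    result = 0
    for p in prices:
        result += p
    return result
"""

new_version = """def total(prices, tax=0.0):
    result = 0
    for p in prices:
        result += p
    return result * (1 + tax)
"""


def line_diff(old: str, new: str) -> list[str]:
    """Return a unified, line-based diff between two versions of a text."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Lines starting with "-" were removed; lines starting with "+" were added.
    print("\n".join(line_diff(old_version, new_version)))
```

A line-based tool like this reports that two lines were replaced, but it leaves the reader to work out that a parameter was added and the return expression changed.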
“For example, the code was working right, or you knew it worked at some point,” Decker said. “Then changes are made to the system, and now that code no longer works. Something must have changed to break that code.”
The challenge for developers, Decker said, is that current differencing tools rely on the mechanical property of edit distance alone. Edit distance measures how dissimilar two items, such as two sequences of characters, are, typically by counting the minimum number of edits needed to turn one into the other.
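As an illustration of edit distance, the sketch below computes the Levenshtein distance between two strings, one common definition that counts single-character insertions, deletions and substitutions. The article does not say which variant existing tools or srcDiff use, so treat this as an example of the general concept only; the `edit_distance` function name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute, or keep on a match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The number says how far apart two sequences are, but nothing about what the change means to a programmer, which is the gap the article describes.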
Decker said srcDiff uses edit distance in combination with differencing rules that contain a developer’s domain knowledge, increasing its efficiency and effectiveness in identifying what changes have occurred.
The focus on producing human-centric differences reduces the burden researchers and practitioners often face in obtaining, analyzing and processing software changes.
“The infrastructure we’re developing helps people more easily understand what changes occurred in the system,” Decker said. “It produces a more accurate and human-understandable difference of changes. The approach is very scalable and can be applied to large software systems.”
Novel feature
One of the infrastructure’s novel features is its ability to identify and report changes to nonexecutable parts of code, including white space and comments developers include to explain changes they’ve made.
Typical differencing tools capture only changes to the executable code, which directs the computer to perform various functions.
“There’s a lot of meaning in the white space of code, and the comments are very important for a developer to understand the code and the work done on that body of code,” Decker said. “Other tools discard all the white space and comments. They are only differencing on the executable parts. The srcDiff infrastructure preserves all of that.”
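The sketch below illustrates what is lost when a comparison looks only at executable code. It uses Python's standard `tokenize` module to drop comments and blank lines before comparing two versions of a snippet. This is not how srcDiff or any particular tool works, since the article does not describe their internals; the sample sources and the helper names `executable_tokens` and `same_executable_code` are hypothetical.

```python
import io
import tokenize

# Token types that carry only comments or blank-line layout.
IGNORED = {tokenize.COMMENT, tokenize.NL}

# Hypothetical snippets: new_src changes only a comment, a blank line and spacing.
old_src = """def area(w, h):
    # width times height
    return w*h
"""

new_src = """def area(w, h):
    # rectangle area: width times height

    return w * h
"""

# changed_src alters the executable code itself.
changed_src = """def area(w, h):
    # width times height
    return w+h
"""


def executable_tokens(source: str) -> list[tuple[int, str]]:
    """Tokenize Python source, discarding comments and blank lines.
    Spacing inside a line is not part of any token, so it is ignored too."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.type, tok.string) for tok in tokens if tok.type not in IGNORED]


def same_executable_code(old: str, new: str) -> bool:
    """True when two versions differ at most in comments and white space."""
    return executable_tokens(old) == executable_tokens(new)


if __name__ == "__main__":
    print(old_src != new_src)                          # the text did change
    print(same_executable_code(old_src, new_src))      # but not the executable tokens
    print(same_executable_code(old_src, changed_src))  # here the code itself changed
```

A comparison built this way would report nothing for the first pair, even though a developer rewrote the comment and reformatted the line. That is the kind of information the article says srcDiff preserves.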
In addition to developing and refining the srcDiff infrastructure, Decker is building a support library and soliciting feedback through community workshops, client meetings and conferences.