STRICT: Information Retrieval Based Search Term Identification for Concept Location

Abstract: During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts describe the change requirement in terms of various domain-related concepts. Software developers need to find appropriate search terms from those concepts so that they can locate the relevant places in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique--STRICT--that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques--TextRank and POSRank. These IR techniques determine a term's importance based on not only its co-occurrences with other important terms but also its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
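To make the co-occurrence idea concrete, here is a minimal TextRank-style sketch in Java. It is not STRICT's implementation: the class name, the window size of 2, the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration, and it covers neither POSRank nor any preprocessing such as stop-word removal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a generic TextRank-style ranking over a term co-occurrence graph.
class TextRankSketch {

    // Connects two distinct terms when they occur within `window` positions of each other.
    static Map<String, Set<String>> buildGraph(List<String> terms, int window) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String term : terms) {
            graph.putIfAbsent(term, new LinkedHashSet<>());
        }
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size() && j < i + window; j++) {
                String a = terms.get(i);
                String b = terms.get(j);
                if (!a.equals(b)) {
                    graph.get(a).add(b);
                    graph.get(b).add(a);
                }
            }
        }
        return graph;
    }

    // Each term repeatedly receives a share of the scores of its neighbours,
    // so a term is important when it co-occurs with other important terms.
    static Map<String, Double> rank(Map<String, Set<String>> graph, double damping, int iterations) {
        Map<String, Double> score = new HashMap<>();
        for (String term : graph.keySet()) {
            score.put(term, 1.0);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String term : graph.keySet()) {
                double sum = 0.0;
                for (String neighbour : graph.get(term)) {
                    sum += score.get(neighbour) / graph.get(neighbour).size();
                }
                next.put(term, (1 - damping) + damping * sum);
            }
            score = next;
        }
        return score;
    }

    // Returns up to k terms with the highest scores (assumed parameters: window 2, damping 0.85).
    static List<String> topTerms(List<String> terms, int k) {
        Map<String, Double> score = rank(buildGraph(terms, 2), 0.85, 50);
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> Double.compare(score.get(b), score.get(a)));
        return ranked.subList(0, Math.min(k, ranked.size()));
    }
}
```

Calling `TextRankSketch.topTerms(terms, 5)` on the tokenized text of a change request returns the five best-connected terms under these assumptions.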


Replication Package

Baseline & Suggested Queries

We conducted experiments using 1,939 change requests from 8 Java-based systems. From each request, 6 baseline queries are extracted, and our technique--STRICT--also suggests one search query. The baseline queries and our suggested queries along with their effectiveness (i.e., rank of the first correct result) are given below:

Note: The detailed dataset can also be found in this Google Drive folder.

Tool Installation:
  1. Download strict-exec.jar, models and pp-data from Google Drive.
  2. Keep all three items in the same directory.
  3. (Optional) Download the sample folders, change-requests and suggested-queries, and keep them in the same directory.

Change Request Preparation:
  1. Extract title and description from a bug report, and store them in a text file.
  2. The first line of the text file should be the title, and the remaining lines should contain the description text from the bug report. See change-requests for examples.
  3. Each change request (i.e., bug report) should have an individual text file.
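
The preparation steps above can be sketched in Java as follows. This is a minimal illustration, not part of STRICT: the class and method names are hypothetical, and naming each file after the request ID (e.g., 33888.txt) is an assumption based on the sample path used elsewhere on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that stores one change request per text file.
class ChangeRequestWriter {

    // Builds the file content: title on the first line, description on the remaining lines.
    static String format(String title, String description) {
        // Collapse any line breaks inside the title so that it occupies exactly one line.
        String oneLineTitle = title.replaceAll("\\s*\\R\\s*", " ").trim();
        return oneLineTitle + "\n" + description.trim() + "\n";
    }

    // Writes the request to <requestDir>/<requestId>.txt (assumed naming) and returns the path.
    static Path write(Path requestDir, String requestId, String title, String description) {
        try {
            Files.createDirectories(requestDir);
            Path file = requestDir.resolve(requestId + ".txt");
            Files.write(file, format(title, description).getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = write(Paths.get("change-requests"), "33888",
                "33888 Enhance abbreviation options for logger and class layout pattern converters",
                "Description text copied from the bug report goes here.");
        System.out.println("Wrote " + file);
    }
}
```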

Running STRICT

Execute the following command with your custom parameters:

java -jar strict-exec.jar -requestdir ./change-requests -outputdir ./suggested-queries -K 5
  • -requestdir : the directory containing the change requests
  • -outputdir : the directory for storing the suggested queries from the requests
  • -K : the number of top search terms to be returned
Once the tool is done, you can execute the queries against a source code corpus with the help of a search engine (e.g., Lucene).
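
If the tool needs to be launched from Java code (for example, inside a batch experiment), the command above can be assembled with ProcessBuilder. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, only the three documented flags are used, the working directory is assumed to contain strict-exec.jar, models and pp-data, and nothing is assumed about the meaning of the tool's exit code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher that wraps the documented command line.
class StrictLauncher {

    // Assembles: java -jar strict-exec.jar -requestdir <dir> -outputdir <dir> -K <n>
    static List<String> buildCommand(String requestDir, String outputDir, int topK) {
        if (topK <= 0) {
            throw new IllegalArgumentException("K must be positive");
        }
        return Arrays.asList("java", "-jar", "strict-exec.jar",
                "-requestdir", requestDir,
                "-outputdir", outputDir,
                "-K", String.valueOf(topK));
    }

    // Starts the process, forwards its console output, and waits for it to finish.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = buildCommand("./change-requests", "./suggested-queries", 5);
        System.out.println(String.join(" ", command));
        int exitCode = run(command);
        System.out.println("Process exited with code " + exitCode);
    }
}
```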


STRICT Library Installation:
  1. Download strict-exec.jar, models and pp-data from Google Drive.
  2. Add strict-exec.jar to the CLASSPATH of your Java project.
  3. Add models and pp-data to the home directory of your Java project.

Accessing STRICT Library (Sample Code):
	Library name:  ca.usask.cs.srlab.strict
	String title = "33888 Enhance abbreviation options for logger and class layout pattern converters";
	String reportContent = ContentLoader.loadFileContent("./change-requests/33888.txt");
	int TOPK = 10;
	SearchTermProvider searchTermProvider = new SearchTermProvider(title, reportContent, TOPK);
	String keywords = searchTermProvider.provideSearchTerms();
	System.out.println("Keywords:" + keywords);

Related Publication(s)

Mohammad Masudur Rahman and C. K. Roy. STRICT: Information Retrieval Based Search Term Identification for Concept Location. In Proc. SANER, pp. 79--90.

Mohammad Masudur Rahman and C. K. Roy. TextRank based search term identification for software change tasks. In Proc. SANER, pp. 540--544.


© Masud Rahman, Computer Science, University of Saskatchewan, Canada.