STRICT: Information Retrieval Based Search Term Identification for Concept Location

Abstract: During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts describe the change requirement in terms of various domain-related concepts. Software developers need to find appropriate search terms from those concepts so that they can locate the relevant places in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique--STRICT--that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques--TextRank and POSRank. These IR techniques determine a term's importance based on not only its co-occurrences with other important terms but also its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
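To make the co-occurrence idea in the abstract concrete, here is a minimal Python sketch of TextRank-style term scoring over a change request text. It is not the authors' implementation and it does not cover POSRank. The stop-word list, the co-occurrence window of 2, the damping factor of 0.85, and all function names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "on",
              "and", "when", "it", "that", "for", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words and short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS and len(t) > 2]


def build_cooccurrence_graph(tokens, window=2):
    """Link each term to the terms that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != term:  # no self-loops
                graph[term].add(other)
                graph[other].add(term)
    return dict(graph)


def text_rank(graph, damping=0.85, iterations=100, tolerance=1e-4):
    """Iteratively score terms: a term is important if its neighbours are."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * incoming
        converged = all(abs(new_scores[n] - scores[n]) < tolerance
                        for n in graph)
        scores = new_scores
        if converged:
            break
    return scores


def suggest_search_terms(text, top_k=5):
    """Return the top_k highest-scoring terms as a candidate search query."""
    scores = text_rank(build_cooccurrence_graph(tokenize(text)))
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


if __name__ == "__main__":
    request = ("Editor throws exception when saving file. "
               "Exception dialog hides editor. "
               "Debugger shows exception trace.")
    print(suggest_search_terms(request, top_k=3))
```

In this sketch a term gains score from every neighbour it co-occurs with, so terms that recur across several contexts of the request tend to rise to the top. A syntactic variant would presumably change only how the graph edges are built, but that part is not sketched here.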


Experimental Data

Baseline & Suggested Queries

We conducted experiments using 1,939 change requests from eight Java-based systems. From each request, six baseline queries are extracted, and our technique--STRICT--suggests one search query. The baseline queries and our suggested queries, along with their effectiveness (i.e., the rank of the first correct result), are provided as part of the experimental data.
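The effectiveness measure described above (the rank of the first correct result) can be aggregated into summary numbers such as Top-10 accuracy. The Python sketch below shows one plausible way to do that. The function names, the use of None for "no correct result retrieved", and the definition of an improved query (the suggested query ranks the first correct result strictly higher than the baseline query does) are assumptions for illustration, not definitions taken from the paper.

```python
import math


def top_k_accuracy(first_correct_ranks, k=10):
    """Fraction of queries whose first correct result appears within the top k.

    Each entry is a 1-based rank, or None if no correct result was retrieved.
    """
    if not first_correct_ranks:
        return 0.0
    hits = sum(1 for rank in first_correct_ranks
               if rank is not None and rank <= k)
    return hits / len(first_correct_ranks)


def improved_fraction(baseline_ranks, suggested_ranks):
    """Fraction of requests where the suggested query beats the baseline query.

    A missing rank (None) is treated as infinitely bad, so retrieving any
    correct result counts as an improvement over retrieving none.
    """
    if len(baseline_ranks) != len(suggested_ranks):
        raise ValueError("rank lists must cover the same change requests")
    if not baseline_ranks:
        return 0.0

    def as_number(rank):
        return math.inf if rank is None else rank

    improved = sum(1 for base, sugg in zip(baseline_ranks, suggested_ranks)
                   if as_number(sugg) < as_number(base))
    return improved / len(baseline_ranks)


if __name__ == "__main__":
    # Made-up ranks for four change requests, for demonstration only.
    baseline = [5, 12, None, 7]
    suggested = [1, 12, 30, 9]
    print(top_k_accuracy(suggested, k=10))
    print(improved_fraction(baseline, suggested))
```

The ranks in the usage example are invented to exercise the functions and do not come from the experimental data.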


STRICT is intended to be delivered as an Eclipse IDE plug-in (work in progress).

Related Publication(s)

Mohammad Masudur Rahman and C. K. Roy. STRICT: Information Retrieval Based Search Term Identification for Concept Location. In Proc. SANER, pp. 79--90.

Mohammad Masudur Rahman and C. K. Roy. TextRank based search term identification for software change tasks. In Proc. SANER, pp. 540--544.

© Masud Rahman, Computer Science, University of Saskatchewan, Canada.