RACK: Automatic Reformulation of Query for Code Search using Crowdsourced Knowledge

Abstract: Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between the query and the source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, according to existing studies, preparing an effective search query is both challenging and time-consuming for developers. In this article, we propose a novel query reformulation technique--RACK--that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher, respectively, than those of the state of the art. Reformulations using our suggested API classes improve 64% of the natural language queries, and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.

 

Updated Replication Package: https://github.com/masud-technope/RACK-Replication-Package

 

Replication Package

Plug-in overview
  • Type: Eclipse IDE Plug-in
  • System Requirement: Eclipse Kepler 4.3.2 or later (tested)
  • Prototype: RACK update site
RACK Installation
  1. Copy the update site URL.
  2. In the Eclipse IDE, open Help > Install New Software and paste the URL into the Work with field.
  3. Installation requires the IDE to restart.
  4. Once installation succeeds, the RACK icon appears in the main menu.
  5. If you do not have the RACK server installed, check the Use remote option for code search.

RACK User Guide

Note: The RACK server may occasionally be down. If you want to try the plug-in, please contact Masud Rahman. Thanks for your interest.

We collected code search queries from three popular programming tutorial sites -- KodeJava, Java2s and JavaDB. In each of the files listed below, every odd-numbered line contains a query and the following (even-numbered) line contains its relevant API names.

  • NL-Queries & Ground Truth
    • NL-queries and their oracle (i.e., ground-truth API classes)
  • Suggested Queries by RACK
    • Suggested API classes by RACK (for noun+verb NL keywords)
    • Suggested API classes by RACK (for noun keywords)
    • Suggested API classes by RACK (for verb keywords)
    • Suggested API classes by RACK (KAC)
    • Suggested API classes by RACK (KKC)
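
The odd/even line layout described above can be read with a few lines of Java. This is only a minimal sketch, not part of the replication package: the class name OracleParser is hypothetical, and it assumes that the API names on each even-numbered line are separated by whitespace and that the file has no blank lines between pairs. Adjust the delimiter if the actual files differ.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pairs each query (odd line) with its API names (even line).
final class OracleParser {

    /** One query together with its ground-truth API class names. */
    static final class Entry {
        final String query;
        final List<String> apis;

        Entry(String query, List<String> apis) {
            this.query = query;
            this.apis = apis;
        }
    }

    // Assumption: API names on the even line are whitespace-separated.
    // A trailing query without an API line is ignored.
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            String query = lines.get(i).trim();
            String apiLine = lines.get(i + 1).trim();
            List<String> apis = apiLine.isEmpty()
                    ? new ArrayList<>()
                    : Arrays.asList(apiLine.split("\\s+"));
            entries.add(new Entry(query, apis));
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Read the file given as the first argument, or fall back to made-up sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("How to send email in Java?", "MimeMessage Transport");
        for (Entry e : parse(lines)) {
            System.out.println(e.query + " -> " + e.apis);
        }
    }
}
```

The sample query and API names in main are illustrative values only, not entries taken from the published ground truth.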

git clone https://github.com/masud-technope/RACK-Replication-Package.git


Download from Google Drive

Please contact Masud Rahman for information related to the experiments.

Tool Installation:
  1. Download rack-exec.jar, models, stopword and database from Google Drive.
  2. Keep all four items in the same directory.
  3. (Optional) Download the sample files, sample-queries and sample-output, and keep them in the same directory.

NL-Query Preparation:
  1. Put each code search query on its own line. If you have N queries, the query file should have N lines.
  2. Avoid special symbols in the query texts.
  3. Keep the query file in the tool's installation directory
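
The preparation steps above could be automated along the following lines. This is a hedged sketch rather than part of RACK: the class QueryFileWriter is hypothetical, and it assumes that "special symbols" means anything other than ASCII letters, digits and spaces, which the source does not define precisely.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that turns raw queries into query-file lines.
final class QueryFileWriter {

    // Assumption: every character other than a letter, digit or space is "special".
    // Such characters (including line breaks) become spaces, so each query stays on one line.
    static String sanitize(String query) {
        return query.replaceAll("[^A-Za-z0-9 ]", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
    }

    // One cleaned query per line; queries that end up empty are skipped.
    static List<String> toLines(List<String> queries) {
        List<String> lines = new ArrayList<>();
        for (String q : queries) {
            String clean = sanitize(q);
            if (!clean.isEmpty()) {
                lines.add(clean);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = toLines(Arrays.asList(
                "How to send email in Java?",
                "Parse a date (yyyy-MM-dd) from a string"));
        if (args.length > 0) {
            // Write to the path given as the first argument, e.g. a file in the tool's directory.
            Files.write(Paths.get(args[0]), lines, StandardCharsets.UTF_8);
        } else {
            lines.forEach(System.out::println);
        }
    }
}
```

Pass the target path as the first argument to write the file into the tool's installation directory; without an argument the cleaned queries are only printed.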

Running RACK

Execute the following command with your custom parameters:

java -jar rack-exec.jar -queryfile ./sample-queries.txt -outputfile ./sample-output.txt -K 5
  • -queryfile : the file containing the NL-queries, one query per line
  • -outputfile : the file that receives the API names suggested by RACK
  • -K : the number of top relevant API names to return
Once the tool finishes, you can compare the suggested API names with the ground truth provided above.
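
One way to carry out that comparison is sketched below. It is an illustration under assumptions, not the evaluation code used in the paper: the class TopKEvaluation is hypothetical, the suggestions and ground truth are assumed to be already loaded into memory (the format of the output file is not described here), and the metric definitions are the common textbook ones, which may differ in detail from those in the article.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing ranked suggestions against ground-truth API names.
final class TopKEvaluation {

    // True if at least one of the first k suggestions appears in the ground truth.
    static boolean hitAtK(List<String> suggested, Set<String> truth, int k) {
        int limit = Math.min(k, suggested.size());
        for (int i = 0; i < limit; i++) {
            if (truth.contains(suggested.get(i))) {
                return true;
            }
        }
        return false;
    }

    // Fraction of the ground-truth API names found among the first k suggestions.
    static double recallAtK(List<String> suggested, Set<String> truth, int k) {
        if (truth.isEmpty()) {
            return 0.0;
        }
        int limit = Math.min(k, suggested.size());
        Set<String> found = new HashSet<>(suggested.subList(0, limit));
        found.retainAll(truth);
        return (double) found.size() / truth.size();
    }

    public static void main(String[] args) {
        // Made-up values for illustration; not actual RACK output.
        List<String> suggested = Arrays.asList("MimeMessage", "Session", "Properties", "File", "Transport");
        Set<String> truth = new HashSet<>(Arrays.asList("MimeMessage", "Transport", "InternetAddress"));
        System.out.println("hit@5    = " + hitAtK(suggested, truth, 5));
        System.out.println("recall@5 = " + recallAtK(suggested, truth, 5));
    }
}
```

Averaging hitAtK over all queries gives a Top-K accuracy figure, and averaging recallAtK gives a mean recall, in the same spirit as the numbers reported in the abstract.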

[Figure: snapshot of RACK's run]

RACK Library Installation:
  1. Download rack-exec.jar, models, stopword and database from Google Drive.
  2. Add rack-exec.jar to the CLASSPATH of your Java project.
  3. Add models, stopword and database to the home directory of your Java project.
  4. Note that database contains the NL term-to-API mappings extracted from Stack Overflow questions and answers.

Accessing RACK Library (Sample Code):
	
	Library name:  ca.usask.cs.srlab.rack.server
	
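	// Natural language query for code search
	String query = "How to send email in Java?";
	// Number of top relevant API names to return
	int TOPK = 5;
	CodeTokenProvider ctProvider = new CodeTokenProvider(query, TOPK);
	// Print the API classes recommended by RACK for the query
	System.out.println(ctProvider.recommendRelevantAPIs());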

Related Publication(s)


@ARTICLE{emse2018masud,
author={Rahman, M. M. and Roy, C. K. and Lo, D.},
journal={EMSE},
title={Automatic Reformulation of Query for Code Search using Crowdsourced Knowledge},
year={2018},
pages={52}
}

@INPROCEEDINGS{saner2016masud,
author={Rahman, M. M. and Roy, C. K. and Lo, D.},
booktitle={Proc. SANER},
title={{RACK}: {A}utomatic {API} {R}ecommendation using {C}rowdsourced {K}nowledge},
year={2016},
pages={349--359}
}

@INPROCEEDINGS{icse2017masud,
author={Rahman, M. M. and Roy, C. K. and Lo, D.},
booktitle={Proc. ICSE},
title={RACK: Code Search in the IDE using Crowdsourced Knowledge},
year={2017},
pages={51--54}
}



© Masud Rahman, Computer Science, University of Saskatchewan, Canada.