Project Background
STATEMENT OF REGIONAL OR STATE WATER PROBLEM
In the fifty-five years since its initiation, workers at the University of Hawaii Water Resources Research Center (WRRC) have produced an unparalleled array of water-focused research products, ranging from technical reports and peer-reviewed publications to critical water-quality and geospatial datasets. The American Samoa Community College Land Grant (ASCCLG) has also produced a significant body of work on many facets of natural resources management in its thirty-eight years of operation. However, both collections are currently archived in disconnected and disparate locations. A centralized repository and access point for all research products generated by the WRRC is greatly needed to help implement the institution’s mission: “To promote understanding of critical state and regional water resource management and policy issues through research, community outreach, and public education”. Public dissemination of research products in an up-to-date electronic format is a top priority for achieving these goals. Other water resources research institutes throughout the country have developed web-based data portals (e.g., WERI, 2016). However, these existing portals are generally designed to host local research outputs rather than to aggregate data and products from multiple jurisdictions. This work is proposed as a pilot project to demonstrate the requirements for, and the utility of, a water resources institute data portal that incorporates research products from both WRRC and ASCCLG. If successful, the portal will provide a model for a system that could one day include data and research products from all of the island region WRRIP centers (Guam, Hawaii, Puerto Rico, and the U.S. Virgin Islands).
A web space and data portal that serves all island centers would not only provide a centralized location for downloading data, but would also facilitate the exchange of information between institutes whose researchers study similar topics in similar locations. In addition to making each institute’s products available to the other institutes and to the public, such a portal is expected to encourage cross-institute collaboration as researchers at different institutions become more familiar with each other’s research foci. This pilot project is an important first step for testing and demonstrating the feasibility of such a system.
NATURE, SCOPE, AND OBJECTIVES
The primary goal of the proposed work is to develop an open data portal that archives and publicly serves existing digitized, publicly available research products developed through the past and present efforts of UH WRRC and ASCCLG. The portal will enhance the end-user experience by improving the searchability of publications, reports, and data through intelligent annotation, that is, by associating content keywords with each product. A key challenge in making reports and other research products queryable lies in data and file annotation, which requires a significant amount of time and effort if performed manually. We propose to use Natural Language Processing (NLP) methods (Bhowmik, 2008; Vivek, 2019) to extract keywords from abstracts and thereby automate the creation of metadata for reports and publications. This will significantly improve the relevance of search results within the portal, and it will allow new products to be uploaded and subsequently queried without requiring ongoing staff time for annotation. Because this work is intended as a small-scale pilot project that could later be expanded to other centers in the island region, emphasis will also be placed on communicating with the other WRRIP centers and assessing their needs. From these interactions, we will identify system requirements for the data portal that will facilitate integrating products and data from these centers as well.
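To make the proposed keyword-annotation step concrete, the following is a minimal sketch of automated keyword extraction from abstracts. The proposal does not specify which NLP algorithm will be used, so TF-IDF scoring is substituted here purely as an illustrative stand-in. The function names (tokenize, extract_keywords), the short stop-word list, and the sample abstracts are all assumptions made for this example and are not part of the proposed system.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a production system
# would likely rely on a fuller list from an NLP library).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "of", "on", "or", "the", "to", "was", "were", "with", "this", "that",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two
    characters that are not stop words."""
    return [
        token
        for token in re.findall(r"[a-z]+", text.lower())
        if token not in STOP_WORDS and len(token) > 2
    ]


def extract_keywords(abstracts, top_n=5):
    """Return up to top_n TF-IDF-ranked keywords for each abstract."""
    tokenized = [tokenize(abstract) for abstract in abstracts]
    n_docs = len(tokenized)

    # Document frequency: the number of abstracts containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty abstracts
        scores = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically so the
        # output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    # Hypothetical abstracts, invented for illustration only.
    sample_abstracts = [
        "Groundwater recharge on Tutuila: groundwater modeling of recharge rates.",
        "Stream water quality monitoring and water quality sampling in Oahu streams.",
    ]
    for keywords in extract_keywords(sample_abstracts, top_n=3):
        print(keywords)
```

In a portal of this kind, the keyword list returned for each abstract could be stored as searchable metadata alongside the corresponding report or dataset, so that newly uploaded products become queryable without manual annotation.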