The Data Recovery Service in NoSQL

Authors: Chia-Ping Tsai, Hung-Chang Hsiao, and Yu-Chen Lai

Publication: IEEE International Conference on Big Data (IEEE BigData), Dec. 17-20, 2022, Osaka, Japan (IEEE's premier conference on big data)

Abstract:

Not Only SQL (NoSQL) is a critical technology that is scalable and provides flexible schemas, thereby complementing existing relational database technologies. Although NoSQL is flourishing, present solutions lack features that enterprises require for mission-critical applications. In this paper, we explore solutions to the data recovery problem in NoSQL. Data recovery for a database table entails either restoring the table to a prior state or replaying the operations (inserts/updates) applied to the table during a given period in the past. Recovery of NoSQL database tables enables applications such as failure recovery, analysis of historical data, debugging, and auditing. We first identify the design and implementation issues of data recovery for NoSQL databases, including recovery time, fault tolerance, scalability, memory constraints, software compatibility, and quality of recovery. In particular, our study focuses on columnar NoSQL databases. We then propose and evaluate four solutions to the data recovery problem in NoSQL, each with its own pros and cons. We implement our solutions on Apache HBase, a popular NoSQL database in the Hadoop ecosystem that is widely adopted by industry, and benchmark the implementations extensively with an industrial NoSQL benchmark in real environments. Moreover, our research findings and implementations have been contributed to and integrated into Apache HBase, and are distributed with it worldwide.
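To make the two recovery modes concrete (restoring a table to a prior state versus replaying the operations from a past time window), here is a minimal sketch over a toy, in-memory table that keeps every timestamped version of each row. It is loosely inspired by the timestamped cell versions a columnar store such as HBase retains. It is NOT the HBase API and NOT one of the paper's four solutions. `VersionedTable`, `Mutation`, `restore_as_of`, and `replay` are hypothetical names chosen for illustration, and deletes are modeled as tombstone entries.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Mutation:
    """One write to the toy table: a put (value is a string) or a delete (value is None)."""
    row: str
    timestamp: int
    value: Optional[str]  # None marks a delete (a tombstone)


class VersionedTable:
    """Hypothetical in-memory table that keeps every version of every row
    in an append-only log (an illustrative assumption, not HBase internals)."""

    def __init__(self) -> None:
        self.log: List[Mutation] = []

    def put(self, row: str, value: str, timestamp: int) -> None:
        self.log.append(Mutation(row, timestamp, value))

    def delete(self, row: str, timestamp: int) -> None:
        self.log.append(Mutation(row, timestamp, None))

    def restore_as_of(self, t: int) -> Dict[str, str]:
        """Rebuild the table state at time t: for each row keep the newest
        mutation with timestamp <= t; rows whose newest mutation is a delete vanish."""
        latest: Dict[str, Mutation] = {}
        for m in self.log:
            if m.timestamp <= t:
                cur = latest.get(m.row)
                # On equal timestamps, the later log entry wins.
                if cur is None or m.timestamp >= cur.timestamp:
                    latest[m.row] = m
        return {row: m.value for row, m in latest.items() if m.value is not None}

    def replay(self, start: int, end: int) -> Iterator[Mutation]:
        """Yield the mutations applied in the half-open window [start, end), oldest first."""
        window = [m for m in self.log if start <= m.timestamp < end]
        return iter(sorted(window, key=lambda m: m.timestamp))  # stable sort keeps log order on ties


table = VersionedTable()
table.put("user1", "alice", timestamp=1)
table.put("user2", "bob", timestamp=2)
table.put("user1", "alice-v2", timestamp=3)
table.delete("user2", timestamp=4)

print(table.restore_as_of(2))  # {'user1': 'alice', 'user2': 'bob'}
print(table.restore_as_of(4))  # {'user1': 'alice-v2'}
print([(m.row, m.value) for m in table.replay(2, 4)])  # [('user2', 'bob'), ('user1', 'alice-v2')]
```

Point-in-time restore reads the full version history up to `t`, whereas replay only streams the mutations inside the window, which is why the two modes can have quite different cost profiles in terms of recovery time and memory.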