Resilient distributed processing technique (RDPT), in which mapper and reducer are simplified with the Spark contexts and support distributed parallel query processing.Design/methodology/approachThe proposed work is implemented with Pig Latin with Spark contexts to develop query processing in a distributed environment.FindingsQuery processing in Hadoop influences the distributed processing with the MapReduce model. MapReduce caters to the works on different nodes with the implementation of complex mappers and reducers. Its results are valid for some extent size of the data.Originality/valuePig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH; FLATTEN; COGROUP.
International Journal of Intelligent Computing and Cybernetics – Emerald Publishing
Published: Apr 23, 2021
Keywords: Query processing; MapReduce; Scalability; Resilient distributed processing; Spark