Introduction to Spark development

What is Spark? The Hadoop ecosystem is divided into the distributed file system HDFS, the computing framework MapReduce, and the resource scheduling framework Yarn. Over time, however, MapReduce's heavy disk I/O, frequent network communication, and insistence on writing intermediate results to disk have come to seriously slow down the operation of the whole ...
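The excerpt cuts off before any code, but a minimal sketch of what Spark development looks like in practice, a word count written against the Spark Java API, helps make the contrast with MapReduce concrete (the local master and the input/output paths below are placeholders, not taken from the article):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountSketch {
    public static void main(String[] args) {
        // Local master for experimentation; on a cluster this is normally set by spark-submit.
        SparkConf conf = new SparkConf().setAppName("wordcount-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt");            // placeholder input path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);                          // aggregation stays in memory
            counts.saveAsTextFile("output");                             // placeholder output path
        }
    }
}
```

Unlike a MapReduce job, nothing here forces intermediate results onto disk between the transformation steps.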

Posted by switchdoc on Sun, 09 Jan 2022 10:15:36 +0100

MapReduce learning 1: overview and simple case preparation

1. Overview 1.1 MapReduce definition: MapReduce is a programming framework for distributed computing programs and the core framework for developing "Hadoop-based data analysis applications". Its core function is to integrate the business logic code written by the user with its own default components into a complete ...
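To illustrate what "business logic code written by the user" means here, a minimal word-count Mapper and Reducer might look like the sketch below; the framework's default components handle everything else (input splitting, shuffle, sort, output). The class names are illustrative, not taken from the article:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// User-written business logic: split each line into words and emit (word, 1).
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// User-written business logic: sum the counts the framework has grouped by word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```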

Posted by pckidcomplainer on Sat, 08 Jan 2022 13:35:01 +0100

Resource scheduling in Yarn

Three scheduling strategies: the FIFO Scheduler, the Capacity Scheduler and the Fair Scheduler, listed from left to right. These three policies are introduced below. FIFO Scheduler: a first-in, first-out scheduling strategy. Tasks run in turn, and resources are released only after the preceding tasks finish. This is unreasonable ...
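For orientation, the three strategies correspond to concrete scheduler classes selected by a single key in yarn-site.xml; a small sketch that reads that key through the Hadoop configuration API (the key and class names are standard Hadoop, the rest is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerCheckSketch {
    public static void main(String[] args) {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        Configuration conf = new YarnConfiguration();
        // The active policy is selected by this one key; its value is one of
        //   ...scheduler.fifo.FifoScheduler          (FIFO Scheduler)
        //   ...scheduler.capacity.CapacityScheduler  (Capacity Scheduler, the yarn-default.xml default)
        //   ...scheduler.fair.FairScheduler          (Fair Scheduler)
        System.out.println(conf.get(
                "yarn.resourcemanager.scheduler.class",
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"));
    }
}
```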

Posted by ConnorSBB on Sat, 08 Jan 2022 04:07:34 +0100

Data analysis & node management & setting up NFS gateway service | Cloud computing

1. Data analysis 1.1 Problem This case requires a statistical analysis exercise: use the client to create the input directory on HDFS, upload the *.txt files to the input directory, then have the cluster analyze the uploaded files and count the words with the most occurrences. 1.2 Steps To implement this case, you need to follow these steps ...
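The case presumably uses the hdfs command-line client, but the same preparation can also be expressed through the HDFS FileSystem Java API; a rough sketch, with a placeholder NameNode address and file paths:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsInputSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; in the case it would come from core-site.xml.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop1:9000"), conf)) {
            Path input = new Path("/input");
            if (!fs.exists(input)) {
                fs.mkdirs(input);                                  // create the input directory
            }
            // Upload a local *.txt file into the input directory (placeholder local path).
            fs.copyFromLocalFile(new Path("/root/words.txt"), input);
        }
    }
}
```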

Posted by Q695 on Wed, 05 Jan 2022 18:31:13 +0100

Big data and Hadoop & distributed file systems & distributed Hadoop clusters | Cloud computing

1. Deploy Hadoop 1.1 Problem This case requires installing stand-alone Hadoop for hot-word analysis: minimum configuration of 2 CPUs, 2 GB memory and a 10 GB hard disk; virtual machine IP 192.168.1.50 (hadoop1); install and deploy Hadoop; run data analysis to find the most frequently occurring words. 1.2 Steps To implement this case, you need to ...

Posted by ball420 on Wed, 05 Jan 2022 18:23:47 +0100

MapReduce framework principle - InputFormat data input

Contents: 1. Introduction to InputFormat; 2. Parallelism of slicing and MapTask tasks; 3. Job submission process source code; 4. InputFormat implementation subclasses; 5. Slicing mechanism of FileInputFormat: (1) slicing mechanism, (2) slice source code analysis, (3) slicing steps, (4) FileInputFormat default slice size parameter configuration ...
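At the core of the default slicing mechanism is one size calculation; paraphrasing the logic of FileInputFormat's computeSplitSize, it amounts to the sketch below (the default minimum and maximum shown are the standard Hadoop values):

```java
public class SplitSizeSketch {
    // Paraphrase of the default split-size rule in FileInputFormat:
    // the slice size is the HDFS block size, clamped between the configured
    // minimum and maximum (mapreduce.input.fileinputformat.split.minsize / .maxsize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                              // 128 MB block
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE)); // defaults -> block size
    }
}
```

With the default parameters the slice size therefore equals the block size, which is why the number of MapTasks usually tracks the number of blocks.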

Posted by cihan on Tue, 04 Jan 2022 16:10:16 +0100

Use of MapReduce Framework-Join

Contents: 1. Introduction; 2. Use of Join in the relational database MySQL: Cartesian product (CROSS JOIN), inner join (INNER JOIN), left join (LEFT JOIN), right join (RIGHT JOIN), outer join (OUTER JOIN); 3. Reduce Join: 1. Introduction to Reduce Join, 2. Case: 2.1 requirements, 2.2 implementation idea: reduce-end table ...
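The idea behind the reduce-side join previewed here is that the mapper tags each record with its source table so that the reducer can combine records sharing a join key; a stripped-down reducer sketch with hypothetical "order" and "product" tags (not the article's own code):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce-side join: a (hypothetical) mapper has already emitted
//   key = join key (e.g. product id), value = "order\t..." or "product\t...",
// so every record for one key arrives in the same reduce() call.
public class ReduceJoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> orders = new ArrayList<>();
        String product = null;
        for (Text v : values) {
            String[] parts = v.toString().split("\t", 2);
            if ("product".equals(parts[0])) {
                product = parts[1];            // dimension-table record
            } else {
                orders.add(parts[1]);          // fact-table records
            }
        }
        // Inner join: emit one line per order record matched with its product record.
        if (product != null) {
            for (String order : orders) {
                context.write(key, new Text(order + "\t" + product));
            }
        }
    }
}
```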

Posted by Seol on Tue, 04 Jan 2022 11:58:06 +0100

Spring Boot integrates Spark and submits Spark tasks to a YARN cluster (spark on yarn)

Preface The previous project was based on Spring Boot with Spark integrated, running in standalone mode. I once wrote a blog about it, link: https://blog.csdn.net/qq_41587243/article/details/112918052?spm=1001.2014.3001.5501 The same scheme is used now, but Spark is submitted to the production YARN cluster, and Kerberos verification is required, ...
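The article's own code sits behind the cut, but the general shape of a Kerberos-authenticated YARN submission from a Spring Boot service is a keytab login followed by a SparkLauncher call; a hedged sketch in which every path, principal and class name is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        // 1. Authenticate against Kerberos with a keytab (placeholder principal and paths).
        Configuration hadoopConf = new Configuration();
        hadoopConf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(hadoopConf);
        UserGroupInformation.loginUserFromKeytab("spark/host@EXAMPLE.COM", "/etc/security/spark.keytab");

        // 2. Submit the job to the YARN cluster through SparkLauncher.
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark")                      // placeholder
                .setAppResource("/opt/jobs/my-spark-job.jar")    // placeholder application jar
                .setMainClass("com.example.MySparkJob")          // placeholder main class
                .setMaster("yarn")
                .setDeployMode("cluster")
                .setConf("spark.yarn.keytab", "/etc/security/spark.keytab")
                .setConf("spark.yarn.principal", "spark/host@EXAMPLE.COM")
                .startApplication();

        // Poll until the application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(2000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```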

Posted by not_john on Mon, 03 Jan 2022 23:15:07 +0100

Spark on Hive and Hive on Spark for Big Data Hadoop

1. Differences between Spark on Hive and Hive on Spark 1) Spark on Hive: Hive serves only as storage, while Spark is responsible for SQL parsing, optimization and execution. You can think of it as Spark using Hive statements to manipulate Hive tables through Spark SQL, with Spark RDDs running underneath. The steps are as follows: ...
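A minimal illustration of this division of labour, with Spark SQL doing the parsing and execution while Hive only supplies table metadata and storage (the database, table and app names here are made up):

```java
import org.apache.spark.sql.SparkSession;

public class SparkOnHiveSketch {
    public static void main(String[] args) {
        // enableHiveSupport() connects Spark SQL to the Hive metastore, so Hive acts
        // purely as table storage while Spark parses, optimizes and executes the query.
        SparkSession spark = SparkSession.builder()
                .appName("spark-on-hive-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // A Hive-style statement, but executed by Spark (DataFrames/RDDs underneath).
        spark.sql("SELECT word, COUNT(*) AS cnt FROM demo_db.words GROUP BY word")
             .show();

        spark.stop();
    }
}
```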

Posted by joejoejoe on Mon, 03 Jan 2022 02:40:47 +0100

Building Hadoop on a virtual machine (pseudo-distributed and fully distributed setups)

After studying Hadoop for a semester, I have finally chewed through this big bone, tears!!! This article is more of a summary of learning Hadoop. 1. Preparatory work 1. Hadoop compressed package: it is available from the official website; download the archive and have it ready. I use version 2.7.1. 2. JDK compressed package: this is the Java r ...

Posted by many_pets on Sat, 01 Jan 2022 19:05:16 +0100