MinIO: introduction, deployment, and Spring Boot integration

Posted by abakash on Sat, 05 Mar 2022 10:56:44 +0100

After joining a new project team, I found that MinIO is used there for file storage. I therefore gathered material online and studied MinIO's background, features, application scenarios, storage architecture, and basic concepts alongside its use in the project. On that basis I deployed a MinIO service locally and integrated it into a Spring Boot project. I am recording the process here for my own later study and as a reference for others. The article may contain omissions; corrections from readers are welcome.

1. Basic information about MinIO

MinIO is an open source object storage service. It was originally released under the Apache License v2.0; releases since 2021 are licensed under the GNU AGPL v3. It is suited to storing large volumes of unstructured data such as pictures, videos, log files, backup data, and container or virtual machine images. An object can be of any size, from a few KB up to a maximum of 5 TB. MinIO is written in Go, ships with a web console, and can be used to build cloud storage services compatible with the S3 protocol. Compared with Hadoop HDFS it is much more lightweight, and it supports single-node deployment.

Object storage:
An Object Storage Service (OSS) is a massive, secure, low-cost, and highly reliable cloud storage service suitable for storing files of any type. Capacity and processing power scale elastically, and multiple storage classes are available so that storage cost can be optimized.

2. MinIO characteristics

1) High performance
MinIO is a leading object storage product with millions of users worldwide. On standard hardware, its read and write speeds reach up to 183 GB/s and 171 GB/s respectively.
Object storage can act as the primary storage layer for complex workloads such as Spark, Presto, TensorFlow, and H2O.ai, and can replace Hadoop HDFS.
MinIO is used as the primary storage for cloud native applications. Compared with traditional object storage workloads, cloud native applications require higher throughput and lower latency, and MinIO is able to deliver both.

2) Scalability
MinIO applies the hard-won experience of web-scale companies to give object storage a simple scaling model. Its guiding idea is "simple and scalable". In MinIO, expansion starts with a single cluster, which can be federated with other MinIO clusters to create a global namespace that spans multiple data centers when needed. By adding more clusters and more racks, you can keep expanding the namespace until you reach your target capacity.

3) Cloud native support
MinIO was built from scratch over the past four years. It follows cloud native architecture and build practices and incorporates current cloud computing technologies and concepts, including containers, Kubernetes orchestration, microservices, and multi-tenancy. This makes object storage much friendlier to Kubernetes.

4) Compatible with Amazon S3
Amazon's S3 API is the de facto standard protocol for object storage worldwide. MinIO adopted the S3-compatible protocol very early and was the first product to support S3 Select. MinIO is proud of its comprehensive compatibility, which has been validated by more than 750 organizations, more than all other comparable products combined. Among them is Microsoft Azure, whose S3 gateway uses MinIO.

5) Simple
Minimalism is the guiding design principle of MinIO. Simplicity reduces the chance of error, improves uptime, and provides reliability; it is also the basis of performance. MinIO can be installed and configured in a few minutes by downloading a single binary and running it. The number of configuration options and variants is kept to a minimum, which makes configuration failures very unlikely. An upgrade is a single command and completes without interruption or downtime, which lowers the total cost of use, operation, and maintenance.

3. Application scenario

MinIO can be used as the object storage service of a private cloud, and also as a gateway layer in front of cloud object storage, connecting seamlessly to Amazon S3 or Microsoft Azure.

4. Storage architecture

MinIO provides a matching storage architecture for each application scenario:

4.1 single host, single hard disk mode

In this mode, MinIO runs on a single server and stores its data on a single disk. This is a single point of failure, so the mode is mainly used for development and testing.
The start command is:

minio --config-dir ~/tenant1 server --address :9001 /disk1/data/tenant1

4.2 single host, multi hard disk mode

In this mode, MinIO still runs on a single server, but the data is spread across multiple (at least 4) disks, which protects it against the failure of individual disks.

minio --config-dir ~/tenant1 server --address :9001 /disk1/data/tenant1 /disk2/data/tena

4.3 multi host and multi hard disk mode (distributed)

This is the most commonly used architecture. MinIO services are set up on multiple (2-32) servers that share one access_key and secret_key, and the data is spread across multiple disks (at least 4, no upper limit), which provides a fairly powerful data redundancy mechanism (Reed-Solomon erasure coding).

export MINIO_ACCESS_KEY=<TENANT1_ACCESS_KEY>
export MINIO_SECRET_KEY=<TENANT1_SECRET_KEY>
minio --config-dir ~/tenant1 server --address :9001 http://192.168.10.11/data/tenant1 ht

Benefits of distributed mode:
In the big data field, designs are usually decentralized and distributed. MinIO's distributed mode helps you build a highly available object storage service in which the storage devices can be used without regard to their physical location.
1) Data protection
Distributed MinIO uses erasure coding to protect against the failure of multiple nodes and against bit rot.
Distributed MinIO requires at least 4 hard disks. Erasure coding is enabled automatically when distributed mode is used.
2) High availability
A stand-alone MinIO service is a single point of failure. In contrast, in a distributed MinIO deployment with N hard disks, your data is safe as long as N/2 hard disks are online.
However, at least N/2+1 hard disks must be online to create new objects.

For example, take a 16-node MinIO cluster with 16 hard disks per node. Even if eight servers are down, the cluster is still readable, but nine servers must be online to write data.

Note that you can combine any number of nodes and hard disks per node, as long as you stay within the limits of distributed MinIO.
For example, you can use two nodes with four hard disks each, or four nodes with two hard disks each, and so on.

3) Consistency
In both distributed and stand-alone mode, all read and write operations in MinIO strictly follow the read-after-write consistency model.
4) High data reliability
MinIO combines two features, erasure coding and bit rot protection, so the reliability of stored data is high.
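
The quorum rule described under "High availability" is simple integer arithmetic. The following is a minimal sketch, assuming an even number of drives N and taking the N/2 and N/2+1 thresholds exactly as stated in this article. The class and method names are invented for illustration and are not part of MinIO.

```java
/** Sketch of the read/write quorum rule for a deployment with N drives (N assumed even). */
final class QuorumSketch {

    /** Minimum number of online drives needed to read existing objects: N/2. */
    static int readQuorum(int totalDrives) {
        return totalDrives / 2;
    }

    /** Minimum number of online drives needed to create new objects: N/2 + 1. */
    static int writeQuorum(int totalDrives) {
        return totalDrives / 2 + 1;
    }

    static boolean canRead(int totalDrives, int onlineDrives) {
        return onlineDrives >= readQuorum(totalDrives);
    }

    static boolean canWrite(int totalDrives, int onlineDrives) {
        return onlineDrives >= writeQuorum(totalDrives);
    }

    public static void main(String[] args) {
        int drivesPerNode = 16;
        int total = 16 * drivesPerNode;        // 16 nodes with 16 drives each
        int eightNodesUp = 8 * drivesPerNode;  // eight servers are down
        int nineNodesUp = 9 * drivesPerNode;   // seven servers are down

        System.out.println("8 nodes up, readable: " + canRead(total, eightNodesUp));
        System.out.println("8 nodes up, writable: " + canWrite(total, eightNodesUp));
        System.out.println("9 nodes up, writable: " + canWrite(total, nineNodesUp));
    }
}
```

With eight of sixteen nodes online, exactly N/2 drives are available, which is enough to read but one drive short of the write threshold. This is why the example above needs a ninth server for writes.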

5. Basic concepts

1) Object: the basic unit stored in MinIO, such as a file, a byte stream, or anything else.

2) Bucket: the logical space used to store objects. The data in each bucket is isolated from that of other buckets. To the client, a bucket is equivalent to a top-level folder that holds files.

3) Drive: a disk that stores data. Drives are passed in as parameters when MinIO starts. All object data in MinIO is stored on drives.

4) Set
A Set is a group of drives. A distributed deployment is automatically divided into one or more Sets according to the cluster size, and the drives in each Set are spread across different locations. (For example, {1…64} is divided into 4 Sets of 16 drives each.)

An object is stored on one Set
A cluster is divided into multiple Sets
The number of drives in a Set is fixed; by default the system calculates it automatically from the cluster size
The drives in a Set are distributed across different nodes as far as possible

Relationship between Set and Drive:
Set and Drive are the two most important concepts in MinIO. An object is ultimately stored on a Set.
A node (machine) can contain multiple hard disks. A Drive is one of the disks in a node. A Set is a group of Drives that spans nodes.

5) MinIO object write process:
MinIO encodes the original data into N shards through erasure coding, where N is the number of drives in a Set. Wherever N appears below, it has this meaning.
After the object has been encoded into N shards, each shard is written to its corresponding Drive, so that the object is stored across the whole Set.
A cluster contains multiple Sets. The Set on which an object is stored is chosen by hashing the object's name and mapping the hash to exactly one Set. In theory, this distributes data evenly across all Sets.

In practice, the observed data distribution is also very uniform. The number of drives in a Set is calculated automatically from the cluster size, and it can also be configured manually.

When assigning drives to a Set, the system tries to place them on as many different nodes as possible to ensure reliability.
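
The idea of hashing an object name to pick a Set can be illustrated with a small sketch. It assumes CRC32 as the hash and a plain modulo as the mapping; the article does not say which hash function MinIO actually uses, so treat both choices, and all names below, as hypothetical stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Sketch: map object names to one of several Sets by hashing the name. */
final class SetPlacementSketch {

    /** Returns the index of the Set (0 .. setCount-1) chosen for the given object name. */
    static int setIndexFor(String objectName, int setCount) {
        CRC32 crc = new CRC32(); // assumption: CRC32 stands in for the real hash
        crc.update(objectName.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index
        return (int) (crc.getValue() % setCount);
    }

    /** Places generated object names and counts how many land on each Set. */
    static int[] countPerSet(int objectCount, int setCount) {
        int[] counts = new int[setCount];
        for (int i = 0; i < objectCount; i++) {
            counts[setIndexFor("photos/img-" + i + ".jpg", setCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same name always maps to the same Set
        System.out.println(setIndexFor("photos/img-1.jpg", 4));
        // Rough view of how evenly 10000 names spread over 4 Sets
        System.out.println(Arrays.toString(countPerSet(10_000, 4)));
    }
}
```

Because the mapping depends only on the object name and the number of Sets, any node can locate an object without a central lookup table.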

6. Deployment

MinIO supports stand-alone, multi-tenant, and distributed deployment, and can store data either as plain files or in erasure code mode. In a stand-alone deployment, MinIO's client tool can be used for backups.

6.1 binary deployment

Deployment environment: Ubuntu 20.04.2 LTS
System architecture: amd64 (use the uname -a or arch command to check the system architecture; note that x86_64, x64, and AMD64 refer to essentially the same architecture)

Use the following commands to run a stand-alone MinIO server on a Linux host with a 64-bit Intel/AMD architecture. Replace /data with the path to the drive or directory where you want MinIO to store its data.

wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
./minio server /data

Installation packages for the other supported architectures, including 64-bit Intel/AMD, 64-bit ARM, 64-bit PowerPC LE (ppc64le), and IBM Z-Series (s390x), can be found under https://dl.min.io/server/minio/release/

The parameter --console-address ":9001" specifies the port used for browser access to the console.

6.2 Docker deployment

1) Pull image

docker pull minio/minio

2) Run the MinIO container:

docker run -p 9000:9000 -p 9001:9001 --name minio \
  -v /etc/localtime:/etc/localtime \
  -v /data/minio/data:/data \
  -v /data/minio/config:/root/.minio \
  -d minio/minio server /data --console-address ":9001"

6.3 console access settings

Open the console in a browser at http://192.168.109.130:44561/ and log in with the default account and password minioadmin/minioadmin. (The port is whichever port the console is listening on; it is 9001 when the server was started with --console-address ":9001".)

1) Create a bucket: after logging in, click the "+" button in the lower right corner, choose Create bucket, enter a name, and press Enter. Then use Upload file to upload files into the bucket. Here I created a bucket named test and uploaded a picture.


2) Share a file: each uploaded file has a share button in the file list. Clicking it generates an access URL for the file, for which you specify a validity period. The maximum validity is 7 days and the smallest unit is one minute. After the link expires, accessing the picture returns an error.


3) Bucket access policy
A bucket can have one of three access policies: public, custom, and private
Policy public: resources can be accessed directly without any authentication
Policy private: no operation is allowed without authorization
Policy custom: access is governed by custom Access Rules, which can be readonly, writeonly, or readwrite

After an access rule is added, the Access Policy is automatically set to custom. After all access rules are deleted, the Access Policy is automatically set back to private.
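
The validity limits for share links described in item 2) are easy to check in application code before a link is requested. The following is a minimal sketch of such a check, assuming the limits quoted above (at least one minute, at most 7 days). The class and method names are invented for illustration and are not part of the MinIO SDK.

```java
import java.time.Duration;

/** Sketch: validate a requested share-link lifetime against the console limits. */
final class ShareLinkExpirySketch {

    static final Duration MIN_EXPIRY = Duration.ofMinutes(1);
    static final Duration MAX_EXPIRY = Duration.ofDays(7);

    /** Returns the lifetime in seconds, or throws if it is outside the allowed range. */
    static long toExpirySeconds(Duration requested) {
        if (requested.compareTo(MIN_EXPIRY) < 0 || requested.compareTo(MAX_EXPIRY) > 0) {
            throw new IllegalArgumentException(
                    "expiry must be between 1 minute and 7 days: " + requested);
        }
        return requested.getSeconds();
    }

    public static void main(String[] args) {
        // A 12-hour link is within the limits
        System.out.println(toExpirySeconds(Duration.ofHours(12)));
        // The maximum of 7 days is still accepted
        System.out.println(toExpirySeconds(MAX_EXPIRY));
    }
}
```

Returning seconds is a design choice for this sketch; it keeps the value in a single plain unit that is easy to pass on to whatever API creates the link.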


7. Spring Boot integration

7.1 add dependencies

<dependency>
    <groupId>io.minio</groupId>
    <artifactId>minio</artifactId>
    <version>7.0.2</version>
</dependency>

<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.6.6</version>
</dependency>

7.2 add configuration

platform:
  oss:
    endpoint: http://192.168.109.130:9000
    accessKeyId: minioadmin
    accessKeySecret: minioadmin
    bucketName: tduck-cloud
    domain: http://192.168.109.130:9000/tduck-cloud

7.3 code integration

@Data
@Component
@Slf4j
@ConfigurationProperties(prefix = "platform.oss")
public class OssStorageConfig {

    /**
     * OSS type
     * See OssTypeEnum.java
     */
    private OssTypeEnum ossType;

    /**
     * Endpoint of the storage service (for MinIO, the server URL)
     */
    private String endpoint;

    /**
     * accessKeyId
     */
    private String accessKeyId;

    /**
     * accessKeySecret
     */
    private String accessKeySecret;

    /**
     * Bucket name
     */
    private String bucketName;

    /**
     * Preview domain name
     */
    private String domain;
    
    /**
     * Local storage file storage address
     */
    private String uploadFolder;
    
    /**
     * Local storage file access path
     */
    private String accessPathPattern;
}


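// Imports needed by this class (minio 7.0.2). OssStorageConfig, StorageException
// and MimeTypeEnum are classes of this project and are not shown here.
import io.minio.MinioClient;
import io.minio.PutObjectOptions;
import io.minio.errors.InvalidEndpointException;
import io.minio.errors.InvalidPortException;
import org.springframework.stereotype.Component;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

@Component
public class MIniOStorageService {

    private final OssStorageConfig config;
    private MinioClient client;

    public MIniOStorageService(OssStorageConfig config) {
        this.config = config;
        // Create the client once, when the bean is constructed
        init();
    }

    private void init() {
        try {
            // The last argument is "secure": false means plain HTTP
            client = new MinioClient(config.getEndpoint(), config.getAccessKeyId(), config.getAccessKeySecret(), false);
        } catch (InvalidEndpointException | InvalidPortException e) {
            // Fail fast instead of continuing with a null client
            throw new StorageException("Failed to create MinIO client, please check the endpoint", e);
        }
    }

    /** Uploads a stream to the configured bucket and returns the access URL. */
    public String upload(InputStream inputStream, String path) {
        try {
            // available() is only an estimate for some stream types;
            // pass the real size when it is known
            PutObjectOptions poo = new PutObjectOptions(inputStream.available(), -1);
            poo.setContentType(MimeTypeEnum.getContentType(path));
            client.putObject(config.getBucketName(), path, inputStream, poo);
        } catch (Exception e) {
            throw new StorageException("Failed to upload file, please check the configuration information", e);
        }
        return config.getDomain() + "/" + path;
    }

    /** Uploads a byte array to the configured bucket and returns the access URL. */
    public String upload(byte[] data, String path) {
        try {
            PutObjectOptions poo = new PutObjectOptions(data.length, -1);
            poo.setContentType(MimeTypeEnum.getContentType(path));
            client.putObject(config.getBucketName(), path, new ByteArrayInputStream(data), poo);
        } catch (Exception e) {
            throw new StorageException("Failed to upload file, please check the configuration information", e);
        }
        return config.getDomain() + "/" + path;
    }

    /** Deletes the object stored under the given path. */
    public void delete(String path) {
        try {
            client.removeObject(config.getBucketName(), path);
        } catch (Exception e) {
            throw new StorageException("Failed to delete file", e);
        }
    }
}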


    @Autowired
    private MIniOStorageService mIniOStorageService;

    /**
     * Upload a user file.
     * <p>
     * The MD5 hash of the user ID is used as the directory name, so all files
     * of the same user are stored in one directory.
     *
     * @param file   the uploaded file
     * @param userId ID of the current user
     * @return URL of the stored file
     * @throws IOException if the upload stream cannot be read
     */
    @PostMapping("/user/file/upload")
    public Result<String> uploadUserFile(@RequestParam("file") MultipartFile file, @RequestAttribute Long userId) throws IOException {
        // Resulting path: <md5(userId)>/<uuid>.<extension>
        String path = new StringBuilder(SecureUtil.md5(String.valueOf(userId)))
                .append(CharUtil.SLASH)
                .append(IdUtil.simpleUUID())
                .append(CharUtil.DOT)
                .append(FileUtil.extName(file.getOriginalFilename())).toString();
        String url = mIniOStorageService.upload(file.getInputStream(), path);
        return Result.success(url);
    }
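
The object path built in the controller has the layout <md5(userId)>/<uuid>.<extension>. The following is a JDK-only sketch of the same layout for readers who do not use Hutool. It is an approximation: the helper names are invented here, and the extension handling is simpler than Hutool's FileUtil.extName.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Sketch: build "<md5(userId)>/<uuid>.<ext>" object paths with the JDK only. */
final class ObjectPathSketch {

    /** Lower-case hex MD5 of the given text (32 characters). */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Text after the last dot, or an empty string if there is none. */
    static String extensionOf(String filename) {
        if (filename == null) {
            return "";
        }
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1);
    }

    /** One directory per user, one random file name per upload. */
    static String buildPath(long userId, String originalFilename) {
        String uuid = UUID.randomUUID().toString().replace("-", "");
        return md5Hex(String.valueOf(userId)) + "/" + uuid + "." + extensionOf(originalFilename);
    }

    public static void main(String[] args) {
        // Two uploads by the same user share a directory but get different names
        System.out.println(buildPath(42L, "cat.png"));
        System.out.println(buildPath(42L, "dog.png"));
    }
}
```

Hashing the user ID keeps the raw ID out of public URLs, and the random file name avoids collisions when a user uploads two files with the same original name.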

8. References

http://www.minio.org.cn/
http://docs.minio.org.cn/docs/
https://blog.csdn.net/lj15559275886/article/details/121441031
https://blog.csdn.net/crazymakercircle/article/details/120855464
