2021-2022 Blue Bridge Cup winter vacation training - question G: HTML new hand - picture collector

Posted by kontesto on Wed, 23 Feb 2022 07:05:11 +0100

HTML novice - picture collector

Time limit: 2 seconds
Memory limit: 1024M

The CSDN version of this solution has "の" inserted into its title because CSDN does not allow blog titles to contain the word "novice"?!

Problem description

The spring river tides even the sea, and the bright moon on the sea tides together.
Where is the spring wave? Where is the moon!
The river flows around Fangdian, the moon shines, and the flower forest is like graupel;
Frost flows in the air and you don't feel it flying. You can't see the white sand on the ting.
The river and sky are the same without fine dust, and the solitary moon wheel in the sky is bright.
Who first saw the moon by the river? When does the moon shine on people?
Life has been endless from generation to generation, and the river and moon look similar from year to year.
I don't know who Jiang Yue treats, but I see the Yangtze River sending water.
The white clouds go away, and the green maple river is full of sorrow.
Whose family is boating tonight? Where is Acacia moon tower?
Poor moon Pei Hui upstairs should leave the dressing table.
The jade curtain cannot be rolled away, and the clothes are brushed on the anvil.
At this time, we look at each other and don't hear each other. We are willing to shine on you month by month.
Wild geese fly long without light, and fish dragons dive into the water.
Last night, I dreamed of falling flowers in the idle pool. Poor spring didn't return home.
The river flows, the spring goes, and the river pond falls, and the moon tilts to the West.
The slanting moon is deep in the fog of the Tibetan sea, and the Jieshi Xiaoxiang infinite road.
I don't know how many people return by the moon. The falling moon shakes the river and trees.

Little T found that Beihua didn't seem to offer an HTML course, so she started her own HTML learning path.


She learned that it is enough to put the following in an html file:

<img src="Picture address">

The picture corresponding to the address can be displayed in the browser.

For example, in an html file, enter:

<img src="https://www.baidu.com/img/flexible/logo/pc/result.png">

The browser then displays the picture located at the address https://www.baidu.com/img/flexible/logo/pc/result.png

Similarly, enter in the html file

<html>
    <body>
        <h1>LetMeFly</h1>
        <img src="https://letmefly.xyz" alt="First Img">

        <h1>La La La</h1>
        <img src="https://qkiller.xyz" height="80px">

        <br>
        <img src="https://diary.letmefly.xyz" width="66px" />
    </body>
</html>

Then three pictures will be displayed in the browser, and the picture addresses are https://letmefly.xyz, https://qkiller.xyz and https://diary.letmefly.xyz

(Special cases such as comments are not considered.) That is, whenever <img src="???" ***> appears, it counts as a picture whose address is ???, where *** stands for any other characters. Please don't make the problem more complicated than it is.

As a picture collector, little T wants to know which pictures are included in a standard html file.

Input description

The input is a standard html source file and meets the following requirements:

  1. The file does not contain comments

  2. No <img> tag in the file spans more than one line

  3. All picture addresses are enclosed in double quotation marks (without single quotation marks)

  4. The picture address does not contain double quotes

  5. The file size is no more than 2M and contains no more than 1000 pictures

  6. The address length of each picture is less than 1024

Output description

Output which pictures are in the given html file.

The first line contains an integer n, the number of pictures in the source file

The next n lines are output in order; line i contains the address of the i-th picture

Example 1

input

<html>
    <body>
        <h1>LetMeFly</h1>
        <img src="https://letmefly.xyz" alt="First Img">

        <h1>La La La</h1>
        <img src="https://qkiller.xyz" height="80px">

        <br>
        <img src="https://diary.letmefly.xyz" width="66px" />
    </body>
</html>

output

3
https://letmefly.xyz
https://qkiller.xyz
https://diary.letmefly.xyz

Problem analysis

According to the problem description, there are no more complicated tags to handle.

Therefore, the following approach works:

  1. First find the start and end positions of every img tag

    Whenever we encounter "<img", we treat it as the start of an img tag; the first ">" after it marks the end, which isolates the tag (<img ****>)

  2. Within that img tag, we only need to find the src attribute

    For a found img tag, we only need to look within the tag for src=" followed by the next "; the text between them is the address of the picture.
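The two steps above can also be sketched with the standard string functions strstr and strchr. This is only an illustrative sketch, not the accepted solution below: the helper name next_img_src is hypothetical, and it assumes (as the problem guarantees) that tags do not wrap across lines and additionally that a picture address never contains ">".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the original solution): find the next
 * <img ...> tag in `line` and copy its src value into `out` (capacity `cap`).
 * Returns a pointer just past the tag's closing '>' so the caller can keep
 * scanning, or NULL when no further usable tag is found.
 */
const char *next_img_src(const char *line, char *out, size_t cap) {
    assert(line != NULL && out != NULL && cap > 0);

    while ((line = strstr(line, "<img ")) != NULL) {
        /* Step 1: the tag runs from "<img " to the first '>' after it. */
        const char *tag_end = strchr(line, '>');
        if (tag_end == NULL)
            return NULL; /* unterminated tag: nothing more to extract */

        /* Step 2: look for src=" and its closing quote inside the tag. */
        const char *src = strstr(line, "src=\"");
        if (src != NULL && src < tag_end) {
            src += 5; /* skip past src=" */
            const char *quote = strchr(src, '"');
            if (quote != NULL && quote < tag_end && (size_t)(quote - src) < cap) {
                memcpy(out, src, (size_t)(quote - src));
                out[quote - src] = '\0';
                return tag_end + 1;
            }
        }
        line = tag_end + 1; /* tag without a usable src: skip it */
    }
    return NULL;
}
```

A caller could start with a pointer p at the beginning of a line and repeat p = next_img_src(p, buf, sizeof buf) until it returns NULL, storing buf after each successful call.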

AC code

#include <stdio.h>
#include <string.h>

char s[2 * 1024 * 1024 + 10]; // Input file Max 2M

char img[1000][1024]; // Up to 1000 pictures; each address is shorter than 1024 characters, so it fits in 1024 bytes including the trailing '\0'

// Find the end of an img tag: scan forward from begin to the closing '>'.
// The problem guarantees that img tags never wrap across lines, so once
// "<img" has appeared, a closing '>' is certain to follow on the same line.
int findImg4End(int begin) {
    while (s[begin] != '>')
        begin++;
    return begin;
}

int main() {
    int imgNum = 0;
    while (fgets(s, sizeof(s), stdin)) { // img tags never span lines, so analyzing one line at a time is enough (fgets replaces gets, which was removed in C11; the trailing newline it keeps is harmless here)
        int l = strlen(s);
        int analyze2 = 0; // "Analyze to": index of the next character to examine
        while (analyze2 < l) {
            if (analyze2 + 4 < l && s[analyze2] == '<' && s[analyze2 + 1] == 'i' && s[analyze2 + 2] == 'm' && s[analyze2 + 3] == 'g' && s[analyze2 + 4] == ' ') { // Found the beginning of an img tag
                int img4End = findImg4End(analyze2 + 5); // Find an img tag, range [analyze2, img4End]
                // printf("img[%d, %d]\n", analyze2, img4End); //****
                int srcBegin = analyze2;
                while (!(s[srcBegin] == 's' && s[srcBegin + 1] == 'r' && s[srcBegin + 2] == 'c' && s[srcBegin + 3] == '=' && s[srcBegin + 4] == '"')) // Guaranteed to be found for valid input
                    srcBegin++;
                srcBegin = srcBegin + 5; // Skip past src=" to the first character of the address
                int srcEnd = srcBegin;
                while (s[srcEnd] != '"')
                    srcEnd++;
                // The real range of picture address is [srcBegin, srcEnd)
                // printf("src[%d, %d)\n", srcBegin, srcEnd); //****
                for (int loc = srcBegin; loc < srcEnd; loc++) {
                    img[imgNum][loc - srcBegin] = s[loc];
                }
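                img[imgNum++][srcEnd - srcBegin] = '\0'; // Terminate at the address length; srcEnd itself is an index into s, not into img[imgNum]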
                analyze2 = img4End + 1;
            }
            else {
                analyze2++;
            }
        }
    }
    printf("%d\n", imgNum);
    for (int i = 0; i < imgNum; i++) {
        puts(img[i]);
    }
    return 0;
}

Compiled successfully with gcc.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 8.1.0

Writing original content isn't easy; if you repost this, please include a link to the original~
Tisfy: https://letmefly.blog.csdn.net/article/details/123068245

Topics: html