Solving garbled Chinese text in request/response!!!

Posted by neridaj on Sun, 07 Nov 2021 05:09:20 +0100

Request: garbled Chinese text and how to fix it

Three background points first:

GET parameters travel in the URL, so Tomcat decodes them with its URI encoding, not with the request's character encoding. Since Tomcat 8 the default URI encoding is UTF-8, so GET parameters are decoded as UTF-8 whatever request.setCharacterEncoding() says.
POST parameters travel in the request body (the entity), so they are decoded with the request's character encoding, independently of Tomcat's URI encoding. If nothing sets that encoding, the servlet specification falls back to ISO-8859-1; in my setup it was UTF-8.
To diagnose garbled request data, add System.out.println(request.getCharacterEncoding()) to the code to see which encoding the request is decoded with (null means none was specified). Mine is UTF-8, so I have no garbled text.
The general solutions below cover the common cases. If they don't solve your problem, work through it using the three points above.
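To make the first two points concrete, here is a small standalone sketch (plain Java 10+, no servlet container) that decodes the same percent-encoded value the way a container would, once with UTF-8 and once with ISO-8859-1. The sample value is the UTF-8 encoding of 麻瓜, chosen for illustration, and the class name is hypothetical; a real container's parameter parsing does more than a single URLDecoder call.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodeDemo {

    // Percent-encoded UTF-8 bytes of "麻瓜", as a browser would send ?name=%E9%BA%BB%E7%93%9C
    static final String ENCODED_NAME = "%E9%BA%BB%E7%93%9C";

    // Decodes a percent-encoded value with the given charset, roughly what a container
    // does with a query string (URI encoding) or a form body (request encoding).
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset); // Charset overload exists since Java 10
    }

    public static void main(String[] args) {
        // Same bytes, two code tables: only the matching one yields 麻瓜
        System.out.println(decode(ENCODED_NAME, StandardCharsets.UTF_8));
        // Six Latin-1 characters instead of two Chinese ones
        System.out.println(decode(ENCODED_NAME, StandardCharsets.ISO_8859_1));
    }
}
```

The bytes on the wire are identical in both calls; only the charset chosen by the receiver differs, which is exactly why the URI encoding and the request encoding matter.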

Garbled Chinese in POST requests:

// HttpServletRequest inherits setCharacterEncoding() from ServletRequest; it sets the charset used to decode the request body.
request.setCharacterEncoding("utf-8");  // Call it before reading any parameter or calling getReader(); afterwards it has no effect

Garbled Chinese in GET requests:

// Re-encode the garbled user name into bytes with ISO-8859-1 (the wrong code table), then decode those bytes as UTF-8. Apply this to name in RequestParamsServlet.
// GET values travel in the URL. If the container decoded them as ISO-8859-1 (the default in Tomcat 7 and earlier), re-encoding with ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8.
// On Tomcat 8+ the URL is already decoded as UTF-8, and this conversion would garble correct text instead.
name = new String(name.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8); // import java.nio.charset.StandardCharsets
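The ISO-8859-1 round trip works because ISO-8859-1 maps every byte 0x00–0xFF to exactly one character, so no information is lost. The standalone sketch below (hypothetical class and method names, sample text 麻瓜) simulates a container that decoded UTF-8 bytes as ISO-8859-1, applies the fix, and also shows what happens if the fix is applied to text that was already decoded correctly.

```java
import java.nio.charset.StandardCharsets;

public class GetRepairDemo {

    // Simulates a container that decoded the browser's UTF-8 query bytes as ISO-8859-1.
    static String decodedAsLatin1(String original) {
        byte[] sentByBrowser = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    // The fix from the text: recover the raw bytes with ISO-8859-1, then decode them as UTF-8.
    static String repair(String name) {
        return new String(name.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u9ebb\u74dc"; // 麻瓜
        String garbled = decodedAsLatin1(original);
        System.out.println(repair(garbled).equals(original)); // true: bytes recovered losslessly
        // Chinese characters are not in ISO-8859-1, so getBytes() replaces them with '?'
        System.out.println(repair(original).equals(original)); // false
    }
}
```

The second line is the reason to check the container's URI encoding before adding this conversion.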

Response: garbled Chinese text and how to fix it

The following example demonstrates the cause of the problem and its solutions.

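// javax.* for Tomcat 9 and earlier; Tomcat 10+ uses jakarta.*
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/jsp/test")
public class TestServlet extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {

        String data = "麻瓜"; // Chinese for "Muggle"; any non-ASCII text shows the problem
        PrintWriter out = response.getWriter();
        out.println(data);
    }
}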

Without any encoding settings, the Chinese text shown in the browser is usually garbled.

Because a computer stores data in binary, text has to be converted between characters and bytes whenever it is transmitted. The conversion is done by looking characters up in a code table (a character set). Converting characters into bytes is called encoding, and converting bytes back into characters is called decoding. If encoding and decoding use different code tables, the result is garbled text.
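A minimal sketch of that principle in plain Java: the same text is encoded with one code table and decoded with another. The sample text and class name are illustrative, and it assumes the JDK ships the GBK charset (standard JDK builds include it in the jdk.charsets module).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    // Encodes text with one code table (charset) and decodes the bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith); // characters -> bytes (encoding)
        return new String(bytes, decodeWith);     // bytes -> characters (decoding)
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String text = "\u9ebb\u74dc"; // 麻瓜
        System.out.println(transcode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(transcode(text, gbk, gbk).equals(text));                                       // true
        // UTF-8 bytes read through the GBK table: garbled
        System.out.println(transcode(text, StandardCharsets.UTF_8, gbk).equals(text));                    // false
    }
}
```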

Once you know the principle, garbled Chinese is easy to fix. First, check your browser's default decoding table by typing document.charset in the browser console. I use Edge, and its default here is GBK. Then call response.getCharacterEncoding() in the code to see which encoding the response to the browser uses. Mine is UTF-8 (the servlet specification's default is ISO-8859-1, so this depends on the container configuration). UTF-8 is what development normally uses, and it is the code table the server uses when writing the response. So the garbling above comes down to Chinese text encoded as UTF-8 and then decoded as GBK by the browser, which is bound to produce garbled text.

A few more points:

response.setCharacterEncoding("UTF-8") sets the encoding the server uses for the response body. If no content type is set, however, that charset is not sent to the browser, so the browser still decodes with its default.
response.setContentType("text/html;charset=utf-8") sets the response encoding and also sends it to the browser in the Content-Type header, so the browser decodes the received data with that charset.
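The browser side of these two points can be sketched as follows: the charset reaches the client only as a parameter of the Content-Type header, and without it the client falls back to its default. The helper below is hypothetical and much simpler than a real browser's charset detection; GBK as the default mirrors the Edge setup described above.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ContentTypeCharsetDemo {

    // Hypothetical helper: returns the charset named in a Content-Type header value,
    // or the browser's default when the header is missing or has no charset parameter.
    static Charset charsetFor(String contentType, Charset browserDefault) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    return Charset.forName(name);
                }
            }
        }
        return browserDefault;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // Only setCharacterEncoding("UTF-8") was called, so no charset reaches the browser
        System.out.println(charsetFor(null, gbk));                      // GBK
        // setContentType("text/html;charset=utf-8") was called
        System.out.println(charsetFor("text/html;charset=utf-8", gbk)); // UTF-8
    }
}
```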

A commonly cited rule says that when sending data, the server picks the encoding by the priority setCharacterEncoding > contentType > pageEncoding, that is, setCharacterEncoding takes precedence over contentType.

A JSP goes through two encoding conversions across three stages: the first stage uses pageEncoding, the second goes from UTF-8 to UTF-8, and in the third Tomcat outputs the web page using contentType.
Stage 1 translates the .jsp into a .java file. The container reads the JSP according to the pageEncoding setting and translates it into Java source in UTF-8. If pageEncoding is wrong or not set, the Chinese text comes out garbled.
Stage 2 compiles the source (.java) into a bytecode file (.class). Whatever encoding the JSP was written in, the input to this stage is always UTF-8 Java source.
javac reads the source as UTF-8 and compiles it into UTF-8 binary (the .class file, which stores string constants in a modified form of UTF-8); this is the JVM's specified internal representation of constant strings.
Stage 3: Tomcat (or another container) loads and runs the binary from stage 2, and its output is what the client sees. Only now does the contentType parameter, dormant during stages 1 and 2, take effect.
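A simplified model of the three stages, assuming the .jsp file is saved as UTF-8: once stage 1 reads the file with the wrong pageEncoding, the wrong characters are carried faithfully through stages 2 and 3, so a correct contentType cannot fix them. This is an illustration only, not how Jasper or Tomcat are implemented, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JspStagesDemo {

    static String browserSees(byte[] jspFileBytes, Charset pageEncoding, Charset contentType) {
        // Stage 1: the container reads the .jsp bytes using pageEncoding
        String javaSourceText = new String(jspFileBytes, pageEncoding);
        // Stage 2: compilation keeps the characters as they are (right or wrong)
        String constantInClass = javaSourceText;
        // Stage 3: the response is encoded with the contentType charset, and the
        // browser decodes with the same charset because the header tells it to
        byte[] sent = constantInClass.getBytes(contentType);
        return new String(sent, contentType);
    }

    public static void main(String[] args) {
        String page = "<p>\u9ebb\u74dc</p>"; // <p>麻瓜</p>
        byte[] savedAsUtf8 = page.getBytes(StandardCharsets.UTF_8);
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(browserSees(savedAsUtf8, utf8, utf8).equals(page));                        // true
        System.out.println(browserSees(savedAsUtf8, StandardCharsets.ISO_8859_1, utf8).equals(page)); // false
    }
}
```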

There are three solutions:

The first method treats the browser as the standard: change the encoding in the code to match the browser's (not recommended, because different users' browsers may well use different encodings).

@WebServlet("/jsp/test")
public class TestServlet extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // Encode the response the same way the browser decodes, so the two code tables match
        response.setCharacterEncoding("gbk"); // Chinese is written as GBK; the browser decodes with its default GBK, so it displays correctly
        String data = "麻瓜";
        PrintWriter out = response.getWriter();
        out.println(data);
    }
}
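Why the first method is fragile: the server's GBK bytes arrive without a charset, so each visitor's browser applies its own default. A minimal sketch, assuming two hypothetical visitors whose browsers default to GBK and UTF-8 respectively (class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BrowserDefaultDemo {

    // Server writes bytes in serverEncoding and sends no charset;
    // the browser decodes them with its own default.
    static boolean displaysCorrectly(String text, Charset serverEncoding, Charset browserDefault) {
        byte[] sent = text.getBytes(serverEncoding);
        return new String(sent, browserDefault).equals(text);
    }

    public static void main(String[] args) {
        String text = "\u9ebb\u74dc"; // 麻瓜
        Charset gbk = Charset.forName("GBK");
        for (Charset browser : List.of(gbk, StandardCharsets.UTF_8)) {
            System.out.println(browser + " -> " + displaysCorrectly(text, gbk, browser));
        }
    }
}
```

Only the visitor whose default happens to match the server sees the text correctly.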

The second method treats the code as the standard: tell the browser to switch its decoding to the encoding used in the code. That encoding is usually UTF-8, which makes this better than the first method. However, some people's code uses the system default GBK encoding (in which case they may not see garbled text in the first place (￣▽￣)), so problems can still occur.

@WebServlet("/jsp/test")
public class TestServlet extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8"); // Tell the browser to decode as UTF-8 (this also sets the response encoding)
        // The line below had the same effect in my test: both set the Content-Type response header. Browsers may differ slightly; I used Edge
        // response.setHeader("Content-Type", "text/html;charset=utf-8");
        String data = "麻瓜";
        PrintWriter out = response.getWriter();
        out.println(data);
    }
}

A quick test in between:

@WebServlet("/jsp/test")
public class TestServlet extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        System.out.println(response.getCharacterEncoding()); // UTF-8 (the default here, in uppercase)
        response.setContentType("text/html;charset=gbk");

        System.out.println(response.getCharacterEncoding()); // gbk
        response.setCharacterEncoding("utf-8");

        System.out.println(response.getCharacterEncoding()); // utf-8 (lowercase, as I set it)

        String data = "麻瓜";
        PrintWriter out = response.getWriter();
        out.println(data);
        // Result in the browser console: document.charset is 'UTF-8'
        // If setContentType and setCharacterEncoding are swapped, document.charset ends up 'GBK'
        // This contradicts claims online that setCharacterEncoding always has higher priority; I go by what I measured: the order of the calls decides
        // However, if setContentType is not called at all, setCharacterEncoding changes only the encoding used by the code, not the browser's default decoding
    }
}

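As the test shows, if setContentType is not called, setCharacterEncoding never reaches the browser. The ServletResponse Javadoc also explains the ordering: each of the two calls overrides the character encoding set by the other, as long as it happens before getWriter(), so the last call wins.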
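The "last call wins" behaviour can be captured in a toy model of the rules stated in the ServletResponse Javadoc. This is NOT Tomcat's implementation; the class is hypothetical, and obtainWriter() stands in for getWriter(), after which the encoding can no longer change.

```java
// Toy model of the ServletResponse encoding rules (per its Javadoc); not Tomcat's code.
public class ResponseEncodingModel {

    private String mimeType;          // e.g. "text/html"; null until setContentType is called
    private String characterEncoding; // null means "not specified"
    private boolean writerObtained;   // after getWriter(), the encoding is fixed

    // Like setContentType: records the MIME type and, if a charset is given, the encoding.
    void setContentType(String type) {
        String[] parts = type.split(";");
        mimeType = parts[0].trim();
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (!writerObtained && p.regionMatches(true, 0, "charset=", 0, 8)) {
                characterEncoding = p.substring(8).trim();
            }
        }
    }

    // Like setCharacterEncoding: overrides whatever encoding was set before.
    void setCharacterEncoding(String charset) {
        if (!writerObtained) {
            characterEncoding = charset;
        }
    }

    // Hypothetical stand-in for getWriter().
    void obtainWriter() {
        writerObtained = true;
    }

    // The specification's fallback is ISO-8859-1 when nothing was specified.
    String getCharacterEncoding() {
        return characterEncoding != null ? characterEncoding : "ISO-8859-1";
    }

    // The charset only reaches the browser through the Content-Type header.
    String contentTypeHeader() {
        if (mimeType == null) {
            return null; // no content type: the browser falls back to its default
        }
        return characterEncoding == null ? mimeType : mimeType + ";charset=" + characterEncoding;
    }

    public static void main(String[] args) {
        ResponseEncodingModel a = new ResponseEncodingModel();
        a.setContentType("text/html;charset=gbk");
        a.setCharacterEncoding("utf-8");
        System.out.println(a.contentTypeHeader()); // text/html;charset=utf-8

        ResponseEncodingModel b = new ResponseEncodingModel();
        b.setCharacterEncoding("utf-8");
        b.setContentType("text/html;charset=gbk");
        System.out.println(b.contentTypeHeader()); // text/html;charset=gbk

        ResponseEncodingModel c = new ResponseEncodingModel();
        c.setCharacterEncoding("utf-8");
        System.out.println(c.contentTypeHeader()); // null
    }
}
```

The three cases mirror the measurements above: the order decides, and without a content type the charset never leaves the server.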

In theory, using UTF-8 everywhere solves basically every problem:

@WebServlet("/jsp/test")
public class TestServlet extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {

        response.setCharacterEncoding("utf-8");
        response.setContentType("text/html;charset=utf-8");

        String data = "麻瓜";
        PrintWriter out = response.getWriter();
        out.println(data);
    }
}

However, this has the same effect as the second method, so in the end I recommend the second method. Programmers are lazy, and writing less code wins.
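One reason UTF-8 everywhere is the safer default: UTF-8 can encode any Unicode text, while GBK covers only part of it. A small sketch using the JDK's CharsetEncoder.canEncode; the sample text (麻瓜 plus the emoji U+1F600) and the class name are illustrative assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CoverageDemo {

    // True if every character of text has a representation in the charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u9ebb\u74dc \uD83D\uDE00"; // 麻瓜 followed by an emoji (U+1F600)
        System.out.println(canEncode(StandardCharsets.UTF_8, text)); // true
        System.out.println(canEncode(Charset.forName("GBK"), text)); // false: GBK has no emoji
    }
}
```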
