Recognizing Ordinary Verification Codes with Baidu Image Recognition (OCR)

Posted by plasko on Tue, 28 May 2019 00:07:28 +0200

When crawling websites we often run into verification codes (CAPTCHAs), so how can a program recognize them automatically? There are many CAPTCHA-solving ("coding") platforms online, but all of them charge money, and paying for one just to crawl a small amount of data is wasteful. Baidu's OCR service fits well here because it is free for up to 500 calls a day.

1. Register a Baidu account and create an application in the Baidu Cloud management console to get an API Key (AppKey) and a Secret Key (the program uses them to generate an access_token).

2. Generate an access_token from the API Key and Secret Key
Send a request (POST is recommended) to the authorization service address https://aip.baidubce.com/oaut... with the following parameters in the URL:
grant_type: required, always client_credentials;
client_id: required, the API Key you applied for;
client_secret: required, the Secret Key you applied for.
The code is as follows:

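import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;

import com.alibaba.fastjson.JSONObject;
import org.apache.commons.io.IOUtils;
import org.apache.http.HttpResponse;

// HttpRequestData and HttpClientUtils are helper classes from the author's project

/**
 * Gets an access token using the API Key and Secret Key.
 * @return the access token, or an empty string if the request fails
 */
public static String getAccessToken() {
    String accessToken = "";
    HttpRequestData httpRequestData = new HttpRequestData();
    HashMap<String, String> params = new HashMap<>();
    params.put("grant_type", "client_credentials");
    params.put("client_id", "Your APIKey");        // replace with your API Key
    params.put("client_secret", "Your SecretKey"); // replace with your Secret Key
    // The docs recommend POST; this example sends GET
    httpRequestData.setRequestMethod("GET");
    httpRequestData.setParams(params);
    httpRequestData.setRequestUrl("https://aip.baidubce.com/oauth/2.0/token");
    HttpResponse response = HttpClientUtils.execute(httpRequestData);
    if (response == null || response.getStatusLine().getStatusCode() != 200) {
        return accessToken;
    }
    try {
        String json = IOUtils.toString(response.getEntity().getContent(), StandardCharsets.UTF_8);
        JSONObject jsonObject = JSONObject.parseObject(json);
        if (jsonObject != null && !jsonObject.isEmpty()) {
            accessToken = jsonObject.getString("access_token");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return accessToken;
}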
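If you would rather not depend on the project's HttpRequestData/HttpClientUtils helpers, the token request URL from step 2 can be assembled with the JDK alone. This is a minimal sketch assuming Java 10+ (for URLEncoder.encode with a Charset); the class and method names are hypothetical, and it only builds the URL, leaving the actual sending (for example with java.net.http.HttpClient) out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class TokenUrlSketch {
    // Token endpoint taken from the getAccessToken() code above
    static final String TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token";

    // Hypothetical helper: builds the token URL with the three required parameters
    static String buildTokenUrl(String apiKey, String secretKey) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("grant_type", "client_credentials");
        params.put("client_id", apiKey);
        params.put("client_secret", secretKey);
        // URL-encode each key and value, then join them as key=value&key=value
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return TOKEN_ENDPOINT + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildTokenUrl("yourApiKey", "yourSecretKey"));
    }
}
```

Encoding the values matters if a key ever contains characters such as `+` or spaces; for the usual alphanumeric keys the URL is unchanged.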

3. Call the Baidu OCR general text recognition API (the general recognition API is used as the example below)
Request URL: https://aip.baidubce.com/rest...
Request method: POST
URL parameter: access_token
Header: Content-Type: application/x-www-form-urlencoded
Request parameters go in the body. The main parameters are:

  • image: image data, Base64-encoded. The encoded data must not exceed 4 MB, the shortest side must be at least 15 px and the longest side at most 4096 px; jpg/png/bmp formats are supported. When the image field is present, the url field is ignored.

  • url: the complete image URL, no longer than 1024 bytes. The image it points to must not exceed 4 MB after Base64 encoding, with the shortest side at least 15 px and the longest side at most 4096 px; jpg/png/bmp formats are supported. When the image field is present, the url field is ignored.
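Because the body is sent as application/x-www-form-urlencoded, the Base64 string should be URL-encoded: Base64 output can contain `+`, `/` and `=`, which are significant in form encoding (an unencoded `+` is decoded as a space). A minimal JDK-only sketch, with hypothetical class and method names, that turns raw image bytes into an `image=...` body:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class OcrFormBodySketch {
    // Hypothetical helper: Base64-encodes the image bytes and wraps them as a form field
    static String buildImageBody(byte[] imageBytes) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        // Form encoding turns '+' into %2B, '/' into %2F and '=' into %3D
        return "image=" + URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // These two bytes encode to "+/8=" in Base64, so every special character appears
        byte[] sample = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(buildImageBody(sample)); // prints image=%2B%2F8%3D
    }
}
```

Whether the project's own HTTP helper already performs this encoding is not shown in the post, so check it before encoding twice.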

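import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpResponse;

// ACCESS_TOKEN, OCRUrl and logger are static fields defined elsewhere in the class;
// HttpRequestData, HttpClientUtils and ImageBase64ToStringUtils are project helpers

/**
 * Recognizes a verification code image.
 * @param imageUrl path of the verification code image
 * @return the recognized text, or an empty string on failure
 */
public static String OCRVCode(String imageUrl) {
    String vCode = "";

    if (StringUtils.isBlank(ACCESS_TOKEN)) {
        logger.error("accessToken is empty");
        return vCode;
    }
    // Build the URL in a local variable; appending to the static OCRUrl field
    // would add another "?access_token=..." on every call
    String requestUrl = OCRUrl + "?access_token=" + ACCESS_TOKEN;

    HashMap<String, String> headers = new HashMap<>();
    headers.put("Content-Type", "application/x-www-form-urlencoded");

    HashMap<String, String> params = new HashMap<>();
    // Base64-encode the image before sending it
    String imageBase64 = ImageBase64ToStringUtils.imageToStringByBase64(imageUrl);
    params.put("image", imageBase64);

    HttpRequestData httpRequestData = new HttpRequestData();
    httpRequestData.setHeaders(headers);
    httpRequestData.setRequestMethod("POST");
    httpRequestData.setParams(params);
    httpRequestData.setRequestUrl(requestUrl);
    HttpResponse response = HttpClientUtils.execute(httpRequestData);
    if (response != null && response.getStatusLine().getStatusCode() == 200) {
        try {
            String json = IOUtils.toString(response.getEntity().getContent(), StandardCharsets.UTF_8);
            JSONObject jsonObject = JSONObject.parseObject(json);
            // words_result may be missing or empty (e.g. on an error response)
            JSONArray wordsResult = jsonObject == null ? null : jsonObject.getJSONArray("words_result");
            if (wordsResult != null && !wordsResult.isEmpty()) {
                vCode = wordsResult.getJSONObject(0).getString("words");
            }
        } catch (IOException e) {
            logger.error("Request recognition failed!", e);
        }
    }
    return vCode;
}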
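The same OCR request can also be described with the JDK's java.net.http client (Java 11+) instead of the project helpers. This is a sketch under assumptions: `ocrUrl` stands for the general recognition endpoint listed in step 3, `formBody` is an already form-encoded body such as `image=...`, and the class and method names are hypothetical. It only builds the request; it does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class OcrRequestSketch {
    // Hypothetical helper: builds (but does not send) the OCR POST request
    static HttpRequest buildOcrRequest(String ocrUrl, String accessToken, String formBody) {
        // access_token goes in the query string, built fresh for each call
        String url = ocrUrl + "?access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody)) // parameters go in the body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildOcrRequest("https://example.com/ocr", "token123", "image=abc");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, something like `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` returns the JSON body as a string, which can then be parsed as in the method above.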

Base64-encoding a local image:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

/**
 * Base64-encodes a local image file.
 * @param imageFile path of the image file
 * @return the Base64 string, or null if the file cannot be read
 */
public static String encodeImgageToBase64(String imageFile) {
    try {
        // Read the whole file; InputStream.available() and a single read()
        // are not guaranteed to return every byte
        byte[] data = Files.readAllBytes(Paths.get(imageFile));
        // Base64-encode the byte array
        return Base64.getEncoder().encodeToString(data);
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
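Since step 3 limits the Base64-encoded image to 4 MB, it can be cheaper to check the size before calling the API. Standard padded Base64 emits 4 characters for every 3 input bytes, rounded up, so the encoded length is known from the file size alone. A minimal sketch; reading "4M" as 4 × 1024 × 1024 bytes is an assumption, and the class and method names are hypothetical.

```java
import java.util.Base64;

public class Base64SizeCheckSketch {
    // Assumption: "4M" is interpreted as 4 * 1024 * 1024 bytes of Base64 text
    static final long MAX_ENCODED_BYTES = 4L * 1024 * 1024;

    // Padded Base64 produces 4 characters per 3-byte group, rounding the last group up
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Hypothetical pre-check before sending the image to the OCR API
    static boolean fitsLimit(long rawBytes) {
        return encodedLength(rawBytes) <= MAX_ENCODED_BYTES;
    }

    public static void main(String[] args) {
        byte[] sample = new byte[10];
        // Compare the formula with what java.util.Base64 actually produces
        System.out.println(encodedLength(sample.length) == Base64.getEncoder().encodeToString(sample).length());
    }
}
```

Under this reading, the largest raw image that fits is 3,145,728 bytes (3 MB).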
4. The result is returned as JSON, for example:
{
    "log_id": 2471272194,
    "words_result_num": 2,
    "words_result": [
        {"words": " TSINGTAO"},
        {"words": "Qingdao wine"}
    ]
}
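Each recognized line appears as a `words` entry inside `words_result`. The method in step 3 keeps only the first entry; for a verification code it can also make sense to join all lines and drop whitespace (note the leading space in " TSINGTAO"). The sketch below does that. It uses a regular expression instead of a JSON parser only to stay dependency-free, which is an assumption that holds for flat responses like the one above; fastjson, as used in the post, is the sturdier choice. Escape sequences inside values are not decoded, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsResultSketch {
    // Matches "words": "<value>", allowing escaped characters inside the value.
    // The key must be exactly "words", so "words_result" and "words_result_num" do not match.
    private static final Pattern WORDS = Pattern.compile("\"words\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Hypothetical helper: collects every "words" value in document order
    static List<String> extractWords(String json) {
        List<String> words = new ArrayList<>();
        Matcher m = WORDS.matcher(json);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Hypothetical helper: joins all recognized lines and removes whitespace
    static String toVerificationCode(String json) {
        return String.join("", extractWords(json)).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String json = "{\"log_id\": 2471272194, \"words_result_num\": 2, \"words_result\": "
                + "[{\"words\": \" TSINGTAO\"}, {\"words\": \"Qingdao wine\"}]}";
        System.out.println(extractWords(json));       // prints [ TSINGTAO, Qingdao wine]
        System.out.println(toVerificationCode(json)); // prints TSINGTAOQingdaowine
    }
}
```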

Project GitHub address: https://github.com/xwlmdd/ipP...
Note: the OCR image recognition module is a utility class in this project.
