OCR (text recognition) and ASR (speech recognition) Java application development (based on Baidu Intelligent Cloud)

Posted by nev25 on Fri, 11 Feb 2022 21:51:59 +0100

Baidu Intelligent Cloud official website: https://cloud.baidu.com/

1. OCR (text recognition)

First, log in to your Baidu Cloud account on the Baidu Intelligent Cloud website, open the management console and select Text Recognition.

Click Create Application and fill in the required fields. In the interface selection, make sure to tick every interface (API) you plan to call. When you are done, click Create Now.

Once the application has been created, its AppID, API Key and Secret Key are shown in the application list.


The project uses these three parameters to connect to (and authenticate against) this application.
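Hard-coding the three keys in the source, as the class below does with empty strings, makes them easy to leak when the code is shared. A minimal sketch, assuming you prefer to supply them through environment variables: the class name `BaiduCredentials` and the variable names `BAIDU_APP_ID`, `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` are hypothetical, chosen only for this example.

```java
import java.util.Map;

// Hypothetical helper: loads AppID / API Key / Secret Key from environment
// variables instead of hard-coding them. The variable names are assumptions.
final class BaiduCredentials {
    final String appId;
    final String apiKey;
    final String secretKey;

    private BaiduCredentials(String appId, String apiKey, String secretKey) {
        this.appId = appId;
        this.apiKey = apiKey;
        this.secretKey = secretKey;
    }

    // Reads the three values from the given map (pass System.getenv() in real use)
    // and fails fast if any of them is missing or empty.
    static BaiduCredentials fromEnv(Map<String, String> env) {
        return new BaiduCredentials(
                require(env, "BAIDU_APP_ID"),
                require(env, "BAIDU_API_KEY"),
                require(env, "BAIDU_SECRET_KEY"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value;
    }
}
```

In `init()` the client could then be created with `BaiduCredentials c = BaiduCredentials.fromEnv(System.getenv());` followed by `new AipOcr(c.appId, c.apiKey, c.secretKey)`. Taking the map as a parameter, rather than calling `System.getenv()` inside, keeps the helper easy to test.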

Java implementation of general text recognition (using the `AipOcr` client from the Baidu AIP Java SDK):

```java
import com.baidu.aip.ocr.AipOcr;
import org.json.JSONObject;

import javax.swing.JFileChooser;
import javax.swing.filechooser.FileSystemView;
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;

public class GeneralRecognition {
    // Set APPID/AK/SK (copy them from the application list in the console)
    public static final String APP_ID = "";
    public static final String API_KEY = "";
    public static final String SECRET_KEY = "";
    private static AipOcr client = null;

    public static void main(String[] args) throws IOException {
        String path = chooseFile();
        if (path == null) {
            System.out.println("No file selected.");
            return;
        }
        File file = new File(path);
        // Open the selected image in the system viewer so it can be compared with the OCR result
        Desktop.getDesktop().open(file);
        dis(file.getPath());
    }

    // Let the user pick the image to upload; returns null if the dialog is cancelled
    public static String chooseFile() {
        FileSystemView fsv = FileSystemView.getFileSystemView();

        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setCurrentDirectory(fsv.getHomeDirectory());
        fileChooser.setDialogTitle("Please select the file to upload...");
        fileChooser.setApproveButtonText("OK");
        fileChooser.setFileSelectionMode(JFileChooser.FILES_ONLY);

        int result = fileChooser.showOpenDialog(null);
        if (result == JFileChooser.APPROVE_OPTION) {
            return fileChooser.getSelectedFile().getPath();
        }
        return null;
    }

    public static void init() {
        // Create the AipOcr client once and reuse it
        if (client == null) {
            client = new AipOcr(APP_ID, API_KEY, SECRET_KEY);
            // Optional: set network connection parameters
            client.setConnectionTimeoutInMillis(2000);
            client.setSocketTimeoutInMillis(60000);
        }
    }

    // General text recognition (basicGeneral)
    public static void dis(String path) {
        init();
        // Optional request parameters
        HashMap<String, String> options = new HashMap<>();
        options.put("language_type", "CHN_ENG");
        options.put("detect_direction", "true");
        options.put("detect_language", "true");
        options.put("probability", "true");

        // The first argument is the local image path
        JSONObject res = client.basicGeneral(path, options);
        System.out.println(res.toString(2));
    }
}
```
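`chooseFile()` accepts any file, so a text document or an audio file would be uploaded to the OCR endpoint and rejected there. A small sketch that limits the chooser to images with Swing's standard `FileNameExtensionFilter`; the accepted extensions (jpg, jpeg, png, bmp) are an assumption to check against the documentation of the interface you enabled, and `ImageChooserFilter` is a hypothetical name.

```java
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import java.io.File;

// Hypothetical helper: restricts file selection to common image types.
// The extension list is an assumption, not taken from the Baidu documentation.
final class ImageChooserFilter {
    static final FileNameExtensionFilter IMAGES =
            new FileNameExtensionFilter("Images (jpg, jpeg, png, bmp)", "jpg", "jpeg", "png", "bmp");

    // Install the filter on an existing chooser and hide the "All files" option
    static void apply(JFileChooser chooser) {
        chooser.setAcceptAllFileFilterUsed(false);
        chooser.setFileFilter(IMAGES);
    }

    // Same check for paths that do not come from the chooser (e.g. command-line arguments);
    // the extension comparison inside FileNameExtensionFilter is case-insensitive
    static boolean isSupportedImage(File file) {
        return file.isFile() && IMAGES.accept(file);
    }
}
```

Calling `ImageChooserFilter.apply(fileChooser)` right after the `JFileChooser` is created in `chooseFile()` would be enough. The speech example further down could use the same pattern with the formats listed in its comment (pcm, wav, amr).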

The first call to the interface failed with the following output:

```
[main] INFO com.baidu.aip.client.BaseClient - get access_token success. current state: STATE_AIP_AUTH_OK
{
  "error_msg": "No permission to access data",
  "error_code": 6
}

Process finished with exit code 0
```

The access token was obtained successfully, but error code 6 means the application has not been granted permission to call this interface (API).

The meaning of error codes like this one can be looked up in the error-code table of the service's documentation.
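Note that the failed call still returns a JSON object, so the program simply prints it and exits with code 0. A sketch of detecting the failure in code instead: it inspects the JSON text (for example `res.toString()`) with a regular expression purely to keep the example dependency-free; with the SDK you would read the field from the `JSONObject` directly. `OcrErrorCheck` is a hypothetical name, and only code 6 (the case above) is mapped to a specific hint.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "error_code" from an OCR JSON response and
// turns it into a readable hint. Only code 6 is mapped explicitly.
final class OcrErrorCheck {
    private static final Pattern ERROR_CODE =
            Pattern.compile("\"error_code\"\\s*:\\s*(\\d+)");

    // Returns the error code, or -1 when the response contains none (success)
    static int errorCode(String json) {
        Matcher m = ERROR_CODE.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    static String hint(String json) {
        int code = errorCode(json);
        if (code == -1) {
            return "OK";
        }
        if (code == 6) {
            return "No permission: enable this interface for the application in the console";
        }
        return "Error " + code + ": look it up in the error-code table";
    }
}
```

For the response shown above, `OcrErrorCheck.hint(res.toString())` returns the "No permission" hint, which the caller could log before exiting with a non-zero status.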


Solution steps:

1. Open the application list.

2. Click Manage, then Edit. In addition to the interfaces checked by default for this application, tick every other interface you need. You can also click to claim the free quota for an interface.


Note: some interfaces require additional verification. For example, the public security verification interface and the ID card and name comparison interface require enterprise certification. Once the enterprise certification has been approved, apply for access under Console > Face Recognition > Offline Collection SDK Management, following the process described there. After the application is approved, the interface permission is enabled for you automatically, usually within 2 hours.

3. Click Save, then call the interface again; the permission error no longer occurs.

Once you have claimed the free quota, or applied or paid for additional permissions, you can call the corresponding APIs and view their usage in the console.

2. ASR (speech recognition)

The steps mirror those for text recognition: find the Speech Recognition module in the console and create an application in it. During or after creation, configure the interface permissions so that the APIs you need can be called later. Each application again has three important parameters (AppID, API Key and Secret Key), which you copy into the project. The code of the ASR (speech recognition) project:

```java
import com.baidu.aip.speech.AipSpeech;
import org.json.JSONObject;

import javax.swing.JFileChooser;
import javax.swing.filechooser.FileSystemView;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;

public class MandarinRecognition {
    // Set APPID/AK/SK (copy them from the speech application in the console)
    public static final String APP_ID = "";
    public static final String API_KEY = "";
    public static final String SECRET_KEY = "";
    private static AipSpeech client = null;

    public static void main(String[] args) throws IOException {
        String path = chooseFile();
        if (path == null) {
            System.out.println("No file selected.");
            return;
        }
        File file = new File(path);
        System.out.println("Preparing for output..");
        String outPutPath = "template/asrOutput.txt";
        dis(file.getPath(), outPutPath);
    }

    // Let the user pick the audio file to upload; returns null if the dialog is cancelled
    public static String chooseFile() {
        FileSystemView fsv = FileSystemView.getFileSystemView();

        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setCurrentDirectory(fsv.getHomeDirectory());
        fileChooser.setDialogTitle("Please select the file to upload...");
        fileChooser.setApproveButtonText("OK");
        fileChooser.setFileSelectionMode(JFileChooser.FILES_ONLY);

        int result = fileChooser.showOpenDialog(null);
        if (result == JFileChooser.APPROVE_OPTION) {
            return fileChooser.getSelectedFile().getPath();
        }
        return null;
    }

    public static void init() {
        // Create the AipSpeech client once and reuse it
        if (client == null) {
            client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);
        }
    }

    // Mandarin speech recognition: send a local audio file, print the result and save it
    public static void dis(String audioPath, String outPutPath) throws IOException {
        init();
        // Optional request parameters; dev_pid 1537 selects Mandarin recognition
        HashMap<String, Object> options = new HashMap<>();
        options.put("dev_pid", 1537);

        // The first argument is the local audio file path
        System.out.println(audioPath);
        /*
         * The original pcm audio must be 16 kHz, 16-bit, mono. Supported formats are
         * pcm (uncompressed), wav (uncompressed, PCM encoded) and amr (compressed).
         * Recordings of up to 60 s are supported; the limit is on duration, not file size.
         */
        JSONObject res = client.asr(audioPath, "pcm", 16000, options);
        System.out.println(res);

        // Also write the result to outPutPath, creating the parent directory if needed
        Path out = Paths.get(outPutPath);
        if (out.getParent() != null) {
            Files.createDirectories(out.getParent());
        }
        Files.write(out, res.toString(2).getBytes(StandardCharsets.UTF_8));
    }
}
```
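The comment in `dis()` lists the audio requirements (16 kHz, 16-bit, mono, at most 60 s). A sketch of checking them locally before uploading, using only `javax.sound.sampled` from the JDK; `AsrAudioCheck` is a hypothetical helper, not part of the Baidu SDK. Raw PCM has no header, so for `.pcm` files the duration is derived from the file size: 16000 samples/s × 2 bytes × 1 channel = 32000 bytes per second, so 60 s corresponds to 1,920,000 bytes.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

// Hypothetical helper: checks a recording against the limits stated in the
// ASR code comment (16 kHz, 16-bit, mono, at most 60 s) before calling asr().
final class AsrAudioCheck {
    static final int SAMPLE_RATE = 16000;   // samples per second
    static final int BYTES_PER_SAMPLE = 2;  // 16-bit samples
    static final int MAX_SECONDS = 60;

    // Raw mono PCM: duration = bytes / (16000 * 2) = bytes / 32000 seconds
    static double pcmDurationSeconds(long byteCount) {
        return (double) byteCount / (SAMPLE_RATE * BYTES_PER_SAMPLE);
    }

    static boolean pcmWithinLimit(long byteCount) {
        return pcmDurationSeconds(byteCount) <= MAX_SECONDS;
    }

    // Sample rate, sample size and channel count must match the requirements
    static boolean matchesAsrFormat(AudioFormat f) {
        return f.getSampleRate() == SAMPLE_RATE
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1;
    }

    // WAV files carry a header, so format and length can be read from it
    static boolean wavIsUsable(File wav) throws IOException, UnsupportedAudioFileException {
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(wav);
        AudioFormat f = fileFormat.getFormat();
        int frames = fileFormat.getFrameLength();
        boolean shortEnough = frames != AudioSystem.NOT_SPECIFIED
                && frames / f.getFrameRate() <= MAX_SECONDS;
        return matchesAsrFormat(f) && shortEnough;
    }
}
```

For the `.pcm` file used here, `AsrAudioCheck.pcmWithinLimit(new File(audioPath).length())` could guard the `client.asr(...)` call; for a WAV recording, `wavIsUsable(file)` reads the header instead of trusting the file size.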

Output:

```
Preparing for output..
E:\16k.pcm
[main] INFO com.baidu.aip.client.BaseClient - get access_token success. current state: STATE_AIP_AUTH_OK
{"result":["Beijing Science and Technology Museum."],"err_msg":"success.","sn":"238256483091644572246","corpus_no":"7063384013687529084","err_no":0}

Process finished with exit code 0
```
