Baidu Map POI Data Acquisition

Posted by aswini_1978 on Sun, 12 May 2019 04:35:23 +0200

This article introduces how to acquire Baidu Map POI data and process it afterwards. There are two main steps:

  • POI data acquisition: get POI data from Baidu Map and save it in json format;
  • EXCEL import of data: Convert data saved in json format to excel file.

For the principles of POI data acquisition, the article "Zero Foundation Master Baidu Map Points of Interest Obtain POI Crawler (python language crawl) (Foundation)" can also serve as a reference.

POI Data Acquisition

Baidu Map POI data can be obtained through the point-of-interest coordinate API provided by Baidu Map. The POI information obtained includes name, latitude and longitude coordinates, address, etc. For detailed interface usage, refer to the API instructions for Baidu Map WEB Service Location Retrieval.

From the documentation, we can see that the key to obtaining POI data is to construct an appropriate url and request it to get the corresponding POI data. Therefore, we first describe in detail the url used by the Baidu Map WEB Service api.

http://api.map.baidu.com/place/v2/search?query=bank&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key} //GET request

The above is an example search url from the Baidu Map documentation, which can be divided into the following parts:

  • Prefix section: this part of the url is required and stays the same regardless of the search performed or the data format requested
    http://api.map.baidu.com/place/v2/search?

  • Parameters section: customizes the requested data; you can specify keywords, the search area, the output type, and your ak (access key)
    query=bank&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key}
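
The two parts above can be assembled programmatically. The following is a minimal Python 3 sketch, not the article's own script: the helper name build_search_url is hypothetical, and page_size and page_num are the paging parameters of the same interface.

```python
from urllib.parse import urlencode

# Prefix section: identical for every place-search request.
BASE_URL = "http://api.map.baidu.com/place/v2/search?"


def build_search_url(query, bounds, ak, page_size=20, page_num=0):
    """Return a rectangular-area search url.

    bounds is (lower-left lat, lower-left lng, upper-right lat, upper-right lng).
    """
    params = {
        "query": query,
        "bounds": ",".join(str(v) for v in bounds),
        "output": "json",
        "page_size": page_size,
        "page_num": page_num,
        "ak": ak,
    }
    # safe="," keeps the commas in bounds readable;
    # non-ASCII keywords are percent-encoded as UTF-8.
    return BASE_URL + urlencode(params, safe=",")


url = build_search_url("bank", (39.915, 116.404, 39.975, 116.414), "YOUR_AK")
print(url)
```

Building the query string with urlencode rather than string concatenation means that Chinese keywords are escaped correctly without extra work.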

The prefix section is the same across all requests and needs little explanation, while the parameter section determines the search results and needs to be described in detail. Baidu Map provides three POI search modes: administrative division area search, peripheral search, and rectangular area search. These modes differ only in a few parameters; most parameters are the same, and the results are returned in the same format. This article therefore takes only the rectangular area search request parameters as an example:

| Parameter Name | Parameter Meaning | Type | Example | Required |
|---|---|---|---|---|
| query | Search keywords. Peripheral and rectangular area searches support combined multi-keyword searches, with keywords separated by the $ symbol and up to 10 keywords. For example: "Bank$Hotel" | string(45) | Tiananmen | Mandatory |
| bounds | Rectangular search area; the coordinates are separated by "," | string(50) | 38.76623,116.43213,39.54321,116.46773 (lat,lng of the lower left corner, then lat,lng of the upper right corner) | Mandatory |
| output | Output format, json or xml | string(50) | json or xml | Optional |
| scope | Level of detail of the results. A value of 1 or empty returns basic information; a value of 2 returns detailed POI information | string(50) | 1, 2 | Optional |
| page_size | Number of POIs returned per request; defaults to 10 records, maximum 20. When searching multiple keywords, the number of records returned is the number of keywords * page_size | int | 10 | Optional |
| page_num | Page number, default 0; 0 is the first page, 1 the second page, and so on. Often used together with page_size | int | 0, 1, 2 | Optional |
| coord_type | Coordinate type: 1 (wgs84ll, GPS latitude and longitude), 2 (gcj02ll, National Bureau of Survey latitude and longitude), 3 (bd09ll, Baidu latitude and longitude), 4 (bd09mc, Baidu metre coordinates). Note: "ll" is lowercase LL | int | 1, 2, 3 (default), 4 | Optional |
| ret_coordtype | When set, POI coordinates are returned as National Bureau of Survey latitude and longitude coordinates | string(50) | gcj02ll | Optional |
| ak | Developer's access key. Prior to v2, this parameter was named key | string(50) | | Mandatory |

Return parameters

| Name | Type | Explanation |
|---|---|---|
| status | int | Access status of this API call: 0 if successful, other numbers if failed (see Service Status Code) |
| total | int | Total number of POIs retrieved. The total field appears only if the page_num field is set in the request. For data protection purposes, the total for a single request is at most 400 |
| name | string | POI name |
| location | object | POI latitude and longitude coordinates |
| address | string | POI address information |
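
To make the return parameters concrete, here is a minimal Python 3 sketch that pulls these fields out of one response body. The sample body is invented for illustration; the results array and the lat/lng keys inside location follow the structure that the export page later in this article reads, and extract_pois is a hypothetical helper name.

```python
import json

# Hypothetical response body, shaped after the fields listed above.
sample = """
{
  "status": 0,
  "total": 1,
  "results": [
    {
      "name": "Example Bank",
      "location": {"lat": 39.92, "lng": 116.41},
      "address": "Example Road 1"
    }
  ]
}
"""


def extract_pois(body):
    """Return (name, lat, lng, address) tuples from one response body."""
    data = json.loads(body)
    # A non-zero status means the request failed, so there is nothing to read.
    if data.get("status") != 0:
        raise RuntimeError("request failed with status %s" % data.get("status"))
    rows = []
    for poi in data.get("results", []):
        loc = poi.get("location", {})
        rows.append((poi.get("name"), loc.get("lat"), loc.get("lng"), poi.get("address")))
    return rows


pois = extract_pois(sample)
print(pois)
```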

Of particular note are:

  1. To protect its data, Baidu Map caps the total for a single request at 400, so a search returns at most 400 results. If more than 400 POIs match, only 400 records can be retrieved.
  2. Baidu Map gives developers a quota of 2,000 requests per day, with a concurrent access limit of 120.

The first problem can be solved by subdividing the search area: divide the rectangular area to be searched into smaller rectangular areas and merge their search results to get the desired result.
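
One way to choose the sub-areas is to split adaptively: keep quartering a rectangle while its reported total still hits the cap. This is a sketch of a different technique from the fixed grid used by the script in this article. The function names are hypothetical, and fake_total stands in for a real request that would read total from the first page.

```python
RESULT_CAP = 400  # per-request cap on total described above


def split_four(bound):
    """Split (lat0, lng0, lat1, lng1) into four equal quadrants."""
    lat0, lng0, lat1, lng1 = bound
    mid_lat = (lat0 + lat1) / 2
    mid_lng = (lng0 + lng1) / 2
    return [
        (lat0, lng0, mid_lat, mid_lng),
        (lat0, mid_lng, mid_lat, lng1),
        (mid_lat, lng0, lat1, mid_lng),
        (mid_lat, mid_lng, lat1, lng1),
    ]


def plan_areas(bound, count_total, max_depth=6):
    """Return sub-areas whose reported total is below the cap.

    count_total is a caller-supplied function returning the total for a bound.
    max_depth bounds the recursion so a dense spot cannot split forever.
    """
    if max_depth == 0 or count_total(bound) < RESULT_CAP:
        return [bound]
    areas = []
    for quadrant in split_four(bound):
        areas.extend(plan_areas(quadrant, count_total, max_depth - 1))
    return areas


# Stand-in for a real request: pretend 1000 POIs are spread evenly.
WHOLE = (30.379, 114.118, 30.703, 114.665)


def fake_total(bound):
    area = (bound[2] - bound[0]) * (bound[3] - bound[1])
    whole = (WHOLE[2] - WHOLE[0]) * (WHOLE[3] - WHOLE[1])
    return round(1000 * area / whole)


areas = plan_areas(WHOLE, fake_total)
print(len(areas))
```

Compared with a fixed grid, this spends the extra requests only on dense areas, at the cost of one probing request per rectangle.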

The second problem can be solved by applying for multiple ak, alternating between them, and slowing down the requests.
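
Alternating keys and pacing requests can be combined in one small helper. The following Python 3 sketch is an assumption about how this might be organised, not part of the article's script: the class name KeyRotator and the placeholder keys are hypothetical.

```python
import itertools
import time


class KeyRotator:
    """Hand out access keys in round-robin order, pausing between requests."""

    def __init__(self, keys, min_interval=1.0):
        self._keys = itertools.cycle(keys)
        self._min_interval = min_interval
        self._last = None

    def next_key(self):
        # Sleep just long enough to keep min_interval seconds between calls.
        now = time.monotonic()
        if self._last is not None:
            wait = self._min_interval - (now - self._last)
            if wait > 0:
                time.sleep(wait)
        self._last = time.monotonic()
        return next(self._keys)


# A short interval keeps the demonstration fast; use about 1 second in practice.
rotator = KeyRotator(["AK_ONE", "AK_TWO", "AK_THREE"], min_interval=0.01)
used = [rotator.next_key() for _ in range(5)]
print(used)
```

Calling next_key() once per request spreads the daily quota evenly across the keys.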

The code for the final implementation (written for Python 2) is as follows:

# -*- coding: utf-8 -*-
# The encoding declaration above must be on the first line; otherwise Chinese characters in the source raise a non-ASCII character error

import urllib
import json
import time

# Apply for an ak on the Baidu Map Open Platform
ak = "XXX"

# Search keyword
query=["social welfare institute"]
page_size=20
page_num=0
scope=1

#Range:
#Lower left coordinate 30.379,114.118
#Upper right coordinates 30.703,114.665
#Intermediate coordinates 30.541,114.3915

bounds=[
    [30.379,114.118,30.541,114.3915],
    [30.379,114.3915,30.541,114.665],
    [30.541,114.118,30.703,114.3915],
    [30.541,114.3915,30.703,114.665]
]

new_bounds = []
# col_row subdivides each block of bounds into 3 rows and 3 columns so that no sub-area exceeds the cap of 400 results
col_row = 3 
for lst in bounds:
    distance_lat = (lst[2] - lst[0])/col_row
    distance_lon = (lst[3] - lst[1])/col_row
    for i in range(col_row):
        for j in range(col_row):
            lst_temp = []
            lst_temp.append(lst[0]+distance_lat*i)
            lst_temp.append(lst[1]+distance_lon*j)
            lst_temp.append(lst[0]+distance_lat*(i+1))
            lst_temp.append(lst[1]+distance_lon*(j+1))
            new_bounds.append(lst_temp)

queryResults = []

for bound in new_bounds:
    np=True
    a=[]
    while np==True:
        # Build the request url from the parameters described by Baidu
        url="http://api.map.baidu.com/place/v2/search?ak="+str(ak)+"&output=json&query="+str(query[0])+"&page_size="+str(page_size)+"&page_num="+str(page_num)+"&bounds="+str(bound[0])+","+str(bound[1])+","+str(bound[2])+","+str(bound[3])

        # Request the url and read the response body
        jsonf=urllib.urlopen(url)
        page_num=page_num+1
        jsonfile=jsonf.read()

        # Parse the response; total decides how many pages to request
        s=json.loads(jsonfile)
        total=int(s["total"])
        a.append(total)

        queryResults.append(s)

        max_page=int(a[0]/page_size)+1
        # Throttle requests; Baidu Map limits concurrent access to 120
        time.sleep(1)

        if page_num>max_page:
            np=False
            page_num=0
            print "search complete"
            print "output: "+str(bound)
            print "total: "+str(a[0])
            print ("")

results=open("./results.txt",'a')
results.write(str(queryResults).decode('unicode_escape'))
results.close()
print "ALL DONE!"
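
The paging arithmetic can be checked in isolation. As written, the script requests page numbers 0 through int(total/page_size)+1, which can include one or two pages beyond the last one holding data and so spends part of the daily quota. A minimal Python 3 sketch of a tighter count, with the hypothetical helper name pages_needed:

```python
import math


def pages_needed(total, page_size=20):
    """Number of pages to request so that every record is fetched once."""
    # The first page is always requested, because it is what reports total.
    return max(1, math.ceil(total / page_size))


for total in (0, 1, 20, 21, 400):
    print(total, pages_needed(total))
```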

The result is saved in results.txt, but because of character encoding problems the output contains stray u' prefixes (Python 2 unicode string markers), each of which has to be replaced manually with '. The search results from results.txt are then copied to the result.js file and enclosed in [ and ] symbols to form an array of objects, which makes them easy to traverse later when importing into Excel.
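
The manual clean-up can be avoided by writing the results with the json module instead of str(). This is a swapped-in technique in Python 3, not what the script above does. The query_results value below is an invented stand-in for the queryResults list, and the output path is only an example.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the queryResults list built by the script.
query_results = [
    {
        "status": 0,
        "total": 1,
        "results": [
            {
                "name": "社会福利院",
                "location": {"lat": 30.5, "lng": 114.3},
                "address": "Example Road 1",
            }
        ],
    }
]

out_path = os.path.join(tempfile.gettempdir(), "results.json")

# json.dump writes valid JSON; ensure_ascii=False keeps Chinese text readable.
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(query_results, f, ensure_ascii=False, indent=2)

# The file loads back into the same structure, with no u' prefixes to clean up.
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == query_results)
```

Because the file is already a JSON array, it can be assigned to a variable in a .js file without further editing.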

EXCEL Import of Data

The Excel import works by traversing all the objects in the object array and building a table with the same structure as an HTML table, then converting the table to an Excel file in a specific way. The code is as follows:

<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>ARRAY TO EXCEL</title>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <script src='Home of respect for the aged.js'></script>
    <script src='Senior apartment.js'></script>
    <script src='Home for the Aged.js'></script>
    <script src='social welfare institute.js'></script>
    <script src='Community Health Station.js'></script>
    <script src='Community Health Centre.js'></script>
    <script src='Community hospitals.js'></script>
    <script src='Service Centers for the Aged.js'></script>
    <script src='Old Age Institutions.js'></script>
    <script src='Beadhouse.js'></script>
</head>

<body>
    <input type="button" id="wwo" value="export" />

    <script type="text/javascript">
        $(document).ready(function() {
            $('#wwo').click(function() {
               ArrayToExcelConvertor(jly, "Home of respect for the aged");
               ArrayToExcelConvertor(lngy, "Senior apartment");
               ArrayToExcelConvertor(lnzj, "Home for the Aged");
               ArrayToExcelConvertor(shfly, "social welfare institute");
               ArrayToExcelConvertor(sqwsfwz, "Community Health Station");
               ArrayToExcelConvertor(sqyy, "Community hospitals");
               ArrayToExcelConvertor(ylfwzx, "Old Age Service Centers");
               ArrayToExcelConvertor(yljg, "Old Age Institutions");
               ArrayToExcelConvertor(yly, "Beadhouse");
            });
        });

        function ArrayToExcelConvertor(Data, FileName) {
            var excel = '<table>';
            var row = "";
            for (var i = 0; i < Data.length; i++) {
                if (Data[i].results.length > 0) {
                    for (var j = 0; j < Data[i].results.length; j++) {
                        var name = Data[i].results[j].name;
                        var lng = Data[i].results[j].location.lng;
                        var lat = Data[i].results[j].location.lat;
                        var addr = Data[i].results[j].address;
                        row += '<tr>';
                        row += '<td>' + name + '</td>';
                        row += '<td>' + lng + '</td>';
                        row += '<td>' + lat + '</td>';
                        row += '<td>' + addr + '</td>';
                        row += "</tr>";
                    }
                }
            }
            excel += row + "</table>";

            var excelFile = "<html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:x='urn:schemas-microsoft-com:office:excel' xmlns='http://www.w3.org/TR/REC-html40'>";
            excelFile += "<head>";
            excelFile += '<meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">';
            excelFile += "<!--[if gte mso 9]>";
            excelFile += "<xml>";
            excelFile += "<x:ExcelWorkbook>";
            excelFile += "<x:ExcelWorksheets>";
            excelFile += "<x:ExcelWorksheet>";
            excelFile += "<x:Name>";
            excelFile += FileName;
            excelFile += "</x:Name>";
            excelFile += "<x:WorksheetOptions>";
            excelFile += "<x:DisplayGridlines/>";
            excelFile += "</x:WorksheetOptions>";
            excelFile += "</x:ExcelWorksheet>";
            excelFile += "</x:ExcelWorksheets>";
            excelFile += "</x:ExcelWorkbook>";
            excelFile += "</xml>";
            excelFile += "<![endif]-->";
            excelFile += "</head>";
            excelFile += "<body>";
            excelFile += excel;
            excelFile += "</body>";
            excelFile += "</html>";
            var uri = 'data:application/vnd.ms-excel;charset=utf-8,' + encodeURIComponent(excelFile);
            var link = document.createElement("a");
            link.href = uri;
            link.style = "visibility:hidden";
            link.download = FileName + ".xls";
            document.body.appendChild(link);
            link.click();
            document.body.removeChild(link);
        }

    </script>
</body>

</html>
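
The same traversal can also be sketched in Python 3, writing a CSV file that Excel opens directly. This is an alternative to the HTML table approach above, not the article's method. The sample data is invented but has the same shape as the arrays loaded by the page, and write_poi_csv is a hypothetical helper name.

```python
import csv
import io

# Hypothetical data with the same shape as the arrays loaded in the page.
data = [
    {
        "results": [
            {
                "name": "Example Home",
                "location": {"lat": 30.5, "lng": 114.3},
                "address": "Example Road 1",
            }
        ]
    },
    {"results": []},
]


def write_poi_csv(pages, out):
    """Write one row per POI (name, lng, lat, address) and return the row count."""
    writer = csv.writer(out)
    writer.writerow(["name", "lng", "lat", "address"])
    count = 0
    for page in pages:
        for poi in page.get("results", []):
            loc = poi.get("location", {})
            writer.writerow([poi.get("name"), loc.get("lng"), loc.get("lat"), poi.get("address")])
            count += 1
    return count


buffer = io.StringIO()
written = write_poi_csv(data, buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, open the file with open(path, "w", encoding="utf-8-sig", newline=""); the utf-8-sig byte order mark helps Excel detect UTF-8 so that Chinese names display correctly.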

Topics: Excel JSON xml JQuery