This article walks through the process of acquiring POI data from Baidu Map and processing it afterwards. There are two main steps:
- POI data acquisition: request POI data from the Baidu Map API and save it in JSON format;
- Excel import: convert the data saved in JSON format into an Excel file.
For the principles behind POI data acquisition, the tutorial "Zero Foundation Master Baidu Map Points of Interest Obtain POI Crawler (python language crawl) (Foundation)" can also be used as a reference.
POI Data Acquisition
Baidu Map POI data can be obtained through the place-search API provided by Baidu Map. The POI information returned includes the name, latitude and longitude coordinates, address, and so on. For detailed usage of the interface, refer to the API instructions of the Baidu Map WEB Service Location Retrieval documentation.
As the documentation shows, the key to obtaining POI data is constructing an appropriate URL that, when requested, returns the corresponding POI data. We therefore first describe in detail the URL format of the Baidu Map WEB Service API.
http://api.map.baidu.com/place/v2/search?query=bank&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key} //GET request
The above is an example search URL from the Baidu Map documentation. It can be divided into the following parts:
- Prefix part, which is required for every request regardless of what is being searched:
http://api.map.baidu.com/place/v2/search?
- Parameter part, which customizes the requested data: you can specify the keywords, the search area, the output format, and your ak (access key):
query=bank&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key}
The prefix part is identical across all requests and needs no further explanation, while the parameter part determines the search results and deserves a closer look. Baidu Map provides three POI search modes: administrative-region search, nearby (circular) search, and rectangular-region search. These modes differ only in a few parameters; most parameters, and the structure of the returned results, are the same. This article takes the request parameters of the rectangular-region search as an example:
Parameter | Meaning | Type | Example | Required |
---|---|---|---|---|
query | Search keywords. Nearby and rectangular-region searches support multiple keywords joined with a $ symbol, up to 10 keywords per search. For example: "Bank$Hotel" | string(45) | Tiananmen | Yes |
bounds | Rectangular search region; coordinate groups separated by "," | string(50) | 38.76623,116.43213,39.54321,116.46773 — lat,lng of the lower-left corner, then lat,lng of the upper-right corner | Yes |
output | Output format, json or xml | string(50) | json or xml | No |
scope | Level of detail of the results. A value of 1 or empty returns basic information; a value of 2 returns POI details | string(50) | 1, 2 | No |
page_size | Number of POIs returned per request; defaults to 10 records, maximum 20. When searching with multiple keywords, the number of records returned is the number of keywords × page_size. | int | 10 | No |
page_num | Page number, default 0: 0 is the first page, 1 the second, and so on. Usually used together with page_size. | int | 0, 1, 2 | No |
coord_type | Coordinate type: 1 (wgs84ll, GPS latitude/longitude), 2 (gcj02ll, National Bureau of Surveying coordinates), 3 (bd09ll, Baidu latitude/longitude), 4 (bd09mc, Baidu metric coordinates). Note: "ll" is lowercase LL. | int | 1, 2, 3 (default), 4 | No |
ret_coordtype | When this parameter is added, POIs are returned in National Bureau of Surveying coordinates | string(50) | gcj02ll | No |
ak | Developer's access key. Before v2, this parameter was named key. | string(50) | | Yes |
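Under these conventions, a request URL can be assembled programmatically. The sketch below is a minimal Python 3 illustration (the helper name build_search_url is our own, and the ak value is a placeholder; a real key must be applied for on the Baidu Map Open Platform):

```python
from urllib.parse import urlencode

def build_search_url(query, bounds, ak, page_size=20, page_num=0, output="json"):
    """Build a rectangular-region search URL.

    bounds is (lat_ll, lng_ll, lat_ur, lng_ur): lower-left corner, then upper-right corner.
    """
    base = "http://api.map.baidu.com/place/v2/search"
    params = {
        "query": query,
        "bounds": ",".join(str(c) for c in bounds),
        "output": output,
        "page_size": page_size,
        "page_num": page_num,
        "ak": ak,
    }
    # urlencode percent-encodes the commas in bounds, which is valid URL encoding
    return base + "?" + urlencode(params)

url = build_search_url("bank", (39.915, 116.404, 39.975, 116.414), ak="{your key}")
print(url)
```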
Return parameters
Name | Type | Description |
---|---|---|
status | int | Status of the API call: returns 0 on success and other values on failure (see Service Status Codes) |
total | int | Total number of POIs retrieved. The total field appears only if page_num is set in the request. For data-protection purposes, total is capped at 400 per request. |
name | string | POI name |
location | object | POI latitude and longitude coordinates |
address | string | POI address |
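As a sketch of how these fields are consumed, the snippet below parses a response body that follows the documented structure. The JSON here is an illustrative sample, not real API output:

```python
import json

# Illustrative response body following the documented schema; not real API output
sample = '''
{
  "status": 0,
  "total": 1,
  "results": [
    {"name": "Sample Bank",
     "location": {"lat": 39.92, "lng": 116.41},
     "address": "No. 1 Sample Road"}
  ]
}
'''

response = json.loads(sample)
assert response["status"] == 0  # 0 means the call succeeded
for poi in response["results"]:
    print(poi["name"], poi["location"]["lat"], poi["location"]["lng"], poi["address"])
```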
Two points deserve particular attention:
- To protect its data, Baidu Map caps total at 400 per request, meaning a single search region can return at most 400 records, even if more POIs match.
- Baidu Map grants developers a quota of 2,000 requests per day, with a concurrency limit of 120.
The first problem can be solved by subdividing the search area: split the rectangle to be searched into smaller rectangles, each returning at most 400 results, and merge their search results.
The second problem can be solved by applying for multiple ak keys, rotating between them, and throttling the request rate.
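The subdivision idea can be sketched as a small helper that splits one bounds rectangle into an n × n grid of sub-rectangles; the 3 × 3 split used in the full script below is one instance of this. The function name split_bounds is our own:

```python
def split_bounds(bounds, n):
    """Split (lat_ll, lng_ll, lat_ur, lng_ur) into an n x n grid of sub-rectangles."""
    lat_ll, lng_ll, lat_ur, lng_ur = bounds
    d_lat = (lat_ur - lat_ll) / n
    d_lng = (lng_ur - lng_ll) / n
    grid = []
    for i in range(n):
        for j in range(n):
            # Each cell keeps the lower-left / upper-right corner convention
            grid.append((lat_ll + d_lat * i, lng_ll + d_lng * j,
                         lat_ll + d_lat * (i + 1), lng_ll + d_lng * (j + 1)))
    return grid

# Splitting the whole search area 2 x 2 yields 4 sub-rectangles covering it exactly
sub = split_bounds((30.379, 114.118, 30.703, 114.665), 2)
```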
The code for the final implementation is as follows:
# -*- coding: utf-8 -*-
# The first line is required; otherwise Python 2 raises an error
# for non-ASCII (Chinese) characters in the source file
import urllib
import json
import time

# ak must be applied for on the Baidu Map Open Platform
ak = "XXX"
# Search keyword
query = ["social welfare institute"]
page_size = 20
page_num = 0
scope = 1
# Search range:
# lower-left corner 30.379,114.118
# upper-right corner 30.703,114.665
# midpoint 30.541,114.3915
bounds = [
    [30.379, 114.118, 30.541, 114.3915],
    [30.379, 114.3915, 30.541, 114.665],
    [30.541, 114.118, 30.703, 114.3915],
    [30.541, 114.3915, 30.703, 114.665]
]
new_bounds = []
# col_row subdivides each rectangle in bounds into 3 rows and 3 columns,
# to keep each sub-rectangle under the 400-result cap
col_row = 3
for lst in bounds:
    distance_lat = (lst[2] - lst[0]) / col_row
    distance_lon = (lst[3] - lst[1]) / col_row
    for i in range(col_row):
        for j in range(col_row):
            lst_temp = []
            lst_temp.append(lst[0] + distance_lat * i)
            lst_temp.append(lst[1] + distance_lon * j)
            lst_temp.append(lst[0] + distance_lat * (i + 1))
            lst_temp.append(lst[1] + distance_lon * (j + 1))
            new_bounds.append(lst_temp)

queryResults = []
for bound in new_bounds:
    np = True
    a = []
    while np == True:
        # Build the request URL in the format Baidu Map expects
        url = "http://api.map.baidu.com/place/v2/search?ak=" + str(ak) \
            + "&output=json&query=" + str(query[0]) \
            + "&page_size=" + str(page_size) + "&page_num=" + str(page_num) \
            + "&bounds=" + str(bound[0]) + "," + str(bound[1]) \
            + "," + str(bound[2]) + "," + str(bound[3])
        # Request the URL and read the response
        jsonf = urllib.urlopen(url)
        page_num = page_num + 1
        jsonfile = jsonf.read()
        # Parse the response and handle paging
        s = json.loads(jsonfile)
        total = int(s["total"])
        a.append(total)
        queryResults.append(s)
        max_page = int(a[0] / page_size) + 1
        # Throttle requests; Baidu Map requires concurrency below 120
        time.sleep(1)
        if page_num > max_page:
            np = False
            page_num = 0
            print "search complete"
            print "bound: " + str(bound)
            print "total: " + str(a[0])
            print ""
results = open("results.txt", 'a')
results.write(str(queryResults).decode('unicode_escape'))
results.close()
print "ALL DONE!"
The results are saved in results.txt, but due to character-encoding issues the output contains stray u' prefixes (Python 2 unicode string markers), which all need to be manually replaced with '. The cleaned search results are then copied from results.txt into a result.js file, with [ and ] appended at the beginning and end of the file to form an array of objects, which makes the traversal during the subsequent Excel import easier.
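Instead of replacing the u' prefixes by hand, the file can also be cleaned programmatically: the saved text is a Python literal, so ast.literal_eval can parse it (Python 3 still accepts u'...' string literals) and json.dumps can then produce clean JSON. A minimal sketch, assuming the file content is a single Python list literal (the raw string below is an illustrative example of the saved format, not real output):

```python
import ast
import json

# Illustrative example of the Python-2-style repr found in results.txt
raw = "[{u'status': 0, u'total': 1, u'results': [{u'name': u'Sample', u'address': u'Addr'}]}]"

data = ast.literal_eval(raw)              # u'...' literals are valid in Python 3
clean = json.dumps(data, ensure_ascii=False)  # clean JSON, no u' prefixes
print(clean)
```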
EXCEL Import of Data
The Excel import works by traversing all the objects in the object array, building a table that mirrors an HTML table structure, and then converting that table into an Excel file. The code is as follows:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>ARRAY TO EXCEL</title>
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<script src='Home of respect for the aged.js'></script>
<script src='Senior apartment.js'></script>
<script src='Home for the Aged.js'></script>
<script src='social welfare institute.js'></script>
<script src='Community Health Station.js'></script>
<script src='Community Health Centre.js'></script>
<script src='Community hospitals.js'></script>
<script src='Service Centers for the Aged.js'></script>
<script src='Old Age Institutions.js'></script>
<script src='Beadhouse.js'></script>
</head>
<body>
<input type="button" id="wwo" value="export" />
<script type="text/javascript">
$(document).ready(function() {
$('#wwo').click(function() {
ArrayToExcelConvertor(jly, "Home of respect for the aged");
ArrayToExcelConvertor(lngy, "Senior apartment");
ArrayToExcelConvertor(lnzj, "Home for the Aged");
ArrayToExcelConvertor(shfly, "social welfare institute");
ArrayToExcelConvertor(sqwsfwz, "Community Health Station");
ArrayToExcelConvertor(sqyy, "Community hospitals");
ArrayToExcelConvertor(ylfwzx, "Old Age Service Centers");
ArrayToExcelConvertor(yljg, "Old Age Institutions");
ArrayToExcelConvertor(yly, "Beadhouse");
});
});
function ArrayToExcelConvertor(Data, FileName) {
var excel = '<table>';
var row = "";
for (var i = 0; i < Data.length; i++) {
if (Data[i].results.length > 0) {
for (var j = 0; j < Data[i].results.length; j++) {
var name = Data[i].results[j].name;
var lng = Data[i].results[j].location.lng;
var lat = Data[i].results[j].location.lat;
var addr = Data[i].results[j].address;
row += '<tr>';
row += '<td>' + name + '</td>';
row += '<td>' + lng + '</td>';
row += '<td>' + lat + '</td>';
row += '<td>' + addr + '</td>';
row += "</tr>";
}
}
}
excel += row + "</table>";
var excelFile = "<html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:x='urn:schemas-microsoft-com:office:excel' xmlns='http://www.w3.org/TR/REC-html40'>";
excelFile += '<meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">';
excelFile += "<head>";
excelFile += "<!--[if gte mso 9]>";
excelFile += "<xml>";
excelFile += "<x:ExcelWorkbook>";
excelFile += "<x:ExcelWorksheets>";
excelFile += "<x:ExcelWorksheet>";
excelFile += "<x:Name>";
excelFile += FileName; // use the file name as the worksheet name
excelFile += "</x:Name>";
excelFile += "<x:WorksheetOptions>";
excelFile += "<x:DisplayGridlines/>";
excelFile += "</x:WorksheetOptions>";
excelFile += "</x:ExcelWorksheet>";
excelFile += "</x:ExcelWorksheets>";
excelFile += "</x:ExcelWorkbook>";
excelFile += "</xml>";
excelFile += "<![endif]-->";
excelFile += "</head>";
excelFile += "<body>";
excelFile += excel;
excelFile += "</body>";
excelFile += "</html>";
var uri = 'data:application/vnd.ms-excel;charset=utf-8,' + encodeURIComponent(excelFile);
var link = document.createElement("a");
link.href = uri;
link.style = "visibility:hidden";
link.download = FileName + ".xls";
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
}
</script>
</body>
</html>