Extracting and parsing monetary amounts from text with JioNLP

Posted by Hatdrawn on Mon, 25 Oct 2021 13:56:26 +0200

The task: given a piece of text, extract every monetary amount string and standardize each amount.

JioNLP is a Chinese preprocessing and parsing toolkit (https://github.com/dongrixinyu/JioNLP). Its jio.ner.extract_money and jio.parse_money functions extract monetary amounts from a piece of text and standardize the results. Let's take an example:

Given a paragraph of text, such as:

HNA lost HK $70 million on the sale of Hong Kong apartments. On December 12, according to the Hong Kong Economic Daily, HNA Group sold part of its property in the Yoo Residence building in Causeway Bay, Hong Kong, for HK $260 million. Compared with the price of HK $330 million at the beginning of last year, HNA, which sold the property by way of equity transfer, made a loss of more than HK $70 million. The property includes a penthouse luxury apartment, a layered property and five parking spaces. It is reported that two months ago, HNA sought buyers for this part of the property on the market and once offered tens of millions of dollars. In addition, a few months ago, HNA sold a shop on the first floor of the ground floor, which was bought at the same time last year, at a price of HK $86.5 million. The buyer was a Hong Kong company called Rongqi. Compared with last year's price of nearly HK $120 million, HNA lost about HK $33.5 million.
From this point of view, HNA's investment in Yoo Residence lost more than HK $100 million in one year. Since the start of this year, HNA has continuously sold its real estate assets in Hong Kong. In February, HNA Group sold plots 6565 and 6562 in Kai Tak District, Hong Kong to Hong Kong Henderson Zhaoye real estate (00012.HK) for HK $15.959 billion, with a share price of 23.40 yuan. In March, HNA sold No. 6564 of the New Kowloon inland section at No. 1 site, 1L area, Kai Tak, Kowloon to huidefeng (00020.HK) at a price of HK $6.359 billion.
It raised HK $5.047 billion five months ago. In addition to selling the land, HNA also sold an office in Admiralty, Hong Kong. On March 21, according to Ming Pao, a local media outlet in Hong Kong, HNA sold an office located in Hong Kong Jinzhong Libao center today, with a transaction price of more than HK $40 million and a unit price of HK $28000 / square foot (equivalent to 243300 yuan / square meter), nearly 20% lower than the property's market value of HK $38000 / square foot. Up to now, HNA has cashed out at least HK $22.7 billion from the sale of real estate properties in Hong Kong.

import jionlp as jio

# The sample paragraph above (abbreviated here)
text = "HNA..."

# with_parsing=True also standardizes each extracted amount
results = jio.ner.extract_money(text, with_parsing=True)
for item in results:
    print(item)

Output (each offset is a character position in the original Chinese text):

{'text': 'HK $70 million', 'offset': [4, 11], 'type': 'money', 'detail': {'num': '70000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'HK $260 million', 'offset': [83, 89], 'type': 'money', 'detail': {'num': '260000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'HK $330 million', 'offset': [103, 109], 'type': 'money', 'detail': {'num': '330000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'more than HK $70 million', 'offset': [149, 157], 'type': 'money', 'detail': {'num': ['70000000.00', '80000000.00'], 'case': 'Hong Kong dollar', 'definition': 'blur'}}
{'text': 'tens of millions of dollars', 'offset': [225, 232], 'type': 'money', 'detail': {'num': ['10000000.00', '100000000.00'], 'case': 'dollar', 'definition': 'blur'}}
{'text': 'HK $86.5 million', 'offset': [270, 277], 'type': 'money', 'detail': {'num': '86500000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'nearly HK $120 million', 'offset': [301, 308], 'type': 'money', 'detail': {'num': '120000000.00', 'case': 'Hong Kong dollar', 'definition': 'blur-'}}
{'text': 'about HK $33.5 million', 'offset': [316, 324], 'type': 'money', 'detail': {'num': '33500000.00', 'case': 'Hong Kong dollar', 'definition': 'blur'}}
{'text': 'more than HK $100 million', 'offset': [362, 367], 'type': 'money', 'detail': {'num': '100000000.00', 'case': 'Hong Kong dollar', 'definition': 'blur+'}}
{'text': 'HK $15.959 billion', 'offset': [431, 440], 'type': 'money', 'detail': {'num': '15959000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': '23.40 yuan', 'offset': [465, 472], 'type': 'money', 'detail': {'num': '23.00', 'case': 'yuan', 'definition': 'accurate'}}
{'text': 'HK $6.359 billion', 'offset': [517, 525], 'type': 'money', 'detail': {'num': '6359000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'HK $5.047 billion', 'offset': [565, 573], 'type': 'money', 'detail': {'num': '5047000000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'more than HK $40 million', 'offset': [659, 667], 'type': 'money', 'detail': {'num': ['40000000.00', '50000000.00'], 'case': 'Hong Kong dollar', 'definition': 'blur'}}
{'text': 'HK $28000', 'offset': [673, 680], 'type': 'money', 'detail': {'num': '28000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': '243300 yuan', 'offset': [688, 695], 'type': 'money', 'detail': {'num': '243300.00', 'case': 'yuan', 'definition': 'accurate'}}
{'text': 'HK $38000', 'offset': [719, 726], 'type': 'money', 'detail': {'num': '38000.00', 'case': 'Hong Kong dollar', 'definition': 'accurate'}}
{'text': 'at least HK $22.7 billion', 'offset': [757, 765], 'type': 'money', 'detail': {'num': '22700000000.00', 'case': 'Hong Kong dollar', 'definition': 'blur+'}}
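
In the records above, `num` is either a single string or a two-element list, and `definition` takes the values `accurate`, `blur`, `blur+` and `blur-`. The sketch below shows one way a caller might turn a record into numeric bounds. The helper name `money_bounds` is hypothetical and not part of JioNLP, and the reading of `blur+` as a lower bound and `blur-` as an upper bound is an assumption inferred from the sample output ("at least" and "nearly"), not taken from the library's documentation.

```python
from decimal import Decimal


def money_bounds(record):
    """Return (currency, low, high) for one extract_money record.

    low or high is None when that side is left open.
    """
    detail = record["detail"]
    num = detail["num"]
    kind = detail["definition"]
    if isinstance(num, list):
        # Fuzzy range such as "more than HK $70 million"
        low, high = Decimal(num[0]), Decimal(num[1])
    elif kind == "blur+":
        # Assumed meaning: "at least" / "more than", so only a lower bound
        low, high = Decimal(num), None
    elif kind == "blur-":
        # Assumed meaning: "nearly", so only an upper bound
        low, high = None, Decimal(num)
    else:
        # "accurate", or "blur" with a single value ("about"): point estimate
        low = high = Decimal(num)
    return detail["case"], low, high


# A few detail dicts copied from the output above
records = [
    {"detail": {"num": "70000000.00", "case": "Hong Kong dollar", "definition": "accurate"}},
    {"detail": {"num": ["70000000.00", "80000000.00"], "case": "Hong Kong dollar", "definition": "blur"}},
    {"detail": {"num": "120000000.00", "case": "Hong Kong dollar", "definition": "blur-"}},
    {"detail": {"num": "22700000000.00", "case": "Hong Kong dollar", "definition": "blur+"}},
]
for record in records:
    print(money_bounds(record))
```

`Decimal` is used instead of `float` so that the two-decimal strings returned by the tool are preserved exactly.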

As the extraction results show, the tool extracts the monetary amounts accurately. It supports the following features:

- Supports purely numeric formats, such as US $987273.3
- Supports amounts written out in Chinese characters, such as seventy-six million three hundred and forty-four thousand three hundred and twenty-one yuan and five cents
- Supports mixed formats, such as HK $12600
- Supports **modifier** parsing, such as nearly 60000 yuan and at least 1000 yuan
- Supports **fuzzy amount** parsing, such as more than 20000 yuan and more than 600 billion yen
- Supports **colloquial Chinese** formats, such as 35 yuan and 30 cents. However, a string such as "thirty-five yuan eight" is **ambiguous** in running text (the "eight" may belong to what follows, as in "thirty-five yuan, eight candies"), so `jio.ner.extract_money` does not extract it, whereas `jio.parse_money` treats it as a complete colloquial amount and standardizes it as "35.80 yuan"
- Supports a variety of common currency types: RMB, Hong Kong dollar, Macao dollar, US dollar, Japanese yen, Australian dollar, Korean won, ruble, British pound, mark, franc, euro, Canadian dollar, Thai baht, etc.
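
To try `jio.parse_money` on standalone amount strings such as those listed above, a thin wrapper might look like the sketch below. The wrapper name `standardize_amounts` and its `parse` parameter are hypothetical. `parse` defaults to `jio.parse_money` and can be swapped out so the wrapper can be exercised without JioNLP installed. The result for each string is assumed to have the same shape as the `detail` field in the extraction output (`num`, `case`, `definition`).

```python
def standardize_amounts(strings, parse=None):
    """Return {amount_string: parsed_result} for each input string.

    `parse` is any callable that takes one string. When omitted,
    JioNLP's jio.parse_money is used (requires `pip install jionlp`).
    """
    if parse is None:
        # Imported lazily so the wrapper itself loads without JioNLP
        import jionlp as jio
        parse = jio.parse_money
    return {s: parse(s) for s in strings}


# With JioNLP installed, a colloquial string can be passed directly.
# The input below is an assumption about the original Chinese wording
# of the "35.80 yuan" example:
#     standardize_amounts(["三十五块八"])
```

Because `jio.parse_money` receives the amount string on its own, it does not face the surrounding-text ambiguity that makes `jio.ner.extract_money` skip such strings.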

In addition, the tool has a very powerful temporal semantic parsing ability; see https://blog.csdn.net/dongrixinyu/article/details/120245280.

Topics: Python AI NLP Digital Currency