Python's flexibility and ease of use make it one of the most popular programming languages, especially for data processing and machine learning. Its powerful data-processing and algorithm libraries make it the language of choice for getting started in data science. In everyday work, the CSV, JSON, and XML data formats dominate, so below I share quick ways to handle each of these three formats.
CSV data
CSV is the most common way to store data; most Kaggle competition data is distributed in this format. We can use Python's built-in csv library to read and write CSV files. Usually, we read the data into a list.
Look at the code below. When we create a reader with csv.reader(), all of the CSV data becomes accessible through it. Calling next(csvreader) reads a single row from the CSV; each call advances to the next row. We can also iterate over every row with for row in csvreader. Make sure each row has the same number of columns, otherwise you may run into errors when processing the resulting list.
import csv

filename = "my_data.csv"
fields = []
rows = []

# Reading csv file
with open(filename, 'r') as csvfile:
    # Creating a csv reader object
    csvreader = csv.reader(csvfile)
    # Extracting field names from the first row
    fields = next(csvreader)
    # Extracting each data row one by one
    for row in csvreader:
        rows.append(row)

# Printing out the first 5 rows
for row in rows[:5]:
    print(row)
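If you would rather get each row back as a dictionary keyed by the column names, the same built-in module also provides csv.DictReader. A minimal sketch, assuming the same my_data.csv file with a header row:

import csv

filename = "my_data.csv"

rows = []
with open(filename, 'r', newline='') as csvfile:
    # DictReader takes the field names from the first row automatically
    dictreader = csv.DictReader(csvfile)
    for row in dictreader:
        # Each row is a dict mapping column name -> value
        rows.append(row)

# Printing out the first 5 rows
for row in rows[:5]:
    print(row)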
Writing CSV in Python is just as easy. Put the field names in a single list and the data in a list of rows. This time we create a writer() object and use it to write our data to the file, in much the same way as we read it.
import csv

# Field names
fields = ['Name', 'Goals', 'Assists', 'Shots']

# Rows of data in the csv file
rows = [['Emily', '12', '18', '112'],
        ['Katie', '8', '24', '96'],
        ['John', '16', '9', '101'],
        ['Mike', '3', '14', '82']]

filename = "soccer.csv"

# Writing to csv file
with open(filename, 'w+') as csvfile:
    # Creating a csv writer object
    csvwriter = csv.writer(csvfile)
    # Writing the fields
    csvwriter.writerow(fields)
    # Writing the data rows
    csvwriter.writerows(rows)
With pandas we can convert the data into a list of dictionaries in a single line. Once the data is formatted as a list of dictionaries, we can use the dicttoxml library to convert it to XML format, and we can also save it as a JSON file!
import pandas as pd
from dicttoxml import dicttoxml
import json

# Building our dataframe
data = {'Name': ['Emily', 'Katie', 'John', 'Mike'],
        'Goals': [12, 8, 16, 3],
        'Assists': [18, 24, 9, 14],
        'Shots': [112, 96, 101, 82]}
df = pd.DataFrame(data, columns=data.keys())

# Converting the dataframe to a list of dictionaries
# Then save it to file as JSON
data_dict = df.to_dict(orient="records")
with open('output.json', "w+") as f:
    json.dump(data_dict, f, indent=4)

# Converting the dictionary list to XML
# Then save it to file
xml_data = dicttoxml(data_dict).decode()
with open("output.xml", "w+") as f:
    f.write(xml_data)
JSON data
JSON provides a concise, easy-to-read format that preserves a dictionary-like structure. Just like with CSV, Python has a built-in json module that makes reading and writing very easy! We read the JSON into a dictionary (or list of dictionaries) and then write that dictionary-format data back out to a file.
import json
import pandas as pd

# Read the data from file
# We now have a Python dictionary
with open('data.json') as f:
    data_listofdict = json.load(f)

# We can do the same thing with pandas
data_df = pd.read_json('data.json', orient='records')

# We can write a dictionary to JSON like so
# Use 'indent' and 'sort_keys' to make the JSON file look nice
with open('new_data.json', 'w+') as json_file:
    json.dump(data_listofdict, json_file, indent=4, sort_keys=True)

# And again the same thing with pandas
export = data_df.to_json('new_data.json', orient='records')
As we saw before, once we have the data we can easily convert it to CSV with pandas or with the built-in Python csv module. To convert to XML, we can use the dicttoxml library. The code is as follows:
import json
import pandas as pd
import csv

# Read the data from file
# We now have a Python list of dictionaries
with open('data.json') as f:
    data_listofdict = json.load(f)

# Writing a list of dicts to CSV
keys = data_listofdict[0].keys()
with open('saved_data.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(data_listofdict)
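For the XML side, here is a minimal sketch assuming the same data.json file and the dicttoxml library; the saved_data.xml filename is just an example:

import json
from dicttoxml import dicttoxml

# Read the list of dictionaries from file
with open('data.json') as f:
    data_listofdict = json.load(f)

# dicttoxml returns bytes, so decode before writing as text
xml_data = dicttoxml(data_listofdict).decode()
with open('saved_data.xml', 'w+') as output_file:
    output_file.write(xml_data)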
XML data
XML is a little different from CSV and JSON. CSV and JSON are easy to read, write, and interpret because they are simple and fast, while XML takes up more space: transmitting and storing it requires more bandwidth, more storage, and longer processing time. However, XML offers some features that JSON and CSV lack: you can use namespaces to build and share structural standards, it has better support for inheritance, and it provides industry-standardized ways of describing data such as XML schemas and DTDs.
To read in XML data, we will use Python's built-in xml module and its ElementTree submodule. We can then use the xmltodict library to convert the ElementTree object into a dictionary. Once we have a dictionary, we can convert it to CSV, JSON, or a pandas DataFrame! The code is as follows:
import xml.etree.ElementTree as ET
import xmltodict
import json

tree = ET.parse('output.xml')
xml_data = tree.getroot()
xmlstr = ET.tostring(xml_data, encoding='utf8', method='xml')
data_dict = dict(xmltodict.parse(xmlstr))

print(data_dict)

with open('new_data_2.json', 'w+') as json_file:
    json.dump(data_dict, json_file, indent=4, sort_keys=True)
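To finish the trip into a pandas DataFrame or CSV, here is a minimal sketch. It assumes output.xml was produced by dicttoxml as above, so the records sit as repeated item elements under a single root element; the 'root' and 'item' keys and the output filename below are assumptions for illustration:

import xml.etree.ElementTree as ET
import xmltodict
import pandas as pd

# Parse the XML into a dictionary, as above
tree = ET.parse('output.xml')
xmlstr = ET.tostring(tree.getroot(), encoding='utf8', method='xml')
data_dict = xmltodict.parse(xmlstr)

# Assumption: dicttoxml nested each record as an <item> under <root>
records = data_dict['root']['item']

# Flatten the records into a table; note that xmltodict may nest
# attributes ('@type') and text ('#text') inside each field, which
# json_normalize spreads into separate columns
df = pd.json_normalize(records)
print(df.head())
df.to_csv('saved_data_from_xml.csv', index=False)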
Finally:
This article is meant to help you quickly pick up these Python techniques; I also share Python-related technical articles, practical cases, tool resources, and more.
If you want to teach yourself Python, feel free to follow me. I'll share the pitfalls I've run into so you can avoid them and learn faster, and I've put together a set of systematic learning materials. Friends who need them are welcome to send me a private message~ The materials cover a lot of ground: development tools, basic tutorials, practical exercises, and e-books. I believe they can help you get twice the result with half the effort in the shortest time, and they're also great for review.