1, Brief description of Boto3
- Boto3 has two APIs: low-level and high-level.
- Low-level API: it maps one-to-one onto the AWS HTTP API and is exposed through boto3.client("xxx");
- High-level API: it is object-oriented and exposed through boto3.resource("xxx"); it does not necessarily cover every API.
- Boto3 is the SDK for all of AWS, not just S3. It can also be used to access SQS, EC2, etc.
- boto3.resource("s3") example
    # Reference sketch: True|False and b"data"|file list the accepted
    # alternatives, as in the boto3 documentation; they are not literal values.
    import boto3

    s3 = boto3.resource("s3")

    # Create a bucket
    bucket = s3.create_bucket(Bucket="my-bucket")

    # List all buckets; boto3 handles API pagination automatically
    for bucket in s3.buckets.all():
        print(bucket.name)

    # Filter buckets; returns an iterable collection of Bucket resources
    s3.buckets.filter()

    # Get a Bucket resource object
    bucket = s3.Bucket("my-bucket")
    bucket.name      # bucket name
    bucket.delete()  # delete the bucket

    # Delete several objects in one request
    bucket.delete_objects(
        Delete={
            'Objects': [
                {'Key': 'string', 'VersionId': 'string'},
            ],
            'Quiet': True|False
        },
    )
    # Response structure
    {
        'Deleted': [
            {
                'Key': 'string',
                'VersionId': 'string',
                'DeleteMarker': True|False,
                'DeleteMarkerVersionId': 'string'
            },
        ],
        'RequestCharged': 'requester',
        'Errors': [
            {
                'Key': 'string',
                'VersionId': 'string',
                'Code': 'string',
                'Message': 'string'
            },
        ]
    }

    # Download a file
    bucket.download_file(Key, Filename, ExtraArgs=None, Callback=None, Config=None)

    # Download to a file object; may automatically use multi-threaded download
    with open('filename', 'wb') as data:
        bucket.download_fileobj('mykey', data)

    # Upload an object
    obj = bucket.put_object(Body=b"data"|file, ContentMD5="", Key="xxx")

    # Upload from a file object; automatically uses multi-threaded upload
    with open('filename', 'rb') as f:
        bucket.upload_fileobj(f, 'mykey')

    # List all objects
    bucket.objects.all()

    # Filter and return objects
    objects = bucket.objects.filter(
        Delimiter='string',
        EncodingType='url',
        Marker='string',
        MaxKeys=123,
        Prefix='string',
        RequestPayer='requester',
        ExpectedBucketOwner='string'
    )

    # Get an Object resource
    obj = bucket.Object("xxx")
    # or
    obj = s3.Object("my-bucket", "key")

    obj.bucket_name
    obj.key

    # Delete the object
    obj.delete()

    # Download the object
    obj.download_file(path)

    # Automatic multi-threaded download
    with open('filename', 'wb') as data:
        obj.download_fileobj(data)

    # Get the file content
    rsp = obj.get()
    body = rsp["Body"].read()  # file content

    obj.put(Body=b"xxx"|file, ContentMD5="")

    # Upload a file
    obj.upload_file(filename)

    # Automatic multi-threaded upload
    obj.upload_fileobj(fileobj)
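S3 accepts at most 1,000 keys per DeleteObjects request, so a long key list has to be split before it is passed to bucket.delete_objects. The following is a minimal sketch of one way to do that. The helper names chunked and delete_keys are hypothetical (they are not part of boto3), and bucket is assumed to behave like an s3.Bucket resource.

```python
from itertools import islice

MAX_KEYS_PER_REQUEST = 1000  # S3 limit for a single DeleteObjects call


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def delete_keys(bucket, keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Delete `keys` from `bucket` in batches and return the per-key errors.

    `bucket` is assumed to behave like an s3.Bucket resource.
    """
    errors = []
    for batch in chunked(keys, batch_size):
        response = bucket.delete_objects(
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True}
        )
        # With Quiet=True the response only reports the keys that failed.
        errors.extend(response.get("Errors", []))
    return errors
```

With a real bucket this could be called as delete_keys(s3.Bucket("my-bucket"), keys); an empty return value means every key was deleted.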
2, Low-level clients
- Create clients
- Clients are created in a manner similar to resources:
    import boto3

    # Create a low-level client with the service name
    sqs = boto3.client('sqs')
- Low-level clients can also be accessed from existing resources:
    # Create the resource
    sqs_resource = boto3.resource('sqs')

    # Get the client from the resource
    sqs = sqs_resource.meta.client
- Service operations
Each service operation maps to a client method of the same name, and the operation's parameters are passed as keyword arguments;
    # Make a call using the low-level client
    response = sqs.send_message(QueueUrl='...', MessageBody='...')
- As the example shows, the method parameters map directly onto the parameters of the corresponding SQS API operation;
- To read more naturally in Python, method names are written in snake_case (for example, send_message for the SendMessage operation);
- Parameters must be passed as keyword arguments. They cannot be passed as positional arguments.
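The naming rule above can be illustrated with a small sketch. Both helpers below are hypothetical and only for illustration: operation_to_method handles simple CamelCase names (not every acronym case that botocore's own conversion covers), and call_operation assumes client is a boto3 client or an object with the same method names.

```python
import re


def operation_to_method(operation_name):
    """Convert an API operation name such as 'SendMessage' to 'send_message'.

    Illustrative only: handles simple CamelCase names.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", operation_name).lower()


def call_operation(client, operation_name, **params):
    """Look up the client method for an operation and call it by keyword."""
    method = getattr(client, operation_to_method(operation_name))
    return method(**params)
```

For example, call_operation(sqs, "SendMessage", QueueUrl='...', MessageBody='...') would end up calling sqs.send_message with the same keyword arguments.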
- Handling responses
The response is returned as a Python dictionary. It is up to you to traverse it or otherwise extract the data you need, keeping in mind that the response may not always contain all of the expected data;
- In the following example, response.get('QueueUrls', []) ensures that a list is always returned, even when the response has no 'QueueUrls' key:
    # List all your queues
    response = sqs.list_queues()
    for url in response.get('QueueUrls', []):
        print(url)
- The response in the above example looks like this:
    {
        "QueueUrls": [
            "http://url1",
            "http://url2",
            "http://url3"
        ]
    }
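The same defensive response.get pattern extends to paginated results. The sketch below assumes sqs_client is a boto3 SQS client and relies on the MaxResults and NextToken parameters of ListQueues; iter_queue_urls is a hypothetical helper name.

```python
def iter_queue_urls(sqs_client, page_size=100):
    """Yield every queue URL, following NextToken from page to page.

    `sqs_client` is assumed to be a boto3 SQS client.
    """
    kwargs = {"MaxResults": page_size}
    while True:
        response = sqs_client.list_queues(**kwargs)
        # 'QueueUrls' is absent when a page is empty, so default to [].
        yield from response.get("QueueUrls", [])
        token = response.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token
```

Usage would be along the lines of: for url in iter_queue_urls(boto3.client('sqs')): print(url).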
- Waiters
Waiters use the client's service operations to poll the status of an AWS resource and suspend execution until the resource reaches the state being polled for, or until a failure occurs during polling. The client exposes the names of all the waiters it has access to:
    import boto3

    s3 = boto3.client('s3')
    sqs = boto3.client('sqs')

    # List all of the possible waiters for both clients
    print("s3 waiters:")
    print(s3.waiter_names)

    print("sqs waiters:")
    print(sqs.waiter_names)
- If a client has no waiters, accessing its waiter_names attribute returns an empty list:
    s3 waiters:
    ['bucket_exists', 'bucket_not_exists', 'object_exists', 'object_not_exists']
    sqs waiters:
    []
- The client's get_waiter() method retrieves a specific waiter from the list of available waiters:
    # Retrieve waiter instance that will wait till a specified
    # S3 bucket exists
    s3_bucket_exists_waiter = s3.get_waiter('bucket_exists')
- Then, to start waiting, call the waiter's wait() method with the parameters of the underlying operation:
    # Begin waiting for the S3 bucket, mybucket, to exist
    s3_bucket_exists_waiter.wait(Bucket='mybucket')
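How often and how many times a waiter polls can be adjusted per call through the WaiterConfig argument of wait(). A minimal sketch, assuming s3_client is a boto3 S3 client; wait_for_bucket and its default values are choices made for this example only.

```python
def wait_for_bucket(s3_client, bucket_name, delay=5, max_attempts=20):
    """Block until `bucket_name` exists, polling every `delay` seconds.

    `s3_client` is assumed to be a boto3 S3 client. If the bucket does not
    appear within `max_attempts` polls, the waiter raises an exception
    (botocore.exceptions.WaiterError with a real client).
    """
    waiter = s3_client.get_waiter("bucket_exists")
    waiter.wait(
        Bucket=bucket_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
```

Calling wait_for_bucket(s3, 'mybucket', delay=2, max_attempts=10) would therefore give up after roughly 20 seconds of polling.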
- Multithreading or multiprocessing with clients
Multi-processing: although clients are thread safe, they cannot be shared across process boundaries because of how their networking is implemented. Doing so may cause responses to be returned in the wrong order when calling services;
- Shared metadata: clients expose metadata to the end user through a few attributes (namely meta, exceptions and waiter_names). Reading these is safe, but any mutation should not be considered thread safe;
- Custom Botocore events: Botocore (the library Boto3 is built on) allows advanced users to provide their own custom event hooks that can interact with boto3 clients. Most users will not need these interfaces, but those who use them should not assume their clients remain thread safe without careful review.
- Example
    import boto3.session
    from concurrent.futures import ThreadPoolExecutor

    def do_s3_task(client, task_definition):
        # Put your thread-safe code here
        pass

    def my_workflow():
        # Create a session and use it to make our client
        session = boto3.session.Session()
        s3_client = session.client('s3')

        # Define some work to be done, this can be anything
        my_tasks = [ ... ]

        # Dispatch work tasks with our s3_client
        with ThreadPoolExecutor(max_workers=8) as executor:
            futures = [executor.submit(do_s3_task, s3_client, task) for task in my_tasks]
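As a concrete instance of sharing one client across threads, the sketch below looks up the size of several S3 objects in parallel and collects the results. It assumes s3_client is a boto3 S3 client; object_sizes is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def object_sizes(s3_client, bucket_name, keys, max_workers=8):
    """Return {key: size in bytes}, sharing one client across all threads.

    `s3_client` is assumed to be a boto3 S3 client.
    """
    keys = list(keys)  # allow generators; the keys are iterated twice below

    def size_of(key):
        response = s3_client.head_object(Bucket=bucket_name, Key=key)
        return response["ContentLength"]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so zip pairs keys with results.
        return dict(zip(keys, executor.map(size_of, keys)))
```

Only the client is shared here; each call to head_object is an independent request, so no metadata on the client is mutated by the worker threads.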
3, Resources
- Overview
- Resources represent an object-oriented interface to Amazon Web Services (AWS);
- They provide a higher level of abstraction than the raw, low-level calls made by service clients;
- To use resources, call the resource() method of a Session and pass in the service name:
    # Get resources from the default session
    sqs = boto3.resource('sqs')
    s3 = boto3.resource('s3')
- Every resource instance has a number of attributes and methods. Conceptually, these can be divided into identifiers, attributes, actions, references, sub-resources and collections;
- Resources themselves can also be conceptually divided into service resources (such as sqs, s3, ec2, etc.) and individual resources (such as sqs.Queue or s3.Bucket);
- Service resources do not have identifiers or attributes. Otherwise, the two kinds share the same components.
- Identifiers and attributes
- An identifier is a unique value used to call actions on a resource. Resources must have at least one identifier, except for the top-level service resources (such as sqs or s3);
- Identifiers are set when the instance is created. Failing to provide all required identifiers during instantiation raises an exception.
- Identifier example:
    # SQS Queue (url is an identifier)
    queue = sqs.Queue(url='http://...')
    print(queue.url)

    # S3 Object (bucket_name and key are identifiers)
    obj = s3.Object(bucket_name='boto3', key='test.py')
    print(obj.bucket_name)
    print(obj.key)

    # Raises exception, missing identifier: key!
    obj = s3.Object(bucket_name='boto3')
- Identifiers can also be passed as positional arguments:
    # SQS Queue
    queue = sqs.Queue('http://...')

    # S3 Object
    obj = s3.Object('boto3', 'test.py')

    # Raises exception, missing key!
    obj = s3.Object('boto3')
- Identifiers also play a role in resource instance equality. For two instances of a resource to be considered equal, their identifiers must be equal:
    >>> bucket1 = s3.Bucket('boto3')
    >>> bucket2 = s3.Bucket('boto3')
    >>> bucket3 = s3.Bucket('some-other-bucket')
    >>> bucket1 == bucket2
    True
    >>> bucket1 == bucket3
    False
- Resources may also have attributes, which are lazy-loaded properties on the instance. They may be set at creation time from the response of an action on another resource, or they may be set when first accessed or through an explicit call to the load or reload action.
- Attribute example:
    # SQS Message
    message.body

    # S3 Object
    obj.last_modified
    obj.e_tag
- Warning:
- An attribute may trigger a load action the first time it is accessed. If latency is a concern, calling load manually gives precise control over when the load action (and its latency) occurs. The documentation for each resource explicitly lists its attributes.
- In addition, attributes may be reloaded after an action has been performed on the resource. For example, if the last_modified attribute of an S3 object is loaded and a put action is then called, the next access to last_modified reloads the object's metadata.
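To make the timing of the load action explicit, the metadata can be fetched once up front and then read from the instance. A minimal sketch, assuming obj behaves like an s3.Object resource; object_metadata is a hypothetical helper name.

```python
def object_metadata(obj):
    """Fetch metadata for an S3 object with one explicit request.

    `obj` is assumed to behave like an s3.Object resource. Calling load()
    up front means the attribute reads below do not trigger a hidden
    request of their own.
    """
    obj.load()
    return {
        "size": obj.content_length,
        "last_modified": obj.last_modified,
        "etag": obj.e_tag,
    }
```

For example, object_metadata(s3.Object('boto3', 'test.py')) performs the request at the call site, which makes the latency easy to measure or to move off a hot path.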
- Actions
An action is a method that makes a call to the service. An action may return a low-level response, a new resource instance, or a list of new resource instances. Actions automatically set the resource identifiers as parameters, but allow you to pass additional parameters as keyword arguments.
- Action example:
    # SQS Queue
    messages = queue.receive_messages()

    # SQS Message
    for message in messages:
        message.delete()

    # S3 Object
    obj = s3.Object(bucket_name='boto3', key='test.py')
    response = obj.get()
    data = response['Body'].read()
- Example of sending additional parameters:
    # SQS Service
    queue = sqs.get_queue_by_name(QueueName='test')

    # SQS Queue
    queue.send_message(MessageBody='hello')
- Parameters must be passed as keyword arguments. They will not work as positional arguments.
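The actions above combine naturally into a receive, process, delete loop. The sketch below assumes queue behaves like an sqs.Queue resource; drain_queue and the handle callback are hypothetical names introduced for this example.

```python
def drain_queue(queue, handle, batch_size=10, wait_seconds=2):
    """Receive messages until none are returned; return how many were handled.

    `queue` is assumed to behave like an sqs.Queue resource, and `handle`
    is a caller-supplied callable that takes the message body.
    """
    handled = 0
    while True:
        messages = queue.receive_messages(
            MaxNumberOfMessages=batch_size,  # SQS allows at most 10 per call
            WaitTimeSeconds=wait_seconds,    # long polling, at most 20 seconds
        )
        if not messages:
            return handled
        for message in messages:
            handle(message.body)
            # The action fills in the queue URL and receipt handle itself.
            message.delete()
            handled += 1
```

Note that an empty batch does not strictly prove the queue is empty; it only means no message arrived within the wait time, which is usually good enough for a batch job.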
- Sub-resources
A sub-resource is similar to a reference, but it is a related class rather than an instance. When instantiated, a sub-resource shares identifiers with its parent resource. This is a strict parent-child relationship; in relational terms, it can be thought of as one-to-many.
- Sub-resource example:
    # SQS
    queue = sqs.Queue(url='...')
    message = queue.Message(receipt_handle='...')
    print(queue.url == message.queue_url)
    print(message.receipt_handle)

    # S3
    obj = bucket.Object(key='new_file.txt')
    print(obj.bucket_name)
    print(obj.key)
- Waiters
- A waiter is similar to an action. It polls the state of a resource and suspends execution until the resource reaches the state being polled for, or until a failure occurs during polling. Waiters automatically set the resource identifiers as parameters, but allow you to pass additional parameters as keyword arguments.
- Waiter examples:
    # S3: Wait for a bucket to exist.
    bucket.wait_until_exists()

    # EC2: Wait for an instance to reach the running state.
    instance.wait_until_running()
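A resource waiter is typically called right after the action whose effect it waits for. A minimal sketch, assuming bucket behaves like an s3.Bucket resource; put_and_wait is a hypothetical helper name.

```python
def put_and_wait(bucket, key, data):
    """Upload `data` under `key`, then block until the object is visible.

    `bucket` is assumed to behave like an s3.Bucket resource.
    """
    obj = bucket.Object(key)
    obj.put(Body=data)
    # The waiter supplies bucket_name and key from the resource identifiers.
    obj.wait_until_exists()
    return obj
```

For example, put_and_wait(s3.Bucket('my-bucket'), 'new_file.txt', b'hello') returns the Object resource once the waiter has confirmed it exists.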
- Multithreading or multiprocessing with resources
- Resource instances are not thread safe and should not be shared across threads or processes. These special classes contain additional metadata that cannot be shared.
- Create a new resource for each thread or process:
    import boto3
    import boto3.session
    import threading

    class MyTask(threading.Thread):
        def run(self):
            # Here we create a new session per thread
            session = boto3.session.Session()

            # Next, we create a resource client using our thread's session object
            s3 = session.resource('s3')

            # Put your thread-safe code here
- In the above example, each thread has its own Boto3 session and its own S3 resource instance;
- This matters because resources hold shared data once loaded, and calling actions, accessing attributes, or manually loading or reloading the resource can modify that data.
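When worker threads come from a pool rather than from a Thread subclass, one way to keep a resource per thread is to cache it in thread-local storage. This is a sketch under stated assumptions: thread_resource is a hypothetical helper, and session_factory is any zero-argument callable that returns a session, such as boto3.session.Session.

```python
import threading

_thread_state = threading.local()


def thread_resource(service_name, session_factory):
    """Return a resource that belongs only to the calling thread.

    `session_factory` is a zero-argument callable returning a session, for
    example boto3.session.Session. The first call in each thread creates a
    session and a resource; later calls in that thread reuse the resource.
    """
    cache = getattr(_thread_state, "resources", None)
    if cache is None:
        cache = _thread_state.resources = {}
    if service_name not in cache:
        cache[service_name] = session_factory().resource(service_name)
    return cache[service_name]
```

Inside a worker function this would be used as s3 = thread_resource('s3', boto3.session.Session), so no resource instance ever crosses a thread boundary.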
4, Sessions
- Default session
- Boto3 acts as a proxy to the default session, which is created automatically when you create a low-level client or a resource client:
    import boto3

    # Using the default session
    sqs = boto3.client('sqs')
    s3 = boto3.resource('s3')
- Custom session
- You can also manage your own sessions and create low-level clients or resource clients from them:
    import boto3
    import boto3.session

    # Create your own session
    my_session = boto3.session.Session()

    # Now we can create low-level clients or resource clients from our custom session
    sqs = my_session.client('sqs')
    s3 = my_session.resource('s3')
- Session configurations
Each session can be configured with specific credentials, AWS Region information, or a profile;
- The most common configurations are:
- aws_access_key_id - a specific AWS access key ID.
- aws_secret_access_key - a specific AWS secret access key.
- region_name - the AWS Region in which you want to create new connections.
- profile_name - the profile to use when creating the session.
- Set the profile_name parameter only if the session requires a specific profile. To use the default profile, do not set the profile_name parameter at all. If profile_name is not set and there is no default profile, an empty configuration dictionary is used.
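The options above are passed as keyword arguments to boto3.session.Session. The sketch below gathers them into a dictionary and leaves out anything that was not set, so the default profile and the default credential lookup still apply. session_kwargs is a hypothetical helper, and rejecting a half-specified key pair is a choice made for this example.

```python
def session_kwargs(profile_name=None, region_name=None,
                   aws_access_key_id=None, aws_secret_access_key=None):
    """Build keyword arguments for boto3.session.Session.

    Options left as None are omitted, so an unset profile_name falls
    through to the default profile.
    """
    if (aws_access_key_id is None) != (aws_secret_access_key is None):
        raise ValueError(
            "aws_access_key_id and aws_secret_access_key must be set together"
        )
    options = {
        "profile_name": profile_name,
        "region_name": region_name,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
    return {name: value for name, value in options.items() if value is not None}
```

A session for one Region with the default profile could then be created as boto3.session.Session(**session_kwargs(region_name='eu-west-1')).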
- Using sessions for multithreading or multiprocessing
- Like Resource objects, Session objects are not thread safe and should not be shared across threads or processes;
- Create a new Session object for each thread or process:
    import boto3
    import boto3.session
    import threading

    class MyTask(threading.Thread):
        def run(self):
            # Here we create a new session per thread
            session = boto3.session.Session()

            # Next, we create a resource client using our thread's session object
            s3 = session.resource('s3')

            # Put your thread-safe code here