Basic usage of Boto3 accessing S3

Posted by notsleepy on Fri, 12 Nov 2021 06:11:08 +0100

1, A brief description of Boto3

  1. Boto3 has two APIs, a low-level one and a high-level one:
  • Low-level API: it maps one-to-one to the AWS HTTP interfaces and is exposed through boto3.client("xxx");
  • High-level API: it is object-oriented and exposed through boto3.resource("xxx"); it does not necessarily cover every API.
  2. Boto3 is the SDK for all of AWS, not just S3; it can also be used to access SQS, EC2, and so on.
  3. boto3.resource("s3") example
import boto3

s3 = boto3.resource("s3")

# Create a bucket
bucket = s3.create_bucket(Bucket="my-bucket")

# Iterate over all buckets; boto3 handles API pagination automatically
for bucket in s3.buckets.all():
    print(bucket.name)

# Filter buckets; returns a bucket iterator
s3.buckets.filter()

# Generate a Bucket resource object
bucket = s3.Bucket("my-bucket")
bucket.name  # bucket name
bucket.delete()  # Delete bucket

# Delete some objects
bucket.delete_objects(
    Delete={
        'Objects': [
            {
                'Key': 'string',
                'VersionId': 'string'
            },
        ],
        'Quiet': True|False
    },
)
# Return results
{
    'Deleted': [
        {
            'Key': 'string',
            'VersionId': 'string',
            'DeleteMarker': True|False,
            'DeleteMarkerVersionId': 'string'
        },
    ],
    'RequestCharged': 'requester',
    'Errors': [
        {
            'Key': 'string',
            'VersionId': 'string',
            'Code': 'string',
            'Message': 'string'
        },
    ]
}

# Download File
bucket.download_file(Key, Filename, ExtraArgs=None, Callback=None, Config=None)

# Downloading to a file object may automatically start multi-threaded downloading
with open('filename', 'wb') as data:
    bucket.download_fileobj('mykey', data)

# Upload file
object = bucket.put_object(Body=b"data"|file, ContentMD5="", Key="xxx")

# This method will automatically start multi-threaded upload
with open('filename', 'rb') as f:
    bucket.upload_fileobj(f, 'mykey')

# List all objects
bucket.objects.all()

# Filter and return objects
objects = bucket.objects.filter(
    Delimiter='string',
    EncodingType='url',
    Marker='string',
    MaxKeys=123,
    Prefix='string',
    RequestPayer='requester',
    ExpectedBucketOwner='string'
)

# Instantiate an Object resource
obj = bucket.Object("xxx")
# or
obj = s3.Object("my-bucket", "key")

obj.bucket_name
obj.key

# delete object
obj.delete()
# Download object
obj.download_file(path)
# Automatic multi-threaded download
with open('filename', 'wb') as data:
    obj.download_fileobj(data)
# Get file content
rsp = obj.get()
body = rsp["Body"].read()  # File content
obj.put(Body=b"xxx"|file, ContentMD5="")

# Upload file
obj.upload_file(filename)
# Automatic multi-threaded upload
obj.upload_fileobj(fileobj)

2, Low-level clients

  1. Create clients
  • Clients are created in a manner similar to resources
import boto3

# Create a low-level client with the service name
sqs = boto3.client('sqs')
  • Low level clients can also be accessed from existing resources
# Create the resource
sqs_resource = boto3.resource('sqs')

# Get the client from the resource
sqs = sqs_resource.meta.client
  2. Service operations

Each service operation maps to a client method of the same name, and the operation's parameters are exposed as keyword arguments;

# Make a call using the low-level client
response = sqs.send_message(QueueUrl='...', MessageBody='...')
  • As shown above, the method parameters map directly to the parameters of the corresponding SQS API;
  • To keep the Python code idiomatic, method names are snake_cased;
  • Parameters must be passed as keyword arguments; they cannot be passed positionally.
  3. Handling responses

The response is returned as a Python dictionary, which you can traverse or otherwise process to extract the data you need. A response may not always contain all the expected keys;

  • In the following example, response.get('QueueUrls', []) ensures that a list is always returned, even if the response has no 'QueueUrls' key:
# List all your queues
response = sqs.list_queues()
for url in response.get('QueueUrls', []):
    print(url)
  • The response in the above example is as follows:
{ 
    "QueueUrls" :  [ 
        "http://url1" , 
        "http://url2" , 
        "http://url3" 
    ] 
}
  4. Waiters

Waiters use a client's service operations to poll the status of AWS resources, pausing execution until the resource reaches the state being polled for or a failure occurs during polling. Through the client you can find out the name of every waiter it has access to:

import boto3

s3 = boto3.client('s3')
sqs = boto3.client('sqs')

# List all of the possible waiters for both clients
print("s3 waiters:")
s3.waiter_names

print("sqs waiters:")
sqs.waiter_names
  • If a client does not have any waiters, accessing its waiter_names attribute returns an empty list;
s3 waiters:
[u'bucket_exists', u'bucket_not_exists', u'object_exists', u'object_not_exists']
sqs waiters:
[]
  • The client's get_waiter() method retrieves a specific waiter from the list of possible waiters;
# Retrieve waiter instance that will wait till a specified
# S3 bucket exists
s3_bucket_exists_waiter = s3.get_waiter('bucket_exists')
  • Then, to start waiting, call the waiter's wait() method with the appropriate parameters;
# Begin waiting for the S3 bucket, mybucket, to exist
s3_bucket_exists_waiter.wait(Bucket='mybucket')
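  • If the default polling behaviour needs tuning, wait() also accepts a WaiterConfig; the following is a small sketch (the bucket name is a placeholder):
# Poll every 2 seconds, giving up after 10 attempts
s3_bucket_exists_waiter.wait(
    Bucket='mybucket',
    WaiterConfig={
        'Delay': 2,
        'MaxAttempts': 10
    }
)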
  5. Multithreading or multiprocessing with clients

Multiprocessing: although clients are thread-safe, they cannot be shared across processes because of their networking implementation. Doing so may result in incorrect response ordering when calling services;
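
The recommended pattern under that constraint is to create a separate client inside each worker process; the following is a minimal sketch (the bucket names and pool size are placeholders):

import boto3
from multiprocessing import Pool

def count_objects(bucket_name):
    # Each worker process builds its own client instead of inheriting
    # one from the parent process.
    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket=bucket_name)
    return bucket_name, response.get('KeyCount', 0)

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        print(pool.map(count_objects, ['bucket-a', 'bucket-b']))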

  • Shared Metadata: clients expose some metadata to the end user through a few attributes (namely meta, exceptions, and waiter_names). Reading these is safe, but any mutation should not be considered thread-safe;
  • Custom Botocore Events: botocore (the library boto3 is built on) allows advanced users to register their own custom event hooks that can interact with boto3 clients. Most users will not need these interfaces, but those who use them should no longer consider their clients thread-safe without careful review (a small event-hook sketch follows the threading example below).
  • Example
import boto3.session
from concurrent.futures import ThreadPoolExecutor

def do_s3_task(client, task_definition):
    # Put your thread-safe code here
    ...

def my_workflow():
    # Create a session and use it to make our client
    session = boto3.session.Session()
    s3_client = session.client('s3')

    # Define some work to be done, this can be anything
    my_tasks = [ ... ]

    # Dispatch work tasks with our s3_client
    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(do_s3_task, s3_client, task) for task in my_tasks]
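  • As a hedged illustration of the custom botocore event hooks mentioned above (the event name follows botocore's provide-client-params pattern and the default bucket name is a placeholder), hooks are registered through the client's meta.events:
import boto3

s3 = boto3.client('s3')

def add_default_bucket(params, **kwargs):
    # Fill in a Bucket parameter if the caller did not supply one.
    params.setdefault('Bucket', 'my-default-bucket')

# Register the hook for the parameter-building event of ListObjectsV2
s3.meta.events.register('provide-client-params.s3.ListObjectsV2', add_default_bucket)

# The call below can now omit Bucket; the hook provides it.
response = s3.list_objects_v2()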

3, Resources

  1. Overview
  • A resource represents the object-oriented interface to Amazon Web Services (AWS);
  • It provides a higher level of abstraction than the raw calls made by a service client;
  • To use a resource, call the resource() method of a Session and pass in the service name:
# Get resources from the default session
sqs = boto3.resource('sqs')
s3 = boto3.resource('s3')
  • Every resource instance has a number of attributes and methods. Conceptually they can be divided into identifiers, attributes, actions, references, sub-resources, and collections (a small sketch follows this list);
  • Resources themselves can also be divided conceptually into service resources (such as sqs, s3, ec2) and individual resources (such as sqs.Queue or s3.Bucket);
  • Service resources have no identifiers or attributes; otherwise, the two kinds share the same components.
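  • A rough sketch of how some of these components look in code; the bucket name and key are placeholders, and the calls shown are only meant to map each concept onto the high-level API:
import boto3

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'test.py')           # identifiers: bucket_name, key
print(obj.last_modified)                          # attribute (lazily loaded)

bucket = obj.Bucket()                             # reference back to the parent Bucket
for o in bucket.objects.filter(Prefix='logs/'):   # collection
    print(o.key)

obj.delete()                                      # action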
  2. Identifiers and attributes
  • An identifier is a unique value used to invoke an operation on a resource. Resources must have at least one identifier, except for top-level service resources (such as sqs or s3);
  • Identifiers are set when the instance is created. If not all required identifiers are provided during instantiation, an exception is raised.
  • Identifier example:
# SQS Queue (url is an identifier)
queue = sqs.Queue(url='http://...')
print(queue.url)

# S3 Object (bucket_name and key are identifiers)
obj = s3.Object(bucket_name='boto3', key='test.py')
print(obj.bucket_name)
print(obj.key)

# Raises exception, missing identifier: key!
obj = s3.Object(bucket_name='boto3')
  • Identifiers can also be passed as positional arguments:
# SQS Queue
queue = sqs.Queue('http://...')

# S3 Object
obj = s3.Object('boto3', 'test.py')

# Raises exception, missing key!
obj = s3.Object('boto3')
  • Identifiers also play a role in resource instance equality. For two instances of a resource to be considered equal, their identifiers must be equal:
>>> bucket1 = s3.Bucket('boto3')
>>> bucket2 = s3.Bucket('boto3')
>>> bucket3 = s3.Bucket('some-other-bucket')

>>> bucket1 == bucket2
True
>>> bucket1 == bucket3
False
  • Resources may also have attributes, which are lazily loaded properties on an instance. They may be set at creation time from the response of an action on another resource, or they may be set when first accessed or by explicitly calling the load or reload action.
  • Attribute example:
# SQS Message
message.body

# S3 Object
obj.last_modified
obj.e_tag
  • Warning:
    • Attributes may trigger a load action on first access. If latency is a concern, calling load manually gives precise control over when the load action (and its latency) is incurred. The documentation for each resource explicitly lists its attributes.

    • In addition, attributes may be reloaded after an action is performed on the resource. For example, if the last_modified attribute of an S3 object has been loaded and a put action is then called, the next access to last_modified reloads the object's metadata. A small sketch of explicit loading follows.
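
    • A minimal sketch of explicit loading, assuming a placeholder bucket and key:
import boto3

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'report.csv')

# Explicitly load the object's metadata (a HeadObject call) instead of
# letting the first attribute access trigger it implicitly.
obj.load()
print(obj.last_modified)
print(obj.content_length)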

  3. Actions

An action is a method that makes a call to the service. An action may return a low-level response, a new resource instance, or a list of new resource instances. Actions automatically set the resource's identifiers as parameters, but allow additional parameters to be passed as keyword arguments.

  • Action example:
# SQS Queue
messages = queue.receive_messages()

# SQS Message
for message in messages:
    message.delete()

# S3 Object
obj = s3.Object(bucket_name='boto3', key='test.py')
response = obj.get()
data = response['Body'].read()
  • Example of sending additional parameters:
# SQS Service
queue = sqs.get_queue_by_name(QueueName='test')

# SQS Queue
queue.send_message(MessageBody='hello')
  • Parameters must be passed as keyword arguments. They will not work as positional arguments.
  4. Sub-resources

A sub-resource is similar to a reference, but it is a related class rather than an instance. When instantiated, a sub-resource shares identifiers with its parent resource. This is a strict parent-child relationship; in relational terms, it can be considered one-to-many.

  • Sub resource example:
# SQS
queue = sqs.Queue(url='...')
message = queue.Message(receipt_handle='...')
print(queue.url == message.queue_url)
print(message.receipt_handle)

# S3
obj = bucket.Object(key='new_file.txt')
print(obj.bucket_name)
print(obj.key)
  5. Waiters
  • A waiter is similar to an action. It polls the status of a resource and pauses execution until the resource reaches the state being waited for or a failure occurs during polling. Waiters automatically set the resource's identifiers as parameters, but allow additional parameters to be passed as keyword arguments.
  • Waiter examples:
# S3: Wait for a bucket to exist.
bucket.wait_until_exists()

# EC2: Wait for an instance to reach the running state.
instance.wait_until_running()
  6. Multithreading or multiprocessing with resources
  • Resource instances are not thread-safe and should not be shared across threads or processes. These special classes contain additional metadata that cannot be shared.
  • Create a new resource for each thread or process:
import boto3
import boto3.session
import threading

class MyTask(threading.Thread):
    def run(self):
        # Here we create a new session per thread
        session = boto3.session.Session()

        # Next, we create a resource client using our thread's session object
        s3 = session.resource('s3')

        # Put your thread-safe code here
  • In the above example, each thread has its own Boto3 session and its own S3 resource instance;
  • This matters because resources contain shared data that can be modified whenever actions are called, attributes are accessed, or the resource is loaded or reloaded manually (a short usage sketch follows).
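  • For completeness, a minimal sketch of driving the MyTask class above (the thread count is arbitrary):
# Start a few threads, each of which builds its own session and resource
tasks = [MyTask() for _ in range(4)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()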

4, Sessions

  1. Default session
  • Boto3 acts as a proxy to a default session, which is created automatically when a low-level client or resource client is created:
import boto3

# Using the default session
sqs = boto3.client('sqs')
s3 = boto3.resource('s3')
  2. Custom session
  • You can manage your own sessions and create low-level clients or resource clients from them:
import boto3
import boto3.session

# Create your own session
my_session = boto3.session.Session()

# Now we can create low-level clients or resource clients from our custom session
sqs = my_session.client('sqs')
s3 = my_session.resource('s3')
  3. Session configurations

Each session can be configured with specific credentials, AWS Region information, or profiles;

  • The most common configurations are:
    • aws_access_key_id - the specific AWS access key ID.
    • aws_secret_access_key - specific AWS secret access key.
    • region_name - the AWS Region in which to create new connections.
    • profile_name - the profile to use when creating the session.
  • Set the profile_name parameter only when the session needs a specific profile. To use the default profile, do not set the profile_name parameter at all. If the profile_name parameter is not set and there is no default profile, an empty configuration dictionary is used (see the sketch below).
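  • A small configuration sketch; the key values and Region are placeholders, and the "dev" profile is assumed to exist in your AWS config:
import boto3.session

# Explicit credentials and Region (placeholder values)
session = boto3.session.Session(
    aws_access_key_id='AKIA...',
    aws_secret_access_key='...',
    region_name='us-east-1',
)

# Or pick everything up from a named profile (assumed to exist)
dev_session = boto3.session.Session(profile_name='dev')
s3 = dev_session.resource('s3')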
  4. Using sessions for multithreading or multiprocessing
  • Similar to Resource objects, Session objects are not thread safe and should not be shared across threads and processes;
  • Create a new Session object for each thread or process:
import boto3
import boto3.session
import threading

class MyTask(threading.Thread):
    def run(self):
        # Here we create a new session per thread
        session = boto3.session.Session()

        # Next, we create a resource client using our thread's session object
        s3 = session.resource('s3')

        # Put your thread-safe code here

Topics: Python AWS