Siv Scripts

Solving Problems Using Code

Wed 21 November 2018

Patching Import Inside Function

Posted by Aly Sivji in Quick Hits   

I started my first tech job at a large corporation. We have lots of internal tools and systems that allow us to solve large-scale data problems. It's been an interesting change from startup life where platforms weren't as mature, where systems weren't as integrated into workflows and processes.

While I'm having a ton of fun building containerized data pipelines in Airflow, I still need to deal with the limitations of internal systems that were designed to support many different use cases. This part isn't so different from startup life; we all have to do things in a certain way because decisions that were made earlier hamper our efficiency and force us to use tools ahem creatively.

This is okay. Working around limitations is what programming is all about. We have to design creative solutions within the constraints of our problem space.

I was faced with such a problem at work this week.

Given the constraints of our internal platform which I will not go into here, I had to import an object within a function so as to not pollute the module's namespace. My function looked as follows:

def upload_bytes_to_azure(bytes_blob, year, month):
    from import BlockBlobService"Save to Azure Blob Storage -- start")

    # set parameters
    params = {"year": year, "month": month}
    filename = "status_{year}{month}.csv".format(**params)

    hook = BlockBlobService(account_name=account_name, account_key=account_key)
    hook.create_blob_from_bytes(container_name, blob_name=filename, blob=bytes_blob)"Save to Azure Blob Storage -- success")

While the above solution worked, it presented difficulty when I tried to test it. The Python community prefers monkey patching over dependency injection so this is the approach I started with.

I have an import inside of a function. Where does this object exist? How do I patch inside of a function? Tried introspecting the module, the function, but was unable to access the namespace inside of the function itself.

A coworker suggested monkey patching sys.modules.


What is sys.modules? Let's open up a REPL:

$ ipython
Python 3.7.0 (default, Oct 14 2018, 00:25:16)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.0.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import sys

In [2]: print(sys.modules)
{'sys': <module 'sys' (built-in)>, 'builtins': <module 'builtins' (built-in)>, '_frozen_importlib': <module 'importlib._bootstrap' (frozen)>, '_imp': <module '_imp' (built-in)>, '_thread': <module '_thread' (built-in)>, '_warnings': <module '_warnings' (built-in)>, '_weakref': <module '_weakref' (built-in)>, 'zipimport': <module 'zipimport' (built-in)>, '_frozen_importlib_external': <module 'importlib._bootstrap_external' (frozen)>, '_io': <module 'io' (built-in)>, 'marshal': <module 'marshal' (built-in)>, 'posix': <module 'posix' (built-in)>, 'encodings': <module 'encodings' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/encodings/'>, 'codecs': <module 'codecs' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, '_codecs': <module '_codecs' (built-in)>, 'encodings.aliases': <module 'encodings.aliases' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/encodings/'>, 'encodings.utf_8': <module 'encodings.utf_8' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/encodings/'>, '_signal': <module '_signal' (built-in)>, '__main__': <module '__main__'>, 'encodings.latin_1': <module 'encodings.latin_1' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/encodings/'>, 'io': <module 'io' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, 'abc': <module 'abc' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, '_abc': <module '_abc' (built-in)>, 'site': <module 'site' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, 'os': <module 'os' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, 'stat': <module 'stat' from '/Users/alysivji/.pyenv/versions/3.7.0/lib/python3.7/'>, '_stat': <module '_stat' (built-in)>,

... output continues ...

So sys.modules is a python dict where the key is the module name and the value is the module object. When we import something into our Python runtime, we pull it from sys.modules. Patching the sys.modules dictionary with a modified dict will allow us to patch modules to make our tests deterministic.

This is a handy Python trick. [puts on David Beazley hat] Also an interesting way to import packages that we have not installed into site-packages.

Patching import inside of a function

Using patch.dict, I overwrote the module with a mock object that I created and defined, this is what makes the test deterministic.

def patched_azure(mocker):
    blob_service_mock = mocker.MagicMock(name="blob_service_mock")
    create_blob_mock = blob_service_mock.create_blob_from_bytes

    _module = mocker.MagicMock(name="azure_mock")
    _module.BlockBlobService.return_value = blob_service_mock
    mocker.patch.dict("sys.modules", {"": _module})

    yield create_blob_mock

Our test case can be written as follows:

def test_dag(patched_adapter, patched_azure):
    # Arrange
    blob = b"data"

    # Act

    # Assert
    name, args, kwargs = patched_azure.mock_calls[0]
    assert patched_azure.call_count == 1
    assert kwargs["blob"] == blob

Note: I did not use the @patch decorator in the example above. Stacking decorators to inject parameters as function arguments is not a Python pattern I like using. With pytest, we can create fixtures that can be inserted into our test case in any order. This pattern enables us to separate each patch into its own function resulting in tests that are easier to write, and more importantly, easier to read. This pattern also enables us to create complex test objects using composition.

Dependency Injection

Patching sys.modules is a bit too much for my delicate sensibilities. While the pattern works, it's not something I enjoy doing. Enter dependency injection.

Dependency Injection (DI) is a technique in which we pass dependencies into functions that require them. DI is widely used in the Java ecosystem; it's also how I learned to test so I find myself using it when monkey patching is awkward. While DI requires additional boilerplate, it forces us to think about our interface in more depth. I think the tradeoff is worth it.

We'll need to modify the above example to inject our Azure storage blob dependency into upload_bytes_to_azure:

def upload_bytes_to_azure(azure_blob, bytes_blob, year, month):"Save to Azure Blob Storage -- start")

    # set parameters
    params = {"year": year, "month": month}
    filename = "status_{year}{month}.csv".format(**params)

    hook = azure_blob.BlockBlobService(account_name=account_name, account_key=account_key)
    hook.create_blob_from_bytes(container_name, blob_name=filename, blob=bytes_blob)"Save to Azure Blob Storage -- success")

At runtime, the function requires an Azure instance as a function parameter (azure_blob). For testing purposes, we can pass in a test double without having to monkey patch. Dependency injection makes testing easy!

This is not to say that dependency injection is perfect. Every time we need to run a function, we have to pass in all of its requirements which leads to additional complexity. In the Java world, there are dependency injection frameworks built for this purpose. It's all about tradeoffs. Do you feel comfortable with the approach you took?

Aside: I recently saw a talk by Brandon Rhodes about Clean Architecture in Python. This design methodology eschews both monkey patching and dependency injection. By focusing on how data moves in our system, we can change how we structure our code. I need to spend more time thinking about Clean Architecture. Might even need to read Uncle Bob's book. Brandon's talk has changed how I approach design, development, and testing. Highly recommend a watch.