It is a little-known fact that Docker was first presented to the world in a demo by Solomon Hykes at PyCon US in Santa Clara in 2013. Furthermore, docker-compose (formerly known as Fig) is written entirely in Python. This may give the impression that Docker and Python are pure love and a match made in heaven. For some use cases this is true. But local development of Python code inside a Docker container is surprisingly broken, at least if you want to do it right. In this talk I will walk you through the proper setup of a local Python development environment using Docker, including sane packaging, testing, automated setup in version control, IDE integration, docker-compose and more. I will share all the tiny surprises I encountered, some of which you might have stumbled over yourself already. As a spoiler, I will not be able to give nice solutions to all of them. But it might help you not to feel stupid the next time you face one of them: good news, it's probably not you!
10. @sebineubauer
Docker *is* a virtual machine (on Mac & Windows!!!)
Docker on Mac starts a special VM which runs Docker inside it
The bottleneck is the filesystem mounted into the Docker VM
11.
Docker on Mac
Only mount into the VM what is needed (configurable in Docker for Mac)
Use as few volumes as possible; only mount what the containers actually need
Use the Docker consistency flags for volume mounts:
- "delegated": weakest consistency guarantees, but "fastest"; the host view may lag behind the container
- "cached": stronger guarantees than "delegated", but not as fast; the container view may lag behind the host
Use an approach other than mounting:
- docker-sync
- rsync
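A minimal Python sketch of where the consistency flag ends up: it only builds the argument list for `docker run -v HOST:CONTAINER:FLAG`, it does not start anything. The function name, the image name and the paths are hypothetical.

```python
import os

# Consistency flags accepted by Docker for Mac bind mounts ("consistent" is the default).
CONSISTENCY_FLAGS = ("consistent", "cached", "delegated")


def docker_run_args(image, host_dir, container_dir, consistency="delegated"):
    """Build a `docker run` argument list with a single bind mount.

    The consistency flag is appended to the -v option, giving e.g.
    `-v /abs/host/dir:/app/src:delegated`.
    """
    if consistency not in CONSISTENCY_FLAGS:
        raise ValueError(f"unknown consistency flag: {consistency!r}")
    volume = f"{os.path.abspath(host_dir)}:{container_dir}:{consistency}"
    return ["docker", "run", "--rm", "-v", volume, image]


# Usage (hypothetical image and paths):
# subprocess.run(docker_run_args("my-dev-image", "./src", "/app/src"), check=True)
```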
13.
Virtual environment inside docker?
Better to use virtual environments anyway:
- Even inside Docker there is a full Linux userland, and the system Python belongs to the OS:
  - The Python version is coupled to the OS
  - Easy to break things, e.g. by installing a new system Python version
  - The system Python installation (paths, names of binaries, ...) might differ between Linux distributions
- Isolate your application dependencies from the global dependencies:
  - Easy to mess up the installed system Python packages
  - Easy to end up with strange bugs caused by mixing system and application dependencies
Conda is an interesting alternative:
- A package manager + distribution + environment manager
- Keeps track of non-Python dependencies (e.g. underlying C libraries used by NumPy, SciPy, etc.)
- Admittedly, it feels even more crazy, but it JustWorks™ :)
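Inside a container it is easy to lose track of which interpreter is actually running. A small guard, assuming the environment was created with the standard-library `venv` module, can refuse to run application code against the system Python: inside such an environment `sys.prefix` differs from `sys.base_prefix`. The function names are hypothetical. A conda environment ships its own interpreter, so this check will most likely not recognize it as isolated.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv.

    In a venv (PEP 405), sys.prefix points at the environment while
    sys.base_prefix still points at the base (e.g. system) installation.
    """
    return sys.prefix != sys.base_prefix


def require_virtualenv() -> None:
    """Refuse to run application code against the system Python."""
    if not in_virtualenv():
        raise SystemExit(
            f"Refusing to run with the system Python at {sys.prefix}; "
            "activate the project's virtual environment first."
        )
```

Calling `require_virtualenv()` at the top of a project's entry script turns an accidental `python` from the OS into an immediate, readable error instead of a strange import bug later.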
14.
Virtualenv + Docker
Do not put the virtual environment inside a mounted volume!
- Put it e.g. in /tmp/venv
- It speeds things up a lot (cargo cult)
- Drawback: with every `docker run` you need to recreate the virtualenv
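A sketch of creating the virtualenv outside the mounted volume with the standard-library `venv` module; `/tmp/venv` follows the slide, the function name is hypothetical. Because a fresh `docker run` starts with an empty `/tmp`, the existence check only saves time when the same container is reused (e.g. restarted, or entered with `docker exec`).

```python
import venv
from pathlib import Path


def ensure_venv(path="/tmp/venv", with_pip=True) -> Path:
    """Create a virtual environment outside the mounted source volume.

    Creation is skipped if the directory already holds a venv
    (detected via its pyvenv.cfg file).
    """
    env_dir = Path(path)
    if not (env_dir / "pyvenv.cfg").exists():
        venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir
```

The dependencies can then be installed with `/tmp/venv/bin/pip install -r requirements.txt`, assuming the project keeps a `requirements.txt`.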
15.
Virtualenv + Docker
Mount the pip cache into the container
- This way, you don't need to download all packages again and again
- The slowness of the mount does not hurt that much, as the cache is only accessed once during the install
- A bit tricky to get working:
  - You need to run `chown` once to give the cache the correct UID and GID of the user inside the container (thanks @fjetter)
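One possible sketch, assuming a Linux host where pip's default cache lives in `~/.cache/pip` (on macOS it is `~/Library/Caches/pip`, so pass `host_cache` explicitly). It mounts the cache at a hypothetical `/pip-cache` path and points pip at it via the `PIP_CACHE_DIR` environment variable. Instead of the one-time `chown` from the slide, it swaps in a different technique: running the container as the host user with `--user UID:GID`, so files written to the cache keep the host user's ownership. That can cause other permission surprises (e.g. a home directory this user cannot write to), so treat it as an alternative, not a drop-in replacement.

```python
import os
from pathlib import Path


def pip_cache_run_args(image, host_cache=None, container_cache="/pip-cache"):
    """Build `docker run` arguments that share the host's pip cache.

    PIP_CACHE_DIR tells pip inside the container where its cache lives.
    Running as the host user (--user UID:GID) is used here instead of
    chown-ing the cache to the container user.
    """
    host_cache = Path(host_cache or Path.home() / ".cache" / "pip")
    host_cache.mkdir(parents=True, exist_ok=True)
    uid, gid = os.getuid(), os.getgid()  # POSIX only
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",
        "-v", f"{host_cache}:{container_cache}",
        "-e", f"PIP_CACHE_DIR={container_cache}",
        image,
    ]
```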
16.
Virtualenv + Docker
Create the virtualenv already in the docker build
- This way it is baked into the image and does not need to be re-created again and again
- Drawback: you need to rebuild the image for every new dependency
- Dependencies easily become outdated if the image is not rebuilt regularly
ProTip:
- https://stackoverflow.com/questions/25305788/how-to-avoid-reinstalling-packages-when-building-docker-image-for-python-project
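The linked answer boils down to ordering the Dockerfile so that dependency installation gets its own cached layer. The sketch below renders such a Dockerfile from Python; the base image `python:3.11-slim`, the venv path `/opt/venv` and the `requirements.txt` file are assumptions. Because `requirements.txt` is copied and installed before the rest of the source, editing code only invalidates the final `COPY . .` layer, while adding a dependency still triggers the rebuild mentioned above.

```python
def render_dockerfile(base_image="python:3.11-slim", venv_dir="/opt/venv"):
    """Render a Dockerfile that bakes the virtualenv into the image.

    requirements.txt is copied and installed before the source code, so
    Docker's layer cache reuses the dependency layer as long as
    requirements.txt is unchanged.
    """
    lines = [
        f"FROM {base_image}",
        f"RUN python -m venv {venv_dir}",
        # Put the venv first on PATH so `python` and `pip` resolve to it.
        f'ENV PATH="{venv_dir}/bin:$PATH"',
        "WORKDIR /app",
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        # Source changes only invalidate the layers from here on.
        "COPY . .",
    ]
    return "\n".join(lines) + "\n"
```

Writing it out is a one-liner, e.g. `pathlib.Path("Dockerfile").write_text(render_dockerfile())`.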
19.
Docker vs. Docker-Compose
Docker-compose can make sense even for single containers
The simple, declarative syntax of the docker-compose.yaml makes it easy to add configuration on top of "raw" images:
- Volumes
- Network
- Environment variables
It's "command line options in git"
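To make "command line options in git" concrete, here is a sketch that translates a tiny subset of a compose service definition (image, volumes, `network_mode`, environment as a mapping) into the equivalent `docker run` arguments. It is not how docker-compose itself works internally, and it skips things compose does for you, such as resolving relative volume paths against the location of the compose file. The image name and paths are hypothetical.

```python
def compose_service_to_docker_run(service):
    """Translate a small subset of a compose service into `docker run` args.

    Handles only image, volumes, network_mode and environment (given as a
    mapping); anything else in the service definition is ignored.
    """
    args = ["docker", "run", "--rm"]
    for volume in service.get("volumes", []):
        args += ["-v", volume]
    if "network_mode" in service:
        args += ["--network", service["network_mode"]]
    for key, value in service.get("environment", {}).items():
        args += ["-e", f"{key}={value}"]
    args.append(service["image"])
    return args


# Hypothetical service, roughly what one docker-compose.yaml entry describes.
dev_service = {
    "image": "my-dev-image",
    "volumes": ["/code/src:/app/src:delegated"],
    "network_mode": "host",
    "environment": {"PYTHONDONTWRITEBYTECODE": "1"},
}
print(" ".join(compose_service_to_docker_run(dev_service)))
# docker run --rm -v /code/src:/app/src:delegated --network host -e PYTHONDONTWRITEBYTECODE=1 my-dev-image
```

The long one-liner at the end is exactly what the compose file saves you from retyping, and the dictionary form is what ends up reviewed and versioned in git.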
23.
Problem: pip install --editable
We need the editable flag to run our latest code changes directly, without reinstalling
`pip install --editable` always creates a package.egg-info directory in the package root
We want to mount our source folder into the Docker container, because our IDE runs on the host
Problem:
- Either the egg-info directory created during the image build is hidden once the host folder is mounted over it (broken setup!)
- Or we need to delay the installation until after the image is built and the volume is mounted:
  - Prolongs startup time
  - Need to build "tricky" automation or do it manually
  - Suddenly an egg-info folder pops up in the host folder (only an aesthetic problem)
- Or we use legacy `python setup.py develop` and set PYTHONPATH manually
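One way to build the "tricky" automation for the delayed install is a small container entrypoint that runs `pip install --editable` only if no egg-info directory exists yet in the mounted source tree, then hands over to the real command. This is a hypothetical sketch: the `SRC_ROOT` variable, the `/app` default and the egg-info lookup (project root or `src/`, the usual places for setuptools) are assumptions, and `--no-deps` assumes the dependencies are already baked into the image.

```python
import os
import subprocess
import sys
from pathlib import Path


def needs_editable_install(src_root) -> bool:
    """Return True if no *.egg-info directory exists yet in the source tree."""
    root = Path(src_root)
    # setuptools usually writes <name>.egg-info into the project root,
    # or into src/ for a src-layout project.
    candidates = list(root.glob("*.egg-info")) + list(root.glob("src/*.egg-info"))
    return not any(path.is_dir() for path in candidates)


def main(argv):
    """Install the mounted project in editable mode if needed, then exec argv."""
    if not argv:
        raise SystemExit("usage: entrypoint.py COMMAND [ARGS...]")
    src_root = os.environ.get("SRC_ROOT", "/app")  # hypothetical mount point
    if needs_editable_install(src_root):
        # Writes the egg-info into the mounted folder, which is also why it
        # then shows up on the host.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "--no-deps", "--editable", src_root],
            check=True,
        )
    # Replace this process with the real command, e.g. ["pytest"].
    os.execvp(argv[0], argv)
```

Saved as e.g. `entrypoint.py` with a final `main(sys.argv[1:])` call, it could be wired up via `ENTRYPOINT ["python", "/entrypoint.py"]` and a command such as `pytest`. The install then costs time only on the first start after a fresh checkout.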
27.
Problem: Python cache files
Python bytecode files (*.pyc) are stored in __pycache__ directories next to the source files
But we mount the source files from the host
We might also execute the code on the host, or mount the same source folder into other Docker containers with a different OS or Python version
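One mitigation, assuming Python 3.8 or newer, is the `PYTHONPYCACHEPREFIX` environment variable, which makes the interpreter write bytecode into a separate mirror directory tree instead of `__pycache__` folders next to the sources. The sketch builds such an environment and checks that a child interpreter picks it up; `/tmp/pycache` is an arbitrary choice and the function name is hypothetical. In a compose setup the same variable would simply go into the service's `environment` section. Setting `PYTHONDONTWRITEBYTECODE=1` instead disables writing bytecode altogether, at the cost of recompiling on every start.

```python
import os
import subprocess
import sys


def bytecode_env(prefix="/tmp/pycache", base=None):
    """Return an environment that keeps *.pyc files out of the source tree.

    PYTHONPYCACHEPREFIX (Python 3.8+) makes the interpreter write bytecode
    into a mirror directory tree below `prefix` instead of __pycache__
    folders next to the (mounted) sources.
    """
    env = dict(os.environ if base is None else base)
    env["PYTHONPYCACHEPREFIX"] = prefix
    return env


if __name__ == "__main__":
    # Ask a child interpreter which cache prefix it picked up.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.pycache_prefix)"],
        env=bytecode_env(), capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
```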
32.
How to improve the situation?
Awareness: No, it's not you! There are some really annoying dark corners.
Contribute: Yes, the threads are long and disappointing... but maybe we have to jump in!
Share: Talk about what you know, write blog posts, etc. It helps a lot!
Zen: There should be one obvious way to do it!
Code: Even if the problems remain, we might be able to relieve the pain a bit
https://github.com/sebastianneubauer/dockerize-python-scaffold
Editor's Notes
Solomon Hykes
PyCon US in Santa Clara in 2013 (the first PyData Berlin was in 2014)
5-minute lightning talk (who attended the lightning talks yesterday?)
Remarkable live demo: Linux containers, but easy
most famous hello world typo so far
should go to a museum
one of the first decisions to make
Fun fact: docker-compose is written entirely in Python
A serious warning to close: Don't build a snowflake!!!
Why is all this so important? Often the first contact with a project is an easy-to-access local dev environment using Docker