2. What we will cover
● What is CouchDB?
– Access from Python through couchdbkit
– Key-value Store Functionality
– MapReduce Queries
– HTTP API
● When is CouchDB useful and when not?
– Multi-Master Replication
– Scaling up and down
● Pointers to other resources, CouchDB ecosystem
3. What we won't cover
● CouchApps – browser-based apps that are served by
CouchDB
● Detailed Security, Scaling and other operational issues
● Other functionality that didn't fit
4. Training Modes
● Code-Along
– Follow Examples, write your own code
– Small Scripts or REPL
● Learning-by-Watching
– Example Application at
https://github.com/stefankoegl/python-couchdb-examples
– Slides at
https://slideshare.net/skoegl/couch-db-pythonpyconpl2012
– Use example scripts and see what happens
– Submit Pull-Requests!
5. Contents
● Intro
– Contents
– CouchDB
– Example Application
● DB Initialization
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
● Complex MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
6. CouchDB
● Apache Project
● https://couchdb.apache.org/
● Current Version: 1.2
● Apache CouchDB™ is a database that uses JSON for
documents, JavaScript for MapReduce queries, and regular
HTTP for an API
7. Example Application
● Lending Database
– Stores Items that you might want to lend
– Stores when you have lent what to whom
● Stand-alone or distributed
● Small Scripts that do one task each
● Look at HTTP traffic
8. Contents
● Intro
● DB Initialization
– Setting Up CouchDB
– Installing couchdbkit
– Creating a Database
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
● Complex MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
9. Getting Set Up: CouchDB
● Provided by me (not valid anymore after the training)
● http://couch.skoegl.net:5984/<yourname>
● Authentication: username training, password training
● Set up your DB_URL in settings.py
● If you want to install your own
– Tutorials: https://wiki.apache.org/couchdb/Installation
– Ubuntu: https://launchpad.net/~longsleep/+archive/couchdb
– Mac, Windows: https://couchdb.apache.org/#download
10. Getting Set Up: couchdbkit
● http://couchdbkit.org/
● Python client library
# install with pip
pip install couchdbkit
# or from source
git clone https://github.com/benoitc/couchdbkit.git
cd couchdbkit
sudo python setup.py install
# and then you should be able to import
import couchdbkit
11. Contents
● Intro
● DB Initialization
– Setting Up CouchDB
– Installing couchdbkit
– Creating a Database
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
● Complex MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
12. Creating a Database
● What we have: a CouchDB server and its URL
e.g. http://127.0.0.1:5984
● What we want: a database there
e.g. http://127.0.0.1:5984/myname
● http://wiki.apache.org/couchdb/HTTP_database_API
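At the HTTP level, creating a database is a single PUT request to the server URL plus the database name. The following is a minimal sketch using only Python's standard library rather than couchdbkit; `build_create_db_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import urllib.request


def build_create_db_request(server_url, db_name):
    """Build (but do not send) the PUT request that creates a database."""
    url = server_url.rstrip("/") + "/" + db_name
    return urllib.request.Request(url, method="PUT")


request = build_create_db_request("http://127.0.0.1:5984", "myname")
print(request.get_method(), request.full_url)

# To actually create the database, send the request to a running server:
# urllib.request.urlopen(request)
```

couchdbkit wraps this step; to the best of my knowledge its `Server.get_or_create_db` method is the usual entry point.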
16. Contents
● Intro
● DB Initialization
● Key-Value Store
– Modelling Documents
– Storing and Retrieving Documents
– Updating Documents
● Simple MapReduce Queries
● The _changes Feed
● Complex MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
17. Key-Value Store
● Core of CouchDB
● Keys (_id): any valid JSON string
● Values (documents): any valid JSON objects
● Stored in B+-Trees
● http://guide.couchdb.org/draft/btree.html
18. Modelling a Thing
● A thing that we want to lend
– Name
– Owner
– Dynamic properties like
● Description
● Movie rating
● etc
19. Modelling a Thing
● In CouchDB documents are JSON objects
● You can store any dict
– Wrapped in couchdbkit's Document classes for convenience
● Documents can be serialized to JSON …
mydict = mydoc.to_json()
● … and deserialized from JSON
mydoc = DocClass.wrap(mydict)
20. Modelling a Thing
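# models.py
from couchdbkit import Database, Document, StringProperty
from settings import DB_URL  # assumption: DB_URL lives in settings.py (see setup)

class Thing(Document):
    # required properties; any other property can be added dynamically
    owner = StringProperty(required=True)
    name = StringProperty(required=True)

db = Database(DB_URL)
Thing.set_db(db)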
21. Storing a Document
● Document identified by _id
– Auto-assigned by the database (bad)
– Provided by the client when storing the document (good)
– Think about lost responses: a retry with the same _id cannot create a duplicate
– couchdbkit does that for us
● couchdbkit adds property doc_type with value „Thing“
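Why a client-provided _id helps with lost responses can be sketched with an in-memory stand-in for the database. `FakeDatabase` and `new_thing` are made-up names for illustration and do not talk to CouchDB; the point is only that a retry with the same _id is detected instead of creating a second document.

```python
import uuid


class FakeDatabase:
    """In-memory stand-in for a database; NOT a real CouchDB client."""

    def __init__(self):
        self.docs = {}

    def put(self, doc):
        """Store a new document under its _id; report a conflict if it exists."""
        if doc["_id"] in self.docs:
            return "conflict"  # the earlier attempt already succeeded
        self.docs[doc["_id"]] = dict(doc)
        return "created"


def new_thing(owner, name):
    # The client picks the _id before sending anything.
    return {"_id": uuid.uuid4().hex, "doc_type": "Thing",
            "owner": owner, "name": name}


db = FakeDatabase()
thing = new_thing("stefan", "couchguide")
first = db.put(thing)   # imagine the response gets lost on the way back ...
retry = db.put(thing)   # ... so the client retries with the same _id
print(first, retry, len(db.docs))
```

With a server-assigned _id, the retry would have produced a second, duplicate document.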
22. Internal Storage
● Database File /var/lib/couchdb/dbname.couch
● B+-Tree of _id
● Access: O(log n)
● Append-only storage
● Accessible in historic order (we'll come to that later)
25. Retrieving a Document
● Retrieve a Document by its _id
– Limited use
– Does not allow queries by other properties
# ldbgetthing.py
thing = Thing.get(thing_id)
26. Retrieving a Document
[Thu, 06 Sep 2012 19:45:30 GMT] [info] [<0.962.0>] 127.0.0.1 - -
GET /lendb/8f14ef7617b8492fdbd800f1101ebb35 200
27. Updating a Document
● Optimistic Concurrency Control
● Each Document has a revision
● Each Operation includes revision
● Operation fails if revision doesn't match
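The revision check can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: `TinyStore` and `RevisionConflict` are invented names, and the way the revision string is computed here only imitates the general idea (a counter plus a hash), not CouchDB's actual algorithm.

```python
import hashlib
import json


class RevisionConflict(Exception):
    """Raised when an update does not name the current revision."""


class TinyStore:
    """In-memory model of revision checking; not CouchDB's real algorithm."""

    def __init__(self):
        self.docs = {}

    def save(self, doc):
        current = self.docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        if doc.get("_rev") != current_rev:
            raise RevisionConflict(doc["_id"])
        # New revision: counter plus a hash of previous revision and content.
        body = {k: v for k, v in doc.items() if k != "_rev"}
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.md5(
            (str(current_rev) + json.dumps(body, sort_keys=True)).encode()
        ).hexdigest()
        stored = dict(body, _rev="%d-%s" % (generation, digest))
        self.docs[doc["_id"]] = stored
        return stored["_rev"]


store = TinyStore()
rev1 = store.save({"_id": "book", "name": "couchguide"})
rev2 = store.save({"_id": "book", "_rev": rev1, "name": "CouchDB Guide"})

try:
    # A second writer still holds rev1, so its update is rejected.
    store.save({"_id": "book", "_rev": rev1, "name": "stale edit"})
    outcome = "saved"
except RevisionConflict:
    outcome = "conflict"
print(rev1.split("-")[0], rev2.split("-")[0], outcome)
```

The rejected writer is expected to fetch the current revision, reapply its change and try again.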
30. Contents
● Intro
● DB Initialization
● Key-Value Store
● Simple MapReduce Queries
– Create a View
– Query the View
● The _changes Feed
● Complex MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
31. Views
● A specific „view“ on (parts of) the data in a database
● Indexed incrementally
● Query is just reading a range of a view sequentially
● Generated using MapReduce
32. MapReduce Views
● Map Function
– Called for each document
– Has to be side-effect free
– Emits zero or more intermediate key-value pairs
● Reduce Function (optional)
– Aggregates intermediate pairs
● View Results stored in B+-Tree
– Updated incrementally at query time
– Queries are just an O(log n) lookup
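The map step and the sorted index can be sketched in plain Python. This is only a model of the idea: real map functions are written in JavaScript (or another view server language) and the index lives in a B+-tree, whereas here a sorted list with binary search stands in for it. All function names are made up for the example.

```python
import bisect


def map_thing(doc):
    """Map function: emit (owner, name) for every Thing document."""
    if doc.get("doc_type") == "Thing":
        yield doc["owner"], doc["name"]


def build_view(docs, map_fn):
    """Run the map function over all documents; keep rows sorted by key."""
    rows = sorted((key, value) for doc in docs for key, value in map_fn(doc))
    keys = [key for key, _ in rows]
    return keys, rows


def query(view, startkey, endkey):
    """Binary-search the start of the range, then read rows sequentially."""
    keys, rows = view
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)  # endkey is inclusive
    return rows[lo:hi]


docs = [
    {"doc_type": "Thing", "owner": "stefan", "name": "couchguide"},
    {"doc_type": "Thing", "owner": "stefan", "name": "Polish Dictionary"},
    {"doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"doc_type": "Lending", "owner": "marek", "thing": "robot"},  # not emitted
]
view = build_view(docs, map_thing)
stefans_things = query(view, "stefan", "stefan")
print(stefans_things)
```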
33. List all Things
● Implemented as MapReduce View
● Contained in a Design Document
– Create
– Store
– Query
34. Create a Design Document
● Regular document, interpreted by the database
● Views Mapped to Filesystem by directory structure
_design/<ddoc name>/views/<view name>/{map,reduce}.js
● Written in JavaScript or Erlang
● Pluggable View Servers
– http://wiki.apache.org/couchdb/View_server
– http://packages.python.org/CouchDB/views.html
– Lisp, PHP, Ruby, Python, Clojure, Perl, etc
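How the directory layout maps onto a design document can be sketched as follows. This is not couchdbkit's actual loader, just an illustration of the mapping; `load_design_doc`, the design document name `things` and the view name `by_owner` are all hypothetical.

```python
import os
import tempfile


def load_design_doc(ddoc_dir):
    """Build a design document dict from <ddoc_dir>/views/<view>/{map,reduce}.js."""
    ddoc_name = os.path.basename(os.path.normpath(ddoc_dir))
    ddoc = {"_id": "_design/" + ddoc_name, "language": "javascript", "views": {}}
    views_dir = os.path.join(ddoc_dir, "views")
    for view_name in sorted(os.listdir(views_dir)):
        functions = {}
        for kind in ("map", "reduce"):
            path = os.path.join(views_dir, view_name, kind + ".js")
            if os.path.exists(path):
                with open(path) as source:
                    functions[kind] = source.read().strip()
        ddoc["views"][view_name] = functions
    return ddoc


# Demo with a temporary directory; "things" and "by_owner" are made-up names.
root = tempfile.mkdtemp()
view_dir = os.path.join(root, "_design", "things", "views", "by_owner")
os.makedirs(view_dir)
with open(os.path.join(view_dir, "map.js"), "w") as f:
    f.write("function(doc) { emit(doc.owner, doc.name); }\n")

ddoc = load_design_doc(os.path.join(root, "_design", "things"))
print(ddoc["_id"], sorted(ddoc["views"]))
```

The resulting dict is an ordinary JSON document; storing it under its `_design/...` _id is what makes the database interpret it.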
38. Reduced Results
● Result depends on group level
group_level=2
Key                                Value
["stefan", "couchguide"]           1
["stefan", "Polish Dictionary"]    1
["marek", "robot"]                 1

group_level=1
Key                                Value
["stefan"]                         2
["marek"]                          1

group_level=0 (no grouping)
Key                                Value
null                               3
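The effect of the group level can be reproduced with a few lines of Python. This is a sketch of the grouping idea only, assuming a reduce function that sums the emitted values; `reduce_sum` is a made-up name and nothing here talks to CouchDB.

```python
def reduce_sum(rows, group_level):
    """Sum values per key prefix; group_level 0 reduces everything to one row."""
    totals = {}
    for key, value in rows:
        group_key = tuple(key[:group_level]) if group_level else None
        totals[group_key] = totals.get(group_key, 0) + value
    return totals


# Rows as emitted by a map function with key [owner, name] and value 1.
rows = [
    (["stefan", "couchguide"], 1),
    (["stefan", "Polish Dictionary"], 1),
    (["marek", "robot"], 1),
]

for level in (2, 1, 0):
    print(level, reduce_sum(rows, level))
```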
39. Synchronize Design Docs
● Upload the design document
● _id: _design/<ddoc name>
● couchdbkit syncs ddocs from filesystem
● We'll need this a few more times
– Put the following in its own script
– or run
$ ./ldbsyncddocs.py
45. From the Break
● Filtering by Price
– startkey = 5
– endkey = 10
● Structure: ddoc name / view name
– Logical Grouping
– Performance
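At the HTTP level, such a range query is a GET on the view with JSON-encoded `startkey` and `endkey` parameters. The sketch below only builds the URL; `view_url` is a hypothetical helper, and the design document name `things` and view name `by_price` are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def view_url(db_url, ddoc, view, **params):
    """Build a view query URL; key parameters are JSON-encoded."""
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return "%s/_design/%s/_view/%s?%s" % (db_url.rstrip("/"), ddoc, view, query)


# "things" and "by_price" are hypothetical design document and view names.
url = view_url("http://127.0.0.1:5984/lendb", "things", "by_price",
               startkey=5, endkey=10)
print(url)
```

String keys end up quoted (and percent-encoded) in the URL because they are JSON values, which is a common source of confusion when querying by hand.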
46. Contents
● Intro
● DB Initialization
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
– Accessing the _changes Feed
– Lending Objects
● Advanced MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
47. Changes Sequence
● With every document update, a change is recorded
● Local history, ordered by _seq value
● Only the latest _seq of each document is kept
48. Changes Feed
● List of all documents, in the order they were last modified
● Possibility to
– React to changes
– Process all documents without skipping any
– Continue at some point with since parameter
● CouchDB as a distributed, persistent MQ
● http://guide.couchdb.org/draft/notifications.html
● http://wiki.apache.org/couchdb/HTTP_database_API#Changes
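The behaviour of the changes sequence and the `since` parameter can be modelled in memory. This is a simplified sketch, not CouchDB itself: `ChangesLog` is an invented class, and sequence numbers are plain integers here.

```python
class ChangesLog:
    """In-memory model of a changes sequence; a sketch, not CouchDB itself."""

    def __init__(self):
        self.last_seq = 0
        self.seq_by_id = {}  # only the latest seq per document is kept

    def record_update(self, doc_id):
        self.last_seq += 1
        self.seq_by_id[doc_id] = self.last_seq

    def changes(self, since=0):
        """Return (seq, doc_id) pairs newer than `since`, oldest first."""
        rows = [(seq, doc_id) for doc_id, seq in self.seq_by_id.items()
                if seq > since]
        return sorted(rows)


log = ChangesLog()
for doc_id in ["book", "robot", "book", "dictionary"]:
    log.record_update(doc_id)

print(log.changes())         # "book" appears once, at its latest seq
print(log.changes(since=3))  # resume after seq 3
```

A consumer that remembers the last seq it processed can stop and later continue without missing a document, which is what makes the feed usable as a simple persistent queue.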
50. „Lending“ Objects
● Thing that is lent
● Who lent it (i.e. who is the owner of the thing)
● To whom it is lent
● When it was lent
● When it was returned
51. Modelling a „Lend“ Object
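# models.py
from datetime import datetime
from couchdbkit import DateTimeProperty, Document, StringProperty

class Lending(Document):
    thing = StringProperty(required=True)    # _id of the Thing that is lent
    owner = StringProperty(required=True)
    to_user = StringProperty(required=True)
    lent = DateTimeProperty(default=datetime.now)
    returned = DateTimeProperty()            # empty until the thing is returned

# db is the Database created earlier in models.py
Lending.set_db(db)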
61. Contents
● Intro
● DB Initialization
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
● Advanced MapReduce Queries
● Replication
– Setting up filters
– Find Friends and Replicate from them
– Eventual Consistency and Conflicts
● Additional Features and the Couch Ecosystem
62. Replication
● Replicate Things and their status from friends
● Don't replicate things from friends of friends
– we don't want to borrow anything from them
63. Replication
● Pull replication
– Pull documents from our friends, and store them locally
● There's also Push replication, but we won't use it
● Goes through the source's _changes feed
● Compares with local documents, updates or creates conflicts
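The pull steps above can be sketched with plain dictionaries. This is a strongly simplified model: each document carries its revision history as a list in a made-up `revs` field, `pull` is a hypothetical function, and real replication compares revision trees rather than flat lists.

```python
def pull(source_changes, source_docs, target_docs, since=0):
    """Copy documents changed on the source after `since` into the target.

    Returns the new checkpoint and the ids of documents left in conflict.
    """
    checkpoint, conflicts = since, []
    for seq, doc_id in source_changes:
        if seq <= since:
            continue
        remote = source_docs[doc_id]
        local = target_docs.get(doc_id)
        if local is None or local["revs"][-1] in remote["revs"]:
            target_docs[doc_id] = dict(remote)   # create or fast-forward
        elif remote["revs"][-1] not in local["revs"]:
            conflicts.append(doc_id)             # histories diverged
        checkpoint = seq
    return checkpoint, conflicts


source_docs = {
    "robot": {"_id": "robot", "owner": "marek", "revs": ["1-a"]},
    "book": {"_id": "book", "owner": "stefan", "revs": ["1-b", "2-c"]},
}
source_changes = [(1, "robot"), (3, "book")]
# The local copy of "book" was edited independently after revision 1-b.
target_docs = {"book": {"_id": "book", "owner": "stefan", "revs": ["1-b", "2-x"]}}

checkpoint, conflicts = pull(source_changes, source_docs, target_docs)
print(checkpoint, conflicts, sorted(target_docs))
```

Passing the returned checkpoint as `since` on the next run means only newer changes are examined.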
64. Set up a Filter
● A Filter is a JavaScript function that takes
– a document
– a request object
● and returns
– true, if the document passes the filter
– false otherwise
● A filter is evaluated at the source
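A filter lives in a design document on the source database and is named in the replication request. The sketch below shows both pieces as Python dicts; the design document name `lending`, the filter name `by_owner`, the helper `replication_request` and the example URL are assumptions for illustration. The body is meant for a POST to the pulling server's `/_replicate` endpoint and is only built here, not sent.

```python
import json

# Filter stored in a design document; evaluated by the source database.
# The names "lending" and "by_owner" are made up for this example.
design_doc = {
    "_id": "_design/lending",
    "filters": {
        "by_owner": "function(doc, req) { return doc.owner === req.query.owner; }"
    },
}


def replication_request(source_url, target_db, owner):
    """Body for POST /_replicate on the local (pulling) server."""
    return {
        "source": source_url,
        "target": target_db,
        "filter": "lending/by_owner",
        "query_params": {"owner": owner},
    }


body = replication_request("http://couch.example.org:5984/friend", "myname", "marek")
print(json.dumps(body, indent=2))
```

Filtering on the owner is one way to pull a friend's own things without also pulling what they replicated from their friends.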
66. Replication
● Sync design docs to your own database!
● Find friends to borrow from
– Post your nickname and Database URL to
http://piratepad.net/pycouchpl
– Pick at least two friends
69. Replication
● Documents should be propagated into own database
● Views should contain both own and friends' things
70. Dealing with Conflicts
● Conflicts introduced by
– Replication
– „forcing“ a document update
● _rev calculated based on
– Previous _rev
– document content
● Conflict when two documents have
– The same _id
– Distinct _rev
71. Dealing with Conflicts
● Select a Winner
● Database can't do this for you
● Automatic strategy selects a (temporary) winner
– Deterministic: always the same winner on each node
– leaves them in conflict state
● View that contains all conflicts
● Resolve conflict programmatically
● http://guide.couchdb.org/draft/conflicts.html
● http://wiki.apache.org/couchdb/Replication_and_conflicts
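The important property of the automatic strategy is that every node picks the same temporary winner without coordination. A minimal sketch, assuming the rule "highest revision number first, then highest revision string" (my understanding of the behaviour, with deleted revisions ignored); `pick_winner` is a hypothetical function and the revision strings are invented:

```python
def pick_winner(revisions):
    """Pick the same winner on every node from conflicting _rev strings."""
    def sort_key(rev):
        generation, digest = rev.split("-", 1)
        return int(generation), digest
    return max(revisions, key=sort_key)


conflicting = ["2-7c1a", "3-0a11", "2-b91f"]
winner = pick_winner(conflicting)
losers = [rev for rev in conflicting if rev != winner]
print(winner, losers)
```

Because the choice depends only on the revision strings, the order in which a node received the revisions does not matter. Resolving the conflict for good still means deciding in application code which content to keep and removing the losing revisions.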
72. Contents
● Intro
● DB Initialization
● Key-Value Store
● Simple MapReduce Queries
● The _changes Feed
● Advanced MapReduce Queries
● Replication
● Additional Features and the Couch Ecosystem
– Scaling and related Projects
– Fulltext Search
– Further Reading
73. Scaling Up / Out
● BigCouch
– Cluster of CouchDB nodes that appears as a single server
– http://bigcouch.cloudant.com/
– will be merged into CouchDB soon
● refuge
– Fully decentralized data platform based on CouchDB
– Includes fork of GeoCouch for spatial indexing
– http://refuge.io/
74. Scaling Down
● CouchDB-compatible Databases on a smaller scale
● PouchDB
– JavaScript library http://pouchdb.com/
● TouchDB
– iOS: https://github.com/couchbaselabs/TouchDB-iOS
– Android: https://github.com/couchbaselabs/TouchDB-Android
77. Further Features
● Update Handlers: JavaScript code that carries out updates
inside the database server
● External Processes: use CouchDB as a proxy to other
processes (eg search engines)
● Attachments: attach binary files to documents
● Update Validation: JavaScript code to validate doc updates
● CouchApps: Web-Apps served directly by CouchDB
● Bulk APIs: Several Updates in one Request
● List and Show Functions: Transforming responses before
serving them
78. Summing Up
● Apache CouchDB™ is a database that uses JSON for
documents, JavaScript for MapReduce queries, and regular
HTTP for an API
● couchdbkit is a Python library providing access to Apache
CouchDB
79. Thanks!
Time for Questions and Discussion
Stefan Kögl
stefan@skoegl.net
@skoegl
Downloads
https://slideshare.net/skoegl/couch-db-pythonpyconpl2012
https://github.com/stefankoegl/python-couchdb-examples