Functional Programming and Django QuerySets

LA Django Meetup

July 5th, 2016

tiny.cc/jtaqs4

@johntellsall

Note

  • Senior dev/server guy; DevOps
  • 20 years of experience with Python
  • john@johntellsall.com
  • first PyCon I went to had 40 people!

yes we're hiring

[screenshot: The Black Tux website]

Ideas

iterator/generator = "stream"

Functional Programming: programming with composition

QuerySet is a stream

Note

https://en.wikipedia.org/wiki/Function_composition_%28computer_science%29

combine simple functions to build more complicated ones

there will be code


Iterators

Iterator review

An iterator is a stream of data, something like a restricted, compact list or a database cursor.

>>> list([1,2])
[1, 2]
>>> iter([1,2])
<listiterator object at 0x7f429d83c750>

Note


Iterators only know how to hand over the next item, and that's it. They are often preferred because they take almost no memory.
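Here is a minimal sketch of the iterator protocol. It uses Python 3, where the method is spelled `__next__`; Python 2 calls it `next`. `Countdown` is a hypothetical class made up for illustration. It keeps only its current position, never a list of items.

```python
class Countdown:
    """Hypothetical iterator: yields start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start  # the only state: where we are right now

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of the stream
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
print(next(it))  # 3
print(list(it))  # [2, 1]  (only the items not yet consumed)
```

Once exhausted, a second `list(it)` gives `[]`. The stream does not rewind.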

You already use iterators

iterate across a stream of strings

>>> f = open('recipe.ini')
>>> for line in f:
...     print line,
...
# very tasty
[Old Fashioned]
1:1.5 oz whiskey
2:1 tsp water
3:0.5 tsp sugar
4:2 dash bitters

Note

you already use iterators

Ex: Database iterator

Lists/Iterators are very similar

import glob

for line in open('ing.txt'):
    print line

for num in iter([2,4,6,8]):
    print num

for num in [2,4,6,8]:
    print num

for name in glob.iglob('*.txt'):
    print name
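This is roughly what a `for` loop does with any of these, sketched in Python 3. It calls `iter()` on the object, then calls `next()` until `StopIteration`. `manual_for` is a hypothetical helper name, and this is an approximation, not CPython's actual implementation.

```python
def manual_for(iterable, body):
    """Approximate `for item in iterable: body(item)` by hand."""
    it = iter(iterable)          # lists, files, generators all support iter()
    while True:
        try:
            item = next(it)      # ask the stream for one more item
        except StopIteration:    # stream exhausted: leave the loop
            break
        body(item)


seen = []
manual_for([2, 4, 6, 8], seen.append)
manual_for(iter([10]), seen.append)
print(seen)  # [2, 4, 6, 8, 10]
```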

Why use Iterators over Lists?

What can you do with an iterator?

>>> f = open('ing.txt')
>>> f.next()
'# Old Fashioned\n'
>>> f.next()
'1.5 oz whiskey\n'

What happens at the end?

>>> f = open('/dev/null')
>>> f.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> iter([]).next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

What can you not do with an iterator?

no slicing

>>> f = open('ing.txt')
>>> f[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'file' object has no attribute '__getitem__'

What can you not do with an iterator?

no length

>>> f = open('ing.txt')
>>> len(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'file' has no len()

List vs Iterator

feature       list     iterator
overall       eager    lazy
memory        high     low
len(x)        yes      no
slice         x[:3]    islice(x, 3)
addition      x + y    chain(x, y)
has items?    if x     no
easy debug    yes      no

Note: Python 2/3 are quite different

Note

Lists are "eager": they know everything about their contents all the time.

A million-item list can be rough, because it holds all million items in memory at once.

A million-item iterator is no big deal: you can process a few items at a time.
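To put rough numbers on the memory row of the table, here is a Python 3 sketch comparing a million-item list with an equivalent generator. `sys.getsizeof` reports only the container's own size, and exact figures vary by Python version and platform. The gap is the point: the list stores a million references, while the generator stores only its current state.

```python
import sys

squares_list = [n * n for n in range(1000000)]   # eager: built all at once
squares_gen = (n * n for n in range(1000000))    # lazy: nothing computed yet

print(sys.getsizeof(squares_list))  # roughly 8 MB on 64-bit CPython
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most

# Both give the same total; the generator never holds more than one item.
print(sum(squares_gen) == sum(squares_list))  # True
```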

Common iterator functions

very important:

Note

tradeoff readability vs conciseness
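As an illustration, here are a few Python 3 built-ins that consume or produce iterators and come up constantly. These are an assumption on my part, not necessarily the functions the talk listed. Each one-liner trades a little readability for conciseness.

```python
names = ['whiskey', 'water', 'sugar', 'bitters']
amounts = ['1.5 oz', '1 tsp', '0.5 tsp', '2 dash']

# enumerate: pair each item with its position
numbered = list(enumerate(names, start=1))
# zip: walk several streams in lockstep
pairs = dict(zip(names, amounts))
# sorted: consume any iterable, return a new list
alphabetical = sorted(names)
# any / all: short-circuit as soon as the answer is known
has_sugar = any(name == 'sugar' for name in names)
all_lowercase = all(name.islower() for name in names)

print(numbered)        # [(1, 'whiskey'), (2, 'water'), (3, 'sugar'), (4, 'bitters')]
print(pairs['sugar'])  # 0.5 tsp
print(alphabetical)    # ['bitters', 'sugar', 'water', 'whiskey']
print(has_sugar, all_lowercase)  # True True
```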

Functional Programming

que?


Practical Advantages to FP

  • Modularity
  • Composability!
  • Ease of debugging and testing
  • Parallelization
  • Buzzwordy!

What is Functional Programming

functions that operate on functions

>>> def is_odd(num):
...     return num % 2
>>> filter(is_odd, [1, 2, 3])
[1, 3]
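Functions can also return functions. The Python 3 sketch below uses a hypothetical `negate` helper that takes a predicate and builds its opposite, which can then be passed back into `filter`. In Python 3, `filter` returns a lazy iterator, hence the `list()` calls.

```python
def is_odd(num):
    return num % 2

def negate(predicate):
    """Take a function, return a new function with the opposite answer."""
    def negated(value):
        return not predicate(value)
    return negated

is_even = negate(is_odd)                  # a function built from another function
print(list(filter(is_odd, [1, 2, 3])))    # [1, 3]
print(list(filter(is_even, [1, 2, 3])))   # [2]
```

The standard library packages the same idea as `itertools.filterfalse` (`ifilterfalse` in Python 2).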

programming paradigms

procedural

list of instructions

can modify caller's state

object oriented

object has state and functions to query/modify state

specialize by subclassing

FP vs Procedural programming

procedural: list of instructions

def upfile(inpath, outpath):
    with open(outpath, 'w') as outf:
        for line in open(inpath):
            outf.write( line.upper() )

upfile('ing.txt', '/dev/stdout')

Note

  • how can you test this?
  • run in parallel?

Note

[Many] languages are procedural: programs are lists of instructions that tell the computer what to do with the program’s input.

FP vs Object Orientation

object oriented: Object has state and specific functions to query/modify state. Easy to specialize by subclassing.

class RWFile(list):
    def __init__(self, inpath):
        super(RWFile, self).__init__(open(inpath))
    def transform(self, line):
        return line
    def writelines(self, outpath):
        with open(outpath, 'w') as outf:
            for line in self:
                outf.write( self.transform(line) )

class UpFile(RWFile):
    def transform(self, line):
        return line.upper()

UpFile('recipe.ini').writelines('/dev/stdout')

Note

Object-oriented programs manipulate collections of objects. Objects have internal state and support methods that query or modify this internal state in some way. Smalltalk and Java are object-oriented languages. C++ and Python are languages that support object-oriented programming, but don’t force the use of object-oriented features. ["Object obsessive"]

Functional Programming

procedural
list of instructions
object oriented
object has state and functions to query/modify state; specialize by subclassing
functional

functions operate on streams of objects

combine simple functions => complicated

preferably without internal state

food chain


UpFile example in Functional Programming

using a generator expression

open('out.txt', 'w').writelines(
    line.upper() for line in open('in.txt')
)

Note

start from a seed stream, then recombine its elements with transforms, vs. specializing by subclassing

UpFile in FP: map

specialize with named function and map

def upcase(line):
    return line.upper()

open('out.txt', 'w').writelines(
    map(upcase, open('in.txt'))
)
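One payoff of the functional version answers the earlier "how can you test this?" question. If the transform is a plain function over any iterable of strings, you can test it with a list and no files at all. This is a Python 3 sketch, and `upcase_lines` is a hypothetical name. An in-memory `io.StringIO` stands in for `open('out.txt', 'w')`, and `with` makes sure it gets closed.

```python
import io

def upcase_lines(lines):
    """Lazily upper-case each line of any iterable of strings."""
    return (line.upper() for line in lines)

# Test with a plain list: no files involved
assert list(upcase_lines(['a\n', 'b\n'])) == ['A\n', 'B\n']

# The same function works on file-like objects
source = io.StringIO('1.5 oz whiskey\n1 tsp water\n')
with io.StringIO() as out:  # stands in for open('out.txt', 'w')
    out.writelines(upcase_lines(source))
    result = out.getvalue()
print(result)
```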

map-filter

map(func, iter) -- transform items using a function

>>> def square(num):
...     return num ** 2
>>> map(square, [1,2,3])
[1, 4, 9]

Note

map() applies a passed-in function to each item in an iterable and returns a list of the results (in Python 3 it returns a lazy iterator instead).

map-filter

filter(func, iter) -- provide items matching a function

>>> def is_odd(num):
...     return num % 2
>>> filter(is_odd, [1, 2, 3])
[1, 3]

Note

filter() keeps only the items for which a test function returns true. Its cousin reduce() applies a function to pairs of (running result, next item) to boil a stream down to a single value.
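In Python 3, `reduce` lives in `functools` rather than being a built-in. Here is a short sketch of the running-result idea, where each step combines the result so far with the next item. The last line chains all three: `filter`, then `map`, then `reduce`.

```python
from functools import reduce

def add(running_total, item):
    return running_total + item

# ((0 + 1) + 2) + 3
print(reduce(add, [1, 2, 3], 0))  # 6

def square(num):
    return num ** 2

def is_odd(num):
    return num % 2

# sum of the squares of the odd numbers: 1 + 9 + 25
print(reduce(add, map(square, filter(is_odd, [1, 2, 3, 4, 5])), 0))  # 35
```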

Functional Programming examples

Example: Windows INI-file parser; aka ConfigParser

  1. stream of lines
  2. stream of valid lines (no comments, has key-value)
  3. stream of key-value match objects
  4. dictionary
  5. TBD: dict of dictionaries

parse1.py

# 1. stream of lines
import fileinput
lines = fileinput.input()
print ''.join( lines )
# very tasty
[Old Fashioned]
1:1.5 oz whiskey
2:1 tsp water
3:0.5 tsp sugar
4:2 dash bitters

parse2.py

# 2. stream of valid lines
import fileinput
from itertools import *
def has_comment(line):
    return line.startswith('#')
def has_keyvalue(line):
    return ':' in line
lines = ifilterfalse( has_comment, fileinput.input() )
lines = ifilter( has_keyvalue, lines )
print ''.join( lines )
1:1.5 oz whiskey
2:1 tsp water
3:0.5 tsp sugar
4:2 dash bitters

parse3.py

# 3. stream of key-value match objects
import fileinput, re
from itertools import *
def has_comment(line):
    return line.startswith('#')
def parse_keyvalue(line):
    m = re.match(r'(\S+):(.+)', line)
    if m:
        return m.groups()
    return None
matches = (parse_keyvalue(line) for line in fileinput.input())
keyvalues = ifilter(None, matches)
print '\n'.join( (str(kv) for kv in keyvalues) )
('1', '1.5 oz whiskey')
('2', '1 tsp water')
('3', '0.5 tsp sugar')
('4', '2 dash bitters')

parse4.py

# 4. dictionary
import fileinput, re
from itertools import *
def has_comment(line):
    return line.startswith('#')
def parse_keyvalue(line):
    m = re.match(r'(\S+):(.+)', line)
    if m:
        return m.groups()
    return None
lines = ifilterfalse(has_comment, fileinput.input())
matches = (parse_keyvalue(line) for line in lines)
keyvalues = ifilter(None, matches)
confdict = dict(keyvalues)
print confdict
{'1': '1.5 oz whiskey', '3': '0.5 tsp sugar', '2': '1 tsp water', '4': '2 dash bitters'}
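Here is a sketch of the same pipeline for readers on Python 3, where `ifilter` became the built-in `filter` and `ifilterfalse` became `itertools.filterfalse`. To keep it self-contained, it reads from an in-memory string instead of `fileinput.input()`. The `parse` wrapper and the `RECIPE` constant are additions for illustration. Pass `fileinput.input()` to `parse` to read files named on the command line again.

```python
import io
import re
from itertools import filterfalse

RECIPE = """# very tasty
[Old Fashioned]
1:1.5 oz whiskey
2:1 tsp water
3:0.5 tsp sugar
4:2 dash bitters
"""

def has_comment(line):
    return line.startswith('#')

def parse_keyvalue(line):
    m = re.match(r'(\S+):(.+)', line)
    return m.groups() if m else None

def parse(lines):
    lines = filterfalse(has_comment, lines)             # 2. drop comments
    matches = (parse_keyvalue(line) for line in lines)  # 3. (key, value) or None
    keyvalues = filter(None, matches)                   #    drop the Nones
    return dict(keyvalues)                              # 4. dictionary

confdict = parse(io.StringIO(RECIPE))
print(confdict)
```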

itertools

  • chain()
  • compress()
  • count()
  • cycle()
  • dropwhile()
  • groupby()
  • ifilter()
  • ifilterfalse()
  • imap()
  • islice()
  • izip()
  • izip_longest()
  • repeat()
  • starmap()
  • takewhile()
  • tee()
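Here is a quick Python 3 tour of a few more entries from the list. Python 3 drops the `i` prefix: `imap`, `ifilter` and `izip` became the built-ins `map`, `filter` and `zip`, `ifilterfalse` became `filterfalse`, and `izip_longest` became `zip_longest`. `count` is infinite, so it has to be paired with something that stops, such as `takewhile` or `islice`.

```python
from itertools import count, takewhile, dropwhile, groupby, islice

# count() is an infinite stream: 0, 1, 2, ...
small_squares = list(takewhile(lambda n: n < 30, (n * n for n in count())))
print(small_squares)  # [0, 1, 4, 9, 16, 25]

# dropwhile skips the leading items that match, then yields everything else
lines = ['# comment', '# another', 'key:value', '# not skipped']
print(list(dropwhile(lambda s: s.startswith('#'), lines)))
# ['key:value', '# not skipped']

# groupby groups *consecutive* items sharing a key
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}
print(by_letter['b'])  # ['banana', 'blueberry']

# islice takes the first few items of any stream, even an infinite one
print(list(islice(count(10, 5), 3)))  # [10, 15, 20]
```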

chain -- the iterator version of +

chain(*iterables) yields the elements of each stream in order. Equivalent to + for lists.

>>> [1,2] + [3]
[1, 2, 3]
>>> from itertools import *
>>> chain(iter([1,2]), iter([3]))
<itertools.chain object at 0x7f429d848510>
>>> list(_)
[1, 2, 3]

Note

stream of objects with state; lazy vs eager

islice -- like slicing a list

islice(iter, num) -- return the first few items

>>> list([1, 2, 3])[:2]
[1, 2]
>>> iter([1, 2, 3])[:2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'listiterator' object has no attribute '__getitem__'
>>> islice(iter([1, 2, 3]), 2)
<itertools.islice object at 0x7f429d7de9f0>
>>> list(_)
[1, 2]

Toolz example

>>> def stem(word):
...     """ Stem word to primitive form """
...     return word.lower().rstrip(",.!:;'-\"").lstrip("'\"")
>>> from toolz import compose, frequencies, partial
>>> from toolz.curried import map
>>> wordcount = compose(frequencies, map(stem), str.split)
>>> sentence = "This cat jumped over this other cat!"
>>> wordcount(sentence)
{'this': 2, 'cat': 2, 'jumped': 1, 'over': 1, 'other': 1}
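`toolz` is a third-party library. To show there's no magic in `compose`, here is a standard-library-only sketch. It defines a hypothetical `compose` on top of `functools.reduce` that applies functions right to left, as toolz's does. `collections.Counter` stands in for `frequencies`, and `stem_all` replaces the curried `map(stem)`.

```python
from collections import Counter
from functools import reduce

def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))): apply right to left."""
    def composed(value):
        return reduce(lambda acc, f: f(acc), reversed(funcs), value)
    return composed

def stem(word):
    """Stem word to primitive form"""
    return word.lower().rstrip(",.!:;'-\"").lstrip("'\"")

def stem_all(words):
    return map(stem, words)

wordcount = compose(Counter, stem_all, str.split)
counts = wordcount("This cat jumped over this other cat!")
print(counts['this'], counts['cat'], counts['jumped'])  # 2 2 1
```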

Functional Programming & Python resources

Django QuerySets

represents a stream of rows from the database

Note

models.py

source: http://blog.etianen.com/blog/2013/06/08/django-querysets/

QuerySets are Django's way of getting and updating data

from django.db import models

class Meeting(models.Model):
    name = models.CharField(max_length=100)
    meet_date = models.DateTimeField()

QuerySet review

>>> Meeting.objects.get(id=3)
<Meeting: Meeting object>
>>> vars( Meeting.objects.get(id=3) )
{'meet_date': datetime.datetime(2016, 7, 5, 7, 0, tzinfo=<UTC>),
'_state': <django.db.models.base.ModelState object at 0x2bd1050>,
'id': 3, 'name': u'LA Django Monthly Meeting'}
>>> x = Meeting.objects.filter(name__icontains='go')
>>> for a in x: print a.name
LA Django Monthly Meeting

QuerySet and iterators

QuerySets can be shifty

>>> x = Meeting.objects.filter(name='java')
>>> x
[]
>>> type(x)
<class 'django.db.models.query.QuerySet'>

Functional QuerySets

How can you tell if a list is empty or not?

  • an iterator?
  • a QuerySet?

Empty List?

Note

How can you tell if a list is empty or not?

A: Empty List

>>> bool([])
False
>>> bool(['beer'])
True

Note

Lists are eager -- always know everything

Empty Iterator?

Note

How can you tell if an iterator is empty or not?

A: Empty Iterator

>>> x=iter([1,2])
>>> bool(x)
True
>>> x=iter([])
>>> bool(x)
True

Note

Iterators are lazy -- don't know what they contain!
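The usual trick is to ask for one item and keep it. `next()` with a default returns the default instead of raising `StopIteration`, and `itertools.chain` can put the peeked item back in front. This is a Python 3 sketch, and `peek_empty` is a hypothetical helper name.

```python
from itertools import chain

_MISSING = object()  # sentinel that can't be confused with a real item

def peek_empty(iterable):
    """Hypothetical helper: return (is_empty, stream) without losing the first item."""
    iterator = iter(iterable)
    first = next(iterator, _MISSING)  # default instead of StopIteration
    if first is _MISSING:
        return True, iterator         # already exhausted
    # Put the consumed item back in front of the rest of the stream
    return False, chain([first], iterator)

empty, rest = peek_empty(iter([1, 2]))
print(empty, list(rest))  # False [1, 2]
empty, rest = peek_empty(iter([]))
print(empty, list(rest))  # True []
```

The only way to learn whether a stream has items is to consume one.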

How can you tell if a QuerySet is empty or not?

QuerySet like Iterator

filter with QuerySet:

>>> from meetup.models import *
>>> Meeting.objects.filter(id=1)
[<Meeting: Meeting object>]

filter with list:

>>> filter(lambda d: d['id']==1, [{'id':1}, {'id':2}])
[{'id': 1}]

filter with iterator:

>>> list(ifilter(lambda d: d['id']==1, iter([{'id':1}, {'id':2}])))
[{'id': 1}]

Because a QuerySet is lazy and iterable, like an iterator

>>> from meetup.models import *
>>> Meeting.objects.filter(id=1)
[<Meeting: Meeting object>]
>>> type(Meeting.objects.filter(id=1))
<class 'django.db.models.query.QuerySet'>

Note

similar to iter: dynamic/lazy; list(qs)

differences: a QuerySet is a stream of objects of one model class; qs[:3] <=> islice(it, 3); bool(iter) vs qs.exists()

>>> a=iter([])
>>> bool(a)
True
>>> a=[] ; bool(a)
False

qs.count()

laziness is the default; eager loading of related objects is explicit: prefetch_related()

qs.values(); qs.values_list(); qs.values_list(flat=True)

Can mix/match QS/iterators...

>>> from itertools import *
>>> Meeting.objects.all()[0].id
1
>>> islice( Meeting.objects.all(), 1).next().id
1
>>> islice( Meeting.objects.all(), 1)
<itertools.islice object at 0x2bb9ec0>
>>> list(islice( Meeting.objects.all(), 1))
[<Meeting: Meeting object>]

...but not always

How can you tell if a QuerySet is empty or not?

Answer

Use x.exists(), not bool(x), when you only need a yes/no answer: exists() runs a cheap LIMIT 1 query, while bool(x) fetches every matching row.

Note

https://docs.djangoproject.com/en/1.9/ref/models/querysets/#exists

Both iterators and QuerySets are lazy

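In functional programming, we have functions that operate on streams of unbounded, even infinite, length.

With QuerySets, we may have many thousands of results, and we don't want to fetch all of them before returning to the caller.

Building, filtering, or chaining a QuerySet doesn't touch the database; the query only runs when the QuerySet is evaluated (iterated, passed to list(), tested with bool(), and so on). Django then reads rows from the database cursor in batches, but plain iteration fills the QuerySet's result cache with every row before handing back the first one. qs.iterator() skips that cache and hands rows over batch by batch.

So for iterators, and for QuerySets read with qs.iterator(), we can do a little work, process a batch, and move on without waiting for the entire result set.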
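Here is a plain-Python sketch of batch-at-a-time streaming, the general pattern that database cursors and `qs.iterator()` follow. It is not Django code. `fetch_batch` is a hypothetical stand-in for one database round trip, and `ROWS` is a pretend table. The `fetch_calls` list shows that taking the first few items triggers only one fetch.

```python
from itertools import islice

ROWS = list(range(1, 1001))  # pretend table with 1000 rows
fetch_calls = []             # records the offset of each simulated round trip

def fetch_batch(offset, size):
    """Hypothetical stand-in for one query (LIMIT size OFFSET offset)."""
    fetch_calls.append(offset)
    return ROWS[offset:offset + size]

def stream_rows(batch_size=100):
    """Lazily yield rows, fetching a new batch only when the last one runs out."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += batch_size

first_three = list(islice(stream_rows(), 3))
print(first_three, len(fetch_calls))  # [1, 2, 3] 1
```

Reading the whole stream instead would cost ten full batches plus one final empty fetch.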

Proof: str(queryset.query) = SQL

>>> p=SourceLine.objects.filter(project='redis')

>>> str(p.query)

'SELECT "app_sourceline"."id", ... FROM "app_sourceline"
WHERE "app_sourceline"."project" = redis'

doesn't work for exists()!

>>> p.exists()
True

>>> str(p.exists().query)
AttributeError: 'bool' object has no attribute 'query'

Django SQL history

>>> p=SourceLine.objects.filter(project='redis').exists()

>>> from django.db import connection ; connection.queries[-1]['sql']

u'QUERY = u\'SELECT (1) AS "a" FROM "app_sourceline"
WHERE "app_sourceline"."project" = %s LIMIT 1\'
- PARAMS = (u\'redis\',)'

(connection.queries is only recorded when DEBUG = True)

Ideas

iterator/generator = "stream"

FP: functions operate on streams of immutable objects

QuerySet is a stream

Note

programming with composition

Questions?

Thanks

Elie Ceberio @eceberiotalener

Marcel Chastain @MarcelChastain @LADjango

Goz Inyama @notwitter

References

Can Your Programming Language Do This? by Joel Spolsky

http://www.joelonsoftware.com/items/2006/08/01.html

Wikipedia: Functional Programming

http://en.wikipedia.org/wiki/Functional_programming

Functional Programming HOWTO by Andy Kuchling

https://docs.python.org/2/howto/functional.html

Using Django querysets effectively by Dave Hall

(best blog title ever)

http://blog.etianen.com/blog/2013/06/08/django-querysets/