performance - Python: analyze a list comprehension with dis

Question

Welcome To Ask or Share your Answers For Others

performance - Python: analyze a list comprehension with dis

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

performance - Python: analyze a list comprehension with dis

Recently, I had a discussion on SO (see it for the context) about the two following pieces of code:

res = [d.get(next((k for k in d if k in s), None), s) for s in lst]

And:

res = [next((v for k,v in d.items() if k in s), s) for s in lst]

Both iterate through strings s in a list lst and look for s in a dict d. If s is found, then the associated value is returned, else s is returned. I'm pretty sure the second piece of code is faster than the first, because (for each s) there is no lookup in the dictionary, just an iteration on the (key, value) pairs.

The question is: How to check that this is really what happens under the hood?

I tried, for the first time, the dis module, but the result was disappointing (python 3.6.3):

>>> dis.dis("[d.get(next((k for k in d if k in s), None), s) for s in lst]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f8e302039c0, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (lst)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE
>>> dis.dis("[next((v for k,v in d.items() if k in s), s) for s in lst]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f8e302038a0, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (lst)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

How do I get a more detailed information?

EDIT As suggested by @abarnert in the first comment, I tried to timeit both solutions. I played around with the following code:

from faker import Faker
from timeit import timeit

fake = Faker()

d = {fake.word():fake.word() for _ in range(50000)}
lst = fake.words(500000)

def f():return [d.get(next((k for k in d if k in s), None), s) for s in lst]
def g():return [next((v for k,v in d.items() if k in s), s) for s in lst]

print(timeit(f, number=1))
print(timeit(g, number=1))

assert f() == g()

Maybe I missed something but, to my surprise, the first piece of code (f) was always faster than the second (g). Hence the secondary question: does anyone have an explanation?

EDIT 2 Here are the most interesting parts of the disassembled code (with a little formatting to insert the inner loop). For f:

2           0 BUILD_LIST               0
          2 LOAD_FAST                0 (.0)
    >>    4 FOR_ITER                36 (to 42)
          6 STORE_DEREF              0 (s)
          8 LOAD_GLOBAL              0 (d)
         10 LOAD_ATTR                1 (get)
         12 LOAD_GLOBAL              2 (next)
         14 LOAD_CLOSURE             0 (s)
         16 BUILD_TUPLE              1
         18 LOAD_CONST               0 (<code object <genexpr> at 0x7ff191b1d8a0, file "test.py", line 2>)
         2           0 LOAD_FAST                0 (.0)
               >>    2 FOR_ITER                18 (to 22)
                     4 STORE_FAST               1 (k)
                     6 LOAD_FAST                1 (k)
                     8 LOAD_DEREF               0 (s)
                    10 COMPARE_OP               6 (in)
                    12 POP_JUMP_IF_FALSE        2
                    14 LOAD_FAST                1 (k)
                    16 YIELD_VALUE
                    18 POP_TOP
                    20 JUMP_ABSOLUTE            2
               >>   22 LOAD_CONST               0 (None)
                    24 RETURN_VALUE
         20 LOAD_CONST               1 ('f.<locals>.<listcomp>.<genexpr>')
         22 MAKE_FUNCTION            8
         24 LOAD_GLOBAL              0 (d)
         26 GET_ITER
         28 CALL_FUNCTION            1
         30 LOAD_CONST               2 (None)
         32 CALL_FUNCTION            2
         34 LOAD_DEREF               0 (s)
         36 CALL_FUNCTION            2
         38 LIST_APPEND              2
         40 JUMP_ABSOLUTE            4
    >>   42 RETURN_VALUE

For g:

3           0 BUILD_LIST               0
          2 LOAD_FAST                0 (.0)
    >>    4 FOR_ITER                32 (to 38)
          6 STORE_DEREF              0 (s)
          8 LOAD_GLOBAL              0 (next)
         10 LOAD_CLOSURE             0 (s)
         12 BUILD_TUPLE              1
         14 LOAD_CONST               0 (<code object <genexpr> at 0x7ff1905171e0, file "test.py", line 3>)
         3           0 LOAD_FAST                0 (.0)
               >>    2 FOR_ITER                22 (to 26)
                     4 UNPACK_SEQUENCE          2
                     6 STORE_FAST               1 (k)
                     8 STORE_FAST               2 (v)
                    10 LOAD_FAST                1 (k)
                    12 LOAD_DEREF               0 (s)
                    14 COMPARE_OP               6 (in)
                    16 POP_JUMP_IF_FALSE        2
                    18 LOAD_FAST                2 (v)
                    20 YIELD_VALUE
                    22 POP_TOP
                    24 JUMP_ABSOLUTE            2
               >>   26 LOAD_CONST               0 (None)
                    28 RETURN_VALUE
         16 LOAD_CONST               1 ('g.<locals>.<listcomp>.<genexpr>')
         18 MAKE_FUNCTION            8
         20 LOAD_GLOBAL              1 (d)
         22 LOAD_ATTR                2 (items)
         24 CALL_FUNCTION            0
         26 GET_ITER
         28 CALL_FUNCTION            1
         30 LOAD_DEREF               0 (s)
         32 CALL_FUNCTION            2
         34 LIST_APPEND              2
         36 JUMP_ABSOLUTE            4
    >>   38 RETURN_VALUE

One can see that (again as suggested by @abarnert) the inner loop of g contains some extra cost:

(hidden) the construction of the 2-uples by the iterator on d.items()
an UNPACK_SEQUENCE 2 which unpacks those 2-uples and then puts k and v on the stack
two STORE_FAST which pop k and v from the stack to store them in co_varnames.

Before it finally loads k to compare it with s as in f. This inner loop is iterated |lst|*|d| and It seems that those operations make the difference.

If this was optimized as I thought it was, the d.items() iterator would have put first k on the stack to test k in s, and then, only if k in s was true, put v on the stack for the YIELD_VALUE.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:38:10+0000

You've already got all the detailed information there is about the code that evaluates the list comprehension.

But list comprehensions are equivalent to creating and then calling a function. (This is how they have their own scope, so they don't, e.g., leak loop variables into the outer scope.) So that automatically-generated function named <listcomp> is what you really want to see the code for.

If you want to disassemble it—well, notice that LOAD_CONST 0 says it's loading a <code object <listcomp> at 0x7f8e302038a0? That's what you want. But we can't get to it, because all we did was compile a string for the sake of disassembling it, then throw the result away, so the listcomp function isn't around anymore.

But it's pretty easy to see with real code:

>>> def f():
...     return [next((v for k,v in d.items() if k in s), s) for s in lst]
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x11da9c660, file "<ipython-input-942-698335d58585>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (lst)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

There's that code object const again—but now it's not just a const we compiled and immediately threw away, it's part of a function we can access.

How do we access it? Well, this is documented in the inspect module docs, which probably isn't the first place you'd look. Functions have a code object in their __code__ member, code objects have a sequence of constants in their co_consts member, and we're looking for constant #1, so:

>>> dis.dis(f.__code__.co_consts[1])
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                32 (to 38)
              6 STORE_DEREF              0 (s)
              8 LOAD_GLOBAL              0 (next)
             10 LOAD_CLOSURE             0 (s)
             12 BUILD_TUPLE              1
             14 LOAD_CONST               0 (<code object <genexpr> at 0x11dd20030, file "<ipython-input-942-698335d58585>", line 2>)
             16 LOAD_CONST               1 ('f.<locals>.<listcomp>.<genexpr>')
             18 MAKE_FUNCTION            8
             20 LOAD_GLOBAL              1 (d)
             22 LOAD_ATTR                2 (items)
             24 CALL_FUNCTION            0
             26 GET_ITER
             28 CALL_FUNCTION            1
             30 LOAD_DEREF               0 (s)
             32 CALL_FUNCTION            2
             34 LIST_APPEND              2
             36 JUMP_ABSOLUTE            4
        >>   38 RETURN_VALUE

Of course you have a generator expression nested inside your list comprehension, and, as you can probably guess, that's also equivalent to creating and then calling a generator function. But that generator function's code is just as easy to find (if even more tedious to type out): f.__code__.co_consts[1].co_consts[0].

Categories

performance - Python: analyze a list comprehension with dis

performance - Python: analyze a list comprehension with dis

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags