Hello individuals! Lately, I used to be tremendous confused once I discovered that:
runs slower than:
I attempted to think about an acceptable reply however couldn’t discover any. I timed the execution of each of those statements utilizing the timeit module in Python3:
$ python3 -m timeit 'pow(3,89)' 500000 loops, better of 5: 688 nsec per loop $ python3 -m timeit '3**89' 500000 loops, better of 5: 519 nsec per loop
The distinction just isn’t massive. It’s only 0.1µs nevertheless it was nonetheless bugging me. If I can’t clarify one thing in programming, I normally find yourself having sleepless nights 😅
I discovered the reply via the Python IRC channel on Freenode. The explanation why pow is barely slower is that in CPython there’s an extra step of loading pow from the namespace. Whereas, in 3**9 there isn’t any such loading required. This additionally implies that the distinction will stay kind of fixed even when the enter numbers get greater and larger.
The speculation is true:
$ python3 -m timeit 'pow(3,9999)' 5000 loops, better of 5: 58.5 usec per loop $ python3 -m timeit '3**9999' 5000 loops, better of 5: 57.3 usec per loop
In the course of the technique of exploring the answer to this query I additionally bought to study in regards to the dis module. It permits you to decompile the Python Bytecode and examine it. This was a brilliant thrilling discovery primarily as a result of I’m studying extra about reverse engineering binaries now-a-days and this module suits proper in.
I disassembled the bytecode of the above statements like this in Python:
>>> import dis >>> dis.dis('pow(3,89)') # 1 0 LOAD_NAME 0 (pow) # 2 LOAD_CONST 0 (3) # 4 LOAD_CONST 1 (89) # 6 CALL_FUNCTION 2 # 8 RETURN_VALUE >>> dis.dis('3**64') # 1 0 LOAD_CONST 0 (3433683820292512484657849089281) # 2 RETURN_VALUE >>> dis.dis('3**65') # 1 0 LOAD_CONST 0 (3) # 2 LOAD_CONST 1 (65) # 4 BINARY_POWER # 6 RETURN_VALUE
You may find out about find out how to perceive the output of dis.dis by studying this reply on Stackoverflow
Okay again to the code. The disassembly of pow is sensible. It’s loading pow from the namespace after which loading 3 and 89 to registers and eventually calling the pow operate. However why does the output of the subsequent two disassemblies differ from one another? The one factor we modified is the exponent worth from 64 to 65!
This query launched me to a different new idea of “fixed folding“. It simply implies that when we’ve got a continuing expression Python evaluates the worth of that expression at compile time in order that whenever you truly run this system it doesn’t take as lengthy to run as a result of Python makes use of the already computed worth. Consider it like this:
def one_plue_one(): return 1+1 # --vs-- def one_plue_one(): return 2
Python compiles the primary operate to the second and makes use of that whenever you run the code. Neat, eh?
So why is fixed folding working for 3**64 and never 3**65? Nicely, I don’t know. This most likely has one thing to do with a restrict to what number of powers the system has pre-computed in reminiscence. I will be completely incorrect. The subsequent step in my thoughts is to dabble within the Python supply code in my free time and check out to determine what is occurring. I’m nonetheless attempting to determine a solution for this so you probably have any options please drop them within the feedback part beneath.
What I would like you to remove from this publish is that you must search options to fundamental questions. You by no means know the place the solutions will lead you. You may find yourself studying one thing solely new (like I simply did)! I hope you guys preserve this flame of curiosity burning. So long 🙂
PS: If you wish to study extra about Python Bytecode and find out how to mess around with it, somebody (njs on #python) instructed me this discuss. I personally haven’t watched it however I most likely will as soon as I get some free time.