import_that: XKCD guy flying with Python (Default)
2015-05-09 12:31 pm

(no subject)

> So what is the difference between a python name and a pointer? I'm a bit
> fuzzy on that.

In one sense -- not a very helpful sense, I'll grant, but it is a sense
-- both names and pointers are just examples of the generic concept of
"a reference to a thing", so they are the same thing.

In another sense -- also not very helpful -- they are completely
different and utterly unrelated.

The truth is somewhere in between, but takes some time to explain.

We need a way to *refer* to actual concrete things (or values, if you
like) in a generic fashion. If I want you to move your car, I can't
always drag you by the hand to the car, bang your head against it a few
times, and say "move it". I need a way to refer to the car indirectly,
without access to the object. Saying "this particular object made of
metal and plastic and glass, with four tyres, an engine, blah blah blah"
is a bit wordy, so I refer to it using a name or descriptive phrase
("the car") or a location ("that one over there, blocking my driveway").
Both names and locations are specific kinds of "references", as they
refer to a value in a generic fashion.

So it goes in programming languages: there are two ways of referring to
values:

- the value currently known as 'the_car'

- the value at location 123456

If you are old enough, you may remember programmable calculators where
the only variables you had were numbered, not named, so you had to do
things like "STO 4" and "RCL 3" to store and recall values from a
register. But while computers are really good at dealing with numbered
registers like that, people aren't, so programming languages usually
give locations *names* to make them easier to refer to:

- the value at location 'the_car'

The compiler then has an internal table mapping name to location:

the_car --> 123456
myint --> 890002
mystr --> 557018

etc. And, in principle at least, the names are not needed when the
program actually runs: the compiler may keep them, for error reporting
and debugging, or it may throw them away.

With such memory location languages, the concept of a pointer is simple:
it's just a single numeric redirection:

the_car_ptr --> 468108 # the location of the pointer itself

where the *contents* of location 468108 are the address of the_car,
namely 123456.

In contrast, the first version needs to have some sort of table mapping
names to *values*, not locations:

the_car --> red Ford Pinto with a dint
myint --> 99

and those names must be available at runtime. But there is no concept of
a *pointer*, since there are no locations to point to.[1]

So you can see the similarity between *memory location* references and
*name binding* references, as well as the differences.
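
To make the two models concrete, here is a toy sketch in Python. Everything in it (the dictionaries, the helper functions, their names) is made up for illustration; it mimics the tables described above and says nothing about how any real compiler lays things out.

```python
# Toy model of a "memory location" language. All names are hypothetical.
memory = {
    123456: "red Ford Pinto with a dint",  # the car itself
    468108: 123456,                        # a pointer: its contents are an address
}
# The compiler's table, name -> location. Not needed once the program runs.
locations = {"the_car": 123456, "the_car_ptr": 468108}

def fetch(name):
    """Return the contents of the location that a name stands for."""
    return memory[locations[name]]

def dereference(name):
    """Follow a pointer: treat the contents of its location as an address."""
    return memory[fetch(name)]

# Toy model of a "name binding" language: names map straight to values,
# and the table has to exist at runtime.
namespace = {"the_car": "red Ford Pinto with a dint", "myint": 99}

print(fetch("the_car_ptr"))        # the address stored in the pointer
print(dereference("the_car_ptr"))  # the car it points to
print(namespace["the_car"])        # the same car, no addresses involved
```

In the first model you can ask "what is at the address stored here?"; in the second there is no address to ask about.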

Another important difference is that when you are working with
locations, one needs to know how much space to leave for each value
(lest one value accidentally overwrites part of another value), hence
languages that work with memory locations typically have static types.
The compiler knows that only car objects can appear at location
'the_car' (a.k.a. 123456), and it knows that car objects are always 30
bytes in size, so it can reserve spots 123456 to 123485 for a car
object.

Hence memory location languages like C, Pascal, Haskell and similar
tend to have static types known at compile time, and you often have to
declare the type of every variable.

(More modern languages like Haskell can use *type inference*, a form of
artificial intelligence: if it sees that you add 1 to x, the compiler
can infer that x must be a number, since only numbers support adding 1.)

But, name binding languages don't care how big the value is, since they
don't care what location it is. So name binding languages can use
dynamic types, and names can refer to any kind of value at all: first it
is an int, then it's a string, now it's a float.
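
A minimal sketch of that in Python (the name `value` and the list `kinds` are just for the demo): the same name is rebound three times, and it is the value, not the name, that carries the type.

```python
value = 99                          # first it is an int
kinds = [type(value).__name__]

value = "ninety-nine"               # then it's a string
kinds.append(type(value).__name__)

value = 99.0                        # now it's a float
kinds.append(type(value).__name__)

print(kinds)  # ['int', 'str', 'float']
```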

In practice, due to the way our computer hardware works, all programming
languages *fundamentally* must end up dealing with memory locations,
because that's the only way the hardware works. In principle, though,
that is not necessarily the case. A sufficiently dedicated and clever
engineer could build a computer out of clockwork, or water flowing
through pipes, or using a giant simulation of Conway's Game Of Life
(look it up). Or run it inside the human brain.

But let us ignore these cases and stick to electronic computers with
memory chips that you can buy today. Our Python compiler needs somehow
to turn that name binding:

the_car --> red Ford Pinto with a dint

into an actual memory location ("where the hell is that car object?") in
order to do calculations with it. But that is abstracted away, as part
of the language implementation, not part of the language itself.

In Python's case, that implementation may use C pointers. When Python
tries to resolve the name "the_car", the first thing it does is *search*
the current namespace. A successful search returns the value (the red
Pinto) which in terms of the implementation means a pointer to the
memory location where the value is found. Under the hood of the Python
compiler/interpreter, it uses pointers.

What's a namespace? It's just a data structure which the interpreter
knows how to search. In Python, that is typically a dictionary, and
searching dictionaries isn't that time consuming, but it isn't
instantaneous either. Python can pull a few optimizations to make it
faster, e.g. inside functions, it actually uses something remarkably
similar to C's memory location model, or perhaps "named registers" may
be a better description. But that's an implementation detail and is not
part of the language specification.
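
You can peek at both kinds of namespace from Python itself. This is only a sketch: the module `garage` and the function `park` are invented for the demo, and the second half shows a CPython implementation detail rather than anything the language promises.

```python
import types

# A throwaway module, made up for this demo. Its namespace is a plain dict.
garage = types.ModuleType("garage")
garage.the_car = "red Ford Pinto"
namespace = vars(garage)
print(type(namespace).__name__)   # dict
print(namespace["the_car"])       # looking up the name is a dict search

def park():
    spot = 42                     # a local name
    return spot

# CPython detail: locals are compiled to numbered slots. The names survive
# as a tuple on the code object, mostly for debugging and introspection.
print(park.__code__.co_varnames)
```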

Name binding languages like Python defer knowledge of *where* values
actually are in memory until the code is running, not when it is being
compiled. That means that a sufficiently clever interpreter can move
values around, when needed, to free up blocks of memory as needed.
Memory location languages can't do that, or at least not easily.

The standard Python interpreter written in C doesn't move values around,
but Jython (written in Java) and IronPython (written in C# for the .Net
runtime engine) do. PyPy can take that one step further: it can actually
destroy and recreate objects as needed, turning them into low-level
machine integers or floats for speed, then back again before your Python
code gets access to them. So there are all sorts of complications once
we start delving into the implementation.

But the basic principle is simple:

* Pointers in the C sense record the memory location of a variable;

* Names in the Python sense do not;

* Low-level memory location languages like C typically let you
manipulate addresses directly, for good or bad (until recently, more
bugs were caused by incorrect memory addressing than any other fault);

* High-level name binding languages like Python typically do not give
you any way to manipulate memory addresses, which makes them much
safer (you can't execute random machine code in Python);

* Memory location languages need to know how much space to reserve for
each variable, hence variables themselves have a type;

* Name binding languages don't need to reserve space for variables, and
so variables have no type associated with them; only the value bound
to the variable has a type.


Some languages, like Java, mix both models together. Big complicated
types are objects, and Java uses a name binding system similar to
Python's (with a few notable differences, which I won't go into). But
"simple" values like numbers are, by default, represented as low-level
machine integers or floats, so-called "unboxed" values that use the
memory location model. For those, you have to explicitly ask the
compiler to "box" them into an object when needed.




[1] In principle, one could use an analogous "alias" concept, where one
name indirectly refers to another *name*, not a value:

the_car --> red Ford Pinto with a dint
the_lemon --> the_car

but Python doesn't have anything like that, and I don't know any
language which does. If it did, we could write things like this:


x = 23
y := x # make an alias
# much later on
y = 9999 # y is another name for the *variable* x
print x # prints 9999

I'm not sure whether that would be a good thing or a bad thing.
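
For contrast, here is what real Python does with ordinary assignment (a small sketch; the names are arbitrary). Two names can be bound to the same value, but neither name is an alias for the other *name*, so rebinding one never touches the other.

```python
x = 23
y = x        # binds y to the same value; y is not an alias for the name x
y = 9999     # rebinds y only
print(x)     # 23

a = [1, 2]
b = a        # both names now refer to the one list
b.append(3)  # mutating that list is visible through either name
b = []       # but rebinding b leaves a alone
print(a)     # [1, 2, 3]
```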
import_that: snippet of Python code (code)
2015-04-05 09:34 am

Bookmarked links

I got sick and tired of googling for the same terms over and over again, so here I have collected a number of influential and important programming blog posts, articles and references which I find myself coming back to frequently.

Steve Yegge's Execution In The Kingdom Of Nouns.

P.J. Eby's Python is not Java, Java is not Python either, and Python interfaces are not Java.

How to ask smart questions, and
Short, Self-Contained, Correct Example.

Unix as an IDE, IDE culture versus the Unix philosophy and Java shop politics.

Chris Smith's article What To Know Before Debating Type Systems, and a mirror in case it disappears again.

Raymond Hettinger's super considered super, and a counter-view super considered harmful.

Three examples of PyPy being as fast as, or faster than, C:


Tav made an admirable, but ultimately failed, attempt to secure the Python interpreter:


Ken Thompson's classic essay Reflections On Trusting Trust (pdf).

Why monkey-patching is destroying Ruby.

PHP, a fractal of bad design.

Unicode:


Jack Diederich's talk Stop Writing Classes (video), also found here. And Armin Ronacher's counter-view, Start Writing Classes.

Paul Graham's famous essay on Blub languages.

Nick Coghlan on why most suggested changes to Python go nowhere, and a discussion on the speed with which Python changes.

Floating point issues:


XKCD on people being wrong on the Internet, and little Bobby Tables.

Making wrong code look wrong and code smells. How to write unmaintainable code, and Confessions of a terrible programmer, and signs that you are a bad programmer.

Architecture Astronauts:


Two more classic essays from Joel Spolsky: Back To Basics and Leaky Abstractions.

David Beazley's Curious Course on Coroutines.
import_that: African rock python (snake)
2014-07-05 06:00 pm

A tale of yak shaving

This is a couple of years old now, but still interesting: Barry Warsaw, one of the Python core developers, shares the tale of a hairy debugging experience and the yak shaving needed to solve it:

Everyone who reported the problem said the TypeError was getting thrown on the for-statement line. The exception message indicated that Python was getting some object that it was trying to convert to an integer, but was failing. How could you possible get that exception when either making a copy of a list or iterating over that copy? Was the list corrupted? Was it not actually a list but some list-like object that was somehow returning non-integers for its min and max indexes?
import_that: XKCD guy flying with Python (Default)
2014-05-02 10:20 pm

Software vulnerabilities in medical devices

Last September, there was a fascinating interview with Karen Sandler by Linux Format. Karen Sandler is the Executive Director of the Gnome Foundation, and she spoke about learning just how vulnerable the software controlling medical devices is, and her efforts to be permitted to audit her implanted heart defibrillator's software.

I was so freaked out about this. I kept trying to talk to doctors about it and they wouldn’t listen to me, or they just didn’t know how to handle the conversation with me. I had one electrophysiologist who I talked to who just hung up the phone on me.


You would probably freak out too if you learned that any script kiddie with an iPhone could take control of your pacemaker and deliver a fatal electric shock. But it wasn't until the late, brilliant, Barnaby Jack and University of Massachusetts associate professor Kevin Fu demonstrated how to take remote control over medical implants fitted with wi-fi that people started to take Karen's concerns seriously.

Wireless medical implants that will talk to any device that says hello. What could possibly go wrong?

Karen continues:

I realised that it’s not just my medical device; it’s not just our lives that are relying on this software: it’s our cars, and our voting machines, and our stock markets and now our phones in the way that we communicate with one another. We’re building all this infrastructure, and it’s putting so much trust in the hands of individual corporations, in software that we can’t review and we can’t control. Terrifying.


And:

25% of all medical device recalls in the last few years have been due to software failure.


Karen's argument is that independent, public review of the source code is the best way to guarantee that bugs and security vulnerabilities are found and corrected as quickly as possible. It's not that open source software is necessarily bug-free, but that there is more opportunity for bugs to be found and fixed, and less opportunity for manufacturers to stick their head in the sand and deny there is a problem. Sunlight is the best disinfectant, and openness and transparency are essential for security. Keeping source code secret doesn't make it more secure. If secrecy were all it took, Windows would be free of viruses and malware. In fact, secrecy is often counter-productive:

I used to decry secret security systems as "security by obscurity." I now say it more strongly: "obscurity means insecurity." — Bruce Schneier


When the television series "Homeland" first aired an episode involving a plot to commit assassination by remote-controlling a pacemaker, it was widely derided as being unrealistic. That was until former American Vice-President Dick Cheney publicly acknowledged that the risk of remote exploits was seriously considered when he was fitted for a pacemaker.

Unless we treat the security of medical devices and other complex systems seriously, it is only a matter of time before somebody is murdered by remote control. In fact, it may even have already happened. No less than former "security czar" Richard Clarke has warned that the death of investigative journalist Michael Hastings mere hours after he wrote to friends that he was going "off the radar" was completely consistent with a remote attack on his car.
import_that: XKCD guy flying with Python (Default)
2014-05-02 12:51 pm

Changing Python's prompt

I have a mild dislike of Python's default prompt, ">>>". Not that the prompt itself is bad, but when you copy from an interactive session and paste into email or a Usenet post, the prompt clashes with the standard > email quote marker. So I've changed my first level prompt to "py>" to distinguish interactive sessions from email quotes. Doing so is very simple:

import sys
sys.ps1 = 'py> '


Note the space at the end.

You can change the second level prompt (by default, "...") by assigning to sys.ps2, but I haven't bothered. Both prompts support arbitrary objects, not just strings, which you can use to implement dynamic prompts similar to those IPython uses. Here's a simple example of numbering the prompts:

class Prompt:
    def __init__(self):
        self.count = 0
    def __str__(self):
        self.count += 1
        return "[%4d] " % self.count

sys.ps1 = Prompt()


If you're trying to use coloured prompts, there are some subtleties to be aware of. You have to escape any non-printing characters. See this bug report for details.

You can have Python automatically use your custom prompt by setting it in your startup file. If the environment variable PYTHONSTARTUP is set, Python will run the file named in that environment variable when you start the interactive interpreter. As I am using Linux for my desktop, I have the following line in my .bashrc file to set the environment variable each time I log in:

export PYTHONSTARTUP=/home/steve/python/startup.py

and the startup file itself then sets the prompt, as shown above.
import_that: XKCD guy flying with Python (Default)
2014-04-28 11:20 am

try-finally oddity

There's a curious oddity with Python's treatment of return inside try...finally blocks:

py> def test():
...     try:
...         return 23
...     finally:
...         return 42
...
py> test()
42


While it seems a little odd, I don't think we can really call it a "gotcha", as it shouldn't be all that surprising. The finally block is guaranteed to run when the try block is left, however it is left. The first return sets the return value to 23, the second resets it to 42.

It should be no surprise that finally can raise an exception:

py> def test():
...     try: return 23
...     finally: raise ValueError
...
py> test()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in test
ValueError


A little less obvious is that it can also swallow exceptions:

py> def test():
...     try: raise ValueError
...     finally: return 42
...
py> test()
42


Earlier, I wrote that the finally block is guaranteed to run. That's not quite true. The finally block won't run if control never leaves the try block:

try:
    while True:
        pass
finally:
    print('this never gets reached')


It also won't run if Python is killed by the operating system, e.g. in response to kill -9, or following loss of power and other such catastrophic failures.

There are also two official ways to force Python to exit without running cleanup code, including code in finally blocks: os.abort and os._exit. Both should be used only for specialized purposes. Normally you should exit your Python programs by:

  1. Falling out the bottom of the program. When there is no more code to run, Python exits cleanly.

  2. Calling sys.exit.

  3. Raising SystemExit.
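
Unlike os.abort and os._exit, the polite ways of exiting still run your finally blocks, because sys.exit works by raising SystemExit. A small sketch (the `log` list and `main` function are invented for the demo, and the exception is caught only so the demo itself doesn't exit):

```python
import sys

log = []

def main():
    try:
        sys.exit(3)               # raises SystemExit(3)
    finally:
        log.append("cleanup ran")

try:
    main()
except SystemExit as exc:         # caught here only so the demo can keep going
    log.append("exit code %r" % exc.code)

print(log)  # ['cleanup ran', 'exit code 3']
```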
import_that: XKCD guy flying with Python (Default)
2014-04-27 11:57 pm

The importance of RAID

Backups are important, right? Really important. Computers die. Hard drives die. If you don't have backups, data may be lost for ever.

But making backups is a nuisance, it's a chore, one of those things that you feel virtuous for doing a few times and then get distracted or bored and stop doing. Especially with home systems, it's easy to be slack.

About a week ago, I had a hard drive suddenly die in my home server. And I had no backups. Oops.

Fortunately, I did have RAID.

Although one drive had died, the second hard drive in the RAID array was okay, with a complete copy of all my data, including a working operating system, and my server just kept going. After a couple of days, I got a new hard drive, moved furniture around so I could actually get to the server, replaced the hard drive (and the long-dead DVD drive as well), moved everything back, and ... the damn server wouldn't boot.

Hmmmm.

As far as I am concerned, RAID is fantastic. It's not really practical in a laptop, but in a desktop or server, I couldn't do without it. RAID isn't really designed as a backup system, but it behaves as a poor man's backup. Or perhaps a slack person's backup. It lets you keep going even in the face of an otherwise catastrophic hard drive failure. But it does have one horrible flaw: the boot loader isn't included in the RAIDed partition. So I had a situation like this:

Before the disk died:

    +--------+-----------------------------+
hda |  GRUB  |      RAIDed Partitions      |  <== Boots from this drive.
    +--------+-----------------------------+

    +--------+-----------------------------+
hdb | blank  |      RAIDed Partitions      |
    +--------+-----------------------------+


After the disk was replaced:

    +--------+-----------------------------+
hda | blank  |      RAIDed Partitions      |  <== The former hdb, moved.
    +--------+-----------------------------+

    +--------+-----------------------------+
hdb | blank  |      blank                  |  <== The replacement drive.
    +--------+-----------------------------+


So here I was with two good disks and no working computers (all my desktops mount their home from the server via NFS). Since neither disk had GRUB installed, there was no way to boot from either of them. After making an attempt to fix the situation with the CentOS recovery system, I soon decided that this was beyond my level of expertise. (I might administer my own system, but I have no illusions that I'm a system administrator. A man's got to know his limitations.) Fortunately I was able to get one of the sys admins that I work with to re-install GRUB (thanks David!), this time on both hard drives, and configure RAID for the new drive.

The moral of this story:

  • Backups are important.

  • RAID makes a nifty backup for slackers.

  • But when you configure RAID, your Linux installer probably won't install GRUB on both drives. You need to do it yourself.
import_that: XKCD guy flying with Python (Default)
2014-04-27 11:36 pm

Sometimes Firefox is too smart

Sometimes Firefox is too smart for its own good. I have been annoyed now for a very long time, months if not years, that when saving images with Firefox, the default Save As location will unexpectedly change. One moment I'm saving assorted images to directory X, the next the directory has changed to Y.

It turns out that Firefox remembers what Save As location you last used on a per-domain basis. I'm not the only person this feature has annoyed, and Firefox has a hidden preference to turn it off:

  1. Open about:config. (If clicking the link doesn't work, type it in the location bar.)

  2. Firefox may warn you that you're about to destroy the Universe, er, void your warranty. Continue regardless.

  3. Search for browser.download.lastDir.savePerSite, and set it to false.

  4. If that preference doesn't already exist, create it by right-clicking on the blank space, then choosing New > Boolean from the context menu.
import_that: XKCD guy flying with Python (Default)
2014-04-27 07:06 pm

Minus 2000 lines of code

There's an entertaining anecdote from the early days of the Apple Lisa and Macintosh computers relating to counting lines of code as a measure of productivity. The story involves Bill Atkinson, the creator of Quickdraw and Hypercard. Apple management had asked their programmers to fill out a form each week stating how many lines of code they had written that week:

Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementor, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code.


After completely re-writing Quickdraw's region calculation routines, making them six times faster while saving 2000 lines of code, Bill was asked to fill out the weekly productivity form. So he dutifully wrote "-2000" as the lines of code written.

A few weeks after that, management stopped asking him to fill out the form.

Trying to measure programmer productivity is a hard problem. Any objective metric, like lines of code, number of tickets serviced, bug reports closed, etc., can either be gamed by the programmer or is vulnerable to social manipulation.
import_that: Bar and line graph (graph)
2014-04-26 07:24 pm

Statistics quote

Quote of the day, via John Cook:

Statistics: A subject which most statisticians find difficult but in which nearly all physicians are expert. — Stephen Senn
import_that: XKCD guy flying with Python (Default)
2014-04-23 02:20 am

Classic and new classes

One of the historical oddities of Python is that there are two different kinds of classes, so-called "classic" or "old-style" classes, and "new-style" classes or types. To understand the reason, we have to go back to the dawn of time and very early Python.

Back in the early Python 1.x days, built-in types like int, str and list were completely separate from classes you created with the class keyword. Built-in types were written in C, classes were written in Python, and the two could not be combined. So in Python 1.5, you couldn't inherit from built-in types:

>>> class MyInt(int):
...     pass
...
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: base is not a class object


If you wanted to inherit behaviour from a built-in type, the only way to do so was by using delegation, a powerful and useful technique that is unfortunately underused today. At the time though, it was the only way to solve the problem of subclassing from built-in types, leading to useful recipes like this one from Alex Martelli.
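
To show the shape of the technique, here is a minimal delegation sketch written for modern Python. It is not Alex Martelli's recipe, and the class name `LoggedList` and its `appended` counter are invented for the example: the wrapper holds a real list and forwards anything it doesn't handle itself.

```python
class LoggedList:
    """Wrap a list and delegate to it, rather than inheriting from it."""

    def __init__(self, items=()):
        self._items = list(items)
        self.appended = 0

    def append(self, item):
        # Behaviour we want to customise: count the appends.
        self.appended += 1
        self._items.append(item)

    def __len__(self):
        # Special methods must be spelled out; they are not forwarded
        # by __getattr__.
        return len(self._items)

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward to the wrapped list.
        return getattr(self._items, name)


ll = LoggedList([1, 2])
ll.append(3)
print(ll.appended, len(ll), ll.count(3), ll.index(2))
```

Everything not defined on the wrapper (here `count` and `index`) falls through to the wrapped list.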

But in Python 2.2, classes were unified with built-in types. This involved a surprising number of changes to the behaviour of classes, so for backwards-compatibility, both the old and the new behaviour were kept. Rather than introduce a new keyword, Python 2.2 used a simple set of rules:

  • If you define a class that doesn't inherit from anything, it is an old-style classic class.

  • If your class inherits from another classic class, it too is a classic class.

  • But if you inherit from a built-in type (e.g. dict or list) then it is a new-style class and the new rules apply.

  • Since Python supports multiple inheritance, you might inherit from both new-style and old-style classes; in that case, your class will be new-style.


To aid with this, a new built-in type was created, object. object is the parent type of all new-style classes and types, but not old-style classes. Curiously, isinstance(classic_instance, object) still returns True, even though classic_instance doesn't actually inherit from object. At least that is the case in Python 2.4 and above; I haven't tested 2.2 or 2.3.

There are a few technical differences in behaviour between the old- and new-style classes, including:

  • super only works in new-style classes.

  • Likewise for classmethod and staticmethod.

  • Properties appear to work in classic classes, but actually don't. They seem to work so long as you only read from them, but if you assign to a property, the setter method is not called and the property is replaced.

  • In general, descriptors of any sort only work with new-style classes.

  • New-style classes support __slots__ as a memory optimization.

  • The rules for resolving inheritance of old- and new-style classes are slightly different, with old-style classes' method resolution order being inconsistent under certain circumstances.

  • New-style classes optimize operator overloading by skipping dunder methods like __add__ which are defined on the instance rather than the class.


The complexity of having two kinds of class was always intended to be a temporary measure, and in Python 3, classic classes were finally removed for good. Now, all classes inherit from object, even if you just write class MyClass: with no explicit bases.
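
Two of the points above are easy to check in Python 3. This is a small sketch with made-up names: a bare class statement still inherits from object, and a dunder method attached to the instance rather than the class is skipped by the built-in that would normally use it.

```python
class MyClass:
    pass

print(MyClass.__bases__)      # (<class 'object'>,)

inst = MyClass()
inst.__len__ = lambda: 5      # defined on the instance, not the class

try:
    len(inst)                 # len() looks up __len__ on the class only
    skipped = False
except TypeError:
    skipped = True

print(skipped)                # True
print(inst.__len__())         # 5, but only when called explicitly
```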
import_that: XKCD guy flying with Python (Default)
2014-04-21 02:29 pm

More on language popularity

Recently, I wrote about the various ways of measuring language popularity, and I thought I'd add another two.

TrendySkills measures popularity of IT technologies (not just languages) by extracting information from job advertisements. It's currently showing Python at number 9, just ahead of C and just behind HTML5. Despite the name, the site doesn't appear to measure trends as such (what technologies are becoming more popular or less popular), but only snapshots of current popularity. It also mixes data collected from numerous countries, including the USA, Spain and Sweden. I don't think the job market is truly world-wide, not even in IT, so that seems a weakness to me: just because a technology is popular in one country doesn't mean it will be equally popular in another.

RedMonk periodically posts a graph of language popularity based on GitHub and StackOverflow. They find Python in position 5, sandwiched between C# and C++.

However you measure it, there's no doubt that Python is one of the most popular and influential languages around.
import_that: 2.8 inside a red circle with a line crossing it out (no-python2.8)
2014-04-15 01:05 pm

Another 2.8 proposal

There's yet another call for Python 2.8, or at least for discussion of 2.8. Martijn Faassen discusses what Python 2.8 should be, even though it won't be. Frankly, I'm not entirely sure what motivates Martijn to write this post: he seems to have accepted that there won't be a Python 2.8 and isn't asking for that decision to be rethought. Still, let's treat this as a serious suggestion.

Martijn suggests that, paradoxically, the best way to handle the Python 2.x to 3.x transition is for 2.8 to break backwards compatibility with 2.7. I thought we had that version. Isn't it called Python 3.x? Not according to Martijn, who wants to add — or rather since he explicitly says he won't be doing the work, he wants somebody else to add — a whole series of extra compiler options in the form of from __future3__ and from __past__ imports. Like __future__, they will presumably behave like compiler directives and change the behaviour of Python.
import_that: Bar and line graph (graph)
2014-04-09 11:37 pm

More on variance

Earlier, I discussed some of the terminology and different uses for the two variance functions in Python 3.4's new statistics module. That was partly motivated by a question from Andrew Szeto, who asked me why the variance functions took optional mu or xbar parameters.

The two standard deviation functions stdev and pstdev are just thin wrappers that return the square root of the variance, so they have the same signature as the variance functions.

I had a few motives for including the second parameter, in no particular order:

  1. The reason given in the PEP was that I took the idea from the GNU Scientific Library. Perhaps they know something I don't? (I actually thought of the idea independently, but when I was writing PEP 450 I expected this to be controversial, and was pleased to find prior art.)

  2. It allows a neat micro-optimization to avoid having to recalculate the mean if you've already calculated it. If you have a large data set, or one with custom numeric types where the __add__ method is expensive, calculating the mean once instead of twice may save some time.

  3. Mathematically, the variance is in some sense a function dependent on μ (mu) or x̄ (xbar). Making them parameters of the Python functions reflects that sense.

  4. Variance has a nice interpretation in physics: it's the moment of inertia around the centre of mass. We can calculate the moment of inertia around any point, not just the centre — might we not also calculate the "variance" around some point other than the mean? If you want to abuse the variance function by passing (say) the median instead of the mean as xbar or mu, you can. But if you do, you're responsible for ensuring that the result is physically meaningful. Don't come complaining to me if you get a negative variance or some other wacky value.

  5. It also allows you to calculate an improved sample variance by passing the known population mean to the pvariance function — see my earlier post or Wikipedia for details.


Individually, none of these were especially strong, and probably wouldn't have justified the additional complexity on their own. But taken together I think they justified including the optional parameters.
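
As a sketch of the micro-optimization in point 2 (the data set is just an example): calculate the mean once, then hand it to the variance and standard deviation functions as the second argument.

```python
import math
import statistics

data = [1, 2, 3, 3, 3, 5, 8]

m = statistics.mean(data)             # calculated once...
var = statistics.variance(data, m)    # ...and reused as xbar
sd = statistics.stdev(data, m)        # same signature as variance
pvar = statistics.pvariance(data, m)  # here the second argument is mu

# Passing the true mean gives the same answers as letting the functions
# work it out themselves, up to floating point rounding.
print(math.isclose(var, statistics.variance(data)))
print(math.isclose(pvar, statistics.pvariance(data)))
```

I've compared with math.isclose rather than == because passing a float mean may change the rounding in the last digit or so.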

Surprisingly (at least to me), I don't recall much if any opposition to these mu/xbar parameters. I expected this feature would be a lot more controversial than it turned out to be. If I recall correctly, there was more bike-shedding about what to call the parameters (I initially just called them "m", for mu/mean) than whether or not to include them.
import_that: Bar and line graph (graph)
2014-03-20 03:02 pm

Population and sample variance

I am the author of PEP 450 and the statistics module. That module offers four different functions related to statistical variance, and some people may not quite understand what the difference between them is.

[Disclaimer: statistical variance is complicated, and my discussion here is quite simplified. In particular, most of what I say only applies to "reasonable" data sets which aren't too skewed or unusual, and samples which are random and representative. If your sample data is not representative of the population from which it is drawn, then all bets are off.]

The statistics module offers two variance functions, pvariance and variance, and two corresponding versions of the standard deviation, pstdev and stdev. The standard deviation functions are just thin wrappers which take the square root of the appropriate variance function, so there's not a lot to say about them. Except where noted differently, everything I say about the (p)variance functions also applies to the (p)stdev functions, so for brevity I will only talk about variance.

The two versions of variance give obviously different results:

py> import statistics
py> data = [1, 2, 3, 3, 3, 5, 8]
py> statistics.pvariance(data)
4.53061224489796
py> statistics.variance(data)
5.2857142857142865
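
The gap between the two results comes down to the divisor. As a rough sketch using the textbook definitions (the helper name sum_sq_dev is made up for this illustration, and this is not a claim about how the module computes things internally), the population variance divides the sum of squared deviations by n, while the sample variance divides by n - 1:

```python
import math
import statistics
from fractions import Fraction

data = [1, 2, 3, 3, 3, 5, 8]
n = len(data)

def sum_sq_dev(values):
    """Sum of squared deviations from the mean, computed exactly."""
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(x) - mean) ** 2 for x in values)

ss = sum_sq_dev(data)

# Population variance: divide by the number of data points.
population_variance = float(ss / n)

# Sample variance: divide by one less than the number of data points.
sample_variance = float(ss / (n - 1))
```

Since n - 1 is smaller than n, the sample variance always comes out larger than the population variance for the same data, and the gap shrinks as the data set grows.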


So which should you use? In a nutshell, two simple rules apply:

  • If you are dealing with the entire population, use pvariance.

  • If you are working with a sample, use variance instead.


If you remember those two rules, you won't go badly wrong. Or at least, no more badly than most naive users of statistical functions. You want to be better than them, don't you? Then read on...

import_that: XKCD guy flying with Python (Default)
2014-03-19 12:30 pm

Bike-shedding

Nathaniel Smith describes the culture of the python-ideas and python-dev mailing lists:

We're more of the love, bikeshedding, and rhetoric school. Well, we can do you bikeshedding and love without the rhetoric, and we can do you bikeshedding and rhetoric without the love, and we can do you all three concurrent or consecutive. But we can't give you love and rhetoric without the bikeshedding. Bikeshedding is compulsory.


Bike-shedding gets a bad rap, and deservedly so. But bike-shedding can also be a sign of passion and attention to detail. Sometimes it really does matter what colour the bike-shed is: "colour" (syntax) can have functional and practical consequences, even for real-life bike-sheds. Dark colours tend to absorb and retain more heat than light colours. Syntax matters. Languages which feel like a harmonious whole — even if the design sometimes only makes sense if you are Dutch — require that the designers care about the little details of syntax and spelling. Even though functionally there would be little difference, Python would be a very different language indeed if we spelled this:

def func(x, y):
    return x/(x+y)


as this:

: func OVER + / ;


It would in fact be Forth.

So let's hear it for a little bit of bike-shedding — but not too much!
import_that: XKCD guy flying with Python (Default)
2014-03-05 11:48 am

Language popularity

What's the most popular programming language in the world? Or at least those parts of the English-speaking world which are easily found on the Internet? C, C++, Java, PHP, Javascript, VB? Is Python on the rise or in decline? How do we know?

There are a few websites which make the attempt to measure programming language popularity, for some definition of "popularity". Since they all have different methods of measuring popularity, and choose different proxies to measure (things like the number of job ads or the number of on-line tutorials for a language), they give different results — sometimes quite radically different, which is a strong indicator that even if language popularity has a single objective definition (and it probably doesn't) none of these methods are measuring it.

So keeping in mind that any discussion of language popularity should be taken with a considerable pinch of salt, let's have a look at four well-known sites that try to measure popularity.

import_that: XKCD guy flying with Python (Default)
2014-02-07 07:43 pm

Does Python pass by reference or value?

One topic which comes up from time to time, usually generating a lot of heat and not much light, is the question of how Python passes values to functions and methods. Usually the question is posed as "Does Python use pass-by-reference or pass-by-value?".

The answer is, neither. Python, like most modern object-oriented languages, uses an argument passing strategy first named by one of the pioneers of object-oriented programming, Barbara Liskov, in 1974 for the language CLU. Liskov named it pass-by-object-sharing, or just pass-by-sharing, but the actual strategy is a lot older, being the same as how Lisp passes arguments. Despite this august background, most people outside of Python circles have never heard of pass-by-sharing, and consequently there is a lot of confusion about argument passing terminology.

Let's start by looking at how people get confused. First we'll "prove" that Python is pass-by-value (also known as call-by-value), then we'll "prove" that Python is pass-by-reference (call-by-reference). It's actually neither, but if you think that there are only two ways to pass arguments to a function, you might be fooled into thinking Python uses both.
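
To make the confusion concrete, here is a minimal sketch of the two behaviours that tend to get offered as "proof". The function names append_item and rebind are made up for this illustration. Mutating the argument is visible to the caller, which looks like pass-by-reference; rebinding the parameter is not, which looks like pass-by-value.

```python
def append_item(items):
    # Mutate the object that the caller passed in. The caller and the
    # function share the same list, so the caller sees the change.
    items.append(99)

def rebind(items):
    # Rebind the local name to a brand new list. This only changes what
    # the local name refers to; the caller's list is untouched.
    items = [0, 0, 0]
    return items

original = [1, 2, 3]

append_item(original)
after_mutation = list(original)   # snapshot after the in-place change

rebind(original)
after_rebind = list(original)     # snapshot after the attempted rebinding
```

After both calls, original holds [1, 2, 3, 99]: the append went through, and the rebinding had no effect outside the function.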
import_that: XKCD guy flying with Python (Default)
2014-01-15 11:11 pm

Lies in code

Quote of the week:

At Resolver we've found it useful to short-circuit any doubt and just refer to comments in code as 'lies'.


-- Michael Foord paraphrases Christian Muirhead on python-dev.
import_that: XKCD guy flying with Python (Default)
2014-01-15 08:06 pm

When to use assert

Python's assert statement is a very useful feature that unfortunately often gets misused. assert takes an expression and an optional error message, evaluates the expression, and if it gives a true value, does nothing. If the expression evaluates to a false value, it raises an AssertionError exception with the error message, if one was given. For example:

py> x = 23
py> assert x > 0, "x is zero or negative"
py> assert x%2 == 0, "x is an odd number"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: x is an odd number


Many people use assertions as a quick and easy way to raise an exception if an argument is given the wrong value. But this is wrong, badly wrong, for two reasons.
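
The two reasons are not spelled out in this excerpt. One well-documented hazard, offered here as an assumption about what is at stake and not as a restatement of the author's list, is that Python strips assert statements when run with the -O option, so an assert used for argument checking can silently disappear. A minimal sketch of the explicit alternative, using a made-up function name set_speed:

```python
def set_speed(speed):
    # An explicit check still runs under "python -O", and it raises an
    # exception type that describes the actual problem (a bad value).
    if speed < 0:
        raise ValueError("speed must be non-negative, got %r" % (speed,))
    return speed
```

Callers can then catch ValueError specifically, instead of having to guess whether an AssertionError means a bad argument or a genuine bug in the code.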