import_that: XKCD guy flying with Python (Default)
[personal profile] import_that
> So what is the difference between a python name and a pointer? I'm a bit
> fuzzy on that.

In one sense -- not a very helpful sense, I'll grant, but it is a sense
-- both names and pointers are just examples of the generic concept of
"a reference to a thing", so they are the same thing.

In another sense -- also not very helpful -- they are completely
different and utterly unrelated.

The truth is somewhere in between, but takes some time to explain.

We need a way to *refer* to actual concrete things (or values, if you
like) in a generic fashion. If I want you to move your car, I can't
always drag you by the hand to the car, bang your head against it a few
times, and say "move it". I need a way to refer to the car indirectly,
without access to the object. Saying "this particular object made of
metal and plastic and glass, with four tyres, an engine, blah blah blah"
is a bit wordy, so I refer to it using a name or descriptive phrase
("the car") or a location ("that one over there, blocking my driveway").
Both names and locations are specific kinds of "references", as they
refer to a value in a generic fashion.

So it goes in programming languages: there are two ways of referring to
a value:

- the value currently known as 'the_car'

- the value at location 123456

If you are old enough, you may remember programmable calculators where
the only variables you had were numbered, not named, so you had to do
things like "STO 4" and "RCL 3" to store and recall values from a
register. But while computers are really good at dealing with numbered
registers like that, people aren't, so programming languages usually
give locations *names* to make them easier to refer to:

- the value at location 'the_car'

The compiler then has an internal table mapping name to location:

the_car --> 123456
myint --> 890002
mystr --> 557018

etc. And, in principle at least, the names are not needed when the
program actually runs: the compiler may keep them, for error reporting
and debugging, or it may throw them away.

With such memory location languages, the concept of a pointer is simple:
it's just a single numeric redirection:

the_car_ptr --> 468108 # the location of the pointer itself

where the *contents* of location 468108 are the address of the_car,
namely 123456.
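To make the indirection concrete, here is a toy model in Python. It is only a sketch: the `memory` dict, the `symbols` table and the `deref` helper are all made up for illustration, and real hardware obviously doesn't store strings at a single address.

```python
# Toy model: "memory" maps a numeric address to whatever is stored there.
# The addresses are the illustrative ones used in the text above.
memory = {
    123456: "red Ford Pinto",  # the car value lives here
    468108: 123456,            # the pointer lives here; its contents are an address
}

# A stand-in for the compiler's internal table of name -> location.
symbols = {"the_car": 123456, "the_car_ptr": 468108}


def deref(address):
    """Read the address stored at `address`, then return what lives there."""
    return memory[memory[address]]


pointed_to = deref(symbols["the_car_ptr"])
print(pointed_to)
```

Note that the names only appear in `symbols`; once the addresses are known, `memory` can be used without them.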

In contrast, the first version needs to have some sort of table mapping
names to *values*, not locations:

the_car --> red Ford Pinto with a dint
myint --> 99

and those names must be available at runtime. But there is no concept of
a *pointer*, since there are no locations to point to.[1]
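A rough way to see such a name-to-value table in Python is to run some assignments against an explicit dictionary. This is a sketch rather than a description of how any particular interpreter is built; `namespace` and `bound_names` are just names I picked for the example.

```python
# Run two assignments, using an ordinary dict as the namespace.
namespace = {}
exec("the_car = 'red Ford Pinto with a dint'\nmyint = 99", namespace)

# The names survive at runtime as keys, mapped directly to values.
# (exec also adds a '__builtins__' entry, which we skip here.)
bound_names = sorted(k for k in namespace if not k.startswith("__"))
print(bound_names)
print(namespace["the_car"])
```

Nothing in that table records a location: looking up a name gives you the value itself.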

So you can see the similarity between *memory location* references and
*name binding* references, as well as the differences.

Another important difference is that when you are working with
locations, one needs to know how much space to leave for each value
(lest one value accidentally overwrites part of another value), hence
languages that work with memory locations typically have static types.
The compiler knows that only car objects can appear at location
'the_car' (a.k.a. 123456), and it knows that car objects are always 30
bytes in size, so it can reserve spots 123456 to 123485 for a car.

Hence memory location languages like C, Pascal, Haskell and similar
tend to have static types known at compile time, and you often have to
declare the type of every variable.

(More modern languages like Haskell can use *type inference*, where the
compiler deduces types from how values are used: if it sees that you add
1 to x, it can infer that x must be a number, since only numbers support
adding 1.)

But name binding languages don't care how big the value is, since they
don't care what location it is. So name binding languages can use
dynamic types, and names can refer to any kind of value at all: first it
is an int, then it's a string, now it's a float.
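In Python that looks something like the following small sketch, where `x` and `kinds` are throwaway names for the example:

```python
x = 99                        # x is bound to an int
kinds = [type(x).__name__]

x = "ninety-nine"             # the same name, rebound to a string
kinds.append(type(x).__name__)

x = 99.0                      # and now to a float
kinds.append(type(x).__name__)

print(kinds)  # ['int', 'str', 'float']
```

The type is looked up on whatever value the name currently refers to; the name itself carries none.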

In practice, due to the way our computer hardware works, all programming
languages *fundamentally* must end up dealing with memory locations,
because that's the only way the hardware works. In principle, though,
that is not necessarily the case. A sufficiently dedicated and clever
engineer could build a computer out of clockwork, or water flowing
through pipes, or using a giant simulation of Conway's Game Of Life
(look it up). Or run it inside the human brain.

But let us ignore these cases and stick to electronic computers with
memory chips that you can buy today. Our Python compiler needs somehow
to turn that name binding:

the_car --> red Ford Pinto with a dint

into an actual memory location ("where the hell is that car object?") in
order to do calculations with it. But that is abstracted away, as part
of the language implementation, not part of the language itself.

In Python's case, that implementation may use C pointers. When Python
tries to resolve the name "the_car", the first thing it does is *search*
the current namespace. A successful search returns the value (the red
Pinto) which in terms of the implementation means a pointer to the
memory location where the value is found. Under the hood of the Python
compiler/interpreter, it uses pointers.
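You can't see those pointers from Python code, but you can see their effect: two names can be bound to one and the same object. A small sketch follows (`my_ride` and `same_object` are made-up names). In CPython, `id()` happens to return the object's memory address, but that is a documented implementation detail, and other interpreters are free to return something else.

```python
the_car = ["red", "Ford", "Pinto"]
my_ride = the_car            # a second name for the same object; nothing is copied

same_object = the_car is my_ride

my_ride.append("dint")       # change the object through one name...
print(the_car)               # ...and the change shows up through the other

# Equal ids for two live objects mean they are the same object.
print(id(the_car) == id(my_ride))
```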

What's a namespace? It's just a data structure which the interpreter
knows how to search. In Python, that is typically a dictionary, and
searching dictionaries isn't that time consuming, but it isn't
instantaneous either. Python can pull a few optimizations to make it
faster, e.g. inside functions, it actually uses something remarkably
similar to C's memory location model, or perhaps "named registers" may
be a better description. But that's an implementation detail and is not
part of the language specification.
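If you are curious, CPython lets you peek at this. The sketch below assumes CPython: there, a function's code object lists its local names in `co_varnames`, and locals are looked up by position in that tuple rather than by searching a dictionary. The function `park` is just an example I made up.

```python
def park():
    the_car = "red Ford Pinto"   # a local variable
    return the_car


# The compiled function remembers its local names in a fixed order...
local_names = park.__code__.co_varnames
print(local_names)

# ...and each local gets a numbered slot, a bit like a named register.
slot = local_names.index("the_car")
print(slot)
```

Other interpreters may organise locals quite differently, which is exactly why this is not part of the language.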

Name binding languages like Python defer knowledge of *where* values
actually are in memory until the code is running, not when it is being
compiled. That means that a sufficiently clever interpreter can move
values around when needed, to free up blocks of memory.
Memory location languages can't do that, or at least not easily.

The standard Python interpreter written in C doesn't move values around,
but Jython (written in Java) and IronPython (written in C# for the .NET
runtime engine) do. PyPy can take that one step further: it can actually
destroy and recreate objects as needed, turning them into low-level
machine integers or floats for speed, then back again before your Python
code gets access to them. So there are all sorts of complications once
we start delving into the implementation.

But the basic principle is simple:

* Pointers in the C sense record the memory location of a variable;

* Names in the Python sense do not;

* Low-level memory location languages like C typically let you
manipulate addresses directly, for good or bad (until recently, more
bugs were caused by incorrect memory addressing than any other fault);

* High-level name binding languages like Python typically do not give
you any way to manipulate memory addresses, which makes them much
safer (you can't execute random machine code in Python);

* Memory location languages need to know how much space to reserve for
each variable, hence variables themselves have a type;

* Name binding languages don't need to reserve space for variables, and
so variables have no type associated with them; only the values bound
to the variable have a type.

Some languages, like Java, mix both models together. Big complicated
types are objects, and Java uses a name binding system similar to
Python's (with a few notable differences, which I won't go into). But
"simple" values like numbers are, by default, represented as low-level
machine integers or floats, so-called "unboxed" values that use the
memory location model. Those get "boxed" into an object when an object
is needed, either explicitly or, since Java 5, automatically by the
compiler.

[1] In principle, one could use an analogous "alias" concept, where one
name indirectly refers to another *name*, not a value:

the_car --> red Ford Pinto with a dint
the_lemon --> the_car

but Python doesn't have anything like that, and I don't know any
language which does. If it did, we could write things like this:

x = 23
y := x # make an alias
# much later on
y = 9999 # y is another name for the *variable* x
print x # prints 9999

I'm not sure whether that would be a good thing or a bad thing.
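For comparison, here is what actual Python does with the nearest equivalent code. Plain assignment binds a name to a value; it never makes one name an alias for another name.

```python
x = 23
y = x        # y is bound to the same value as x; it is not an alias for the name x
# much later on
y = 9999     # rebinds y only
print(x)     # 23
```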


Steven D'Aprano

May 2015
