How-To Guide for Descriptors

Posted by Jeffye | 9:02 PM

Author: Raymond Hettinger
Contact: <python at rcn dot com>
Copyright: Copyright (c) 2003, 2004 Python Software Foundation. All rights reserved.

Abstract

Defines descriptors, summarizes the protocol, and shows how descriptors are called. Examines a custom descriptor and several built-in python descriptors including functions, properties, static methods, and class methods. Shows how each works by giving a pure Python equivalent and a sample application.
Learning about descriptors not only provides access to a larger toolset, it creates a deeper understanding of how Python works and an appreciation for the elegance of its design.

Definition and Introduction

In general, a descriptor is an object attribute with "binding behavior", one whose attribute access has been overridden by methods in the descriptor protocol. Those methods are __get__, __set__, and __delete__. If any of those methods are defined for an object, it is said to be a descriptor.
The default behavior for attribute access is to get, set, or delete the attribute from an object's dictionary. For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a), excluding metaclasses. If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined. Note that descriptors are only invoked for new style objects or classes (a class is new style if it inherits from object or type).
Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself to implement the new style classes introduced in version 2.2. Descriptors simplify the underlying C-code and offer a flexible set of new tools for everyday Python programs.

Descriptor Protocol

descr.__get__(self, obj, type=None) --> value
descr.__set__(self, obj, value) --> None
descr.__delete__(self, obj) --> None
That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.
If an object defines both __get__ and __set__, it is considered a data descriptor. Descriptors that only define __get__ are called non-data descriptors (they are typically used for methods but other uses are possible).
Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance's dictionary. If an instance's dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance's dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
To make a read-only data descriptor, define both __get__ and __set__ with the __set__ raising an AttributeError when called. Defining the __set__ method with an exception raising placeholder is enough to make it a data descriptor.
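As a minimal sketch of the read-only recipe just described (written in Python 3 syntax; the names ReadOnly and Config are hypothetical and do not come from the original guide):

```python
class ReadOnly(object):
    """Hypothetical read-only data descriptor wrapping a fixed value."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        # Merely defining __set__ makes this a data descriptor;
        # raising AttributeError makes the attribute read-only.
        raise AttributeError("read-only attribute")


class Config(object):
    version = ReadOnly("1.0")


c = Config()
print(c.version)

try:
    c.version = "2.0"
    assigned = True
except AttributeError:
    assigned = False

print(assigned)
```

The assignment never reaches the instance dictionary, because the descriptor's __set__ intercepts it first.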

Invoking Descriptors

A descriptor can be called directly by its method name. For example, d.__get__(obj).
Alternatively, it is more common for a descriptor to be invoked automatically upon attribute access. For example, obj.d looks up d in the dictionary of obj. If d defines the method __get__, then d.__get__(obj) is invoked according to the precedence rules listed below.
The details of invocation depend on whether obj is an object or a class. Either way, descriptors only work for new style objects and classes. A class is new style if it is a subclass of object.
For objects, the machinery is in object.__getattribute__ which transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). The implementation works through a precedence chain that gives data descriptors priority over instance variables, instance variables priority over non-data descriptors, and assigns lowest priority to __getattr__ if provided. The full C implementation can be found in PyObject_GenericGetAttr() in Objects/object.c.
For classes, the machinery is in type.__getattribute__ which transforms B.x into B.__dict__['x'].__get__(None, B). In pure Python, it looks like:
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
The important points to remember are:
  • descriptors are invoked by the __getattribute__ method
  • overriding __getattribute__ prevents automatic descriptor calls
  • __getattribute__ is only available with new style classes and objects
  • object.__getattribute__ and type.__getattribute__ make different calls to __get__.
  • data descriptors always override instance dictionaries.
  • non-data descriptors may be overridden by instance dictionaries.
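The precedence rules in the list above can be checked with a small experiment. This is only an illustrative sketch in Python 3 syntax; DataDesc, NonDataDesc, and Demo are hypothetical names chosen for the demonstration:

```python
class DataDesc(object):
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("cannot assign")


class NonDataDesc(object):
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo(object):
    d = DataDesc()
    n = NonDataDesc()

    def __getattr__(self, name):
        # Consulted only after every other lookup step has failed.
        return "from __getattr__"


obj = Demo()
# Plant same-named entries directly in the instance dictionary.
obj.__dict__["d"] = "from instance dict"
obj.__dict__["n"] = "from instance dict"

print(obj.d)        # the data descriptor wins over the instance entry
print(obj.n)        # the instance entry wins over the non-data descriptor
print(obj.missing)  # nothing found anywhere, so __getattr__ answers

# The explicit forms of the two transformations described earlier.
explicit = type(obj).__dict__["d"].__get__(obj, type(obj))
via_class = Demo.__dict__["n"].__get__(None, Demo)
```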
The object returned by super() also has a custom __getattribute__ method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then returns A.__dict__['m'].__get__(obj, A). If not a descriptor, m is returned unchanged. If not in the dictionary, m reverts to a search using object.__getattribute__.
Note, in Python 2.2, super(B, obj).m() would only invoke __get__ if m was a data descriptor. In Python 2.3, non-data descriptors also get invoked unless an old-style class is involved. The implementation details are in super_getattro() in Objects/typeobject.c and a pure Python equivalent can be found in Guido's Tutorial.
The details above show that the mechanism for descriptors is embedded in the __getattribute__() methods for object, type, and super. Classes inherit this machinery when they derive from object or if they have a meta-class providing similar functionality. Likewise, classes can turn off descriptor invocation by overriding __getattribute__().
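To illustrate the last point, here is a deliberately simplistic sketch (Python 3 syntax, hypothetical class names) of a class whose __getattribute__ bypasses the descriptor machinery. It is not a recommended pattern, only a demonstration that the automatic call to __get__ lives in __getattribute__:

```python
class Ten(object):
    """Hypothetical non-data descriptor that always yields 10."""

    def __get__(self, obj, objtype=None):
        return 10


class Plain(object):
    x = Ten()


class Raw(object):
    x = Ten()

    def __getattribute__(self, key):
        # Look straight into the class dictionary and never call __get__.
        # (A name missing from the class raises KeyError in this toy version.)
        return type(self).__dict__[key]


print(Plain().x)                 # descriptor invoked as usual
print(type(Raw().x).__name__)    # the descriptor object itself comes back
```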

Descriptor Example

The following code creates a class whose objects are data descriptors which print a message for each get or set. Overriding __getattribute__ is an alternate approach that could do this for every attribute. However, this descriptor is useful for monitoring just a few chosen attributes:
class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print 'Retrieving', self.name
        return self.val

    def __set__(self, obj, val):
        print 'Updating' , self.name
        self.val = val

>>> class MyClass(object):
    x = RevealAccess(10, 'var "x"')
    y = 5

>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5
The protocol is simple and offers exciting possibilities. Several use cases are so common that they have been packaged into individual function calls. Properties, bound and unbound methods, static methods, and class methods are all based on the descriptor protocol.

Properties

Calling property() is a succinct way of building a data descriptor that triggers function calls upon access to an attribute. Its signature is:
property(fget=None, fset=None, fdel=None, doc=None) -> property attribute
The documentation shows a typical use to define a managed attribute x:
class C(object):
    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")
To see how property() is implemented in terms of the descriptor protocol, here is a pure Python equivalent:
class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self         
        if self.fget is None:
            raise AttributeError, "unreadable attribute"
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError, "can't set attribute"
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError, "can't delete attribute"
        self.fdel(obj)
The property() builtin helps whenever a user interface has granted attribute access and then subsequent changes require the intervention of a method.
For instance, a spreadsheet class may grant access to a cell value through Cell('b10').value. Subsequent improvements to the program require the cell to be recalculated on every access; however, the programmer does not want to affect existing client code accessing the attribute directly. The solution is to wrap access to the value attribute in a property() data descriptor:
class Cell(object):
    . . .
    def getvalue(self):
        "Recalculate cell before returning value"
        self.recalc()
        return self._value
    value = property(getvalue)
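A runnable version of the spreadsheet idea might look like the following sketch (Python 3 syntax). The formula argument, the recalc() body, and the recalc_count counter are assumptions made up for the demonstration; the original only hints at them with ". . .":

```python
class Cell(object):
    """Hypothetical spreadsheet cell whose value is recomputed on access."""

    def __init__(self, formula):
        self.formula = formula      # assumed: a zero-argument callable
        self._value = None
        self.recalc_count = 0       # only here to make the recalculation visible

    def recalc(self):
        self.recalc_count += 1
        self._value = self.formula()

    def getvalue(self):
        "Recalculate cell before returning value"
        self.recalc()
        return self._value

    value = property(getvalue)


cell = Cell(lambda: 2 + 3)
print(cell.value)          # client code still uses plain attribute access
print(cell.recalc_count)
```

Because only a getter is supplied, assigning to cell.value raises AttributeError.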

Functions and Methods

Python's object oriented features are built upon a function based environment. Using non-data descriptors, the two are merged seamlessly.
Class dictionaries store methods as functions. In a class definition, methods are written using def and lambda, the usual tools for creating functions. The only difference from regular functions is that the first argument is reserved for the object instance. By Python convention, the instance reference is called self but may be called this or any other variable name.
To support method calls, functions include the __get__ method for binding methods during attribute access. This means that all functions are non-data descriptors which return bound or unbound methods depending on whether they are invoked from an object or a class. In pure Python, it works like this:
class Function(object):
    . . .
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        return types.MethodType(self, obj, objtype)
Running the interpreter shows how the function descriptor works in practice:
>>> class D(object):
     def f(self, x):
          return x

>>> d = D()
>>> D.__dict__['f'] # Stored internally as a function
<function f at 0x00C45070>
>>> D.f             # Get from a class becomes an unbound method
<unbound method D.f>
>>> d.f             # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
The output suggests that bound and unbound methods are two different types. While they could have been implemented that way, the actual C implementation of PyMethod_Type in Objects/classobject.c is a single object with two different representations depending on whether the im_self field is set or is NULL (the C equivalent of None).
Likewise, the effects of calling a method object depend on the im_self field. If set (meaning bound), the original function (stored in the im_func field) is called as expected with the first argument set to the instance. If unbound, all of the arguments are passed unchanged to the original function. The actual C implementation of instancemethod_call() is only slightly more complex in that it includes some type checking.
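The session above is from Python 2. As a hedged sketch of the same experiment under Python 3, where unbound methods no longer exist and im_self and im_func are spelled __self__ and __func__, the binding step can be performed by hand:

```python
import types


class D(object):
    def f(self, x):
        return x


d = D()
func = D.__dict__["f"]            # stored in the class dictionary as a plain function
bound = func.__get__(d, D)        # the call that d.f triggers behind the scenes
manual = types.MethodType(func, d)  # building the bound method directly

print(type(func).__name__)
print(bound.__self__ is d)
print(bound.__func__ is func)
print(bound(42) == d.f(42))
print(D.f is func)                # in Python 3, class access returns the function itself
```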

Static Methods and Class Methods

Non-data descriptors provide a simple mechanism for variations on the usual patterns of binding functions into methods.
To recap, functions have a __get__ method so that they can be converted to a method when accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling klass.f(*args) becomes f(*args).
This chart summarizes the binding and its two most useful variants:
Transformation    Called from an Object    Called from a Class
function          f(obj, *args)            f(*args)
staticmethod      f(*args)                 f(*args)
classmethod       f(type(obj), *args)      f(klass, *args)
Static methods return the underlying function without changes. Calling either c.f or C.f is the equivalent of a direct lookup into object.__getattribute__(c, "f") or object.__getattribute__(C, "f"). As a result, the function becomes identically accessible from either an object or a class.
Good candidates for static methods are methods that do not reference the self variable.
For instance, a statistics package may include a container class for experimental data. The class provides normal methods for computing the average, mean, median, and other descriptive statistics that depend on the data. However, there may be useful functions which are conceptually related but do not depend on the data. For instance, erf(x) is a handy conversion routine that comes up in statistical work but does not directly depend on a particular data set. It can be called either from an object or the class: s.erf(1.5) --> .9332 or Sample.erf(1.5) --> .9332.
Since staticmethods return the underlying function with no changes, the example calls are unexciting:
>>> class E(object):
     def f(x):
          return x
     f = staticmethod(f)

>>> print E.f(3)
3
>>> print E().f(3)
3
Using the non-data descriptor protocol, a pure Python version of staticmethod() would look like this:
class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f
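To see the emulation at work, the sketch below (Python 3 syntax) repeats the StaticMethod class so that it is self-contained and applies it with decorator syntax. The class E and its method body are illustrative assumptions:

```python
class StaticMethod(object):
    "Pure Python emulation of staticmethod, as described in the text"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        # Hand back the wrapped function untouched, ignoring obj and objtype.
        return self.f


class E(object):
    @StaticMethod
    def f(x):
        return x * 2


print(E.f(3))      # called from the class
print(E().f(3))    # called from an instance: no instance is prepended
```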
Unlike static methods, class methods prepend the class reference to the argument list before calling the function. This format is the same whether the caller is an object or a class:
>>> class E(object):
     def f(klass, x):
          return klass.__name__, x
     f = classmethod(f)

>>> print E.f(3)
('E', 3)
>>> print E().f(3)
('E', 3)
This behavior is useful whenever the function only needs to have a class reference and does not care about any underlying data. One use for classmethods is to create alternate class constructors. In Python 2.3, the classmethod dict.fromkeys() creates a new dictionary from a list of keys. The pure Python equivalent is:
class Dict:
    . . .
    def fromkeys(klass, iterable, value=None):
        "Emulate dict_fromkeys() in Objects/dictobject.c"
        d = klass()
        for key in iterable:
            d[key] = value
        return d
    fromkeys = classmethod(fromkeys)
Now a new dictionary of unique keys can be constructed like this:
>>> Dict.fromkeys('abracadabra')
{'a': None, 'r': None, 'b': None, 'c': None, 'd': None}
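A runnable variant of this alternate-constructor idea could look like the sketch below (Python 3 syntax). Deriving Dict from the built-in dict and adding the Tagged subclass are assumptions made so the example is self-contained; the original elides the rest of the class:

```python
class Dict(dict):
    """Hypothetical dict subclass re-implementing fromkeys in pure Python."""

    def fromkeys(klass, iterable, value=None):
        d = klass()              # klass is whichever class the call came through
        for key in iterable:
            d[key] = value
        return d
    fromkeys = classmethod(fromkeys)


class Tagged(Dict):
    pass


d = Dict.fromkeys("abracadabra")
t = Tagged.fromkeys("ab", 0)

print(sorted(d))            # the unique letters
print(type(t).__name__)     # the subclass builds instances of itself
```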
Using the non-data descriptor protocol, a pure Python version of classmethod() would look like this:
class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc
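As a quick check of the emulation, the following sketch (Python 3 syntax) repeats the ClassMethod class and uses it on a small hierarchy. Base, Child, and describe are hypothetical names:

```python
class ClassMethod(object):
    "Pure Python emulation of classmethod, as described in the text"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)

        def newfunc(*args):
            # Prepend the class, whether access came from a class or an instance.
            return self.f(klass, *args)
        return newfunc


class Base(object):
    @ClassMethod
    def describe(klass, x):
        return klass.__name__, x


class Child(Base):
    pass


print(Base.describe(1))
print(Child().describe(2))   # the subclass, not Base, is passed as klass
```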

abc – Abstract Base Classes

Purpose: Define and use abstract base classes for API checks in your code.
Available In: 2.6

Why use Abstract Base Classes?

Abstract base classes are a form of interface checking more strict than individual hasattr() checks for particular methods. By defining an abstract base class, you can define a common API for a set of subclasses. This capability is especially useful in situations where a third party is going to provide implementations, such as with plugins to an application, but can also aid you when working on a large team or with a large code-base where keeping all classes in your head at the same time is difficult or not possible.

How ABCs Work

abc works by marking methods of the base class as abstract, and then registering concrete classes as implementations of the abstract base. If your code requires a particular API, you can use issubclass() or isinstance() to check an object against the abstract class.
Let’s start by defining an abstract base class to represent the API of a set of plugins for saving and loading data.
import abc

class PluginBase(object):
    __metaclass__ = abc.ABCMeta
    
    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""
        return
    
    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""
        return
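The listing above uses the Python 2 spelling, where the metaclass is set through the __metaclass__ attribute. Python 3 ignores that attribute, so as a sketch of the equivalent definition there (assuming Python 3.4 or later, where abc.ABC is available):

```python
import abc


class PluginBase(abc.ABC):   # same effect as metaclass=abc.ABCMeta

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""


try:
    PluginBase()
    created = True
except TypeError:
    # The abstract base itself cannot be instantiated.
    created = False

print(created)
print(sorted(PluginBase.__abstractmethods__))
```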

Registering a Concrete Class

There are two ways to indicate that a concrete class implements an abstract base class: register the class with the abstract base class, or subclass directly from it.
import abc
from abc_base import PluginBase

class RegisteredImplementation(object):
    
    def load(self, input):
        return input.read()
    
    def save(self, output, data):
        return output.write(data)

PluginBase.register(RegisteredImplementation)

if __name__ == '__main__':
    print 'Subclass:', issubclass(RegisteredImplementation, PluginBase)
    print 'Instance:', isinstance(RegisteredImplementation(), PluginBase)
In this example the RegisteredImplementation is not derived from PluginBase, but is registered as implementing the PluginBase API.
$ python abc_register.py

Subclass: True
Instance: True
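One consequence of registration worth seeing in code is that register() takes the class at its word: it does not verify that the methods exist. The sketch below (Python 3 syntax; the Empty class is a hypothetical example) shows a class with none of the API passing both checks:

```python
import abc


class PluginBase(abc.ABC):

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""


class Empty(object):
    """Hypothetical class that implements none of the plugin API."""


PluginBase.register(Empty)

print(issubclass(Empty, PluginBase))      # registration alone is enough
print(isinstance(Empty(), PluginBase))
print(PluginBase in Empty.__mro__)        # but nothing is actually inherited
print(hasattr(Empty, "load"))
```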

Implementation Through Subclassing

By subclassing directly from the base, we can avoid the need to register the class explicitly.
import abc
from abc_base import PluginBase

class SubclassImplementation(PluginBase):
    
    def load(self, input):
        return input.read()
    
    def save(self, output, data):
        return output.write(data)

if __name__ == '__main__':
    print 'Subclass:', issubclass(SubclassImplementation, PluginBase)
    print 'Instance:', isinstance(SubclassImplementation(), PluginBase)
In this case the normal Python class management is used to recognize SubclassImplementation as implementing the abstract PluginBase.
$ python abc_subclass.py

Subclass: True
Instance: True
A side-effect of using direct subclassing is that it is possible to find all of the implementations of your plugin by asking the base class for the list of known classes derived from it. (This is not an abc feature; all classes can do this.)
import abc
from abc_base import PluginBase
import abc_subclass
import abc_register

for sc in PluginBase.__subclasses__():
    print sc.__name__
Notice that even though abc_register is imported, RegisteredImplementation is not among the list of subclasses because it is not actually derived from the base.
$ python abc_find_subclasses.py

SubclassImplementation
Dr. André Roberge has described using this capability to discover plugins by importing all of the modules in a directory dynamically and then looking at the subclass list to find the implementation classes.

Incomplete Implementations

Another benefit of subclassing directly from your abstract base class is that the subclass cannot be instantiated unless it fully implements the abstract portion of the API. This can keep half-baked implementations from triggering unexpected errors at runtime.
import abc
from abc_base import PluginBase

class IncompleteImplementation(PluginBase):
    
    def save(self, output, data):
        return output.write(data)

PluginBase.register(IncompleteImplementation)

if __name__ == '__main__':
    print 'Subclass:', issubclass(IncompleteImplementation, PluginBase)
    print 'Instance:', isinstance(IncompleteImplementation(), PluginBase)
$ python abc_incomplete.py

Subclass: True
Instance:
Traceback (most recent call last):
  File "abc_incomplete.py", line 22, in <module>
    print 'Instance:', isinstance(IncompleteImplementation(), PluginBase)
TypeError: Can't instantiate abstract class IncompleteImplementation with abstract methods load

Concrete Methods in ABCs

Although a concrete class must provide an implementation of an abstract method, the abstract base class can also provide an implementation that can be invoked via super(). This lets you re-use common logic by placing it in the base class, but force subclasses to provide an overriding method with (potentially) custom logic.
import abc
from cStringIO import StringIO

class ABCWithConcreteImplementation(object):
    __metaclass__ = abc.ABCMeta
    
    @abc.abstractmethod
    def retrieve_values(self, input):
        print 'base class reading data'
        return input.read()

class ConcreteOverride(ABCWithConcreteImplementation):
    
    def retrieve_values(self, input):
        base_data = super(ConcreteOverride, self).retrieve_values(input)
        print 'subclass sorting data'
        response = sorted(base_data.splitlines())
        return response

input = StringIO("""line one
line two
line three
""")

reader = ConcreteOverride()
print reader.retrieve_values(input)
print
Since ABCWithConcreteImplementation is an abstract base class, it isn’t possible to instantiate it to use it directly. Subclasses must provide an override for retrieve_values(), and in this case the concrete class massages the data before returning it at all.
$ python abc_concrete_method.py

base class reading data
subclass sorting data
['line one', 'line three', 'line two']

Abstract Properties

If your API specification includes attributes in addition to methods, you can require the attributes in concrete classes by defining them with @abstractproperty.
import abc

class Base(object):
    __metaclass__ = abc.ABCMeta
    
    @abc.abstractproperty
    def value(self):
        return 'Should never get here'


class Implementation(Base):
    
    @property
    def value(self):
        return 'concrete property'


try:
    b = Base()
    print 'Base.value:', b.value
except Exception, err:
    print 'ERROR:', str(err)

i = Implementation()
print 'Implementation.value:', i.value
The Base class in the example cannot be instantiated because it has only an abstract version of the property getter method.
$ python abc_abstractproperty.py

ERROR: Can't instantiate abstract class Base with abstract methods value
Implementation.value: concrete property
You can also define abstract read/write properties.
import abc

class Base(object):
    __metaclass__ = abc.ABCMeta
    
    def value_getter(self):
        return 'Should never see this'
    
    def value_setter(self, newvalue):
        return

    value = abc.abstractproperty(value_getter, value_setter)


class PartialImplementation(Base):
    
    @abc.abstractproperty
    def value(self):
        return 'Read-only'


class Implementation(Base):
    
    _value = 'Default value'
    
    def value_getter(self):
        return self._value

    def value_setter(self, newvalue):
        self._value = newvalue

    value = property(value_getter, value_setter)


try:
    b = Base()
    print 'Base.value:', b.value
except Exception, err:
    print 'ERROR:', str(err)

try:
    p = PartialImplementation()
    print 'PartialImplementation.value:', p.value
except Exception, err:
    print 'ERROR:', str(err)

i = Implementation()
print 'Implementation.value:', i.value

i.value = 'New value'
print 'Changed value:', i.value
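Notice that PartialImplementation cannot be instantiated: its value property is itself declared with @abc.abstractproperty, so the name is still abstract. The concrete property in Implementation should be defined the same way as the abstract property, with both a getter and a setter, to honor the read/write API.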
$ python abc_abstractproperty_rw.py

ERROR: Can't instantiate abstract class Base with abstract methods value
ERROR: Can't instantiate abstract class PartialImplementation with abstract methods value
Implementation.value: Default value
Changed value: New value
To use the decorator syntax with read/write abstract properties, the methods to get and set the value should be named the same.
import abc

class Base(object):
    __metaclass__ = abc.ABCMeta
    
    @abc.abstractproperty
    def value(self):
        return 'Should never see this'
    
    @value.setter
    def value(self, newvalue):
        return


class Implementation(Base):
    
    _value = 'Default value'
    
    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, newvalue):
        self._value = newvalue


i = Implementation()
print 'Implementation.value:', i.value

i.value = 'New value'
print 'Changed value:', i.value
Notice that both methods in the Base and Implementation classes are named value(), although they have different signatures.
$ python abc_abstractproperty_rw_deco.py

Implementation.value: Default value
Changed value: New value
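The examples in this section use abc.abstractproperty, which is the Python 2.6 spelling. In Python 3.3 and later that helper is deprecated, and the documented approach is to stack @property on top of @abc.abstractmethod. A sketch of the read/write case under that assumption:

```python
import abc


class Base(abc.ABC):

    @property
    @abc.abstractmethod
    def value(self):
        """Subclasses must provide this property."""

    @value.setter
    @abc.abstractmethod
    def value(self, newvalue):
        """Subclasses must provide this setter."""


class Implementation(Base):

    _value = "Default value"

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, newvalue):
        self._value = newvalue


try:
    Base()
    base_created = True
except TypeError:
    base_created = False

i = Implementation()
print(i.value)
i.value = "New value"
print(i.value)
```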

Collection Types

The collections module defines several abstract base classes related to container (and containable) types.
General container classes:
  • Container
  • Sized
Iterator and Sequence classes:
  • Iterable
  • Iterator
  • Sequence
  • MutableSequence
Unique values:
  • Hashable
  • Set
  • MutableSet
Mappings:
  • Mapping
  • MutableMapping
  • MappingView
  • KeysView
  • ItemsView
  • ValuesView
Miscellaneous:
  • Callable
In addition to serving as detailed real-world examples of abstract base classes, Python’s built-in types are automatically registered to these classes when you import collections. This means you can safely use isinstance() to check parameters in your code to ensure that they support the API you need. The base classes can also be used to define your own collection types, since many of them provide concrete implementations of the internals and only need a few methods overridden. Refer to the standard library docs for collections for more details.
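Both uses mentioned above, checking parameters with isinstance() and building a collection by overriding only a few methods, can be sketched as follows. The example assumes Python 3, where these classes live in collections.abc rather than directly in collections; Countdown is a hypothetical class:

```python
from collections.abc import Mapping, Sequence, Sized


class Countdown(Sequence):
    """Hypothetical read-only sequence n, n-1, ..., 1."""

    def __init__(self, n):
        self._n = n

    # Sequence only requires __len__ and __getitem__ ...
    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index


# Built-in types are already registered with the matching base classes.
print(isinstance([], Sequence))
print(isinstance({}, Mapping))
print(isinstance(set(), Sized))

# ... and supplies iteration, membership tests, index() and count() for free.
c = Countdown(3)
print(list(c))
print(2 in c)
print(c.index(1))
print(c.count(2))
```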
See also:
abc
The standard library documentation for this module.
PEP 3119
Introducing Abstract Base Classes
collections
The collections module includes abstract base classes for several collection types.
collections
The standard library documentation for collections.
PEP 3141
A Type Hierarchy for Numbers
Wikipedia: Strategy Pattern
Description and examples of the strategy pattern.
Plugins and monkeypatching
PyCon 2009 presentation by Dr. André Roberge

__new__() in python

Posted by Jeffye | 9:06 PM

By: Akshar Raaj, from http://agiliq.com/blog/2012/06/__new__-python/
Lately I started looking into Django code and wish to write about internals of Django. I started with Django models and will be writing about it soon. For understanding how Django models work, I had to understand what metaclasses are and how metaclasses work. Metaclasses use method "__new__" and so I looked at what "__new__" does.

As __new__ is a static method, we will see a little bit about static methods and then __new__ in detail.

  1. Understanding static methods.
  2. Understanding method "__new__" of any class. We will see how to override method __new__ in a class.
Also, I will be trying all the code we write here in IPython, and I suggest you try everything in IPython as well.

Static methods

A little bit about instance methods first. Let's write a class.
In [1]: class A(object):
   ...:     def met(self, a, b):
   ...:         print a, b
   ...:
In this case, met() is an instance method. So, it is expected that we pass an instance of A as the first argument to met.
Let's create an object and call met() on the created object and pass two arguments to met().
In [4]: obj = A()
In [5]: obj.met(1,2)
1 2                #output

What happened here?

When we called met(), we passed two arguments although met() expects three arguments as per its definition. When we wrote obj.met(1, 2), the interpreter took care of sending the instance obj as the first argument to met(), and 1 and 2 were passed as the second and third arguments respectively.
Let's try calling met() without an instance, or in other words, let's call the method using the class.
In [6]: A.met(1,2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/akshar/branding_git/netconference/<ipython-input-6-e8b323dba928> in <module>()
----> 1 A.met(1,2)
TypeError: unbound method met() must be called with A instance as first argument (got int instance instead)
We defined met() as an instance method of class A, so it expected an instance of A as the first argument. As is clear from the error, met() expected an instance of A as the first argument but got an 'int' instead.
If we pass an instance of A as the first argument, it will work as expected.
In [7]: A.met(obj, 3, 4)
3 4                  #output
Notice that we called the method on class A and not on an instance of A. But we took care of sending an instance of A as the first argument to met() and it worked as expected.
Let's look at static methods now.
In [8]: class B(object):
   ...:     @staticmethod
   ...:     def met(a, b):
   ...:         print a,b
   ...:

What does @staticmethod above the method definition do?

It's a decorator which changes a method into a static method. This means the method is no longer an instance method and does not expect its first argument to be an instance. So, for our method definition, the method does not expect its first argument to be an instance of B. Even if we call the method on an instance of B, the current instance will not be passed as the first argument, since it's a static method. For the instance method that we saw earlier, the current instance was passed as the first argument.
In [9]: B.met(5,6)
5 6                     #output
Here we were able to call the method on the class and were not required to pass an instance of B as the first argument.
Let's call this method on an instance.
In [10]: b = B()
In [11]: b.met(5,6)
5 6
Here we called the method on an instance and passed two arguments. Since it's a static method, the current instance, i.e. b, was not passed as the first argument to met(). Had it not been a static method, the current instance would have been passed as the first argument, 5 as the second argument and 6 as the third argument.

Understanding method __new__ of a class. We will also see how to override method __new__ in a class.

First let's see a little bit about __init__
In [13]: class A(object):
   ....:     def __init__(self, a, b):
   ....:         print "init gets called"
   ....:         print "self is", self
   ....:         self.a, self.b = a,b
   ....:
In __init__, we print something as we enter the method which is to validate that __init__ has been called. Then we print the first argument which is self and then we perform some assignments. __init__ is an instance method and expects the first argument to be an instance.
Let's call the class, passing it two arguments. Keep the phrase "call the class" in mind; we will use it again shortly, and if it is unclear now, the next few paragraphs should clear it up.
In [16]: a = A(1,2)
init gets called                                    #output
self is <__main__.A object at 0x3357210>            #output
Notice the second line of output, which is "self is <__main__.A object at 0x3357210>". As is apparent from it, by the time __init__ is entered, the object/instance has already been created. Only the assignment is done in __init__, although you could do some other stuff there as well. But __init__ doesn't create the instance. __init__ receives the created instance as the first argument.

What creates the object?

Method __new__() creates the object.

What is __new__?

  1. __new__ is a static method which creates an instance. We will see the method signature soon. One reason I can think of for making __new__ a static method is that the instance has not been created yet when __new__ is called, so it could not have been an instance method.
  2. __new__ gets called when you call the class. "Calling the class" means issuing a statement such as "a=A(1,2)". Here A(1,2) is like calling the class: A is a class, and we put a pair of parentheses after it with some arguments between them. So it's like "calling the class", similar to calling a method.
  3. __new__ must return the created object.
  4. __init__ gets called only when __new__ returns an instance of the class (or of one of its subclasses). If __new__ returns anything else, __init__ is not called. Remember that __new__ is always called before __init__.
  5. __new__ gets passed all the arguments that we pass while calling the class. Also, it gets passed one extra argument that we will see soon.
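The five points above can be summarised in a rough pure-Python sketch of what happens when a class is called. The helper name call_class and the class Point are made up for illustration; the real logic lives in type.__call__, and this is a simplification of it, written for Python 3.

```python
def call_class(cls, *args, **kwargs):
    """Hypothetical helper: roughly what cls(*args, **kwargs) does."""
    # __new__ is looked up on the class and receives the class explicitly.
    obj = cls.__new__(cls, *args, **kwargs)
    # __init__ runs only if __new__ returned an instance of cls (or a subclass).
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y


p = call_class(Point, 3, 4)
print(p.x, p.y)   # 3 4
```

Under these assumptions, call_class(Point, 3, 4) and Point(3, 4) produce equivalent objects.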

How was the instance created in the last example when we didn't define __new__?

Class A extends object (here we mean the built-in class named object), i.e. it subclasses object. object defines a method __new__, so A inherits it. This inherited __new__ created the instance of A.
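One way to check this, sketched here for Python 3, is to look at where __new__ actually lives. The class below defines only __init__, so its own namespace has no __new__ and attribute lookup falls through to object.

```python
class A(object):
    def __init__(self, a, b):
        self.a, self.b = a, b

print("__new__" in vars(A))          # False: A does not define __new__ itself
print("__new__" in vars(object))     # True: object does
print(A.__new__ is object.__new__)   # True: A simply inherits it
```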

Method signature of __new__

__new__ receives the class whose instance needs to be created as its first argument. This statement may be a little confusing; keep reading, look at the next example, and then read it again, and it will become clear. The other arguments received by __new__ are the same as those passed while calling the class.
So, __new__ receives all the arguments that we pass while calling the class, plus one extra argument. This extra argument is the class whose instance needs to be created, and __new__ receives it as its first argument.
So, the signature of __new__ can be written as:
__new__(cls, *args, **kwargs)
Let's see an example.
In [22]: class A(object):
   ....:     def __new__(cls, *args, **kwargs):
   ....:         print cls
   ....:         print "args is", args
   ....:         print "kwargs is", kwargs
   ....:
Here we override __new__ that we inherit from the superclass. We are printing all the arguments that this method receives so that we can check what gets passed to __new__. Let's try to create an instance of A by calling the class.
In [23]: a=A()
<class '__main__.A'>                              #output
args is ()                                        #output
kwargs is {}                                      #output
As we mentioned earlier, __new__ gets called when we call the class. As is apparent from the output __new__ was called and it printed three lines of output.
The first line of output prints the first argument received by __new__. As we can see, it is class A itself. We tried to create an instance of A, and __new__ of A received class A itself as the first argument. This is what we meant when we said "__new__ receives the class whose instance needs to be created as the first argument". Now go back to the section "Method signature of __new__" and read it again.
While calling the class we did not pass any arguments. So our output shows that args and kwargs did not receive anything.
You can verify that all the arguments passed while calling the class get sent to __new__. Just call the class with some arguments.
In [25]: a=A(1,2,named=5)
<class '__main__.A'>                              #output
args is (1, 2)                                    #output
kwargs is {'named': 5}                            #output
So, whatever arguments we passed while calling the class were passed to __new__ and were received by args and kwargs in __new__.
Let's check whether an object really gets created with __new__ overridden as it currently is.
In [26]: a = A(1,2)
<class '__main__.A'>                              #output
args is (1, 2)                                    #output
kwargs is {}                                      #output
In [27]: print a
None                                              #output
We tried to create an instance and then print it. But no instance of A was created, as is apparent from the last print statement, which printed None.

Why did this happen?

As we know, if a method doesn't return any value, it implicitly returns None. Under the section "What is __new__?", we mentioned that __new__ must return the created instance. Here we did not return the created instance from __new__, so None was implicitly returned and assigned to a.
Let's combine __new__ and __init__.
In [29]: class A(object):
   ....:     def __new__(cls, *args, **kwargs):
   ....:         print cls
   ....:         print args
   ....:         print kwargs
   ....:     def __init__(self, a, b):
   ....:         print "init gets called"
   ....:         print "self is", self
   ....:         self.a, self.b = a, b
   ....:
Let's try to create an instance of A.
In [31]: a=A(1,2)
<class '__main__.A'>                              #output
(1, 2)                                            #output
{}                                                #output
As we mentioned earlier, when a class gets called, __new__ is called first. Only when __new__ returns an instance is __init__ called.
In this example __new__ did not return an instance, so __init__ was not called. Had __init__ been called, we would have seen the output of the print statements inside __init__.
Also, since __new__ did not return an instance, a is None. Verify that.
In [32]: print a
None                                              #output
Let's redefine the class to make it proper. We should return an instance from __new__, so that __init__ gets called and we get the desired behaviour.
If we don't override __new__, the parent class' __new__ creates the instance and then __init__ gets called. If we do override __new__, we should call the parent class' __new__ to get the created instance. If you know how object creation works at the low level and can implement it in your overridden __new__, you don't need to call the parent's __new__. I don't know such details of how object creation works, so I will use the parent's __new__ to get the created instance.
Once we get the created instance we can perform any extra operations we wish before returning the instance from __new__ method.
For demonstration purposes, let us take a contrived example in which we need to add an attribute named 'created_at' to the created instance. Let's assume it has to be done inside __new__, although we could have done it inside __init__.
In [33]: import datetime
In [35]: class A(object):
   ....:     def __new__(cls, *args, **kwargs):
   ....:         new_instance = object.__new__(cls)
   ....:         setattr(new_instance, 'created_at', datetime.datetime.now())
   ....:         return new_instance
   ....:     def __init__(self, a, b):
   ....:         print "inside init"
   ....:         self.a, self.b = a, b
   ....:
In the first line of __new__, we called the __new__ of the parent class to get the created instance. We pass it cls, the class that our overridden __new__ received. object.__new__ itself takes no further arguments (Python 3 raises a TypeError if extra ones are forwarded from an overridden __new__), and the remaining arguments reach __init__ anyway. The parent's __new__, i.e. __new__ of class object, knows how to create an instance and returns it.
In the next line, we used the built-in function setattr() to set an attribute 'created_at' on the newly created instance. The value we set for this attribute is the current time. This line is equivalent to writing new_instance.created_at=datetime.datetime.now().
In the final line we returned the newly created instance. Since we are returning an instance from __new__, __init__ will be called passing it whatever arguments were used in the class call. Let's verify this.
In [36]: obj1 = A(1,2)
inside init                                    #output
This statement suggests that __init__ was called. Let's print the created instance.
In [37]: obj1
Out[37]: <__main__.A at 0x3357390>
Notice that earlier, when __new__ did not return anything, printing the result of the class call gave None. This time the output shows that obj1 refers to an instance of A.
We can verify that obj1 has an attribute 'created_at' and __init__ was properly executed by printing the three attributes of obj1.
In [37]: print obj1.created_at
2012-06-09 22:44:30.376914                    #output
In [38]: print obj1.a, obj1.b
1 2                                           #output
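A caution for readers on Python 3 (the sessions in this post are Python 2): there, object.__new__ raises a TypeError if an overridden __new__ forwards extra arguments to it. The sketch below uses the illustrative class names Forwarding and Stamped to show the failure and a version that passes only the class; the remaining arguments still reach __init__.

```python
import datetime


class Forwarding(object):
    def __new__(cls, *args, **kwargs):
        # Forwards all arguments to object.__new__.
        return object.__new__(cls, *args, **kwargs)

    def __init__(self, a, b):
        self.a, self.b = a, b


try:
    Forwarding(1, 2)
    forwarding_failed = False
except TypeError as exc:
    forwarding_failed = True
    print("TypeError: %s" % exc)


class Stamped(object):
    def __new__(cls, *args, **kwargs):
        # Pass only the class; __init__ still receives a and b.
        new_instance = super(Stamped, cls).__new__(cls)
        new_instance.created_at = datetime.datetime.now()
        return new_instance

    def __init__(self, a, b):
        self.a, self.b = a, b


obj1 = Stamped(1, 2)
print(obj1.a, obj1.b)   # 1 2
```

super(Stamped, cls).__new__(cls) resolves to object.__new__ here; using super() rather than naming object is a design choice that keeps the code working if another base class is added later.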
Let's see our final example.
In [60]: class B(object):
   ....:     pass
   ....:
In [61]: class A(object):
   ....:     def __new__(cls, *args, **kwargs):
   ....:         new_instance = object.__new__(B, *args, **kwargs)
   ....:         return new_instance
   ....:
Pay attention to the first line of A's __new__. Instead of passing cls as the first argument to object.__new__, we pass class B. Let's see what happens in this case.
In [62]: a = A()
In [63]: print a
<__main__.B object at 0x7f912c036750>                    #Output. Tried creating an instance of A but got an instance of B
We tried to create an instance of A. But when we printed it, we realise that an instance of B has been created.
This happened because we passed class B as the first argument to object.__new__. This shows that whatever class we pass to superclass' __new__, an instance of that class will be created.
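A side effect worth noting: because the returned object is not an instance of A, A's __init__ is skipped entirely. The following sketch (Python 3 syntax; the attribute name initialised is made up for the demonstration) shows this.

```python
class B(object):
    pass


class A(object):
    def __new__(cls, *args, **kwargs):
        return object.__new__(B)      # deliberately the wrong class

    def __init__(self):
        print("A.__init__ called")    # never printed
        self.initialised = True


a = A()
print(type(a).__name__)               # B
print(hasattr(a, "initialised"))      # False
```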
Remember that __new__ receives, as its first argument, the class whose instance needs to be created. So the first argument (cls in our case) always refers to the class that was called. Here we called A, so cls is class A. (If a subclass of A were called and inherited this __new__, cls would be that subclass.)
Here we wanted to create an instance of A, so class A must be passed as the first argument to object.__new__. Inside __new__ of class A, cls refers to class A. So, we need to pass cls as the first argument to object.__new__.
That's why, for proper behaviour, we need to pass the superclass' __new__ the same cls that the overridden __new__ received.
We can make that single line change in A's __new__ and our code will behave as expected.
In [64]: class A(object):
   ....:     def __new__(cls, *args, **kwargs):
   ....:         new_instance = object.__new__(cls)
   ....:         return new_instance
   ....:
In [65]: a=A()
In [66]: print a
<__main__.A object at 0x7f912c0368d0>                    #Output. We got an instance of A
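Passing cls rather than a hard-coded class also matters for inheritance: when a subclass inherits __new__, cls is the subclass being called. A small sketch in Python 3 syntax, with a hypothetical subclass C:

```python
class A(object):
    def __new__(cls, *args, **kwargs):
        print("creating an instance of %s" % cls.__name__)
        return object.__new__(cls)


class C(A):   # hypothetical subclass that inherits A's __new__
    pass


a = A()               # creating an instance of A
c = C()               # creating an instance of C
print(type(c) is C)   # True
```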
That was all about the __new__ method. Hopefully the next post will be about metaclasses, where we can see some more useful applications of __new__.

Hope you liked the post.
