
This is an interesting figure that shows the entire process of bash initialization.


Python -- MISC

Posted by Jeffye | 7:18 AM

A list of online materials that you can refer to.


Learn Python:

Tutorials


Start IPython with the sh profile:

ipython -p sh

Map a function that takes multiple arguments


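def func1(a, b, c):
    return a + b + c

# list() is needed in Python 3, where map returns a lazy iterator
print(list(map(lambda x: func1(*x), [[1, 2, 3], [4, 5, 6], [7, 8, 9]])))  # [6, 15, 24]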
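The standard library also has a ready-made helper for this pattern: `itertools.starmap` unpacks each inner list into the function's arguments. A minimal sketch, with `func1` redefined so the snippet runs on its own:

```python
from itertools import starmap

def func1(a, b, c):
    return a + b + c

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# starmap calls func1(*row) for each row, like the lambda version
print(list(starmap(func1, rows)))                # [6, 15, 24]

# map can also take one iterable per parameter and pair them up
print(list(map(func1, [1, 4], [2, 5], [3, 6])))  # [6, 15]
```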


Useful packages

  • psutil: provides a portable interface for retrieving information on running processes and system utilization (CPU, disk, memory, network) from Python
  • imp (deprecated, and removed in Python 3.12 in favor of importlib)
  • sys
  • os
  • re
  • Pexpect: a pure Python module that makes Python a better tool for controlling and automating other programs. Pexpect is similar to Don Libes' `Expect` system, but has a different interface that is easier to understand. It is basically a pattern-matching system: it runs programs and watches their output, and when the output matches a given pattern, Pexpect can respond as if a human were typing. Pexpect can be used for automation, testing, and screen scraping; for automating interactive console applications such as ssh, ftp, passwd, and telnet; and for controlling web applications via `lynx`, `w3m`, or another text-based web browser. Unlike other Expect-like modules for Python, Pexpect does not require TCL, Expect, or compiled C extensions. It should work on any platform that supports the standard Python pty module.
  • Pyro4: Pyro means PYthon Remote Objects. It is a library that enables you to build applications in which objects can talk to each other over the network with minimal programming effort.
  • Tkinter: Tkinter is Python's de facto standard GUI (Graphical User Interface) package. It is a thin object-oriented layer on top of Tcl/Tk. Tkinter is not the only GUI programming toolkit for Python, but it is the most commonly used one. Cameron Laird calls the yearly decision to keep Tkinter "one of the minor traditions of the Python world."

Simplifying Python script arguments

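Save the following as showargs.py:
import sys
print(sys.argv)
and run:
       python showargs.py a b c d e
It prints ['showargs.py', 'a', 'b', 'c', 'd', 'e']: sys.argv[0] is the script name, followed by the arguments as strings.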
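For scripts that need options as well as positional values, the standard `argparse` module parses `sys.argv` for you and generates `--help` text. A minimal sketch; the `items` and `--verbose` names are illustrative, not from the original script:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Echo command-line arguments.")
    # Collect any number of positional arguments into a list
    parser.add_argument("items", nargs="*", help="values to echo")
    # An optional on/off switch
    parser.add_argument("-v", "--verbose", action="store_true", help="print more detail")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()  # argv=None means "read sys.argv[1:]"
    print(args.items, args.verbose)
```

Running `python showargs.py a b c -v` with this version would print `['a', 'b', 'c'] True`.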


Run a program from within Python

  • execfile (Python 2 only)
  • subprocess.call
If you want to pass arguments, use one of the following (the first form requires abc.py to be executable with a #! line):
subprocess.call(['./abc.py', arg1, arg2])
subprocess.call([sys.executable, 'abc.py', 'argument1', 'argument2'])
  • subprocess.Popen
There are two approaches: running the file inside the current interpreter, or running it as a separate program.
The former can be done by importing the file you're interested in. execfile is similar to importing, but it simply evaluates the file rather than creating a module out of it, similar to "sourcing" a file in a shell script. execfile was removed in Python 3.
The latter can be done using the subprocess module: you spawn another instance of the interpreter and pass it whatever parameters you want. This is similar to shelling out in a shell script using backticks.
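Here is a sketch of both approaches in Python 3 terms. A small throwaway script stands in for the hypothetical `abc.py` above. `runpy.run_path` executes a file in the current interpreter, which is the closest replacement for `execfile`. `subprocess.run` starts a separate interpreter and passes the arguments as a list:

```python
import runpy
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "abc.py"  # hypothetical script name from the text above
    script.write_text(
        'import sys\nGREETING = "hello"\nif __name__ == "__main__":\n    print(sys.argv[1:])\n'
    )

    # Same interpreter, like execfile: returns the script's global namespace
    namespace = runpy.run_path(str(script))

    # Separate interpreter, like shelling out: arguments go in a list
    completed = subprocess.run(
        [sys.executable, str(script), "argument1", "argument2"],
        capture_output=True, text=True, check=True,
    )

print(namespace["GREETING"])     # hello
print(completed.stdout.strip())  # ['argument1', 'argument2']
```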

Install a Python module without root privileges

mkdir -p ${HOME}/opt/lib/python2.4/site-packages/
echo "PYTHONPATH=\$PYTHONPATH:\${HOME}/opt/lib/python2.4/site-packages/" >> ~/.bashrc
echo "export PYTHONPATH" >> ~/.bashrc
echo "export PATH=\$PATH:\${HOME}/opt/bin" >> ~/.bashrc
source ~/.bashrc
easy_install --prefix=${HOME}/opt MySQL-python

How do you append directories to your Python path?

     Your path (i.e. the list of directories Python searches for modules and files) is stored in the path attribute of the sys module. Since path is a list, you can use its append method to add new directories.
    For instance, to add the directory /home/me/mypy to the path, just do:
    import sys
    sys.path.append("/home/me/mypy")
    sys.path.insert(0, "path")  # insert at the front so Python searches it first
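This sketch checks that an inserted directory really becomes importable. It writes a throwaway module into a temporary directory, puts that directory at the front of `sys.path`, and imports the module. The name `mymodule_demo` is hypothetical.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module standing in for code under e.g. /home/me/mypy
    (Path(tmp) / "mymodule_demo.py").write_text("VALUE = 42\n")

    sys.path.insert(0, tmp)  # searched before every other entry
    try:
        import mymodule_demo
        print(mymodule_demo.VALUE)  # 42
    finally:
        sys.path.remove(tmp)  # undo the change when done
```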

    How did you install the wxPython bindings? By rpm?

    Once you know where the modules are located, you can put something like the following in $HOME/.bash_profile (or the equivalent startup script for your shell):

    export PYTHONPATH=$PYTHONPATH:$HOME/lib/python:$HOME/lib/misc
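One way to check that a PYTHONPATH setting takes effect is to start a child interpreter with the modified environment and look at its `sys.path`. In this sketch, `child_sys_path` is a hypothetical helper, and a temporary directory stands in for a directory like $HOME/lib/python:

```python
import os
import subprocess
import sys
import tempfile

def child_sys_path(extra_dir):
    """Return sys.path as seen by a child interpreter with extra_dir on PYTHONPATH."""
    env = dict(os.environ)
    old = env.get("PYTHONPATH")
    # Mirrors PYTHONPATH=$PYTHONPATH:extra_dir, without a leading separator when unset
    env["PYTHONPATH"] = old + os.pathsep + extra_dir if old else extra_dir
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return completed.stdout.splitlines()

with tempfile.TemporaryDirectory() as d:
    print(d in child_sys_path(d))  # True
```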

    What is __init__.py used for?

      Files named __init__.py are used to mark directories on disk as Python package directories. If you have the files
      mydir/spam/__init__.py
      mydir/spam/module.py
      and mydir is on your path, you can import the code in module.py as:
      import spam.module
      or
      from spam import module
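      If you remove the __init__.py file, Python 2 will no longer look for submodules inside that directory, so attempts to import the module will fail. (Since Python 3.3, a directory without __init__.py can still be imported as a namespace package, but it then has no place for package initialization code.)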
      The __init__.py file is usually empty, but can be used to export selected portions of the package under more convenient names, hold convenience functions, etc. Given the example above, the contents of the __init__ module can be accessed as
        import spam 
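You can rebuild the layout above in a temporary directory to try both import styles. In this sketch, the `hello` function and the re-export in `__init__.py` are illustrative additions. They show how `__init__.py` can expose a submodule's name at package level.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "spam"
    pkg.mkdir()
    (pkg / "module.py").write_text("def hello():\n    return 'hello from spam.module'\n")
    # __init__.py re-exports hello, so callers can simply write spam.hello()
    (pkg / "__init__.py").write_text("from .module import hello\n")

    sys.path.insert(0, tmp)
    try:
        import spam
        from spam import module
        print(spam.hello())    # hello from spam.module
        print(module.hello())  # hello from spam.module
    finally:
        sys.path.remove(tmp)
```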

      Python Regular Expressions

      Formatting

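      %s     Represents a value as a string (using str())
      %i     Signed decimal integer
      %d     Signed decimal integer
      %u     Unsigned integer (obsolete in Python; identical to %d)
      %o     Octal integer
      %x/%X  Hexadecimal integer (lowercase/uppercase)
      %e/%E  Floating point, exponential notation
      %f/%F  Floating point, decimal notation
      %c     Single character (from an integer or a one-character string)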
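The snippet below applies each conversion to a sample value with the `%` operator. The width and zero-padding entries at the end go slightly beyond the table and are included only as illustration:

```python
# Each printf-style conversion applied to a sample value
examples = [
    ("%s", "spam"),       # 'spam'
    ("%i", 42),           # '42'
    ("%d", 42),           # '42'
    ("%u", 42),           # '42'
    ("%o", 8),            # '10'
    ("%x", 255),          # 'ff'
    ("%X", 255),          # 'FF'
    ("%e", 12345.678),    # '1.234568e+04'
    ("%E", 12345.678),    # '1.234568E+04'
    ("%f", 3.5),          # '3.500000'
    ("%c", 65),           # 'A'
    ("%c", "A"),          # 'A'
    ("%8.2f", 3.14159),   # '    3.14' (8 wide, 2 decimals)
    ("%05d", 42),         # '00042' (zero-padded to 5 digits)
]
results = [fmt % value for fmt, value in examples]
for (fmt, value), text in zip(examples, results):
    print("%-6s %r -> %r" % (fmt, value, text))
```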

      Fancier Output Formatting (from the official Python tutorial)

      String % Dictionary
      Monica = {
          "Occupation": "Chef",
          "Name": "Monica",
          "Dating": "Chandler",
          "Income": 40000,
      }
      With %(Income)d, the value stored under the "Income" key is formatted as a decimal integer:
      "%(Name)s %(Income)d" % Monica
      'Monica 40000'
      More: http://www.informit.com/articles/article.aspx?p=28790&seqNum=2
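The same named lookups also work with `str.format_map`, which takes the dictionary directly and uses `{key}` placeholders. Here is a short sketch using the `Monica` dictionary from above:

```python
Monica = {
    "Occupation": "Chef",
    "Name": "Monica",
    "Dating": "Chandler",
    "Income": 40000,
}

# printf-style: each %(key) conversion looks up a dictionary key
old_style = "%(Name)s %(Income)d" % Monica
# format_map does the same lookups; ",d" adds a thousands separator
new_style = "{Name} earns {Income:,d} as a {Occupation}".format_map(Monica)

print(old_style)  # Monica 40000
print(new_style)  # Monica earns 40,000 as a Chef
```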

      Tips on Python collections:

      http://alexmarandon.com/articles/python_collections_tips/


      Operating on Sequence Types

      We can iterate over the items in a sequence s in a variety of useful ways: 

      Table: Various ways to iterate over sequences
      Python expression                      Comment
      for item in s                          iterate over the items of s
      for item in sorted(s)                  iterate over the items of s in order
      for item in set(s)                     iterate over unique elements of s (in arbitrary order)
      for item in reversed(s)                iterate over elements of s in reverse
      for item in set(s).difference(t)       iterate over elements of s not in t
      for item in random.sample(s, len(s))   iterate over elements of s in random order
      (random.shuffle(s) shuffles s in place and returns None, so it cannot be iterated over directly.)
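You can try the table's expressions on a small list that contains duplicates. Sets have no defined order, so this sketch sorts them before printing:

```python
import random

s = [3, 1, 2, 3, 1]
t = [1]

print(list(s))                        # [3, 1, 2, 3, 1]
print(sorted(s))                      # [1, 1, 2, 3, 3]
print(sorted(set(s)))                 # [1, 2, 3]
print(list(reversed(s)))              # [1, 3, 2, 1, 3]
print(sorted(set(s).difference(t)))   # [2, 3]

# random.sample returns a shuffled copy; s itself is left untouched
shuffled = random.sample(s, len(s))
print(sorted(shuffled) == sorted(s))  # True
```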

      What is spell checking? 

      In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone, capable of operating on a block of text, or as part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

      1. Using the popen2 module in Python to call command-line spell checkers (popen2 was deprecated in Python 2.6 in favor of subprocess and removed in Python 3)
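With subprocess, you can drive a command-line checker by writing text to its standard input and reading its output. This sketch assumes GNU Aspell is installed, and that its `aspell list` mode prints each misspelled word from stdin on its own line. `misspelled_words` is a hypothetical helper. You can swap the command for any checker that behaves the same way.

```python
import subprocess

def misspelled_words(text, command=("aspell", "list")):
    """Pipe text to a command-line spell checker and return the words it flags.

    Assumes a checker (GNU Aspell's `list` mode by default) that reads text on
    stdin and prints one misspelled word per line.
    """
    result = subprocess.run(
        list(command), input=text, capture_output=True, text=True, check=True
    )
    return result.stdout.split()
```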



      Spell4Py is a wrapper for the Hunspell library (Org.keyphrene has been
      split into several simpler projects).
      There is a tutorial for Hunspell here:

      2. Using a pure Python program.

      This page provides a very simple pure Python spell-checking program.
      You can train it off the shelf with a dictionary or a text corpus; its
      accuracy is around 70%.
      I also deployed an application online; click here for an example. You can also try other words for testing.
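The program on that page is not reproduced here. As an illustration of the general approach, here is a minimal sketch in the style of Peter Norvig's well-known corrector. This is an assumption, and the linked program may work differently. The sketch counts word frequencies in a training text, generates every string one edit away from the input, and picks the most frequent known candidate.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def train(text):
    """Count how often each lowercase word occurs in a training corpus."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def edits1(word):
    """Return every string one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return the most frequent known word within one edit, or the word unchanged."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word

counts = train("the quick brown fox checks the spelling of the sentence")
print(correct("speling", counts))  # spelling
print(correct("teh", counts))      # the
```

A real corpus would be much larger. Accuracy depends heavily on the training text.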

      3. Google API




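      import http.client  # named httplib in Python 2
      import xml.dom.minidom

      # Request body for Google's unofficial toolbar spell-check service
      data = """
      <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
      <text>%s</text>
      </spellrequest>
      """

      def spellCheck(word_to_spell):
          """Return True if no error is reported or the word is among the suggestions."""
          # Note: this undocumented endpoint appears to have been discontinued.
          con = http.client.HTTPSConnection("www.google.com")
          con.request("POST", "/tbproxy/spell?lang=en", (data % word_to_spell).encode("utf-8"))
          response = con.getresponse()

          dom = xml.dom.minidom.parseString(response.read())
          dom_data = dom.getElementsByTagName('spellresult')[0]

          if dom_data.childNodes:
              # Each child holds whitespace-separated suggestions for a flagged word
              for child_node in dom_data.childNodes:
                  result = child_node.firstChild.data.split()
              for word in result:
                  if word_to_spell.upper() == word.upper():
                      return True
              return False
          else:
              return True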



      In my opinion the Python community is split into 3 groups: the Python 2.x group, the 3.x group, and the PyPy group. The schism basically boils down to library compatibility issues and speed. This post is going to focus on some general code optimization tricks as well as on breaking out into C for significant performance improvements. I'll also show the run times of the 3 major Python groups. My goal isn't to prove one better than the other, just to give you an idea of how these particular examples compare with each other under different circumstances.

      Using Generators

      One commonly overlooked memory optimization is the use of generators. Generators allow us to create a function that returns one item at a time rather than all the items at once. If you're using Python 2.x this is the reason for using xrange instead of range or ifilter instead of filter. A great example of this is creating a large list of numbers and adding them together.
      import timeit
      import random

      def generate(num):
          while num:
              yield random.randrange(10)
              num -= 1

      def create_list(num):
          numbers = []
          while num:
              numbers.append(random.randrange(10))
              num -= 1
          return numbers

      print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
      # 0.88098192215 (Python 2.7)
      # 1.416813850402832 (Python 3.2)
      print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
      # 0.924163103104 (Python 2.7)
      # 1.5026731491088867 (Python 3.2)
      Not only is it slightly faster but you also avoid storing the entire list in memory!
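`sys.getsizeof` reports only the size of the container object itself, but that is enough to show the difference. A generator object stays small however many values it will yield, while a list grows with its length. This sketch uses deterministic values instead of random ones so that the two sums can be compared:

```python
import sys

def generate(num):
    # Produces values lazily, one per iteration
    for i in range(num):
        yield i % 10

def create_list(num):
    # Materializes every value before returning
    return [i % 10 for i in range(num)]

gen = generate(100000)
lst = create_list(100000)

print(sys.getsizeof(gen), sys.getsizeof(lst))  # small constant vs. hundreds of kilobytes
print(sum(gen) == sum(lst))                    # True
print(next(generate(3)))                       # 0: values are computed only on request
```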

      Introducing Ctypes

      For performance-critical code, Python natively provides an API for calling C functions: ctypes. You can take advantage of ctypes without writing any C code of your own, because it can load the C standard library that is already installed on your system. We can go back to our generator example to see how much ctypes speeds up our code.
      import timeit
      from ctypes import cdll

      def generate_c(num):
          #Load standard C library
          libc = cdll.LoadLibrary("libc.so.6") #Linux
          #libc = cdll.msvcrt #Windows
          while num:
              yield libc.rand() % 10
              num -= 1

      print(timeit.timeit("sum(generate_c(999))", setup="from __main__ import generate_c", number=1000))
      # 0.434374809265 (Python 2.7)
      # 0.7084300518035889 (Python 3.2)
      Just by switching to the C random function we cut our run time in half! Now what if I told you we could do better?
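The example above loads the library inside the generator and hard-codes a Linux file name. A variant could load the C library once, locate it with `ctypes.util.find_library`, and declare argument and return types. A sketch, where `load_libc` is a hypothetical helper:

```python
import ctypes
import ctypes.util
import sys

def load_libc():
    """Load the platform C library once (hypothetical helper)."""
    if sys.platform == "win32":
        return ctypes.cdll.msvcrt
    # find_library("c") returns e.g. "libc.so.6" on Linux, or None if it cannot
    # locate it; CDLL(None) then uses the symbols already loaded into the
    # Python process, which include the C library on Linux and macOS.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_libc()
# Declaring argument and return types lets ctypes convert values correctly
libc.rand.argtypes = []
libc.rand.restype = ctypes.c_int
libc.srand.argtypes = [ctypes.c_uint]
libc.srand.restype = None

def generate_c(num):
    while num:
        yield libc.rand() % 10
        num -= 1

libc.srand(42)  # fixed seed for a repeatable sequence
values = list(generate_c(999))
print(len(values), min(values) >= 0, max(values) <= 9)  # 999 True True
```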

      Introducing Cython

      Cython is a superset of Python that allows for the calling of C functions as well as declaring types on variables to increase performance. To try this out we'll need to install Cython.
      sudo pip install cython
      Cython is essentially a fork of another similar library called Pyrex, which is no longer under development. It compiles our Python-like code into a C extension module that we can import from Python. Cython source files use the .pyx extension instead of .py. Let's see how Cython works with our generator code.
      #cython_generator.pyx
      import random

      def generate(num):
          while num:
              yield random.randrange(10)
              num -= 1
      We also need to create a setup.py so that we can get Cython to compile our function.
      # distutils was removed in Python 3.12; setuptools provides setup and Extension
      from setuptools import setup, Extension
      from Cython.Build import cythonize

      setup(
          # Build the extension module "generator" from cython_generator.pyx
          ext_modules=cythonize([Extension("generator", ["cython_generator.pyx"])])
      )
      Compile using:
      python setup.py build_ext --inplace
      You should now see a cython_generator.c file and a generator.so file (on Python 3 the .so file name also carries an ABI tag). We can test our program by doing:
      import timeit
      print(timeit.timeit("sum(generator.generate(999))", setup="import generator", number=1000))
      # 0.835658073425
      Not too bad but let's see if we can improve on this. We can start by stating that our "num" variable is an int. Then we can import the C standard library to take care of our random function.
      #cython_generator.pyx
      cdef extern from "stdlib.h":
          int c_libc_rand "rand"()

      def generate(int num):
          while num:
              yield c_libc_rand() % 10
              num -= 1
      If we compile and run again we now see a really awesome number.
      >>> 0.033586025238
      Not bad at all for making just a few changes. However, sometimes these changes can be a bit tedious. So let's see how we can do with just regular ole Python.

      Introducing PyPy

      PyPy is a just-in-time compiler for Python 2.7.3 which in layman's terms means that it makes your code run really fast (usually). Quora runs PyPy in production. PyPy has some installation instructions on their download page but if you're running Ubuntu you can just install it through apt-get. It also runs out of the box so there are no crazy bash or make files to run, just download and run. Let's see how our original generator code performs under PyPy.
      import timeit
      import random

      def generate(num):
          while num:
              yield random.randrange(10)
              num -= 1

      def create_list(num):
          numbers = []
          while num:
              numbers.append(random.randrange(10))
              num -= 1
          return numbers

      print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
      # 0.115154981613 (PyPy 1.9)
      # 0.118431091309 (PyPy 2.0b1)
      print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
      # 0.140175104141 (PyPy 1.9)
      # 0.140514850616 (PyPy 2.0b1)
      Wow! Without touching the code, it now runs in roughly an eighth of the time the pure Python implementation took under Python 2.7.
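When you compare numbers across CPython and PyPy as above, it helps to record which interpreter produced each timing. The sketch below uses `platform.python_implementation()` for that. Taking the minimum of several `timeit.repeat` runs is a common convention, not what the original benchmarks did.

```python
import platform
import random
import timeit

def generate(num):
    while num:
        yield random.randrange(10)
        num -= 1

# Label the result with the interpreter that produced it, e.g. "CPython 3.12.1"
label = "%s %s" % (platform.python_implementation(), platform.python_version())

# Minimum of several repeats: less affected by background load than one run
best = min(timeit.repeat("sum(generate(999))", globals=globals(), number=100, repeat=3))
print("%s: %.4f s per 100 calls" % (label, best))
```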

      Further Examination

      Why bother examining further? PyPy is king! Well, not quite. While most programs will run on PyPy, some libraries still aren't fully supported. It may also be easier to pitch a C extension for your project than a switch to a different Python interpreter. Let's dive further into ctypes to see how we can create our own C libraries to talk to Python. We're going to examine the performance gains for a merge sort as well as for a recursive Fibonacci calculation. Here is the C code (functions.c) that we will be using.
      /* functions.c */
      #include <stdlib.h>
      #include <string.h>

      /* http://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#C */
      /* static inline: a bare C99 "inline" definition provides no external
         definition, so an unoptimized build can leave merge undefined */
      static inline
      void merge(int *left, int l_len, int *right, int r_len, int *out)
      {
          int i, j, k;
          for (i = j = k = 0; i < l_len && j < r_len; )
              out[k++] = left[i] < right[j] ? left[i++] : right[j++];
          while (i < l_len) out[k++] = left[i++];
          while (j < r_len) out[k++] = right[j++];
      }

      /* inner recursion of merge sort */
      void recur(int *buf, int *tmp, int len)
      {
          int l = len / 2;
          if (len <= 1) return;
          /* note that buf and tmp are swapped */
          recur(tmp, buf, l);
          recur(tmp + l, buf + l, len - l);
          merge(tmp, l, tmp + l, len - l, buf);
      }

      /* preparation work before recursion */
      void merge_sort(int *buf, int len)
      {
          /* call alloc, copy and free only once */
          int *tmp = malloc(sizeof(int) * len);
          if (tmp == NULL) return;  /* allocation failed: leave buf unsorted */
          memcpy(tmp, buf, sizeof(int) * len);
          recur(buf, tmp, len);
          free(tmp);
      }

      int fibRec(int n)
      {
          if (n < 2)
              return n;
          else
              return fibRec(n - 1) + fibRec(n - 2);
      }
      On Linux we can compile this to a shared library that Python can access by doing:
      gcc -Wall -fPIC -c functions.c
      gcc -shared -o libfunctions.so functions.o
      Using ctypes we can now access the functions by loading the "libfunctions.so" library like we did for the standard C library earlier. Here we can compare a native Python implementation vs. one done in C. Let's start with the Fibonacci sequence calculation.
      #functions.py
      from ctypes import *
      import time

      libfunctions = cdll.LoadLibrary("./libfunctions.so")

      def fibRec(n):
          if n < 2:
              return n
          else:
              return fibRec(n-1) + fibRec(n-2)

      start = time.time()
      fibRec(32)
      finish = time.time()
      print("Python: " + str(finish - start))

      #C Fibonacci
      start = time.time()
      x = libfunctions.fibRec(32)
      finish = time.time()
      print("C: " + str(finish - start))
      Python: 1.18783187866 #Python 2.7
      Python: 1.272292137145996 #Python 3.2
      Python: 0.563600063324 #PyPy 1.9
      Python: 0.567229032516 #PyPy 2.0b1
      C: 0.043830871582 #Python 2.7 + ctypes
      C: 0.04574108123779297 #Python 3.2 + ctypes
      C: 0.0481240749359 #PyPy 1.9 + ctypes
      C: 0.046403169632 #PyPy 2.0b1 + ctypes
      As expected C is the fastest followed by PyPy and Python. We can also do the same kind of comparison with a merge sort.
      We haven't dug very deep into ctypes yet, so this example will show off some of its nicer features. ctypes has a few standard types such as ints, char arrays, floats, and bytes. One thing it doesn't provide directly is integer arrays. However, multiplying a c_int (the ctypes type for int) by a number n creates an array type of size n. This is what line 7 below does: it creates a c_int array the size of our numbers array and unpacks the numbers array into it.
      It's important to remember that in C you can't return an array, nor would you really want to. Instead, we pass around pointers for functions to modify. To pass our c_numbers array to the merge_sort function, we pass it by reference with byref. After merge_sort runs, our c_numbers array will be sorted in place. I've appended the code below to my functions.py file since the imports are already set up there.
      1. #Python Merge Sort
      2. from random import shuffle, sample
      3.
      4. #Generate 9999 unique random numbers between 0 and 99999
      5. numbers = sample(range(100000), 9999)
      6. shuffle(numbers)
      7. c_numbers = (c_int * len(numbers))(*numbers)
      8.
      9. from heapq import merge
      10. def merge_sort(m):
      11.     if len(m) <= 1:
      12.         return m
      13.     middle = len(m) // 2
      14.     left = m[:middle]
      15.     right = m[middle:]
      16.     left = merge_sort(left)
      17.     right = merge_sort(right)
      18.     return list(merge(left, right))
      19.
      20. start = time.time()
      21. numbers = merge_sort(numbers)
      22. finish = time.time()
      23. print("Python: " + str(finish - start))
      24.
      25. #C Merge Sort
      26. start = time.time()
      27. libfunctions.merge_sort(byref(c_numbers), len(numbers))
      28. finish = time.time()
      29. print("C: " + str(finish - start))
      Python: 0.190635919571 #Python 2.7
      Python: 0.11785483360290527 #Python 3.2
      Python: 0.266992092133 #PyPy 1.9
      Python: 0.265724897385 #PyPy 2.0b1
      C: 0.00201296806335 #Python 2.7 + ctypes
      C: 0.0019741058349609375 #Python 3.2 + ctypes
      C: 0.0029308795929 #PyPy 1.9 + ctypes
      C: 0.00287103652954 #PyPy 2.0b1 + ctypes
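You can try the same array-and-pointer mechanics without compiling functions.c. Hand a c_int array to the C standard library's qsort together with a Python comparison callback. This sketch follows the qsort example from the ctypes documentation. The comparator and `c_sort` are illustrative names.

```python
import ctypes
import ctypes.util
import sys

# Load the C standard library (CDLL(None) falls back to the process's own
# symbols if find_library cannot locate it)
if sys.platform == "win32":
    libc = ctypes.cdll.msvcrt
else:
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Comparator type: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    # a and b point into the C array; a[0] reads the pointed-to int
    return (a[0] > b[0]) - (a[0] < b[0])

cmp_func = CMPFUNC(py_cmp)  # keep a reference so the callback is not garbage collected

libc.qsort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

def c_sort(numbers):
    """Sort a list of ints with the C library's qsort and return a new list."""
    c_numbers = (ctypes.c_int * len(numbers))(*numbers)  # array of len(numbers) c_ints
    # The array is passed as a pointer, and qsort reorders it in place
    libc.qsort(c_numbers, len(c_numbers), ctypes.sizeof(ctypes.c_int), cmp_func)
    return list(c_numbers)

print(c_sort([5, 3, 9, 1, 7]))  # [1, 3, 5, 7, 9]
```

qsort calls back into Python for every comparison, so this demonstrates passing arrays and callbacks rather than speed.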
      Here is a chart and table comparing the various results (run times in seconds).
      [Figure: Bar chart comparing the various program run times]
      Implementation           Merge sort   Fibonacci
      Python 2.7               0.191        1.187
      Python 2.7 + ctypes      0.002        0.044
      Python 3.2               0.118        1.272
      Python 3.2 + ctypes      0.002        0.046
      PyPy 1.9                 0.267        0.564
      PyPy 1.9 + ctypes        0.003        0.048
      PyPy 2.0b1               0.266        0.567
      PyPy 2.0b1 + ctypes      0.003        0.046
      Hopefully you found this post informative and a good stepping stone into optimizing your Python code with C and PyPy. As always if you have any feedback or questions feel free to drop them in the comments below or contact me privately on my contact page. Thanks for reading!
      P.S. If your company is looking to hire an awesome soon-to-be college graduate (May 2013) let me know!

      This post is originally from: http://maxburstein.com/blog/speeding-up-your-python-code/ by Max Burstein


