Python and Web Development Tutor: March 2013

redis (error) ERR operation not permitted

Posted by Jeffye | 3:54 PM

What causes the Redis error "(error) ERR operation not permitted"?

It depends on what your client is.

If you are using the command line on the same machine, you have probably forgotten to enter the password. Check whether your redis.conf file has a 'requirepass' field that is set (and not commented out with a hash sign). If so, you need to authenticate first like this:

AUTH Pwd (replace Pwd with the actual password)

$ redis-cli
redis 127.0.0.1:6379> AUTH foobared

Once authenticated, you can issue new commands
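To check the 'requirepass' setting mentioned above without opening the file by hand, here is a minimal sketch in Python 3. The function name find_requirepass and the sample configuration text are assumptions made for illustration; the sketch only looks for an uncommented 'requirepass' line and returns the first one it finds.

```python
def find_requirepass(conf_text):
    """Return the first active requirepass value in redis.conf text, or None."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines commented out with a hash sign.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if parts[0].lower() == "requirepass" and len(parts) == 2:
            return parts[1].strip('"')
    return None


# Hypothetical configuration text, for illustration only.
sample_conf = """\
# requirepass commented-out
port 6379
requirepass foobared
"""
password = find_requirepass(sample_conf)
print(password)
```

If this returns a value, the server expects AUTH with that password before other commands.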

3 HTML text extractors in Python

Posted by Jeffye | 9:38 PM

python, tools

python html to text

I'd like to introduce a few HTML text extractors written in Python.
I have tested all of them. Each one has its own advantages and disadvantages.

1. python-readability

This is a python port of a ruby port of arc90's readability project

http://lab.arc90.com/experiments/readability/

In a few words: given an HTML document, it pulls out the main body text and cleans it up.
It can also clean up the title, based on the latest readability.js code.

Based on:
 - Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
 - Ruby port by starrhorne and iterationlabs
 - Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
 - Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )
 - "BR to P" fix from readability.js which improves quality for smaller texts.
 - Contributions from GitHub users.

Download: https://github.com/buriy/python-readability

2. python-boilerpipe

A python wrapper for Boilerpipe, an excellent Java library for boilerplate removal and fulltext extraction from HTML pages.

Configuration

Dependencies: jpype, chardet

The boilerpipe jar files will get fetched and included automatically when building the package.

Usage

Be sure to have set JAVA_HOME properly since jpype depends on this setting.

The constructor takes a keyword argument extractor, which must be one of the available boilerpipe extractor types:

DefaultExtractor
ArticleExtractor
ArticleSentencesExtractor
KeepEverythingExtractor
KeepEverythingWithMinKWordsExtractor
LargestContentExtractor
NumWordsRulesExtractor
CanolaExtractor

If no extractor is passed, the DefaultExtractor is used. The content is supplied through one further keyword argument: either html (the HTML text) or url.

from boilerpipe.extract import Extractor
extractor = Extractor(extractor='ArticleExtractor', url=your_url)

Then, to extract relevant content:

extracted_text = extractor.getText()

extracted_html = extractor.getHTML()

Download: https://github.com/misja/python-boilerpipe

3. python-goose

Goose was originally an article extractor written in Java that was most recently (August 2011) converted to a Scala project by Gravity.com.

This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.

Goose will try to extract the following information:

Main text of an article
Main image of article
Any Youtube/Vimeo movies embedded in article (TODO)
Meta Description
Meta tags

Originally, Goose was open sourced by Gravity.com in 2011

Lead Programmer: Jim Plush (Gravity.com)
Contributors: Robbie Coleman (Gravity.com)

The Python version was rewritten by:

Xavier Grangier (Recrutae.com)

For more information, please go to its github homepage: https://github.com/xgdlm/python-goose

Auto correct algorithm in python

Posted by Jeffye | 8:10 AM

python, tools

Today, I would like to introduce an open source spell checking package in python.

This is the second one introduced on this website. The first one: how-to-use-spell-checking-in-python

You can find the second one on GitHub: https://github.com/garytse89/Python-Exercises/tree/master/autoCorrect

More details about this tool, from its README, follow:

Auto correct algorithm:

Author: Gary Tse
Start Date: March 14, 2013

Input: user enters a word
Output: prints out the autocorrected version of the word; if it cannot be corrected, prints out "No suggestion"
Sources: wordlist.txt, testcases.txt

main.py = the code using normal O(n) search, n being the number of words in the dictionary file
main2.py = verbose version of main.py
autoCorrect.py = updated version, using hash table search

All three use regular expressions for the majority of searches, and hash table lookup for the initial search. re.findall is faster than the plain O(n) search, as shown in Search speed comparisons.png, but I'd say the bottleneck is when the input word is very complex and requires multiple iterations to fix.

matchWord.py = old slow search
matchWordBadImplementation.py = a failed attempt to record the index at which words starting with each letter of the alphabet begin in the dictionary list, which would make the search O(n/26) but still slow
newMatchWord.py = current working hash table lookup for search
removeRepeats.py = test file for that function

Pseudocode:
Setup
 1. read wordlist.txt into memory
Program flow
 1. read words from stdin
 2a. IF an autocorrection is found, print that word out
 2b. ELSE print out "No suggestion"

What doesn't work:
 - jjoobbb
 - Proper nouns will print out as all lower case
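The program flow above can be sketched in a few lines of Python 3. This is not Gary Tse's code: it is a minimal illustration that assumes the word list fits in a set (the hash table lookup) and that the only errors to fix are repeated letters. The names candidates and autocorrect and the tiny word list are hypothetical.

```python
import itertools
import re


def candidates(word):
    """Yield variants of word with each run of repeated letters cut to 2 or 1."""
    options = []
    for match in re.finditer(r"(.)\1*", word):
        letter, run_length = match.group(1), len(match.group(0))
        options.append([letter] if run_length == 1 else [letter * 2, letter])
    for combo in itertools.product(*options):
        yield "".join(combo)


def autocorrect(word, dictionary):
    """Return a dictionary word for the input, or "No suggestion"."""
    word = word.lower()
    if word in dictionary:          # initial hash table lookup
        return word
    for candidate in candidates(word):
        if candidate in dictionary:
            return candidate
    return "No suggestion"


# Hypothetical word list standing in for wordlist.txt.
dictionary = {"sheep", "job", "people"}
print(autocorrect("sheeeeep", dictionary))
print(autocorrect("xyz", dictionary))
```

Trying both one and two copies of each repeated letter means the number of candidates doubles with every repeated run, so this brute-force approach is only reasonable for short words.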

Resources consulted:

Peter Norvig's spell checker: http://norvig.com/spell-correct.html
Regex: https://developers.google.com/edu/python/regular-expressions
http://stackoverflow.com/questions/10017808/best-data-structure-for-dictionary-implementation

considered using tries: http://www.billdimmick.com/devjournal/using-a-trie-in-python.html
considered implementing BK tree: http://code.activestate.com/recipes/572156-bk-tree/

Python -- MISC

Posted by Jeffye | 7:18 AM

python, tutorials

A list of online materials that you can refer to.

Python.org

Jython.org: Python with a Java interpreter engine. This allows Python to run on any machine that has a Java interpreter.

Sourceforge: Jython

ModPython.org - Apache module for Python

Web Template Frameworks:

EZT - Use Python EZT module to process HTML templates (.ezt files).
EZT syntax tutorial

Genshi - generate HTML or XML

ClearSilver - written in C to support Python, Ruby, Perl and Java as a module.

IDE:

PyDev - Eclipse plug-in

Netbeans for Python

Learn Python:

Python.org: Documentation - Tutorial

Text processing in Python

Tkinter: A Tk GUI for Python

Thinking in Tkinter

GUI programming with Tkinter

Tutorials

understanding python

Python Introduction

start ipython

ipython -p sh

Map function with multiple variable

def func1(a, b, c):
    return a+b+c
map(lambda x: func1(*x), [[1,2,3],[4,5,6],[7,8,9]])
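The same idea can be written without the lambda by using itertools.starmap from the standard library, which unpacks each inner list into the function's arguments. A small sketch in Python 3 (where map and starmap return iterators, so the result is wrapped in list):

```python
from itertools import starmap


def func1(a, b, c):
    return a + b + c


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Each row is unpacked into (a, b, c), just like func1(*x) in the lambda version.
totals = list(starmap(func1, rows))
print(totals)  # [6, 15, 24]
```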

useful packages

psutil : providing an interface for retrieving information on all running processes and system utilization (CPU, disk, memory, network) in a portable way by using Python

imp
sys
os
re
Pexpect

Pexpect is a pure Python module that makes Python a better tool for controlling and automating other programs. Pexpect is similar to the Don Libes `Expect` system, but Pexpect has a different interface that is easier to understand. Pexpect is basically a pattern matching system. It runs programs and watches output. When output matches a given pattern Pexpect can respond as if a human were typing responses. Pexpect can be used for automation, testing, and screen scraping. Pexpect can be used for automating interactive console applications such as ssh, ftp, passwd, telnet, etc. It can also be used to control web applications via `lynx`, `w3m`, or some other text-based web browser. Pexpect is pure Python. Unlike other Expect-like modules for Python, Pexpect does not require TCL or Expect, nor does it require C extensions to be compiled. It should work on any platform that supports the standard Python pty module.

Pyro4 -- Pyro means PYthon Remote Objects. It is a library that enables you to build applications in which objects can talk to each other over the network, with minimal programming effort.
TkInter -- Tkinter is Python's de-facto standard GUI (Graphical User Interface) package. It is a thin object-oriented layer on top of Tcl/Tk. Tkinter is not the only GUI programming toolkit for Python. It is however the most commonly used one. Cameron Laird calls the yearly decision to keep TkInter "one of the minor traditions of the Python world."

http://wiki.woodpecker.org.cn/moin/%E9%A6%96%E9%A1%B5

Simplifying Python script arguments

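If showargs.py contains:

import sys

print sys.argv

and you enter:

       python showargs.py a b c d e

then the script prints the argument list, starting with the script name: ['showargs.py', 'a', 'b', 'c', 'd', 'e']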
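For anything beyond reading sys.argv directly, the standard library's argparse module takes care of parsing and help text. The following is a minimal sketch in Python 3; the argument names items and --verbose are chosen only for illustration. The list is passed explicitly here so the example runs anywhere; calling parse_args() with no argument reads sys.argv[1:] instead.

```python
import argparse


def parse_args(argv=None):
    """Parse positional items and an optional --verbose flag."""
    parser = argparse.ArgumentParser(description="Show the parsed arguments.")
    parser.add_argument("items", nargs="*", help="any number of positional arguments")
    parser.add_argument("-v", "--verbose", action="store_true", help="print extra detail")
    return parser.parse_args(argv)


# Equivalent to running: python showargs.py a b c d e
args = parse_args(["a", "b", "c", "d", "e"])
print(args.items, args.verbose)
```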

run a program from within Python

execfile

subprocess.call

If you want to pass arguments, use the following:

subprocess.call(['./abc.py', arg1, arg2])

subprocess.call([sys.executable, 'abc.py', 'argument1', 'argument2'])

subprocess.Popen

Running the code inside the current interpreter can be done by importing the file you're interested in. execfile is similar to importing, but it simply evaluates the file rather than creating a module out of it, similar to "sourcing" in a shell script.

Running it as a separate program can be done using the subprocess module: you spawn another instance of the interpreter and pass whatever parameters you want to it. This is similar to shelling out in a shell script using backticks.
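As a small illustration of spawning a second interpreter and reading back what it prints, here is a sketch in Python 3 (3.7 or later, because of capture_output). To keep it self-contained, the child program is passed inline with -c instead of living in a separate abc.py file; that inline code is an assumption made for the example.

```python
import subprocess
import sys

# Inline child program: print the arguments it received, separated by spaces.
child_code = "import sys; print(' '.join(sys.argv[1:]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, "argument1", "argument2"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit status
)
output = result.stdout.strip()
print(output)
```

Using sys.executable ensures the child runs under the same interpreter as the parent, which avoids surprises when several Python versions are installed.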

install python module without root privilege

mkdir -p ${HOME}/opt/lib/python2.4/site-packages/
echo "PYTHONPATH=\$PYTHONPATH:\${HOME}/opt/lib/python2.4/site-packages/" >> ~/.bashrc
echo "export PYTHONPATH" >> ~/.bashrc
echo "export PATH=\$PATH:\${HOME}/opt/bin" >> ~/.bashrc
source ~/.bashrc
easy_install --prefix=${HOME}/opt MySQL-python

How do you append directories to your Python path?

Your path (i.e. the list of directories Python goes through to search for modules and files) is stored in the path attribute of the sys module. Since path is a list, you can use the append method to add new directories to the path.

For instance, to add the directory /home/me/mypy to the path, just do:

import sys

sys.path.append("/home/me/mypy")

sys.path.insert(0 , "path") #such that python will search it first.

How did you install the wxPython bindings? By rpm?

Once you know where the modules are located, you can put something similar to the following example in $HOME/.bash_profile (or the equivalent startup script for your particular shell):

Code:

export PYTHONPATH=$PYTHONPATH:$HOME/lib/python:$HOME/lib/misc

What is __init__.py used for?

Files named __init__.py are used to mark directories on disk as Python package directories. If you have the files

mydir/spam/__init__.py
mydir/spam/module.py

and mydir is on your path, you can import the code in module.py as:

import spam.module

or

from spam import module

If you remove the __init__.py file, Python will no longer look for submodules inside that directory, so attempts to import the module will fail.

The __init__.py file is usually empty, but can be used to export selected portions of the package under more convenient names, hold convenience functions, etc. Given the example above, the contents of the __init__ module can be accessed as

import spam
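To see an __init__.py that exports a name at package level, the sketch below builds a throwaway package in a temporary directory and imports it. It is written for Python 3; the package name spam_demo and the function greet are hypothetical, chosen to avoid clashing with any real module.

```python
import importlib
import os
import sys
import tempfile

# Build mydir/spam_demo/__init__.py and mydir/spam_demo/module.py in a temp directory.
base = tempfile.mkdtemp()
package_dir = os.path.join(base, "spam_demo")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "module.py"), "w") as f:
    f.write("def greet():\n    return 'hello from module'\n")
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    # Re-export greet so callers can write spam_demo.greet().
    f.write("from .module import greet\n")

sys.path.insert(0, base)          # make the parent directory importable
importlib.invalidate_caches()

import spam_demo
from spam_demo import module

print(spam_demo.greet())
```

The re-export in __init__.py is what lets spam_demo.greet() work without the caller knowing which submodule defines it.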

Python Regular Expressions

Formatting

%s Represents a value as a string
%i Integer
%d Decimal integer
%u Unsigned integer
%o Octal integer
%x/%X Hexadecimal integer
%e/%E Float exponent
%f/%F Float
%c ASCII character
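A quick way to get a feel for these codes is to apply each one to a sample value. The sketch below (Python 3 syntax; the sample values are arbitrary) collects the formatted strings in a list:

```python
# (format code, sample value) pairs; the values are arbitrary examples.
rows = [
    ("%s", 3.5),        # string conversion
    ("%d", 42),         # decimal integer
    ("%o", 8),          # octal
    ("%x", 255),        # lowercase hexadecimal
    ("%X", 255),        # uppercase hexadecimal
    ("%e", 12345.678),  # exponent notation, 6 digits after the point
    ("%f", 2.5),        # fixed-point, 6 digits after the point
    ("%c", 65),         # character with that code
]

formatted = [fmt % value for fmt, value in rows]
for (fmt, value), text in zip(rows, formatted):
    print(fmt, repr(value), "->", text)
```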

See "Fancier Output Formatting" on the official site for more.

String % Dictionary

Monica = {
    "Occupation": "Chef",
    "Name": "Monica",
    "Dating": "Chandler",
    "Income": 40000
}

With %(Name)s and %(Income)d, the values are looked up in the dictionary by key:

"%(Name)s %(Income)d" % Monica
'Monica 40000'

More: http://www.informit.com/articles/article.aspx?p=28790&seqNum=2

Tips on python Collections:

http://alexmarandon.com/articles/python_collections_tips/

Mixing Java and Python

Posted by Jeffye | 9:52 PM

python

1. Invoking python script from Java

import java.io.*;

public class Foo
{
    public static void main(String[] args)
    {
        try
        {
            Runtime r = Runtime.getRuntime();
            Process p = r.exec("python foo.py");
            BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
            // Read the script's output until the stream closes, then wait for it to exit.
            String line;
            while ((line = br.readLine()) != null)
                System.out.println(line);
            p.waitFor();
        }
        catch (IOException e)
        {
            // exec throws IOException when the python executable cannot be started.
            System.out.println("Could not run python: " + e.getMessage());
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }
}

The only problem with this method is that it requires the user to have a python interpreter installed on their system (and in the PATH).

A solution to this problem is to use Jython, which ships as a jar file: you can invoke it just by including it in your class path. Here is a sample file:

import org.python.util.PythonInterpreter;

public class Foo
{
    public static void main (String[] args)
    {
        try
        {
            PythonInterpreter.initialize(System.getProperties(), System.getProperties(), new String[0]);
            PythonInterpreter interp = new PythonInterpreter();
            interp.execfile("foo.py");
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Note: Just make sure jython.jar is in your classpath.

Inter-process communication of Java and Python

2. Using Socket
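On the Python side, the socket approach could look roughly like the sketch below (Python 3). Everything here is an assumption made for illustration: a line-based text protocol, a handler that simply upper-cases each line, and the names LineHandler, start_server and ask. A Java client would connect to the same port with java.net.Socket and write and read lines; here a Python client stands in for it so the example runs by itself.

```python
import socket
import socketserver
import threading


class LineHandler(socketserver.StreamRequestHandler):
    """Reply to every received line with its upper-case version."""

    def handle(self):
        for raw in self.rfile:  # ends when the client closes the connection
            text = raw.decode("utf-8").rstrip("\r\n")
            self.wfile.write((text.upper() + "\n").encode("utf-8"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(port, text, host="127.0.0.1"):
    """Stand-in for the Java client: send one line, read one line back."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


server = start_server()
reply = ask(server.server_address[1], "hello from java")
server.shutdown()
server.server_close()
print(reply)
```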

3. Using jsonrpc

JSON-RPC + IDL = Barrister RPC
This is simple: you just define your interface in a human-readable IDL.
This is the one I ended up with.

In my case, the Python code runs in the HTTP server (Flask), and Java is the client.
The best thing is that you do not even have to write any code to call the Python code.
Barrister RPC generates all the client code automatically from the contract defined in the IDL. You also do not have to do the tedious data format conversion yourself; the generated code handles that too. Really cool, isn't it?
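To give a feel for what travels over the wire in a JSON-RPC setup, here is a minimal Python 3 sketch of a server-side dispatcher. It is not Barrister's API or its generated code: it assumes JSON-RPC 2.0 style messages, and the add method and the METHODS table are hypothetical.

```python
import json


def add(a, b):
    return a + b


# Hypothetical table of methods the server exposes.
METHODS = {"add": add}


def handle(request_text):
    """Turn one JSON-RPC request string into a JSON-RPC response string."""
    request = json.loads(request_text)
    reply = {"jsonrpc": "2.0", "id": request.get("id")}
    method = METHODS.get(request.get("method"))
    if method is None:
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        reply["error"] = {"code": -32601, "message": "Method not found"}
    else:
        reply["result"] = method(*request.get("params", []))
    return json.dumps(reply)


request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "add", "params": [2, 3]})
response = json.loads(handle(request))
print(response)
```

In a Flask server, a function like handle would sit behind a single POST route, and the Java client would send the request string as the body.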

other solutions can be found in the following presentation:

http://elib.dlr.de/59394/1/Mixing_Python_and_Java.pdf

Operating on Sequence Types

Posted by Jeffye | 7:59 PM

python, tutorials

We can iterate over the items in a sequence s in a variety of useful ways:

Table : Various ways to iterate over sequences

Python Expression	Comment
`for item in s`	iterate over the items of `s`
`for item in sorted(s)`	iterate over the items of `s` in order
`for item in set(s)`	iterate over unique elements of `s`
`for item in reversed(s)`	iterate over elements of `s` in reverse
`for item in set(s).difference(t)`	iterate over elements of `s` not in `t`
`for item in random.sample(s, len(s))`	iterate over elements of `s` in random order
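The iteration patterns in the table can be tried on a small list, as in the Python 3 sketch below (the sample lists are arbitrary). For the random order, note that random.shuffle shuffles a list in place and returns None, so it cannot be iterated over directly; random.sample(s, len(s)) returns a new shuffled list instead.

```python
import random

s = ["b", "a", "c", "a"]
t = ["c"]

in_order = [item for item in sorted(s)]
unique = sorted(item for item in set(s))              # sets are unordered, so sort for display
in_reverse = [item for item in reversed(s)]
not_in_t = sorted(item for item in set(s).difference(t))
shuffled = [item for item in random.sample(s, len(s))]

print(in_order)     # ['a', 'a', 'b', 'c']
print(unique)       # ['a', 'b', 'c']
print(in_reverse)   # ['a', 'c', 'a', 'b']
print(not_in_t)     # ['a', 'b']
print(shuffled)     # same items as s, in a random order
```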

How to use Spell checking in Python

Posted by Jeffye | 6:31 PM

python, tutorials

What is spell checking?

In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone, capable of operating on a block of text, or as part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

1. Using the popen2 module in Python to call a command-line spell checker

http://code.activestate.com/recipes/117221/

http://blog.quibb.org/2009/04/spell-checking-in-python/

Spell4Py is a wrapper for the Hunspell library. Org.keyphrene has been split into several simple projects.

There is a tutorial for Hunspell here:

http://blog.keyphrene.com/keyphrene/index.php/Tutorialorgkeyphrene

2. using a pure python program.

http://norvig.com/spell-correct.html

This page provides a very simple pure-Python spell checking program. You can train it with an off-the-shelf dictionary or text corpus, with an accuracy of around 70%.

I have also deployed an application online as an example; you can try other words for testing.
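The core of that approach fits in a few lines. The following Python 3 sketch is a condensed version in the spirit of the program described on that page, not a verbatim copy: it only considers words one edit away, and the tiny inline training text is an assumption made so the example runs without a corpus file.

```python
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the word itself if known, else the most frequent known neighbour."""
    if word in counts:
        return word
    known = [w for w in edits1(word) if w in counts]
    return max(known, key=counts.get) if known else word


# Tiny stand-in for a real training corpus.
counts = Counter(words("the quick brown fox jumps over the lazy dog the end"))
print(correct("teh", counts))
print(correct("dag", counts))
```

Training on a larger text simply means passing a bigger string to words; the word frequencies then decide between competing candidates.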

3. Google API

http://developer.51cto.com/art/201103/252396.htm

http://stackoverflow.com/questions/8428767/how-to-implement-python-spell-checker-using-googles-did-you-mean

import httplib
import xml.dom.minidom

# XML request body; %s is replaced by the word to check.
data = """
<spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
<text> %s </text>
</spellrequest>
"""

def spellCheck(word_to_spell):

    con = httplib.HTTPSConnection("www.google.com")
    con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
    response = con.getresponse()

    dom = xml.dom.minidom.parseString(response.read())
    dom_data = dom.getElementsByTagName('spellresult')[0]

    # No child nodes means the service had no corrections to suggest.
    if dom_data.childNodes:
        for child_node in dom_data.childNodes:
            result = child_node.firstChild.data.split()
        for word in result:
            if word_to_spell.upper() == word.upper():
                return True
        return False
    else:
        return True
