What causes the Redis error "(error) ERR operation not permitted"?

It depends on what your client is.

If you are using the command line on the same machine, you have probably forgotten to enter the password. Check whether your redis.conf file has a 'requirepass' directive that is set (and not commented out with a hash sign). If so, you need to authenticate first, like this:

AUTH Pwd (replace Pwd with the actual password)

$ redis-cli
redis> AUTH foobared

Once authenticated, you can issue further commands.
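
Checking redis.conf for an active requirepass line can also be scripted. The following is a minimal sketch: the function name find_requirepass and the sample configuration text are made up for illustration, and only the plain "requirepass <password>" form is handled.

```python
def find_requirepass(conf_text):
    """Return the password of the first active requirepass line, or None."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and lines commented out with '#'.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0].lower() == "requirepass" and len(parts) == 2:
            return parts[1].strip().strip('"')
    return None


# Hypothetical redis.conf contents, for illustration only.
sample_conf = """
# requirepass commented_out
port 6379
requirepass foobared
"""

print(find_requirepass(sample_conf))  # foobared
```

If the function returns a password, that is the value to pass to AUTH before any other command.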

python html to text

I'd like to introduce a few HTML text extractors written in Python.
I have tested all of them. Each one has its own advantages and disadvantages.

1.  python-readability

This is a Python port of a Ruby port of arc90's readability project.

In short: given an HTML document, it pulls out the main body text and cleans it up.
It can also clean up the title, based on the latest readability.js code.

Based on:
 - Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
 - Ruby port by starrhorne and iterationlabs
 - Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
 - Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )
 - "BR to P" fix from readability.js which improves quality for smaller texts.
 - Contributions from GitHub users.

Download: https://github.com/buriy/python-readability
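
To see what the extractors in this list add, it helps to compare them with a naive baseline. The sketch below uses only the standard library and is not part of any package mentioned here: it keeps every piece of visible text and skips script and style contents, with no attempt to find the main article body. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = ("<html><head><style>p {color: red}</style></head>"
          "<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>")
print(html_to_text(sample))
```

A baseline like this also returns navigation menus, footers, and other boilerplate, which is exactly what readability-style extractors try to remove.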

2.  python-boilerpipe

A python wrapper for Boilerpipe, an excellent Java library for boilerplate removal and fulltext extraction from HTML pages.


Dependencies: jpype, chardet
The boilerpipe jar files will get fetched and included automatically when building the package.


Be sure to set JAVA_HOME properly, since jpype depends on this setting.
The constructor takes a keyword argument extractor, which must be one of the available boilerpipe extractor types:
  • DefaultExtractor
  • ArticleExtractor
  • ArticleSentencesExtractor
  • KeepEverythingExtractor
  • KeepEverythingWithMinKWordsExtractor
  • LargestContentExtractor
  • NumWordsRulesExtractor
  • CanolaExtractor
If no extractor is passed, the DefaultExtractor is used. The input is given by one additional keyword argument, either html (for HTML text) or url.

from boilerpipe.extract import Extractor
extractor = Extractor(extractor='ArticleExtractor', url=your_url)

Then, to extract the relevant content:

extracted_text = extractor.getText()
extracted_html = extractor.getHTML()

Download: https://github.com/misja/python-boilerpipe

3. python-goose

Goose was originally an article extractor written in Java that was most recently (August 2011) converted to a Scala project by Gravity.com.
This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.
Goose will try to extract the following information:
  • Main text of an article
  • Main image of article
  • Any Youtube/Vimeo movies embedded in article (TODO)
  • Meta Description
  • Meta tags
Goose was originally open sourced by Gravity.com in 2011:
  • Lead programmer: Jim Plush (Gravity.com)
  • Contributors: Robbie Coleman (Gravity.com)
The Python version was rewritten by:
  • Xavier Grangier (Recrutae.com)
For more information, see its GitHub homepage: https://github.com/xgdlm/python-goose

Today, I would like to introduce an open source spell checking package in Python.

Actually, this is the second one introduced on this website. The first one: how-to-use-spell-checking-in-python

You can find the second one on GitHub: https://github.com/garytse89/Python-Exercises/tree/master/autoCorrect

Some more details about this tool follow:

Auto correct algorithm:
Author: Gary Tse
Start Date: March 14, 2013

Input: user enters a word
Output: prints out the autocorrected version of the word; if not corrected, it prints out "No suggestion"
Sources: wordlist.txt, testcases.txt

main.py = the code using normal O(n) search, n being the number of words in the dictionary file
main2.py = verbose version of main.py
autoCorrect.py = updated, using hash table search
  • all three use regular expressions for the majority of searches, and hash table lookup for the initial search
  • re.findall is faster than O(n), shown in Search speed comparisons.png, but I'd say the bottleneck is when the input word is very complex and requires multiple iterations to fix

matchWord.py = old slow search
matchWordBadImplementation.py = a failed idea of taking the index numbers of all words starting with each letter of the alphabet in the dictionary list, which would make it O(n/26) but slow nonetheless
newMatchWord.py = current working hash table lookup for search
removeRepeats.py = test file for that function

Pseudocode:
Setup--
  1. reads wordlist.txt into memory
Program flow--
  1. reads words from stdin
  2a. IF an autocorrection is found, print that word out
  2b. ELSE print out "No suggestion"

What doesn't work:
  • jjoobbb
  • Proper nouns will print out as all lower case
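
The program flow described above (load the word list into a hash table, look the input up, otherwise try to repair it) can be sketched as follows. This is not Gary Tse's code: the function names are invented, the word list is a tiny stand-in for wordlist.txt, and the only repair attempted is cutting runs of repeated letters down to one or two.

```python
from itertools import groupby, product


def load_words(lines):
    """Build a set (hash table) of lower-cased words for fast lookup."""
    return {line.strip().lower() for line in lines if line.strip()}


def repeat_variants(word):
    """Yield spellings with each run of repeated letters cut to 1 or 2."""
    runs = [(letter, len(list(group))) for letter, group in groupby(word)]
    options = [
        [letter] if count == 1 else [letter, letter * 2]
        for letter, count in runs
    ]
    for combo in product(*options):
        yield "".join(combo)


def autocorrect(word, words):
    """Return a dictionary word for the input, or 'No suggestion'."""
    lowered = word.lower()
    if lowered in words:
        return lowered
    for candidate in repeat_variants(lowered):
        if candidate in words:
            return candidate
    return "No suggestion"


# Hypothetical stand-in for wordlist.txt.
words = load_words(["job", "hello", "sheep"])
for entry in ["Hello", "helllo", "xyz"]:
    print(entry, "->", autocorrect(entry, words))
```

As in the original tool, everything is lower-cased before lookup, so proper nouns come back in lower case.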

Resources consulted:

Python -- MISC

Posted by Jeffery Yee | 7:18 AM

A list of online materials that you can refer to.

Learn Python:


start ipython

ipython -p sh

Map function with multiple variables

def func1(a, b, c):
    return a + b + c

# Each inner list is unpacked into (a, b, c); list() forces evaluation on Python 3.
list(map(lambda x: func1(*x), [[1,2,3],[4,5,6],[7,8,9]]))
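
The standard library also covers this pattern without a lambda. A short sketch using itertools.starmap, with the same func1 and input rows as above:

```python
from itertools import starmap


def func1(a, b, c):
    return a + b + c


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# starmap unpacks each row into positional arguments, like func1(*row).
totals = list(starmap(func1, rows))
print(totals)  # [6, 15, 24]

# When the arguments already live in separate sequences, plain map()
# accepts several iterables and pairs them up element by element.
paired = list(map(func1, [1, 4, 7], [2, 5, 8], [3, 6, 9]))
print(paired)  # [6, 15, 24]
```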

useful packages

  • psutil: a portable interface for retrieving information on all running processes and system utilization (CPU, disk, memory, network)
  • imp
  • sys
  • os
  • re
  • Pexpect
Pexpect is a pure Python module that makes Python a better tool for controlling and automating other programs. Pexpect is similar to the Don Libes `Expect` system, but Pexpect has a different interface that is easier to understand. Pexpect is basically a pattern matching system. It runs programs and watches output. When output matches a given pattern, Pexpect can respond as if a human were typing responses. Pexpect can be used for automation, testing, and screen scraping. It can automate interactive console applications such as ssh, ftp, passwd, telnet, etc. It can also be used to control web applications via `lynx`, `w3m`, or some other text-based web browser. Unlike other Expect-like modules for Python, Pexpect does not require TCL or Expect, nor does it require C extensions to be compiled. It should work on any platform that supports the standard Python pty module.
  • Pyro4 -- Pyro means PYthon Remote Objects. It is a library that enables you to build applications in which objects can talk to each other over the network, with minimal programming effort.
  • Tkinter -- Tkinter is Python's de facto standard GUI (Graphical User Interface) package. It is a thin object-oriented layer on top of Tcl/Tk. Tkinter is not the only GUI programming toolkit for Python. It is, however, the most commonly used one. Cameron Laird calls the yearly decision to keep Tkinter "one of the minor traditions of the Python world."

Simplifying Python script arguments

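Save the following as showargs.py:

import sys
print(sys.argv)

and run it with:
       python showargs.py a b c d e

It prints the script name followed by the arguments: ['showargs.py', 'a', 'b', 'c', 'd', 'e']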
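
Once a script takes more than a couple of arguments, indexing sys.argv by hand gets brittle. Below is a sketch using the standard argparse module; the argument names (items, --upper) are hypothetical and chosen only for the demonstration.

```python
import argparse


def parse_args(argv=None):
    """Parse a list of arguments; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Show the arguments received.")
    parser.add_argument("items", nargs="*", help="positional values")
    parser.add_argument("--upper", action="store_true",
                        help="upper-case the values before printing")
    return parser.parse_args(argv)


# Passing an explicit list mimics: python showargs.py a b c d e --upper
args = parse_args(["a", "b", "c", "d", "e", "--upper"])
values = [item.upper() for item in args.items] if args.upper else args.items
print(values)  # ['A', 'B', 'C', 'D', 'E']
```

argparse also generates a --help message from the declared arguments, which a hand-rolled sys.argv loop would not provide.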

run a program from within Python

  • execfile
  • subprocess.call
If you want to pass arguments, use the following:
subprocess.call(['./abc.py', arg1, arg2])
subprocess.call([sys.executable, 'abc.py', 'argument1', 'argument2'])
  • subprocess.Popen
Running code inside the current interpreter can be done by importing the file you're interested in. execfile (Python 2 only) is similar to importing, but it simply evaluates the file rather than creating a module out of it, much like "sourcing" a file in a shell script.
Running code as a separate process can be done with the subprocess module. You spawn another instance of the interpreter and pass it whatever parameters you want. This is similar to shelling out in a shell script using backticks.
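
When the output of the child process matters, it can be captured instead of being printed straight to the terminal. A minimal sketch with subprocess.run; the one-line child program passed via -c stands in for abc.py so that the example is self-contained.

```python
import subprocess
import sys

# A tiny child program passed via -c, standing in for abc.py.
child_code = "import sys; print(' '.join(sys.argv[1:]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, "argument1", "argument2"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode the output as text
)

print(result.returncode)      # 0 on success
print(result.stdout.strip())  # argument1 argument2
```

Using sys.executable guarantees that the child runs under the same interpreter as the parent, which avoids surprises when several Python versions are installed.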

install python module without root privilege

mkdir -p ${HOME}/opt/lib/python2.4/site-packages/
echo "PYTHONPATH=\$PYTHONPATH:\${HOME}/opt/lib/python2.4/site-packages/" >> ~/.bashrc
echo "export PYTHONPATH" >> ~/.bashrc
echo "export PATH=\$PATH:\${HOME}/opt/bin" >> ~/.bashrc
source ~/.bashrc
easy_install --prefix=${HOME}/opt MySQL-python

How do you append directories to your Python path?

     Your path (i.e. the list of directories Python searches for modules and files) is stored in the path attribute of the sys module. Since path is a list, you can use the append method to add new directories to it, or insert to put a directory at the front.
    For instance, to add the directory /home/me/mypy to the path so that Python searches it first, just do:
    import sys
    sys.path.insert(0, "/home/me/mypy")
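
The effect is easy to verify with a throwaway directory. In this sketch the directory comes from tempfile and the module name mypy_demo_module is made up, so nothing on the real system is touched.

```python
import os
import sys
import tempfile

# Create a throwaway directory containing one module (names are made up).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "mypy_demo_module.py"), "w") as handle:
    handle.write("GREETING = 'hello from mypy_demo_module'\n")

# insert(0, ...) puts the directory first, so it wins over later entries;
# append(...) would put it last, so it is only used as a fallback.
sys.path.insert(0, module_dir)

import mypy_demo_module

print(mypy_demo_module.GREETING)
print(sys.path[0] == module_dir)  # True
```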

    Alternatively, once you know where the modules are located, you can put something like the following in $HOME/.bash_profile (or the equivalent startup script for your shell):

    export PYTHONPATH=$PYTHONPATH:$HOME/lib/python:$HOME/lib/misc

    What is __init__.py used for?

      Files named __init__.py are used to mark directories on disk as Python package directories. If you have the files
      mydir/spam/__init__.py
      mydir/spam/module.py
      and mydir is on your path, you can import the code in module.py as:
      import spam.module
      or
      from spam import module
      If you remove the __init__.py file, Python versions before 3.3 will no longer look for submodules inside that directory, so attempts to import the module will fail. (Python 3.3 and later treat such a directory as a namespace package, so the import still succeeds.)
      The __init__.py file is usually empty, but can be used to export selected portions of the package under more convenient names, hold convenience functions, etc. Given the example above, the contents of the __init__ module can be accessed as
        import spam
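
The package layout can be tried out in a temporary directory. This is a sketch under stated assumptions: the function eggs and the re-export line written into __init__.py are invented to show how a package exposes a submodule's contents under a shorter name.

```python
import os
import sys
import tempfile

# Build mydir/spam/__init__.py and mydir/spam/module.py in a temp directory.
mydir = tempfile.mkdtemp()
package_dir = os.path.join(mydir, "spam")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    # Re-export a name from the submodule under a shorter path.
    handle.write("from spam.module import eggs\n")

with open(os.path.join(package_dir, "module.py"), "w") as handle:
    handle.write("def eggs():\n    return 'eggs from spam.module'\n")

sys.path.insert(0, mydir)

import spam
import spam.module

print(spam.module.eggs())  # eggs from spam.module
print(spam.eggs())         # same function, exposed by __init__.py
```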

      Python Regular Expressions


      %s      Represents a value as a string
      %i      Integer
      %d      Decimal integer
      %u      Unsigned integer
      %o      Octal integer
      %x/%X   Hexadecimal integer
      %e/%E   Float exponent
      %f/%F   Float
      %c      ASCII character

      See "Fancier Output Formatting" in the official Python tutorial for more.

      String % Dictionary
      Monica = {
                       "Occupation": "Chef",
                       "Name" : "Monica",
                       "Dating" : "Chandler",
                       "Income" : 40000
      }
      A key is referenced inside the format string as %(key) followed by a format code; for example, with %(Name)s and %(Income)d:
      "%(Name)s %(Income)d" % Monica
      More: http://www.informit.com/articles/article.aspx?p=28790&seqNum=2
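
The format codes and the dictionary form can be checked quickly in an interpreter. A short sketch, reusing the dictionary from this section (renamed to lower-case monica) plus a few literal values chosen for illustration:

```python
monica = {
    "Occupation": "Chef",
    "Name": "Monica",
    "Dating": "Chandler",
    "Income": 40000,
}

# Named placeholders pull values out of the dictionary by key.
print("%(Name)s %(Income)d" % monica)      # Monica 40000

# Positional placeholders take values from a tuple, in order.
print("%s earns %d" % ("Monica", 40000))   # Monica earns 40000
print("%x %X %o" % (255, 255, 8))          # ff FF 10
print("%e" % 1234.5)                       # 1.234500e+03
print("%.2f" % 3.14159)                    # 3.14
print("%c" % 65)                           # A
```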

      Tips on python Collections: 


      Mixing Java and Python

      Posted by Jeffery Yee | 9:52 PM

      1. Invoking python script from Java 

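      import java.io.*;

      public class Foo {
          public static void main(String[] args) {
              try {
                  // Launch the script with whatever "python" is on the PATH.
                  Runtime r = Runtime.getRuntime();
                  Process p = r.exec("python foo.py");
                  BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
                  // Echo the script's standard output until the stream ends.
                  String line;
                  while ((line = br.readLine()) != null) {
                      System.out.println(line);
                  }
                  p.waitFor();
              } catch (Exception e) {
                  String cause = e.getMessage();
                  if (cause != null && cause.equals("python: not found"))
                      System.out.println("No python interpreter found.");
              }
          }
      }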

      The only problem with this method is that it requires the user to have a python interpreter installed on their system (and in the PATH). 

      A solution to this problem is to ship Jython as a jar file; the interpreter can then be invoked simply by including the jar in the class path. Here is a sample file:

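      import org.python.util.PythonInterpreter;

      public class Foo {
          public static void main(String[] args) {
              try {
                  PythonInterpreter.initialize(System.getProperties(), System.getProperties(), new String[0]);
                  PythonInterpreter interp = new PythonInterpreter();
                  // Run the script inside the JVM (foo.py is assumed, as in the first example).
                  interp.execfile("foo.py");
              } catch (Exception e) {
                  e.printStackTrace();
              }
          }
      }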

      Note: Just make sure jython.jar is in your classpath. 

      Inter-process communication of Java and Python

      2. Using Socket 
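
For the socket route, here is a minimal sketch of the Python side: a server that accepts one connection, reads one line, and replies with that line upper-cased. The line-based protocol and all function names are assumptions made for illustration. A Java client would connect to the same host and port with java.net.Socket; here a small Python client plays that role so the example runs on its own.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read one line, reply with it upper-cased."""
    conn, _ = server_sock.accept()
    with conn:
        line = conn.makefile("r", encoding="utf-8").readline()
        conn.sendall((line.strip().upper() + "\n").encode("utf-8"))


def start_server(host="127.0.0.1", port=0):
    """Listen on host:port (port 0 lets the OS pick a free one)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((host, port))
    server_sock.listen(1)
    worker = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
    worker.start()
    return server_sock, worker


def ask(host, port, text):
    """Send one line to the server and return the one-line reply."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall((text + "\n").encode("utf-8"))
        return client.makefile("r", encoding="utf-8").readline().strip()


server_sock, worker = start_server()
port = server_sock.getsockname()[1]
print(ask("127.0.0.1", port, "hello from the client"))
worker.join(timeout=5)
server_sock.close()
```

With raw sockets, both sides have to agree on message framing and data encoding by hand.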

      3. Using jsonrpc

      JSON-RPC + IDL = Barrister RPC
      This is simple: just define your interface in a human-readable IDL.
      This is the approach I ended up with.

      In my case, the Python code runs in the HTTP server (Flask), and Java is the client.

      The best thing is that you do not even have to write any code to call the Python code.
      Barrister RPC does everything for you, since it generates all the client code automatically according to the contract defined in the IDL. In addition, you do not have to do the tedious data format conversion yourself; the generated code handles that as well. Really cool, isn't it?
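
To give a feel for what travels over the wire, here is a sketch of a JSON-RPC 2.0 request being decoded and answered with only the standard json module. It does not use Barrister's API or its generated code; the add function and the METHODS registry are hypothetical.

```python
import json


def add(a, b):
    return a + b


# Hypothetical registry of functions exposed to remote callers.
METHODS = {"add": add}


def handle_request(raw_request):
    """Decode one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw_request)
    method = METHODS.get(request.get("method"))
    if method is None:
        response = {
            "jsonrpc": "2.0",
            "error": {"code": -32601, "message": "Method not found"},
            "id": request.get("id"),
        }
    else:
        response = {
            "jsonrpc": "2.0",
            "result": method(*request.get("params", [])),
            "id": request.get("id"),
        }
    return json.dumps(response)


request = json.dumps({"jsonrpc": "2.0", "method": "add", "params": [2, 3], "id": 1})
print(handle_request(request))
```

A generated client hides this encoding and decoding behind ordinary method calls.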

      Other solutions can be found in the following presentation:


      Operating on Sequence Types

      Posted by Jeffery Yee | 7:59 PM

      We can iterate over the items in a sequence s in a variety of useful ways:

      Table: Various ways to iterate over sequences
      Python Expression                       Comment
      for item in s                           iterate over the items of s
      for item in sorted(s)                   iterate over the items of s in order
      for item in set(s)                      iterate over unique elements of s
      for item in reversed(s)                 iterate over elements of s in reverse
      for item in set(s).difference(t)        iterate over elements of s not in t
      for item in random.sample(s, len(s))    iterate over elements of s in random order
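
The rows of the table can be tried on a small list. In this sketch the contents of s and t are made up; results that come from sets are sorted before printing because set iteration order is not guaranteed.

```python
import random

s = ["pear", "apple", "fig", "apple"]
t = ["fig"]

in_order = [item for item in sorted(s)]
unique = sorted(set(s))
backwards = [item for item in reversed(s)]
not_in_t = sorted(set(s).difference(t))

print(in_order)   # ['apple', 'apple', 'fig', 'pear']
print(unique)     # ['apple', 'fig', 'pear']
print(backwards)  # ['apple', 'fig', 'apple', 'pear']
print(not_in_t)   # ['apple', 'pear']

# random.shuffle() shuffles in place and returns None, so for a loop in
# random order iterate over a shuffled copy instead.
shuffled = random.sample(s, len(s))
print(sorted(shuffled) == sorted(s))  # True: same items, random order
```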

      What is spell checking? 

      In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone, capable of operating on a block of text, or as part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

      1. Using the popen2 module in Python to call command-line tools

      Spell4Py is a wrapper for the Hunspell library. Org.keyphrene has been
      split into several simpler projects.
      There is a tutorial for Hunspell here:

      2. Using a pure Python program

      This page provides a very simple pure Python spell checking program.
      You can train it with an off-the-shelf dictionary or text corpus,
      reaching an accuracy of around 70%.
      I have also deployed an application online as an example, where you can try other words for testing.
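
One common way to build such a trainable pure Python checker is to generate every string one edit away from the input and pick the candidate seen most often in the training text. The sketch below follows that idea. It is not necessarily the program the page above refers to, and the training text is a tiny made-up corpus.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count how often each word occurs in the training text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the input."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


# Tiny made-up corpus; a real one would be a dictionary or a large text file.
counts = train("the quick brown fox jumps over the lazy dog the end")
print(correct("teh", counts))   # the
print(correct("quik", counts))  # quick
print(correct("zzzz", counts))  # zzzz (nothing within one edit)
```

Accuracy depends heavily on the size and quality of the training text.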

      3. Google API

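      import httplib  # Python 2; the module is http.client in Python 3
      import xml.dom.minidom

      # Request body; the element names are reconstructed from the surviving
      # attributes and the %s substitution below.
      data = """
      <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
      <text> %s </text>
      </spellrequest>
      """

      def spellCheck(word_to_spell):
          con = httplib.HTTPSConnection("www.google.com")
          con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
          response = con.getresponse()

          dom = xml.dom.minidom.parseString(response.read())
          dom_data = dom.getElementsByTagName('spellresult')[0]

          # Child nodes are present only when the service returns suggestions.
          if dom_data.childNodes:
              for child_node in dom_data.childNodes:
                  result = child_node.firstChild.data.split()
                  for word in result:
                      if word_to_spell.upper() == word.upper():
                          return True
              return False
          return True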
