In this post, I'd like to share a few awesome machine-learning toolkits developed in Python.

I tested a lot of similar tools and ended up with the following ones:

--------------------------------------------------------

## Gensim – Topic Modelling for Humans

Gensim is a *free* Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Some other packages are available from Blei's website: online LDA and tmve (Topic Model Visualization Engine).

## PyML – machine learning in Python

PyML is an interactive object-oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods. It is supported on Linux and Mac OS X.

-----------------------------------------------------------------------------------------------

## mlpy

**mlpy** is a Python module for **Machine Learning** built on top of NumPy/SciPy and the GNU Scientific Libraries.

mlpy provides a wide range of state-of-the-art machine learning methods for **supervised** and **unsupervised** problems, and it aims at a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is **multiplatform**, works with **Python 2** and **3**, and is **Open Source**, distributed under the GNU General Public License version 3.

## scikit-learn: machine learning in Python

`scikit-learn` is a Python module integrating classic machine learning algorithms in the tightly-knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: **machine-learning as a versatile tool for science and engineering**.
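As a quick illustration of the "simple and efficient" claim, the sketch below trains a support vector classifier on the iris dataset that ships with scikit-learn. The dataset, the 75/25 split and the RBF kernel are my own choices for the example, not something the project prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the bundled iris data as a feature matrix and a label vector.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an SVM with an RBF kernel on the training split.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

# Mean accuracy on the held-out split.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping in another classifier usually means changing a single line.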

## NLTK Package

The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. A free online book is available. (If you use the library for academic research, please cite the book.)

Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. http://nltk.org/book
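Here is a small sketch of a typical NLTK preprocessing chain: tokenize, stem, and count. I deliberately picked components that need no extra corpus downloads (`wordpunct_tokenize`, `PorterStemmer`, `FreqDist`); the sample sentence is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "Cats run. A cat runs, and the dogs keep running."

# Split on word/punctuation boundaries, keep alphabetic tokens, lowercase them.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Reduce each token to its Porter stem, e.g. "running" and "runs" become "run".
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Count how often each stem occurs.
freq = FreqDist(stems)
print(freq.most_common(2))
```

Tokenizers and taggers that rely on trained models (for example `word_tokenize`) need their data fetched with `nltk.download` first.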

## Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:

- **tight integration with numpy** – Use numpy.ndarray in Theano-compiled functions.
- **transparent use of a GPU** – Perform data-intensive calculations up to 140x faster than with CPU (float32 only).
- **efficient symbolic differentiation** – Theano does your derivatives for functions with one or many inputs.
- **speed and stability optimizations** – Get the right answer for `log(1+x)` even when `x` is really tiny.
- **dynamic C code generation** – Evaluate expressions faster.
- **extensive unit-testing and self-verification** – Detect and diagnose many types of mistake.
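The `log(1+x)` point is easy to see without Theano at all. The sketch below uses only the standard library's `math` module (no Theano API is involved) to show why the naive expression fails for tiny `x` and what a numerically stable form returns instead; the value of `x` is an arbitrary choice.

```python
import math

x = 1e-20

# In double precision 1 + x rounds to exactly 1.0, so the naive form gives 0.0.
naive = math.log(1 + x)

# log1p computes log(1 + x) without ever forming 1 + x, so precision survives.
stable = math.log1p(x)

print(naive, stable)
```

The appeal of a symbolic library is that it can apply this kind of rewrite for you, instead of relying on you to remember `log1p` every time.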

Check out how Theano can be used for machine learning in the Deep Learning Tutorials and their source code.

## SluggerML

SluggerML: baseball stats! It uses datasets from retrosheet.org and seanlahman.com/baseball-archive/statistics.

It depends on scikit-learn and python-nltk (and reportlab if you're building reports). On Debian, building these requires the following packages: `python-dev`, `python-numpy`, `python-scipy`.

## MDP

**The Modular toolkit for Data Processing (MDP)** package is a library of widely used data processing algorithms that can be combined into pipelines for building more complex data processing software.

From the user’s perspective, MDP consists of a collection of *units*, which process data. For example, these include algorithms for supervised and unsupervised learning, principal and independent components analysis and classification.
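To make the "units combined into pipelines" idea concrete, here is a plain-Python sketch of the pattern. It is not MDP's API: the `Node`, `CenterNode`, `ScaleNode` and `Flow` classes below are hypothetical stand-ins I wrote for illustration, and they only mimic the train-then-execute structure described here.

```python
class Node:
    """Hypothetical processing unit: learn in train(), transform in execute()."""

    def train(self, rows):
        pass  # Units with nothing to learn can skip training.

    def execute(self, rows):
        raise NotImplementedError


class CenterNode(Node):
    """Subtracts the per-column mean learned during training."""

    def train(self, rows):
        count = len(rows)
        self.means = [sum(col) / count for col in zip(*rows)]

    def execute(self, rows):
        return [[v - m for v, m in zip(row, self.means)] for row in rows]


class ScaleNode(Node):
    """Divides each column by its largest absolute value seen in training."""

    def train(self, rows):
        self.scales = [max(abs(v) for v in col) or 1.0 for col in zip(*rows)]

    def execute(self, rows):
        return [[v / s for v, s in zip(row, self.scales)] for row in rows]


class Flow:
    """Chains units: each one is trained on the output of the previous one."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, rows):
        for node in self.nodes:
            node.train(rows)
            rows = node.execute(rows)

    def execute(self, rows):
        for node in self.nodes:
            rows = node.execute(rows)
        return rows


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
flow = Flow([CenterNode(), ScaleNode()])
flow.train(data)
result = flow.execute(data)
print(result)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

The point of the design is that a new algorithm only has to implement its own training and execution steps, while the chaining logic lives in one place.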

From the developer’s perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, `Node`, takes care of **tedious** tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases.

## Pattern

Pattern is a web mining module for the Python programming language. It bundles tools for:

- Data Mining: Google + Twitter + Wikipedia API, web spider, HTML DOM parser
- Natural Language Processing: tagger/chunker, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, *k*-means clustering, Naive Bayes + *k*-NN + SVM classifiers
- Network Analysis: graph centrality and visualization.

It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is available under the BSD license.
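If "vector space model" is unfamiliar, the sketch below shows the core idea using only the standard library. It does not use Pattern's API: the `tf_vector` and `cosine` helpers and the two toy documents are my own, and they implement a bare term-frequency model with a nearest-neighbour lookup by cosine similarity.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy "documents", keyed by a label.
docs = {
    "cats": "cats purr and cats sleep",
    "dogs": "dogs bark and dogs fetch",
}

# Find the document closest to the query (a 1-nearest-neighbour lookup).
query = tf_vector("cats sleep a lot")
best = max(docs, key=lambda name: cosine(query, tf_vector(docs[name])))
print(best)  # cats
```

Real toolkits typically add tf-idf weighting, stop-word removal and stemming on top of this.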
