In this post, I'd like to share a few awesome machine-learning toolkits developed in Python.
I tested a lot of similar tools and ended up with the following ones:
--------------------------------------------------------

Gensim – Topic Modelling for Humans

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.
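For a taste of how painless that is, here is a minimal sketch of the usual Gensim workflow: build a dictionary from a toy corpus, convert each document to a bag-of-words vector, and fit an LDA model on top. The documents below are made up purely for illustration.

    # Minimal Gensim sketch: dictionary -> bag-of-words corpus -> LDA topics.
    from gensim import corpora, models

    documents = [["human", "machine", "interface"],
                 ["graph", "trees", "minors"],
                 ["graph", "minors", "survey"]]           # toy, pre-tokenized docs

    dictionary = corpora.Dictionary(documents)            # word <-> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in documents]

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
    print(lda.print_topics())                             # inspect the learned topics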

Some other topic-modelling packages are available on David Blei's website, such as online LDA and TMVE (the Topic Model Visualization Engine).

PyML – machine learning in Python

PyML is an interactive, object-oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods. It is supported on Linux and Mac OS X.
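Here is a rough sketch of what an SVM cross-validation run looks like in PyML. The class and argument names follow PyML's tutorial as far as I remember it and may differ between versions, and 'iris.data' stands in for any local comma-separated file whose last column holds the labels.

    # Rough PyML sketch (names based on its tutorial; may vary by version).
    from PyML import VectorDataSet, SVM

    data = VectorDataSet('iris.data', labelsColumn=-1)    # labels in the last column
    clf = SVM()                                           # kernel SVM classifier
    results = clf.cv(data, 5)                             # 5-fold cross-validation
    print(results)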
-----------------------------------------------------------------------------------------------

mlpy

mlpy is a Python module for Machine Learning built on top of NumPy/SciPy and the GNU Scientific Library (GSL).
mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems, and it aims to strike a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is open source, distributed under the GNU General Public License version 3.
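As a rough sketch of mlpy's learn/pred style (the LDAC class name follows the mlpy 3.x docs as I recall them, so treat it as an assumption and check your installed version):

    # Rough mlpy sketch of the learn/pred interface (class name may vary by version).
    import numpy as np
    import mlpy

    x = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 8.8]])   # features
    y = np.array([1, 1, 2, 2])                                       # class labels

    clf = mlpy.LDAC()                    # linear discriminant analysis classifier
    clf.learn(x, y)
    print(clf.pred(np.array([[1.1, 1.9], [8.1, 9.1]])))   # should recover classes 1 and 2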

scikit-learn: machine learning in Python

scikit-learn is a Python module integrating classic machine learning algorithms into the tightly-knit scientific Python world (NumPy, SciPy, matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering.
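The nice thing is that almost everything follows the same fit/predict pattern. Here is a minimal sketch using a support vector classifier on the bundled iris data; the import path below is the modern sklearn package name.

    # Minimal scikit-learn sketch: the common fit/predict estimator API.
    from sklearn import datasets
    from sklearn.svm import SVC

    iris = datasets.load_iris()
    clf = SVC()                                      # most estimators work the same way
    clf.fit(iris.data[:-10], iris.target[:-10])      # train on most of the data
    print(clf.predict(iris.data[-10:]))              # predict the held-out rows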

NLTK Package

The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. A free online book is available. (If you use the library for academic research, please cite the book.)
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. http://nltk.org/book
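A minimal sketch of what day-to-day NLTK use looks like (tokenization plus part-of-speech tagging; the first run may ask you to fetch the tokenizer and tagger models via nltk.download()):

    # Minimal NLTK sketch: tokenize a sentence and tag it with parts of speech.
    import nltk

    sentence = "The Natural Language Toolkit makes NLP in Python approachable."
    tokens = nltk.word_tokenize(sentence)    # split into word tokens
    tagged = nltk.pos_tag(tokens)            # attach part-of-speech tags
    print(tagged[:5])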

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
  • tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
  • transparent use of a GPU – Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
  • efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
  • dynamic C code generation – Evaluate expressions faster.
  • extensive unit-testing and self-verification – Detect and diagnose many types of mistakes.
Check out how Theano can be used for machine learning in the Deep Learning Tutorials, and browse the source code as well.
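To give a flavour of the symbolic style, here is a minimal sketch: define an expression on a symbolic scalar, let Theano derive the gradient, and compile both into a callable function.

    # Minimal Theano sketch: symbolic expression, automatic gradient, compiled function.
    import theano
    import theano.tensor as T

    x = T.dscalar('x')                   # symbolic double-precision scalar
    y = x ** 2 + T.log(1 + x)            # symbolic expression
    dy = T.grad(y, x)                    # Theano derives dy/dx symbolically

    f = theano.function([x], [y, dy])    # compile (optionally onto a GPU)
    print(f(3.0))                        # value and derivative at x = 3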

sluggerml

SluggerML – baseball stats! It uses datasets from retrosheet.org and seanlahman.com/baseball-archive/statistics.
It depends on scikit-learn and python-nltk (and reportlab if you're building reports). On Debian, building those dependencies requires python-dev, python-numpy and python-scipy.


MDP

The Modular toolkit for Data Processing (MDP) package is a library of widely used data processing algorithms that can be combined into pipelines to build more complex data processing software.

From the user’s perspective, MDP consists of a collection of units, which process data. For example, these include algorithms for supervised and unsupervised learning, principal and independent components analysis and classification.

From the developer’s perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, Node, takes care of tedious tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases.
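Here is a small sketch of that node/flow idea, assuming MDP 3.x's API: train a PCA node on some data, then chain an expansion node and a PCA node into a single Flow. The data is random and purely illustrative.

    # Small MDP sketch: a single trained node, then nodes chained into a Flow.
    import numpy as np
    import mdp

    x = np.random.random((100, 10))            # 100 samples, 10 features

    pca = mdp.nodes.PCANode(output_dim=3)      # keep 3 principal components
    pca.train(x)
    y = pca.execute(x)                         # project the data

    flow = mdp.Flow([mdp.nodes.PolynomialExpansionNode(2),
                     mdp.nodes.PCANode(output_dim=3)])
    flow.train(x)                              # trains the trainable nodes in order
    z = flow(x)                                # run the whole pipeline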

Pattern

Pattern is a web mining module for the Python programming language. It bundles tools for:
  • Data Mining: Google + Twitter + Wikipedia API, web spider, HTML DOM parser
  • Natural Language Processing: tagger/chunker, n-gram search, sentiment analysis, WordNet
  • Machine Learning: vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers
  • Network Analysis: graph centrality and visualization.
It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and freely available.
(Figure: Pattern example workflow)
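As a quick taste of the NLP side, here is a minimal sketch using pattern.en for tagging and sentiment analysis; the example sentences are made up, and note that older Pattern releases run on Python 2 only.

    # Minimal Pattern sketch: part-of-speech tagging and sentiment analysis.
    from pattern.en import parse, sentiment

    print(parse("The quick brown fox jumps over the lazy dog."))
    print(sentiment("Pattern is a remarkably pleasant library to use."))
    # sentiment() returns a (polarity, subjectivity) pair.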
