In this post, I'd like to share a few awesome machine-learning toolkits developed in Python.
I tested a lot of similar tools and ended up with the following ones:
--------------------------------------------------------
Gensim – Topic Modelling for Humans
Gensim is a free Python
framework designed to automatically extract semantic topics from
documents, as efficiently (computer-wise) and painlessly (human-wise) as
possible.
Some other packages are available from Blei's website: online LDA, and tmve, the Topic Model Visualization Engine.
PyML – machine learning in Python
PyML
is an interactive object oriented framework for machine learning
written in Python. PyML focuses on SVMs and other kernel methods. It is
supported on Linux and Mac OS X.
-----------------------------------------------------------------------------------------------
mlpy
mlpy is a Python module for Machine Learning built on top of NumPy/SciPy and the GNU Scientific Libraries.
mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems
and it is aimed at finding a reasonable compromise among modularity,
maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is open source, distributed under the GNU General Public License version 3.
scikit-learn: machine learning in Python
scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit scientific
Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems,
accessible to everybody and reusable in various contexts:
machine-learning as a versatile tool for science and engineering.
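To give a feel for the "simple and efficient" part, here is a minimal sketch of the usual fit / predict workflow. It assumes a reasonably recent scikit-learn release (one that ships `sklearn.model_selection`); the iris dataset, the logistic regression classifier, and the 75/25 split are my own choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 iris flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Estimators share the same pattern: construct, fit on training data, predict.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping in another classifier usually means changing only the line that constructs `clf`, which is a large part of why the library feels reusable across contexts.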
NLTK Package
The
Natural Language Toolkit (NLTK) is an open source Python library for
Natural Language Processing. A free online book is available. (If you
use the library for academic research, please cite the book.)
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. http://nltk.org/book
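As a small taste of the library, the sketch below stems a sentence and counts the resulting stems. The sample sentence is made up, and I split on whitespace instead of using NLTK's tokenizers so that no extra data download is assumed.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.util import bigrams

text = "the cats are chasing the cat that chased the other cats"
tokens = text.split()  # plain whitespace split; avoids needing tokenizer data

# Reduce each word to its stem so that "cats" and "cat" are counted together.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Count how often each stem occurs.
freq = FreqDist(stems)

# Adjacent word pairs, a common building block for n-gram models.
pairs = list(bigrams(tokens))

print(freq.most_common(3))
print(pairs[:3])
```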
Theano
Theano
is a Python library that allows you to define, optimize, and evaluate
mathematical expressions involving multi-dimensional arrays efficiently.
Theano features:
- tight integration with numpy – Use numpy.ndarray in Theano-compiled functions.
- transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU (float32 only).
- efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
- speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
- dynamic C code generation – Evaluate expressions faster.
- extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
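The log(1+x) point in the list above is easy to see without Theano at all. The sketch below uses only Python's standard `math` module, so it illustrates the numerical problem that such a stability optimization addresses, not Theano's own API; the value of `x` is an arbitrary choice of mine.

```python
import math

x = 1e-20  # far smaller than double-precision machine epsilon (about 2.2e-16)

# Naive evaluation: 1 + x rounds to exactly 1.0, so the logarithm collapses to 0.
naive = math.log(1 + x)

# log1p evaluates log(1 + x) without ever forming 1 + x, so the tiny value survives.
stable = math.log1p(x)

print("naive :", naive)
print("stable:", stable)
```

The naive form silently loses the whole answer, which is why a library that rewrites such expressions into the stable form behind the scenes is handy.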
Check out how Theano can be used for Machine Learning: Deep Learning Tutorials and
the source code here
sluggerml
SluggerML: baseball stats! Uses datasets from retrosheet.org and seanlahman.com/baseball-archive/statistics.
Depends on scikit-learn and python-nltk (and reportlab if you're building reports). On Debian,
building the aforementioned requires the following packages: python-dev, python-numpy, python-scipy.
MDP
The Modular toolkit for Data Processing (MDP) package
is a library of widely used data processing algorithms that can be
combined into pipelines for building more
complex data processing software.
From the user’s perspective, MDP consists of a collection of units,
which process data. For example, these include algorithms for
supervised and unsupervised learning, principal and independent
components analysis and classification.
From
the developer’s perspective, MDP is a framework that makes the
implementation of new supervised and unsupervised learning algorithms
easy and straightforward. The basic class, Node, takes care of tedious tasks
like numerical type and dimensionality checking, leaving the developer
free to concentrate on the implementation of the learning and execution
phases.
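The division of labour described above can be sketched in plain Python. To be clear, this is not MDP's code: the `Node`, `CenterNode`, `ScaleNode` and `run_pipeline` names below are hypothetical stand-ins I wrote to show the idea of a base class that does the checking while subclasses implement only the learning and execution phases.

```python
class Node:
    """Hypothetical base class: handles the checks, subclasses do the math."""

    def __init__(self, input_dim):
        self.input_dim = input_dim
        self._trained = False

    def _check(self, rows):
        # Shared bookkeeping: every row must have the expected dimensionality.
        for row in rows:
            if len(row) != self.input_dim:
                raise ValueError(
                    f"expected {self.input_dim} values per row, got {len(row)}"
                )

    def train(self, rows):
        self._check(rows)
        self._train(rows)
        self._trained = True

    def execute(self, rows):
        if not self._trained:
            raise RuntimeError("node must be trained before execution")
        self._check(rows)
        return self._execute(rows)


class CenterNode(Node):
    """Learns the per-column mean and subtracts it."""

    def _train(self, rows):
        count = len(rows)
        self.mean = [sum(col) / count for col in zip(*rows)]

    def _execute(self, rows):
        return [[v - m for v, m in zip(row, self.mean)] for row in rows]


class ScaleNode(Node):
    """Learns the per-column maximum absolute value and divides by it."""

    def _train(self, rows):
        # Fall back to 1.0 for an all-zero column to avoid dividing by zero.
        self.scale = [max(abs(v) for v in col) or 1.0 for col in zip(*rows)]

    def _execute(self, rows):
        return [[v / s for v, s in zip(row, self.scale)] for row in rows]


def run_pipeline(nodes, rows):
    """Train each node on the previous node's output, then pass the data on."""
    for node in nodes:
        node.train(rows)
        rows = node.execute(rows)
    return rows


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
result = run_pipeline([CenterNode(2), ScaleNode(2)], data)
print(result)
```

Each subclass is only a few lines long because the dimensionality check and the "trained before executing" rule live in one place, which is the convenience the developer's-perspective paragraph is describing.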
Pattern
Pattern is a web mining module for the Python programming language. It bundles tools for:
- Data Mining: Google + Twitter + Wikipedia API, web spider, HTML DOM parser
- Natural Language Processing: tagger/chunker, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers
- Network Analysis: graph centrality and visualization.
It
is well documented and bundled with 30+ examples and 350+ unit tests.
The source code is licensed under BSD and available from