|
Metadata-Version: 2.1
|
|
Name: pandas
|
|
Version: 0.23.4
|
|
Summary: Powerful data structures for data analysis, time series, and statistics
|
|
Home-page: http://pandas.pydata.org
|
|
Maintainer: The PyData Development Team
|
|
Maintainer-email: pydata@googlegroups.com
|
|
License: BSD
|
|
Platform: any
|
|
Classifier: Development Status :: 5 - Production/Stable
|
|
Classifier: Environment :: Console
|
|
Classifier: Operating System :: OS Independent
|
|
Classifier: Intended Audience :: Science/Research
|
|
Classifier: Programming Language :: Python
|
|
Classifier: Programming Language :: Python :: 2
|
|
Classifier: Programming Language :: Python :: 3
|
|
Classifier: Programming Language :: Python :: 2.7
|
|
Classifier: Programming Language :: Python :: 3.5
|
|
Classifier: Programming Language :: Python :: 3.6
|
|
Classifier: Programming Language :: Python :: 3.7
|
|
Classifier: Programming Language :: Cython
|
|
Classifier: Topic :: Scientific/Engineering
|
|
Requires-Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
|
|
Requires-Dist: python-dateutil (>=2.5.0)
|
|
Requires-Dist: pytz (>=2011k)
|
|
Requires-Dist: numpy (>=1.9.0)
|
|
|
|
|
|
**pandas** is a Python package providing fast, flexible, and expressive data
|
|
structures designed to make working with structured (tabular, multidimensional,
|
|
potentially heterogeneous) and time series data both easy and intuitive. It
|
|
aims to be the fundamental high-level building block for doing practical,
|
|
**real world** data analysis in Python. Additionally, it has the broader goal
|
|
of becoming **the most powerful and flexible open source data analysis /
|
|
manipulation tool available in any language**. It is already well on its way
|
|
toward this goal.
|
|
|
|
pandas is well suited for many different kinds of data:
|
|
|
|
- Tabular data with heterogeneously-typed columns, as in an SQL table or
|
|
Excel spreadsheet
|
|
- Ordered and unordered (not necessarily fixed-frequency) time series data.
|
|
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
|
|
column labels
|
|
- Any other form of observational / statistical data sets. The data actually
|
|
need not be labeled at all to be placed into a pandas data structure
|
|
|
|
The two primary data structures of pandas, Series (1-dimensional) and DataFrame
|
|
(2-dimensional), handle the vast majority of typical use cases in finance,
|
|
statistics, social science, and many areas of engineering. For R users,
|
|
DataFrame provides everything that R's ``data.frame`` provides and much
|
|
more. pandas is built on top of `NumPy <http://www.numpy.org>`__ and is
|
|
intended to integrate well within a scientific computing environment with many
|
|
other 3rd party libraries.
|
|
|
|
Here are just a few of the things that pandas does well:
|
|
|
|
- Easy handling of **missing data** (represented as NaN) in floating point as
|
|
well as non-floating point data
|
|
- Size mutability: columns can be **inserted and deleted** from DataFrame and
|
|
higher dimensional objects
|
|
- Automatic and explicit **data alignment**: objects can be explicitly
|
|
aligned to a set of labels, or the user can simply ignore the labels and
|
|
let `Series`, `DataFrame`, etc. automatically align the data for you in
|
|
computations
|
|
- Powerful, flexible **group by** functionality to perform
|
|
split-apply-combine operations on data sets, for both aggregating and
|
|
transforming data
|
|
- Make it **easy to convert** ragged, differently-indexed data in other
|
|
Python and NumPy data structures into DataFrame objects
|
|
- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
|
|
of large data sets
|
|
- Intuitive **merging** and **joining** data sets
|
|
- Flexible **reshaping** and pivoting of data sets
|
|
- **Hierarchical** labeling of axes (possible to have multiple labels per
|
|
tick)
|
|
- Robust IO tools for loading data from **flat files** (CSV and delimited),
|
|
Excel files, databases, and saving / loading data from the ultrafast **HDF5
|
|
format**
|
|
- **Time series**-specific functionality: date range generation and frequency
|
|
conversion, moving window statistics, moving window linear regressions,
|
|
date shifting and lagging, etc.
|
|
|
|
Many of these principles are here to address the shortcomings frequently
|
|
experienced using other languages / scientific research environments. For data
|
|
scientists, working with data is typically divided into multiple stages:
|
|
munging and cleaning data, analyzing / modeling it, then organizing the results
|
|
of the analysis into a form suitable for plotting or tabular display. pandas is
|
|
the ideal tool for all of these tasks.
|
|
|
|
|