Metadata-Version: 2.1
Name: pandas
Version: 0.23.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Maintainer: The PyData Development Team
Maintainer-email: pydata@googlegroups.com
License: BSD
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Cython
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
Requires-Dist: python-dateutil (>=2.5.0)
Requires-Dist: pytz (>=2011k)
Requires-Dist: numpy (>=1.9.0)

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with structured (tabular, multidimensional,
potentially heterogeneous) and time series data both easy and intuitive. It
aims to be the fundamental high-level building block for doing practical,
**real world** data analysis in Python. Additionally, it has the broader goal
of becoming **the most powerful and flexible open source data analysis /
manipulation tool available in any language**. It is already well on its way
toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or
  Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
  column labels
- Any other form of observational / statistical data sets. The data actually
  need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame
(2-dimensional), handle the vast majority of typical use cases in finance,
statistics, social science, and many areas of engineering. For R users,
DataFrame provides everything that R's ``data.frame`` provides and much
more. pandas is built on top of `NumPy <http://www.numpy.org>`__ and is
intended to integrate well within a scientific computing environment with many
other third-party libraries.

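As a minimal sketch of the two structures described above, the following
builds a small Series and a DataFrame. The labels, column names, and values
are made up for illustration and do not come from the pandas documentation.

```python
import numpy as np
import pandas as pd

# A Series is a 1-dimensional labeled array.
prices = pd.Series([10.5, 11.0, 9.75], index=["a", "b", "c"], name="price")

# A DataFrame is a 2-dimensional table whose columns may have different types.
frame = pd.DataFrame(
    {
        "name": ["x", "y", "z"],      # text column (illustrative data)
        "count": [1, 2, 3],           # integer column
        "price": [10.5, 11.0, 9.75],  # floating point column
    }
)

# Data need not be labeled: pandas supplies a default integer index.
unlabeled = pd.Series([1.0, 2.0, 3.0])

# Numeric columns are backed by NumPy arrays.
as_array = frame["price"].values
```

Here ``prices["b"]`` looks a value up by label, while ``unlabeled`` falls back
to the positions 0, 1, 2 as its index.
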
Here are just a few of the things that pandas does well:

- Easy handling of **missing data** (represented as NaN) in floating point as
  well as non-floating point data
- Size mutability: columns can be **inserted and deleted** from DataFrame and
  higher-dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly
  aligned to a set of labels, or the user can simply ignore the labels and
  let ``Series``, ``DataFrame``, etc. automatically align the data in
  computations
- Powerful, flexible **group by** functionality to perform
  split-apply-combine operations on data sets, for both aggregating and
  transforming data
- **Easy conversion** of ragged, differently-indexed data in other
  Python and NumPy data structures into DataFrame objects
- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
  of large data sets
- Intuitive **merging** and **joining** of data sets
- Flexible **reshaping** and pivoting of data sets
- **Hierarchical** labeling of axes (possible to have multiple labels per
  tick)
- Robust IO tools for loading data from **flat files** (CSV and delimited),
  Excel files, and databases, and for saving / loading data in the ultrafast
  **HDF5 format**
- **Time series**-specific functionality: date range generation and frequency
  conversion, moving window statistics, moving window linear regressions,
  date shifting and lagging, etc.

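A few of the features listed above can be sketched in one short example:
label alignment, missing data handling, group by, merging, and date shifting.
All of the data below is invented for illustration, and the example covers
only a small part of each feature.

```python
import numpy as np
import pandas as pd

# Automatic alignment: arithmetic matches values by label, not by position.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
total = s1 + s2  # labels present in only one operand yield NaN

# Missing data: NaN marks absent values, which can be filled or dropped.
filled = total.fillna(0.0)
dropped = total.dropna()

# Group by: split rows by key, apply an aggregation, combine the results.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
sums = df.groupby("key")["value"].sum()

# Merging: join two tables on a shared column.
labels = pd.DataFrame({"key": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(labels, on="key")

# Time series: generate a daily date range and lag the values by one period.
dates = pd.date_range("2018-01-01", periods=4, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)
lagged = ts.shift(1)
```

In this sketch ``total`` has NaN at the labels ``a`` and ``d``, because each
of those labels appears in only one of the two inputs.
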
Many of these principles exist to address shortcomings frequently
experienced when using other languages / scientific research environments. For
data scientists, working with data is typically divided into multiple stages:
munging and cleaning data, analyzing / modeling it, then organizing the results
of the analysis into a form suitable for plotting or tabular display. pandas is
the ideal tool for all of these tasks.

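The three stages named above could look roughly like this on a toy data set.
The column names ``city``, ``year``, and ``temp`` and all of the values are
hypothetical, and real cleaning and modeling steps are usually more involved.

```python
import numpy as np
import pandas as pd

# Hypothetical raw observations; one measurement is missing.
raw = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B", "B"],
        "year": [2017, 2018, 2017, 2018, 2018],
        "temp": [20.0, np.nan, 15.0, 16.0, 18.0],
    }
)

# Stage 1, munging and cleaning: drop rows with a missing measurement.
clean = raw.dropna(subset=["temp"])

# Stage 2, analysis: mean temperature per city and year.
means = clean.groupby(["city", "year"])["temp"].mean()

# Stage 3, presentation: pivot the years into columns for tabular display.
table = means.unstack("year")
print(table)
```

The intermediate ``means`` carries a hierarchical (city, year) index, and
``unstack`` reshapes it into one row per city and one column per year.
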