Metadata-Version: 2.1
Name: pandas
Version: 0.23.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Maintainer: The PyData Development Team
Maintainer-email: pydata@googlegroups.com
License: BSD
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Cython
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
Requires-Dist: python-dateutil (>=2.5.0)
Requires-Dist: pytz (>=2011k)
Requires-Dist: numpy (>=1.9.0)

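The header fields above use an email-header-style layout, so they can usually be read with the standard library's ``email.parser`` module. The following is a minimal sketch, not part of the package: ``METADATA_TEXT`` is a shortened copy of the fields above, and the variable names are illustrative assumptions.

```python
from email.parser import Parser

# Shortened copy of the header block above (illustrative only).
# The body after the blank line stands in for the long description.
METADATA_TEXT = """\
Metadata-Version: 2.1
Name: pandas
Version: 0.23.4
License: BSD
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
Requires-Dist: python-dateutil (>=2.5.0)
Requires-Dist: pytz (>=2011k)
Requires-Dist: numpy (>=1.9.0)

**pandas** is a Python package ...
"""

msg = Parser().parsestr(METADATA_TEXT)

# Single-valued fields are read like dictionary entries.
name = msg["Name"]
version = msg["Version"]

# Repeated fields (Classifier, Requires-Dist) need get_all().
requires = msg.get_all("Requires-Dist")
classifiers = msg.get_all("Classifier")

# Everything after the first blank line is the long description.
description = msg.get_payload()
```

Fields that may appear more than once return only their first value when indexed directly, which is why ``get_all`` is used for the dependency and classifier lists.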
**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with structured (tabular, multidimensional,
potentially heterogeneous) and time series data both easy and intuitive. It
aims to be the fundamental high-level building block for doing practical,
**real world** data analysis in Python. Additionally, it has the broader goal
of becoming **the most powerful and flexible open source data analysis /
manipulation tool available in any language**. It is already well on its way
toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or
  Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
  column labels
- Any other form of observational / statistical data sets. The data actually
  need not be labeled at all to be placed into a pandas data structure

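As a rough illustration of the first and last items in this list, the sketch below builds one table with mixed column types and one from an unlabeled NumPy array. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Tabular data with heterogeneously typed columns, as in a SQL table.
table = pd.DataFrame({
    "name": ["a", "b", "c"],      # text
    "count": [1, 2, 3],           # integers
    "ratio": [0.5, 1.5, 2.5],     # floats
})

# Each column keeps its own dtype; .kind is a one-letter type code.
column_kinds = {col: table[col].dtype.kind for col in table.columns}

# Unlabeled matrix data: when no labels are given, pandas supplies
# default integer labels for both rows and columns.
matrix = pd.DataFrame(np.arange(6).reshape(2, 3))
row_labels = list(matrix.index)
column_labels = list(matrix.columns)
```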
The two primary data structures of pandas, Series (1-dimensional) and DataFrame
(2-dimensional), handle the vast majority of typical use cases in finance,
statistics, social science, and many areas of engineering. For R users,
DataFrame provides everything that R's ``data.frame`` provides and much
more. pandas is built on top of `NumPy <http://www.numpy.org>`__ and is
intended to integrate well within a scientific computing environment with many
other third-party libraries.

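A minimal sketch of the two structures and of the NumPy relationship described above. The labels and values are illustrative assumptions, not taken from the pandas documentation.

```python
import numpy as np
import pandas as pd

# Series: a 1-dimensional labeled array.
prices = pd.Series([1.0, 2.0, 4.0], index=["a", "b", "c"])

# DataFrame: a 2-dimensional table whose columns share one row index.
frame = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

# Selecting a single column of a DataFrame yields a Series.
col = frame["y"]

# NumPy functions apply elementwise and the row labels are preserved.
logs = np.log2(prices)

# The underlying data can be retrieved as a plain NumPy array.
raw = frame["x"].values
```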
Here are just a few of the things that pandas does well:

- Easy handling of **missing data** (represented as NaN) in floating point as
  well as non-floating point data
- Size mutability: columns can be **inserted and deleted** from DataFrame and
  higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly
  aligned to a set of labels, or the user can simply ignore the labels and
  let `Series`, `DataFrame`, etc. automatically align the data for you in
  computations
- Powerful, flexible **group by** functionality to perform
  split-apply-combine operations on data sets, for both aggregating and
  transforming data
- **Easy conversion** of ragged, differently-indexed data in other
  Python and NumPy data structures into DataFrame objects
- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
  of large data sets
- Intuitive **merging** and **joining** of data sets
- Flexible **reshaping** and pivoting of data sets
- **Hierarchical** labeling of axes (possible to have multiple labels per
  tick)
- Robust IO tools for loading data from **flat files** (CSV and delimited),
  Excel files, databases, and saving / loading data from the ultrafast **HDF5
  format**
- **Time series**-specific functionality: date range generation and frequency
  conversion, moving window statistics, moving window linear regressions,
  date shifting and lagging, etc.

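Three of the items above (automatic alignment, missing data, and group by) can be sketched in a few lines. This is an illustrative example on made-up data, not an excerpt from the pandas documentation.

```python
import pandas as pd

# Automatic alignment: the two Series share only the labels "b" and "c".
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition matches values by label; labels found in only one operand
# produce NaN in the result.
total = s1 + s2

# Missing data: count the NaN entries, then replace them with zero.
n_missing = int(total.isna().sum())
filled = total.fillna(0.0)

# Group by: split rows by key, then aggregate or transform each group.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
sums = df.groupby("key")["value"].sum()                    # one row per key
group_means = df.groupby("key")["value"].transform("mean")  # one row per input row
centered = df["value"] - group_means
```

Aggregation (``sum``) returns one value per group, while ``transform`` returns a result with the same shape as the input, which is what makes the per-group centering in the last line possible.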
Many of these principles exist to address shortcomings frequently
experienced when using other languages / scientific research environments. For
data scientists, working with data is typically divided into multiple stages:
munging and cleaning data, analyzing / modeling it, then organizing the results
of the analysis into a form suitable for plotting or tabular display. pandas is
well suited to all of these tasks.
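
The stages named above can be sketched end to end on a toy data set. The column names and values are hypothetical, and the analysis step is deliberately trivial.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with some missing scores.
raw = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "score": [1.0, np.nan, 3.0, 5.0, np.nan],
})

# Stage 1, cleaning: drop rows whose score is missing.
clean = raw.dropna(subset=["score"])

# Stage 2, analysis: per-group count and mean.
summary = clean.groupby("group")["score"].agg(["count", "mean"])

# Stage 3, tabular display: render the summary as plain text.
report = summary.to_string()
```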