The Pandas open-source library includes the DataFrame, a two-dimensional, array-like data table in which each column holds the values of one variable and each row holds one set of values, one from each column. DataFrame columns can hold numeric, categorical (the counterpart of R's factor type), or string data, and different columns can have different types. A Pandas DataFrame can also be thought of as a dictionary-like collection of Series objects, one per column.
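A minimal sketch of the dict-of-Series view follows, assuming a recent pandas release; the column names and values are invented for illustration. Pandas calls R's factor type `category`, and, depending on the pandas version, strings are stored either with the generic `object` dtype or a dedicated string dtype.

```python
import pandas as pd

# Build a DataFrame from a dict of Series: each key becomes a column label.
df = pd.DataFrame(
    {
        "height_cm": pd.Series([172.5, 160.0, 181.2]),          # numeric (float64)
        "grade": pd.Series(["A", "B", "A"], dtype="category"),  # categorical, like R's factor
        "name": pd.Series(["Ana", "Ben", "Cleo"]),              # strings (dtype depends on version)
    }
)

# Each row holds one value from every column; columns keep their own dtypes.
print(df)
print(df.dtypes)

# Looking up a column by name returns a Series, much like a dict lookup.
heights = df["height_cm"]
print(type(heights).__name__)
```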
Data scientists and programmers familiar with the R programming language for statistical computing will recognize DataFrames as a way of storing data in grids that are easy to inspect. Most work in Pandas revolves around DataFrames, including preparing data for machine learning.
Pandas can import and export tabular data in various formats, such as CSV and JSON files.
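The round trip below is a small sketch of importing a CSV and exporting it as JSON. To keep it self-contained, the CSV text is held in memory with `io.StringIO` instead of being read from a file; the `item`/`quantity` table is hypothetical.

```python
import io

import pandas as pd

# A tiny CSV kept in memory; in practice this would be a file path or URL.
csv_text = """item,quantity
apple,3
pear,5
"""

# Import: parse the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Export: serialize the table as JSON, one object per row.
json_text = df.to_json(orient="records")
print(json_text)

# Import again: read the JSON back into a DataFrame with the same layout.
df_back = pd.read_json(io.StringIO(json_text), orient="records")
print(df_back)
```

Other formats follow the same `read_*` / `to_*` naming pattern, for example `read_excel` and `to_excel`, though some of them need extra packages installed.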
Pandas also provides operations for data manipulation and data cleaning, including selecting subsets, creating derived columns, sorting, joining, filling missing values, replacing values, computing summary statistics, and plotting.
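Most of the operations named above can be sketched on a made-up sales table. This assumes a recent pandas release, and the `sales` and `stores` tables are hypothetical. Plotting is left out because `DataFrame.plot` requires matplotlib to be installed.

```python
import pandas as pd

# Hypothetical sales records; one "units" value is missing.
sales = pd.DataFrame(
    {
        "store": ["A", "B", "A", "C"],
        "units": [10, None, 7, 4],
        "price": [2.5, 3.0, 2.5, 4.0],
    }
)
stores = pd.DataFrame({"store": ["A", "B", "C"], "region": ["north", "south", "north"]})

# Filling: treat a missing unit count as zero.
sales["units"] = sales["units"].fillna(0)

# Replacing: shorten the region labels.
stores["region"] = stores["region"].replace({"north": "N", "south": "S"})

# Creating a derived column from existing columns.
sales["revenue"] = sales["units"] * sales["price"]

# Selecting a subset: keep only rows with revenue above 10.
big = sales[sales["revenue"] > 10]

# Sorting: order rows by revenue, largest first.
ranked = sales.sort_values("revenue", ascending=False)

# Joining: attach each store's region, similar to a SQL LEFT JOIN.
merged = sales.merge(stores, on="store", how="left")

# Summary statistics: describe revenue overall, then total it per region.
print(merged["revenue"].describe())
by_region = merged.groupby("region")["revenue"].sum()
print(by_region)
```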
According to the Pandas project's description on the Python Package Index (PyPI), a repository of software for the Python programming language, Pandas is well suited for working with several kinds of data, including:
- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational/statistical data sets
The data actually need not be labeled at all to be placed into a pandas data structure.
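A brief sketch of these kinds of data follows, assuming a recent pandas release with NumPy installed. The timestamps and values are invented, and resampling with `pd.offsets.Hour()` is only one way to bring an irregular time series onto a fixed frequency.

```python
import numpy as np
import pandas as pd

# Time series: readings taken at irregular (not fixed-frequency) timestamps.
ts = pd.Series(
    [3.0, 4.0, 5.0],
    index=pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:07", "2024-01-01 10:30"]),
)

# Put the series on a regular hourly grid, averaging readings within each hour.
hourly = ts.resample(pd.offsets.Hour()).mean()
print(hourly)

# Matrix data with explicit row and column labels.
matrix = pd.DataFrame(
    np.arange(6).reshape(2, 3), index=["r1", "r2"], columns=["a", "b", "c"]
)
print(matrix.loc["r2", "b"])

# Unlabeled data: pandas supplies default integer row and column labels.
unlabeled = pd.DataFrame([[1, 2], [3, 4]])
print(unlabeled)
```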