As you may recall, a list can be something like [1, 2, 3, 4]. In Python, the elements of a list do not all have to be the same type. A list can also be, if you really insist, [1, 2, "three", True, 3.14]. Keep this in mind and consider what a list of lists could be.

[[1, 2, 3, 4], ["one", "two", "three", "four"]]
A list (outside). Of lists (inside).
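To make "outside" and "inside" concrete, here is a small sketch using the list of lists above. The first index picks an inner list, and the second index picks an element inside that inner list; the variable names are just for illustration.

```python
# A list (outside) of lists (inside), taken from the example above
nested = [[1, 2, 3, 4], ["one", "two", "three", "four"]]

# One index selects an inner list
numbers = nested[0]        # [1, 2, 3, 4]

# Two indices select an element inside an inner list
third_word = nested[1][2]  # "three"

# Walk both inner lists side by side with plain Python
for number, word in zip(nested[0], nested[1]):
    print(number, word)
```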
While the list of lists above should really be a dictionary such as {"one": 1, "two": 2, ...}, sometimes other tools don’t support dictionaries. Sometimes you might even end up with a list of lists of dictionaries mapping tuples to lists of other dictionaries mapping strings to more lists, which would look something like [[{():[{"":[]}]}]] (but with data).
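A rough sketch of both ideas: `zip` plus `dict` turns the two inner lists into the dictionary they "should" have been, and the deeply nested shape shows how every bracket peels off one layer. The values inside `deep` are made-up placeholders, since the original shape is empty.

```python
nested = [[1, 2, 3, 4], ["one", "two", "three", "four"]]

# zip pairs each word with its number; dict() turns those pairs into key: value entries
lookup = dict(zip(nested[1], nested[0]))
# lookup is {"one": 1, "two": 2, "three": 3, "four": 4}

# The deeply nested shape from the text, filled with made-up placeholder data
deep = [[{("a", "b"): [{"key": [10, 20, 30]}]}]]

# Each bracket removes one layer:
# list -> list -> dict (tuple key) -> list -> dict (string key) -> list
value = deep[0][0][("a", "b")][0]["key"][1]  # 20
```

Six brackets to reach one number is exactly where a missed comma or a wrong index starts to hurt.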
Despite this post, I am not against using multi-dimensional collections, even though they have plenty of layers to get lost in and commas to miss. They might be prone to human error, but what isn’t? Why learn yet another tool when it seems like, by the time I learn this one, a new, better tool will exist with all the market share, forcing me to learn that one, too? This isn’t being cynical… it’s being practical.

One of the biggest challenges of programming is not the programming itself, but the data the program runs on. Data, like most of life, is not perfect. In fact, data is usually full of mistakes and random, uninvited characters, and even when it is technically structured, no one really followed the rules. It is important to clean the data before conducting analysis; otherwise our analysis is useless and, worse, our time is wasted.
Pandas can clean. Cleaning data with pandas versus with multi-dimensional collections is like the difference between cleaning a house belonging to 2 working professionals without kids and cleaning a 2-bedroom apartment where 6 people live with 2 dogs.
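As a small taste of what "pandas can clean" means, here is a sketch on a made-up table with stray whitespace, a currency symbol, and a "n/a" where a number should be. It relies on `Series.str.strip`, `Series.str.replace`, and `pd.to_numeric(..., errors="coerce")`; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical messy data: stray spaces, "$" and ",", and a text placeholder for a missing value
raw = pd.DataFrame({
    "name": ["  Alice", "Bob  ", " Carol "],
    "salary": ["$50,000", "62000", "n/a"],
})

clean = raw.copy()

# Strip leading and trailing whitespace from every name
clean["name"] = clean["name"].str.strip()

# Drop "$" and "," with a regex, then convert to numbers;
# anything that still cannot be parsed (like "n/a") becomes NaN
clean["salary"] = pd.to_numeric(
    clean["salary"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

print(clean)
print(clean.dtypes)
```

Each fix is one line that applies to the whole column, instead of a loop over every row of every inner list.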
Once the data is clean, you proceed with your magic, also using pandas: merge multiple data sets, assign indices, name columns, remove or replace missing values, search for outliers, and remove or replace those outliers. You could also do all of this with multi-dimensional collections and for loops. This is recommended for people with too much time.
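Here is a minimal sketch of that workflow on two tiny, made-up data sets. The column names, filling the gap with the median, and the 1.5 * IQR outlier rule are all illustrative choices, not the only (or necessarily the best) way to do it.

```python
import pandas as pd

# Two small, made-up data sets that share an "id" column, with unhelpful column names
people = pd.DataFrame({"id": [1, 2, 3, 4], "n": ["Ana", "Ben", "Cy", "Di"]})
scores = pd.DataFrame({"id": [1, 2, 3, 4], "s": [88.0, None, 91.0, 400.0]})

# Merge the data sets on their shared key
df = people.merge(scores, on="id", how="left")

# Name the columns properly
df = df.rename(columns={"n": "name", "s": "score"})

# Use the unique id as the row index
df = df.set_index("id")

# Replace the missing score with the median of the observed scores
df["score"] = df["score"].fillna(df["score"].median())

# Search for outliers with a simple interquartile-range rule
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["score"] < q1 - 1.5 * iqr) | (df["score"] > q3 + 1.5 * iqr)

# Remove them (replacing them, e.g. with the median, would work the same way)
df = df[~is_outlier]
print(df)
```

With this data, the missing score for id 2 becomes 91.0 and the 400.0 for id 4 is dropped as an outlier.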

The first thing to know about pandas is that it supports the essential data types. These include numbers (ints and floats), Booleans, strings, and dates (datetimes).
Listen: you can put a date in a string. You can put a number in a string. You cannot put a number in a date. You cannot put a date in a number. Sometimes, you can even put a Boolean in a number. You can definitely not put a Boolean in a date.
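Here is a short sketch of the conversions that do work, using `pd.to_datetime`, `pd.to_numeric`, and `astype`. The values are made up for illustration.

```python
import pandas as pd

# Dates and numbers stored as strings (made-up values)
date_strings = pd.Series(["2021-03-14", "2021-12-31"])
number_strings = pd.Series(["42", "3.14"])

# A date in a string can be parsed into a real datetime column
dates = pd.to_datetime(date_strings)

# A number in a string can be parsed into a real numeric column
numbers = pd.to_numeric(number_strings)

# Going the other way, numbers can be turned back into strings
back_to_text = numbers.astype(str)

# A Boolean can become a number: True -> 1, False -> 0
flags_as_ints = pd.Series([True, False, True]).astype(int)
```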
You can also learn these things over time and with a lot of trial and error. Or by reading the documentation and the occasional credible article.
The documentation explains crucial information such as a package’s functions, or the methods and attributes of a class. For a pandas DataFrame, the dtypes attribute tells you the data type of each column, for when it isn’t obvious. It might seem obvious, but listen: deduction isn’t foolproof. It may look like an int, but is it an int? dtypes will always tell you the truth. Always go straight to the source for the truth.
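A minimal sketch of `dtypes` catching a column that only looks like ints. The DataFrame is made up for illustration, and the exact dtype names printed (for example `object` versus `str` for text) can vary between pandas versions.

```python
import pandas as pd

# Made-up columns, one of which only *looks* like ints
df = pd.DataFrame({
    "count": [1, 2, 3],
    "looks_like_int": ["1", "2", "3"],
    "price": [1.5, 2.0, 3.25],
    "flag": [True, False, True],
    "when": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
})

# One dtype per column: "count" is an integer type, but "looks_like_int" is text
print(df.dtypes)

# Once dtypes has exposed the impostor, convert it explicitly
fixed = df["looks_like_int"].astype(int)
```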

How to use pandas
- Only use it when necessary (i.e. when your data set is sufficiently large, or when you are 3 or more levels deep into a multi-dimensional collection)
- Use with Jupyter Notebook
- read_csv is not only for CSVs (it reads other delimited text files, such as tab-separated ones, too)
- Assign proper column names
- Make the row index your primary key, or at least something unique & meaningful
- Double-check that all data types are what they are supposed to be
- Google/Stack Overflow everything else
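Several of the tips above can be sketched in a single `pd.read_csv` call. In a Jupyter notebook you would normally pass a file path; here a small, made-up tab-separated file is held in an `io.StringIO` so the example is self-contained. The column names are hypothetical.

```python
import io

import pandas as pd

# A small, made-up tab-separated file with no header row
tsv_text = "1\tAlice\t2021-01-05\n2\tBob\t2021-02-10\n3\tCarol\t2021-03-15\n"

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",                        # read_csv handles other delimiters, not just commas
    header=None,                     # the file has no header row...
    names=["id", "name", "joined"],  # ...so assign proper column names
    index_col="id",                  # make the unique id the row index
    parse_dates=["joined"],          # read this column as dates, not strings
)

# Sanity check: the index should be unique, like a primary key
assert df.index.is_unique

# Double-check the data types
print(df.dtypes)
```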