Introduction to Jupyter Notebooks
|
|
Introduction to Python
|
Use variable = value to assign a value to a variable in order to record it in memory.
Variables are created on demand whenever a value is assigned to them.
Use print(something) to display the value of something.
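A minimal sketch of the points above. The variable names and values are made up for illustration.

```python
# Assign values to variables; each variable is created the first time
# a value is assigned to it.
unit_price = 4.5
quantity = 3

# A variable can also record the result of an expression.
total = unit_price * quantity

# print() displays the value of whatever is passed to it.
print(total)
print("Total cost:", total)
```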
|
Controlling Program Behavior
|
Use if condition to start a conditional statement and else to provide a default.
The bodies of the branches of conditional statements must be indented.
Use == to test for equality.
X and Y is only true if both X and Y are true.
X or Y is true if either X or Y, or both, are true.
True and False represent truth values.
Use for variable in sequence to process the elements of a sequence one at a time.
The body of a for loop must be indented.
Use len(thing) to determine the length of something that contains other values.
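The following sketch combines the ideas above in one short program. The sales figures and the target are hypothetical values chosen for illustration.

```python
# Hypothetical data: sales for five days.
daily_sales = [120, 85, 240, 60, 310]
target = 100

days_on_target = 0

# Process the elements one at a time; the loop body is indented.
for sale in daily_sales:
    # The body of each branch is indented under if / else.
    if sale >= target:
        days_on_target = days_on_target + 1
    else:
        print(sale, "is below the target")

# len() gives the number of values in the list; == tests for equality.
all_on_target = days_on_target == len(daily_sales)

# Comparisons produce the truth values True or False.
first_ok = daily_sales[0] >= target
last_ok = daily_sales[-1] >= target

# "and" needs both sides to be true; "or" needs at least one.
both_ends_ok = first_ok and last_ok
any_good_news = all_on_target or first_ok

print(days_on_target, all_on_target, both_ends_ok, any_good_news)
```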
|
Working with Tabular Data
|
Use pandas.read_csv to read tabular data into Python.
Use sort_values([columns]) to sort the DataFrame.
Use df_object.column_name or df_object[column_name] to select one column.
Use df_object[list_of_column_names] to select multiple columns.
Use df_object[condition] to filter data. For example, df_object[df_object[column_name] == value].
Use .describe() to get descriptive statistics for a column (or for every numeric column of a DataFrame).
Use df1.merge(df2) to merge two DataFrames.
Use df_object.groupby(column_list1).agg({column1:agg_function1, column2:agg_function2…}) to aggregate data within each group.
Use pandas.pivot_table to create a pivot table.
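The operations above can be sketched on a tiny table. The data, column names, and manager names below are invented for illustration; in practice read_csv would usually be given a file path rather than an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
orders_csv = io.StringIO(
    "order_id,store,product,amount\n"
    "1,North,Tea,12.0\n"
    "2,South,Coffee,20.0\n"
    "3,North,Coffee,18.0\n"
    "4,South,Tea,10.0\n"
)
orders = pd.read_csv(orders_csv)

# Sort by one or more columns.
sorted_orders = orders.sort_values(["store", "amount"])

# Select one column (a Series) or several columns (a DataFrame).
amounts = orders["amount"]
subset = orders[["store", "amount"]]

# Filter rows with a condition.
north = orders[orders["store"] == "North"]

# Descriptive statistics for one column.
summary = orders["amount"].describe()

# Merge with a second table on the shared "store" column.
managers = pd.DataFrame({"store": ["North", "South"], "manager": ["Ana", "Ben"]})
merged = orders.merge(managers, on="store")

# Aggregate within groups: total amount and number of orders per store.
per_store = orders.groupby("store").agg({"amount": "sum", "order_id": "count"})

# Pivot table: stores as rows, products as columns, summed amounts as values.
pivot = pd.pivot_table(
    orders, index="store", columns="product", values="amount", aggfunc="sum"
)

print(per_store)
print(pivot)
```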
|
Python basics 3 - Functions and Modules
|
Define a function using def function_name(parameter).
The body of a function must be indented.
Call a function using function_name(value).
Numbers are stored as integers or floating-point numbers.
Integer division with // produces the whole part of the answer, rounded down (not the fractional part), whereas dividing two integers with / gives a floating-point number.
Variables defined within a function can only be seen and used within the body of the function.
If a variable is not defined within the function where it is used, Python looks for a definition made before the function is called.
Specify default values for parameters when defining a function using name=value in the parameter list.
Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.
Import a module with the import statement. In a notebook, install a module that is not yet available with !pip install.
For us business students, it is more important to learn what a module can do and how to use its functions than to implement those functions ourselves.
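As a rough illustration of the points above, the sketch below defines one small function and calls it in several ways. The function name full_boxes, the variable unit, and all the numbers are hypothetical; math is a module from Python's standard library, so no installation is needed.

```python
import math  # import a module before using its functions

unit = "boxes"  # defined outside the function


def full_boxes(items, box_size=12):
    # count exists only inside the body of the function.
    count = items // box_size  # // keeps the whole part of the division
    # unit is not defined in the function, so Python looks for it outside.
    return str(count) + " " + unit


# Pass parameters by position; box_size is omitted, so its default is used.
print(full_boxes(30))

# Pass both parameters by position.
print(full_boxes(30, 10))

# Pass parameters by name.
print(full_boxes(items=30, box_size=7))

# Dividing two integers with / gives a floating-point number.
print(30 / 12)

# Use a function from the imported module instead of writing it yourself.
print(math.sqrt(16))
```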
|
Statistical Analysis and Visualization
|
Use scipy.stats.ttest_ind for a t-test on two independent samples.
SciPy offers many other statistical tools; see its documentation for details.
Many parameters of a graph can be customized; see the Matplotlib documentation for details.
For both SciPy and Matplotlib, the most difficult part is preprocessing your data. After that, you just need to find the right function and feed your data into it.
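Once the data is in shape, the analysis itself can be quite short. The sketch below assumes two small, made-up samples (order values under two hypothetical page designs), runs an independent two-sample t-test, and draws a box plot with a few customized labels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical, already-cleaned data for two groups.
group_a = [20.1, 22.4, 19.8, 21.5, 23.0, 20.7]
group_b = [24.2, 25.1, 23.8, 26.0, 24.9, 25.5]

# Independent two-sample t-test.
result = stats.ttest_ind(group_a, group_b)
t_stat = result.statistic
p_value = result.pvalue
print("t =", t_stat, "p =", p_value)

# A simple plot with customized title and axis labels.
fig, ax = plt.subplots()
ax.boxplot([group_a, group_b])
ax.set_xticks([1, 2])
ax.set_xticklabels(["Design A", "Design B"])
ax.set_ylabel("Order value")
ax.set_title("Order value by page design")
plt.close(fig)  # in a notebook, display the figure instead of closing it
```

A small p-value suggests that the difference between the two group means is unlikely to be due to chance alone.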
|
Extra - Errors and Exceptions
|
Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.
An error having to do with the ‘grammar’ or syntax of the program is called a SyntaxError. If the issue has to do with how the code is indented, then it will be called an IndentationError.
A NameError will occur if you use a variable that has not been defined, either because you meant to use quotes around a string, you forgot to define the variable, or you just made a typo.
Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an IndexError.
Trying to read a file that does not exist will give you a FileNotFoundError. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an io.UnsupportedOperation error, which is a kind of OSError (in Python 3, IOError is an older name for OSError).
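The error types above can be triggered deliberately to see what each one is called. This is only a sketch: the helper describe_problem and the small functions it runs are hypothetical, and the file name is assumed not to exist on your machine.

```python
def describe_problem(action):
    """Run action() and return the name of the error it raises, if any."""
    try:
        action()
    except (NameError, IndexError, FileNotFoundError, SyntaxError) as error:
        return type(error).__name__
    return "no error"


def use_undefined_name():
    return undefined_total  # this variable was never defined


def read_past_end():
    letters = ["a", "b", "c"]
    return letters[3]  # valid positions are 0, 1 and 2


def open_missing_file():
    # Hypothetical file name, assumed not to exist.
    with open("no_such_file_for_demo.csv") as handle:
        return handle.read()


def compile_bad_indentation():
    # compile() parses code held in a string, so the error can be caught here.
    compile("if True:\nprint('hi')", "<demo>", "exec")


print(describe_problem(use_undefined_name))
print(describe_problem(read_past_end))
print(describe_problem(open_missing_file))
print(describe_problem(compile_bad_indentation))
```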
|
Extra - Debugging
|
Know what code is supposed to do before trying to debug it.
Make it fail every time.
Make it fail fast.
Change one thing at a time, and for a reason.
Keep track of what you’ve done.
Be humble.
|
Data Preparation techniques
|
|
Linear regression
|
|
Introduction to CRISP-DM
|
The importance of business understanding is often overlooked.
Data preparation is typically the least liked and most time-consuming task.
CRISP-DM is an iterative process.
|
Case Study
|
|
Case Study 2 - Yellow Cab Chicago Case
|
|