Chebucto Regional Softball Club

Deb Nam-Krane

One of the highlights of my life was geeking out with Ivan over his *No Bullshit Guide to Math & Physics* while I was homeschooling my sons during the pandemic, so I am happy to share his latest, *No Bullshit Guide to Statistics*. I'm almost sad that my sons have already finished their Stats class, but if you're looking for a reference, pick this up.

**As with any endorsements I make, I get absolutely nothing for this, so buy in good conscience**

No Bullshit Guide to Statistics

Statistics is the art of learning from data. Given a dataset of interest, you can use the statistics toolbox to analyze the data and derive interesting observations and conclusions.

**Why should you learn statistics?**

Understanding statistics is of strategic importance in the modern world. Statistics has applications in education, healthcare, sports, business (sales, finance, advertising, marketing, quality control, insurance), economics, engineering, and science. Anyone who wants to use data to make informed decisions needs to know statistics.

Historically, statistics was a specialized topic reserved for researchers and academics, but in the 21st century, statistical literacy is increasingly important for business professionals, technologists, and the general public. We're surrounded by data of all kinds. Access to data can be very useful if you know how to make sense of it. To extract insights from data, you need to know statistics.

**How can you learn statistics?**

Traditional statistics textbooks use an outdated approach to teaching statistics based on prepackaged recipes for statistical analysis. This is because, in the olden days, people didn't have access to computers so they were limited to applying only simplified recipes.

**Modern computational statistics curriculum**

The modern approach to statistics embraces computational methods like simulation, bootstrap estimation, and resampling methods. The computational approach makes statistical thinking more accessible and intuitive. Instead of memorizing a bunch of formulas, you can leverage the power of computers for probability calculations and statistical procedures.

**Python makes learning statistics easy**

Python provides powerful tools for doing probability and statistics calculations. You can use Python code as a "parallel narrative" that complements and supplements math formulas. Don't worry if you don't have any prior experience with Python. You don't need to be a programmer to understand the code examples in the book.
You just need to know how to use Python as a calculator. See this blog post for more info about how Python helps when learning statistics.

**Solid foundation**

To understand the general principles of statistical inference, you need to come prepared with practical data management skills and knowledge of probability theory. A solid foundation of DATA and PROB makes learning statistics much easier and less intimidating.

**Interactive computational notebooks**

JupyterLab is a computational environment that makes it easy to do calculations in computational notebooks. Notebooks combine text, graphics, and code calculations into a coherent narrative. Jupyter notebooks provide an interactive learning environment where you can "play" with the code examples from the book by modifying parameters to see how the outputs change.

**Use Python for real-world data management**

The Python ecosystem provides best-in-class tools for working with data. It's important for you to know how to work with realistic data, including extracting data from various sources and formats. You must also learn practical skills like data cleaning, which are required for real-world data analyses.

**Probability calculations as code**

The Python ecosystem provides various tools for probability calculations. For example, you can use the probability models defined in scipy.stats to learn about probability distributions.

**Understanding is better than memorization**

A solid foundation of practical (data) and theoretical (probability theory) prerequisites enables a deep knowledge of statistical inference procedures. You'll learn to understand concepts like estimators and sampling distributions, construct confidence intervals, run hypothesis tests, and interpret results correctly.

**Solving exercises and problems**

The best (only?) way to learn complicated concepts is to get hands-on experience in applying these concepts by solving exercises and problems.
Readers who are strong in math will be able to solve the exercises using pen-and-paper calculations. Alternatively, you can use the computational approach (Python) to solve exercises.

**Mental effort required**

I'm not going to lie to you and tell you learning statistics is going to be easy. It is not. You'll need to expose your brain to lots of mathematical, computational, and conceptual complexity, but the payoff is worth it.

**What am I buying?**

The No Bullshit Guide to Statistics covers all the standard topics of a first-year university-level statistics course, as well as modern statistics topics like resampling methods and Bayesian inference. The book introduces statistics concepts in a rigorous, yet accessible manner.

**Computational approach**

The book uses Python code examples to illustrate probability and statistics calculations. The computational approach makes complicated ideas more intuitive and easy to understand.

**Interactive notebooks**

Each section of the book is accompanied by a computational notebook that includes all the code examples from the text. Playing with these notebooks on your own is an excellent way to understand what's going on. You can either trust me when I tell you what the parameters of the normal distribution mu (the mean) and sigma (the standard deviation) do, or input different values into a code cell inside a Jupyter notebook, then press SHIFT+ENTER on your keyboard to re-generate the graph of the probability distribution norm(mu,sigma) for the parameters mu and sigma you picked.

**Python prerequisites included**

The book includes a Python tutorial (Appendix C) for readers with no prior programming experience. Don't worry: you don't need to be a programmer to understand the code examples in the book. We're just going to be using Python as a fancy calculator.

**Math prerequisites included**

I assume that readers have a minimal math background, such as familiarity with math notation and functions.
All other math prerequisites are included in the text and introduced as needed throughout the chapters. For example, you'll learn combinatorics formulas just-in-time to use them for discrete distributions, and learn how to calculate integrals when you need them for working with continuous distributions.

**Statistics procedures from scratch**

We'll show how to perform probability and statistics calculations in full detail, using the low-level Python building blocks provided by numpy and scipy.stats. Many of the probability calculations you need to know can look intimidating when presented as math formulas (think integrals). These same calculations become more accessible when approached from a computational perspective. Indeed, a simple Python for-loop allows us to run simulations and visualize any probability calculation. By seeing the "picture" of what the calculation produces as output and reading code that performs the calculations, it becomes relatively easy to understand the math formula, since you know what the math is trying to describe.

**Statistics procedures using the Python ecosystem**

Python provides libraries for data management (Pandas), data visualizations (Seaborn), numerical computing (NumPy), scientific computing (SciPy), high-level statistics calculations (scipy.stats), linear models (statsmodels), and Bayesian models (bambi). It's cool that we can write the low-level code for doing probability and statistics calculations, but in practice, it's better to reuse the statistics code written by experts like the coders behind the scipy.stats, statsmodels, and bambi libraries. There is no problem with using other people's code, provided you know what it's doing under the hood.
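
As a rough illustration of the two routes described above (this sketch is not taken from the book; the distribution parameters, the cutoff, and the variable names are all made up for the example), here is the same probability computed once with a plain for-loop simulation and once with the scipy.stats model:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical model: X ~ norm(mu=100, sigma=15); we want Pr(X > 120).
mu, sigma, cutoff = 100, 15, 120
rvX = norm(mu, sigma)

# Route 1: simulation with a simple for-loop.
rng = np.random.default_rng(42)      # fixed seed so the run is repeatable
N = 100_000                          # number of simulated observations
draws = rvX.rvs(size=N, random_state=rng)
count = 0
for x in draws:
    if x > cutoff:
        count += 1
p_sim = count / N                    # proportion of draws above the cutoff

# Route 2: the library's answer (the integral of the pdf from cutoff to infinity).
p_exact = rvX.sf(cutoff)             # survival function, same as 1 - rvX.cdf(cutoff)

print("simulation:", p_sim)
print("scipy.stats:", p_exact)
```

With this many draws the two numbers should agree closely; making `N` smaller shows how the simulated estimate gets noisier.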
Indeed, when using statistics for real-world data analyses, it's preferable to use existing libraries, because they are much faster than our hand-rolled code.

The cool part about knowing two different ways to perform the same statistical calculations is that we can check that they produce the same answer. We can use the predefined helper functions to check that the calculations we performed from first principles are correct. Alternatively, you can check if you really understand what a helper function does by writing your own implementation of the helper function that produces the same answer. If the code you wrote and the predefined functions give the same results, then we're good.

**Statistical intuition**

It's important to understand the reasoning and logic behind statistics procedures, including the assumptions they are based on. The book will teach you all the necessary context, instead of just skimming the surface. I won't ever ask you to blindly apply any recipe for statistical analysis. Instead, you'll develop a deep understanding of the general principles of statistical inference.

**Multiple, redundant explanations**

This book uses multiple modalities to help readers understand statistics concepts: text, graphics, formulas, code, and computer simulations. I'm aiming for this redundancy of explanations to be helpful and not annoyingly repetitive. Though to be honest, if at any point a reader starts to get "bored" since they can accurately predict what happens next in the text, formula, graph, or code thanks to previous text, formulas, graphs, or code, then I have won! This reader is no longer afraid of stats.

**Multiple perspectives**

The book presents inferential statistics tasks from multiple perspectives. In Chapter 3, we'll learn classical (frequentist) methods like estimation, confidence intervals, and hypothesis testing. Then in Chapter 4, we'll revisit the same topics from the perspective of linear models.
Chapter 5 explains the Bayesian perspective on estimation, credible intervals, and hypothesis testing. We'll analyze the same datasets in all chapters, which will allow us to compare and contrast the different statistical inference paradigms.

**Connecting theory and practice**

The book presents both theoretical and practical aspects of statistics as part of a connected whole. Part 1 of the book starts with hands-on matters of data management. You'll work on real-world data scenarios to get some experience working with Pandas to manipulate datasets. You'll also learn about probability theory through a mix of theoretical formulas, visualizations, and computer simulations. In Part 2 of the book, we'll build on your experience with data and probability theory to define statistical models and statistical inference procedures. You'll learn about three different perspectives on the core statistical inference tasks: estimation, uncertainty quantification, and decision making. You'll develop a solid understanding of the WHAT and WHY questions, separate from the procedural steps that explain HOW to perform statistical procedures. For example, you'll learn to perform the "compare two groups using the difference between means" analysis using four approaches: resampling methods (permutation test), analytical approximations (Welch's two-sample t-test), a linear model on dummy-encoded data (equivalent to Student's t-test with pooled variance), and a Bayesian model. By the end of the book, you'll have both the "know HOW" (practical skills to apply statistics procedures in real-world situations), and the "know WHY" (understand what the statistical inference tasks are in general).

**Lots of exercises and problem sets**

Each chapter includes exercises and problems so you can practice the new material in realistic data analysis scenarios. Solving exercises is the best way to learn stats.
I've prepared "starter notebooks" with the problem statements so you can work on them in JupyterLab.

**A modern supplement to STATS101**

Chapter 3 includes all the standard material that undergraduates learn in STATS101. I will make sure that you know how to apply the prepackaged "recipes" for data analysis in case you're taking a STATS101 class and this is expected of you. There is no problem with the STATS101 material; I'm just going to enrich that stuff for you by showing you a bunch of other ways to accomplish the statistical inference tasks.

**Emphasis on real-world applications**

I spent many hours of research and many dollars consulting with experts in the academic and business worlds so that I can make an authoritative executive decision about what is essential and what is superfluous for an introductory statistics book. There are countless proposals and recommendations for improving the undergraduate statistics curriculum. I took all those recommendations and multiplied them by the "practical importance" factor I gathered from industry experts. What stats do people actually use in the real world? It took me seven years, but I came up with a playlist of topics that is both academically sound AND practically useful.

**Book outline**

The No Bullshit Guide to Statistics comes in two parts. Part 1 covers DATA and PROB prerequisites. Part 2 covers statistical inference topics from three different perspectives (frequentist, linear models, and Bayesian). This book is the distillation of everything I know about data, probability, and statistical inference.

**Part 1: DATA and PROBABILITY**

The goal of the first part of the book is to develop your practical data management skills and your understanding of study design (observational studies vs. statistical experiments). We'll also learn about descriptive statistics used to summarize sample data, and probability distributions used as models for population data.
These topics are the key prerequisites for the statistical inference topics covered in Part 2 of the book.

Chapter 1. DATA [93pp]
- Introduction to data
- Data in practice
- Descriptive statistics

Chapter 2. PROBABILITY [187pp]
- Discrete random variables
- Multiple random variables
- Inventory of discrete distributions
- Continuous random variables
- Multiple continuous random variables
- Inventory of continuous distributions
- Simulation and empirical distributions
- Probability models for random samples

Appendices
- Appendix A: Answers and solutions
- Appendix B: Notation
- Appendix C: Python tutorial (using Python as a calculator)
- Appendix D: Pandas tutorial (data management)
- Appendix E: Seaborn tutorial (data visualizations)
- Appendix F: Calculus tutorial (basic integration required to work with continuous RVs)

PDF preview of Part 1 of the book: noBSstats_part1_preview.pdf [223pp, 9MB].

**Part 2: STATISTICAL INFERENCE**

Statistical inference is the process of learning about the properties of an unknown population based on a sample from that population. Part 2 of the book introduces the key ideas of statistical inference, including classical (frequentist) statistics (Chapter 3), linear models (Chapter 4), and Bayesian statistics (Chapter 5). The goal of this "enriched" curriculum is to cover the topics and procedures that are useful in the real world outside the classroom.

Chapter 3. CLASSICAL (FREQUENTIST) STATISTICS [274pp]
- Estimators
- Confidence intervals
- Introduction to hypothesis testing
- Analytical approximations
- Two-sample hypothesis tests
- Statistical design and error analysis
- Inventory of statistical tests
- Statistical practice

Chapter 4. LINEAR MODELS [154pp]
- Simple linear regression
- Multiple linear regression
- Interpreting linear models
- Regression with categorical predictors
- Causal inference using linear models
- Generalized linear models

Chapter 5. BAYESIAN STATISTICS [184pp]
- Introduction to Bayesian statistics
- Bayesian inference computations
- Bayesian linear models
- Bayesian difference between means
- Hierarchical models

Appendices
- Appendix A: Answers and solutions
- Appendix B: Notation

PDF preview of Part 2 of the book: noBSstats_part2_preview.pdf [191pp, 16MB].

**Who is this book for?**

This book is for people who really want to understand statistics. To learn statistics in a deep manner, you need to be ready to invest several weeks of concentrated attention to read the book, play with the computational notebooks, and solve practice problems.

If you're looking for a book with statistics shortcuts and statistical analysis recipes you can follow without thinking, then this book is not for you. The No Bullshit Guide to Statistics takes the opposite approach: we'll derive all formulas from first principles, and verify each formula by running numerical simulations.

**Students**

Students taking a statistics class are in for a good time. The book explains the core ideas of statistics in an intuitive manner that makes them easy to understand. The computational approach to statistical inference will allow you to "see" concepts and "play" with them interactively. Students taking a STATS101 class right now should focus on Chapter 3, which covers the standard introductory topics. People with no prior coding experience should read the Python tutorial in Appendix C to get up to speed on the Python syntax for evaluating expressions, manipulating variables, and calling functions. If you're taking a class on linear models (a.k.a. linear regression), then you should focus on the material in Chapter 4. Students taking a Bayesian statistics class should focus on Chapter 5 first.
Graduate students who need to use statistics for their research will benefit from the review of the basics, and the comprehensive coverage of classical (frequentist) statistics procedures (Chapter 3), linear models (Chapter 4), and Bayesian methods (Chapter 5). In grad school, you need to actually use statistics, not just read books about it, so the hands-on explanations of the tools in the stats toolbox will be very useful.

**Academics**

Researchers who need to use statistics to publish papers and advance their academic careers will benefit from the no-nonsense guide to the statistics toolbox. The exercises and problems will teach you what you need to know to apply statistics to your dataset, and more importantly, teach you how to correctly interpret the results of your statistical analyses. Solid knowledge of statistical inference methodology will make all your papers bulletproof against criticism on statistics grounds.

**People in industry**

People working in industry will learn what they need to know to apply statistics to real-world business data (sales, finance, advertising, marketing, quality control, insurance). Whether you're analyzing a marketing campaign, the traffic patterns from the company website, or the sales data for last month, knowing statistics will be your gateway to business intelligence. Nowadays, you need to make data-driven business decisions to stay ahead of the competition.

**Techies**

If you have prior programming experience, then you'll have a very good time learning about computational statistics. The computational approach to probability is like a cheat code that allows you to understand advanced statistics topics by running computer simulations. Basically, if you know how to use a for-loop, then you already know how to generate thousands of random samples from any probability distribution and use these samples to perform the required statistical analysis.
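
A minimal sketch of that for-loop idea, assuming a made-up exponential model and sample size (none of these names or numbers come from the book): it generates thousands of random samples, records the mean of each one, and compares the spread of those means with the textbook formula for the standard error.

```python
import numpy as np
from scipy.stats import expon

# Hypothetical population model: exponential with mean 2 (and std 2).
rvX = expon(scale=2.0)
n = 30          # size of each random sample
N = 5000        # number of samples to generate

rng = np.random.default_rng(1)       # fixed seed so the run is repeatable
xbars = []
for i in range(N):
    sample = rvX.rvs(size=n, random_state=rng)   # one random sample of size n
    xbars.append(sample.mean())                  # record its sample mean
xbars = np.array(xbars)

# The spread of the simulated means approximates the standard error of the mean.
se_sim = xbars.std(ddof=1)
se_formula = rvX.std() / np.sqrt(n)              # sigma / sqrt(n)

print("mean of sample means:", xbars.mean())
print("simulated standard error:", se_sim)
print("formula standard error:  ", se_formula)
```

Swapping `expon(scale=2.0)` for any other scipy.stats model leaves the loop unchanged, which is the appeal of the approach.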
You don't have to worry about deciphering the math formulas, so long as you understand the code examples (because the code does the same thing as the math).

**Adult learners**

Did you take a statistics class at university many years ago, but don't remember anything from it? Now is your chance to (re)learn STATS. You can go through the material much more easily now as an adult, and it's really worth it for the knowledge buzz you will experience along the way. Learning statistics will also allow you to make sense of the statistical arguments that surround you in science papers, research reports, political policy debates, etc. Indeed, in any sentence that involves "you" and "data," your knowledge determines if you're going to be the subject or the object in that sentence. Understanding statistics and how computers work will make it easy to spot situations when you're the object and something or someone else is using statistics on you (e.g. TikTok/Instagram/YouTube keeping you hooked for hours). Knowing statistics will also allow you to do useful stuff with your personal data. For example, you could use your own sleep data to run an experiment that studies the effects of different substances you consume on your sleep quality. Your data, your compute, and S-level useful insights derived with no corporation involved in the process. Learn statistics so you can de-FAANG your life and stand a chance against the IT corporate beasts.

**Readers' testimonials**

This is what readers have been saying about the book:

"I liked that you compare the outputs of your home-brew functions to results from scipy.stats. This shows both stats basics, but also good software practices of code testing." - Jess Bird, MSc statistics

"Your books are helping me sharpen my brain. Huge fan of your work." - Mujeeb Khan

**Frequently asked questions**

Q. Is this a theory book or a practical book?
A. The book doesn't cover much theory: there are no detailed proofs or long math derivations.
We won't shy away from math formulas though, since it's important for you to understand how the underlying probability machinery works before using it for statistical inference.

Q. Is this a coding book?
A. It's definitely not a coding book, but to be "practical" I made sure each equation is also available as code. I use Python code to show stats procedures, so you can see the steps. The idea is that reading the code examples in parallel with the text+math explanations will lead to better understanding.

Q. What can I learn for free?
A. All the tutorials and the computational notebooks are available through the book's website noBSstats.com for free. Each notebook provides self-contained explanations (text+math+code) that you'll be able to follow and learn from, even if you don't have the main text.

Q. Is this a Bayesian or a frequentist book?
A. I don't take sides in the BAYESIANISM vs. FREQUENTISM debate, and cover both perspectives in the book. ¿Por qué no los dos? Chapters 3 and 4 present the classical (frequentist) perspective, then in Chapter 5 we revisit the same topics from a Bayesian perspective.

Q. Why is the book not free?
A. I'm a big fan of Open Educational Resources (OERs), but I fear that making the No Bullshit Guide textbooks free would send the wrong signal to potential readers. Free often means low quality, and there is none of that in my books. I don't want this to be one of the hundred books you have downloaded with the aspiration to read one day, but you never get to... If no-cost is really important to you, then you can wait a few months and I'm sure a pirated version of the book will show up on LibGen.

Q. Why don't you charge more for the books?
A. I realize $29 for Part 1 [433pp] and $39 for Part 2 [656pp] doesn't accurately reflect the value provided by these books, but my priority (as an author) is for people to have access to statistical knowledge/power, rather than making the optimized business decision (as a publishing company).

Q. Why is the book so long?
A. I learned from my tutoring experience that providing redundant explanations of key concepts is useful for learning. You need to know different paths to arrive at the same concepts. Some readers will "get it" when they read the math and text. Other readers might enjoy the visual explanations (Show me the picture!). For others, things will click only when they see concrete code examples that implement the math calculations. By presenting the same topics and concepts as text, math, graphics, and code, I'm trying to provide alternative paths to understanding statistics concepts. But wait, there is more! I have included redundancy also at the subject level. Chapter 3 describes statistics from the frequentist perspective, which is currently the most popular paradigm for presenting results in scientific publications. Chapter 4 on linear models repeats the same calculations using linear models, which are a unifying framework for studying the influence of predictor variables on an outcome variable. You can perform statistical tests directly, or by thinking of them as linear models, and you get the same results using both paradigms. But wait, there is more! In Chapter 5 we revisit the same statistical analyses a third time from the point of view of Bayesian statistics. The numerical results of the Bayesian analysis turn out to be similar to the results we obtain in Chapter 3 and Chapter 4, but the interpretation of these results is different.

**Free stuff**

The book is not free, but all the educational resources I developed for the book are available for free, including:

- Book preview PDFs that contain the introductions from each chapter: noBSstats_part1_preview.pdf [223pp, 9MB] and noBSstats_part2_preview.pdf [191pp, 16MB].
- Concept maps from the books: statistics_concepts.pdf
- Book outline (open for comments and suggestions).
- The book's website: noBSstats.com. This website is built from this GitHub repository https://github.com/minireference/noBSstats/, which also contains the datasets from the book, the computational notebooks, the code for producing the figures, and the code for generating synthetic data.
- Run the notebooks in the cloud here: https://mybinder.org/v2/gh/minireference/noBSstats/main
- Free tutorials:
  - Python tutorial https://nobsstats.com/tutorials/python_tutorial.html (see also bit.ly/pytut3)
  - Pandas tutorial https://nobsstats.com/tutorials/pandas_tutorial.html
- I wrote the Python module ministats with statistics helper functions; see the source code here: https://github.com/minireference/ministats.
- A playlist with video tutorials for Chapter 3 (the most complicated stuff).
- Blog posts about statistics:
  - https://minireference.com/blog/fixing-the-statistics-curriculum/
  - https://minireference.com/blog/no-bullshit-guide-to-statistics-progress-update/
  - https://minireference.com/blog/what-stats-do-people-want-to-learn/
  - https://minireference.com/blog/python-for-stats/
  - https://minireference.com/blog/noBSstats-sales-pitch/
  - https://minireference.com/blog/noBSstats-prerelease/

**About the author**

My academic background is in engineering (BEng), physics (MSc), and computer science (PhD). I was a math tutor for two decades, which taught me how to explain concepts intuitively and concisely. I am the author of the textbooks in the No Bullshit Guide series, which are all written in a style that is conversational, jargon-free, and to the point. I started Minireference Publishing to "productize" my tutoring services: the textbooks contain lots of math formulas, but readers get the same Learner User Experience (LUX) as during a tutoring session.

I spent the last seven years learning statistics up to the graduate level so that I can identify which topics and concepts are essential, and which are part of the historical baggage that we can forget.
The No Bullshit Guide to Statistics is the distillation of everything I know about statistics, packaged into a flow that uses all the teaching tricks I've learned from my tutoring days.

**Prerelease bundle and future updates**

The promotional price US$34 is for the eBook bundle that includes both Part 1 and Part 2. This is a 50% discount from the full price, which will be $29+$39=$68 for the two parts. Your purchase of the prerelease includes all future updates (you'll get email notifications via Gumroad when updates are released). The text is in its final form; I just need to add a few more exercises before I call it v1.0 and send it to print.

Gumroad (minireference.gumroad.com)

dnkboston@apobangpo.space
