Q&A with Becoming a Teacher of Statistics Class




March 28, 2018


This last Tuesday (03/27/2018) I was invited to do a Q&A with the students in the EPsy 5271 Becoming a Teacher of Statistics course. This class also met via video-link with a similar course at Penn State. The students asked thoughtful (sometimes difficult) questions and I tried to answer them. I asked them if I could blog out their questions and my responses and they kindly said “yes”. Rather than respond to all of them at once, I thought I would use this opportunity to create several blog posts. So without further ado, here we go.

Is there a distinction between a statistics curriculum that is rich with computational experience, and a data science curriculum that includes statistical theory? Where do you think education in the fields of data science and statistics is headed, both separately and in relation to each other?

This is a tough question. My guess is yes, and the distinction is in the emphasis and the theoretical understandings. My other hypothesis is that the distance between the two diminishes with experience gained beyond your formal education. Let me give an example: Consider two recent graduates, one from a statistics program (but with many computational experiences) and the other from a CS program (but with many data/stat related experiences), who both work in a DS company.

Both graduates are equipped to handle a DS job, but in different ways. For the stat major, computation is the tool and the problem is approached through their stat training (design, analysis). For the CS major it is likely the reverse: the statistical methods are the tool that informs their approach to tool building, or coding.

In 10 years' time, while their approaches to the work may still be slightly different, the discrepancies between the “students” have likely all but vanished as they grow and gain experience (skills that are required strengthen and those that aren’t weaken).

From an educational perspective, it is unclear that one approach is better than the other. Although many people have opinions about this, there is no empirical evidence to suggest one approach is better (at least so far as I am aware). Looking at undergraduate data science programs around the country, different institutions take different approaches to this. (Often the approach is a function of which department the faculty who started the program are based in, rather than a well-planned best approach for students.)

What strategies would you recommend for implementing student centered/cooperative learning in the setting of very large lecture sizes?

There are probably several strategies to make this work in a large class. For example, you could have students do a think-pair-share in which they (1) work for a minute individually, (2) pair with a fellow student to work, and then (3) share their work with another pair of students. I would also change the way in which TAs are utilized in those courses, for example by making sure they are in all classes (lectures and labs) rather than just teaching the lab sections.

Jacobs and Inn (2003) wrote a book chapter, Using Cooperative Learning in Large Classes, that addresses this very question. Rhonda Magel published an article, Using Cooperative Learning in a Large Introductory Statistics Class, in the Journal of Statistics Education related to cooperative learning in statistics courses.

Many universities have a Center for Teaching and Learning. This is a wonderful resource for instructors, and its staff would probably welcome working with you to implement these types of pedagogies in your courses. Often these centers also have online resources. For example, the University of Waterloo’s Centre for Teaching Excellence has an online resource called Activities for Large Classes.

I see that you are interested in Data Science and have been successfully funded to promote statistics education. Have you applied or been funded through NIH? Do you plan to be involved in the input on the Draft for Strategic Plan for Data Science?

I have never personally been funded through NIH. We primarily seek funding through NSF. (BioSQuaRE was funded through the Howard Hughes Foundation.) To be honest, I hadn’t heard about this document until your question, so thank you for bringing it to my attention.

After reading through the document, my take is that it seems to be addressing how data science can be used in research. My interest in data science is more on the education side of things. For example, I was on a committee through the National Academies of Science that was set up to study undergraduate data science. If you are interested, you can read our report, Envisioning the Data Science Discipline: The Undergraduate Perspective.

While what happens in research and what is taught in the classroom are related, how the health sciences should approach the use of data science in their research is beyond my scope.