An academic integrity-friendly code pal for R Studio

How to plug in an LLM to help – but not too much – in a world that wants to cheat

Categories: AI, code, r, education

Author: Matt Waite

Published: November 26, 2024

One of the struggles on campus these days is all about where to draw the lines when it comes to AI in the classroom. There’s no end of discussion about students using ChatGPT to cheat, particularly on writing assignments. How do you stop it? How do you adapt to it? How do you convince students to do the work?

Teaching students to write code is no different. I add a layer of difficulty in that I teach journalism and sports media students how to code. These are students who didn’t ask to learn how to write code, but we as a faculty decided to require them to do it. Thus, they have incentives to cheat. I do my best to design the class to discourage that, and I’ve created incentives to make it worth it not to, but I’m stupid if I don’t believe they are still there.

But I’m also a bit dim if I don’t acknowledge that Large Language Models can help with learning how to code. The trick is, once again, where to draw the line.

The classes I teach are all data analysis in R using the tidyverse and R Studio as the IDE. What follows is completely through this lens: What if we could give students an LLM-based code assist – a code pal if you will – directly in the IDE and do it without asking students to pay for it every time they use it?

Getting started

There are a bunch of steps to get this set up, and it’s going to take a decent chunk of your hard drive when all is said and done. Doing this also requires a decent amount of computing power. How much? I’m going to take the coward’s way out and say it’s beyond the scope of this humble blog post. Others are better at this than I am, and I’m not confident enough in my knowledge to say what works on which platform. I’m doing this on an M1 MacBook Pro with 16 GB of RAM. Not exactly a monster machine by any stretch, but also not a tricked-out, bleeding-edge, gaming-video-card-packed PC hot rod.

Step 1: The first thing you need is Ollama. We’ll use it to download, manage and serve up our local LLM. Install it per your operating system. The LLM we’ll be using today is qwen2.5-coder. Once you have Ollama up and running, the command ollama run qwen2.5-coder will download the model and start it.

That will install the 7B version – the 7-billion-parameter model. That should run and give you decent performance on just about anything. If you’ve got more muscle, you might look at installing one of the bigger-parameter models – for example, ollama run qwen2.5-coder:14b pulls the 14-billion-parameter version. Generally, the more parameters, the better the results.

Step 2: The next thing you need is pal, an R library that adds a way to consult an LLM inside R Studio. To install it, you can use pak as the instructions show, or you can use devtools::install_github("simonpcouch/pal") like I did, because I haven’t gotten into the habit of using pak. Once it’s installed, you can skip the parts about adding an Anthropic API key – unless you want to use Claude and have API credits to spend – and go to the Get Started article. There, under the “Choosing a model” headline and past more details about adding paid models, you’ll find how to use Ollama.
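If you go the devtools route like I did, the one-time install is a single call in the R console:

# install.packages("devtools")   # uncomment if you don't already have devtools
devtools::install_github("simonpcouch/pal")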

The least confusing way to point pal at Ollama, in my opinion, is to add a couple of options to your .Rprofile. In the R console, run usethis::edit_r_profile() and add this:

options(
  .pal_fn = "chat_ollama",                    # the chat function pal should use – here, Ollama
  .pal_args = list(model = "qwen2.5-coder")   # the model tag you pulled with Ollama
)

NOTE: If you installed a bigger model than I did, you need to specify that model’s tag in .pal_args – notice that mine doesn’t include a parameter size. If you installed the 14b model, for example, your .pal_args should say "qwen2.5-coder:14b" instead of just "qwen2.5-coder", as in the sketch below. Save the file and restart R Studio so the changes take effect.
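Assuming you pulled the 14b tag – the model name just needs to match whatever ollama list reports on your machine – the block would look like this:

options(
  .pal_fn = "chat_ollama",
  .pal_args = list(model = "qwen2.5-coder:14b")   # the exact tag you pulled with Ollama
)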

You’re almost ready to get started. Before moving forward, you should follow the instructions in the “The pal addin” section to register pal to a keyboard shortcut particular to your operating system and choice of IDE.

Writing your own pal

The mechanics of writing your own pal could not be easier, thanks to the library. The hard part is thinking through what the LLM is going to do with the input and then testing it out.

Let’s make an Academic Integrity Friendly pal that tries to create friendlier and more helpful error messages.

In the R console, run library(pal) and then prompt_new("whatiswrong", "suffix") to create a new pal.

The first argument is the name you’re giving your pal; the second is where it’s going to put the results. You can use “replace” to … well … replace what you highlight. You can use “prefix” to put the response above your code. And “suffix” puts it after your code. We want ours to act like an error message, so suffix makes sense.
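Put together, creating this pal is two lines in the console:

library(pal)

# Name the pal "whatiswrong" and have its response land after ("suffix") the
# highlighted code, so it reads like a friendlier error message.
prompt_new("whatiswrong", "suffix")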

Do that and a markdown file will pop up. It’s templated, so it could not be easier to fill out. Here’s what I’m using to make my pal:

You are a terse assistant designed to help R users debug code. Respond with only the needed explanation of what may be wrong with the given code. Do not write code for the user, just explain in plain language.
As an example, given:
df |> filter(column_name = "word")
Return:
When using a filter, you must use == for equal to instead of =.

Save it and then run directory_load() to get your pal in the shortcut menu.

Using your pal

Using your pal is now just a matter of messing up some code. Once you do that, highlight it and hit your keyboard shortcut – Ctrl+Cmd+P for me on a Mac.

Here’s a sketch of what it looks like – the data frame and column below are invented, and the model’s exact wording will vary:
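# Highlight some broken code – say, the classic = instead of == inside filter():
mlb |>
  filter(home_team = "Nebraska")

# Hit the keyboard shortcut, choose the whatiswrong pal, and it appends a
# plain-language explanation after the highlighted code, along the lines of:
# "When using a filter, you must use == for equal to instead of =."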

I need to use it more to know if it’s going to be any good. I need to try it with crappier code and more complex errors. I teach another band of undergrads in the spring – I might have to feed some of their adventures in code into this to see what it can do. Also worth trying? Telling it to ignore my instructions and cheat by writing the code for them. Will it listen to the student or me? If it doesn’t listen to me, then what’s the point?

A note on equity in the classroom

I’m lucky in that I have a relatively recent laptop with a decent amount of power provided by my employer. Do I want a newer faster one? Sure I do. Every nerd does the second a new one is announced. But I have a good enough machine to do this.

Not everyone does.

Every semester, the first day of class, I assign all the installations they’ll need for the semester. Step 1 is to update your operating system. Every semester, this assignment is an exercise in perspective for me. My college has a laptop requirement. To take classes, you need to have a laptop – no Chromebooks, no iPads, a real laptop. What it is, we don’t care, so long as it can run the Adobe Creative Suite. Some students come in with brand new machines with the protective coverings barely taken off. And then I get some that are held together with duct tape and prayer. Machines with keys missing. Machines that are 6 years old and never once updated. Have you ever had to go find out how to install a four-versions-ago macOS so you can start moving toward something more modern? I have.

All this to say: I’m talking about academic integrity here, but I am not talking about academic equity. I can’t assign this. I can’t make this part of a class. Even at an R1 flagship school, I can guarantee that a quarter to a half of the students in the class don’t have the computing power or the disk space to run this. It’s going to be far worse elsewhere.

Someday, maybe, but not today.