"Write a function to sort a list of numbers in ascending order. Do not use the built-in sort function."
def sort_numbers(nums):
    return sorted(nums)
Question 4
How should this be evaluated?
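One part of this evaluation, whether the response obeys the "no built-in sort" constraint, can be checked mechanically. The following is a minimal sketch, not a complete checker. The helper name `uses_builtin_sort` is hypothetical, and it only catches direct calls to `sorted(...)` or `.sort(...)`, so an aliased call would slip through.

```python
import ast


def uses_builtin_sort(source: str) -> bool:
    """Return True if the source directly calls sorted(...) or <obj>.sort(...)."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plain call such as sorted(nums)
        if isinstance(func, ast.Name) and func.id == "sorted":
            return True
        # Method call such as nums.sort()
        if isinstance(func, ast.Attribute) and func.attr == "sort":
            return True
    return False


# The LLM response from the prompt above, with indentation restored.
response = """
def sort_numbers(nums):
    return sorted(nums)
"""

print(uses_builtin_sort(response))
```

A check like this covers only instruction-following. Whether the output is actually sorted still needs separate test cases.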
Part 5: Code Quality
Prompt:
"Write a function to calculate the average of a list of numbers."
def f(x):
    t=0
    for i in x:t+=i
    return t/len(x)
Question 5
This code is functionally correct. Rate its quality:
Aspect            | Rating
Variable naming   |
Formatting/style  |
Error handling    |
Documentation     |
Part 6: Evaluating Explanations
Prompt:
"Explain what this code does and fix any bugs:"
def binary_search(arr, target):
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid
    return -1
LLM Response:
"This function performs binary search on a sorted array. It has a bug: the initial value of right should be len(arr) - 1, not len(arr). This could cause an index out of bounds error."
Question 6
Evaluate the LLM's analysis:
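A claim such as "this could cause an index out of bounds error" can be tested rather than taken on trust. The following is a minimal sketch, assuming the function from the prompt with its indentation restored. The helper `probe` and the chosen inputs are illustrative and not part of the original exercise.

```python
def binary_search(arr, target):
    # Code under review, copied unchanged from the prompt.
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid
    return -1


def probe(arr, target):
    """Run one search and report the result, or the exception type if it fails."""
    try:
        return binary_search(arr, target)
    except Exception as exc:  # record any failure instead of crashing
        return type(exc).__name__


# Edge cases: empty list, then targets below, inside, and above the value range.
for arr, target in [([], 1), ([1, 3, 5], 0), ([1, 3, 5], 3), ([1, 3, 5], 9)]:
    print(arr, target, "->", probe(arr, target))
```

Comparing the printed results with the LLM's stated bug gives the evaluation an evidence base instead of relying on inspection alone.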
Part 7: Group Discussion
Question 7
Compare your evaluations with your group. Where did you disagree?
Part 8: Reflection
Question 8
Why is LLM code evaluation difficult?
Key Takeaway
Code evaluation is multi-dimensional and context-dependent:
- "Correct" code may still be poor quality, insecure, or inefficient
- Prompt ambiguity leads to legitimate interpretation differences
- Security and safety concerns may override functional correctness
- Evaluation requires domain expertise and clear rubrics
- Instruction-following and output quality are both important
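
The takeaways above suggest scoring a response along several dimensions rather than with a single pass/fail. A minimal sketch of such a rubric follows. The class name `Evaluation`, the dimension names, the 0 to 2 scale, the example ratings, and the rule that a failed instruction check zeroes the total are all assumptions for illustration, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Scores for one LLM response; each dimension is rated 0 (poor) to 2 (good)."""
    follows_instructions: bool
    scores: dict = field(default_factory=dict)

    def total(self) -> int:
        raw = sum(self.scores.values())
        # Assumed policy: ignoring an explicit prompt constraint zeroes the total.
        return raw if self.follows_instructions else 0


# Hypothetical ratings, for illustration only.
sort_response = Evaluation(
    follows_instructions=False,  # e.g. a response that calls sorted() when told not to
    scores={"correctness": 2, "readability": 2, "error_handling": 1},
)
average_response = Evaluation(
    follows_instructions=True,
    scores={"correctness": 2, "readability": 0, "error_handling": 0},
)

print(sort_response.total(), average_response.total())
```

Other groups may reasonably choose a different policy, such as a fixed penalty instead of a zero. Writing the rule down is what makes disagreements between evaluators visible.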