Opening Thoughts
Have you ever run into situations where code you've just finished keeps throwing unexpected errors at runtime? What's more frustrating is when the error messages are cryptic and you don't know where to start. As a programmer who has been writing Python for over ten years, I relate to this deeply. Today, I'd like to share some insights I've gained while debugging Python code.
Fundamentals
When it comes to debugging, many people's first instinct is to reach for print. True, it's the most intuitive approach. I remember when I first started learning Python, I would put print statements everywhere. But have you ever wondered why many experienced programmers advise against over-relying on print?
Let's look at a real case. The other day, while helping a student debug, I saw their code filled with statements like this:
print("debug1")
result = complex_calculation()
print("debug2")
print(result)
print("debug3")
What's wrong with this debugging approach? First, these print statements make the code very messy. Second, if your program is complex, the printed information will be overwhelming, making it like finding a needle in a haystack.
I recommend using Python's logging module instead of print:
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def complex_calculation():
    logger.debug("Starting calculation")
    result = 42  # Placeholder for the actual calculation
    logger.debug(f"Calculation result: {result}")
    return result
What are the benefits? First, you can set different log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) to filter information. Second, logs can include timestamps, function names, and other context, which is particularly helpful for pinpointing problems.
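For example, a single basicConfig call is enough to stamp every message with a timestamp, level, and function name (the format string below is just one reasonable choice):
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(funcName)s: %(message)s",
)
# A debug message from complex_calculation() now comes out roughly as:
# 2024-05-01 10:32:07,315 [DEBUG] complex_calculation: Starting calculation
Switching the level to logging.INFO later silences all the DEBUG noise without touching a single call site.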
Advanced Techniques
Speaking of this, I must mention Python's built-in debugger, pdb. I remember when I first discovered pdb, it felt like finding a new world. Did you know that pdb lets you pause your program mid-execution, inspect the values of all current variables, and even modify them on the fly?
Here's a specific example:
def calculate_average(numbers):
    import pdb; pdb.set_trace()  # Set breakpoint
    total = 0
    count = 0
    for num in numbers:
        if num > 0:
            total += num
            count += 1
    return total / count

numbers = [1, -2, 3, -4, 5]
result = calculate_average(numbers)
When the program reaches pdb.set_trace(), it automatically pauses. At this point you can:
- Enter n to execute the next line
- Enter p variable_name to view a variable's value
- Enter c to continue running
- Enter q to quit debugging
I particularly like using pdb because it lets me watch the code execute firsthand. Sometimes reading the code alone doesn't reveal why an error occurs, but stepping through it with pdb often makes the problem clear.
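One pattern I find especially handy is dropping into the debugger only when a suspicious condition actually occurs, instead of on every call. Here's a minimal sketch reusing the calculate_average example from above (the count == 0 condition is just an illustration):
def calculate_average(numbers):
    total = 0
    count = 0
    for num in numbers:
        if num > 0:
            total += num
            count += 1
    if count == 0:
        # Pause only when we are about to divide by zero
        import pdb; pdb.set_trace()
    return total / count
This way the breakpoint stays out of the way on all the calls that behave, and fires exactly on the one that doesn't.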
Practical Experience
After all this theory, let's get practical. Last year, I encountered a particularly tricky problem while working on a data processing project. The program needed to process large amounts of data but would crash at certain points, with the crash location varying each time.
Initially, I reached for print again, but the output was overwhelming and I couldn't pin down the problem. Later, I tried a different technique: using binary search to isolate the failing data. Specifically:
def process_data(data):
    midpoint = len(data) // 2
    try:
        # Process first half
        process_subset(data[:midpoint])
        # If first half is okay, process second half
        process_subset(data[midpoint:])
    except Exception as e:
        if len(data) <= 10:  # Print when data size is small enough
            print(f"Error occurred with data: {data}")
            raise e
        else:
            print(f"Error in subset of size {len(data)}")
            # Recursively process smaller datasets
            process_data(data[:midpoint])
            process_data(data[midpoint:])
Using this method, I quickly discovered the problem was with some special characters in the data. This made me realize that debugging is like detective work, requiring methodical collection of clues and gradually narrowing down the suspects.
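To make the bisection concrete, here is a small sketch of how the function above behaves; process_subset is a hypothetical stand-in that simply chokes on non-ASCII text, playing the role of the real processing code:
def process_subset(subset):
    # Hypothetical stand-in for the real processing: fails on non-ASCII entries
    for item in subset:
        item.encode("ascii")

data = [f"record_{i}" for i in range(100)]
data[73] = "récord_73"  # the hidden troublemaker

# Prints ever-smaller failing subsets, then the <=10-item subset containing
# the bad record, and finally re-raises the underlying error
process_data(data)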
Common Pitfalls
Speaking of debugging, I must mention some common pitfalls. Over the years, I've found many Python beginners encounter similar issues.
For example, variable scope issues:
total = 0
def add_to_total(number):
total += number # This will raise an error
return total
This code looks fine but raises an UnboundLocalError. Why? Because the assignment inside the function makes Python treat total as a local variable for the whole function body, so it no longer refers to the outer total, and it's read before it has ever been assigned. The fix is to declare it global:
total = 0

def add_to_total(number):
    global total
    total += number
    return total
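That said, global state is easy to misuse. A pattern I usually prefer, purely as a sketch and not required by the fix above, is to pass the running total in and return the new value:
def add_to_total(total, number):
    # No shared state: the caller keeps track of the total
    return total + number

total = 0
total = add_to_total(total, 5)   # total is now 5
total = add_to_total(total, 10)  # total is now 15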
Another example is shallow copy vs deep copy:
original_list = [[1, 2, 3], [4, 5, 6]]
copied_list = original_list.copy() # Shallow copy
copied_list[0][0] = 999
print(original_list) # [[999, 2, 3], [4, 5, 6]]
Many people are surprised to find that modifying copied_list also changes original_list. That's because copy() only copies the outer list; the inner lists are still shared references. To solve this, you need a deep copy:
import copy
original_list = [[1, 2, 3], [4, 5, 6]]
copied_list = copy.deepcopy(original_list) # Deep copy
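A quick check using the same lists confirms the difference: the deep copy can be modified without touching the original.
copied_list[0][0] = 999
print(copied_list)    # [[999, 2, 3], [4, 5, 6]]
print(original_list)  # [[1, 2, 3], [4, 5, 6]]  (unchanged this time)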
Tool Recommendations
At this point, I must recommend some debugging tools I frequently use. Besides pdb mentioned earlier, there are:
- PyCharm's debugger: Provides a graphical interface, making it more intuitive. You can set conditional breakpoints, view variables, and even modify code during debugging.
- memory_profiler: Used for analyzing memory usage. It's simple to use:
from memory_profiler import profile

@profile
def my_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
After running, it will show the line-by-line memory usage of the decorated function.
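To actually see that report, the decorated function has to be called. Assuming the snippet above is saved as example.py (a filename chosen purely for illustration), add a call at the bottom:
# At the bottom of example.py
if __name__ == "__main__":
    my_function()
Then run it with python -m memory_profiler example.py and the per-line report is printed when my_function returns.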
- cProfile: Used for performance analysis. For example:
import cProfile

def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

cProfile.run('slow_function()')
This prints how many times each function was called and how much time was spent in it, helping you find performance bottlenecks.
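When the output gets long, I sort it with the standard pstats module. A small sketch, building on slow_function above ('profile_output' is just an example filename):
import cProfile
import pstats

# Write the raw stats to a file instead of printing them immediately
cProfile.run('slow_function()', 'profile_output')

# Load, sort by cumulative time, and show the 10 most expensive entries
stats = pstats.Stats('profile_output')
stats.sort_stats('cumulative').print_stats(10)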
Insights
After many years of Python programming experience, I've summarized several debugging insights:
- Think before debugging: When encountering problems, don't rush to debug. Spend a few minutes thinking about possible causes first, so you'll have direction when debugging.
- Keep code clean: Good code structure itself can reduce bugs. I often see functions hundreds of lines long, which are difficult to debug. It's recommended to split large functions into smaller ones, with each function doing only one thing.
- Write good unit tests: Many bugs can actually be found when writing unit tests. For example:
import unittest

def calculate_average(numbers):
    return sum(numbers) / len(numbers)

class TestCalculateAverage(unittest.TestCase):
    def test_normal_case(self):
        self.assertEqual(calculate_average([1, 2, 3]), 2.0)

    def test_empty_list(self):
        with self.assertRaises(ZeroDivisionError):
            calculate_average([])

if __name__ == "__main__":
    unittest.main()
- Make good use of version control: I've seen many people randomly modify code while debugging, eventually making it a mess. Using version control systems like Git allows you to modify code confidently because you can always return to previous versions.
Conclusion
Debugging is a skill every programmer must master. Just as traditional Chinese medicine emphasizes looking, listening, asking, and feeling, program debugging also requires multiple approaches and flexible use of various tools and methods. Do you have any unique debugging insights? Feel free to share in the comments.
By the way, if you're interested in any specific debugging technique, let me know, and we can discuss it further. After all, on this programming journey, we're all fellow travelers, and sharing and communication help us progress together.