How to Fix Python Encoding Errors



You tried to read a file or process some text in Python, and you got a confusing error about “encoding,” “UnicodeDecodeError,” or “codec.” These errors look intimidating, but they’re actually quite common and fixable.

Encoding errors happen when Python decodes the bytes of a file or string using a different encoding than the one the text was saved in. Think of it like trying to read a book written in a language you don’t speak: the letters might look like gibberish.

This guide will help you understand and fix Python encoding errors.

What Causes Python Encoding Errors

  • Reading a file with the wrong encoding — The file was saved in one format (like Shift-JIS or Latin-1) but Python is trying to read it as UTF-8 (the most common text format).
  • Special characters in your data — Characters like accented letters (é, ñ), emojis, or non-English text can cause issues if the encoding doesn’t match.
  • Mixing bytes and strings — In Python 3, text (strings) and raw data (bytes) are different types, and converting between them incorrectly causes errors.
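A quick way to see the difference between the two types, and what a mismatched encoding does, is to encode and decode a short string by hand. The word "café" below is just sample text.

```python
text = "café"  # a str: Unicode text

# Encoding turns text into bytes; the bytes depend on the encoding
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (é takes 2 bytes)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9' (é takes 1 byte)

# Decoding with the matching encoding gives the original text back
assert utf8_bytes.decode("utf-8") == text

# Decoding with the wrong encoding can silently produce garbled text...
print(utf8_bytes.decode("latin-1"))  # cafÃ©

# ...or fail with the same error you see when reading files
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("Decode failed:", err)

# Mixing the two types directly is a TypeError in Python 3
try:
    utf8_bytes + text
except TypeError as err:
    print("Can't mix bytes and str:", err)
```

The garbled "cafÃ©" case is worth remembering: a wrong encoding does not always raise an error.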

Fix 1: Specify the Correct Encoding When Reading Files

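The most common encoding error happens when reading files. If you don’t pass an encoding, open() uses your system’s default encoding (often cp1252 on Windows, usually UTF-8 on Linux and macOS), and the file may have been saved in a different one.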

Example of the error:

# This might raise UnicodeDecodeError
with open("data.csv") as f:
    content = f.read()
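To check which encoding open() falls back to on your machine when you leave out encoding=, you can ask the standard locale module. This is a diagnostic, not a fix, and the result varies by operating system and settings.

```python
import locale
import sys

# The encoding open() uses when you don't pass encoding=
default_encoding = locale.getpreferredencoding(False)
print("open() default encoding:", default_encoding)

# 1 if Python's UTF-8 mode is on (python -X utf8, or PYTHONUTF8=1),
# which makes UTF-8 the default regardless of the system locale
print("UTF-8 mode:", sys.flags.utf8_mode)
```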

Step 1: Try specifying UTF-8 explicitly (this fixes many cases).

# Explicitly specify UTF-8 encoding
with open("data.csv", encoding="utf-8") as f:
    content = f.read()
print("File read successfully!")

If the file reads without errors, you’re done.
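If you work with pathlib instead of open(), the same fix applies: Path.read_text() and Path.write_text() also take an encoding argument. A minimal sketch; the file name sample.csv and its contents are made up so the example runs on its own without touching your real data.csv.

```python
from pathlib import Path

path = Path("sample.csv")  # hypothetical file name for this example

# Write a small sample file with non-ASCII characters
path.write_text("name,city\nJosé,Zürich\n", encoding="utf-8")

# Read it back, stating the encoding explicitly
content = path.read_text(encoding="utf-8")
print(content)
```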

Step 2: If UTF-8 doesn’t work, try other common encodings.

# Try Latin-1 (also called ISO-8859-1) — common for European text
with open("data.csv", encoding="latin-1") as f:
    content = f.read()

# Try Shift-JIS — common for Japanese text
with open("data.csv", encoding="shift_jis") as f:
    content = f.read()

# Try CP1252 — common for Windows files
with open("data.csv", encoding="cp1252") as f:
    content = f.read()

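Try each one until the file reads without errors and the text looks correct. Latin-1 never raises an error, because it maps every possible byte to some character, so with Latin-1 you have to check the text itself for garbled characters.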
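Instead of editing the encoding by hand each time, you can loop over a list of candidates. This is a sketch: read_with_fallback is a hypothetical helper, and the order of encodings is an assumption you should adjust (for example, add "shift_jis" near the front if you expect Japanese text). Latin-1 goes last because it accepts any bytes.

```python
def read_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding) for the first that works."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"None of {encodings} could decode {path}")


# Example: create a small cp1252 file, then read it back
with open("sample_cp1252.txt", "wb") as f:
    f.write("café".encode("cp1252"))

text, used = read_with_fallback("sample_cp1252.txt")
print(text, "decoded with", used)
```

Even when the helper returns without an error, look at the text before trusting it: a wrong encoding that happens to accept the bytes still produces garbled characters.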

Step 3: If you’re not sure what encoding the file uses, detect it automatically.

# First, install the chardet library
# pip install chardet

import chardet

# Read the file as raw bytes first
with open("data.csv", "rb") as f:
    raw_data = f.read()

# Detect the encoding (chardet makes an educated guess)
detected = chardet.detect(raw_data)
print(f"Detected encoding: {detected['encoding']}")
print(f"Confidence: {detected['confidence']}")

# chardet returns None when it can't guess, so fall back to UTF-8
encoding = detected["encoding"] or "utf-8"

# Now read with the detected encoding
with open("data.csv", encoding=encoding) as f:
    content = f.read()

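chardet’s result is a guess, reported with a confidence between 0 and 1. If the confidence is low or the text still looks garbled, try the encodings from Step 2 by hand. If the text reads correctly, you’ve found the right encoding.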

Fix 2: Handle Encoding Errors Gracefully

Sometimes you just need to read the file even if a few characters are broken. The errors parameter of open() controls what Python does with bytes it can’t decode.

Step 1: Use errors="replace" to replace unreadable characters with a placeholder.

# Replaces undecodable bytes with the Unicode replacement character (�)
with open("data.csv", encoding="utf-8", errors="replace") as f:
    content = f.read()
print(content)

If the file reads and most of the text looks correct, this approach works for your use case.

Step 2: Use errors="ignore" to skip unreadable characters entirely.

# Silently skips characters that can't be decoded
with open("data.csv", encoding="utf-8", errors="ignore") as f:
    content = f.read()

Note: This silently throws data away, so only use it when losing a few characters doesn’t matter.
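To compare the error handlers side by side without a file, you can apply them to an in-memory byte string with bytes.decode(), which accepts the same errors values as open(). The sample bytes are made up, and backslashreplace is an extra built-in handler not covered above: it keeps the bad bytes visible as escape codes, which helps when you need to find them later.

```python
# Bytes that are NOT valid UTF-8: 0xE9 is "é" in Latin-1/cp1252
raw = b"caf\xe9 au lait"

replaced = raw.decode("utf-8", errors="replace")           # caf� au lait
ignored = raw.decode("utf-8", errors="ignore")             # caf au lait
escaped = raw.decode("utf-8", errors="backslashreplace")   # caf\xe9 au lait

print(replaced)
print(ignored)
print(escaped)
```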

Fix 3: Fix Encoding When Writing Files

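Writing can fail too. If the text contains characters that the target encoding can’t represent, Python raises UnicodeEncodeError. This is common on Windows systems whose default is a legacy code page such as cp1252, which has no Japanese characters or emoji.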

Step 1: Always specify UTF-8 when writing files.

# Write with UTF-8 encoding to support all characters
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Hello, café! こんにちは 🎉")
print("File written successfully!")

If the file is created and the text looks correct when you open it, you’re good.
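The sketch below reproduces the failure you would hit by writing the same text without encoding="utf-8" on a system whose default is cp1252 (cp1252 is an assumption standing in for a legacy Windows default). Calling str.encode() directly shows the error without creating a file.

```python
text = "Hello, café! こんにちは 🎉"

# cp1252 has "é" but no Japanese characters or emoji
try:
    text.encode("cp1252")
except UnicodeEncodeError as err:
    print("Can't encode:", err)

# UTF-8 can encode any Unicode character
encoded = text.encode("utf-8")
print(len(text), "characters ->", len(encoded), "bytes in UTF-8")
```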

Step 2: On Windows, if print() raises UnicodeEncodeError when you print special characters to the console:

# Make print() send UTF-8 to the console (Python 3.7+)
# Put this at the top of your script, before any print() calls
import sys

sys.stdout.reconfigure(encoding="utf-8")

Whether the characters then display correctly still depends on the console and its font.
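If you’d rather keep the console’s own encoding and just stop print() from crashing, sys.stdout.reconfigure(errors="replace") (Python 3.7+) makes characters the console can’t encode come out as "?". The sketch below shows that effect on an in-memory stream set to cp1252, an assumption standing in for a Windows console, so it runs without touching your real output.

```python
import io

# An in-memory stand-in for a console whose encoding is cp1252
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp1252", errors="replace")

# Characters cp1252 can't represent are written as "?" instead of raising
console.write("café こんにちは")
console.flush()

written = buffer.getvalue()
print(written)  # b'caf\xe9 ?????'
```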

What to Do If It Still Doesn’t Work

  • Check the file source — If you downloaded the file, check if the website or tool offers a UTF-8 version.
  • Open in a text editor — Programs like Notepad++ or VS Code can show you the file’s encoding and let you convert it. In VS Code, check the bottom-right corner.
  • Re-save the file as UTF-8 — Open the file in your text editor and save it with UTF-8 encoding. In VS Code, click the encoding in the bottom-right and choose “Save with Encoding.”
  • Check Python version — Make sure you’re using Python 3, which handles Unicode much better than Python 2.
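If you prefer to re-save files from Python rather than in an editor, the conversion is a read in the old encoding followed by a write in UTF-8. A minimal sketch: convert_to_utf8 and the file names are hypothetical, and you must already know the source encoding (for example, from the chardet step above). Writing to a new file keeps the original safe if the guess was wrong.

```python
def convert_to_utf8(src, dst, source_encoding):
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    # newline="" keeps the original line endings unchanged
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Example: create a small cp1252 file, then convert it
with open("legacy.txt", "wb") as f:
    f.write("Zürich, café".encode("cp1252"))

convert_to_utf8("legacy.txt", "legacy_utf8.txt", "cp1252")

with open("legacy_utf8.txt", "rb") as f:
    converted = f.read()
print(converted)
```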

Summary

  • Python encoding errors happen when the text format doesn’t match what Python expects.
  • The most common fix is to add encoding="utf-8" when opening files.
  • If you don’t know the encoding, use the chardet library to detect it automatically.
