Working with Files in Python
Have you ever worked with a large number of files and found yourself feeling overwhelmed by the task at hand? Python has a variety of powerful tools that can help you manage your files in a more efficient and organized way.
In this article, we will explore some of the most commonly used file-handling techniques in Python and demonstrate how they can be used to streamline your workflow.
Python’s “with open() as …” pattern
Let’s start with the basics.
Opening and reading files is an essential part of working with Python. In Python, you can use the “with open() as …” pattern to open a file and automatically close it when you are done.
Here is an example:
with open('filename.txt', 'r') as file:
    contents = file.read()
In the above example, we have opened a file named ‘filename.txt’ in read mode and assigned the contents of the file to a variable called ‘contents’. When the ‘with’ block ends, the file is automatically closed.
You can also use this pattern to write to files by using the ‘w’ mode instead of ‘r’. Be aware that ‘w’ truncates an existing file. Use ‘a’ if you want to append to it instead.
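The following minimal sketch writes, appends, and reads back a file. It works inside a throwaway temporary directory so it can run anywhere. The file name ‘notes.txt’ is hypothetical, and the explicit UTF-8 encoding is an assumption rather than something the modes require.

```python
import os
import tempfile

# Work inside a throwaway directory so the example can run anywhere
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'notes.txt')  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w', encoding='utf-8') as file:
        file.write('first line\n')

    # 'a' appends to the end instead of truncating
    with open(path, 'a', encoding='utf-8') as file:
        file.write('second line\n')

    # Read everything back as a list of lines
    with open(path, 'r', encoding='utf-8') as file:
        lines = file.readlines()

print(lines)  # ['first line\n', 'second line\n']
```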
Getting a Directory Listing
When working with multiple files, it is often helpful to get a list of all the files in a directory. Python offers several ways to do this, and which one you reach for depends partly on the version of Python you are using.
Directory Listing in Legacy Python Versions
The ‘os.listdir()’ function is available in every version of Python, including legacy 2.x releases, so it is the option to use on older interpreters. Here is an example:
import os
files = os.listdir('/path/to/directory')
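In the above example, the ‘os.listdir()’ function takes a path to a directory as an argument. It returns a list of the names of every entry in that directory, including subdirectories, in arbitrary order. The special entries ‘.’ and ‘..’ are not included.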
Directory Listing in Modern Python Versions
If you are using Python 3.5 or later, you also have access to ‘os.scandir()’. Instead of bare names, this function returns an iterator of ‘os.DirEntry’ objects. Each object exposes the entry’s name, its path, and information about its type, which makes it easier to work with directories and files.
Here is an example:
import os
with os.scandir('/path/to/directory') as entries:
    for entry in entries:
        print(entry.name, entry.path)
In the above example, the ‘os.scandir()’ function returns an iterable that we can use to loop through all the directory entries and print their names and paths.
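Because each ‘DirEntry’ carries type information, you can separate files from subdirectories without building full paths yourself. This is a sketch that assumes a hypothetical sample directory, created in a temporary location, containing one file and one subdirectory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical sample contents: one file and one subdirectory
    open(os.path.join(workdir, 'report.txt'), 'w').close()
    os.mkdir(os.path.join(workdir, 'archive'))

    files, dirs = [], []
    with os.scandir(workdir) as entries:
        for entry in entries:
            # DirEntry.is_file()/is_dir() can often answer from data gathered
            # during the directory scan, without an extra system call
            if entry.is_file():
                files.append(entry.name)
            elif entry.is_dir():
                dirs.append(entry.name)

print(files, dirs)  # ['report.txt'] ['archive']
```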
Listing All Files in a Directory
Once you have a list of all the files in a directory, you might want to filter out certain files based on their names or extensions. Here are a few ways to do that in Python:
import os
# List all txt files in a directory
txt_files = [filename for filename in os.listdir('/path/to/directory') if filename.endswith('.txt')]
# List all files in a directory using pathlib
from pathlib import Path
all_files = [entry.name for entry in Path('/path/to/directory').iterdir() if entry.is_file()]
In the first example, we use a list comprehension to keep only the names that end with ‘.txt’. A directory whose name happens to end in ‘.txt’ would also be included. In the second example, we use the ‘pathlib’ module to iterate over every entry in the directory, and the ‘is_file()’ method keeps only the files, excluding directories.
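The two approaches can be combined with ‘Path.glob()’, which matches names against a shell-style pattern. This is a sketch assuming a hypothetical temporary folder that includes a directory named ‘old.txt’, to show why the ‘is_file()’ check matters:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    folder = Path(workdir)
    # Hypothetical sample files
    for name in ('a.txt', 'b.txt', 'c.csv'):
        (folder / name).write_text('sample', encoding='utf-8')
    (folder / 'old.txt').mkdir()  # a directory whose name ends in .txt

    # glob() matches names against a shell-style pattern; is_file() drops directories
    txt_files = sorted(p.name for p in folder.glob('*.txt') if p.is_file())

print(txt_files)  # ['a.txt', 'b.txt']
```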
Getting File Attributes
Sometimes you might need to get more information about a file, such as its size, permissions, or modification time. The ‘os.stat()’ function returns a ‘stat_result’ object that contains all this information and more.
Here is an example:
import os
file_stats = os.stat('/path/to/file.txt')
print(file_stats.st_size, file_stats.st_mode, file_stats.st_mtime)
In the above example, we print three values: the file size in bytes, the mode (file type and permission bits), and the modification time as a Unix timestamp (seconds since the epoch).
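The raw values are not very readable. The following sketch converts them with the standard ‘stat’ and ‘datetime’ modules. It uses a hypothetical file created in a temporary directory:

```python
import datetime
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'file.txt')  # hypothetical file
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello')  # 5 bytes

    info = os.stat(path)
    size = info.st_size                      # size in bytes
    mode = stat.filemode(info.st_mode)       # e.g. '-rw-r--r--'
    is_regular = stat.S_ISREG(info.st_mode)  # True for a regular file
    modified = datetime.datetime.fromtimestamp(info.st_mtime)  # local time

print(size, mode, is_regular, modified.isoformat())
```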
Making Directories
Python also provides functions for creating directories. Here are two ways to create directories:
Creating a Single Directory
import os
os.mkdir('/path/to/newdirectory')
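In the above example, we use the ‘os.mkdir()’ function to create a new directory at the specified path. The parent directory must already exist, and ‘os.mkdir()’ raises ‘FileExistsError’ if the directory is already there.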
Creating Multiple Directories
import os
os.makedirs('/path/to/newdirectory/moredirectories')
In the above example, we use the ‘os.makedirs()’ function, which creates the final directory along with any missing intermediate directories in the path.
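If the directory might already exist, the ‘exist_ok=True’ argument makes a repeated call harmless. ‘pathlib’ offers the same behaviour through ‘Path.mkdir(parents=True, exist_ok=True)’. The sketch below creates hypothetical nested directories inside a temporary location:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    nested = os.path.join(workdir, 'newdirectory', 'moredirectories')

    # Creates every missing parent; exist_ok=True makes a repeat call a no-op
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)  # no FileExistsError the second time

    # The pathlib equivalent
    other = Path(workdir) / 'a' / 'b' / 'c'
    other.mkdir(parents=True, exist_ok=True)

    created = os.path.isdir(nested) and other.is_dir()

print(created)  # True
```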
Filename Pattern Matching
Sometimes you might want to filter out files based on more complex patterns than just their file extensions. There are a few different ways to do this in Python.
Using String Methods
import os
files = os.listdir('/path/to/directory')
filtered_files = [filename for filename in files if filename.startswith('prefix_') and filename.endswith('_suffix')]
In the above example, we use the ‘startswith()’ and ‘endswith()’ string methods to keep only the names that start with ‘prefix_’ and end with ‘_suffix’.
Filename Pattern Matching Using fnmatch
import os
import fnmatch
files = os.listdir('/path/to/directory')
filtered_files = fnmatch.filter(files, 'pattern*')
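In the above example, we use the ‘fnmatch.filter()’ function to keep only the names that match the shell-style pattern ‘pattern*’, that is, names beginning with ‘pattern’. On case-insensitive platforms such as Windows, ‘fnmatch.filter()’ also ignores case when matching.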
More Advanced Pattern Matching
import os
import fnmatch
files = os.listdir('/path/to/directory')
filtered_files = [filename for filename in files if fnmatch.fnmatchcase(filename, '*pattern[!0-9]*.txt')]
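In the above example, we use ‘fnmatch.fnmatchcase()’, which always compares case-sensitively, to keep only the names that match ‘*pattern[!0-9]*.txt’. This is a shell-style wildcard, not a regular expression. Here, ‘*’ matches any run of characters and ‘[!0-9]’ matches a single character that is not a digit. The pattern therefore selects names that contain ‘pattern’ immediately followed by a non-digit character and that end with ‘.txt’.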
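When wildcards are not expressive enough, the ‘re’ module can do the same job. The sketch below compares the wildcard from the example with a roughly equivalent regular expression on a few hypothetical file names:

```python
import fnmatch
import re

names = ['pattern_a.txt', 'pattern1.txt', 'my_patternX_notes.txt', 'pattern_a.csv']

# Shell-style wildcard, as in the example above
by_wildcard = [n for n in names if fnmatch.fnmatchcase(n, '*pattern[!0-9]*.txt')]

# A roughly equivalent regular expression: [^0-9] is "one non-digit character"
regex = re.compile(r'.*pattern[^0-9].*\.txt')
by_regex = [n for n in names if regex.fullmatch(n)]

print(by_wildcard)  # ['pattern_a.txt', 'my_patternX_notes.txt']
```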
Traversing Directories and Processing Files
When working with files, you might need to traverse through a directory tree and process each file. There are a few different ways to do this in Python.
os.walk()
import os
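for root, dirs, files in os.walk('/path/to/directory'):
    for file in files:
        # 'files' holds bare names, so join them with 'root' for a full path
        path = os.path.join(root, file)
        print(path)  # Process each file here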
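In the above example, we use the ‘os.walk()’ function to traverse all the directories in the specified tree. For each directory, it yields a tuple containing the directory’s path, the names of its subdirectories, and the names of its files.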
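As a slightly fuller sketch, the loop below counts files by extension across a whole tree. The sample tree is hypothetical and is built in a temporary directory so the code can run as-is:

```python
import os
import tempfile
from collections import Counter

with tempfile.TemporaryDirectory() as top:
    # Hypothetical sample tree: top/a.txt, top/sub/b.txt, top/sub/c.log
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt'), os.path.join('sub', 'c.log')):
        with open(os.path.join(top, rel), 'w', encoding='utf-8') as f:
            f.write('x')

    counts = Counter()
    for root, dirs, files in os.walk(top):
        for name in files:
            full_path = os.path.join(root, name)  # names are relative to root
            extension = os.path.splitext(full_path)[1]
            counts[extension] += 1

print(dict(counts))  # {'.txt': 2, '.log': 1}
```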
pathlib.Path.rglob()
from pathlib import Path
for file in Path('/path/to/directory').rglob('*'):
    if file.is_file():
        print(file)  # Process each file here
In the above example, we use the ‘rglob()’ method of the ‘Path’ object to recursively traverse all the directories in the tree. ‘rglob('*')’ yields every path, files and directories alike, so the ‘is_file()’ check keeps only the files.
fileinput module
import fileinput
for line in fileinput.input('/path/to/file.txt'):
    print(line, end='')  # Process each line in the file
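In the above example, we use the ‘fileinput.input()’ function to loop through all the lines in the specified file. The function also accepts a list of file names and reads them one after another as a single stream of lines.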
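Here is a sketch of reading several files as one stream. It uses two hypothetical files created in a temporary directory, and it calls the ‘filename()’ and ‘filelineno()’ methods to report where each line came from:

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input files
    paths = []
    for name, text in (('one.txt', 'a\nb\n'), ('two.txt', 'c\n')):
        path = os.path.join(workdir, name)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
        paths.append(path)

    seen = []
    # fileinput.input() accepts a list of files and reads them as one stream
    with fileinput.input(files=paths) as stream:
        for line in stream:
            source = os.path.basename(stream.filename())
            # filelineno() restarts at 1 for each new file
            seen.append((source, stream.filelineno(), line.rstrip('\n')))

print(seen)  # [('one.txt', 1, 'a'), ('one.txt', 2, 'b'), ('two.txt', 1, 'c')]
```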
Making Temporary Files and Directories
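Sometimes you might need to create temporary files or directories to store temporary data. Python’s ‘tempfile’ module provides functions for creating temporary files and directories that can be cleaned up automatically. A ‘NamedTemporaryFile’ is deleted when it is closed, unless you pass ‘delete=False’. A ‘TemporaryDirectory’ is removed, along with its contents, when its ‘with’ block ends.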
Here are a few examples:
import tempfile
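# Create a temporary file (deleted automatically when the block exits)
with tempfile.NamedTemporaryFile(mode='w') as temp_file:
    temp_file.write('temporary data')  # Process the file
# Create a temporary directory (removed with its contents when the block exits)
with tempfile.TemporaryDirectory() as temp_dir:
    print(temp_dir)  # Process the directory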
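In the first example, we use ‘NamedTemporaryFile()’ to create a temporary file opened in ‘w’ mode. Pass ‘delete=False’ if the file must outlive the ‘with’ block; in that case, you are responsible for deleting it. In the second example, we use ‘TemporaryDirectory()’ to create a temporary directory, and its path is bound to ‘temp_dir’ as a string.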
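When a temporary file has to be reopened by name after it is closed, ‘delete=False’ combined with explicit cleanup is a common pattern. This is a minimal sketch of that pattern. The ‘.txt’ suffix is an arbitrary choice:

```python
import os
import tempfile

# delete=False keeps the file after it is closed, so it can be reopened by name
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as temp_file:
    temp_file.write('temporary data')
    temp_path = temp_file.name

try:
    with open(temp_path, encoding='utf-8') as f:
        data = f.read()
finally:
    os.remove(temp_path)  # cleanup is now our responsibility

print(data, os.path.exists(temp_path))  # temporary data False
```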
Deleting Files and Directories
Python provides functions for deleting files and directories. Here are a few examples:
Deleting Files in Python
import os
os.remove('/path/to/file.txt')
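In the above example, we use the ‘os.remove()’ function to delete the specified file. It raises ‘FileNotFoundError’ if the file does not exist, and it raises an ‘OSError’ if the path is a directory. Use ‘os.rmdir()’ to remove directories.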
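If the file may already be gone, you can either catch ‘FileNotFoundError’ or use ‘pathlib’’s ‘unlink(missing_ok=True)’, which requires Python 3.8 or later. The sketch below shows both approaches on a hypothetical file in a temporary directory:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / 'file.txt'  # hypothetical file
    target.write_text('data', encoding='utf-8')

    # os.remove() raises FileNotFoundError if the file is already gone
    os.remove(target)
    already_gone = False
    try:
        os.remove(target)
    except FileNotFoundError:
        already_gone = True

    # pathlib's unlink(missing_ok=True) (Python 3.8+) silently ignores a missing file
    target.unlink(missing_ok=True)
    exists_now = target.exists()

print(already_gone, exists_now)  # True False
```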
Deleting Directories
import os
os.rmdir('/path/to/directory')
In the above example, we use the ‘os.rmdir()’ function to delete the specified directory. Note that the directory must be empty before it can be deleted.
Deleting Entire Directory Trees
import shutil
shutil.rmtree('/path/to/directory')
In the above example, we use the ‘shutil.rmtree()’ function to delete the entire directory tree, including all files and directories.
Copying, Moving, and Renaming Files and Directories
Python provides functions for copying, moving, and renaming files and directories.
Here are a few examples:
Copying Files in Python
import shutil
shutil.copy('/path/to/sourcefile.txt', '/path/to/destinationfile.txt')
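In the above example, we use the ‘shutil.copy()’ function to copy the source file’s contents and permission bits to the destination. The destination may also be an existing directory; in that case, the file is copied into it under its original name. Other metadata, such as the modification time, is not preserved. ‘shutil.copy2()’ attempts to preserve it as well.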
Copying Directories
import shutil
shutil.copytree('/path/to/sourcedirectory', '/path/to/destinationdirectory')
In the above example, we use the ‘shutil.copytree()’ function to recursively copy the source directory and everything in it to the destination directory. By default, the destination directory must not already exist.
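The following sketch shows ‘copy2()’ alongside ‘copytree()’ with ‘dirs_exist_ok=True’ (Python 3.8+), which allows copying into a destination that already exists. The ‘source’ and ‘destination’ folders are hypothetical and live in a temporary directory:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical source tree: source/data.txt
    src_dir = os.path.join(workdir, 'source')
    dst_dir = os.path.join(workdir, 'destination')
    os.mkdir(src_dir)
    src_file = os.path.join(src_dir, 'data.txt')
    with open(src_file, 'w', encoding='utf-8') as f:
        f.write('payload')

    # copy2() copies the contents and also tries to preserve metadata
    # such as the modification time; it returns the destination path
    copied = shutil.copy2(src_file, os.path.join(workdir, 'data_copy.txt'))
    with open(copied, encoding='utf-8') as f:
        copied_text = f.read()

    # copytree() raises FileExistsError if the destination already exists,
    # unless dirs_exist_ok=True is passed (Python 3.8+)
    shutil.copytree(src_dir, dst_dir)
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    tree_copied = os.path.isfile(os.path.join(dst_dir, 'data.txt'))

print(copied_text, tree_copied)  # payload True
```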
Moving Files and Directories
import shutil
shutil.move('/path/to/sourcefile.txt', '/path/to/filename.txt')
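In the above example, we use the ‘shutil.move()’ function to move the source file to the destination. If the destination is an existing directory, the source is moved inside it. When a simple rename is not possible, for example across file systems, ‘shutil.move()’ copies the source and then deletes it.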
Renaming Files and Directories
import os
os.rename('/path/to/sourcefile.txt', '/path/to/destinationfile.txt')
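In the above example, we use the ‘os.rename()’ function to rename the source file to the destination file. If the destination already exists, ‘os.rename()’ replaces it silently on Unix but raises ‘FileExistsError’ on Windows. ‘os.replace()’ overwrites the destination on every platform.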
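A common use of ‘os.replace()’ is the “write to a temporary file, then swap it into place” pattern, which keeps readers from seeing a half-written file. The helper ‘write_atomically’ below is hypothetical. It is a minimal sketch that omits ‘fsync’ calls, so it does not guarantee durability after a crash. On POSIX systems, a replace within a single file system is atomic:

```python
import os
import tempfile

def write_atomically(path, text):
    """Hypothetical helper: write text to path via a temp file and os.replace().

    The temp file lives in the same directory so the final replace stays
    on one file system; os.replace() overwrites any existing file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(text)
        os.replace(temp_path, path)
    except BaseException:
        os.remove(temp_path)  # don't leave stray temp files behind
        raise

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, 'config.txt')
    write_atomically(target, 'version 1')
    write_atomically(target, 'version 2')  # overwrites, even on Windows
    with open(target, encoding='utf-8') as f:
        final = f.read()
    leftovers = os.listdir(workdir)

print(final, leftovers)  # version 2 ['config.txt']
```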
Archiving
Python provides modules for working with archives, such as ZIP and TAR files. Here are a few examples:
Reading ZIP Files
import zipfile
with zipfile.ZipFile('/path/to/archive.zip', 'r') as archive:
    archive.printdir()
In the above example, we use the ‘zipfile.ZipFile’ class to open the ZIP file in read mode. The ‘printdir()’ method then prints a table of the archive’s contents, listing each file’s name, modification time, and size.
Extracting ZIP Archives
import zipfile
with zipfile.ZipFile('/path/to/archive.zip', 'r') as archive:
    archive.extractall('/path/to/destination')
In the above example, we use the ‘extractall()’ method of the ‘ZipFile’ object to extract all the files in the archive to the specified directory.
Extracting Data From Password Protected Archives
import zipfile
with zipfile.ZipFile('/path/to/archive.zip') as archive:
    try:
        archive.extractall(pwd=b'password')
    except RuntimeError:
        # zipfile raises RuntimeError when the password check fails
        print('Incorrect password')
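In the above example, we use the ‘pwd’ argument of the ‘extractall()’ method to specify the password for the archive. The password must be given as bytes. The ‘zipfile’ module can decrypt files protected with the traditional ZIP password scheme, but it cannot create encrypted archives.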
Creating New ZIP Archives
import zipfile
with zipfile.ZipFile('/path/to/newarchive.zip', 'w') as archive:
    archive.write('/path/to/file.txt')
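In the above example, we use the ‘zipfile.ZipFile’ class to create a new ZIP file in write mode and add a file to the archive using the ‘write()’ method. By default, files are stored without compression (‘ZIP_STORED’); pass ‘compression=zipfile.ZIP_DEFLATED’ to compress them. Without an ‘arcname’ argument, the file is stored under its own path with the leading ‘/’ removed, which here is ‘path/to/file.txt’.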
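To archive a whole folder with compression and clean internal paths, you can combine ‘os.walk()’ with ‘arcname’. This is a sketch using a hypothetical ‘project’ folder built in a temporary directory. The standard ‘shutil.make_archive()’ function is a higher-level alternative:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical folder to archive: project/readme.txt and project/src/main.py
    project = os.path.join(workdir, 'project')
    os.makedirs(os.path.join(project, 'src'))
    for rel in ('readme.txt', os.path.join('src', 'main.py')):
        with open(os.path.join(project, rel), 'w', encoding='utf-8') as f:
            f.write('print("hi")\n' * 100)

    zip_path = os.path.join(workdir, 'project.zip')
    # ZIP_DEFLATED compresses; the default, ZIP_STORED, does not
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(project):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname controls the path stored inside the archive
                archive.write(full_path, arcname=os.path.relpath(full_path, project))

    with zipfile.ZipFile(zip_path) as archive:
        stored_names = sorted(archive.namelist())

print(stored_names)  # ['readme.txt', 'src/main.py']
```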
Opening TAR Archives
import tarfile
with tarfile.open('/path/to/archive.tar', 'r') as archive:
    archive.list()
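In the above example, we use the ‘tarfile.open()’ function to open the TAR archive in read mode and list its contents using the ‘list()’ method. In ‘r’ mode, ‘tarfile’ detects gzip, bzip2, and xz compression automatically, so the same call also opens compressed TAR files.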
Extracting Files From a TAR Archive
import tarfile
with tarfile.open('/path/to/archive.tar', 'r') as archive:
    archive.extractall('/path/to/destination')
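In the above example, we use the ‘extractall()’ method of the ‘TarFile’ object to extract all the files in the archive to the specified directory. Only extract archives from sources you trust, because TAR members can contain absolute paths or ‘..’ components that write outside the destination. Python 3.12 added a ‘filter’ argument to ‘extractall()’, such as ‘filter='data'’, that rejects such members.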
Creating New TAR Archives
import tarfile
with tarfile.open('/path/to/newarchive.tar', 'w') as archive:
    archive.add('/path/to/file.txt')
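In the above example, we use the ‘tarfile.open()’ function to create a new TAR archive in write mode and add a file to the archive using the ‘add()’ method. Plain ‘w’ mode writes an uncompressed archive; use ‘w:gz’, ‘w:bz2’, or ‘w:xz’ to create a compressed one. When given a directory, ‘add()’ includes its contents recursively.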
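The following sketch creates a gzip-compressed TAR of a hypothetical ‘logs’ folder and reads the member names back with plain ‘r’ mode. Everything lives in a temporary directory:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical folder to archive
    folder = os.path.join(workdir, 'logs')
    os.mkdir(folder)
    with open(os.path.join(folder, 'app.log'), 'w', encoding='utf-8') as f:
        f.write('started\n')

    tar_path = os.path.join(workdir, 'logs.tar.gz')
    # 'w:gz' writes a gzip-compressed tar; add() recurses into directories
    with tarfile.open(tar_path, 'w:gz') as archive:
        archive.add(folder, arcname='logs')

    # 'r' (the same as 'r:*') detects the compression automatically
    with tarfile.open(tar_path, 'r') as archive:
        members = sorted(archive.getnames())

print(members)  # ['logs', 'logs/app.log']
```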
Working With Compressed Archives
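Python also provides modules for working with individual compressed files, such as GZIP and BZIP2 files. Unlike ZIP and TAR, these formats compress a single stream of data rather than bundling several files together. Here are a few examples: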
import gzip
# Reading from a GZIP file
with gzip.open('/path/to/archive.gz', 'r') as archive:
    contents = archive.read()
# Writing to a GZIP file
with gzip.open('/path/to/newarchive.gz', 'w') as archive:
    archive.write(b'data')
import bz2
# Reading from a BZIP2 file
with bz2.open('/path/to/archive.bz2', 'r') as archive:
    contents = archive.read()
# Writing to a BZIP2 file
with bz2.open('/path/to/newarchive.bz2', 'w') as archive:
    archive.write(b'data')
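‘gzip.open()’ and ‘bz2.open()’ default to binary mode, so ‘r’ and ‘w’ behave like ‘rb’ and ‘wb’ and work with bytes. For text, the ‘rt’ and ‘wt’ modes decode and encode for you. This is a sketch using a hypothetical ‘notes.txt.gz’ in a temporary directory; ‘bz2.open()’ accepts the same modes:

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'notes.txt.gz')  # hypothetical file name

    # 'wt' / 'rt' open the compressed file in text mode, handling encoding
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        f.write('line one\nline two\n')

    with gzip.open(path, 'rt', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['line one', 'line two']
```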
Conclusion
Python provides a variety of powerful tools for working with files and archives, making it easy to manage large numbers of files efficiently. Whether you need to perform simple file operations or work with complex directory structures and archives, there is a Python function or module that can help you get the job done.
Directory Listing in Legacy Python Versions
The ‘os.listdir()’ function works in every Python version, including legacy 2.x releases, and returns the names of the entries in a directory. Here is an example:
import os
files = os.listdir('/path/to/directory')
print(files)
In the above example, the ‘os.listdir()’ function takes a path to a directory as an argument and returns a list of the names of every entry in that directory, including subdirectories. It returns only the names and does not include any additional information about the entries themselves.
Directory Listing in Modern Python Versions
If you are using a more recent version of Python, you can use the ‘os.scandir()’ function (Python 3.5 and later) or the ‘pathlib.Path’ class (Python 3.4 and later) to get a directory listing. These tools provide more functionality and more information about the entries in the directory.
os.scandir()
The ‘os.scandir()’ function returns an iterator of ‘os.DirEntry’ objects that expose each entry’s name, path, and attributes, making it easy to work with directories and files. Here is an example:
import os
with os.scandir('/path/to/directory') as entries:
    for entry in entries:
        print(entry.name, entry.path, entry.stat().st_size)
In the above example, the ‘os.scandir()’ function returns an iterable that we can use to loop through all the directory entries and print their names, paths, and sizes using the ‘stat()’ method.
pathlib.Path()
The ‘pathlib.Path()’ object provides a more object-oriented way to work with files