Adventures in Machine Learning

Mastering File Manipulation in Python with OS and Pathlib Modules

Part 1: Getting File Size in Python

Python is a versatile, widely used programming language. One task that many Python developers encounter is working with files: reading and writing them, manipulating their contents, and retrieving information such as a file’s size or path.

In this article, we will focus on two important modules in Python for file manipulation – the OS and pathlib modules.

1. Using os.path.getsize()

One common task when working with files is getting the file size. Python provides several ways to do this, and the sections below cover each in turn.

The os module provides a function called os.path.getsize() that returns the size of a file. It takes a file path as input and returns the size of the file in bytes.

Here’s an example:

import os
file_size = os.path.getsize('/path/to/file')
print(f"File size is {file_size} bytes.")

In this code snippet, we import the os module and call the os.path.getsize() function with the file path ‘/path/to/file’. The function returns the file size in bytes, which we then print to the console.
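os.path.getsize() raises OSError (FileNotFoundError for a missing file) when the path cannot be read. The following is a minimal sketch of guarding against that; the helper name safe_getsize is an assumption made for illustration and is not part of the os module.

```python
import os
from typing import Optional


def safe_getsize(path: str) -> Optional[int]:
    """Return the size of `path` in bytes, or None if it cannot be read.

    `safe_getsize` is a hypothetical helper name, not a standard-library function.
    """
    try:
        return os.path.getsize(path)
    except OSError:  # covers FileNotFoundError and PermissionError
        return None
```

Returning None keeps the calling code simple, but re-raising or logging the error may suit your program better.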

2. Converting File Size to KB, MB, or GB

A size in bytes is hard to read when the file is large.

In such cases, it’s more helpful to express the file size in kilobytes (KB), megabytes (MB), or gigabytes (GB). Here’s how you can achieve this:

import os
file_size = os.path.getsize('/path/to/file')
# Convert bytes to KB
kb_size = file_size/1024
print(f"File size is {kb_size} KB.")
# Convert bytes to MB
mb_size = file_size/(1024*1024)
print(f"File size is {mb_size} MB.")
# Convert bytes to GB
gb_size = file_size/(1024*1024*1024)
print(f"File size is {gb_size} GB.")

In this code snippet, we start by getting the file size in bytes using the os.path.getsize() function. We then convert the file size to KB, MB, or GB by dividing it by the appropriate value (1024 for KB, 1024*1024 for MB, and 1024*1024*1024 for GB). These are binary units, where each step is a factor of 1024 rather than 1000.
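Rather than printing all three conversions, you may want to pick the largest unit that fits. The following is a small sketch of that idea using the same 1024-based divisions; the function name format_size and the two-decimal formatting are our own choices for illustration.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (hypothetical helper)."""
    units = ["bytes", "KB", "MB", "GB"]
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit or we run out of units.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} bytes"
    return f"{size:.2f} {units[index]}"


print(format_size(512))          # 512 bytes
print(format_size(1536))         # 1.50 KB
print(format_size(5 * 1024**2))  # 5.00 MB
```

You could pass the result of os.path.getsize() straight into this function.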

3. Using os.stat() Method

The os module provides another function called os.stat() that returns information about a file, including its size.

Here’s an example:

import os
file_stats = os.stat('/path/to/file')
file_size = file_stats.st_size
print(f"File size is {file_size} bytes.")

In this code snippet, we use the os.stat() function to get file statistics for ‘/path/to/file’. We then read the file size from the returned object’s st_size attribute.

4. Using pathlib Module

The pathlib module in Python provides a more object-oriented way of working with file paths and file manipulation tasks.

Here’s an example of how you can use the Path class in the pathlib module to get the file size:

from pathlib import Path
file_size = Path('/path/to/file').stat().st_size
print(f"File size is {file_size} bytes.")

In this code snippet, we import the Path class from the pathlib module and use it to create a Path object for ‘/path/to/file’. We then use the stat() method to get the file statistics object and extract the file size using the st_size attribute.
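The three path-based approaches shown so far should report the same number for the same file. As a quick check, the sketch below writes a temporary file of known size and compares them; the file name sample.bin and the 2048-byte payload are arbitrary choices for the demonstration.

```python
import os
import tempfile
from pathlib import Path

# Create a throwaway file with a known size (2048 bytes) for the comparison.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"\0" * 2048)

    size_getsize = os.path.getsize(sample)  # os.path function
    size_os_stat = os.stat(sample).st_size  # os.stat() result attribute
    size_pathlib = sample.stat().st_size    # Path.stat() result attribute

print(size_getsize, size_os_stat, size_pathlib)  # 2048 2048 2048
```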

5. Getting File Size of a File Object

Finally, if you’re working with a file object instead of a file path, you can also get the file size using the seek() and tell() methods.

Here’s an example:

import os
with open('/path/to/file', 'rb') as f:
    f.seek(0, os.SEEK_END)  # Move the file pointer to the end of the file (os.SEEK_END equals 2)
    file_size = f.tell()  # The pointer position at the end equals the file size in bytes
    print(f"File size is {file_size} bytes.")

In this code snippet, we use the open() function to open ‘/path/to/file’ in binary mode ('rb'). We then call seek() to move the file pointer to the end of the file and tell() to read the pointer’s position, which equals the file size in bytes. Binary mode matters here: for a file opened in text mode, tell() returns an opaque number that is not guaranteed to be a byte count.
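One side effect of this technique is that the file pointer is left at the end of the file. If the caller still needs to read from the current position, you can save and restore it. The following is a sketch under that assumption; size_of_file_object is a hypothetical helper name, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io
import os


def size_of_file_object(f) -> int:
    """Return the size of a seekable binary file object, keeping its position.

    `size_of_file_object` is a hypothetical helper name used for illustration.
    """
    current = f.tell()      # remember where the caller was
    f.seek(0, os.SEEK_END)  # jump to the end of the file
    size = f.tell()         # position at the end equals the size in bytes
    f.seek(current)         # restore the original position
    return size


# io.BytesIO behaves like a file opened with mode 'rb'.
buffer = io.BytesIO(b"hello world")
buffer.read(5)                      # advance the pointer by 5 bytes
print(size_of_file_object(buffer))  # 11
print(buffer.tell())                # 5
```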

Part 2: Understanding the OS and Pathlib Modules

1. Overview of OS Module

Now that we’ve covered methods for getting file size in Python, let’s take a closer look at the OS and pathlib modules and their functionalities.

The os module in Python provides functions for interacting with the operating system: working with file paths, files and directories, environment variables, and processes. For file work, the functions you will use most fall into two groups: path manipulation, which lives in the os.path submodule, and file-related operations such as listing, creating, renaming, and removing files and directories.
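To make the two groups concrete, here is a short sketch that uses a temporary directory so it leaves nothing behind. The directory and file names (reports, summary.txt) are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Path manipulation: build and split path strings without touching the disk.
    report_path = os.path.join(tmp_dir, "reports", "summary.txt")
    folder, filename = os.path.split(report_path)
    stem, extension = os.path.splitext(filename)

    # File-related operations: these calls do touch the filesystem.
    os.makedirs(folder)
    with open(report_path, "w") as f:
        f.write("done")
    listing = os.listdir(folder)
    exists = os.path.exists(report_path)

print(filename, stem, extension)  # summary.txt summary .txt
print(listing, exists)            # ['summary.txt'] True
```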

2. Absolute and Relative File Paths

When working with files, it’s important to understand the concepts of absolute and relative file paths.

An absolute path refers to the full path from the root directory to the file, while a relative path refers to the path from the current working directory to the file. For example, ‘/home/user/file.txt’ is an absolute path, while ‘file.txt’ is a relative path.
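The os.path functions can tell the two kinds of path apart and convert between them. A minimal sketch, reusing the file.txt example above (the file does not need to exist for these calls):

```python
import os

relative = "file.txt"
# abspath() resolves the relative path against the current working directory.
absolute = os.path.abspath(relative)

print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True

# relpath() goes the other way: from an absolute path back to a relative one.
print(os.path.relpath(absolute, start=os.getcwd()))  # file.txt
```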

3. Using the pathlib Module

The pathlib module in Python provides a more readable and intuitive way of working with file paths, directories, and file-related operations.
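As a brief illustration, the sketch below joins path segments, reads the file extension, and checks whether a path is absolute. It uses PurePosixPath, a pathlib class that only manipulates path strings and never touches the filesystem, so the output is the same on any operating system; the path segments are example values.

```python
from pathlib import PurePosixPath

# PurePosixPath only manipulates path strings, so no real files are needed.
base = PurePosixPath("/home/user")
target = base / "docs" / "file.txt"  # the / operator joins path segments

print(target)                # /home/user/docs/file.txt
print(target.name)           # file.txt
print(target.suffix)         # .txt
print(target.parent)         # /home/user/docs
print(target.is_absolute())  # True
print(PurePosixPath("file.txt").is_absolute())  # False
```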

The module provides several classes such as Path and PurePath, which wrap around filesystem paths and provide methods for working with them, such as joining paths, getting file extensions, or differentiating between relative and absolute paths. PurePath handles path manipulation only, while Path adds methods that access the filesystem, such as stat().

In conclusion, Python provides versatile and useful ways of working with files. The os and pathlib modules allow Python developers to manipulate file data, get file size, work with absolute and relative file paths, and perform file-related operations. Understanding these modules’ capabilities is essential for developing robust code that efficiently interacts with files and the operating system.

In this article, we explored several ways to get a file’s size in Python: os.path.getsize(), os.stat(), the pathlib Path class, and the seek() and tell() methods of a file object. We also gave an overview of the os and pathlib modules and covered absolute and relative file paths. With these tools, Python developers can inspect files, work with different kinds of file paths, and perform file-related operations.
