File handling is an essential part of programming. Without it, you cannot build programs that reach their full potential. The first file-handling task most people learn is how to open a file.
In practice, however, you rarely know the exact path to a file. More often than not, you only know the directory in which it is stored. So here we will learn how to get all the files in a directory in Python. Before that, let's review what files and directories are.
What are files?
A file is a named location on a disk that stores data. You can use Python to read from and write to files, which allows you to save data in a persistent storage location and read it back later.
To work with files in Python, you use the built-in open() function to open a file, together with the with statement, which ensures the file is closed properly when you are done with it.
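A minimal sketch of the open() plus with pattern follows. It assumes a hypothetical file named notes.txt, created inside a temporary directory so the example does not touch any of your own files.

```python
import os
import tempfile

# Create a throwaway directory so the example is safe to run anywhere
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "notes.txt")  # hypothetical file name

    # Open for writing; the with statement closes the file when the block ends
    with open(file_path, "w") as f:
        f.write("Hello, files!\n")

    # Reopen for reading and load the whole contents back
    with open(file_path) as f:
        content = f.read()

print(content.strip())   # Hello, files!
print(f.closed)          # True: the file was closed automatically
```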
What is a directory?
A directory is a location on a computer's file system that can contain other directories and files. It is also sometimes referred to as a folder. Directories allow you to organize your files and keep them separate from one another.
For example, you might have a directory for documents, another for pictures, and another for music. Each directory can contain multiple files and subdirectories, which allows you to create a hierarchical structure for your files.
What connects files and directories?
In a computer's file system, files and directories are connected through the use of paths. A path is a string that specifies the location of a file or directory in the file system. There are two types of paths: absolute paths and relative paths.
An absolute path is a complete path to a file or directory that begins at the root of the file system. It specifies the exact location of the file or directory, regardless of the current working directory.
On the other hand, a relative path is a path to a file or directory that is relative to the current working directory. It specifies the location of a file or directory in relation to the current directory.
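To make the difference concrete, here is a small sketch using the os.path helpers. The relative path data/report.txt is a hypothetical example; the file does not need to exist, because these functions only manipulate path strings.

```python
import os

# The current working directory is the base against which relative paths resolve
cwd = os.getcwd()

relative = os.path.join("data", "report.txt")   # hypothetical relative path
absolute = os.path.abspath(relative)            # resolved against cwd

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
print(absolute)                  # e.g. <current directory>/data/report.txt
```

If you change the working directory with os.chdir(), the same relative path resolves to a different absolute path, which is why scripts that must run from anywhere usually build absolute paths.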
In short, files are collections of information, and a collection of files can be grouped under a common name, called a folder or directory. There are many situations where you need to know what a directory contains. For example, if you do not know a file's full name but do know its directory, you can list the directory to search for the file.
How to list all the files in a directory in Python?
Python provides several modules that you can use to access and list the files in a given directory. The functions covered here come from two of them: the os module and the glob module.
1) OS module
The os module provides several functions for listing the contents of a directory. os.listdir() is the most common and the simplest to use.
When you pass a directory path to os.listdir(), it returns a list of the names of the entries in that directory. Note that these are bare names rather than full paths, that the list includes subdirectories as well as files, and that the order is arbitrary. For example:
import os
path = "D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator"
print(os.listdir(path=path))
Output:
['input.py', 'main.py', 'whisper.py']
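Because os.listdir() also returns subdirectory names, you often need to filter the result. The sketch below recreates the three sample files in a temporary directory, adds a hypothetical subfolder named assets, and keeps only regular files by joining each name with the directory and checking it with os.path.isfile(). The results are sorted because os.listdir() does not guarantee any order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Recreate a small sample layout: three files and one subdirectory
    for name in ("input.py", "main.py", "whisper.py"):
        open(os.path.join(path, name), "w").close()
    os.mkdir(os.path.join(path, "assets"))   # hypothetical subfolder

    everything = sorted(os.listdir(path))
    # listdir() returns bare names, so join each one with the directory before testing it
    files_only = sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )

print(everything)   # ['assets', 'input.py', 'main.py', 'whisper.py']
print(files_only)   # ['input.py', 'main.py', 'whisper.py']
```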
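os.walk() works differently from os.listdir(): instead of returning a list, it returns a generator that walks the directory tree from the top down. For the starting directory and each of its subdirectories, it yields a tuple of three items: the directory's path, a list of its subdirectory names, and a list of its file names. Use os.walk() when you want to visit every file in a directory, including files inside nested folders, one at a time.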
import os
path = "D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator"
# Each step yields (current folder, its subfolder names, its file names)
for root, dirs, files in os.walk(path):
    for name in files:
        print(name)
Output:
input.py
main.py
whisper.py
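Printing bare file names loses the information about which folder each file came from. A common refinement, sketched below under the assumption of a small hypothetical tree (main.py at the top and utils/helpers.py one level down), joins each name with root to build a full path and then shows it relative to the starting directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Sample tree: one file at the top level, one inside a subfolder
    os.mkdir(os.path.join(path, "utils"))                          # hypothetical subfolder
    open(os.path.join(path, "main.py"), "w").close()
    open(os.path.join(path, "utils", "helpers.py"), "w").close()   # hypothetical file

    all_files = []
    for root, dirs, files in os.walk(path):
        for name in files:
            # Join with root to get a usable path, then make it relative for display
            full_path = os.path.join(root, name)
            all_files.append(os.path.relpath(full_path, path))

all_files.sort()
print(all_files)   # ['main.py', 'utils/helpers.py'] on POSIX (a backslash on Windows)
```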
os.scandir() is available in Python 3.5 and later. Rather than a list, it returns an iterator that yields os.DirEntry objects, one for each entry in the directory.
Each DirEntry carries the entry's name and full path, and its is_file() method tells you whether the entry is a regular file rather than, for example, a subdirectory. Because DirEntry objects cache file-type information gathered while the directory is read, checks like is_file() usually avoid an extra system call, which tends to make os.scandir() faster than calling os.listdir() followed by os.path.isfile().
import os
path = "D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator"
entries = os.scandir(path)   # iterator of os.DirEntry objects
for entry in entries:
    if entry.is_file():      # skip subdirectories
        print(entry.name)
Output:
input.py
main.py
whisper.py
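Since Python 3.6, os.scandir() can also be used as a context manager, which closes the underlying directory handle as soon as you are done. The sketch below uses that form and reads each file's size through DirEntry.stat(); the file main.py (5 bytes) and the subfolder assets are hypothetical sample content created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Sample content: one 5-byte file and one subfolder (hypothetical names)
    with open(os.path.join(path, "main.py"), "w") as f:
        f.write("hello")
    os.mkdir(os.path.join(path, "assets"))

    sizes = {}
    folders = []
    # The with form (Python 3.6+) closes the scandir iterator when the block ends
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                # stat() returns an os.stat_result; st_size is the size in bytes
                sizes[entry.name] = entry.stat().st_size
            elif entry.is_dir():
                folders.append(entry.name)

print(sizes)     # {'main.py': 5}
print(folders)   # ['assets']
```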
2) Glob Module
The glob module lets you retrieve the files in a directory whose names match a given pattern. It offers two functions for this: glob() and iglob().
The glob function returns a list of file and directory names that match a given pattern. The pattern can contain wildcards, such as * to match any sequence of characters, or ? to match any single character.
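The two wildcards can be tried out with a sketch like the one below. The four file names are hypothetical samples created in a temporary directory, and glob.escape() is applied to the directory part so that any special characters in the folder name are not mistaken for wildcards.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Hypothetical sample files for trying out the wildcards
    for name in ("main.py", "test1.py", "test2.py", "notes.txt"):
        open(os.path.join(path, name), "w").close()

    # Escape the folder part so only the file-name part is treated as a pattern
    base = glob.escape(path)

    # '*' matches any run of characters: every file ending in .py
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, "*.py")))
    # '?' matches exactly one character: test1.py and test2.py, but not main.py
    question = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, "test?.py")))

print(star)       # ['main.py', 'test1.py', 'test2.py']
print(question)   # ['test1.py', 'test2.py']
```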
The iglob function is similar to glob, but instead of returning a list of matching files and directories, it returns an iterator that yields the matches one at a time. This can be more efficient when working with large numbers of files, as it allows you to process the files as they are found, rather than waiting for the entire list to be generated.
Here is the code:
import glob
path = "D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator/"
print("Using glob")
names = glob.glob(path + '*.py')    # list of every matching path
print(names)
print("Using iglob")
cnames = glob.iglob(path + '*.py')  # iterator that yields one match at a time
for name in cnames:
    print(name)
Output:
Using glob
['D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\\input.py', 'D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\\main.py', 'D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\\whisper.py']
Using iglob
D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\input.py
D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\main.py
D:/ABRAR/UNIVERSITY/Advanced ML/Website Generator\whisper.py
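The patterns used with glob() and iglob() so far only look at a single directory. Since Python 3.5, both functions accept recursive=True, which makes the ** pattern match zero or more nested folders. The sketch below assumes a hypothetical tree with main.py at the top and pkg/sub/deep.py two levels down, and compares the results with and without the flag.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Hypothetical tree: a .py file at the top and one two levels down
    os.makedirs(os.path.join(path, "pkg", "sub"))
    open(os.path.join(path, "main.py"), "w").close()
    open(os.path.join(path, "pkg", "sub", "deep.py"), "w").close()

    pattern = os.path.join(glob.escape(path), "**", "*.py")
    # Without recursive=True, '**' behaves like '*' and stands for exactly one folder
    shallow = glob.glob(pattern)
    # With recursive=True, '**' matches zero or more folders
    deep = glob.glob(pattern, recursive=True)

    shallow_names = sorted(os.path.relpath(p, path) for p in shallow)
    deep_names = sorted(os.path.relpath(p, path) for p in deep)

print(shallow_names)   # []
print(deep_names)      # ['main.py', 'pkg/sub/deep.py'] on POSIX (backslashes on Windows)
```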
Conclusion
File handling is an important tool in every programmer's arsenal; without it, there is little you can do with a computer's file system. You now know several ways to get a list of the files in a directory in Python: os.listdir(), os.walk(), and os.scandir() from the os module, and glob() and iglob() from the glob module.