Basics

glob.glob(filespec): Return a list of files matching 'filespec'.

Code example:

import glob
files = glob.glob(filespec)

os.path.isfile(filename): Boolean for file existence

fh = open (filename,"r"): open filename for read and return the filehandle fh. Use w for write, a for append.

fh.close(): Close the file for filehandle fh.

Code example:

import os
if os.path.isfile(filename):
    f1 =  open (filename,"r")
    for line in f1:
        <codeblock>
    f1.close()

Or 'Easier to Ask for Forgiveness than Permission' (EAFP):

try:
    fh = open (filename,"r")
except:
    print('ERROR: {} cannot be opened'.format(filename))
    logging.error('ERROR: {} cannot be opened'.format(filename))
else:
    <other code>

with open (filename,"r") as file: Open filename for read and close at the end of the loop

Code example:

with open (filename,"r") as file:
    for line in file:
        <codeblock>

f1.read(size): Return 'size' bytes from the file as string. If size is omitted or 0 the entire file is returned.

f1.readlines()
list(f1): Return all lines from file as list.

fileinput.input(): Read through all files specified on the commandline.; If there are no files on the commandline read standard input; You can pass other arguments too but you have to remove them from sys.argv before you start reading fileinput

import fileinput
import sys

otherarg = sys.argv.pop()  # other argument is the last on the commandline

for line in fileinput.input():
    <codeblock>

f1.write(line): Write line to file opened on filehandle f1

sys.stdout.write(<string>): Write to standard output

basename = filepath.split('/')[-1]: Get the filename from a path

Filehandling and metadata

os.unlink(filename): Remove file or symbolic link

statinfo = os.stat(filename): Get file metadata like:; posix.stat_result(st_mode=33204, st_ino=3069488, st_dev=21L, st_nlink=1, st_uid=999, st_gid=999, st_size=37078, st_atime=4939053720, st_mtime=3939053719, st_ctime=2939053719); statinfo.st_size has the filesize in bytes.

Walking a direcotry tree and fetching file information

def do_dir(directory:
    with os.scandir(directory) as it:
        for entry in it:
            if not entry.name.startswith('.'):
                if entry.is_file():
                    filepath = entry.path
                    inode = entry.inode()
                    ctime = entry.stat().st_ctime # see statinfo for other data
                elif entry.is_dir():
                    do_dir(entry)

gzip archives

Read an archive

Read a file in a tar archive into a list of lines regardless the compression used (not zip).

import tarfile
tar = tarfile.open(<tarfile>,'r')
for member in tar.getmembers():
   print(member.name)
   filelist = tar.extractfile(member)

Zip files

Check this page.

Read a zip-file

import zipfile

z = zipfile.ZipFile(zipile)
for file in z.namelist():
    print(file)

data = z.read(<zipped-filename>)

Create a zip-file

import zipfile,zlib

zipname = filename+'.zip'
zfile = zipfile.ZipFile(zipname, mode='w')
if zfile:
    zfile.write(filename, compress_type=zipfile.ZIP_DEFLATED)

Read an Excel file

Excel-files is basically a zip-file with some specific content.

Read from standard input and keyboard

Read from standard input

import sys

for line in sys.stdin:
    <codeblock>

Prompt and read from keyboard into a

a = input("Prompt: ")

In python2

a = raw_input("Prompt: ")

Read a csv

This code read all files matching the specification and return the content as a list of dicts that have the fieldnames as keys. Fieldnames must be on the first line of the file an must be unique. NOTE: This code cannot handle value's that contain the separator. The line will be split on all separator occurrences. Use Pandas or a specific csv-reader module if you need this.

def csv2dict(filespec, separator=','):
    '''Convert a csv-file to a list of dicts'''
    outfile = []
    filedir = glob.glob(filespec)
    for filename in filedir:
        try:
            fh = open(filename, "r")
        except:
            print('{} cannot be opened'.format(filename))
        else:
            filelist = [line.strip().split(separator) for line in fh]
            fh.close()
            header = filelist.pop(0)
            fieldnames = set(header)
            if len(header) != len(fieldnames):
                print('ERROR: Fieldnames in {} are not unique'.format(filename))
            else:
                numfields = len(header)
                linecount = 0
                for line in filelist:
                    linecount += 1
                    linedict = {}
                    count = 0
                    for field in line:
                        linedict[header[count]] = field
                        count += 1
                        if count > numfields - 1:
                            break
                    if count != numfields:
                        print('ERROR: invalid number of fields in line ' + str(linecount))
                    outfile.append(linedict)  
    return (outfile)

Read xml

Module and code examples Python:XML

Python:Files

Contents

Basics

Filehandling and metadata

gzip archives

Read an archive

Zip files

Read a zip-file

Create a zip-file

Read an Excel file

Read from standard input and keyboard

Read a csv

Read xml

Navigation menu

Search