Implementing tail -f in Python

Posted by markstrange on Thu, 16 Jan 2020 19:30:52 +0100

This article came out of an interview held in the reboot group. The written test included this question, and since it was not obvious how to approach it or what line of thinking to follow, I posted it to everyone in the group for discussion.

I thought about it for a while, so let me briefly describe my approach. There is, of course, the very handy pyinotify module for monitoring file changes, but what I want to present here is a hand-written solution. After all, as an interviewer, that is what I would want to see. I think the hard part of this problem is not watching the file for new content but printing the last 10 lines.

Before reading this article, you should have a basic grasp of Python, file handling and a few common modules, and know what the following are:

open('a.txt')
file.seek
file.tell
time.sleep()
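
As a quick refresher, here is a minimal sketch of how seek and tell behave. The sample file and its path are made up for the illustration, and the file is opened in binary mode so that the offsets are plain byte counts.

```python
import os
import tempfile

# Create a small sample file (hypothetical path, only for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(path, 'wb') as w:
    w.write(b'first\nsecond\n')

with open(path, 'rb') as f:
    start = f.tell()            # position right after opening
    f.seek(0, 2)                # offset 0 counted from the end of the file
    end = f.tell()              # now equal to the file size in bytes
    f.seek(0)                   # back to the beginning
    first_line = f.readline()   # reads up to and including the line break
    after_first = f.tell()      # position just past the first line
```

Note that the pointer only knows byte offsets. It has no idea which line it is on, which is what makes "the last 10 lines" awkward later.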

The ideas below are limited by my own knowledge, so there are bound to be mistakes and things I have overlooked. If you come up with a better method, I am happy to improve the code at any time. There are not many traps in this question. Let the innocent snail walk you through it in Python.

How to implement it in Python

The basic idea is actually not hard to come up with:

  • Open the file and move the pointer to the end
  • Try to read a line once every second; print it if there is content, sleep if there is not
  • That is roughly the idea

Monitoring the file

The steps are as follows:

  • Open the file with open
  • Use seek to move the file pointer to the end of the file
  • Loop with while True
    Keep calling readline; whenever it returns content, print it

Here is the code:

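with open('test.txt') as f:
    # Jump to the end of the file (offset 0 counted from the end)
    f.seek(0, 2)
    while True:
        # readline returns an empty string when there is nothing new yet
        line = f.readline()
        if line:
            print(line)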

Notes on the code

  • The second argument of seek is 2, which means the offset is counted from the end of the file. The more standard way to write it is SEEK_END from the os module, which is more readable
  • Only the bare logic is written here, and the code is short and crude. If the question is worth 10 points, this earns 4 at most
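
To make the first note concrete, here is a small sketch using os.SEEK_END. The temporary file and the second handle that appends to it are only there to simulate another process writing to the log; they are assumptions of the example, not part of the solution.

```python
import os
import tempfile

# Sample file so the sketch is self-contained (the path is made up)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as w:
    w.write('old line\n')

with open(path) as f:
    f.seek(0, os.SEEK_END)       # same as f.seek(0, 2)
    nothing_yet = f.readline()   # empty string: we are at the end

    # Simulate another process appending to the file
    with open(path, 'a') as w:
        w.write('new line\n')

    appended = f.readline()      # only the newly appended line is returned
```

The old content is skipped and only what arrives after the seek is read, which is exactly the behavior the monitoring loop relies on.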

Points to improve

  • print has a flaw: it adds a line break every time, although each line already ends with one. sys.stdout.write(line) fits better
  • The file name should be a parameter, not hard-coded
  • Printing directly can be the default behavior, but what to do with each line should be a function that is passed in, so that new lines can be handled in other ways, for example displayed in a browser
  • Add error handling, for example for the case where the file does not exist
  • while True reads the file non-stop, which wastes CPU. Reading about once per second is enough
    Call time.sleep(1)
  • Organize the code in a class

The example code is as follows

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import sys
import time

class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        # callback is called with every new line; the default prints it
        self.callback = callback

    def follow(self):
        try:
            with open(self.file_name) as f:
                # Start at the end of the file so only new lines are shown
                f.seek(0, 2)
                while True:
                    line = f.readline()
                    if line:
                        self.callback(line)
                    else:
                        # Nothing new yet: wait a second before trying again
                        time.sleep(1)
        except Exception as e:
            print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
            print(e)

Usage:

# Print to the screen with the default sys.stdout.write
py_tail = Tail('test.txt')
py_tail.follow()

# Or define your own processing function
# (follow() runs until interrupted, so try one example at a time)

def test_tail(line):
    print('xx'+line+'xx')

py_tail1 = Tail('test.txt', test_tail)
py_tail1.follow()

Eh, wait a minute: tail -f prints the last 10 lines by default, and that seems to be the real difficulty of this problem. As we all know, a file pointer in Python can only be moved to a given offset; it cannot tell which line it is on. Let's start with a simple implementation and strengthen it step by step.

Printing the last 10 lines by default

The code so far is worth about 6 points. One feature is still missing: printing the last n lines, 10 by default. Let's add a function for it.

When the file is small

We know that readlines returns the whole content as a list, already split into lines. Getting the last 10 items of a list is easy: just slice it.

# Demonstration of the core logic only; the complete code follows
last_lines = f.readlines()[-10:]
for line in last_lines:
    self.callback(line)
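
As an aside that is not part of the original solution: a common alternative for this step is collections.deque with maxlen, which still scans the whole file but never keeps more than n lines in memory. The sketch below assumes a hypothetical helper named last_lines and a throwaway sample file.

```python
import collections
import os
import tempfile

def last_lines(file_name, n=10):
    """Return the last n lines, holding at most n lines in memory."""
    with open(file_name) as f:
        # Iterating over the file yields one line at a time; the deque
        # silently discards the oldest line once it holds n of them.
        return list(collections.deque(f, maxlen=n))

# Sample file with 25 numbered lines (made up for the illustration)
path = os.path.join(tempfile.mkdtemp(), 'small.log')
with open(path, 'w') as w:
    for i in range(25):
        w.write('line %d\n' % i)

tail_of_file = last_lines(path, 10)
```

This saves memory but not time, since every line is still read once, so it does not replace the approach for large logs discussed later.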

Now the code looks like this

import sys
import time

class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self, n=10):
        try:
            with open(self.file_name) as f:
                self._file = f
                # Print the last n lines first
                self.showLastLine(n)
                # readlines() already left the pointer at the end; this makes it explicit
                self._file.seek(0, 2)
                while True:
                    line = self._file.readline()
                    if line:
                        self.callback(line)
                    else:
                        # Nothing new yet: wait a second before trying again
                        time.sleep(1)
        except Exception as e:
            print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
            print(e)

    def showLastLine(self, n):
        # Reads the whole file into memory, so this only suits small files
        last_lines = self._file.readlines()[-n:]
        for line in last_lines:
            self.callback(line)

Going further: what to do with big logs

At this point the code is worth about 7 points, which is decent. But files, log files in particular, can easily reach several gigabytes, and reading all of that into memory just to get the last few lines is wasteful. So the showLastLine function needs further optimization. I think this is the real difficulty of the problem.

The general idea is as follows:

  • First, assume a log line is about 100 characters. This is only a guess, and it does not matter if it is off. We just need an initial value, which will be corrected later

  • I want to read ten lines, so seek to 1000 characters before the end with seek(-1000,2), read the last 1000 characters, and then examine them

  • If 1000 characters is more than the file length, this is the same as the small-file case (level 1): read everything and split it

  • If the 1000 characters contain more than 10 line breaks, they cover at least 10 lines and the rest is easy. As in level 1, split and take the last ten

  • The question is what to do if there are fewer than 10
    For example, if the 1000 characters contain only four line breaks, one line is about 1000 / 4 = 250 characters, and we are still six lines short. So we need another 250 * 6 = 1500 characters, and about 2500 characters in total is a more reliable estimate. Then we apply the same check to the 2500 characters, and repeat until the characters read contain more than 10 line breaks
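
The estimate in the last step can be tried out in isolation. The helper below is a hypothetical name used only for this sketch. The read_len * 2 floor is an extra safeguard that is not part of the idea described above; it is there so that the next read is always larger than the previous one and the loop cannot stall.

```python
def next_read_len(read_len, newline_count, n=10):
    """Estimate how many characters to read on the next attempt."""
    if newline_count == 0:
        # No line break seen: treat the whole chunk as a single line
        per_line = read_len
    else:
        # Average line length observed in the chunk just read
        per_line = read_len // newline_count
    # Enough for n lines at the new estimate, and always larger than before
    return max(per_line * n, read_len * 2)

four_breaks = next_read_len(1000, 4)    # 250 per line, so 2500
no_breaks = next_read_len(1000, 0)      # 1000 per line, so 10000
ten_breaks = next_read_len(1000, 10)    # estimate unchanged, so double to 2000
```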

Once the logic is clear, the code follows

Annotated version

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import sys
import time

class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self, n=10):
        try:
            # Open in binary mode: Python 3 only allows seeking backwards
            # from the end of the file in binary mode
            with open(self.file_name, 'rb') as f:
                self._file = f
                self._file.seek(0, 2)
                # Store the length of the file in bytes
                self.file_length = self._file.tell()
                # Print the last n lines (10 by default)
                self.showLastLine(n)
                # Keep reading the file and print whatever is appended
                while True:
                    line = self._file.readline()
                    if line:
                        self.callback(line.decode('utf-8', 'replace'))
                    else:
                        # Nothing new yet: wait a second before trying again
                        time.sleep(1)
        except Exception as e:
            print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
            print(e)

    def showLastLine(self, n):
        if n <= 0:
            return
        # Assume about 100 bytes per line. Changing this to 1 or 1000 also works
        len_line = 100
        # n defaults to 10 and can be passed in through follow
        read_len = len_line * n
        # last_lines holds the lines that will finally be printed
        while True:
            # If the length to read (1000 at first) reaches the stored file
            # length, read the whole file and stop
            if read_len >= self.file_length:
                self._file.seek(0)
                last_lines = self._file.read().splitlines()[-n:]
                break
            # Read the last read_len bytes, then count the line breaks in them
            self._file.seek(-read_len, 2)
            last_words = self._file.read(read_len)
            # count is the number of line breaks
            count = last_words.count(b'\n')

            if count > n:
                # More than n line breaks: the chunk holds at least n complete
                # lines, so split it and take the last n
                last_lines = last_words.splitlines()[-n:]
                break
            # Not enough line breaks yet
            if count == 0:
                # No line break at all: treat the whole chunk as one line
                len_perline = read_len
            else:
                # E.g. four line breaks in 1000 bytes: about 250 bytes per line
                len_perline = read_len // count
            # The length to read becomes e.g. 2500, then check again.
            # It at least doubles each time so the loop always makes progress
            read_len = max(len_perline * n, read_len * 2)
        for line in last_lines:
            self.callback(line.decode('utf-8', 'replace') + '\n')

if __name__ == '__main__':
    py_tail = Tail('test.txt')
    py_tail.follow(20)

With this, the last few lines are finally printed correctly. Try it yourself: it prints the last 10 lines whether the log contains only one line or every line is 300 characters long.

If you can get this far, the average interviewer will be won over. The code is worth about 8 points. If you want to go further, there are still some places that can be optimized. The code is on github for anyone interested to study.

To be optimized: left to you as an extension

  • What if the log is backed up and emptied, or cut, while your tail -f process is running?
    It is actually simple. Keep track of the pointer position yourself. If in the next cycle you find that the position no longer fits the file (for example, the file is now shorter than your pointer), read from the latest valid position
  • Distinguish the specific type of each error
    Here I only wrapped everything in a single try. Errors such as "file does not exist" and "no permission to read the file" should be reported separately
  • A small performance optimization. For example, I read 1000 characters the first time and find only 4 lines, so I estimate 250 characters per line and 2500 characters overall. The next time there is no need to read all 2500 characters again: read only the 1500 new ones and join them with the previous 1000
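
A minimal sketch of the first two points, under a few assumptions: the helper name read_new_lines is made up, the file is reopened on every poll (which also covers a log that was renamed and recreated), and FileNotFoundError and PermissionError require Python 3. A file that is emptied and then grows past the old position between two polls would not be noticed by this check.

```python
import os
import tempfile

def read_new_lines(file_name, last_pos):
    """Return (new_lines, new_pos) for content added after last_pos."""
    try:
        with open(file_name, 'rb') as f:
            size = os.fstat(f.fileno()).st_size
            if size < last_pos:
                # Shorter than where we stopped: the log was emptied or cut,
                # so start again from the beginning
                last_pos = 0
            f.seek(last_pos)
            lines = f.readlines()
            return lines, f.tell()
    except FileNotFoundError:
        print('File does not exist: ' + file_name)
    except PermissionError:
        print('No permission to read: ' + file_name)
    return [], last_pos

# Demonstration with a throwaway file
path = os.path.join(tempfile.mkdtemp(), 'log.log')
with open(path, 'wb') as w:
    w.write(b'one\ntwo\n')
first, pos = read_new_lines(path, 0)

with open(path, 'wb') as w:      # simulate the log being emptied and rewritten
    w.write(b'new\n')
second, pos_after = read_new_lines(path, pos)

missing = read_new_lines(path + '.missing', 0)
```

In a follow loop, this would be called once per cycle with the position returned by the previous call.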

But if you write the following, the average interviewer will

beat you to death

import os
def tail(file_name):
    os.system('tail -f '+file_name)

tail('log.log')
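
If you do hand the job to the system tail anyway, a slightly safer variant is to pass the arguments as a list through subprocess, so that a file name containing spaces or shell characters is not interpreted by a shell. This is only a sketch: tail_command is a hypothetical helper, and it assumes a tail binary is available on the PATH.

```python
import subprocess

def tail_command(file_name, n=10):
    """Build the argument list for the system tail command."""
    return ['tail', '-n', str(n), '-f', file_name]

def tail(file_name, n=10):
    # Blocks until interrupted, just like running tail -f in a terminal
    return subprocess.call(tail_command(file_name, n))

command = tail_command('log.log')
```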
