This article came about because someone in the reboot group had an interview whose written test included this question. Not knowing how to approach it, they posted it to the group for discussion.
I thought about it for a while, so let me briefly share my approach. There is, of course, the very handy pyinotify module for monitoring file changes, but what I want to present here is a hand-written solution; after all, speaking as an interviewer, I would rather see how you solve the problem yourself. I think the difficulty of this problem is not following the new content of the file, but printing the last 10 lines.
Before reading this article, I hope you have a basic grasp of Python, file handling, and common modules, and know what the following terms mean:
- open('a.txt')
- file.seek
- file.tell
- time.sleep()
The ideas below are limited by my own knowledge, so mistakes and oversights are inevitable. I hope you can come up with better methods, and I will improve the code whenever you do. The question does not have many pitfalls, so let the innocent snail teach you how to do it in Python.
How to implement it in Python
The idea is actually not hard:
- Open the file and move the pointer to the end
- Try to read a line once per second; print it if there is content, sleep if there is none
- That is the gist of it
Monitoring files
The ideas are as follows:
- Open the file with open
- Use seek to move the file pointer to the end of the file
- Loop with while True
- Call readline continuously; whenever it returns content, print it
Here is the code:
    with open('test.txt') as f:
        # Jump to the end of the file so that only new content is read
        f.seek(0, 2)
        while True:
            line = f.readline()
            if line:
                print(line)
Notes on the code
- The second argument of seek is 2, which means the offset is measured from the end of the file. A more standard way to write it is SEEK_END from the os module, which is more readable
- Only the bare logic is written, and the code is simple and crude. If the question is worth 10 points, this earns 4 at most
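To make the first note concrete, here is a small sketch showing that seek(0, 2) and seek(0, os.SEEK_END) land on the same position. The temporary file and its contents are made up for the demonstration so that it does not depend on test.txt existing.

```python
import os
import tempfile

# Create a throwaway file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

with open(path) as f:
    f.seek(0, 2)              # magic number: 2 means "relative to the end"
    pos_magic = f.tell()
    f.seek(0)                 # back to the start
    f.seek(0, os.SEEK_END)    # same jump, but the intent is readable
    pos_named = f.tell()
    leftover = f.readline()   # nothing is left to read at the end of the file

os.remove(path)
print(pos_magic == pos_named, repr(leftover))
```

Both calls move to the same place, and readline returns an empty string there, which is exactly the "no new content yet" signal the loop relies on.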
Points to optimize
- print has a flaw: it appends a newline of its own, so every line is followed by a blank line. Using sys.stdout.write(line) instead gives cleaner output
- The file name should be a parameter, not hard-coded
- Printing directly can be the default behavior, but the handling of each new line should be a function that is passed in, so that new lines can be processed in other ways, such as being displayed in a browser
- Add error handling; for example, a file that does not exist raises an error
- The bare while True loop reads the file nonstop, which wastes CPU. Reading once per second is good enough: call time.sleep(1)
- Organize the code with a class
The example code is as follows
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import sys
    import time


    class Tail(object):

        def __init__(self, file_name, callback=sys.stdout.write):
            self.file_name = file_name
            self.callback = callback

        def follow(self):
            try:
                with open(self.file_name) as f:
                    # Jump to the end so only newly appended lines are read
                    f.seek(0, 2)
                    while True:
                        line = f.readline()
                        if line:
                            self.callback(line)
                        else:
                            # Nothing new yet: wait a second before retrying
                            time.sleep(1)
            except Exception as e:
                print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
                print(e)
Usage:
    # Print to the screen using the default sys.stdout.write
    py_tail = Tail('test.txt')
    py_tail.follow()

    # Or define your own processing function
    # (follow() never returns, so run one of the two examples at a time)
    def test_tail(line):
        print('xx' + line + 'xx')

    py_tail1 = Tail('test.txt', test_tail)
    py_tail1.follow()
Eh, wait a minute: tail -f prints the last 10 lines by default, and that seems to be the real difficulty of this problem. As we all know, a file pointer in Python can only be moved to a given offset; it has no idea which line it is on. Let's get a simple version working first and strengthen it step by step.
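A small sketch of that limitation: seek and tell deal in offsets, not line numbers, so jumping a fixed distance back from the end usually lands in the middle of a line. The sample content and the offset of 8 bytes are arbitrary choices for illustration, and an in-memory file stands in for a real log.

```python
import io
import os

# An in-memory binary "file" with made-up content.
log = io.BytesIO(b'alpha\nbeta\ngamma\ndelta\n')

log.seek(-8, os.SEEK_END)    # jump 8 bytes back from the end
chunk = log.read()
print(chunk)                 # the chunk starts inside the line "gamma"

# Only after reading the chunk can we count how many line breaks it holds.
newline_count = chunk.count(b'\n')
print(newline_count)
```

The pointer happily lands inside a line, and the only way to learn how many lines a chunk covers is to read it and count the line breaks.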
Printing the last 10 lines by default
The code so far is worth 6 points. One feature is still missing: printing the last n lines, 10 by default. Let's add a function for it.
When the file is small
We know that readlines returns the whole content as a list of lines, so the code practically writes itself. Getting the last 10 items of a list is easy, right? Just slice it.
    # Demo of the core logic only; the complete code follows
    last_lines = f.readlines()[-10:]
    for line in last_lines:
        self.callback(line)
Now the code is like this
    import sys
    import time


    class Tail(object):

        def __init__(self, file_name, callback=sys.stdout.write):
            self.file_name = file_name
            self.callback = callback

        def follow(self, n=10):
            try:
                with open(self.file_name) as f:
                    self._file = f
                    # Print the last n lines first
                    self.showLastLine(n)
                    # Then jump to the end and follow new content
                    self._file.seek(0, 2)
                    while True:
                        line = self._file.readline()
                        if line:
                            self.callback(line)
                        else:
                            time.sleep(1)
            except Exception as e:
                print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
                print(e)

        def showLastLine(self, n):
            # Reads the whole file into memory, so only suitable for small files
            last_lines = self._file.readlines()[-n:]
            for line in last_lines:
                self.callback(line)
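As a side note, and not part of the approach above: readlines builds a list of every line before slicing. One possible alternative, sketched here with collections.deque from the standard library, still scans the whole file but keeps at most n lines in memory. The function name last_n_lines and the demo file are made up for this example.

```python
import collections
import os
import tempfile

def last_n_lines(file_name, n=10):
    """Return the last n lines of a file, holding at most n lines in memory."""
    with open(file_name) as f:
        # Iterating over f yields one line at a time; a deque with maxlen=n
        # silently discards the oldest line once it is full.
        return list(collections.deque(f, maxlen=n))

# Demo on a throwaway 25-line file.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.writelines('line %d\n' % i for i in range(1, 26))
    path = tmp.name

tail = last_n_lines(path, 3)
os.remove(path)
print(tail)
```

This lowers memory use, but every byte of the file is still read, so it does not remove the need for a smarter approach on very large logs.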
Further: what to do with big logs
At this point the code is worth 7 points, which is already decent. But if the file is very large, which is common for log files that easily reach several GB, reading everything into memory just to get the last few lines is wasteful. So we need to keep optimizing the showLastLine function. I think this is the real difficulty of the problem.
The general idea is as follows
First, I estimate that a log line is about 100 characters long. Note that this is only an estimate; it does not matter if it is off, because we just need an initial value that will be corrected later.
I want to read ten lines, so I seek to 1000 characters before the end with seek(-1000, 2), read the last 1000 characters, and then check the following cases.
If 1000 characters is more than the length of the file, this is just the level 1 (small file) situation: read the whole file and split it.
If the 1000 characters contain more than 10 line breaks, the last ten lines are all within those 1000 characters, which is easy to handle. The processing is similar to level 1: split and take the last ten.
The question is what to do if there are fewer than 10.
For example, suppose the 1000 characters contain only four line breaks. That suggests one line is about 1000 / 4 = 250 characters, and we are still six lines short, so we need another 250 * 6 = 1500 characters; about 2500 characters in total should be fairly reliable. We then apply the same checks to the 2500 characters, and repeat until the characters read contain more than 10 line breaks, at which point the reading is finished.
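The estimate-and-retry arithmetic can be sketched as a tiny helper. The name next_read_len is hypothetical, and the helper only mirrors the reasoning of the paragraphs above; it does not touch any file.

```python
def next_read_len(read_len, newline_count, n=10):
    """Guess how many characters to read from the end on the next attempt.

    read_len      -- how many characters the last attempt read
    newline_count -- how many line breaks that attempt contained
    n             -- how many lines we want in the end
    """
    if newline_count == 0:
        # Not a single line break yet: treat the whole chunk as one line.
        per_line = read_len
    else:
        per_line = read_len // newline_count
    return per_line * n

# 1000 characters held only 4 line breaks: about 250 characters per line,
# so 10 lines need roughly 2500 characters.
print(next_read_len(1000, 4))
# No line break at all in 1000 characters: assume 1000 characters per line.
print(next_read_len(1000, 0))
```

Each retry replaces the initial guess of 100 characters per line with a figure measured from the file itself, so a handful of rounds is usually enough.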
With the logic clear, the code follows.
Annotated version
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import sys
    import time


    class Tail(object):

        def __init__(self, file_name, callback=sys.stdout.write):
            self.file_name = file_name
            self.callback = callback

        def follow(self, n=10):
            try:
                # Open in binary mode: seeking relative to the end of the file
                # is not allowed on text-mode files in Python 3
                with open(self.file_name, 'rb') as f:
                    self._file = f
                    self._file.seek(0, 2)
                    # Store the length of the file in bytes
                    self.file_length = self._file.tell()
                    # Print the last n lines
                    self.showLastLine(n)
                    # Keep reading the file and print whatever is appended
                    while True:
                        line = self._file.readline()
                        if line:
                            self.callback(line.decode('utf-8', 'replace'))
                        else:
                            time.sleep(1)
            except Exception as e:
                print('Failed to open the file. Check whether the file exists and whether you have permission to read it')
                print(e)

        def showLastLine(self, n):
            if n <= 0:
                return
            # Assume about 100 bytes per line; try changing this to 1 or 1000
            len_line = 100
            # n defaults to 10 and can also be passed in through follow
            read_len = len_line * n
            # last_lines stores the lines that will finally be printed
            while True:
                # If the amount to read exceeds the stored file length,
                # read the whole file and stop
                if read_len > self.file_length:
                    self._file.seek(0)
                    last_lines = self._file.read().splitlines()[-n:]
                    break
                # Read the last read_len bytes and count the line breaks in them
                self._file.seek(-read_len, 2)
                last_words = self._file.read(read_len)
                # count is the number of line breaks
                count = last_words.count(b'\n')
                if count > n:
                    # More than n line breaks: the last n lines are complete
                    last_lines = last_words.splitlines()[-n:]
                    break
                # Not enough line breaks yet
                if count == 0:
                    # No line break at all: treat the whole chunk as one line
                    len_perline = read_len
                else:
                    # e.g. 4 line breaks in 1000 bytes: about 250 bytes per line
                    len_perline = read_len // count
                # The length to read becomes e.g. 2500; it always at least
                # doubles so the loop makes progress. Then judge again
                read_len = max(len_perline * n, read_len * 2)
            for line in last_lines:
                self.callback(line.decode('utf-8', 'replace') + '\n')


    if __name__ == '__main__':
        py_tail = Tail('test.txt')
        py_tail.follow(20)
The result: the last few lines are finally printed. You can try it yourself: the last 10 lines are printed no matter whether the lines in the log are very short or 300 characters long.
If you can get this far, the average interviewer will be won over; the code is worth about 8 points. If you want to go further, there are still places that can be optimized. The code is on GitHub for anyone interested in studying it.
Still to be optimized: left to you as an extension
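To try this without hunting for real logs, a small harness along the following lines may help. It is only a sketch: last_lines is a standalone, simplified restatement of the idea (binary mode, and the read length simply doubles on each retry instead of using the per-line estimate), not the class above, and make_log and the generated files are made up for the test.

```python
import os
import tempfile

def last_lines(file_name, n=10, len_line=100):
    """Return the last n lines of a file (as bytes) without reading all of it."""
    with open(file_name, 'rb') as f:
        f.seek(0, os.SEEK_END)
        file_length = f.tell()
        read_len = len_line * n
        while True:
            if read_len >= file_length:
                # The guess covers the whole file: just read everything.
                f.seek(0)
                return f.read().splitlines()[-n:]
            f.seek(-read_len, os.SEEK_END)
            chunk = f.read(read_len)
            # More than n line breaks means the last n lines are complete.
            if chunk.count(b'\n') > n:
                return chunk.splitlines()[-n:]
            # Too few line breaks: the per-line guess was too small.
            read_len *= 2

def make_log(line_width, line_count):
    """Write a throwaway log whose lines are padded to at least line_width."""
    with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
        for i in range(line_count):
            tmp.write(str(i).zfill(line_width).encode('ascii') + b'\n')
        return tmp.name

results = {}
for width in (1, 300):
    path = make_log(width, 50)
    # Each line is just its own line number, so the tail is easy to check.
    results[width] = [int(line.decode('ascii')) for line in last_lines(path)]
    os.remove(path)
    print(width, results[width])
```

Both the short-line and the long-line file should yield the line numbers 40 through 49, which is a quick way to confirm that a wrong initial guess gets corrected.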
- If the log is backed up and emptied, or cut, while your tail -f process is running, how do you deal with it?
  In fact, it is very simple: keep track of the pointer position. If you find in the next cycle that the file's position has changed, read from the latest pointer position
- Specific types for each error
  Here I simply wrap everything in one try. You could write a function that tells apart errors such as the file not existing and the file not being readable
- A small performance optimization. For example, the first read takes 1000 characters and finds only 4 lines, so I estimate about 250 characters per line and 2500 characters overall. The next read does not need to fetch all 2500 characters again; it can read just the 1500 characters before them and join them with the 1000 already read
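The first two points could be sketched roughly as follows. This is one possible reading of the ideas, not a finished solution: the names open_log and poll are hypothetical, truncation is detected by comparing the file size with the remembered position, and the exception classes assume Python 3.

```python
import os
import sys
import tempfile

def open_log(file_name):
    """Open a log for tailing; report the common failures separately."""
    try:
        # Binary mode, so positions are plain byte offsets.
        return open(file_name, 'rb')
    except FileNotFoundError:
        sys.stderr.write('%s does not exist\n' % file_name)
    except PermissionError:
        sys.stderr.write('No permission to read %s\n' % file_name)
    return None

def poll(f, last_pos):
    """Return (new_lines, new_pos) for everything appended after last_pos."""
    size = os.fstat(f.fileno()).st_size
    if size < last_pos:
        # The file is now shorter than where we stopped reading, so it was
        # emptied or cut: start again from the beginning.
        last_pos = 0
    f.seek(last_pos)
    lines = f.readlines()
    return lines, f.tell()

# Demo with a throwaway file.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'one\ntwo\n')
    path = tmp.name

f = open_log(path)
first, pos = poll(f, 0)
with open(path, 'wb') as w:   # simulate the log being emptied, then written to
    w.write(b'new\n')
second, pos = poll(f, pos)
f.close()
os.remove(path)
missing = open_log(path)      # the file is gone now, so this reports an error
print(first, second, missing)
```

A real follow loop would call poll once per second. Note the limits of this sketch: a log that is rotated by renaming it and creating a new file is not noticed, and neither is a file that is emptied and then grows past the old position between two polls.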
If you write this instead, the interviewer will basically beat you to death:
    import os

    def tail(file_name):
        os.system('tail -f ' + file_name)

    tail('log.log')