Properly clean up your WeChat chat files

Posted by wisewood on Thu, 03 Mar 2022 06:56:03 +0100

Working over WeChat has become the norm. I also keep a Windows computer at home, powered on all year round and logged into WeChat, so that I can answer questions promptly and organize and write up bioinformatics material.

This brings a problem: that machine's C drive is a solid-state drive with only 500 GB of space, yet much software, WeChat included, stores its data by default in the Documents folder under the current user on the C drive.

WeChat chat records in particular consume considerable disk space. The files are generally stored under the current user's Documents directory, so I simply checked with a command:

# Files are generally stored in this directory under the current user's Documents:
# WeChat Files/.../FileStorage/File

$ du -h -d 1
...
4.0G    ./2021-08
4.2G    ./2021-09
3.9G    ./2021-10
6.1G    ./2021-11
2.9G    ./2021-12
4.7G    ./2022-01
3.6G    ./2022-02 
42G     .

If you do not know how to type commands like du -h -d 1, I recommend installing Git for Windows; you can then right-click in a folder and open Git Bash, the black-and-white command-line window, to work interactively.
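For example, assuming WeChat is using its default storage location (the <your-wechat-id> part differs per account, so substitute your own), the whole check in Git Bash might look like this sketch:

# Change into the WeChat file folder; the exact path depends on
# your Windows user name and your WeChat ID (placeholder below).
cd "/c/Users/$USERNAME/Documents/WeChat Files/<your-wechat-id>/FileStorage/File"

# Report disk usage one level deep, largest month folders first.
du -h -d 1 . | sort -rh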

As you can see, the chat files consume 42 GB. Many readers run Windows on a laptop with only 128 GB of storage, so that cost is far from trivial.

Delete duplicate files first

First of all, I am logged into four WeChat accounts and forward the same file to 50 group chats each time. A 1 MB PDF document, for example, gets downloaded once per group chat across the four accounts, so 200 copies turn it into 200 MB of disk consumption, as shown below:

find ./ -name "*(*"

./2022-02/Single cell transcriptome-2 Analysis of grouping criteria(1).zip
./2022-02/Single cell transcriptome-2 Analysis of grouping criteria(2).zip
./2022-02/Single cell transcriptome-2 Analysis of grouping criteria(3).zip
./2022-02/Single cell transcriptome-2 Analysis of grouping criteria(4).zip
# ... about 200 duplicate archive files omitted here

So the first step is to delete the files whose names carry the parenthesized duplicate suffix:

find ./ -name "*(*" | while read -r id; do rm -f "$id"; done
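Note that the "*(*" glob also catches files whose original name legitimately contains a parenthesis. If you want to be stricter, here is a sketch that only matches the (1), (2), ... suffix that duplicate downloads receive, relying on GNU find's extended-regex support (which Git Bash ships):

# Delete only files whose name ends in "(n)" right before the
# extension, e.g. "criteria(3).zip", sparing other parentheses.
find ./ -type f -regextype posix-extended -regex '.*\([0-9]+\)\.[^./]+' |
while read -r id; do rm -f "$id"; done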

After this purge, usage drops to 12 GB, a very noticeable improvement!

Then delete the large files:

Similarly, use the find command to list files larger than 100 MB. As you can see, they are mostly the single-cell figure-reproduction code that my apprentices sent me after finishing their homework:

find ./ -type f -size +100M | xargs -d '\n' ls -lh | cut -d" " -f5-
# Mostly the single-cell figure-reproduction code sent by apprentices
158M Jun  7  2021 ./2021-06/GSE40791.zip
139M Jun 19  2021 ./2021-06/week2.zip
175M Jun 25  2021 ./2021-06/Article reproduction_Sophie_20210625.zip
116M Jun 28  2021 ./2021-06/Data analysis of thyroid cancer.zip
176M Jul 12  2021 ./2021-07/01_Code.zip
144M Jul 11  2021 ./2021-07/GSE150241-code.zip
171M Jul 20  2021 ./2021-07/GSE156329.zip
196M Jul 18  2021 ./2021-07/GSE166635_code.zip
190M Jul 19  2021 ./2021-07/GSE171306_Sophie_Single cell data analysis.zip
110M Jul 30  2021 ./2021-07/Meng_3rd_code.zip
118M Jul 17  2021 ./2021-07/paper+supplementary.zip


160M Nov 12 19:37 ./2021-11/How many? gse Summary of dataset results.zip
102M Nov 20 09:05 ./2021-11/unicellular+Deep learning.zip
247M Dec  7 21:33 ./2021-12/scRNA.7z
108M Dec 12 09:30 ./2021-12/Apprentice Assignment 1.key
197M Dec  4 20:34 ./2021-12/Mouse neuron_Project results.rar
365M Jan  6 22:25 ./2022-01/1.306 Comprehensive of Western Medicine( pdf).rar
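If you prefer the report with the largest files first, here is a small sketch using GNU find's -printf (also available in Git Bash):

# Print size in bytes plus path, sort numerically descending,
# then convert bytes to MB for readability.
find ./ -type f -size +100M -printf '%s\t%p\n' | sort -rn |
awk -F'\t' '{printf "%7.1fM\t%s\n", $1/1048576, $2}'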

These archives are huge because the code is bundled with the original data. After going through them, there was nothing I needed to keep: I back up and tidy the code every time I review an assignment, so the original submissions can all be deleted:

find ./ -type f -size +100M | while read -r id; do rm -f "$id"; done

If that is not enough decluttering for you, delete everything larger than 10 MB as well:

find ./ -type f -size +10M | while read -r id; do rm -f "$id"; done
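If deleting outright feels risky, an alternative sketch quarantines the candidates in a holding folder first, to be emptied later (the wechat_trash folder name is my own invention):

# Move instead of delete; --backup=numbered avoids name clashes.
mkdir -p ../wechat_trash
find ./ -type f -size +10M | while read -r id; do
  mv --backup=numbered "$id" ../wechat_trash/
done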

To understand the code above, you need the basic computing skills of the bioinformatics data-analysis learning path, which I roughly divide into statistical visualization based on the R language and NGS data processing based on Linux.

The six stages of Linux need to be crossed one by one; generally speaking, each stage takes at least a day of study:

  • Stage 1: make the Linux system feel as smooth as a desktop operating system such as Windows or macOS. The goal is to become familiar with the black-and-white command-line interface and to complete routine folder and file management using the keyboard alone.
  • Stage 2: process text files like spreadsheets: sort, count, filter, de-duplicate, search, cut, replace, merge and join, all in keyboard-interactive mode, and master the troika of text processing: awk, sed and grep (a tiny taste is sketched after this list).
  • Stage 3: metacharacters, wildcards and the various expansions in the shell. From then on, Linux operation is no longer mysterious!
  • Stage 4: advanced directory management: soft and hard links, absolute and relative paths, and environment variables.
  • Stage 5: job submission and batch processing, script writing; free your hands.
  • Stage 6: software installation and conda management, to make the Linux system genuinely practical.
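As a tiny taste of Stage 2, here is a sketch that tallies the file types left in the WeChat folder by chaining exactly those tools:

# Strip the directory part, keep the (lower-cased) extension,
# then count occurrences and show the most common types first.
find ./ -type f | awk -F/ '{print $NF}' |
awk -F. 'NF>1 {print tolower($NF)}' |
sort | uniq -c | sort -rn | head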

If you genuinely find that my tutorials help your scientific research project, or that they have been enlightening, or your project draws heavily on my techniques, please add a short acknowledgement when publishing your results, as shown below:

We thank Dr. Jianming Zeng (University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.

Over the next ten years, as I travel among universities and research institutes around the world (including mainland China), I will give priority to visiting those who have shown such friendship.