Welcome to the world of Python. This tutorial walks you through Python and shows you its charm. It is aimed at beginners, especially bioinformatics analysts, who want to learn Python's common functions and usage quickly, so it covers only a selection of Python features. For a fuller understanding of Python, please refer to the classic tutorial A Byte of Python and its Chinese version. The concepts and text descriptions in this document draw on A Byte of Python (Chinese version), with thanks.
This tutorial has been updated for Python 3.
Contents
- Background introduction
- Programming opening
- Why learn Python
- How to install Python
- How to run Python commands and scripts
- What editor do you use to write Python scripts
- Python program examples
- Python basic syntax
- Numeric variable operation
- String variable operation
- List operation
- Set operation
- Range usage
- Dictionary operation
- Level indent
- Variable, data structure, process control
- Input and output
- Interactive I / O
- File reading and writing
- Practical practice (I)
- background knowledge
- Operation (I)
- Function operation
- Function operation
- Operation (II)
- Modules
- Command line parameters
- Command line parameters
- Operation (III)
- More Python content
- Single-statement blocks
- List comprehension, a simplified for loop that generates a new list
- lambda, map, filter, reduce
- exec, eval (execute Python statements held in strings; optional extra)
- Regular expressions
- Python drawing
- Reference
Background introduction
Programming opening
A: What have you been reading lately?
B: Programming ("biancheng" in Chinese).
A: Oh, Shen Congwen's "Border Town" (also pronounced "Biancheng")?
B: ......
D: Which country's language is that?
C: ......
Why learn Python
- Simple syntax. Programs written in Python read like pseudocode built from natural language: "what you see is what you think". Reading Python code is like reading a very simple English essay, and writing Python code is easier than writing an English article: "what you think is what you write". Many people who have just learned Python find it incredible that code written just the way they think actually works.
- Powerful. Programming languages are now very mature, and each one can implement what the others can, so the languages themselves differ little in capability. Python's strength lies in its active community and strong third-party module support, which keep extending what it can do, particularly in scientific computing.
- Good extensibility. It integrates well with C to speed up execution. Available acceleration tools include Cython, PyPy, Pyrex, Psyco, etc.
Python common packages
- Scientific computing: NumPy, SciPy (long a stumbling block when installing Python packages, until conda came along)
- Data frame operations similar to R: pandas
- Visualization: seaborn (works with pandas), matplotlib (modeled on MATLAB), plotly (interactive plotting), ggplot (modeled on ggplot2)
- Website development: web.py, Django, Flask
- Task scheduling and workflow management: Airflow (preferred for pipelines)
- Machine learning: scikit-learn (classic), PyML, TensorFlow (released by Google), pylearn2, Orange (machine learning package with a graphical interface)
- Web scraping: Beautiful Soup, requests
- Reproducible programming: Jupyter
- Regular expressions: re
How to learn Python
Programming is like Lego. We need to know the characteristics of each component so that we can use it when needed, and we need a blueprint in our head so that we know what to do at each step. By combining the two, we can build the world we want.
In my opinion, learning to program means applying what you have learned, and the way to learn is to read and to write.
- Read more classic books. First understand programming, or Python programming, at the conceptual and theoretical level, and read widely. Read a book a hundred times and its meaning becomes clear.
- Do more exercises. Any exercise is fine; start with easy ones and move on to harder ones. Exercises related to your own field are even better.
- Read more code. Read plenty of excellent code and use it to correct your own habits and thinking.
Several stages of Python learning
(Figure legend, "programming confidence vs. competence": the vertical axis is confidence and the horizontal axis is competence. Dotted lines divide the curve, from left to right, into four stages: the hand-holding honeymoon, the cliff of confusion, the desert of despair, and the exciting upswing. The fifth dotted line marks the point of being job ready.)
- Reading documentation is the honeymoon period: you read it through and feel that anyone could follow it.
- Writing programs is the cliff of confusion: you know how it should work, but you cannot write it.
- Debugging is the desert of despair: what you wrote looks right to you, but the compiler is not so understanding.
- When the program finally runs correctly, that is the exciting upswing: one more step taken on the long march.
How to install Python
The Python community offers many useful packages, but installing them one by one means resolving complex dependencies. I usually recommend installing a ready-made distribution to settle these problems once and for all. Anaconda is the preferred distribution. It bundles common toolkits for numerical computing, graphics and visualization, such as IPython, Matplotlib, NumPy and SciPy, and provides a simpler way to install Python modules, which saves a lot of installation time.
How to run Python commands and scripts
- For beginners, this manual recommends learning Python commands and scripts directly in Jupyter Notebook. This tutorial is also written in Jupyter Notebook. The code in it can be modified and run at any time, and the notebook records your scripts together with their output, which fits the popular idea of "reproducible computing".
- Linux/Unix users: in the terminal, change to your target folder with cd /working_dir [Enter], then type jupyter notebook [Enter] to start Jupyter Notebook.
- Windows users can create a new jupyter_notebook.bat file (create a new txt file and change the suffix to .bat after writing the content; if the suffix cannot be changed, search Google for "how to show file extensions in Windows"), write the following content into it (replace the drive letter and path in the first two lines with your own working directory), and double-click it to run.
    D:
    cd PBR_training
    jupyter notebook
    pause
- After Jupyter Notebook starts, it opens the default browser (so you need to be working in a graphical user interface). You can then create or open .ipynb files in the corresponding path.
- Linux or Unix users can type python in the terminal and press Enter to open the interactive Python interpreter, as shown in the following figure. In this interpreter you can type any legal Python statement and it is executed immediately. Alternatively, all commands can be stored in one file and executed together, as shown in the following figure. Given a file test.py containing a Python program, just type python test.py in the terminal and press Enter to run it. You can also type chmod 755 test.py to give test.py execute permission, and then type ./test.py to run the script. For more advanced Linux usage and Linux commands, see bash tutorial_ training-chinese. ipynb.
- For Windows users, you can call up the "Run" window through "windows key + R" and enter "cmd" to open the windows command interpreter. Enter Python to open the interactive Python interpreter. At the same time, you can also double-click the shortcut of the installed software to open the Python interpreter of the graphical interface, which can process interactive commands and import Python files and execute them.
- For the interactive Python interpreter, after use, close it with the keyboard combination Ctrl-d (Linux/Unix) or Ctrl-z (Windows).
What editor do you use to write Python scripts
Once you have learned Python, most of your work will probably be done on a server, and everyday tasks are generally solved with scripts. I personally recommend using Vim to write Python scripts.
The Vim configuration file for Linux can be downloaded from my GitHub, and the Windows version can be downloaded from my Baidu Cloud.
Python program examples
# Suppose we have the following FASTA file. How do we merge each multi-line sequence into a single line?
for line in open("data/test2.fa"):
    print(line.strip())
>NM_001011874 gene=Xkr4 CDS=151-2091
gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg
aggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTG
CCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACC
CCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
>NM_001195662 gene=Rp1 CDS=55-909
AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTG
TGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACA
CAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
>NM_011283 gene=Rp1 CDS=128-6412
AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
CGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
>NM_0112835 gene=Rp1 CDS=128-6412
AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
aDict = {}
for line in open('data/test2.fa'):
    if line[0] == '>':
        key = line.strip()
        aDict[key] = []
    else:
        aDict[key].append(line.strip())
#------------------------------------------
for key, valueL in list(aDict.items()):
    print(key)
    print(''.join(valueL))
>NM_001011874 gene=Xkr4 CDS=151-2091
gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
>NM_001195662 gene=Rp1 CDS=55-909
AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
>NM_011283 gene=Rp1 CDS=128-6412
AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACACAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
>NM_0112835 gene=Rp1 CDS=128-6412
AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
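The merging logic above can also be wrapped in a function so that it works on any iterable of lines, not only on data/test2.fa. The following is a minimal sketch under that assumption; the function name merge_fasta and the toy records are hypothetical and exist only for illustration.

```python
def merge_fasta(lines):
    """Join the sequence lines of each FASTA record into one string.

    `lines` is any iterable of text lines (an open file works as well).
    Returns a dict mapping each header line to its merged sequence.
    """
    records = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith('>'):
            key = line                    # a new record starts here
            records[key] = []
        elif key is not None:
            records[key].append(line)     # sequence line of the current record
    return {name: ''.join(parts) for name, parts in records.items()}


# Made-up toy records, used only for illustration
example_lines = [
    ">seq1 demo\n",
    "ACGT\n",
    "TTGA\n",
    ">seq2 demo\n",
    "GGCC\n",
]
merged = merge_fasta(example_lines)
for name, seq in merged.items():
    print(name)
    print(seq)
```

To use it on a real file, one could pass an open file object, for example merge_fasta(open('data/test2.fa')).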
Python syntax
Level indent
- Appropriate indentation. Whitespace at the start of a line matters in Python and is called indentation. The whitespace (spaces and tabs) at the beginning of a logical line determines the indentation level of that line, and thus how statements are grouped. This means that statements at the same level must have the same indentation. Each such group of statements is called a block. The usual indent is 4 spaces, which is what one press of the Tab key produces in the Jupyter notebook.
The following three examples show the typical indentation errors and the corresponding messages.
- "unexpected indent" means there is extra whitespace where there should be none; the message also says that the problem is on line 3.
- "expected an indented block" means a line that should be indented is not; the message again names the offending line.
- "unindent does not match any outer indentation level" means the indentation is inconsistent. The problem is usually in the reported line or the line before it.
print("Improper indentation causes errors; there should be no indent before b")
a = 'No indent'
 b = 'There is a space in front of me'

  File "<ipython-input-123-085115ffae95>", line 3
    b = 'There is a space in front of me'
    ^
IndentationError: unexpected indent
print("Improper indentation: print is a sub-statement of for and should be indented, but the indent was omitted")

a = [1, 2, 3]
for i in a:
print("I should be indented, I belong to the for loop!!!\n")

  File "<ipython-input-2-1b9e89963ac3>", line 5
    print("I should be indented, I belong to the for loop!!!\n")
    ^
IndentationError: expected an indented block
a = [1, 2, 3]
if a:
    for i in a:
        print(i)
       print(i + 1, "Why is my indentation different from other lines? Who ate my space?")
         print(i + 1, "Why is my indentation different from other lines? Who gave me a space?")

  File "<ipython-input-203-1af46ff5a29f>", line 5
    print(i + 1, "Why is my indentation different from other lines? Who ate my space?")
    ^
IndentationError: unindent does not match any outer indentation level
Use of Python as a calculator
Basic mathematical operations can be carried out in Python just as learned in primary school: addition, subtraction, multiplication, division, remainder and so on. Pay attention to operator precedence.
2 + 2
4
2 + 3 +5
10
2 + 3 * 5
17
(2 + 3) * 5
25
# Integer division
23 // 7
3
# Remainder
23 % 7
2
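As a small recap of the operators above, the sketch below also shows divmod and the ** power operator, which are standard Python built-ins not used elsewhere in this section; the variable names are illustrative only.

```python
quotient = 23 // 7       # integer (floor) division
remainder = 23 % 7       # remainder
both = divmod(23, 7)     # quotient and remainder in one call
true_div = 23 / 7        # / always returns a float in Python 3
power = 2 ** 10          # exponentiation

print(quotient, remainder, both, power)
# The quotient and remainder always rebuild the original number
print(quotient * 7 + remainder == 23)
```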
First applet
print("Hello, Python!")
Hello, Python!
myname = input("Your name: ")
print("Hello", myname)

Your name: ct
Hello ct
Variable, data structure, process control
Let's first look at a dynamic diagram to show the assignment, storage, value change and program operation of variables in memory.
A constant is a fixed number or string, such as 2, 2.9, "Hello world", etc.
A variable is something that stores a number or a string and can be assigned or modified. Simply put, a variable is a box: you can put anything in it and take the contents out again through the name of the box.
- Numeric variable: a variable that stores a number.
- List: a list is a data structure that holds an ordered collection of items, that is, you can store a sequence of items in a list. Think of a shopping list that records what you want to buy. On a shopping list each item probably sits on its own line, while in Python you separate the items with commas. The items are enclosed in square brackets so that Python knows you mean a list. Once a list is created, you can add, delete or search for items in it. Because items can be added or deleted, we say the list is a mutable data type, that is, this type can be changed. Do not name a list variable list; use a name such as aList instead.
- Set: similar to a list, but its elements cannot repeat. It is usually used to remove duplicates and to compute intersections, unions and so on. Membership lookups in a set are also much faster than in a list, which can be used to speed up programs.
- Sequence: lists, tuples and strings are all sequences. You can also use range to generate a sequence of numbers. The two main operations on a sequence are indexing and slicing.
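The data structures described above can be sketched in a few lines. The shopping items and variable names below are made up for illustration.

```python
aList = ['apple', 'milk', 'apple', 'bread']   # hypothetical shopping list
aList.append('eggs')            # lists are mutable: add an item
aList.remove('milk')            # ... or delete one

aSet = set(aList)               # a set keeps one copy of each element
common = aSet & {'bread', 'tea'}    # intersection
either = aSet | {'tea'}             # union

first = aList[0]                # index operation
first_two = aList[0:2]          # slice operation
evens = list(range(0, 10, 2))   # range generates a sequence of numbers

print(aList, sorted(aSet), common, first, first_two, evens)
```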
Identifier
- The name of a variable is called an identifier. Identifiers are case sensitive. The first character must be a letter (uppercase or lowercase) or an underscore (_); the remaining characters may be letters, digits or underscores. Valid identifiers include abc, _abc, a_b_2, __23, etc. Invalid identifiers include 2a and 3b.
- An identifier should be self-explanatory, that is, it should indicate the type of the variable and what the variable actually means. For example, line represents one line of a file, and lineL represents a list of the lines read from a file.
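The naming rules can be checked programmatically. The sketch below uses the standard str.isidentifier method and the keyword module; the candidate names are taken from the examples above plus the reserved word for, which is added here as an extra illustration.

```python
import keyword

candidates = ['abc', '_abc', 'a_b_2', '__23', '2a', '3b', 'for']
for name in candidates:
    # str.isidentifier() checks the spelling rules;
    # keyword.iskeyword() flags reserved words such as "for"
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable)

valid = [n for n in candidates
         if n.isidentifier() and not keyword.iskeyword(n)]
```

Names that start with a digit fail the spelling check, while for passes it but is rejected because it is a reserved word.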
control flow
- if statement. The if statement tests a condition. If the condition is true, one block of statements runs (the if block); otherwise another block runs (the else block). The else clause is optional. If there are several conditions, use elif in between. For example: "Buy five steamed buns; if you see someone selling watermelon, buy one." In the end the programmer bought one steamed bun, because the instruction reads as:
    buns_to_buy = 5
    if someone_is_selling_watermelon:
        buns_to_buy = 1
- while statement. The while statement lets you execute a block of statements repeatedly as long as a condition is true. It is an example of a looping statement. The while statement has an optional else clause.
- The continue statement tells Python to skip the remaining statements in the current loop block and move on to the next iteration of the loop.
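The statements described above can be combined in one short loop. This is an illustrative sketch only; the numbers and labels are made up.

```python
numbers = [3, 8, 5, 12, 7]    # hypothetical data
big_count = 0
i = 0
while i < len(numbers):
    n = numbers[i]
    i += 1
    if n % 2 == 1:
        continue              # odd number: skip the rest, go to the next iteration
    if n > 10:
        big_count += 1
        label = 'large even'
    elif n > 5:
        label = 'medium even'
    else:
        label = 'small even'
    print(n, label)
else:
    # the optional else clause runs once the while condition becomes false
    print('loop finished')
```

Note that i is increased before continue; otherwise the loop would test the same odd number forever.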
# Variable naming: names should be expressive; use camel case or underscores
LookLikeThis = 1
look_like_this = 'a'
Data types in Python: integer (int), floating point (float), and string (str)
type(2)
int
type(2.5)
float
type("Hello, everyone")
str
type([1,2,3])
list
# Type conversion functions: str(), int(), float()
# Characters and numbers are different
42 != "42"
True
42 == int("42")
True
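A minimal sketch of the three conversion functions in use follows; the values are made up, and age_text stands in for text that would normally come from input().

```python
age_text = "23"                  # text, such as input() would return
age = int(age_text)              # str -> int
next_year = age + 1
message = "You will be " + str(next_year) + " in a year."   # int -> str
ratio = float("2.5") * 2         # str -> float

print(message, ratio)
print(42 == "42", 42 == int("42"), str(42) == "42")
```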
# This is my first Python program!
myName = input("Hello, what is your name?")
print('It is good to meet you,' + myName)
print('The length of your name is ' + str(len(myName)))
myAge = input('What is your age?')
print('You will be ' + str(int(myAge) + 1) + ' in a year.')

Hello, what is your name?ct
It is good to meet you,ct
The length of your name is 2
What is your age?23
You will be 24 in a year.
Logical values and comparison operations
a = False
a
False
b = True
b
True
42 == 40
False
42 == "42"
False
30 == 30
True
# Note the difference between assignment (=) and comparison (==).
a = 'Hello'
a == 'Hello'
True
Boolean operators and, or, not
- Logical and: the result is true only if both operands are true
- Logical or: the result is false only if both operands are false
- Logical not: true becomes false, false becomes true
(3 > 2) and (5 < 6)
True
(3 < 2) or (5 < 6)
True
not True
False
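The three rules can be verified by printing the full truth table. This is a small illustrative sketch; the list rows is only there to collect the results.

```python
rows = []
for p in (True, False):
    for q in (True, False):
        # columns: p, q, p and q, p or q, not p
        rows.append((p, q, p and q, p or q, not p))
        print(p, q, p and q, p or q, not p)
```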
control flow
if condition (True or False):
    Code block 1
elif condition:
    Code block
else:
    Code block 2
# Conditional
name = input('Please enter a name and press Enter\n')
if name == 'ehbio':
    print('Hello ehbio')
else:
    print('You are not ehbio')
While Loop
a = 0
if a < 5:
    print('Hello, world')
    a = a + 1
Hello, world
a = 0
while a < 5:
    print('Hello, world')
    a = a + 1

Hello, world
Hello, world
Hello, world
Hello, world
Hello, world
Numeric variable operation
print("Numerical variable")
a = 5  # Note the spaces on both sides of the equal sign. For readability it is best to put spaces around operators; the number of spaces is not restricted
print(a)
print()
print("The type of a is", type(a))
#print("This is an optional extra. Usually type is not used to check a variable's type; isinstance is.")
#print("a is an int, ", isinstance(a, int))
# Assigning again overwrites the old value
a = 6
print(a)

Numerical variable
5

The type of a is <class 'int'>
6
# Comparison
print("Compare the size of the value")
a = 5
# Note the spaces on both sides of the greater-than sign. For readability it is best to put spaces around operators; the number of spaces is not restricted
if a > 4:
    print("a is larger than 4.")
elif a == 4:
    print("a is equal to 4.")
else:
    print("a is less than 4")

Compare the size of the value
a is larger than 4.
print("Given numeric variables a and b, use a comparison and reassignment so that a holds the smaller value and b the larger")
a = 5
b = 3
if a > b:
    a, b = b, a
#-------------------
print(a)
print(b)

Given numeric variables a and b, use a comparison and reassignment so that a holds the smaller value and b the larger
3
5
print('''# Arithmetic follows the usual precedence rules, and parentheses are needed to change the order,
exactly as taught in primary school!''')
a = 5
b = 3
print("a + b =", a + b)
print("a * b =", a * b)
print("a / b =", a / b)  # true division: the result is a float in Python 3
print("2 * (a+b) =", 2 * (a + b))
print("Remainder: a % b =", a % b)
print("Taking the remainder is a good way to detect cycles, because the remainder repeats once every fixed period")

# Arithmetic follows the usual precedence rules, and parentheses are needed to change the order,
exactly as taught in primary school!
a + b = 8
a * b = 15
a / b = 1.6666666666666667
2 * (a+b) = 16
Remainder: a % b = 2
Taking the remainder is a good way to detect cycles, because the remainder repeats once every fixed period
String variable operation
print("String variable")
# Note that quotation marks must be paired
a = "Hello, welcome to Python"
#a = 123
#a = str(a)
print("The string a is:", a)
print()
# Placeholders
print("The length of this string <%s> is %d" % (a, len(a)))
print()
print("The type of a is", type(a))

String variable
The string a is: Hello, welcome to Python

The length of this string <Hello, welcome to Python> is 24

The type of a is <class 'str'>
a = "Great events depend on arbitrariness, not on public conspiracy"
print("The string a is:", a)
print()
# len function: get the length of the string
print("The length of this string <%s> is %d" % (a, len(a)))
print()

The string a is: Great events depend on arbitrariness, not on public conspiracy

The length of this string <Great events depend on arbitrariness, not on public conspiracy> is 62

a = "Hello, welcome to Python"
print("Take out the first character, the last character and the middle character of the string")
print("The first character of a is %s\n" % a[0])
print("The first five characters of a are %s\n" % a[0:5])
print("The last character of a is %s\n" % a[-1])
print("The last character of a is %s\n" % a[len(a) - 1])
print("\n This part is very important. String indexing and slicing are very common.")

Take out the first character, the last character and the middle character of the string
The first character of a is H

The first five characters of a are Hello

The last character of a is n

The last character of a is n


 This part is very important. String indexing and slicing are very common.
a = "oaoaoaoa"
print("Traversal string")
for i in a:
    print(i)
print("Output the position of characters that meet specific requirements")
print()
pos = 0
for i in a:
    pos += 1
    if i == 'o':
        print(pos)
    #-------------------
#-----------------------
print('''\n Did you know? Without noticing, we have just written our own version of Python's
built-in standard function find or index, and a more powerful one''')
print('''\n Trying to implement a language's built-in functions yourself is
a good way to learn the language.''')

Traversal string
o
a
o
a
o
a
o
a
Output the position of characters that meet specific requirements

1
3
5
7

 Did you know? Without noticing, we have just written our own version of Python's
built-in standard function find or index, and a more powerful one

 Trying to implement a language's built-in functions yourself is
a good way to learn the language.
print("Let's see how to find the positions of all the o characters with built-in functions\n")
a = "oaoaoaoa"
print("The built-in function find only reports the position of the first o")
pos = a.find('o')
print("So after finding an o, we slice off the rest of the string and run find on it again")
while 1:
    print(pos + 1)
    new = a[pos + 1:].find('o')
    if new == -1:
        break
    pos = new + pos + 1
# help(str)

Let's see how to find the positions of all the o characters with built-in functions

The built-in function find only reports the position of the first o
So after finding an o, we slice off the rest of the string and run find on it again
1
3
5
7
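An alternative to slicing the string on every round is the optional start argument of str.find, which tells find where to begin searching. The sketch below wraps this in a helper; the name find_all is hypothetical and not a built-in.

```python
def find_all(text, target):
    """Return the 1-based positions of every occurrence of target in text."""
    positions = []
    pos = text.find(target)               # -1 means "not found"
    while pos != -1:
        positions.append(pos + 1)         # convert the 0-based index to a position
        pos = text.find(target, pos + 1)  # continue searching after the last hit
    return positions


print(find_all("oaoaoaoa", "o"))
```

Because no new substring is created, the positions returned by find need no offset correction.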
print()
print("Use split to split a string\n")
str1 = "a b c d e f g"
strL = str1.split(' ')
print(strL)
print("\n The split command divides the string into a list. You can then pick whichever column you want.")
# Use the following command to see what else you can do with strings
# help(str)


Use split to split a string

['a', 'b', 'c', 'd', 'e', 'f', 'g']

 The split command divides the string into a list. You can then pick whichever column you want.
print("String concatenation\n")
a = "Hello"
b = "Python"
c = a + ', ' + b
print(c)
print("\n Strings can simply be added together!\n")
print('''Note that this is not the best way to concatenate strings.
Because strings cannot be modified, each concatenation allocates new memory
and stores the new string there. If this operation is executed hundreds of thousands of times, it will greatly slow the program down.''')

String concatenation

Hello, Python

 Strings can simply be added together!

Note that this is not the best way to concatenate strings.
Because strings cannot be modified, each concatenation allocates new memory
and stores the new string there. If this operation is executed hundreds of thousands of times, it will greatly slow the program down.
For why + is not recommended for string concatenation, the article "Why is my Python so slow (1)" gives a good demonstration.
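A commonly used alternative to repeated + is to collect the pieces in a list and combine them once with str.join. The sketch below only shows that both approaches give the same result; the fragment list is made up, and the actual speed difference depends on the interpreter and the data, so it is worth timing on your own input.

```python
pieces = ["ACGT"] * 1000          # hypothetical sequence fragments

# Repeated +: conceptually, every step builds a brand-new string
joined_plus = ""
for piece in pieces:
    joined_plus = joined_plus + piece

# str.join: collect the pieces first, then build the result once
joined_join = "".join(pieces)

print(joined_plus == joined_join, len(joined_join))
```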
print('''Remove specific characters from a string. A line read from a file usually ends with a newline character, which on Linux is \\n \n''')  # \ is the escape character
a = "oneline\n"
print("Currently, the string <a> is **", a, "**.\n The length of string <a> is **", len(a), "**.\n Why did I change to the next line?\n")
a = a.strip()
print("Currently, the string <a> is **", a, "**.\n The length of string <a> is **", len(a), "**.\n After deleting the newline character, one character is missing, and there is no line break!\n")
a = a.strip('o')
print("Currently, the string <a> is **", a, "**.\n The length of string <a> is **", len(a), "**. Another character is missing!!\n")
a = a.strip('one')
print("Currently, the string <a> is **", a, "**.\n The length of string <a> is **", len(a), "**. Even fewer characters are left!!\n")

Remove specific characters from a string. A line read from a file usually ends with a newline character, which on Linux is \n 

Currently, the string <a> is ** oneline
 **.
 The length of string <a> is ** 8 **.
 Why did I change to the next line?

Currently, the string <a> is ** oneline **.
 The length of string <a> is ** 7 **.
 After deleting the newline character, one character is missing, and there is no line break!

Currently, the string <a> is ** neline **.
 The length of string <a> is ** 6 **. Another character is missing!!

Currently, the string <a> is ** li **.
 The length of string <a> is ** 2 **. Even fewer characters are left!!

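The last step above leaves only "li" because the argument of strip is treated as a set of characters to remove from both ends, not as a substring. The sketch below compares strip with its one-sided relatives lstrip and rstrip, all standard string methods; the variable names are illustrative.

```python
a = "oneline\n"
no_newline = a.rstrip("\n")             # remove only the trailing newline

# Any of the characters 'o', 'n', 'e' is stripped, in any order,
# until a character outside that set is reached
both_ends = "oneline".strip("one")
left_only = "oneline".lstrip("one")
right_only = "oneline".rstrip("one")

print(repr(no_newline), both_ends, left_only, right_only)
```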
print("String replacement\n")
a = "Hello, Python"
b = a.replace("Hello", "Welcome")
print("The original string is:", a)
print()
print("The replaced string is:", b)
print()
c = a.replace("o", "O")
print(c)
print("All the o characters are replaced!\n")
print("What if I only want to replace the first o?\n")
c = a.replace("o", "O", 1)
print(c)

String replacement

The original string is: Hello, Python

The replaced string is: Welcome, Python

HellO, PythOn
All the o characters are replaced!

What if I only want to replace the first o?

HellO, Python
# The replacement can also be a newline
a = "When you are busy, always check your points first in your leisure, and don't give up too much; think when you move, and take care of it from the quiet, not from the heart."
print(a.replace(';', '\n'))

When you are busy, always check your points first in your leisure, and don't give up too much
 think when you move, and take care of it from the quiet, not from the heart.
print("String help, view the available functions of string") help(str)
print("Case judgment and conversion")
a = 'Sdsdsd'
print("All elements in <%s> are lowercase: %s" % (a, a.islower()))
print("Convert all elements in <%s> to lowercase <%s>" % (a, a.lower()))
print("Convert all elements in <%s> to uppercase <%s>" % (a, a.upper()))

Case judgment and conversion
All elements in <Sdsdsd> are lowercase: False
Convert all elements in <Sdsdsd> to lowercase <sdsdsd>
Convert all elements in <Sdsdsd> to uppercase <SDSDSD>
print("This is an optional extra. Read it if you are interested; skipping it will not affect your study")
print('''Strings cannot be modified.
Assigning a new value to the same variable name just creates a new object.
Different variable names given the same value compare as equal, but may refer to different regions of memory''')
b = "123456"
# print(b)
print("The memory index of b is", id(b))
for i in range(1, 15, 2):
    b = b + '123456'
    # print(b)
    print("The memory index of b is", id(b))

This is an optional extra. Read it if you are interested; skipping it will not affect your study
Strings cannot be modified.
Assigning a new value to the same variable name just creates a new object.
Different variable names given the same value compare as equal, but may refer to different regions of memory
The memory index of b is 139844870936200
The memory index of b is 139844868463728
The memory index of b is 139844870954056
The memory index of b is 139844863857088
The memory index of b is 139844863857088
The memory index of b is 139845221506544
The memory index of b is 139844869671408
The memory index of b is 139844868660840
print("String to array")
print()
str1 = "ACTG"
print(list(str1))
a = list(str1)
print()
print("After the string is converted to an array, you can reverse the order to get its reverse sequence")
print()
a.reverse()
print(''.join(a))

String to array

['A', 'C', 'T', 'G']

After the string is converted to an array, you can reverse the order to get its reverse sequence

GTCA
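Besides converting to a list and calling reverse, Python offers two shorter standard ways to reverse a string: the built-in reversed and a slice with step -1. A minimal sketch comparing the three (variable names are illustrative):

```python
str1 = "ACTG"

via_list = list(str1)
via_list.reverse()
reversed_a = ''.join(via_list)           # the approach used above

reversed_b = ''.join(reversed(str1))     # reversed() returns an iterator
reversed_c = str1[::-1]                  # slice with step -1

print(reversed_a, reversed_b, reversed_c)
```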
print("Numeric string to numeric value")
a = '123'
print(a + '1', int(a) + 1)
a = '123.5'
print()
# print(a + 1)
print(float(a) + 1)
print('''Numbers read from a file or from command-line parameters are strings.
Before doing arithmetic, convert them with int or float.''')

Numeric string to numeric value
1231 124

124.5
Numbers read from a file or from command-line parameters are strings.
Before doing arithmetic, convert them with int or float.
print("String multiplication")
a = "ehbio "
a * 4

String multiplication
'ehbio ehbio ehbio ehbio '
# A string cannot be multiplied by a decimal number
a * 3.1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-c65dd4dac397> in <module>()
----> 1 a * 3.1

TypeError: can't multiply sequence by non-int of type 'float'
break and continue
# Simulate a login: user name Bob, password fish
while True:
    name = input('Who are you?\n> ')
    if name != 'Bob':
        continue  # Jump back to the start of the loop
    password = input('Hello, Bob. What is your password? (password: fish)\n> ')
    if password == 'fish':
        break  # Exit the loop (the innermost one if loops are nested)
print('Access granted!')

Who are you?
> Bob
Hello, Bob. What is your password? (password: fish)
> fish
Access granted!
for range
# With a single argument, range counts from 0 up to (but excluding) that argument, with step 1
for i in range(4):
    print(i)

0
1
2
3
# 1: start; 10: end (excluded); 2: step
for i in range(1, 10, 2):
    print(i)

1
3
5
7
9
# The step can also be negative, counting from large to small
for i in range(10, 1, -2):
    print(i)

10
8
6
4
2
Gauss calculates the sum of 1-100.
# Gaussian 1+2+3+...+100=?
total = 0
# The parameter is 101. Why?
for i in range(101):
    total = total + i
print(total)
5050
# Gaussian optimized
end = 100
sum_all = int((1+end) * end / 2)
#else:
#    sum_all = end * (end -1 ) / 2 + end
print(sum_all)
5050
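As a quick check of the closed form used above, the sketch below compares it with a plain sum for a few values. The function name gauss_sum is an illustrative assumption, and integer division (//) is used here so that the result stays an int without a conversion.

```python
def gauss_sum(end):
    """Sum of 1..end using the closed form (1 + end) * end / 2."""
    # (1 + end) * end is always even, so // loses nothing
    return (1 + end) * end // 2

# Compare with the loop-style sum for odd and even values of end
for end in (1, 10, 100, 101):
    assert gauss_sum(end) == sum(range(end + 1))

print(gauss_sum(100))  # 5050
```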
The question is as follows:
We have 100 yuan and need to buy exactly 100 items. A pencil box costs 5 yuan, a pen costs 3 yuan and a rubber costs 0.5 yuan. Which combinations of the three items spend exactly 100 yuan while the total number of items is exactly 100? Please solve it by programming.
# Pure brute-force solution
for x in range(0, 101):
    for y in range(0, 101):
        for z in range(0, 101):
            if x + y + z == 100 and 5 * x + 3 * y + 0.5 * z == 100:
                print(x, y, z)
0 20 80 5 11 84 10 2 88
# Optimized solution
# Limit the maximum numbers of boxes and pens: if all the money were spent on one of them, how many could you buy at most?
max_box = int(100 / 5) + 1
max_pen = int(100 / 3) + 1
for box_num in range(max_box):
    # The total number of items is fixed, so the eraser count follows from the other two
    for pen_num in range(max_pen - box_num):
        eraser_num = 100 - box_num - pen_num
        if 5 * box_num + 3 * pen_num + 0.5 * eraser_num == 100:
            print((box_num, pen_num, eraser_num))
(0, 20, 80) (5, 11, 84) (10, 2, 88)
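The comparison with 0.5 works here because 0.5 is exactly representable as a float, but comparing floats with == is fragile in general. The sketch below is one possible variant, not the tutorial's solution: it doubles every price so that all arithmetic stays in integers. The variable names are illustrative.

```python
# Prices doubled: box 10, pen 6, eraser 1; the budget becomes 200
solutions = []
for box_num in range(100 // 5 + 1):
    for pen_num in range(100 // 3 + 1):
        eraser_num = 100 - box_num - pen_num
        if eraser_num < 0:
            continue  # more than 100 items in total, skip
        if 10 * box_num + 6 * pen_num + eraser_num == 200:
            solutions.append((box_num, pen_num, eraser_num))

print(solutions)  # [(0, 20, 80), (5, 11, 84), (10, 2, 88)]
```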
Range function (range(start,end,step))
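A compact way to inspect what range(start, end, step) produces is to convert it to a list. The short sketch below repeats the three loops shown earlier in that form.

```python
# range() is lazy; list() materializes the values so they can be inspected
print(list(range(4)))          # [0, 1, 2, 3]
print(list(range(1, 10, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 1, -2)))  # [10, 8, 6, 4, 2]

# The end value is never included, so range(101) is needed to reach 100
last = list(range(101))[-1]
print(last)                    # 100
```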
List operation
print("#Build an array ")
aList = [1, 2, 3, 4, 5]
print(aList)
print("\n Arrays can be indexed with subscripts or ranges\n")
print("The first element is %d." % aList[0])
print()
print("The last element is %d." % aList[-1])
print()
print("The first two elements are", aList[:2])
print("\n Array indexing and slicing operations are the same as strings, and both are important.")
#Build an array [1, 2, 3, 4, 5] Arrays can be indexed with subscripts or ranges The first element is 1. The last element is 5. The first two elements are [1, 2] Array indexing and slicing operations are the same as strings, and both are important.
aList = []
print("#append: add element to array ")
aList.append(6)
print(aList)
print("\n#extend: add an array to the array ")
print()
bList = ['a', 'b', 'c']
aList.extend(bList)
print(aList)

#append: add element to array
[6]
#extend: add an array to the array
[6, 'a', 'b', 'c']
aList = [1, 2, 3, 4, 3, 5]
print("Delete element in array")
aList.remove(3)  # Delete only the first matching 3
print()
print(aList)
aList.pop(3)  # Remove the element at index 3
print()
print(aList)
print('''\npop and remove are different:
remove deletes the first element equal to the given value,
pop deletes the element at the given position\n''')

Delete element in array
[1, 2, 4, 3, 5]
[1, 2, 4, 5]
pop and remove are different:
remove deletes the first element equal to the given value,
pop deletes the element at the given position
aList = [1, 2, 3, 4, 5]
print("#Traverse each element of the array ")
print()
for ele in aList:
    print(ele)
print("\n#Output array and elements greater than 3 ")
print()
for ele in aList:
    if ele > 3:
        print(ele)
#Traverse each element of the array 1 2 3 4 5 #Elements greater than 3 in the output array 4 5
aList = [i for i in range(30)]
print("#Elements greater than 3 and less than 10 in the output array ")
print()
for ele in aList:
    if ele > 3 and ele < 10:  # Logical and: output only when both conditions are met
        print(ele)
#Elements greater than 3 and less than 10 in the output array 4 5 6 7 8 9
aList = [i for i in range(30)]
print("#Elements greater than 25 or less than 5 in the output array ")
print()
for ele in aList:
    if ele > 25 or ele < 5:  # Logical or: output when at least one of the two conditions is met
        print(ele)
#Elements greater than 25 or less than 5 in the output array 0 1 2 3 4 26 27 28 29
aList = [i for i in range(30)]
print("#Elements not greater than 3 in the output array ")
print()
for ele in aList:
    # Logical not: output only when the given condition is not met.
    # In this example, ele is output only when it is not greater than 3, which is equivalent to if ele <= 3:
    if not ele > 3:
        print(ele)

#Elements not greater than 3 in the output array
0
1
2
3
print("Connect each element of the array (each element must be a string)")
aList = [1, 2, 3, 4, 5]
# print('\t'.join(aList))  # wrong: the elements are not strings
print(aList)
aList = [str(i) for i in aList]
print(aList)
print('\t'.join(aList))
print(':'.join(aList))
print(''.join(aList))
print('''\nSaving the strings in a list and then connecting them with join
is the most appropriate way to connect a large number of strings''')

Connect each element of the array (each element must be a string)
[1, 2, 3, 4, 5]
['1', '2', '3', '4', '5']
1	2	3	4	5
1:2:3:4:5
12345
Saving the strings in a list and then connecting them with join
is the most appropriate way to connect a large number of strings
aList = [1, 2, 3, 4, 5]
print("Array reverse order")
aList.reverse()
print(aList)
print("Array element sorting")
aList.sort()
print(aList)
# print("lambda sort, reserved content")
# aList.sort(key=lambda x: x*(-1))
# print(aList)
Array reverse order [5, 4, 3, 2, 1] Array element sorting [1, 2, 3, 4, 5]
Collection operation
print("Build a collection")
aSet = set([1, 2, 3])
print(aSet)
print("Add an element")
aSet.add(4)
print(aSet)
aSet.add(3)
print(aSet)
Build a collection {1, 2, 3} Add an element {1, 2, 3, 4} {1, 2, 3, 4}
print("Use conversion to set to remove duplicate elements in the list")
aList = [1, 2, 1, 3, 1, 5, 2, 4, 3, 3, 6]
print(aList)
print(set(aList))
print(list(set(aList)))
Use conversion to set to remove duplicate elements in the list [1, 2, 1, 3, 1, 5, 2, 4, 3, 3, 6] {1, 2, 3, 4, 5, 6} [1, 2, 3, 4, 5, 6]
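A set does not promise to keep the order in which elements first appeared; the sorted-looking result above should not be relied on. If the original order matters, one common alternative (an addition here, not part of the original tutorial) is dict.fromkeys, which keeps insertion order in Python 3.7 and later.

```python
aList = [1, 2, 1, 3, 1, 5, 2, 4, 3, 3, 6]

# Dictionary keys are unique and remember the order of first insertion
unique_in_order = list(dict.fromkeys(aList))
print(unique_in_order)  # [1, 2, 3, 5, 4, 6]

# sorted() gives a well-defined order when the first-seen order is not needed
unique_sorted = sorted(set(aList))
print(unique_sorted)    # [1, 2, 3, 4, 5, 6]
```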
Range usage
print("\n Use range to produce a series of numbers\n")
for i in range(16):
    if i % 4 == 0:
        print(i)
print("\n Produce the multiples of 4 by specifying the step size\n")
for i in range(0, 16, 4):
    print(i)

 Use range to produce a series of numbers
0
4
8
12
 Produce the multiples of 4 by specifying the step size
0
4
8
12
Dictionary operation
print("#Build a dictionary ")
aDict = {1: 2, 3: 4, 'a': 'b', 'd': 'c'}
print("Print Dictionary")
print(aDict)
print("Add key value pairs to the dictionary")
aDict[5] = 6
aDict['e'] = 'f'
print(aDict)
#Build a dictionary Print Dictionary {1: 2, 3: 4, 'a': 'b', 'd': 'c'} Add key value pairs to the dictionary {1: 2, 3: 4, 'a': 'b', 'd': 'c', 5: 6, 'e': 'f'}
print()
aDict = {1: 2, 3: 4, 'a': 'b', 'd': 'c'}
print("Key value pair of output dictionary(key-value)")
for key, value in list(aDict.items()):
    print(key, value)
Key value pair of output dictionary(key-value) 1 2 3 4 a b d c
print("Key value pairs of ordered output dictionary(key-value)")
aDict = {'1': 2, '3': 4, 'a': 'b', 'd': 'c'}
keyL = list(aDict.keys())
print(keyL)
# Python 3 does not support comparing int and str values,
# so all keys must be converted to str before sorting
# keyL = [str(i) for i in keyL]
keyL.sort()
print(keyL)
for key in keyL:
    print(key, aDict[key])
Key value pairs of ordered output dictionary(key-value) ['1', '3', 'a', 'd'] ['1', '3', 'a', 'd'] 1 2 3 4 a b d c
print("Dictionary value Can be a list")
a = 'key'
b = 'key2'
aDict = {}
print(aDict)
aDict[a] = []
print(aDict)
aDict[a].append(1)
aDict[a].append(2)
print(aDict)
aDict[b] = [3, 4, 5]
print()
for key, subL in list(aDict.items()):
    print(key)
    for item in subL:
        print("\t%s" % item)
print("This will be very useful when accessing the read in files. This will be used in the following practical exercises.")
Dictionary value Can be a list {} {'key': []} {'key': [1, 2]} key 1 2 key2 3 4 5 This will be very useful when accessing the read in files. This will be used in the following practical exercises.
Dictionaries can use Chinese as Key and Chinese as value.
aDict = {'Caigen Tan': 'Those who understand things because of people's words are better than those who understand themselves; It is better to rest from complacency than to gain and lose from outside.'} print(aDict['Caigen Tan'])
Those who understand things because of people's words are better than those who understand themselves; It is better to rest from complacency than to gain and lose from outside.
print("Dictionary value It can also be a dictionary")
a = 'key'
b = 'key2'
aDict = {}
print(aDict)
aDict[a] = {}
print(aDict)
aDict[a]['subkey'] = 'subvalue'
print(aDict)
aDict[b] = {1: 2, 3: 4}
#aDict[(a,b)] = 2
#aDict['a'] = 2
#aDict['b'] = 2
print()
for key, subD in list(aDict.items()):
    print(key)
    for subKey, subV in list(subD.items()):
        print("\t%s\t%s" % (subKey, subV))
print("\n This will be very useful when accessing the read files. This will be used in the following practical exercises.")
Dictionary value It can also be a dictionary {} {'key': {}} {'key': {'subkey': 'subvalue'}} key subkey subvalue key2 1 2 3 4 This will be very useful when accessing the read files. This will be used in the following practical exercises.
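When values are collected while looping over records, the key usually has to be created the first time it is seen. The sketch below shows that pattern with a list as the value; the records list is made-up data for illustration, and setdefault is offered as one possible shortcut rather than the tutorial's own approach.

```python
# Hypothetical (chromosome, position) records, for illustration only
records = [("chr1", 100), ("chr2", 50), ("chr1", 300)]

# Explicit version: create the empty list the first time a key appears
posDict = {}
for chrom, pos in records:
    if chrom not in posDict:
        posDict[chrom] = []
    posDict[chrom].append(pos)
print(posDict)  # {'chr1': [100, 300], 'chr2': [50]}

# setdefault returns the existing value, or stores and returns the default
posDict2 = {}
for chrom, pos in records:
    posDict2.setdefault(chrom, []).append(pos)
print(posDict2 == posDict)  # True
```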
Input and output
Interactive I / O
In many cases, you want your program to interact with users (possibly yourself). You will get input from the user and print some results. We can use the input and print functions, respectively, to accomplish this.
a = input("Please input a string and type enter\n> ")
print("The string you typed in is: ", a)
Please input a string > sheng xin bao dian The string you typed in is: sheng xin bao dian
print("This is a reserved example for play only\n")
lucky_num = 5
c = 0
while True:
    b = int(input("Please input a number to check if you are \
lucky enough to guess right: \n"))
    if b == lucky_num:
        print("\nYour are so smart!!! ^_^ ^_^")
    #----------------------------------------------------
    else:
        print("\nSorry, but you are not right. %>_<%")
        while 1:
            c = input("Do you want to try again? [Y/N] \n")
            if c == 'Y':
                try_again = 1
                break
            elif c == 'N':
                try_again = 0
                break
            else:
                print("I can not understand you, please check your input. \n")
                continue
        #----------------------------------------------------
        if try_again:
            print("\nHere comes another run. Enjoy!\n")
            continue
        else:
            print("\nBye-bye\n")
            break
This is a reserved example for play only Please input a number to check if you are lucky enough to guess right: 2 Sorry, but you are not right. %>_<% Do you want to try again? [Y/N] Y Here comes another run. Enjoy! Please input a number to check if you are lucky enough to guess right: 5 Your are so smart!!! ^_^ ^_^ Please input a number to check if you are lucky enough to guess right: 1 Sorry, but you are not right. %>_<% Do you want to try again? [Y/N] N Bye-bye
File reading and writing
File reading and writing is the most common input and output operation. In Python 3, files are opened with the built-in open function.
File path
When reading and writing files, if no file path is specified, it defaults to the current directory. Therefore, you need to know the current directory, and then judge whether the file to be read is in the current directory.
import os
os.getcwd()
# os.chdir("path")
'/MPATHB/ct/ipython/notebook'
print("Create a new file")
context = '''The best way to learn python contains two steps:
1. Rember basic things mentionded here masterly.
2. Practise with real demands.
'''
print("In write mode(w)Open a file and name it(Test_file.txt)")
fh = open("Test_file.txt", "w")
print(context, file=fh)
# fh.write(context)
fh.close()  # The file handle must be closed after the file operation is completed
Create a new file In write mode(w)Open a file and name it(Test_file.txt)
print("In read-only mode(r)Read in a file named(Test_file.txt)File")
print()
for line in open("Test_file.txt"):
    print(line)
In read-only mode(r)Read in a file named(Test_file.txt)File The best way to learn python contains two steps: 1. Rember basic things mentionded here masterly. 2. Practise with real demands.
Take a closer look at the output above. It looks very empty and there are many empty lines.
print('''Avoid the output of empty lines in the middle.
Each line read from the file ends with a newline character, and Python's print adds another newline at the end of its output by default,
so printing a line leaves an empty line after it.
There are two ways to solve this problem.''')
print("Pass end='' to print to stop Python from adding a newline after each output")
print()
for line in open("Test_file.txt"):
    print(line, end='')
print()
print("Remove the newline character of each line")
for line in open("Test_file.txt"):
    print(line.strip())

Avoid the output of empty lines in the middle.
Each line read from the file ends with a newline character, and Python's print adds another newline at the end of its output by default,
so printing a line leaves an empty line after it.
There are two ways to solve this problem.
Pass end='' to print to stop Python from adding a newline after each output
The best way to learn python contains two steps:
1. Rember basic things mentionded here masterly.
2. Practise with real demands.
Remove the newline character of each line
The best way to learn python contains two steps:
1. Rember basic things mentionded here masterly.
2. Practise with real demands.
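The examples above close the file handle by hand or leave it to Python. A with block is a common alternative that closes the file automatically, even if an error occurs inside the block. The sketch below is an addition to the tutorial; it writes to a temporary directory (an assumption made so the example does not touch Test_file.txt).

```python
import os
import tempfile

# Write to a throwaway location so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as fh:
    fh.write("line 1\nline 2\n")
# The file is closed here, without calling fh.close()

with open(path) as fh:
    # rstrip("\n") removes only the trailing newline of each line
    lines = [line.rstrip("\n") for line in fh]

print(lines)      # ['line 1', 'line 2']
print(fh.closed)  # True
```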
Practical practice (I)
background knowledge
1. FASTA file format
>seq_name_1
sequence1
>seq_name_2
sequence2
2. FASTQ file format
@HWI-ST1223:80:D1FMTACXX:2:1101:1243:2213 1:N:0:AGTCAA
TCTGTGTAGCCNTGGCTGTCCTGGAACTCACTTTGTAGACCAGGCTGGCATGCA
+
BCCFFFFFFHH#4AFHIJJJJJJJJJJJJJJJJJIJIJJJJJGHIJJJJJJJJJ
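Each FASTQ record shown above spans four lines: the name line starting with @, the sequence, a + separator, and the quality string. The sketch below illustrates how the line index modulo 4 identifies each part. The read names and sequences are shortened, made-up values, and the lines are held in a list instead of a file so the example runs on its own.

```python
# Hypothetical FASTQ content, two records of four lines each
fastq_lines = [
    "@read1 1:N:0:AGTCAA",
    "TCTGTGTAGCCNTGG",
    "+",
    "BCCFFFFFFHH#4AF",
    "@read2 1:N:0:AGTCAA",
    "ACGTACGTAC",
    "+",
    "IIIIIIIIII",
]

records = []
for i, line in enumerate(fastq_lines):
    if i % 4 == 0:       # first line of a record: the name, without '@'
        name = line[1:]
    elif i % 4 == 1:     # second line: the sequence
        seq = line
    elif i % 4 == 3:     # fourth line: quality, one character per base
        assert len(seq) == len(line), "sequence and quality lengths differ"
        records.append((name, seq, line))

for name, seq, qual in records:
    print(name, len(seq))
```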
Operation (I)
Given the FASTA format files (test1.fa and test2.fa), write a program cat.py that reads in a file and outputs it to the screen
- Knowledge points used
  - open(file)
Given the FASTQ format file (test1.fq), write a program cat.py that reads in the file and outputs it to the screen
- Knowledge points used
  - Same as above
Write the program splitname.py to read test2.fa, take the part of each original sequence name before the first space as the processed sequence name, and output the result to the screen
- Knowledge points used
  - split
Write the program formatfasta.py to read test2.fa, join each FASTA sequence into a single line, and output it
- Knowledge points used
  - join
Write the program formatfasta-2.py to read test2.fa and split each FASTA sequence into lines of 80 letters
- Knowledge points used
  - String slicing operation
Write the program sortfasta.py to read test2.fa, take the part of each original sequence name before the first space as the processed sequence name, and output the sequences after sorting by name
- Knowledge points used
  - sort
  - aDict[key] = []
Extract the sequences with the given names
- Write the program grepfastq.py to extract the sequences in test1.fq that correspond to the names in fastq.name and output them to a file.
- Knowledge points used
  - Modulo operation, 4 % 2 == 0
Write the program screenresult.py to select the genes in test.expr with foldChange greater than 2 and padj less than 0.05; either the whole line or only the gene name can be output
- Knowledge points used
  - Logical and operator: and
Write the program transfermultiplecolumbtomatrix.py to convert the expression data of genes in multiple tissues in the file (multipleColExpr.txt) into matrix form
- Knowledge points used
  - aDict['key'] = {}
  - if key not in aDict
- Output format
Name             A-431  A-549  AN3-CA  BEWO  CACO-2
ENSG00000000460  25.2   14.2   10.6    24.4  14.2
ENSG00000000938  0.0    0.0    0.0     0.0   0.0
ENSG00000001084  19.1   155.1  24.4    12.6  23.5
ENSG00000000457  2.8    3.4    3.8     5.8   2.9
Write the program reversecomplementary.py to calculate the reverse complementary sequence of acgtacgtacgtcacgttcagctagac
- Knowledge points used
  - reverse
Write the program collapsemirnareads.py to convert smRNA-Seq sequencing data
- Input file format (mir.collapse, a tab-separated two-column file; the first column is the sequence, and the second column is the number of times the sequence was sequenced)
ID_REF                    VALUE
ACTGCCCTAAGTGCTCCTTCTGGC  2
ATAAGGTGCATCTAGTGCAGATA   25
TGAGGTAGTAGTTTGTGCTGTTT   100
TCCTACGAGTTGCATGGATTC     4
A simplified short-sequence matching program (map.py): align the sequences in short.fa to ref.fa, and output which sequence in the ref.fa file each short sequence matches and at which position
- Knowledge points used
  - find
- Output format: bed format. The first column is the matched chromosome; the second and third columns are the start and end positions on the matched chromosome sequence (positions start at 0, which represents the first position; the end position is not included. The position of the sequence shown in the first example is (199, 208), front closed and back open, actually the sequence of chromosome 199-206 of chr1, starting with 0). Column 4 is the short sequence itself.
remarks:
- Each "knowledge point used" is a new knowledge point relative to the previous exercise, so consider them together. Depending on your approach, you may not need every knowledge point mentioned, and you may need some that are not mentioned. All of them have been introduced earlier in this handout.
- Practice "reading the program": trace by hand how the file is read and processed, step by step, to find possible logic problems.
About program debugging
- When you start writing programs, various errors may occur, such as inconsistent indentation, misspelled variable names, missing colons, or file names without quotation marks. Use the error type and the line number in the error message to locate the error. Sometimes the reported line itself is fine and the actual error is in a line before or after it.
- When the results do not meet expectations, use print to check whether each step works correctly. For example, after reading data into a dictionary, print the dictionary to see whether it holds what you expect and whether it contains characters that should not be there; or print a marker at each conditional statement or function call to trace the path the program takes.
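The print-based checking described above might look like the following sketch. The line content is a made-up example; repr() is used because it makes invisible characters such as tabs and newlines visible, and the messages go to standard error so they do not mix with the real output.

```python
import sys

# Hypothetical line as it would come from a tab-separated file
line = "geneA\t2.5\n"

# repr() shows \t and \n explicitly instead of rendering them
print("raw line:", repr(line), file=sys.stderr)

fields = line.strip().split('\t')
# Check the intermediate result before using it further
print("fields:", fields, file=sys.stderr)

value = float(fields[1])
print(fields[0], value * 2)  # geneA 5.0
```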
Function operation
Functions are reusable program segments. They allow you to give a block of statements a name, and then run that block any number of times, anywhere in your program, by using the name. This is known as calling the function. We have already used many built-in functions, such as len, range, input, int and str.
You can also import functions from specific packages, such as os.getcwd and sys.exit.
Functions are defined by the def keyword. The def keyword is followed by the identifier name of a function, followed by a pair of parentheses. Some variable names can be included in parentheses, and the line ends with a colon. Next is a block of statements, which are the body of the function.
#Custom function
def print_hello():
    print("Hello, you!")

print_hello()
Hello, you!
def hello(who):
    print("Hello, %s!" % who)

hello('you')
hello('me')
Hello, you! Hello, me!
#Custom function
def sum_num(a, b):
    c = a + b
    return c

d = sum_num(3, 4)
d
7
Local and global variables
Global scope: a variable defined outside any function can be used by every statement that runs after its definition.
Local scope: a variable defined inside a function is not recognized outside that function.
A function can read global variables, and can assign to them after declaring them with global, while code in the global scope cannot use a function's local variables.
var = 1
def variable_test():
    var = 2

print("var before running variable_test()", var)
variable_test()
print("var after running variable_test()", var)
var before running variable_test() 1 var after running variable_test() 1
var = 1
def variable_test():
    global var
    var = 2

print("var before running variable_test()", var)
variable_test()
print("var after running variable_test()", var)
var before running variable_test() 1 var after running variable_test() 2
# Global variables can be used in the local scope
def spam():
    print("eggs in spam", eggs)

eggs = 28
spam()
eggs in spam 28
# Global variables can be used in local scopes
# However, if the function also assigns to the same name, the name becomes local, and reading it before the assignment raises an error
def spam():
    print(eggs)
    eggs = 29

eggs = 28
spam()
print(eggs)
--------------------------------------------------------------------------- UnboundLocalError Traceback (most recent call last) <ipython-input-22-61f9fdfeb6fe> in <module>() 6 7 eggs = 28 ----> 8 spam() 9 print(eggs) <ipython-input-22-61f9fdfeb6fe> in spam() 1 # Global variables are used in local scopes 2 def spam(): ----> 3 print(eggs) 4 eggs = 29 5 UnboundLocalError: local variable 'eggs' referenced before assignment
# Try to avoid local variables and global variables with the same name
def spam():
    eggs = 'spam local'
    print("eggs in spam", eggs)  # Output spam local

def bacon():
    eggs = 'bacon local'
    print("eggs in bacon", eggs)  # Output bacon local
    spam()
    print("eggs in bacon after running spam", eggs)  # Output bacon local

eggs = 'global'
bacon()
print("Global eggs", eggs)
eggs in bacon bacon local eggs in spam spam local eggs in bacon after running spam bacon local Global eggs global
print("Wrap the previously written statement block a little and it is a function\n")
def findAll(string, pattern):
    posL = []
    pos = 0
    for i in string:
        pos += 1
        if i == pattern:
            posL.append(pos)
    #-------------------
    return posL
#------END of findAll-------
a = findAll("ABCDEFDEACFBACACA", "A")
print(a)
print(findAll("ABCDEFDEACFBACACA", "B"))
Wrap the previously written statement block a little, and it is a function [1, 9, 13, 15, 17] [2, 12]
def read(file):
    aDict = {}
    for line in open(file):
        if line[0] == '>':
            name = line.strip()
            aDict[name] = []
        else:
            aDict[name].append(line.strip())
    #----------------------------------
    for name, lineL in list(aDict.items()):
        aDict[name] = ''.join(lineL)
    return aDict

print(read("data/test1.fa"))
read("data/test2.fa")
{'>NM_001011874 gene=Xkr4 CDS=151-2091': 'gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg', '>NM_001195662 gene=Rp1 CDS=55-909': 'AGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCA', '>NM_0112835 gene=Rp15 CDS=128-6412': 'AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC', '>NM_011283 gene=Rp1 CDS=128-6412': 'AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC'} {'>NM_001011874 gene=Xkr4 CDS=151-2091': 'gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaaGTCGGGGTCCGGAG', '>NM_001195662 gene=Rp1 CDS=55-909': 'AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG', '>NM_011283 gene=Rp1 CDS=128-6412': 'AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGGTCTTCCGGAATGGTGACCCGAA', '>NM_0112835 gene=Rp1 CDS=128-6412': 'AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGGTCTTCCGGAATGGTGACCCGAA'}
Operation (II)
- Rewrite the program blocks from "Operation (I)" as functions and call them
- Knowledge points used
  - def func(para1,para2,...):
  - func(para1,para2,...)
- remarks:
- Each program is very simple for anyone around you who can already program, so you must restrain yourself and work out the answers independently. Study the error messages and the differences between the program output and the expected results.
- A program that runs without errors has not necessarily done what you need. Check carefully whether the output is what you want.
modular
Python has built-in many standard libraries, such as math for mathematical operations, sys for calling system functions, re for processing regular expressions, os for operating system related functions, etc. We focus on two libraries:
- sys
- sys.argv handles command line arguments
- sys.exit() exit function
- sys.stdin standard input
- sys.stderr standard error
- os
- os.system() or os.popen() executes system commands
- os.getcwd() gets the current directory
- os.remove() deletes the file
import os
os.getcwd()
#help(os.getcwd)
#os.remove(r'D:\project\github\PBR_training\script\splitName.py')
#os.system('rm file')
'/MPATHB/ct/ipython/notebook'
from os import getcwd
getcwd()
'/MPATHB/ct/ipython/notebook'
Command line parameters
sys.argv is a list that stores the command line parameters passed to the program, including the program name.
%%writefile testSys.py
import sys
print(sys.argv)
Writing testSys.py
%run testSys 'abc' 1
['testSys.py', 'abc', '1']
%%writefile cat.py
import sys

def read_print_file(filename):
    for line in open(filename):
        print(line, end="")
#------END read_print_file--------------------------

# The main function and the code that calls it are the fixed format of my personal programs and can be copied as-is
def main():
    # The main program is usually placed in the main function, which is called at the end of the file.
    if len(sys.argv) < 2:  # If fewer than two command line parameters are given, print a usage message
        # Print the usage message to standard error
        print("Usage: python %s filename" % sys.argv[0], file=sys.stderr)
        sys.exit(0)
    file = sys.argv[1]
    read_print_file(file)
#--------END main------------------

# main() is called only when this file is executed directly. If this file is imported by another file, main() is not executed.
if __name__ == '__main__':
    main()
Writing cat.py
For an explanation of __main__, see "About __main__ and programming templates in Python".
%run cat
Usage: python cat.py filename
%run cat data/test1.fa
>NM_001011874 gene=Xkr4 CDS=151-2091 gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg >NM_001195662 gene=Rp1 CDS=55-909 AGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCA >NM_0112835 gene=Rp15 CDS=128-6412 AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC >NM_011283 gene=Rp1 CDS=128-6412 AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
More powerful command line handling with optparse (reserved content)
%%writefile skeleton.py
#!/usr/bin/env python
desc = '''
Functional description:

'''
import sys
import os
from time import localtime, strftime
timeformat = "%Y-%m-%d %H:%M:%S"
from optparse import OptionParser as OP

def cmdparameter(argv):
    if len(argv) == 1:
        global desc
        print(desc, file=sys.stderr)
        cmd = 'python ' + argv[0] + ' -h'
        os.system(cmd)
        sys.exit(0)
    usages = "%prog -i file"
    parser = OP(usage=usages)
    parser.add_option("-i", "--input-file", dest="filein",
        metavar="FILEIN", help="The name of input file. \
Standard input is accepted.")
    parser.add_option("-v", "--verbose", dest="verbose",
        default=0, help="Show process information")
    parser.add_option("-d", "--debug", dest="debug",
        default=False, help="Debug the program")
    (options, args) = parser.parse_args(argv[1:])
    assert options.filein != None, "A filename needed for -i"
    return (options, args)
#--------------------------------------------------------------------

def main():
    options, args = cmdparameter(sys.argv)
    #-----------------------------------
    file = options.filein
    verbose = options.verbose
    debug = options.debug
    #-----------------------------------
    if file == '-':
        fh = sys.stdin
    else:
        fh = open(file)
    #--------------------------------
    for line in fh:
        pass
    #-------------END reading file----------
    #----close file handle for files-----
    if file != '-':
        fh.close()
    #-----------end close fh-----------
    if verbose:
        print("--Successful %s" % strftime(timeformat, localtime()),
            file=sys.stderr)

if __name__ == '__main__':
    startTime = strftime(timeformat, localtime())
    main()
    endTime = strftime(timeformat, localtime())
    fh = open('python.log', 'a')
    # Write the run record to the log file opened above
    print("%s\n\tRun time : %s - %s " % \
        (' '.join(sys.argv), startTime, endTime), file=fh)
    fh.close()
Writing skeleton.py
%run skeleton -h
Usage: skeleton.py -i file Options: -h, --help show this help message and exit -i FILEIN, --input-file=FILEIN The name of input file. Standard input is accepted. -v VERBOSE, --verbose=VERBOSE Show process information -d DEBUG, --debug=DEBUG Debug the program
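The optparse module used in skeleton.py still works, but the Python documentation marks it as superseded by argparse. The sketch below shows roughly how the same three options could be declared with argparse. It is not the author's template: turning -v and -d into on/off flags with action="store_true" and making -i required are assumptions made here, and build_parser is an illustrative name.

```python
import argparse

def build_parser():
    """Declare options similar to those in skeleton.py (sketch only)."""
    parser = argparse.ArgumentParser(description="Functional description:")
    parser.add_argument("-i", "--input-file", dest="filein", metavar="FILEIN",
                        required=True,
                        help="The name of input file. Standard input is accepted.")
    # store_true makes these flags that take no value (an assumption;
    # the optparse version above expects a value after -v and -d)
    parser.add_argument("-v", "--verbose", dest="verbose", action="store_true",
                        help="Show process information")
    parser.add_argument("-d", "--debug", dest="debug", action="store_true",
                        help="Debug the program")
    return parser

# Passing a list lets the parser be tried without a real command line
args = build_parser().parse_args(["-i", "data/test1.fa", "-v"])
print(args.filein, args.verbose, args.debug)  # data/test1.fa True False
```

In a script, calling parse_args() without arguments reads sys.argv[1:], and a missing -i produces a usage message and exits.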
Operation (III)
- Make all programs from "Operation (II)" accept command line parameters
- Knowledge points used
  - import sys
  - sys.argv
  - import optparse
More Python content
Monolingual sentence block
if True:
    print('yes')
if True: print('yes')
x = 5
y = 3
if x > y:
    print(y)
else:
    print(x)
#-------------
print((y if y < x else x))
print(x)
yes yes 3 3 5
List comprehension
A simplified for loop that generates a new list
aList = [1, 2, 3, 4, 5]
bList = []
for i in aList:
    bList.append(i * 2)
#-----------------------------------
#nameL = [line.strip() for line in open(file)]
bList = [i * 2 for i in aList]
print(bList)
[2, 4, 6, 8, 10]
print("List comprehension can include a condition")
aList = [1, 2, 3, 4, 5]
bList = [i * 2 for i in aList if i % 2 != 0]
print(bList)

List comprehension can include a condition
[2, 6, 10]
print("List comprehension can also be nested")
aList = [1, 2, 3, 4, 5]
bList = [5, 4, 3, 2, 1]
bList = [i * j for i in aList for j in bList]
# for i in aList:
#     for j in bList:
#         print(i * j)
print(bList)

List comprehension can also be nested
[5, 4, 3, 2, 1, 10, 8, 6, 4, 2, 15, 12, 9, 6, 3, 20, 16, 12, 8, 4, 25, 20, 15, 10, 5]
Dictionary comprehension
aList = ['a', 'b', 'a', 'c']
aDict = {i:aList.count(i) for i in aList}
aDict
{'a': 2, 'b': 1, 'c': 1}
bDict = {i:j*2 for i,j in aDict.items()}
bDict
{'a': 4, 'b': 2, 'c': 2}
Assert
An assert states a condition that must hold while the program runs and raises an error when it does not. It is often used when reading files or checking formats, to stop the program before it reads or processes unexpected data.
a = 1
b = 2
assert a == b, "a is %s, b is %s" % (a, b)

if a == b:
    pass
else:
    print("a is %s, b is %s" % (a, b))

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-75-9c43179b4557> in <module>()
      1 a = 1
      2 b = 2
----> 3 assert a == b, "a is %s, b is %s" % (a, b)
      4
      5 if a == b:

AssertionError: a is 1, b is 2
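As a sketch of the file-format use mentioned above, an assert can check each line before it is processed. The function name `parse_bed_line` and the three-column layout are assumptions made for this illustration only.

```python
def parse_bed_line(line):
    """Parse a tab-separated 'chrom<TAB>start<TAB>end' line (hypothetical format)."""
    fields = line.rstrip('\n').split('\t')
    # Stop early with a clear message if the line has the wrong number of columns
    assert len(fields) == 3, "Expected 3 columns, got %d: %r" % (len(fields), line)
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # The start coordinate must come before the end coordinate
    assert start < end, "start (%d) must be smaller than end (%d)" % (start, end)
    return chrom, start, end


print(parse_bed_line("chr1\t100\t200\n"))

try:
    parse_bed_line("chr1\t100\n")
except AssertionError as err:
    print("AssertionError:", err)
```

Keep in mind that asserts are skipped when Python is started with the -O option, so they suit sanity checks during development rather than validation that must always run.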
More string methods
isX string methods
isalpha(): whether the string contains only letters
isalnum(): whether the string contains only letters and numbers
isdecimal(): whether the string contains only decimal digits
isspace(): whether the string contains only spaces, tabs and line breaks
istitle(): whether each word starts with an uppercase letter followed by lowercase letters
a = 'b1'
a.isalpha()
False
a = 'b c'
a.isalpha()
False
a = 'bc1'
a.isalnum()
True
a = '1a'
a.isalnum()
True
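The examples above only cover isalpha() and isalnum(). The remaining methods in the list can be tried out in the same way; the sample strings below are arbitrary choices for illustration.

```python
samples = ['123', '12.5', ' \t\n', 'Hello World', 'Hello world']

results = {}
for s in samples:
    # Collect the three checks for each sample string
    results[s] = (s.isdecimal(), s.isspace(), s.istitle())
    print(repr(s), results[s])
```

Here '12.5' is not decimal because of the dot, and 'Hello world' is not a title because the second word starts with a lowercase letter.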
','.join(['i', 'love', 'python'])
'i,love,python'
'***'.join(['i', 'love', 'python'])
'i***love***python'
"linux R perl C python".split()
['linux', 'R', 'perl', 'C', 'python']
Text alignment rjust() ljust() center()
'hello'.rjust(10)
'     hello'
'hello'.rjust(20,'*')
'***************hello'
'hello'.center(20,'-')
'-------hello--------'
def printPicnic(itemsDict, leftWidth, rightWidth):
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        print(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
printPicnic(picnicItems, 12, 5)
printPicnic(picnicItems, 20, 6)

---PICNIC ITEMS--
sandwiches..    4
apples......   12
cups........    4
cookies..... 8000
-------PICNIC ITEMS-------
sandwiches..........     4
apples..............    12
cups................     4
cookies.............  8000
strip(),rstrip(),lstrip() delete whitespace characters
spam = ' hello '
spam
' hello '
spam.strip()
'hello'
spam.rstrip()
' hello'
spam.lstrip()
'hello '
a = 'Hello world, welcome to python world'
a.strip('d')
'Hello world, welcome to python worl'
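When an argument is given, strip() treats it as a set of characters rather than as a substring, and removes any of them from both ends until it reaches a character outside the set. A small sketch using the same string:

```python
a = 'Hello world, welcome to python world'

# 'H' is removed from the left end and 'd' from the right end
both_ends = a.strip('dH')
print(both_ends)

# Every trailing character found in the set 'dlrow' is removed;
# stripping stops at the space, and 'H' on the left is not in the set
by_set = a.strip('dlrow')
print(repr(by_set))

# A common use when reading files: drop only the trailing line break
line = 'geneA\t10\n'
print(repr(line.rstrip('\n')))
```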
lambda, map, filter, reduce
- lambda creates a function without a name, usually for one-time use. Its syntax is lambda argument_list: expression. The argument list is a comma-separated list of parameters, and the expression is built from these parameters.
- map applies a function to every element of a sequence, using the syntax map(func, seq). The first argument is the function (or function name) to call, and the second is a sequence (such as a list, string or dictionary). map calls func with each element of the sequence as its argument. In Python 3 it returns an iterator, which can be turned into a list with list().
- filter is used to filter a sequence. The syntax is filter(func, list). func is called on each element of the second argument; elements for which it returns True are kept and the rest are discarded.
- reduce applies a function cumulatively to the elements of a list. The syntax is reduce(func, list), and in Python 3 it must be imported from functools. Given aList = [1,2,3,...,n], reduce(func, aList) first passes the first two elements to func and replaces them with the return value, giving aList = [func(1,2), 3, ..., n]. It then passes the new first two elements to func again, giving aList = [func(func(1,2),3), ..., n], and so on until only one element remains.
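The four tools described above can be sketched as follows. In Python 3, map and filter return iterators, so list() is used to display the results, and reduce has to be imported from functools. The helper `reduce_by_hand` is a hypothetical name, written only to mirror the step-by-step description of reduce; it is not how functools implements it.

```python
from functools import reduce

aList = [1, 2, 3, 4, 5]

# lambda: an anonymous function, here bound to a name for reuse
square = lambda x: x ** 2

# map: call the function on every element
squares = list(map(square, aList))

# filter: keep the elements for which the function returns a true value
odds = list(filter(lambda x: x % 2, aList))

# reduce: combine the elements pairwise into a single value
total = reduce(lambda a, b: a + b, aList)


def reduce_by_hand(func, seq):
    """Replace the first two elements by func(first, second) until one is left."""
    items = list(seq)
    while len(items) > 1:
        items = [func(items[0], items[1])] + items[2:]
    return items[0]


print(squares, odds, total, reduce_by_hand(lambda a, b: a + b, aList))
```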
print("Summation function")
def f(x, y):
    return x + y
print(f([1, 2, 3], [4, 5, 6]))
print(f(10, 15))
Summation function [1, 2, 3, 4, 5, 6] 25
print("Single parameter map, lambda call")
aList = [1, 2, 3, 4, 5]
# Same result as list(map(lambda x: x**2, aList))
print([x**2 for x in aList])
print("Multiple parameter map, lambda call")
def f(x, y):
    return x + y
print(list(map(f, [1, 2, 3], [4, 5, 6])))
print("The parameter is a string")
# Same result as list(map(lambda x: x.upper(), 'acdf'))
print([x.upper() for x in 'acdf'])
Single parameter map, lambda call [1, 4, 9, 16, 25] Multiple parameter map, lambda call [5, 7, 9] The parameter is a string ['A', 'C', 'D', 'F']
print("Output all odd numbers")
aList = [1, 2, 3, 4, 5]
# Same result as list(filter(lambda x: x % 2, aList))
print([x for x in aList if x % 2])
Output all odd numbers [1, 3, 5]
from functools import reduce
print("List summation")
aList = [1, 2, 3, 4, 5]
print(reduce(lambda a, b: a + b, aList))
List summation 15
from functools import reduce
print("The list takes the maximum value")
aList = [1, 2, 3, 4, 5]
print(reduce(lambda a, b: a if a > b else b, aList))
The list takes the maximum value 5
exec, eval (execute string python statement, keep program)
a = 'print("Executing a string as a command")'
exec(a)
Executing a string as a command
a = '(2 + 3) * 5'
eval(a)
25
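A short sketch of the difference between the two: eval evaluates a single expression and returns its value, while exec runs statements and returns None. The variable names and the separate namespace dictionary are choices made for this illustration.

```python
expr = '(2 + 3) * 5'
value = eval(expr)  # an expression: eval returns its value
print(value)

namespace = {}
# Statements: exec returns None, and the variables it creates
# are stored in the dictionary passed as the second argument
returned = exec('x = 10\ny = x * 2', namespace)
print(returned, namespace['y'])

# Both functions run arbitrary code, so only use them on strings you trust.
```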
regular expression
Regular expressions are used for pattern matching: given a pattern, look for substrings that match it. In Python they are implemented by the re module.
re.compile: converts a regular expression into a pattern object
re.match: matches the pattern only at the beginning of the string
re.search: searches the whole string for the first match
# For example, look for the start codon
import re
cds = "ATGACGCTCGGACGACTAATG"
start_codon = re.compile('ATG')
start_codon_match = start_codon.match(cds)
start_codon_match.group()
'ATG'
# If there is a UTR in front and the start codon is not in the first position, match cannot find it
mRNA = "GTCAATGACGCCTCGGACGACTAATG"
start_codon_match = start_codon.match(mRNA)
if start_codon_match:
    print(start_codon_match.group())
else:
    print("No start codon found at the beginning of the given sequence.")

No start codon found at the beginning of the given sequence.
# If there is a UTR in front and the start codon is not in the first position, you need to use search
mRNA = "GTCAATGACGCTCATAATG"
start_codon_match = start_codon.search(mRNA)
if start_codon_match:
    print(start_codon_match.group())
else:
    print("No start codon found in the given sequence.")
ATG
# If you want to find all stop codons, use findall
mRNA = "ATGATGTAAUAATAGUGA"
stop_codon = re.compile('[TU]AA|[TU]AG|[TU]GA')
stop_codon.findall(mRNA)
['TGA', 'TAA', 'UAA', 'TAG', 'UGA']
The above pattern uses two special symbols of regular expressions, | and [].
A|B: matches either A or B, such as TAA|TAG above. To limit how far the alternatives on either side of | extend, use parentheses. For example, T(AA|T)AG matches TAAAG or TTAG.
[TU]: matches any single character inside the brackets, here either T or U. You can also use [A-Z] for all capital letters, [A-Za-z] for all English letters, [0-9] for all Arabic numerals, or write something like [W-Z5-9_] to cover a subset of letters and digits plus the underscore.
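The two symbols can be tried out as below. The test strings are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# Parentheses limit the alternation to the part between T and AG
grouped = re.compile('T(AA|T)AG')
grouped_results = {}
for s in ['TAAAG', 'TTAG', 'TAG']:
    grouped_results[s] = bool(grouped.fullmatch(s))
    print(s, grouped_results[s])

# A character class with two ranges plus the underscore;
# + means one or more characters from the class in a row
char_class = re.compile('[W-Z5-9_]+')
found = char_class.findall('ABCWXYZ_0123456789')
print(found)
```

Only runs made up of W to Z, 5 to 9 and the underscore are reported, so the letters A to C and the digits 0 to 4 are skipped.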
Match a motif
# Match a motif with these requirements:
# the first position is A,
# the second is any character,
# the third is T,
# followed by a gap of 3-5 arbitrary bases,
# and then a G
seqL = ["ACGTACGT", "ACTCCCG", "ACTCCGGG", "AGTTTTTG"]
# . represents any character (line breaks are not included)
# {3,5} indicates that the previous character occurs 3-5 times
pattern = re.compile("A.T.{3,5}G")
print("Matched", "\t", "Matched part")
for seq in seqL:
    match = pattern.search(seq)
    if match:
        print(seq, "\t", match.group())

Matched 	 Matched part
ACTCCCG 	 ACTCCCG
ACTCCGGG 	 ACTCCGGG
AGTTTTTG 	 AGTTTTTG
Cut strings according to spaces
Suppose the letters A, B, C, D, E, F are joined in one string by commas, varying numbers of spaces, tabs, or combinations of these, and you want to split the string into its separate parts.
# []: a character class, as introduced above
# \s: represents all whitespace, including spaces, tabs, line feeds, etc.
# +: indicates that the preceding character appears one or more times
pattern = re.compile(r"[,\s]+")
seq = "A B C D , E, F"
pattern.split(seq)
['A', 'B', 'C', 'D', 'E', 'F']
Memory matching
Suppose we have some fq.gz files and want to extract the sample name and replicate from each file name and print them.
# root and leaf are the sample names
# After the first underscore is the biological replicate: rep1, rep2
# The 1 and 2 after the second underscore represent the left and right reads of paired-end sequencing
fqL = ["root_rep1_1.fq.gz", "root_rep1_2.fq.gz",
       "root_rep2_1.fq.gz", "root_rep2_2.fq.gz",
       "leaf_rep1_1.fq.gz", "leaf_rep1_2.fq.gz",
       "leaf_rep2_1.fq.gz", "leaf_rep2_2.fq.gz"]
# * indicates that the preceding character appears any number of times
# () indicates memory matching (a capture group), which can be retrieved by index
# \ is the escape character; \. turns . into a normal character,
# so it matches a literal dot instead of any character
pattern = re.compile(r"(.*)_(.*)_[12]\.fq\.gz")
for fq in fqL:
    match = pattern.search(fq)
    sample = match.group(1)
    rep = match.group(2)
    print(sample, rep)

root rep1
root rep1
root rep2
root rep2
leaf rep1
leaf rep1
leaf rep2
leaf rep2
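As a variation on the example above, groups can also be given names with the (?P<name>...) syntax and retrieved by name instead of by index. The group names `sample` and `rep` are chosen here for illustration.

```python
import re

# Same pattern as above, but each group carries a name
pattern = re.compile(r"(?P<sample>.*)_(?P<rep>.*)_[12]\.fq\.gz")
match = pattern.search("root_rep1_1.fq.gz")

sample = match.group('sample')
rep = match.group('rep')
print(sample, rep)

# groupdict() returns all named groups as a dictionary
groups = match.groupdict()
print(groups)
```

Named groups make longer patterns easier to read, since the code no longer depends on counting parentheses.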
Match replacement
Chinese names are usually written with the family name first and the given name last, while Western names put the given name first and the family name last. Here we convert from one order to the other.
# Swap the order of the family name and the given name below
nameL = ["Chen Tong", "Liu Yongxin", "Wang Ying"]
# \w: represents a word character, equivalent to [A-Za-z0-9_]
pattern = re.compile(r"(\w+) (\w+)")
# \1 and \2 refer to the first and second memory matches. Groups are counted
# by their left parenthesis: the leftmost one is \1 and the next is \2
for name in nameL:
    print(pattern.sub(r"\2 \1", name))

Tong Chen
Yongxin Liu
Ying Wang
More regular expression rules are shown in the figure below. The rest is to study hard and practice more.
data:image/s3,"s3://crabby-images/4f41c/4f41c71caeafccf3e9b1c0014c59939ececc5b1d" alt=""
Picture from https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
Python drawing
Figure and Subplot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from numpy.random import randn
x = [1, 3, 5, 7, 9, 10, 23, 45, 45, 56]
y = [2, 4, 6, 8, 11, 12, 23, 45, 56, 78]
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)  # Create a 2 x 2 grid of subplots; the last 1 selects the first one
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
ax1.hist(x, y)
ax2.scatter(x, y)
ax4.plot(x, y)
plt.show()
data:image/s3,"s3://crabby-images/99c34/99c343a25adecb5c7eb6a12b5b779799ce0863a7" alt=""
fig, axes = plt.subplots(2, 3)
axes
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000000B5D88D0>, <matplotlib.axes._subplots.AxesSubplot object at 0x0000000009A87FD0>, <matplotlib.axes._subplots.AxesSubplot object at 0x000000000B3467B8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000009CF0390>, <matplotlib.axes._subplots.AxesSubplot object at 0x000000000B6CC080>, <matplotlib.axes._subplots.AxesSubplot object at 0x000000000B5D9978>]], dtype=object)
plt.subplots(2, 3) creates a 2 x 3 grid of subplots; the returned axes can be indexed like a two-dimensional array.

Parameter | Explanation |
---|---|
nrows | Number of subplot rows |
ncols | Number of subplot columns |
sharex | All subplots use the same x axis |
sharey | All subplots use the same y axis |
subplot_kw | Dictionary of keywords used to create each subplot |

Adjust spacing around subplot
subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
wspace and hspace control the width and height of the space between subplots, as a fraction of the average subplot width and height.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=0, hspace=0)
plt.show()
data:image/s3,"s3://crabby-images/4bd6a/4bd6a0f8ca95fd340f61316f5eeb57744b875b07" alt=""
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=1, hspace=1)
plt.show()
data:image/s3,"s3://crabby-images/2309b/2309ba0751106b8a4a3251858aa7833f958eb3bd" alt=""
Colors, markers and line styles
"""
Draw a dashed green line
ax.plot(x, y, 'g--')
Another way
ax.plot(x, y, linestyle='--', color='g')
Marker points (marker)
"""
fig, axes = plt.subplots(1, 2)
axes[0].plot(randn(10), 'g--')  # green, dashed line
axes[1].plot(randn(10), 'ko--')  # k: black, o: dot markers
plt.show()
data:image/s3,"s3://crabby-images/5fbf4/5fbf4e4a9036d4c8b0d5ea00fe7c3cb78a74869c" alt=""
Scales, labels, and legends
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(100))
plt.show()
data:image/s3,"s3://crabby-images/c60eb/c60eb4ecece51bc54988c4803fed0e0ac47d63c9" alt=""
"""Modify the axis in the figure above"""
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(100))
ticks = ax.set_xticks([0, 25, 50, 75, 100])  # Set the tick positions
labels = ax.set_xticklabels(
    ['first', 'second', 'third', 'forth', 'fifth'],
    rotation=30, fontsize='small')  # Set the x-axis tick labels
ax.set_title('my first matplot plot')  # Set the plot title
ax.set_xlabel('Stages')  # Set the x-axis name
plt.show()
data:image/s3,"s3://crabby-images/7b9ae/7b9ae6671f01debfa63af2eba1ca56c9238a6840" alt=""
Add legend
https://matplotlib.org/api/legend_api.html?highlight=legend#module-matplotlib.legend

Location string | Location code |
---|---|
'best' | 0 |
'upper right' | 1 |
'upper left' | 2 |
'lower left' | 3 |
'lower right' | 4 |
'right' | 5 |
'center left' | 6 |
'center right' | 7 |
'lower center' | 8 |
'upper center' | 9 |
'center' | 10 |

bbox_to_anchor=(0.5, 0.8)
bbox_to_anchor takes a pair of values. The first controls the horizontal position of the legend: the larger the value, the further right it moves. The second controls the vertical position: the larger the value, the further up it moves.
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(10), 'k', label='one')  # Draw a line, k: black
ax.plot(randn(10), 'g--', label='two')  # Draw a second line, g: green, --: dashed
ax.plot(randn(10), 'ro--', label='three')  # Draw the third line, r: red, o: dot markers, --: dashed
ax.legend(loc=0, bbox_to_anchor=(0.5, 0.9))
plt.show()
data:image/s3,"s3://crabby-images/27a6b/27a6ba16add7b4bae3c80914ee172cbf46aed7cd" alt=""
annotation
x = [2, 4, 6, 8, 10, 12]
y = [1, 3, 5, 7, 9, 11]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
#ax = fig
ax.plot(x, y, 'r--')
ax.text(2, 4, 'hello python')
plt.show()
data:image/s3,"s3://crabby-images/004b5/004b5be3c58b712b06cb400f1cb325f41b44e298" alt=""
Saving pictures
x = [2, 4, 6, 8, 10, 12]
y = [1, 3, 5, 7, 9, 11]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y, 'r--')
ax.text(2, 4, 'hello python')
# bbox_inches='tight' trims the white space around the current picture
plt.savefig('figpath.jpg', dpi=300, bbox_inches='tight')
Before plotting, matplotlib also lets you adjust global settings, such as fonts and the default size of all figures.
example
Scatter plot
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
plt.scatter(x,  # x-axis data
            y,  # y-axis data
            s=20,  # Sets the size of the points
            c='green',  # Sets the color of the points
            marker='s',  # Sets the shape of the points
            alpha=0.9,  # Sets the transparency of the points
            linewidths=0.8,  # Sets the thickness of the point borders
            edgecolors='red'  # Sets the color of the point borders
            )
plt.title('simple scatter plot')
plt.xlabel('X')  # x-axis name
plt.ylabel('Y')
plt.show()  # Display the plot
data:image/s3,"s3://crabby-images/252f3/252f32a0e087479f486ec8e80ab8431b234532e2" alt=""
Line chart
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
plt.plot(x,  # x-axis data
         y,  # y-axis data
         linestyle='-',  # Line type
         linewidth=2,  # Line width
         color='blue',  # Line color
         marker='o',  # Shape of the points
         markersize=8,  # Point size
         markeredgecolor='black',  # Border color of the points
         markerfacecolor='red')  # Fill color of the points
# Add title and axis labels
plt.title('line plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
data:image/s3,"s3://crabby-images/4559e/4559e724956653963e290d8f2009592da73d583b" alt=""
histogram
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.hist(np.random.randn(50),  # Data to plot
         bins=50,  # Specifies that the number of bars in the histogram is 50
         color='red',  # Specifies the fill color
         edgecolor='k',  # Specifies the boundary color of the histogram
         label='histogram')  # Label for the histogram
plt.show()
data:image/s3,"s3://crabby-images/9a792/9a7921dd33151410fefc0fd5cad509768798fb0b" alt=""
Straight bar graph
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
plt.bar(x, y, color='steelblue', alpha=0.8)
plt.title('bar plot')
plt.ylim([0, 40])
plt.show()
data:image/s3,"s3://crabby-images/756be/756bed0ddf368d65775591d87e6e1e9a71de6344" alt=""
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
plt.barh(x, y, color='steelblue', alpha=0.8)
plt.title('bar plot')
plt.show()
data:image/s3,"s3://crabby-images/3f3ab/3f3abc8e5493e4cea315735efddb4bcc35370a02" alt=""
Box diagram
x = [1, 2, 3, 4, 5, 6]
plt.boxplot(x,
            patch_artist=True,  # Fill the box with color
            labels=['boxplot'],  # Add a specific label name
            showmeans=True,
            )
# Display the plot
plt.show()
data:image/s3,"s3://crabby-images/7cfa0/7cfa084e9b6794f9af6daeab5fcbc36a505df5c3" alt=""
np.random.seed(2)  # Set random seed
df = pd.DataFrame(np.random.rand(5, 4), columns=(['A', 'B', 'C', 'D']))
df
  | A | B | C | D |
---|---|---|---|---|
0 | 0.435995 | 0.025926 | 0.549662 | 0.435322 |
1 | 0.420368 | 0.330335 | 0.204649 | 0.619271 |
2 | 0.299655 | 0.266827 | 0.621134 | 0.529142 |
3 | 0.134580 | 0.513578 | 0.184440 | 0.785335 |
4 | 0.853975 | 0.494237 | 0.846561 | 0.079645 |
data = []
for i in range(4):
    data.append(df.iloc[:, i])
data
[0    0.435995
 1    0.420368
 2    0.299655
 3    0.134580
 4    0.853975
 Name: A, dtype: float64, 0    0.025926
 1    0.330335
 2    0.266827
 3    0.513578
 4    0.494237
 Name: B, dtype: float64, 0    0.549662
 1    0.204649
 2    0.621134
 3    0.184440
 4    0.846561
 Name: C, dtype: float64, 0    0.435322
 1    0.619271
 2    0.529142
 3    0.785335
 4    0.079645
 Name: D, dtype: float64]
plt.boxplot(data)
plt.show()
data:image/s3,"s3://crabby-images/0d391/0d39186328d06cc92787639a9842f59e5b7fa601" alt=""
plt.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, meanline=None, showmeans=None, showcaps=None, showbox=None, showfliers=None, boxprops=None, labels=None, flierprops=None, medianprops=None, meanprops=None, capprops=None, whiskerprops=None)
- x: the data to draw the box plot from;
- notch: whether to draw the box plot with a notch; no notch by default;
- sym: the shape of the outlier points; shown as + by default;
- vert: whether the box plot is drawn vertically; vertical by default;
- whis: how far the whiskers reach beyond the upper and lower quartiles; the default is 1.5 times the interquartile range;
- positions: the positions of the boxes; the default is [1, 2, ..., n];
- widths: the width of the boxes; 0.5 by default;
- patch_artist: whether to fill the boxes with color;
- meanline: whether to show the mean as a line; shown as a point by default;
- showmeans: whether to display the mean; not displayed by default;
- showcaps: whether to display the two lines (caps) at the top and bottom ends of the box plot; displayed by default;
- showbox: whether to display the box itself; displayed by default;
- showfliers: whether to display outliers; displayed by default;
- boxprops: properties of the box, such as border color and fill color;
- labels: labels for the box plot, similar in function to a legend;
- flierprops: properties of the outliers, such as shape, size and fill color;
- medianprops: properties of the median, such as line type and thickness;
- meanprops: properties of the mean, such as point size and color;
- capprops: properties of the top and bottom cap lines, such as color and thickness;
- whiskerprops: properties of the whiskers, such as color, thickness and line type.
Pie chart
data = [0.2, 0.3, 0.4, 0.1]
plt.pie(data)
plt.show()
data:image/s3,"s3://crabby-images/5d576/5d57631bb4903bdf9eb9a354d9a34654b0d0dd30" alt=""
plt.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=None, radius=None, counterclock=True, wedgeprops=None, textprops=None, center=(0, 0), frame=False)
- x: the data to plot;
- explode: highlights some wedges of the pie chart by pulling them out of the pie;
- labels: label text for the wedges, similar to a legend;
- colors: the fill colors of the pie chart;
- autopct: automatically adds percentage labels, which can be formatted;
- pctdistance: the distance between the percentage labels and the center of the circle;
- shadow: whether to add a shadow effect to the pie chart;
- labeldistance: the distance between each wedge label and the center of the circle;
- startangle: the initial placement angle of the pie chart;
- radius: the radius of the pie chart;
- counterclock: whether the wedges are arranged counterclockwise;
- wedgeprops: properties of the inner and outer boundaries of the pie chart, such as the thickness and color of the boundary line;
- textprops: properties of the text in the pie chart, such as font size and color;
- center: the position of the center point of the pie chart; defaults to the origin;
- frame: whether to display the axes frame behind the pie chart; if set to True, you also need to control the ranges of the x axis and y axis and the center position of the pie chart.
%matplotlib inline
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
cset = ax.contour(X, Y, Z, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)
plt.show()
data:image/s3,"s3://crabby-images/3d202/3d20208fc5c4cfc086e935dd2055db62d523abc3" alt=""
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt

mpl.rcParams['legend.fontsize'] = 10
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')
#theta = np.linspace(-4 * np.pi, 4 * np.pi, 100)
#z = np.linspace(-2, 2, 100)
#r = z**2 + 1
#x = r * np.sin(theta)
#y = r * np.cos(theta)
x = [1, 2, 3]
y = [1.5, 1, 2]
z = [2, 1, 3]
ax.plot(x, y, z, label='parametric curve')
ax.legend()
plt.show()
data:image/s3,"s3://crabby-images/a78d4/a78d4cb6b501ca945e5fc1140b5d134f3a7b7e98" alt=""
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import matplotlib.pyplot as plt

fig = plt.figure()
# Axes3D(fig) is no longer added to the figure automatically in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')
x = [1, 2, 2]
y = [1, 0, 2]
z = [1, 2, 0]
# list() is needed because zip returns an iterator in Python 3
verts = [list(zip(x, y, z))]
ax.add_collection3d(Poly3DCollection(verts, edgecolors='red', facecolors='red'))
x = [0, 1, 1]
y = [0, 0, 1]
z = [0, 1, 0]
verts = [list(zip(x, y, z))]
verts = [[(1, 1, 1), (2, 0, 2), (2, 2, 0)], [(0, 0, 0), (1, 0, 1), (1, 1, 0)]]
ax.add_collection3d(Poly3DCollection(verts))
plt.show()
data:image/s3,"s3://crabby-images/1b744/1b7442136db7db7d223cea716d443925ce616b28" alt=""
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import matplotlib.pyplot as plt

fig = plt.figure()
# Axes3D(fig) is no longer added to the figure automatically in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')
verts = [[(0.5, 0.5, 0.5), (1.2, 0, 1.2), (1.2, 1.2, 0)], [(0, 0, 0), (1, 0, 1), (1, 1, 0)]]
ax.add_collection3d(Poly3DCollection(verts, edgecolors=['blue', 'red'], facecolors=['blue', 'red']))
plt.show()
data:image/s3,"s3://crabby-images/bedad/bedada41f330b86d8ba6d424dc99cc5cf79a6627" alt=""
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [0.5, 1.2, 1.2, 0, 1, 1]
y = [0.5, 0, 1.2, 0, 0, 1]
z = [0.5, 1.2, 0, 0, 1, 0]
poly3d = [[(0.5, 0.5, 0.5), (1.2, 0, 1.2), (1.2, 1.2, 0)], [(0, 0, 0), (1, 0, 1), (1, 1, 0)]]
ax.scatter(x, y, z)
ax.add_collection3d(Poly3DCollection(poly3d, edgecolors=['red', 'blue'], facecolors='w', linewidths=1, alpha=0.5))
plt.show()
data:image/s3,"s3://crabby-images/34c2d/34c2d5077b05e484f89b76e8206ff85bf8d6e6f0" alt=""
Reference
- http://www.byteofpython.info/
- http://woodpecker.org.cn/abyteofpython_cn/chinese/index.html
- http://www.python-course.eu/
- http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-189-a-gentle-introduction-to-programming-using-python-january-iap-2008/
- http://my.oschina.net/taogang/blog/286954