Rosalind Java| Calculating Protein Mass

Posted by vtolbert on Sun, 20 Feb 2022 09:14:12 +0100

Calculation of protein molecular weight for Rosalind programming problem.

Calculating Protein Mass

Problem
In a weighted alphabet, every symbol is assigned a positive real number called a weight. A string formed from a weighted alphabet is called a weighted string, and its weight is equal to the sum of the weights of its symbols.

The standard weight assigned to each member of the 20-symbol amino acid alphabet is the monoisotopic mass of the corresponding amino acid.

Given: A protein string P of length at most 1000 aa.
Sample input:

FWCHYWCWWVNYMYDWCDDMCAYGHWSKFWDTKVHKMFGQPSNTFKYEWSMPYVRVCNRGHSEVLLNEVGLACSTPCYMMHGYLCICVPCSHSRPSTDYLWNEPGKEHSILIEDNSMDWHVQRNWDPVMGTYSGWNTAYEPTDYTDCSNTCHDYYFADQNAYKSIRSFIFWAQRRNIKNMHFHDTECINTCEVQFFWVRFRVIHWANPVENHCPNHGRDPFYTADRAECNGASAKTIGAEDPCLAKDDKPCNILDLAFGPSHGWVKWYVMYYTEGNQTVHICDTDSHNGEGAYSQSKSDLYWMDTRHVKICKFSYRIWASTLMCYARCVALHAWKHLHNIIQYHFEMELMTMWTGGNIIEYVKNKKYQTQVIEKHWGCHHIPCFPSYGKIPRIACDSSSRNRLPTNYAKNRNMECCRKAWCFKYEVSRPTIFGWGMMGWEIRNMWMRTHLRFSEMSIFLLDNVDWYLAVNDCCGLFCIKRSPNPRANFTCTAIADSIDRDTLCMGPEAELAFWCTPIQKINQYNIWDTHTALADQCAGTIFKCCGIAAFPIMELNGSAIGYCEDYMTRIEWVEHALTTPTHHWEPAMAWGKDSVVAPIVNVHIYIMWNENCKGEALIFTNQWCREPNWLTPWIHFHKFLNPHGMMRMFWYECKHVAWNDAYGCEGFNDWRCIYQMGRAVTIETVGSEAFCMKCITMGRKRCSLDCGMADQKRDWMAGKKIHMHERASHWPWRACERPHFHSEIWNLQFAISLCMLAQRKRTWQHIADCTVNECKNLYPFIEKYNVVWVLCHITILRMATNDYTWFYVLPLWIAINMGFKTPSEYTTCWQMIINNSKRMCTFLPVDGG

Return: The total weight of P. Consult the monoisotopic mass table.
Sample output:

100019.42580000042

The problem asks us to output the mass of a protein given its amino acid sequence. The approach is fairly simple:
1. Read the amino acid sequence
2. Look up the mass of each amino acid and sum them
The following is the implementation code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Calculating_Protein_Mass {
    public static void main(String[] args) {
        //1. Read amino acid sequence
        String pro = readFileContent("C:/Users/Administrator/Desktop/rosalind_prtm.txt");
        //2. Output the mass of corresponding amino acids and sum them
        System.out.println(SumMass(pro));
    }

First, we need to read the amino acid sequence.

    //1. Read the amino acid sequence as text
    public static String readFileContent(String fileName) {
        File file = new File(fileName);
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader automatically, even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String tempStr;
            while ((tempStr = reader.readLine()) != null) {
                sb.append(tempStr);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Returns whatever was read so far, as before
        return sb.toString();
    }
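
As an alternative to the BufferedReader loop, the same read can be sketched with java.nio.file. This is only a sketch: the class name SequenceFileReader and the method readSequence are hypothetical names, not part of the original program. It joins all lines of the file into one string, which is what readFileContent does with its line-by-line append.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the original post
final class SequenceFileReader {

    // Reads every line of the file and joins them without separators,
    // so a sequence wrapped over several lines becomes one string.
    static String readSequence(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        List<String> lines = Files.readAllLines(path);
        return String.join("", lines);
    }

    public static void main(String[] args) throws IOException {
        // The file path comes from the command line instead of being hard-coded
        if (args.length != 1) {
            System.err.println("usage: java SequenceFileReader <file>");
            return;
        }
        System.out.println(readSequence(args[0]));
    }
}
```

Unlike readFileContent, this sketch lets the IOException propagate to the caller instead of printing the stack trace and returning a partial string.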

A second method walks through the amino acids and accumulates their masses.
For the reference mass values, see the monoisotopic mass table on the official Rosalind site.

    //2. Look up the mass of each amino acid and sum them. The masses are decimal values, so the running total is a double
    public static double SumMass(String pro) {
        double sum = 0;
        for (int i = 0; i < pro.length(); i++) {
            String AA = pro.substring(i, i + 1);//Obtain a single amino acid
            switch (AA) {
                case "A":
                    sum += 71.03711;
                    break;
                case "C":
                    sum += 103.00919;
                    break;
                case "D":
                    sum += 115.02694;
                    break;
                case "E":
                    sum += 129.04259;
                    break;
                case "F":
                    sum += 147.06841;
                    break;
                case "G":
                    sum += 57.02146;
                    break;
                case "H":
                    sum += 137.05891;
                    break;
                case "I":
                    sum += 113.08406;
                    break;
                case "K":
                    sum += 128.09496;
                    break;
                case "L":
                    sum += 113.08406;
                    break;
                case "M":
                    sum += 131.04049;
                    break;
                case "N":
                    sum += 114.04293;
                    break;
                case "P":
                    sum += 97.05276;
                    break;
                case "Q":
                    sum += 128.05858;
                    break;
                case "R":
                    sum += 156.10111;
                    break;
                case "S":
                    sum += 87.03203;
                    break;
                case "T":
                    sum += 101.04768;
                    break;
                case "V":
                    sum += 99.06841;
                    break;
                case "W":
                    sum += 186.07931;
                    break;
                case "Y":
                    sum += 163.06333;
                    break;
                default:
                    break;
            }
        }
        return sum;
    }
}
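
The switch statement can also be written as a lookup table. The following is a minimal sketch of that idea, not the method used in this post: the class name MassTable and the method sumMass are hypothetical, and the masses are the same values as in the switch above. The string SKADYEK in main is just a short illustrative input.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of the original post
final class MassTable {

    // Monoisotopic masses, same values as in the switch statement
    private static final Map<Character, Double> MASS = new HashMap<>();
    static {
        MASS.put('A', 71.03711);  MASS.put('C', 103.00919);
        MASS.put('D', 115.02694); MASS.put('E', 129.04259);
        MASS.put('F', 147.06841); MASS.put('G', 57.02146);
        MASS.put('H', 137.05891); MASS.put('I', 113.08406);
        MASS.put('K', 128.09496); MASS.put('L', 113.08406);
        MASS.put('M', 131.04049); MASS.put('N', 114.04293);
        MASS.put('P', 97.05276);  MASS.put('Q', 128.05858);
        MASS.put('R', 156.10111); MASS.put('S', 87.03203);
        MASS.put('T', 101.04768); MASS.put('V', 99.06841);
        MASS.put('W', 186.07931); MASS.put('Y', 163.06333);
    }

    // Sums the masses of all symbols in the protein string
    static double sumMass(String protein) {
        double sum = 0.0;
        for (char aa : protein.toCharArray()) {
            // Unknown symbols add nothing, like the default branch of the switch
            sum += MASS.getOrDefault(aa, 0.0);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumMass("SKADYEK"));
    }
}
```

For SKADYEK the sum is about 821.392. Keeping the masses in one table means the loop no longer has to create a one-character String for every position.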

Putting the three pieces of code above together gives a program that calculates the mass of the protein. The result is:

100019.42580000042

Before submitting the result to the website, add one more step that rounds it to three decimal places to get the correct answer.
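
One way to do the rounding step is String.format with the %.3f pattern. This is a small sketch; the class name RoundMass and the method toThreeDecimals are hypothetical and do not appear in the program above.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original post
final class RoundMass {

    // Formats the mass with exactly three digits after the decimal point.
    // Locale.ROOT keeps the decimal separator a dot on every system.
    static String toThreeDecimals(double mass) {
        return String.format(Locale.ROOT, "%.3f", mass);
    }

    public static void main(String[] args) {
        System.out.println(toThreeDecimals(100019.42580000042)); // prints 100019.426
    }
}
```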

Float type and Double type

In Java, the type of every variable is fixed when it is declared, which is why Java is described as a strongly typed language. Different data types occupy different amounts of memory, so they can represent different ranges of values. The primitive types in Java are: integer (byte, short, int, long), floating point (float, double), character (char), and boolean (boolean). An integer literal is an int by default, and a decimal literal is a double by default. The storage ranges of these types differ; of the two floating-point types, float (32 bits) has a smaller range and lower precision than double (64 bits). So what happens if the method that sums the amino acid masses returns a float instead of a double?
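
The default literal types can be illustrated with a short sketch. The class name LiteralTypes and the method addToFloat are hypothetical. The sketch also shows why a statement such as sum += 71.03711 compiles when sum is a float: the compound assignment operator casts the double result back to float implicitly.

```java
// Hypothetical demo class, not part of the original post
final class LiteralTypes {

    // Adds a double-valued mass to a float total; += narrows the result to float
    static float addToFloat(float total, double mass) {
        total += mass;   // compiles without an explicit cast
        return total;
    }

    public static void main(String[] args) {
        // float bad = 71.03711;    // would not compile: a decimal literal is a double
        float asFloat = 71.03711f;   // the f suffix makes the literal a float
        double asDouble = 71.03711;  // default type of a decimal literal

        System.out.println(Float.SIZE);   // prints 32
        System.out.println(Double.SIZE);  // prints 64
        // The float keeps fewer digits, so widening it does not recover the double
        System.out.println((double) asFloat == asDouble);  // prints false
        System.out.println(addToFloat(0f, 71.03711));
    }
}
```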

Let's run step 2 again with float as the type. (The following code combines the three pieces of code above and changes the SumMass method.)

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Calculating_Protein_Mass {
    public static void main(String[] args) {
        //1. Read amino acid sequence
        String pro = readFileContent("C:/Users/Administrator/Desktop/rosalind_prtm.txt");
        //2. Output the mass of corresponding amino acids and sum them
        System.out.println(SumMass(pro));
    }

    //1. Read the amino acid sequence as text
    public static String readFileContent(String fileName) {
        File file = new File(fileName);
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader automatically, even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String tempStr;
            while ((tempStr = reader.readLine()) != null) {
                sb.append(tempStr);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Returns whatever was read so far, as before
        return sb.toString();
    }


    //2. Look up the mass of each amino acid and sum them. Here the running total is deliberately a float, to compare it with the double version
    public static float SumMass(String pro) {
        float sum = 0;
        for (int i = 0; i < pro.length(); i++) {
            String AA = pro.substring(i, i + 1);//Obtain a single amino acid
            switch (AA) {
                case "A":
                    sum += 71.03711;
                    break;
                case "C":
                    sum += 103.00919;
                    break;
                case "D":
                    sum += 115.02694;
                    break;
                case "E":
                    sum += 129.04259;
                    break;
                case "F":
                    sum += 147.06841;
                    break;
                case "G":
                    sum += 57.02146;
                    break;
                case "H":
                    sum += 137.05891;
                    break;
                case "I":
                    sum += 113.08406;
                    break;
                case "K":
                    sum += 128.09496;
                    break;
                case "L":
                    sum += 113.08406;
                    break;
                case "M":
                    sum += 131.04049;
                    break;
                case "N":
                    sum += 114.04293;
                    break;
                case "P":
                    sum += 97.05276;
                    break;
                case "Q":
                    sum += 128.05858;
                    break;
                case "R":
                    sum += 156.10111;
                    break;
                case "S":
                    sum += 87.03203;
                    break;
                case "T":
                    sum += 101.04768;
                    break;
                case "V":
                    sum += 99.06841;
                    break;
                case "W":
                    sum += 186.07931;
                    break;
                case "Y":
                    sum += 163.06333;
                    break;
                default:
                    break;
            }
        }
        return sum;
    }
}

The output becomes:

100019.43

A float variable carries less precision than a double. This follows from the memory each type occupies: a float uses 32 bits and keeps about 7 significant decimal digits, while a double uses 64 bits and keeps about 15 to 16. A double therefore also covers a larger value range and retains more accuracy, which is why the float version prints only 100019.43. How to specify the number of decimal places in the output has also been covered by an expert on CSDN; see that article for details.
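
The difference can be reproduced on a small input. The sketch below sums the same seven masses (S, K, A, D, Y, E, K from the table in the code above) once into a float and once into a double. The class name FloatVsDouble and both method names are hypothetical.

```java
// Hypothetical demo class, not part of the original post
final class FloatVsDouble {

    // Masses of S, K, A, D, Y, E, K, taken from the switch statement
    static final double[] MASSES = {
        87.03203, 128.09496, 71.03711, 115.02694, 163.06333, 129.04259, 128.09496
    };

    // Accumulates in a float: every addition is rounded to 32-bit precision
    static float sumAsFloat(double[] masses) {
        float sum = 0;
        for (double m : masses) {
            sum += m;
        }
        return sum;
    }

    // Accumulates in a double: every addition is rounded to 64-bit precision
    static double sumAsDouble(double[] masses) {
        double sum = 0;
        for (double m : masses) {
            sum += m;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAsFloat(MASSES));
        System.out.println(sumAsDouble(MASSES));
    }
}
```

Both results are close to 821.392, but the float result is shorter and differs from the double result in its trailing digits.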

Topics: Java Back-end
