Another possible solution to "AttributeError: Can't get attribute 'xxx' on <module '__main__' from 'xxx'>"
Problem restatement: while learning PyTorch, I serialized a Word2Seq class with the pickle module and saved it to the file ws.pkl. I then created a lib.py file. The error above occurs when I deserialize ws.pkl with the following code:
```python
ws = pickle.load(open('ws.pkl', 'rb'))
```
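For context, here is a minimal hypothetical reproduction of how this error arises (not the author's original code): pickle stores a class by reference, module name plus class name, so if the class lived in the script that was run directly, the stream records it as `__main__.Word2Seq`.

```python
# make_pickle.py -- hypothetical minimal reproduction
import pickle


class Word2Seq:
    '''Stand-in for the real class, defined in the script that is run directly.'''
    pass


if __name__ == '__main__':
    # This script runs as __main__, so the pickle stream records the class
    # reference as '__main__.Word2Seq'
    pickle.dump(Word2Seq(), open('ws.pkl', 'wb'))
```

Any other module that later calls pickle.load then looks for Word2Seq on its own `__main__` and raises exactly this AttributeError.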
First, the initial solution (which does not always work):
I wrote the following code in the lib.py file I created:
```python
# lib.py
import pickle
from utils.word2seq import Word2Seq  # Word2Seq is the class that was serialized

ws = pickle.load(open('ws.pkl', 'rb'))
```
Run lib.py directly this way, and ws loads correctly with no error.
But a new problem appears.
Problem description: I then created a build_dataset.py file, expecting to import the ws object from lib.py into it. But the error comes right back: AttributeError: Can't get attribute 'xxx' on <module '__main__' from 'xxx'>. The method above does not work here at all.
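Concretely, the failure in this setup looks roughly like the following sketch (assuming lib.py is the method-1 version above and the file paths match the later examples):

```python
# build_dataset.py -- sketch of the failing import under method 1
# When lib.py ran directly, lib.py itself was __main__ and had Word2Seq
# imported into it, so pickle.load succeeded. Imported from here, lib is an
# ordinary module; __main__ is now build_dataset.py, which has no Word2Seq.
from utils.lib import ws  # AttributeError: Can't get attribute 'Word2Seq' on <module '__main__' ...>
```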
So this solution does not address the root cause.
The second solution (the complete one):
The real cause of the problem is how the class you serialize with pickle was written. The .py file that contains the pickled class must be **"clean"**, meaning the file should contain only that class. For example: if I want to serialize the Word2Seq class in word2seq.py, then apart from the contents of class Word2Seq() I should not write anything else (such as extra def functions) in that file.
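The mechanism behind this rule, shown as a small sketch (it assumes word2seq.py sits in a utils package, matching the imports later in this post): pickle never stores the class's code, only a reference made of the module path and class name, and it re-imports that module on load. A class kept in its own importable module therefore resolves the same way no matter which script does the loading.

```python
# Sketch: what pickle actually records for the class
import pickle
from utils.word2seq import Word2Seq  # assumes word2seq.py lives in a utils package

ws = Word2Seq()
print(Word2Seq.__module__)        # 'utils.word2seq': the path recorded in the pickle
data = pickle.dumps(ws)
print(b'utils.word2seq' in data)  # True: the stream holds a reference, not the code
# pickle.load() imports 'utils.word2seq' automatically to resolve that reference
```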
```python
# word2seq.py
'''
Build a vocabulary and implement methods that turn sentences into
numeric sequences and back again.
'''


class Word2Seq():
    UNK_TAG = 'UNK'  # Special token: unseen words are replaced by UNK, which maps to 0
    PAD_TAG = 'PAD'  # Short sentences are padded with PAD, which maps to 1
    UNK = 0
    PAD = 1

    def __init__(self):
        # Map words to numbers
        self.dict = {
            self.UNK_TAG: self.UNK,
            self.PAD_TAG: self.PAD
        }
        self.count = {}  # Word-frequency statistics

    def fit(self, text):
        '''
        Record the words of a single sentence and tally each word's frequency
        :param text: [word1, word2, word3, ...]
        :return:
        '''
        for word in text:
            # Tip: self.count.get(word, 0) + 1 returns the current count plus 1
            # if 'word' is already in the dictionary, otherwise 0 plus 1
            self.count[word] = self.count.get(word, 0) + 1

    def build_vocab(self, min=5, max=None, max_features=None):
        '''
        Build the vocabulary, dropping words that do not meet the frequency requirements
        :param min: minimum number of occurrences of a word
        :param max: maximum number of occurrences
        :param max_features: how many words to keep
        :return:
        '''
        # Remove words whose frequency is below min
        if min is not None:
            # PS: iterating over a dictionary actually iterates over its keys
            self.count = {word: value for word, value in self.count.items() if value >= min}
        # Remove words whose frequency is above max
        if max is not None:
            self.count = {word: value for word, value in self.count.items() if value <= max}
        # Limit the number of words kept
        if max_features is not None:
            # sorted() turns self.count.items() (an iterable of (key, value) tuples)
            # into a list; key=lambda x: x[-1] sorts by the value (the last element),
            # reverse=True sorts in descending order, and [:max_features] keeps the
            # top max_features entries (slicing works because sorted() returns a list)
            temp = sorted(self.count.items(), key=lambda x: x[-1], reverse=True)[:max_features]
            self.count = dict(temp)  # Convert [(key, value), ...] back into {key: value, ...}
        # Assign a number to each word: {word: num}
        for word in self.count:
            # self.dict already holds the UNK_TAG and PAD_TAG pairs,
            # so new words are numbered from 2 onward and never collide
            self.dict[word] = len(self.dict)
        # Build the inverted dictionary {num: word}
        self.inverse_dict = dict(zip(self.dict.values(), self.dict.keys()))

    def transform(self, text, max_len=None):
        '''
        Convert a sentence into a sequence of numbers
        :param text: [word1, word2, ...]
        :param max_len: int, pad or truncate the sentence to this length
        :return: [1, 2, 4, ...]
        '''
        if max_len is not None:
            if max_len > len(text):
                # Sentence shorter than max_len: pad with PAD up to max_len
                text = text + [self.PAD_TAG] * (max_len - len(text))
            else:
                # Sentence longer than max_len: keep only the first max_len words
                text = text[:max_len]
        # Look up each word's number in self.dict, falling back to UNK
        return [self.dict.get(word, self.UNK) for word in text]

    def inverse_transform(self, indices):
        '''
        Convert a sequence of numbers back into a sentence
        :param indices: [1, 2, 4, 5, 3, ...]
        :return: [word1, word2, word4, word3, ...]
        '''
        return [self.inverse_dict.get(index) for index in indices]

    def __len__(self):
        # Number of words in the vocabulary
        return len(self.dict)


if __name__ == '__main__':
    pass
```
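For reference, a quick toy run of the class above (made-up words, not the IMDB data):

```python
ws = Word2Seq()
ws.fit(['hello', 'world', 'hello'])
ws.fit(['hello', 'pytorch'])
ws.build_vocab(min=1)              # keep every word for this toy example
print(ws.dict)                     # {'UNK': 0, 'PAD': 1, 'hello': 2, 'world': 3, 'pytorch': 4}
seq = ws.transform(['hello', 'unseen'], max_len=4)
print(seq)                         # [2, 0, 1, 1]: unseen word -> UNK, then PAD padding
print(ws.inverse_transform(seq))   # ['hello', 'UNK', 'PAD', 'PAD']
```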
With that in place, I write the function that generates the ws.pkl file in the dataset.py file:
```python
# dataset.py
import os
import pickle

from tqdm import tqdm  # progress bar; this import was implicit in the original

from utils.word2seq import Word2Seq
# tokenlize: the author's tokenizer helper (see the hypothetical sketch below)


# Save the word-to-number mapping
def fit_save_word_seq(max_features=10000):
    ws = Word2Seq()
    path = '../data/aclImdb'
    temp_data_path = [os.path.join(path, 'train/pos'),
                      os.path.join(path, 'train/neg'), ]
    # os.path.join(path, 'test/pos'),
    # os.path.join(path, 'test/neg')]
    for data_path in temp_data_path:
        file_list = os.listdir(data_path)  # All file names in the directory
        # Full paths of the .txt review files
        file_path_list = [os.path.join(data_path, file_name)
                          for file_name in file_list if file_name.endswith('.txt')]
        for file_path in tqdm(file_path_list):
            # Tokenize each review and feed its words to the vocabulary
            text = tokenlize(open(file_path, encoding='utf-8').read())
            ws.fit(text)
    if max_features is not None:  # Limit the maximum number of words
        ws.build_vocab(min=10, max_features=max_features)
    else:
        ws.build_vocab(min=10)  # Number each word
    pickle.dump(ws, open('../model_data/ws.pkl', 'wb'), protocol=4)
    print(ws.dict, len(ws))


if __name__ == '__main__':
    fit_save_word_seq()
```
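Note that tokenlize is the author's own helper and its implementation is not shown; a minimal hypothetical version for the IMDB reviews could look like this:

```python
# Hypothetical tokenlize (the original implementation is not shown in this post)
import re


def tokenlize(content):
    content = re.sub(r'<br\s*/?>', ' ', content)  # strip IMDB '<br />' line breaks
    content = re.sub(r'[^\w\s]', ' ', content)    # replace punctuation with spaces
    return content.lower().split()                # lower-case and split on whitespace
```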
Then load the ws.pkl file in lib.py. Because ws.pkl now records the class as utils.word2seq.Word2Seq, pickle imports that module on its own during loading, so lib.py no longer needs to import the class:
```python
# lib.py
import pickle

# Note the biggest difference from method 1: there is no need to import
# the Word2Seq class here
ws = pickle.load(open('model_data/ws.pkl', 'rb'))
print(ws)
```
Finally, I import the ws object deserialized in lib.py directly into build_dataset.py, and no error is raised. The ws object from lib.py can now be imported into whatever file needs it, and the original problem is gone:
```python
from utils.lib import ws
```
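From here, ws works anywhere it is imported; for example (a usage sketch with made-up tokens):

```python
# build_dataset.py -- usage sketch
from utils.lib import ws

tokens = ['this', 'movie', 'was', 'great']
print(ws.transform(tokens, max_len=10))  # numeric sequence padded/truncated to length 10
print(len(ws))                           # vocabulary size
```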
PS: don't ask me why all this shuffling around is needed. Once the code base grows, you are bound to write a configuration file to hold adjustable parameters, so method 2 is the best solution. Just make sure the .py file of the class that pickle serializes contains nothing other than class Word2Seq().