Mad Libs: A Bloodbath Caused by Regular Expressions

Tech, 2022-07-11

Notes on the pitfalls I hit while doing a practice project from Automate the Boring Stuff with Python (《Python编程快速上手》).

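The task: create a Mad Libs program that reads in a text file and lets the user add their own text anywhere the words ADJECTIVE, NOUN, ADVERB, or VERB appear in it. For example, a text file might look like this: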

The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.

The program finds these occurrences and prompts the user to replace them.

Enter an adjective: silly
Enter a noun: chandelier
Enter a verb: screamed
Enter a noun: pickup truck

The result should be printed to the screen and saved to a new text file. Here is my first version of the code:

import os,re
text = open('words.txt').read()
words = re.compile(r'(.*)ADJECTIVE(.*)NOUN(.*)VERB(.*)NOUN(.*)')
inputWords = ['adjective','noun','verb','noun']
for i in range(len(inputWords)):
    inputWords[i] = input('Enter a ' + inputWords[i] + ':\n')
print(inputWords)
text = words.sub(r'\1%s\2%s\3%s\4%s\5' % inputWords,text)
print(text)
newText = open('words.txt','w').write(text)
print(newText)
newText.close()

Error 1:

text = words.sub(r'\1%s\2%s\3%s\4%s\5' % inputWords,text)
TypeError: not enough arguments for format string

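Cause: inputWords is a list, and the % operator treats a list as a single value, so only the first %s gets an argument. It has to be converted to a tuple: tuple(inputWords).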
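The behavior is easy to reproduce on its own. Below is a minimal sketch with made-up sample words and a hypothetical helper named format_pair; it only illustrates that % unpacks a tuple but not a list.

```python
# Sample words are made up for illustration.
input_words = ['silly', 'chandelier']


def format_pair(values):
    """Fill two %s placeholders from the given values."""
    return '%s and %s' % values


try:
    format_pair(input_words)        # a list counts as ONE argument
    list_error = None
except TypeError as exc:
    list_error = str(exc)

tuple_result = format_pair(tuple(input_words))   # a tuple is unpacked

print(list_error)
print(tuple_result)   # silly and chandelier
```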

Error 2:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1039, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\d'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "printWords.py", line 13, in <module>
    text = words.sub(r'\1\%s\2\%s\3\%s\4\%s\5' % tuple(inputWords),text)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 325, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 316, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1042, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \d at position 2

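Cause: I had put an escape backslash in front of each %s in the replacement string, but %s needs no escaping. After formatting, the backslash joins up with the first character of the user's input (here a word starting with d), and re rejects \d as a bad escape in a replacement template.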
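A cut-down reproduction makes this clearer. It is only a sketch: the two-group pattern, the sample sentence, and the word "dizzy" are made up, and the error text may differ slightly between Python versions.

```python
import re

pattern = re.compile(r'(.*)ADJECTIVE(.*)')
text = 'The ADJECTIVE panda'
word = 'dizzy'   # made-up input that starts with "d"

# Stray backslash before %s: the template becomes r'\1\dizzy\2'.
bad_template = r'\1\%s\2' % word
try:
    pattern.sub(bad_template, text)
    bad_error = None
except re.error as exc:
    bad_error = str(exc)

# Without the backslash the word is inserted as plain text.
good_result = pattern.sub(r'\1%s\2' % word, text)

print(bad_error)
print(good_result)   # The dizzy panda
```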

Error 3:

Traceback (most recent call last):
  File "printWords.py", line 15, in <module>
    newText.close()
AttributeError: 'int' object has no attribute 'close'

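Cause: write() writes the string to the file and returns the number of characters written, newlines included! So newText holds that count (the string length, 94 here), not a file object. The file object has to be saved in a variable first, and that variable is then used for writing and closing.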
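A small sketch of the same point, using a throwaway temporary file instead of words.txt (the file name demo.txt and the sample text are made up). A with block is one common way to keep hold of the file object and have it closed automatically.

```python
import os
import tempfile

# Throwaway file so the sketch does not touch a real words.txt.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
text = 'The silly panda\n'

with open(path, 'w') as out_file:
    count = out_file.write(text)   # an int: characters written, newline included

with open(path) as in_file:
    saved = in_file.read()

print(type(count).__name__, count == len(text))
print(saved == text)
```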

The code after these fixes:

import os,re
wordsFile = open('words.txt')
text = wordsFile.read()  # read the file
#text = 'The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.'
wordsFile.close()
words = re.compile(r'(.*)ADJECTIVE(.*)NOUN(.*)VERB(.*)NOUN(.*)')
inputWords = ['adjective','noun','verb','noun']
for i in range(len(inputWords)):
    inputWords[i] = input('Enter a ' + inputWords[i] + ':\n')
newText = words.sub(r'\1%s\2%s\3%s\4%s\5' % tuple(inputWords),text)
wordsFile = open('words.txt','w')
wordsFile.write(newText)
wordsFile.close()
print(newText)

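The code does not validate its input, and that is exactly how I seem to have found yet another bug. When the input is a number, it fails like this: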

Traceback (most recent call last):
  File "printWords.py", line 16, in <module>
    newText = words.sub(r'\1%s\2%s\3%s\4%s\5' % tuple(inputWords),text)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 325, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 316, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1036, in parse_template
    addgroup(int(this[1:]), len(this) - 1)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 980, in addgroup
    raise s.error("invalid group reference %d" % index, pos)
re.error: invalid group reference 41 at position 20

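Testing showed that in a template like r'\1%s', whenever the text substituted for the %s right after a group number begins with a digit, the group can no longer be located. Whether the input is the number 1, '1', or '1b', as long as it immediately follows \1, the reference is misread as \11. An input such as 'b1', or any other string that does not start with a digit, has no such effect.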
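The re module has a \g<number> form of group reference that ends the group number explicitly, and sub() also accepts a function whose return value is inserted without any template parsing. The sketch below tries both on a cut-down pattern; the two-group pattern and sample sentence are made up for illustration, not taken from the program above.

```python
import re

pattern = re.compile(r'(.*)NOUN(.*)')
text = 'walked to the NOUN and left'
word = '1b'   # input that starts with a digit

# r'\1' followed by '1b' is read as a reference to group 11.
try:
    pattern.sub(r'\1%s\2' % word, text)
    plain_error = None
except re.error as exc:
    plain_error = str(exc)

# \g<1> closes the group number, so the following digit stays literal.
safe_result = pattern.sub(r'\g<1>%s\g<2>' % word, text)

# A function replacement is never parsed as a template at all.
func_result = pattern.sub(lambda m: m.group(1) + word + m.group(2), text)

print(plain_error)
print(safe_result)   # walked to the 1b and left
print(func_result)   # walked to the 1b and left
```

The \g<1> form still parses the user's text as part of the template, so an input containing a backslash could fail; the function form avoids that as well.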

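Once the basic functionality worked, I went and looked at other people's code and realized that my main problem was a poorly chosen regular expression, which caused the whole chain of trouble that followed. Then again, it may have been a blessing in disguise.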
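For illustration only, and not the code referred to above: one simpler choice is a single alternation that matches any placeholder, with each match replaced through a function. The names PLACEHOLDER, fill_mad_libs, and the ask parameter are made up for this sketch, and file reading and writing are left out.

```python
import re

# One alternation matches any placeholder, however many times it occurs.
PLACEHOLDER = re.compile(r'ADJECTIVE|NOUN|ADVERB|VERB')


def fill_mad_libs(text, ask=input):
    """Replace every placeholder in order, asking for one word per match."""
    def replace(match):
        kind = match.group().lower()
        article = 'an' if kind[0] in 'aeiou' else 'a'
        # The returned string is inserted literally, so digits and
        # backslashes in the answer need no special handling.
        return ask('Enter %s %s:\n' % (article, kind))
    return PLACEHOLDER.sub(replace, text)


sample = ('The ADJECTIVE panda walked to the NOUN and then VERB. '
          'A nearby NOUN was unaffected by these events.')

# Canned answers stand in for keyboard input so the sketch runs unattended.
answers = iter(['silly', 'chandelier', 'screamed', 'pickup truck'])
result = fill_mad_libs(sample, ask=lambda prompt: next(answers))
print(result)
```

Calling fill_mad_libs(sample) with the default ask=input prompts at the keyboard instead. Because the pattern no longer hard-codes the order or number of placeholders, the same function works for any template text.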

All in all, my grasp of the basics is still not solid. /(ㄒoㄒ)/~~
