Mad Libs: A Bloodbath Caused by a Regular Expression

    Tech  2022-07-11

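    Notes on the pitfalls I ran into while doing a practice project from "Automate the Boring Stuff with Python" (《Python编程快速上手》).

    The task: create a Mad Libs program that reads in a text file and lets the user add their own text anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in it. For example, a text file might look like this: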

    The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.

    The program finds these occurrences and prompts the user to replace them.

    Enter an adjective: silly
    Enter a noun: chandelier
    Enter a verb: screamed
    Enter a noun: pickup truck

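    The result should be printed to the screen and saved to a new text file. Here is my first attempt, which contains the bugs discussed below: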

    import os,re

    text = open('words.txt').read()
    words = re.compile(r'(.*)ADJECTIVE(.*)NOUN(.*)VERB(.*)NOUN(.*)')
    inputWords = ['adjective','noun','verb','noun']
    for i in range(len(inputWords)):
        inputWords[i] = input('Enter a ' + inputWords[i] + ':\n')
    print(inputWords)
    text = words.sub(r'\1%s\2%s\3%s\4%s\5' % inputWords,text)
    print(text)
    newText = open('words.txt','w').write(text)
    print(newText)
    newText.close()

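    Error 1:

        text = words.sub(r'\1%s\2%s\3%s\4%s\5' % inputWords,text)
    TypeError: not enough arguments for format string

    Cause: the value passed to the % operator, inputWords, is a list. A list counts as a single argument, so only the first %s has a value and the other three do not. It has to be converted to a tuple: tuple(inputWords).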
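
    The difference can be reproduced in isolation. This is a minimal sketch; the template string and the variable names are made up for illustration, and the sample words come from the prompt example above.

```python
words_list = ['silly', 'chandelier', 'screamed', 'pickup truck']
template = '%s / %s / %s / %s'

# A list is treated as ONE argument, so three placeholders have no value.
try:
    template % words_list
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# A tuple supplies one argument per placeholder.
filled = template % tuple(words_list)

print(list_error)
print(filled)
```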

    Error 2:

    Traceback (most recent call last):
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1039, in parse_template
        this = chr(ESCAPES[this][1])
    KeyError: '\\d'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "printWords.py", line 13, in <module>
        text = words.sub(r'\1\%s\2\%s\3\%s\4\%s\5' % tuple(inputWords),text)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 325, in _subx
        template = _compile_repl(template, pattern)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 316, in _compile_repl
        return sre_parse.parse_template(repl, pattern)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1042, in parse_template
        raise s.error('bad escape %s' % this, len(this))
    re.error: bad escape \d at position 2

    Cause: I had put a backslash (an escape character) in front of each %s in the replacement string, but %s does not need escaping. After formatting, the backslash is left sitting in front of the first character of the user's input, here a "d", and the re module rejects \d as an invalid escape in a replacement template.
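
    A cut-down reproduction, assuming a shortened two-group pattern and the sample input 'dark' (both chosen only for illustration, not taken from the original run):

```python
import re

pattern = re.compile(r'(.*)ADJECTIVE(.*)')
text = 'The ADJECTIVE panda'

# Wrong: the stray backslash fuses with the first letter of the input,
# so the template becomes \1\dark\2 and \d is not a valid template escape.
bad_template = r'\1\%s\2' % ('dark',)
try:
    pattern.sub(bad_template, text)
    bad_error = None
except re.error as exc:
    bad_error = str(exc)

# Right: no backslash in front of %s.
good_template = r'\1%s\2' % ('dark',)
good_result = pattern.sub(good_template, text)

print(bad_error)
print(good_result)
```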

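    Error 3:

    Traceback (most recent call last):
      File "printWords.py", line 15, in <module>
        newText.close()
    AttributeError: 'int' object has no attribute 'close'

    Cause: write() writes a string to the file and returns the number of characters written, newlines included. So newText above is not a file object but the integer 94, the length of the string. The File object has to be stored in a variable first, and that variable is then used for writing and closing.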
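
    A small sketch of the same point using a with block, which closes the file automatically. The scratch file path and the sample text are assumptions made so the snippet runs without touching words.txt.

```python
import os
import tempfile

# Hypothetical scratch file, used instead of words.txt.
path = os.path.join(tempfile.mkdtemp(), 'words_demo.txt')
text = 'The silly panda.\n'

# write() returns the number of characters written, not a file object.
with open(path, 'w') as out_file:
    written = out_file.write(text)
# Leaving the with block has already closed out_file.

with open(path) as in_file:
    round_trip = in_file.read()

print(written, out_file.closed)
```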

    The code after these fixes:

    import os,re

    wordsFile = open('words.txt')
    text = wordsFile.read()  # read the file
    #text = 'The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.'
    wordsFile.close()

    words = re.compile(r'(.*)ADJECTIVE(.*)NOUN(.*)VERB(.*)NOUN(.*)')
    inputWords = ['adjective','noun','verb','noun']
    for i in range(len(inputWords)):
        inputWords[i] = input('Enter a ' + inputWords[i] + ':\n')
    newText = words.sub(r'\1%s\2%s\3%s\4%s\5' % tuple(inputWords),text)

    wordsFile = open('words.txt','w')
    wordsFile.write(newText)
    wordsFile.close()
    print(newText)

    There is no code here that validates the input, and precisely because of that I seem to have found one more bug. When the input is a number, the program fails like this:

    Traceback (most recent call last):
      File "printWords.py", line 16, in <module>
        newText = words.sub(r'\1%s\2%s\3%s\4%s\5' % tuple(inputWords),text)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 325, in _subx
        template = _compile_repl(template, pattern)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 316, in _compile_repl
        return sre_parse.parse_template(repl, pattern)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 1036, in parse_template
        addgroup(int(this[1:]), len(this) - 1)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\sre_parse.py", line 980, in addgroup
        raise s.error("invalid group reference %d" % index, pos)
    re.error: invalid group reference 41 at position 20

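    Testing showed that in a template like r'\1%s', whenever the text substituted for the placeholder right after the group number starts with a digit, the intended group can no longer be resolved. Whether the input is the number 1, '1', or '1b', as long as it immediately follows \1 the group number is misread as \11. With 'b1' or any other string that does not start with a digit, nothing goes wrong.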
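
    One way to avoid the ambiguity is the \g<1> form of a group reference, which the re module accepts in replacement templates and which ends the group number explicitly. The sketch below uses a shortened two-group pattern and the sample input '1b' for illustration; doubling backslashes in the input is an extra precaution added here, not something the original program does.

```python
import re

pattern = re.compile(r'(.*)ADJECTIVE(.*)')
text = 'The ADJECTIVE panda'
user_word = '1b'

# r'\1' followed by '1b' is read as a reference to group 11, which does not exist.
try:
    pattern.sub(r'\1%s\2' % user_word, text)
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = str(exc)

# \g<1> closes the group number, so a digit after it is ordinary text.
# Doubling backslashes keeps input such as 'a\b' from being parsed as an escape.
safe_word = user_word.replace('\\', '\\\\')
result = pattern.sub(r'\g<1>%s\g<2>' % safe_word, text)

print(ambiguous_error)
print(result)
```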

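    Once the basic functionality was working, I looked at how other people had written it and realized that my main problem was a poorly chosen regular expression, which caused all the trouble that followed. Then again, who is to say it was not a blessing in disguise.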
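
    For comparison, here is a minimal sketch of one simpler approach: match each placeholder on its own and let a replacement function supply the word. It is not necessarily what the referenced code does. The names PLACEHOLDER, fill_placeholders, and ask are hypothetical, and the answers are fed from a list instead of input() so the snippet runs unattended.

```python
import re

# Match any single placeholder instead of the whole sentence structure.
PLACEHOLDER = re.compile(r'ADJECTIVE|NOUN|ADVERB|VERB')

def fill_placeholders(text, ask=input):
    """Replace each placeholder with whatever `ask` returns for it."""
    def replace(match):
        word = match.group().lower()
        article = 'an' if word[0] in 'aeiou' else 'a'
        # A function's return value is inserted literally, so digits and
        # backslashes in the answer are not parsed as group references.
        return ask('Enter %s %s:\n' % (article, word))
    return PLACEHOLDER.sub(replace, text)

sample = ('The ADJECTIVE panda walked to the NOUN and then VERB. '
          'A nearby NOUN was unaffected by these events.')
answers = iter(['silly', '1 chandelier', 'screamed', 'pickup truck'])
filled = fill_placeholders(sample, ask=lambda prompt: next(answers))
print(filled)
```

    Calling fill_placeholders(text) without the ask argument falls back to input(), so the same function also works interactively, and it handles any number of placeholders in any order.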

    In short, my grasp of the fundamentals is still not solid enough. /(ㄒoㄒ)/~~
