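The QA Checker built into Trados Studio is very capable: it catches the spacing and punctuation problems common in English writing, and it also supports custom regular-expression checks and forbidden-word checks.
In Trados, however, regex rules and forbidden words have to be entered one at a time. Settings files in a specific format can be imported and exported, but there is no way to bulk-import rules or forbidden words from an external source.
Being able to bulk-import external rules or forbidden words would cut down on the manual work.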
Requirements: Trados Studio (2014 or later) and a Python environment.
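QA Checker settings files have the extension sdlqasettings, but they are really just XML files, and a little reading is enough to understand what the tags mean and how they are organized. In the forbidden-word section, for example, every entry has the following structure: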
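<Setting Id="WrongWordPairsN">
  <WrongWordDef xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker">
    <CorrectWord>correct word</CorrectWord>
    <WrongWord>wrong word</WrongWord>
    <_CorrectWord>correct word</_CorrectWord>
    <_WrongWord>wrong word</_WrongWord>
  </WrongWordDef>
</Setting>

In the snippet above, the correct word and the wrong word are user-defined, while N is an automatically generated index (starting from 0). The entries for multiple check items are concatenated one after another; the content before and after them belongs to other settings and can be treated as fixed.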
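Translators often keep their terminology in Excel. All that is needed is to fill the wrong and correct words from the spreadsheet into the structure above, join the entries together, and add the fixed content before and after them.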
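To see the filling step in isolation, here is a minimal sketch that works on an in-memory list of pairs instead of an Excel sheet. The function names `word_pair_setting` and `word_pair_settings` are made up for this illustration, and the XML escaping is my own addition so that characters such as & or < in a term do not break the file.

```python
from xml.sax.saxutils import escape

# Namespaces copied from the entry structure shown above.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
QA_NS = "http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker"


def word_pair_setting(index, wrong, correct):
    """Build one WrongWordPairs entry (hypothetical helper)."""
    wrong, correct = escape(wrong), escape(correct)  # &, <, > become entities
    return (
        f'<Setting Id="WrongWordPairs{index}">'
        f'<WrongWordDef xmlns:i="{XSI_NS}" xmlns="{QA_NS}">'
        f'<CorrectWord>{correct}</CorrectWord>'
        f'<WrongWord>{wrong}</WrongWord>'
        f'<_CorrectWord>{correct}</_CorrectWord>'
        f'<_WrongWord>{wrong}</_WrongWord>'
        '</WrongWordDef></Setting>'
    )


def word_pair_settings(pairs):
    """Join the entries for (wrong, correct) pairs, numbered from 0."""
    return "".join(
        word_pair_setting(i, wrong, correct)
        for i, (wrong, correct) in enumerate(pairs)
    )


entries = word_pair_settings([("R&D dept", "R&D department"), ("colour", "color")])
print(entries)
```

The fixed content before and after the entries still has to be added around this string before Trados can import it.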
import xlrd  # note: xlrd 2.0 and later reads only .xls files, not .xlsx
from xml.sax.saxutils import escape


def wordlist(file):
    filename = file.split('xls')[0] + 'wordlist.sdlqasettings'  # output file name
    data = xlrd.open_workbook(file)  # open the Excel workbook
    table = data.sheets()[0]  # read the first sheet
    rows = table.nrows  # number of rows in the sheet
    word_list = ''
    for pair_index in range(rows):
        # cell_value() returns the cell content itself, so str(cell) no longer has to be sliced;
        # escape() keeps characters such as & and < from breaking the XML
        wrong_word = escape(str(table.cell_value(pair_index, 0)))  # wrong word
        correct_word = escape(str(table.cell_value(pair_index, 1)))  # correct word
        # one entry per pair (the wrong word was previously hard-coded as "spirit")
        mono_pair: str = f'''<Setting Id="WrongWordPairs{pair_index}"><WrongWordDef xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker"><CorrectWord>{correct_word}</CorrectWord><WrongWord>{wrong_word}</WrongWord><_CorrectWord>{correct_word}</_CorrectWord><_WrongWord>{wrong_word}</_WrongWord></WrongWordDef></Setting>'''
        word_list += mono_pair
    # wrap the entries in the fixed content that comes before and after them
    word_list = f'''<?xml version="1.0" encoding="utf-8"?><SettingsBundle><SettingsGroup Id="QAVerificationSettings"><Setting Id="ExcludePerfectMatchSegments">False</Setting><Setting Id="ExcludeLocked">False</Setting><Setting Id="ElementContextExclusionValue">True</Setting><Setting Id="ExclusionStringValue">True</Setting><Setting Id="CheckInconsistencies">True</Setting><Setting Id="CheckRepeatedWords">True</Setting><Setting Id="UneditedSegments">True</Setting><Setting Id="UneditedConfirmed">False</Setting><Setting Id="UneditedNotConfirmed">True</Setting><Setting Id="AbsoluteLengthElements">True</Setting><Setting Id="CheckNumbers">True</Setting><Setting Id="CheckPunctuationDifferences">True</Setting><Setting Id="CheckSpanishPunctuation">True</Setting><Setting Id="CheckPunctuationSpace">True</Setting><Setting Id="PunctuationSpaceCharsValue">:!?;,.)/-*</Setting><Setting Id="CheckMultipleSpaces">True</Setting><Setting Id="CheckMultipleDots">True</Setting><Setting Id="ExtraEndSpace">True</Setting><Setting Id="CheckMultipleSpaceSeverity">1</Setting><Setting Id="CheckMultipleDotSeverity">1</Setting><Setting Id="CheckRegEx">True</Setting><Setting Id="RegExSeverity">0</Setting><Setting Id="RegExRules">True</Setting><Setting Id="RegExRulesCount">3</Setting><Setting Id="RegExRules0"><RegExRule xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker.RegEx"><Description>[参见图#,图#...]中的[参见图#,]要省略不翻</Description><IgnoreCase>false</IgnoreCase><RegExSource></RegExSource><RegExTarget>FIG. [0-9], FIG.|FIG. [0-9][0-9], FIG.</RegExTarget><RuleCondition>TargetOnly</RuleCondition></RegExRule></Setting><Setting Id="RegExRules1"><RegExRule xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker.RegEx"><Description>电连接用electrical</Description><IgnoreCase>false</IgnoreCase><RegExSource>连接</RegExSource><RegExTarget>electronic</RegExTarget><RuleCondition>TargetAndSource</RuleCondition></RegExRule></Setting><Setting Id="RegExRules2"><RegExRule xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker.RegEx"><Description>based on</Description><IgnoreCase>false</IgnoreCase><RegExSource></RegExSource><RegExTarget>based [^qo]</RegExTarget><RuleCondition>TargetOnly</RuleCondition></RegExRule></Setting><Setting Id="CheckForbiddenChar">True</Setting><Setting Id="ForbiddenCharsValue">,。;:“”、!?()【】</Setting><Setting Id="CheckIdenticalSegmentsSeverity">2</Setting><Setting Id="CheckTargetShorterSeverity">2</Setting><Setting Id="CheckTrademarks">True</Setting><Setting Id="TrademarksSymbols">True</Setting><Setting Id="TrademarksSymbols0">®</Setting><Setting Id="TrademarksSymbols1">©</Setting><Setting Id="TrademarksSymbols2">™</Setting><Setting Id="TrademarksSymbols3">(c)</Setting><Setting Id="TrademarksSymbols4">(r)</Setting><Setting Id="TrademarksSymbols5">(tm)</Setting><Setting Id="CheckWordList">True</Setting><Setting Id="WordListIgnoreCase">True</Setting><Setting Id="WordListWholeWord">True</Setting><Setting Id="WrongWordPairs">True</Setting><Setting Id="WrongWordPairsCount">{rows}</Setting>{word_list}</SettingsGroup></SettingsBundle>'''
    # the with block closes the file on exit, which flushes the data to disk
    with open(filename, 'w', encoding='utf-8') as xls2txt:
        xls2txt.write(word_list)

That is how the word list settings file is generated. Note that the Word List page in Trados has no import option of its own, so the list can only be loaded by importing a settings file for the whole QA Checker. As a result, the file generated by the program above also overwrites every setting other than the word list.
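Since the generated file replaces the whole QA Checker configuration, it may be worth checking it before importing. The sketch below is my own addition, not part of the original workflow: it parses a settings tree and compares the declared WrongWordPairsCount with the numbered entries actually present. The function name `summarize_word_list` and the `SAMPLE` string are hypothetical, and I am only assuming that Trados expects the two to agree.

```python
import re
import xml.etree.ElementTree as ET


def summarize_word_list(root):
    """Return (declared count, list of WrongWordPairsN ids) for a settings tree."""
    declared = None
    pair_ids = []
    for setting in root.iter("Setting"):  # Setting elements carry no namespace
        setting_id = setting.get("Id", "")
        if setting_id == "WrongWordPairsCount":
            declared = int(setting.text)
        elif re.fullmatch(r"WrongWordPairs\d+", setting_id):
            pair_ids.append(setting_id)
    return declared, pair_ids


# A shortened stand-in for a generated file (illustrative only).
SAMPLE = (
    '<SettingsBundle><SettingsGroup Id="QAVerificationSettings">'
    '<Setting Id="WrongWordPairs">True</Setting>'
    '<Setting Id="WrongWordPairsCount">2</Setting>'
    '<Setting Id="WrongWordPairs0"><WrongWordDef xmlns="urn:example"/></Setting>'
    '<Setting Id="WrongWordPairs1"><WrongWordDef xmlns="urn:example"/></Setting>'
    '</SettingsGroup></SettingsBundle>'
)

declared, pair_ids = summarize_word_list(ET.fromstring(SAMPLE))
print(declared, pair_ids)
```

For a file on disk, the same function can be called as `summarize_word_list(ET.parse(path).getroot())`; a parse error at that point means the XML is not well-formed.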
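Similarly, the code below generates a Trados regular-expression settings file from an existing Excel term list. Unlike the word list above, regular expressions can be imported from a settings file on their own, without affecting any other settings.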
import xlrd  # note: xlrd 2.0 and later reads only .xls files, not .xlsx
from xml.sax.saxutils import escape


def regex_rules(file):  # wrapped in a function so that `file` is defined
    filename = file.split('xls')[0] + 'sdlqasettings'  # output file name
    data = xlrd.open_workbook(file)  # open the Excel workbook
    table = data.sheets()[0]  # read the first sheet
    rows = table.nrows  # number of rows in the sheet
    reg_index = 0
    reg_ex = ''
    for row_word in range(rows):
        # cell_value() returns the cell content itself; escape() protects the XML
        source = escape(str(table.cell_value(row_word, 0)))  # source term
        target = escape(str(table.cell_value(row_word, 1)))  # target term
        # term left untranslated
        reg_ex_source_not_target: str = f'''<Setting Id="RegExRules{reg_index}"><RegExRule xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker.RegEx"><Description>Term "{source}" not translated</Description><IgnoreCase>true</IgnoreCase><RegExSource>{source}</RegExSource><RegExTarget>{target}</RegExTarget><RuleCondition>SourceNotTarget</RuleCondition></RegExRule></Setting>'''
        reg_ex += reg_ex_source_not_target
        reg_index += 1
        # term translated fewer times than it occurs
        reg_ex_different_count: str = f'''<Setting Id="RegExRules{reg_index}"><RegExRule xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Sdl.Verification.QAChecker.RegEx"><Description>Term "{source}" matched a different number of times</Description><IgnoreCase>true</IgnoreCase><RegExSource>{source}</RegExSource><RegExTarget>{target}</RegExTarget><RuleCondition>DifferentCount</RuleCondition></RegExRule></Setting>'''
        reg_ex += reg_ex_different_count
        reg_index += 1
    reg_ex = f'''<?xml version="1.0" encoding="utf-8"?><SettingsBundle><SettingsGroup Id="QAVerificationSettings"><Setting Id="RegExRules">True</Setting>{reg_ex}</SettingsGroup></SettingsBundle>'''
    # the with block closes the file on exit, which flushes the data to disk
    with open(filename, 'w', encoding='utf-8') as xls2txt:
        xls2txt.write(reg_ex)
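One thing to watch with the regex rules: the terms from the spreadsheet are used as patterns, so a term such as "C++" would be read as regex syntax rather than as literal text. The sketch below is my own suggestion for treating terms literally. The helper name `literal_pattern` is hypothetical, and it assumes Python 3.7 or later and that Trados evaluates the patterns with a regex engine in which backslash-escaped punctuation matches the character itself.

```python
import re
from xml.sax.saxutils import escape


def literal_pattern(term):
    """Escape a term first for regex, then for XML (hypothetical helper)."""
    return escape(re.escape(term))


for term in ["C++", "a<b", "node.js"]:
    print(term, "->", literal_pattern(term))
```

The order matters: the regex escaping works on the raw term, and the XML escaping then protects whatever characters remain. If some rows are meant to be real patterns, they would need to skip the `re.escape` step.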
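I have barely dipped a toe into Python. The steps that read the Excel content and save it to a text file follow another blogger's post, but by the time I finished the code I had forgotten whose it was, so there is no credit... Some of the comments were copied from there too, so if you recognize it, please let me know right away.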