People often have a preference among synonyms of the same word. For example, some may prefer “the police”, while others may prefer “the cops”. Analyzing such patterns can help to narrow down a speaker’s identity, which is useful when validating, for example, whether it’s still the same person behind an online avatar.
Now given a paragraph of text sampled from someone’s speech, can you find the person’s most commonly used word?
Input Specification:
Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a carriage return \n. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].
Output Specification:
For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there are more than one such words, print the lexicographically smallest one. The word should be printed in all lower case. Here a “word” is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.
Note that words are case insensitive.
Sample Input:
Can1: “Can a can can a can? It can!”
Sample Output:
can 5
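Given a string, find the “word” that occurs most often in it, and output that “word” together with the number of times it occurs. If more than one “word” shares the highest count, output the lexicographically smallest of them along with its count.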
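1. In this problem, a “word” is a string made up only of English letters and digits.
2. Only the lowercase form of the “word” is printed.
3. can1 and can are not the same “word”.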
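To make the three notes above concrete, here is a minimal Python sketch of the word-splitting rule alone. The helper name `tokenize` is hypothetical, and using a regular expression is just one convenient way to express the rule; it assumes the letters in the input are ASCII, as the problem's character set [0-9 A-Z a-z] suggests.

```python
import re

def tokenize(text):
    """Split text into lowercase words made of ASCII letters and digits.

    Hypothetical helper for illustration: any other character
    (spaces, punctuation, quotes) acts as a separator.
    """
    # Lowercase first so that "Can" and "can" become the same word.
    return re.findall(r"[0-9a-z]+", text.lower())

words = tokenize('Can1: “Can a can can a can? It can!”')
# "can1" stays a single word, distinct from "can".
print(words)
```

On the sample input, `can1` appears once and `can` appears five times, which matches the sample output.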
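1. Read the input string str and convert every uppercase letter in it to lowercase.
2. Traverse str one character at a time, collecting each run of consecutive lowercase letters and digits (a “word”) into temp. Characters that cannot be part of a “word” are simply skipped.
3. mp stores the mapping from each “word” to its number of occurrences. Each time a “word” is extracted, check whether mp already contains it. If it does, add 1 to its count; if not, set its count to 1.
4. res holds the “word” that should currently be output, and max is the number of times that “word” has occurred. Update res and max in either of the following two cases:
* the newly extracted “word” has a count greater than max
* the newly extracted “word” has a count equal to max and is lexicographically smaller than res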
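The steps above can be sketched in Python roughly as follows. This is an illustrative sketch rather than a reference solution: the function name `most_common_word` is hypothetical, `counts` plays the role of mp, and `best_word` / `best_count` play the roles of res and max (renamed because `max` is a Python built-in). It assumes the letters in the input are ASCII.

```python
def most_common_word(text):
    """Return (word, count) for the most frequent word in text.

    Ties are broken by choosing the lexicographically smallest word.
    """
    text = text.lower()               # step 1: words are case insensitive
    counts = {}                       # step 3: word -> occurrences (mp)
    best_word, best_count = "", 0     # step 4: res and max
    temp = []                         # step 2: characters of the current word

    # A trailing "\n" acts as a sentinel so the last word is flushed too.
    for ch in text + "\n":
        if "a" <= ch <= "z" or "0" <= ch <= "9":
            temp.append(ch)
        elif temp:
            word = "".join(temp)
            temp.clear()
            counts[word] = counts.get(word, 0) + 1
            c = counts[word]
            # Update on a strictly higher count, or on an equal count
            # with a lexicographically smaller word.
            if c > best_count or (c == best_count and word < best_word):
                best_word, best_count = word, c
    return best_word, best_count


if __name__ == "__main__":
    word, count = most_common_word(input())
    print(word, count)
```

Checking the tie-break at the moment each word is counted is enough: every word that eventually reaches the highest count is compared against res at the moment it reaches it, so the smallest such word is the one left in res at the end.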