Split by comma and strip whitespace in Python

    Technology  2022-07-14  91

    This article is translated from: Split by comma and strip whitespace in Python

    I have some Python code that splits on commas, but doesn't strip the whitespace:

    >>> string = "blah, lots , of , spaces, here "
    >>> mylist = string.split(',')
    >>> print(mylist)
    ['blah', ' lots ', ' of ', ' spaces', ' here ']

    I would rather end up with the whitespace removed, like this:

    ['blah', 'lots', 'of', 'spaces', 'here']

    I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.
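For reference, the explicit loop mentioned in the question could look like the sketch below, shown next to a list comprehension that produces the same list. The names stripped_loop and stripped_comp are illustrative only and do not come from the original post.

```python
string = "blah, lots , of , spaces, here "

# Explicit loop: split on commas, then strip each piece in turn.
stripped_loop = []
for item in string.split(','):
    stripped_loop.append(item.strip())

# The same transformation written as a list comprehension.
stripped_comp = [item.strip() for item in string.split(',')]

print(stripped_comp)
```

Both variants leave the original string untouched and build a new list.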


    #1

    Reference: https://stackoom.com/question/H59g/用逗号分割并在Python中去除空格


    #2

    Split using a regular expression. Note that I made the case more general by adding leading spaces. The list comprehension removes the empty strings at the front and back.

    >>> import re
    >>> string = " blah, lots , of , spaces, here "
    >>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
    >>> print([x for x in pattern.split(string) if x])
    ['blah', 'lots', 'of', 'spaces', 'here']

    This works even if ^\s+ doesn't match:

    >>> string = "foo, bar "
    >>> print([x for x in pattern.split(string) if x])
    ['foo', 'bar']

    Here's why you need ^\s+:

    >>> string = " blah, lots , of , spaces, here "
    >>> pattern = re.compile(r"\s*,\s*|\s+$")
    >>> print([x for x in pattern.split(string) if x])
    [' blah', 'lots', 'of', 'spaces', 'here']

    See the leading space in ' blah'?

    Clarification: the above uses the Python 3 interpreter, but the results are the same in Python 2.
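A possible simplification, offered as a sketch and not as part of the original answer: stripping the whole string first removes the need for the ^\s+ and \s+$ alternatives, so the pattern only has to describe the separator. The helper name split_and_strip and the constant SEPARATOR are hypothetical.

```python
import re

# A comma together with any whitespace on either side of it.
SEPARATOR = re.compile(r"\s*,\s*")

def split_and_strip(text):
    """Split text on commas, dropping whitespace around each item."""
    # Strip the ends first so no leading or trailing whitespace survives.
    return SEPARATOR.split(text.strip())

print(split_and_strip(" blah, lots , of , spaces, here "))
```

Unlike the filtered version above, this keeps empty fields: an input such as "a,,b" yields an empty string between 'a' and 'b'. Whether that is desirable depends on the data.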


    #3

    I came to add:

    map(str.strip, string.split(','))

    but saw it had already been mentioned by Jason Orendorff in a comment.

    Reading Glenn Maynard's comment on the same answer, which suggests list comprehensions over map, I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

    So a quick (possibly flawed?) test on my box, applying the three methods in a loop, revealed:

    [word.strip() for word in string.split(',')]
    $ time ./list_comprehension.py
    real    0m22.876s

    map(lambda s: s.strip(), string.split(','))
    $ time ./map_with_lambda.py
    real    0m25.736s

    map(str.strip, string.split(','))
    $ time ./map_with_str.strip.py
    real    0m19.428s

    making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

    Certainly, though, map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.

    Edit:

    Python 2.6.5 on Ubuntu 10.04
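The timing scripts themselves are not shown above. One rough way to repeat the comparison is with the standard timeit module, as in the sketch below. It assumes Python 3, where map returns a lazy iterator, so each map call is wrapped in list() to force the work to happen. The dictionary name candidates and the repetition count are arbitrary choices, and the numbers will differ by machine and interpreter version.

```python
import timeit

string = "blah, lots , of , spaces, here "

# Each candidate is wrapped in a zero-argument function so timeit can call it.
candidates = {
    "list comprehension": lambda: [word.strip() for word in string.split(',')],
    "map with lambda": lambda: list(map(lambda s: s.strip(), string.split(','))),
    "map with str.strip": lambda: list(map(str.strip, string.split(','))),
}

for label, func in candidates.items():
    # Total time in seconds for 100000 calls of this candidate.
    seconds = timeit.timeit(func, number=100000)
    print("%-20s %.3fs" % (label, seconds))
```

The wrapper functions add the same small call overhead to every candidate, so the relative ordering is what matters here, not the absolute figures.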


    #4

    s = 'bla, buu, jii'
    # Split on commas, then strip the whitespace around each piece.
    for st in s.split(','):
        print(st.strip())

    #5

    re (as in regular expressions) allows splitting on multiple characters at once:

    >>> import re
    >>> string = "blah, lots  ,  of ,  spaces, here "
    >>> re.split(', ', string)
    ['blah', 'lots  ', ' of ', ' spaces', 'here ']

    This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can use re.split's ability to split on regex patterns to get a "split-on-this-or-that" effect:

    >>> re.split('[, ]', string)
    ['blah', '', 'lots', '', '', '', '', 'of', '', '', '', 'spaces', '', 'here', '']

    Unfortunately, that's ugly, but a filter will do the trick:

    >>> list(filter(None, re.split('[, ]', string)))
    ['blah', 'lots', 'of', 'spaces', 'here']

    Voila!
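One caveat with splitting on "comma or space": items that contain spaces of their own get broken apart too. The sketch below uses a made-up input of multi-word items to show the difference from splitting on commas only and stripping each piece. The variable names are illustrative.

```python
import re

string = "New York, Los Angeles ,  San Francisco "

# Splitting on "comma or space" also breaks items apart at their inner spaces.
by_char_class = list(filter(None, re.split('[, ]', string)))

# Splitting on commas only and stripping each piece keeps multi-word items whole.
by_comma = [item.strip() for item in string.split(',')]

print(by_char_class)
print(by_comma)
```

So the character-class approach is probably best kept for inputs where no item can contain a space.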


    #6

    import re
    result = [x for x in re.split(',| ', your_string) if x != '']

    This works fine for me.
