Regex (Python) - Way around a quantifier with a Look-Behind?

Question

asked Jan 29, 2021 in Technique by 深蓝 (71.8m points)

I have a list of many elements (all strings, but unfortunately with lots of whitespace too). Here are two elements as an example:

sample_string = ('8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29',
                 '8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36')
my_list = list(sample_string)

I wish to use regex to extract the first number/letter sequence (in the examples above, 8000KE60803F6 and 8522-3770). I then wish to extract the next alphabetic sequence ('ST FULL-DEPTH TEETH' and 'CONTACT'). Lastly, I wish to extract the numeric value that follows EA (36,56 and 311,45).

I have tried the following:

import re

for item in my_list:
    line = re.search(r'([A-Z0-9]*)(\s*)((?<=EA\s)[\d,]*)', item)
    if line:
        PN = line.group(1)
        Name = line.group(2)
        Price = line.group(3)
    print(PN)
    print(Name)
    print(Price)

The above outputs:

EA

However, I am seeking the following output:

PN: 8000KE60803F6 and 8522-3770

Name: ST FULL-DEPTH TEETH and CONTACT

Price: 36,56 and 311,45

In reality, I need to iterate through a large list.

I have also tried lookarounds, but I get the common error raised when a quantifier is used inside a look-behind (re.error: look-behind requires fixed-width pattern). Is there a way around it?
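For reference, Python's built-in re module only accepts look-behind assertions of fixed width, so a pattern such as (?<=EA\s+) fails to compile. The sketch below, run on the second sample line, shows that failure and two common workarounds: keep the look-behind fixed-width and match the extra spaces after it, or drop the look-behind entirely and consume the prefix while capturing only the number. (The third-party regex package does accept variable-width look-behinds; it is not used here.)

```python
import re

line = '8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36'

# 1. A variable-width look-behind is rejected by the standard re module.
try:
    re.compile(r'(?<=EA\s+)\d[\d,]*')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('re.error:', exc)

# 2. Fixed-width look-behind: "EA" plus exactly one whitespace char.
#    The remaining spaces are matched by \s* outside the look-behind.
fixed = re.search(r'(?<=EA\s)\s*(\d[\d,]*)', line)

# 3. No look-behind: match "EA" and the spaces, capture only the number.
consumed = re.search(r'EA\s+(\d[\d,]*)', line)

print(fixed.group(1), consumed.group(1))  # 311,45 311,45
```

Option 3 is usually the simplest: a capturing group already lets you keep only part of a match, so a look-behind is not needed to exclude the EA prefix.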

1 Answer

深蓝 · Answer 1 · 2021-01-29T04:13:37+0000

You can use

^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)

See the regex demo. Details:

^ - start of string
(?P<PN>\S+) - Group PN: one or more non-whitespace chars
\s+ - one or more whitespace chars
(?P<Name>.*?) - Group Name: any zero or more chars other than line break chars, as few as possible
\s+\d+\s+ - one or more digits with one or more whitespace chars on each side
EA - the literal string EA
\s+ - one or more whitespace chars
(?P<Price>\d[\d,]*) - Group Price: a digit, then zero or more digits or commas
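The same breakdown can be written into the pattern itself with re.VERBOSE, where unescaped whitespace is ignored and # starts a comment. This is a sketch of an equivalent spelling, not a different technique; the names rx_verbose, sample and greedy are only for illustration. The last lines show why Name uses the lazy .*? rather than greedy .*: the greedy version keeps the trailing padding spaces.

```python
import re

# Equivalent to the one-line pattern, one piece per line.
rx_verbose = re.compile(r'''
    ^
    (?P<PN>\S+)          # Group PN: one or more non-whitespace chars
    \s+                  # one or more whitespace chars
    (?P<Name>.*?)        # Group Name: as few chars as possible (lazy)
    \s+\d+\s+            # the quantity: digits with whitespace on each side
    EA                   # the literal string EA
    \s+                  # one or more whitespace chars
    (?P<Price>\d[\d,]*)  # Group Price: a digit, then digits or commas
''', re.VERBOSE)

rx = re.compile(r'^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)')

sample = '8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29'
print(rx_verbose.match(sample).groupdict())
# {'PN': '8000KE60803F6', 'Name': 'ST FULL-DEPTH TEETH', 'Price': '36,56'}

# Greedy .* backtracks only as far as needed, so Name keeps trailing spaces.
greedy = re.match(r'^(?P<PN>\S+)\s+(?P<Name>.*)\s+\d+\s+EA', sample)
print(repr(greedy.group('Name')))
```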

In Python, you can use it like this:

import re

# Compile once, then reuse the pattern for every element of the list
rx = re.compile(r'^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)')
l = ['8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29',
     '8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36']
for el in l:
    m = rx.match(el)  # match() only tries at the start of the string
    if m:
        print(m.groupdict())
# => {'PN': '8000KE60803F6', 'Name': 'ST FULL-DEPTH TEETH', 'Price': '36,56'}
#    {'PN': '8522-3770', 'Name': 'CONTACT', 'Price': '311,45'}

See the Python demo.
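The sample rows suggest the comma is a decimal separator (in the second line, 2 × 311,45 = 622,90). Under that assumption, here is a sketch of a generator that walks a large list one element at a time, skips entries that do not match (such as whitespace-only strings), and converts each price to decimal.Decimal. The helper name parse_rows is hypothetical, and the conversion assumes prices never contain thousands separators.

```python
import re
from decimal import Decimal

# Same pattern as in the answer
rx = re.compile(r'^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)')


def parse_rows(lines):
    """Hypothetical helper: yield (PN, Name, Price) for each matching line.

    Non-matching entries are skipped. Assumes ',' is the decimal separator
    and that no thousands separators occur.
    """
    for line in lines:
        m = rx.match(line)
        if m:
            # '311,45' -> Decimal('311.45')
            price = Decimal(m.group('Price').replace(',', '.'))
            yield m.group('PN'), m.group('Name'), price


lines = [
    '8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29',
    '   ',  # whitespace-only entry: does not match, so it is skipped
    '8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36',
]
rows = list(parse_rows(lines))
for row in rows:
    print(row)
# ('8000KE60803F6', 'ST FULL-DEPTH TEETH', Decimal('36.56'))
# ('8522-3770', 'CONTACT', Decimal('311.45'))
```

Decimal is used instead of float so that amounts such as 311.45 are stored exactly.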
