Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

Regex (Python) - Way around a quantifier with a Look-Behind?

I have a list of many elements (all strings, but unfortunately with lots of whitespace too). Here are two elements as an example:

sample_string = '8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29' ,'8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36'
my_list = list(sample_string)    

I wish to use regex to extract the first number/letter sequence (in the case of the above, that's 8000KE60803F6 and 8522-3770). I then wish to extract the next alpha sequence (in the case of the above, that's 'ST FULL-DEPTH TEETH' and 'CONTACT'). Lastly, I wish to extract the numeric value that follows the EA (in the case of the above, that's 36,56 and 311,45).

I have tried the following:

for item in my_list:
    line=re.search(r'([A-Z0-9]*)(\s*)((?<=EA\s)[\d,]*)', item)
    if line:
        PN = line.group(1)
        Name = line.group(2)
        Price = line.group(3)
    print(PN)
    print(Name)
    print(Price)

The above outputs:

EA

EA

However, I am seeking the following output:

PN: 8000KE60803F6 and 8522-3770

Name: ST FULL-DEPTH TEETH and CONTACT

Price: 36,56 and 311,45

In reality, I need to iterate through a large list.

I have also tried lookarounds, but I get the common error raised when a quantifier is used inside a look-behind. Is there a way around it?



1 Answer


You can use

^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)

See the regex demo. Details:

  • ^ - start of string
  • (?P<PN>\S+) - Group PN: one or more non-whitespace chars
  • \s+ - one or more whitespaces
  • (?P<Name>.*?) - Group Name: any zero or more chars other than line break chars, as few as possible
  • \s+\d+\s+ - one or more digits enclosed with one or more whitespaces
  • EA - an EA string
  • \s+ - one or more whitespaces
  • (?P<Price>\d[\d,]*) - Group Price: a digit and then any zero or more digits or commas.
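
The breakdown above can also be written into the pattern itself with re.VERBOSE, so that each part carries its own comment. This is only a sketch of the same pattern laid out differently; the names PATTERN and sample are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is commented.
# In verbose mode, unescaped whitespace and '#' comments in the pattern are ignored.
PATTERN = re.compile(r'''
    ^(?P<PN>\S+)         # part number: first run of non-whitespace chars
    \s+                  # padding after the part number
    (?P<Name>.*?)        # name: as few chars as possible ...
    \s+\d+\s+EA\s+       # ... up to the quantity, the EA unit and padding
    (?P<Price>\d[\d,]*)  # price: a digit, then any digits or commas
''', re.VERBOSE)

sample = '8522-3770                    CONTACT            2 EA          311,45         622,90'
m = PATTERN.match(sample)
print(m.group('PN'), '|', m.group('Name'), '|', m.group('Price'))
```

Because the name group is lazy, it stops at the first place where a quantity followed by EA can match, which is why no look-behind is needed.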

In Python, you can use it like this:

import re
rx = re.compile(r'^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)')
l = ['8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56    2,00           0,73           37,29',
'8522-3770                    CONTACT            2 EA          311,45         622,90    2,00          12,46          635,36']
for el in l:
    m = rx.match(el)
    if m:
        print(m.groupdict())
# => {'PN': '8000KE60803F6', 'Name': 'ST FULL-DEPTH TEETH', 'Price': '36,56'}
#    {'PN': '8522-3770', 'Name': 'CONTACT', 'Price': '311,45'}

See the Python demo.
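
Since the question mentions iterating through a large list, here is a minimal sketch of how the same pattern might be wrapped in a generator that skips lines that do not match. The function name parse_lines and the conversion to Decimal are assumptions added for illustration; the conversion also assumes the comma is a decimal separator and that prices contain no thousands separators.

```python
import re
from decimal import Decimal

rx = re.compile(r'^(?P<PN>\S+)\s+(?P<Name>.*?)\s+\d+\s+EA\s+(?P<Price>\d[\d,]*)')

def parse_lines(lines):
    """Yield (PN, Name, Price) for each line that matches; skip the rest."""
    for line in lines:
        m = rx.match(line)
        if m is None:
            continue  # header, blank or otherwise non-matching line
        # Assumption: ',' is the decimal separator (36,56 means 36.56)
        # and there are no thousands separators in the price.
        price = Decimal(m.group('Price').replace(',', '.'))
        yield m.group('PN'), m.group('Name'), price

rows = list(parse_lines([
    '8000KE60803F6                ST FULL-DEPTH TEETH            1 EA           36,56          36,56',
    'not an item line',
    '8522-3770                    CONTACT            2 EA          311,45         622,90',
]))
print(rows)
```

A generator keeps memory use flat for long inputs, and checking for None avoids the problem in the original attempt, where the print calls ran even when nothing matched.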

