Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
138 views
in Technique[技术] by (71.8m points)

Values extraction from string pandas python?

I have a column in a dataframe that includes rows like the bellow string:

parameter1: 104 parameter2: 98 % parameter3: 48 % parameter4: 1005 km parameter5: 1

How can I split it and create new columns from it?

So far, I am doing this way but it is not so efficient.

#split and create new fields
merged_df[['parameter1:n', 'parameter1:', 'parameter1:n', 'parameter2:(%)', 'ddd%', 'parameter3:n', 'parameter3(%)', 'dd%', 'parameter4:n', 'parameter4(km):', 'ffkm', 'parameter5:n', 'parameter5']] = merged_df['vals'].str.split(' ',expand=True)

#drop useful field
merged_df = merged_df.drop(['parameter1:n', 'parameter1:n', 'ddd%', 'parameter3:n', 'dd%','parameter4:n','ffkm','parameter5:n'], axis=1)

the result is like the table bellow

+------------+---------------+---------------+----------------+------------+
| parameter1 | parameter2(%) | parameter3(%) | parameter4(km) | parameter5 |   
+------------+---------------+---------------+----------------+------------+
|         104|            98 |            48 |           1005 |          1 |   
|         34 |             9 |            87 |             44 |      33432 |   
|          4 |           343 |             6 |             69 |         31 |   
+------------+---------------+---------------+----------------+------------+

Thank you in advance


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Try following code:

import re

df['vals'] = df['vals'].astype(str)
df['vals'] = df['vals'].apply(lambda rows: [re.search('d+',i).group() for i in re.split(r'[^s]+:',rows)[1:]])
cols =["parameter1","parameter2(%)", "parameter3(%)", "parameter4(km)", "parameter5"]
df[cols]  = pd.DataFrame(df.vals.tolist())
df.drop(["vals"],axis=1,inplace=True)

NOTE: Assuming you have all values of string datatype in "vals" column.

Code that I used to make the input dataframe is following:

rows = ["parameter1: 104 parameter2: 98 % parameter3: 48 % parameter4: 1005 km parameter5: 1"]*10
df = pd.DataFrame(rows,columns=['vals'])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...