Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
287 views
in Technique[技术] by (71.8m points)

sql - How to select the first and the last record base on time-sequence in hive?

I have a hive table of two columns. The first columns is time, and second is scattered objects. I wish to get all groups of same objects which are continuous in time, and pick up the first and last record. How to achieve this in hive?

For example, I have a table like this:

id   time      object
1   10:01:00   a
2   10:02:00   a
3   10:03:00   a
4   10:04:00   b
5   10:05:00   b
6   10:06:00   a
7   10:07:00   a
8   10:08:00   a
9   10:09:00   a
10  10:10:00   a
11  10:11:00   c

I wish to get this (as object 'a' is continuous from 10:01:00 to 10:03:00 and from 10:06:00 to 10:10:00, so both line1&line3 and line6&line10 are picked up):

id   time      object
1   10:01:00   a
3   10:03:00   a
4   10:04:00   b
5   10:05:00   b
6   10:06:00   a
10  10:10:00   a
11  10:11:00   c

What should I do to achieve this?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This is island and gap problem and you can use the row_number analytical function as follows:

select * from
(select t.*,
       row_number() over (partition by rn-rn_o order by time) as rn,
       row_number() over (partition by rn-rn_o order by time desc) as rn_d
  from
(select t.*,
       row_number() over (order by time) as rn,
       row_number() over (partition by object order by time) as rn_o
  from your_table t) t)
where 1 in (rn, rn_d);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...