The world we live in is highly structured. Things that are semantically related are typically presented in a similar way: both trucks and buses have wheels and cabins, for example. Things also undergo continuous variations over time; in a video clip, content among frames are highly correlated. Our humans also interact with the environment constantly and communicate with each other frequently. Overall, there exist rich structures between human and environment and over both space and time. Therefore, it is highly needed to understand this visual world from a structured view. In this talk, I will advocate the value of structured information in intelligent visual perception. As examples, I will present a line of my recent work on video object segmentation, human semantic parsing, human-centric visual understanding, as well as weakly- and fully supervised image semantic segmentation.