Exercise 7: Dynamic Programming
Prof. Dr. Moritz Diehl, Andrea Zanelli, Dimitris Kouzoupis, Florian Messerer

In this exercise, we will use dynamic programming (DP) to implement a controller for the inverted pendulum from Exercise 6,

θ˙ = ω,
ω˙ = sin(θ) + τ, (1)

where θ is the angle describing the orientation of the pendulum, ω is its angular velocity and τ is the input torque. The goal is to design a feedback policy capable of swinging up the pendulum starting from θ = π. Moreover, we will prove the Schur Complement Lemma, which can be used to derive the formulation of the LQR controller.
1. Dynamic programming: Consider the following optimal control problem,

minimize over x0,…,xN, u0,…,uN−1:  Σ_{i=0}^{N−1} (xiᵀ Q xi + uiᵀ R ui) + xNᵀ QN xN
s.t. x0 = x̄0, (2)
xi+1 = F(xi, ui), i = 0,…,N − 1,
−10 ≤ ui ≤ 10, i = 0,…,N − 1,

where F(x, u) describes the discretized dynamics obtained by applying one step of the explicit RK4 integrator with step size h = 0.1 to (1).
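The discrete dynamics F(x, u) can be sketched in Python (the exercise templates are in MATLAB), assuming the state is ordered as x = (θ, ω):

```python
import numpy as np

def ode(x, u):
    # continuous-time pendulum dynamics (1): θ˙ = ω, ω˙ = sin(θ) + τ
    theta, omega = x
    return np.array([omega, np.sin(theta) + u])

def F(x, u, h=0.1):
    # one step of the explicit RK4 integrator with step size h,
    # giving the discrete dynamics x_{i+1} = F(x_i, u_i)
    k1 = ode(x, u)
    k2 = ode(x + h / 2 * k1, u)
    k3 = ode(x + h / 2 * k2, u)
    k4 = ode(x + h / 2 * k3, u)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

The upright equilibrium (0, 0) with τ = 0 is a fixed point of F, which is a quick sanity check for the integrator.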
(a) Consider the unconstrained linear quadratic infinite horizon problem that is obtained from (2) by linearizing the dynamics at xlin = (0, 0), ulin = 0, and dropping the control constraints,

minimize over u0, u1, …:  Σ_{i=0}^{∞} (xiᵀ Q xi + uiᵀ R ui)
s.t. xi+1 = A xi + B ui, (3)

where A = ∂F/∂x and B = ∂F/∂u, both evaluated at x = xlin, u = ulin.
Complete the template LQR_design.m to obtain the LQR gain matrix K, which defines the optimal control at each stage as the time-independent linear feedback law u∗i(x) = −Kx.
Hint: MATLAB function dlqr()
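In Python, the same gain can be obtained with SciPy's discrete algebraic Riccati solver, which mirrors what dlqr() computes; the finite-difference linearize helper below is an assumption of this sketch, not part of the exercise code:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def F(x, u, h=0.1):
    # one explicit RK4 step of the pendulum dynamics (1)
    f = lambda x, u: np.array([x[1], np.sin(x[0]) + u])
    k1 = f(x, u); k2 = f(x + h / 2 * k1, u)
    k3 = f(x + h / 2 * k2, u); k4 = f(x + h / 2 * k3, u)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def linearize(F, xlin, ulin, eps=1e-6):
    # central finite differences: A = dF/dx, B = dF/du at (xlin, ulin)
    n = xlin.size
    A = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        A[:, j] = (F(xlin + e, ulin) - F(xlin - e, ulin)) / (2 * eps)
    B = ((F(xlin, ulin + eps) - F(xlin, ulin - eps)) / (2 * eps)).reshape(n, 1)
    return A, B

A, B = linearize(F, np.zeros(2), 0.0)
Q = np.diag([100.0, 0.01])
R = np.array([[0.001]])
P = solve_discrete_are(A, B, Q, R)                  # stationary cost-to-go matrix
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u*(x) = -K x, as dlqr would return
```

P is the LQR cost matrix that part (b) reuses as the terminal cost QN.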
(b) Complete the template dynamic_programming.m to implement the DP algorithm and use it to compute the cost-to-go associated with the initial state of (2). Choose N = 20, Q = diag(100, 0.01), R = 0.001 and QN equal to the cost matrix associated with the LQR controller. Discretize the angle θ into 200 values between 0 and 2π. Analogously, discretize the angular velocity ω into 40 values between −10 and 10 and the torque τ into 20 values between −10 and 10.
Remark: in order to compute the cost-to-go, you will have to project the states obtained by simulating the dynamics forward onto the discretization grid. To this end, use the MATLAB function project provided with this exercise.
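The backward DP recursion over the grid can be sketched in Python as follows; the nearest-neighbor project below is a hypothetical stand-in for the provided MATLAB function, and QN is set to Q as a placeholder (in the exercise it is the LQR cost matrix):

```python
import numpy as np

def ode(x, u):
    # vectorized pendulum dynamics; x has shape (2, M)
    return np.vstack([x[1], np.sin(x[0]) + u])

def F(x, u, h=0.1):
    # one RK4 step, vectorized over all grid states at once
    k1 = ode(x, u); k2 = ode(x + h / 2 * k1, u)
    k3 = ode(x + h / 2 * k2, u); k4 = ode(x + h / 2 * k3, u)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# discretization grids from the exercise text
n_th, n_om, n_u, N = 200, 40, 20, 20
thetas = np.linspace(0, 2 * np.pi, n_th)
omegas = np.linspace(-10, 10, n_om)
taus = np.linspace(-10, 10, n_u)

Q = np.diag([100.0, 0.01]); R = 0.001
QN = Q  # placeholder terminal cost; the exercise uses the LQR cost matrix here

TH, OM = np.meshgrid(thetas, omegas, indexing="ij")
X = np.vstack([TH.ravel(), OM.ravel()])   # all grid states, shape (2, n_th*n_om)

def project(xnext):
    # nearest-neighbor projection onto the grid (stand-in for MATLAB `project`)
    i = np.clip(np.round((xnext[0] % (2 * np.pi)) / (2 * np.pi) * (n_th - 1)).astype(int), 0, n_th - 1)
    j = np.clip(np.round((xnext[1] + 10) / 20 * (n_om - 1)).astype(int), 0, n_om - 1)
    return i * n_om + j

stage = np.einsum("im,ij,jm->m", X, Q, X)   # x' Q x for every grid state
J = np.einsum("im,ij,jm->m", X, QN, X)      # terminal cost-to-go J_N
for k in range(N - 1, -1, -1):              # backward recursion J_k = min_u {...}
    best = np.full(X.shape[1], np.inf)
    for u in taus:
        cand = stage + R * u**2 + J[project(F(X, u))]
        best = np.minimum(best, cand)
    J = best

J0 = J[project(np.array([np.pi, 0.0]))]     # cost-to-go at the initial state (π, 0)
```

After the loop, J holds the cost-to-go J_0 on the whole grid; storing the minimizing u at each stage gives the DP policy used in part (d).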
(c) Consider the plots produced by the previous template, showing the cost of DP and LQR as well as their control policies. Where is the LQR policy similar to the one obtained with DP? Where does it differ? Why?
(d) Complete the template closed_loop.m to obtain a closed-loop simulation of the system for both LQR and DP. For LQR, keep in mind that the gain was obtained without considering the control constraints, so you have to clip the controls obtained from the LQR feedback law to the feasible control interval [−10, 10]. For the DP controller we always consider the current state of the system as the initial state of (2), so you can choose the control according to the cost-to-go function obtained as the result of the recursion in (b).
Which of the two controllers achieves the better performance? Why?
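A minimal Python sketch of the clipped closed-loop LQR simulation (the DP loop is analogous, with the feedback law replaced by a lookup in the cost-to-go tables); the finite-difference linearization and the horizon of 200 steps are assumptions of this sketch:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def F(x, u, h=0.1):
    # one RK4 step of the pendulum dynamics (1)
    f = lambda x, u: np.array([x[1], np.sin(x[0]) + u])
    k1 = f(x, u); k2 = f(x + h / 2 * k1, u)
    k3 = f(x + h / 2 * k2, u); k4 = f(x + h / 2 * k3, u)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# LQR gain from a finite-difference linearization at the upright equilibrium
eps = 1e-6
A = np.column_stack([
    (F(np.array([eps, 0.0]), 0.0) - F(np.array([-eps, 0.0]), 0.0)) / (2 * eps),
    (F(np.array([0.0, eps]), 0.0) - F(np.array([0.0, -eps]), 0.0)) / (2 * eps),
])
B = ((F(np.zeros(2), eps) - F(np.zeros(2), -eps)) / (2 * eps)).reshape(2, 1)
Q = np.diag([100.0, 0.01]); R = np.array([[0.001]])
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# closed loop with the LQR law, clipping u to the feasible interval [-10, 10]
x = np.array([np.pi, 0.0])   # start hanging down, θ = π
traj = [x.copy()]
for _ in range(200):
    u = float(np.clip(-K @ x, -10.0, 10.0))
    x = F(x, u)
    traj.append(x.copy())
```

Because the clipped LQR law ignores the constraints it was designed without, it may fail to swing the pendulum up from θ = π, which is exactly the comparison the question asks about.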
2. Schur Complement Lemma: Consider the following lemma:
Lemma 1 (Schur Complement Lemma) Let R be a positive-definite matrix. Then the following holds:

min over u of  [x; u]ᵀ [Q Sᵀ; S R] [x; u]  =  xᵀ (Q − Sᵀ R⁻¹ S) x, (4)

and the minimizer u∗(x) is given by u∗(x) = −R⁻¹Sx.
(a) Prove Lemma 1.
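For part (a), a completing-the-square argument gives the result; the block matrix form below (with blocks Q, Sᵀ, S, R, matching the stated minimizer) is the standard statement assumed here:

```latex
\begin{align*}
\begin{pmatrix} x \\ u \end{pmatrix}^\top
\begin{pmatrix} Q & S^\top \\ S & R \end{pmatrix}
\begin{pmatrix} x \\ u \end{pmatrix}
&= x^\top Q x + 2\, u^\top S x + u^\top R u \\
&= \big(u + R^{-1} S x\big)^\top R \,\big(u + R^{-1} S x\big)
   + x^\top \big(Q - S^\top R^{-1} S\big) x.
\end{align*}
```

Since R is positive definite, the first term is nonnegative and vanishes exactly for u = −R⁻¹Sx, which yields both the minimizer u∗(x) and the minimum value xᵀ(Q − SᵀR⁻¹S)x.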
